How the ambiguous nature of reality makes it difficult to model in a database application
I was reading the book “Data and Reality” by William Kent (highly recommended), which is about how to model reality in a database application. Its first chapter discusses why this is so difficult: reality is inherently ambiguous. However, you don’t want your database to be ambiguous – it has to be structured in a way that lets you efficiently categorize, separate, search, perform operations, and so on.
Or does it?
How reality is ambiguous (from a data modelling perspective)
Reality is ultimately not a set of categories, states, and so on. Humans interpret and model reality that way, using language, computers and other constructs, because that is the most efficient way to work with it for our purposes.
At a very low level (low enough for most practical everyday or business applications), the best approximation of reality is simply physics. From the basic laws of physics we can derive everything else (staying above the quantum level for now). In theory, we could model reality as a set of atoms and molecules with physical properties that interact with each other. At that level, reality is incredibly fluid.
Any time you simplify that fluid reality into a database model, you lose that ambiguity – and with it, the true mapping of your database to reality. The more you simplify, the rougher your approximation of reality becomes, and that means real problems in your business applications.
As a data modeler, you trade flexibility for CPU
In reality, a collection of molecules which together form a metal rod is just that – a collection of molecules organized in a specific fashion, with endless physical properties arising from that arrangement. But your database, depending on what it is designed to do, can categorize that reality as a metal rod, a pipe, a baton or something else. And it may record different properties of the thing depending on what the application is designed to do: in some cases the weight will be important, in others the ability to conduct electrical current. This is the fundamental problem of modelling reality in a database. Which aspects do you model? Which aspects do you ignore? How do you categorize the thing you are modelling?
So, as a data modeler, you are making choices. These choices narrow that incredibly flexible “collection of atoms” into a much smaller field of use. Modelling it in a certain way, making those choices, necessarily limits what you see the thing as, and that reduces your application’s flexibility. If you call it “a collection of atoms and molecules of type X grouped in way Y”, you have incredible flexibility: only your imagination and its physical properties limit what you can use that thing for. If you categorize it as a rod, you disregard all those physical properties, and you see it as a rod, and a rod only.
By categorizing the thing upfront (based on a database design which assumes certain things), you are in fact making a trade-off: you give up flexibility for clarity and speed. The less defined your categorization is, the more processing power and intelligence you need to find, group, understand and make decisions about the things you are modelling. By creating a category, you are making a decision upfront that makes your processing more scalable – you have made one decision that you can cheaply query against many times in the future (if you created the category “rod” and put objects into it, it is very easy to find all rods in your database) – but also more rigid, simply because you HAVE made a decision upfront, and decisions are often irreversible.
Whereas if you don’t categorize, but instead put objects into the database based on their properties alone (a lower-level description of a rod would be “thing made of metal with length x and width y”), you have more flexibility in the future (you can treat those things as rods or as something else), but the approach is less scalable. Each time you want to find all things in the system that could serve as a rod, you first have to define which properties “a rod” has by your definition, and then compare every object in your database against that definition.
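To make the trade-off concrete, here is a minimal sketch in Python with SQLite. The table and column names (items, category, length_mm and so on) are invented for illustration:

```python
import sqlite3

# A minimal sketch of the category-vs-properties trade-off.
# All table and column names here are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        id        INTEGER PRIMARY KEY,
        category  TEXT,   -- the upfront decision: 'rod', 'pipe', 'baton', ...
        material  TEXT,
        length_mm REAL,
        width_mm  REAL
    );
    INSERT INTO items (category, material, length_mm, width_mm) VALUES
        ('rod',  'steel',     900, 12),
        ('pipe', 'copper',   1200, 20),
        ('rod',  'aluminum',  600, 10);
""")

# Categorized upfront: fast and simple (an index on `category` makes it
# scale), but the objects can never be anything other than what we named them.
rods = conn.execute("SELECT id FROM items WHERE category = 'rod'").fetchall()

# Property-based: flexible, but we must define "rod-ness" at query time
# and evaluate every candidate object against that definition.
rod_like = conn.execute("""
    SELECT id FROM items
    WHERE material IN ('steel', 'aluminum')
      AND length_mm BETWEEN 300 AND 1500
      AND width_mm  < 25
""").fetchall()

print(rods, rod_like)
```

The first query is one cheap lookup away from scaling indefinitely; the second has to re-express “rod-ness” from scratch every single time.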
Can we model and search based on sensory information rather than properties or categories?
We can follow the above logic further. If we go even lower level, you may not even need to provide properties yourself. Perhaps the properties can be derived from more basic information – such as sensory information. Perhaps you can feed your database images of rods and other sensory data: temperatures of rods, roughness of rods, the sounds rods make when they hit something, and so on. From that sensory information, it should be possible to derive any property you might later need.
Perhaps we can even take the use case a level lower. Perhaps you won’t search for rods by providing search properties, but simply by providing another piece of sensory information (an image, a feel, etc.). Thus, without any categorization inherent in your database, you could let the system find all the rods similar to the rod you scanned in. You could then improve the search by giving the system feedback on the rods it presented.
Of course, this doesn’t mean that the items in the system aren’t categorized. They may be – but the categorization is internal to the system, not externally visible to you.
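As a toy sketch of what such query-by-example search could look like: assume each object has already been reduced by its sensors to a small feature vector. The three-number vectors below are fabricated for illustration:

```python
import math

# A toy sketch of query-by-example search. Assumes each stored object has
# already been reduced to a small "sensory" feature vector; the vectors
# below are fabricated for illustration.
stored_objects = {
    "object_1": [0.90, 0.10, 0.30],  # e.g. [elongation, roughness, resonance]
    "object_2": [0.85, 0.15, 0.35],
    "object_3": [0.10, 0.80, 0.90],
}

def similarity(a, b):
    """Cosine similarity between two sensory vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_similar(query_vector, k=2):
    """Return the k stored objects most similar to the scanned-in query."""
    ranked = sorted(stored_objects.items(),
                    key=lambda item: similarity(query_vector, item[1]),
                    reverse=True)
    return ranked[:k]

# "Scan in" a rod-like object and retrieve whatever resembles it,
# without any 'rod' category existing anywhere in the store.
print(find_similar([0.88, 0.12, 0.32]))
```

Nothing in this store knows the word “rod”; similarity alone does the retrieval.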
But doesn’t that mean we haven’t actually solved the problem of how to categorize reality better? What we have done is move the responsibility for categorizing from ourselves to the system itself. The categorization, simplification and “upfront decision making” of how to model reality is still going on behind the scenes, and we really haven’t gained any flexibility at all. I will go as far as to say that such a system cannot possibly be flexible in the way we want it to be – and the way we want it is that it models reality without making any upfront decisions about which category something belongs to. Such decisions are the opposite of creativity, and creativity can be described as the ability to not pre-categorize objects, thereby finding new ways to use existing tools.
What we are searching for is a way to save and retrieve information quickly without sacrificing flexibility. How can we do that? It seems that as soon as we start categorizing things, we gain speed but lose flexibility.
What if we copy and simulate reality instead of “representing” it?
Well, actually there is another way: a different way of storing information which doesn’t even use the notion of categorization. We can model the thing we are saving physically and directly, by simulating it in another system. Basically, we can copy the physical world into another physical object, the “other” physical object having some properties which the first physical object lacks, namely 1) the ability to change and adapt quickly and 2) the ability to iterate quickly (which actually follows from the first point). Bear with me for now, and I will describe what I mean below.
The “simulator” gets sensory input about the “real” object, and creates a replica of it inside itself. There are no categories, properties or data points – just a simulation. This “simulator” can store many such replicas inside itself, each corresponding to real sensory information that it gets from its “reality sensors”.
It can also invent imaginary simulations, because it has the ability to copy, manipulate and play with its own simulations inside itself. Basically, it is a simplified, more dynamic simulation of all the sensory input it has ever received from the real world, plus its own ability to manipulate those simulations (which corresponds to human imagination). When manipulating (or let’s call it “playing with”) its simulated physical world, our “reality simulator” may notice recurring patterns. It may then save a recurring pattern as a separate “pattern object”. In the future, when saving new objects that are similar to the pattern object, it can simply save how those objects differ from it, thereby saving space and CPU: it can save how the pattern object usually interacts with other objects, and derive from that how the actual object it is simulating should behave, including its differences from the pattern object.
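Here is a toy sketch of that pattern-object idea, in the same fabricated vector terms as before: new observations are saved only as a reference to a known pattern plus their deviation from it:

```python
# A toy sketch of pattern-object storage. The pattern name and the vectors
# are invented for illustration.
patterns = {
    "pattern_rod": [0.90, 0.10, 0.30],  # a recurring "rod-like" pattern
}

def store_as_delta(observation, pattern_name):
    """Save only how an observation differs from a known pattern."""
    pattern = patterns[pattern_name]
    delta = [obs - pat for obs, pat in zip(observation, pattern)]
    return {"pattern": pattern_name, "delta": delta}

def reconstruct(record):
    """Recover the full observation from pattern + delta."""
    pattern = patterns[record["pattern"]]
    return [pat + d for pat, d in zip(pattern, record["delta"])]

record = store_as_delta([0.88, 0.12, 0.32], "pattern_rod")
print(reconstruct(record))  # ~[0.88, 0.12, 0.32], recovered from the pattern
```

The record stores almost nothing, because the pattern object carries the bulk of the information.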
So we are going from a “representing” paradigm to a “copying and simulating” paradigm. Depending on the physical implementation of this simulator, it may have both the efficiency of the representing paradigm and the flexibility of the real world.
Interestingly, such a “reality simulator” already exists. Evolution has chosen it as the best way to model reality. Just about every decision-making organism on earth (including humans) uses it every day to copy and store a version of reality, based on sensory input, inside its nervous system. Humans are the most advanced species doing this, using our brains. We use this reality simulator to copy a version of reality inside our heads, to predict the future by running simulations on that copy, and to query and retrieve information about (report on) reality in order to make (business or life) decisions.
How would a reality simulator be implemented?
A reality simulator (including our brain) doesn’t literally create a replica of physical objects. Instead, physical objects are mapped to sensory information, and that sensory information, which represents the actual physical objects, is what gets stored. This is equivalent to simulating the physical object, since reality maps to specific sensory input.
This has some implications for our theory. It means that instead of creating a rigid model of reality consisting of set categories, we present reality to our computer as a set of sensory signals. Instead of representing “object with properties x and y”, we present “vision sensory signal a, sound sensory signal b, touch sensory signal c” and so on. This combination of sensory signals can then represent anything that exists in reality, and the computer can replicate, duplicate, iterate on and combine those representations in its internal world. In that way, the computer can simulate all of reality in its internal systems.
For example, when the computer senses a rod, it stores the rod (the sensory signals that represent the rod) inside its simulated reality. After sensing a number of rods, the computer may start to recognize that these objects are very similar to each other and seem to be used for the same purposes. So it may create a “generic rod” pattern in order to save CPU and space, and predict reality based on how a generic rod usually interacts with other objects.
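Continuing the toy sketch: forming that “generic rod” prototype could be as simple as averaging a cluster of similar observations into one pattern vector. Everything here (the vectors, the averaging itself) is an invented illustration, not a claim about how brains actually do it:

```python
# A toy sketch of prototype formation: after enough similar observations,
# the system averages them into a single "generic rod" pattern vector.
observations = [
    [0.90, 0.10, 0.30],
    [0.88, 0.12, 0.32],
    [0.91, 0.09, 0.29],
]

def form_prototype(samples):
    """Average a cluster of similar sensory vectors into one pattern object."""
    n = len(samples)
    return [sum(dim) / n for dim in zip(*samples)]

generic_rod = form_prototype(observations)
print(generic_rod)  # the internal "generic rod" pattern – never named by us
```

Note that the prototype emerges from the data; no human ever told the system that the word “rod” exists.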
How to take it to computers
As long as the human-created digital computer world does not move from a “representing” paradigm to a “simulation” paradigm, we will not have strong AI, regardless of how powerful CPUs get. The representing paradigm has inherent limitations, sketched above, that cannot be engineered away because of the trade-offs discussed above (and probably for other reasons I haven’t discussed).
However, if we embrace the simulation paradigm, we can create the same strong AI found in our brains, while improving on the inherent weaknesses of a living organism. As organic implementations, our brains consist of living tissue that deteriorates quickly and has limitations in scale, power and size. By implementing the same simulation paradigm in a non-organic physical system, we can overcome these weaknesses and create a brain unconstrained.