As far as we have come in the field of AI, there remains a clear gap between how general human capabilities are and how narrowly focused AI systems are, however much better they may be in their specific areas of expertise. What follows is my perception of what could be missing, and of how that gap might be filled to achieve Artificial General Intelligence.

All the tools of AI, be they probabilistic graphical models, regressions, or symbolic solvers, have one thing in common: bad input. To understand this, consider an analogy: what if I wrote an article and then zipped it before sharing? Everyone who downloaded it would know to decompress it before viewing the actual text; that is common knowledge. Nobody would read the zip file as-is, opening it in a text editor and trying to make sense of what is shown. This analogy explains why computers have such a hard time understanding the human world and its environment. The inputs we give computers are text, images, and other media; what they really lack is the context of that information. All the data we feed in is compressed, and I do not mean JPEG compression: the language itself is compressed. We humans have done a great job of abstracting concepts into words, and when we transmit a message, we compress those abstract concepts into words that the recipient decompresses with imagination in order to understand. Even our brain remembers the understanding, not the exact details. What is happening in the field of AI today is like training a child to learn the English language using Morse code.
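To make the zip analogy concrete, here is a tiny sketch using Python's standard zlib module (the article text and parameter choices are my own illustrative values, not anything from a real system): the compressed bytes are gibberish to a reader who opens them as-is, but the shared convention of decompression recovers the original exactly.

```python
# Toy version of the zip analogy: compressed bytes are unreadable as-is,
# but applying the agreed-upon decompression step recovers the text.
import zlib

article = b"We compress abstract concepts into words; the reader decompresses them."
packed = zlib.compress(article)

# "Opening the zip in a text editor": mostly non-printable bytes.
print(packed[:16])

# The shared convention (decompression) restores the original message.
assert zlib.decompress(packed) == article
```

In the same way, a machine handed raw text without the decompression step (the human context and imagination) is staring at the packed bytes.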
A better approach would be to use real-world environments to train AI machines to understand imagery, audio, and textual inputs together. Instead of 2D images, it would be better to give them 3D ones, helping computers understand the concept of near and far through the depth present in 3D imagery. An even better mode of input would be 3D movies with subtitles: the neural networks could learn to recognize objects in the 3D frames while the subtitles associate words with audio and imagery, developing concepts much as a real child would. These can be simple concepts at first, but once the machine starts abstracting the objects it views as instances with histories, belonging to particular classes of objects, through the existing tools of object recognition, its capability to interact with its environment is greatly enriched. None of this would require new algorithms or techniques; the existing tools are more than enough. It is the encoding of the input signals that needs to be made more understandable to machines. Add genetic programming to the mix and we could have self-evolving machines that learn and adapt to their environment.
Although this is a small write-up, mainly to clarify my own thoughts, I intend to explore the idea further and refine this approach based on the responses it receives. All inputs are welcome. And since this is my first time blogging or writing in a public space, I ask your forgiveness for any failure to express myself appropriately and concisely.