Seeing Like a Baby


Infants quickly learn to make sense of the complex world around them; their understanding far surpasses that of any current attempt to design an intelligent computerized system. How do such young infants arrive at this understanding?


Answering this question has been a challenge for cognitive psychology researchers and computer scientists alike. On the one hand, babies cannot explain how they first learn to comprehend the world around them, and on the other, computers, for all their sophistication, need human help with labeling and sorting objects to make learning possible.  Many scientists believe that for computers to “see” the world as we do, they must first learn to classify and identify objects in much the same way that a baby does.

Mover event detected in the red cell: Motion (left) flows into the cell, (middle) stays briefly in the cell, and (right) leaves the cell, changing its appearance. Warmer colors indicate faster motion.
Prof. Shimon Ullman and research students Daniel Harari and Nimrod Dorfman of the Institute’s Computer Science and Applied Mathematics Department set out to explore the learning strategies of the young brain by designing computer models based on the way that babies observe their environment. The team first focused on hands: Within a few months, babies can distinguish a randomly viewed hand from other objects or body parts, despite the fact that hands are actually very complex – they can take on quite a range of visual shapes and move in different ways. Ullman and his team created a computer algorithm for learning hand recognition. The aim was to see if the computer could learn independently to pick hands out of video footage – even when those hands took on different shapes or were seen from different angles. That is, nothing in the program said to the computer: “Here is a hand.” Instead, it had to discover from repeated viewing of the video clips what constitutes a hand.

The algorithm began with some basic insight into the stimuli that attract the attention of young infants. The scientists knew, for instance, that babies track movement from the moment they open their eyes, and that motion can be a visual cue for picking objects out of the scenery. The researchers then asked if certain types of movement might be more instructive to the infant mind than others, and whether these could provide enough information to form a visual concept. For instance, a hand makes a change in the baby’s visual field, generally by manipulating an object. Eventually the child might extrapolate, learning to connect the idea of causing-object-to-move with that of a hand. The team named such actions “mover events.”
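The article does not give implementation details, but the mover-event idea can be illustrated as a simple test on a single image cell: motion enters the cell, lingers briefly, then leaves, and the cell's appearance has changed as a result. The following toy sketch assumes per-frame motion magnitudes and a crude appearance summary for one cell; the function name and thresholds are hypothetical, not the authors' actual parameters.

```python
# Toy mover-event detector for a single image cell.
# Inputs per frame: motion magnitude inside the cell and a crude
# appearance summary (here, mean intensity). Thresholds are
# illustrative assumptions.

def detect_mover_event(motion, appearance,
                       motion_thresh=1.0, change_thresh=10.0):
    """Return True if motion enters the cell, stays briefly,
    then leaves, and the cell's appearance has changed."""
    # Frames where significant motion is present in the cell
    active = [i for i, m in enumerate(motion) if m > motion_thresh]
    if not active:
        return False
    start, end = active[0], active[-1]
    # Motion must both enter (not already present at the first frame)
    # and leave (gone before the last frame): something passed through.
    if start == 0 or end == len(motion) - 1:
        return False
    # Compare appearance just before the event with just after it
    change = abs(appearance[end + 1] - appearance[start - 1])
    return change > change_thresh

# A hand enters the cell, moves an object, and leaves:
motion     = [0.0, 0.2, 3.1, 2.8, 2.5, 0.3, 0.1]
appearance = [50,  50,  80,  90,  85,  95,  95]   # mean intensity
print(detect_mover_event(motion, appearance))  # True: the cell changed
```

In a full system a detector like this would run over every cell of an optical-flow field, and the image patches responsible for positive detections would become the unlabeled training examples for a hand classifier.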
Predicted direction of gaze: results of algorithm (red arrows) and two human observers (green arrows). Faces, top to bottom, belong to Prof. Shimon Ullman, Daniel Harari and Nimrod Dorfman.
After designing and installing an algorithm for learning through detecting mover events, the team showed the computer a series of videos. In some, hands were seen manipulating objects. For comparison, others showed the view a baby would have of its own moving hand, movement that did not involve a mover event, or typical visual patterns such as common body parts and people. These trials clearly showed that mover events – watching another’s hand move objects – were not only sufficient for the computer to learn to identify hands, but far outshone any of the other methods, including that of “self-movement.”

But the model was not yet complete. With mover events alone, the computer could learn to detect hands but still had trouble with different poses. Again, the researchers went back to insights into early perception: Infants can not only detect motion, they can track it; they are also very interested in faces. Adding in mechanisms for observing the movements of already detected hands to learn new poses, and for using the face and body as reference points to locate hands, improved the learning process.
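The "face as reference point" mechanism can be pictured as a spatial prior: once a face is detected, the offsets of previously detected hands relative to that face suggest where to look for hands next. This toy sketch is an assumption about how such a prior might be expressed; the function name and all coordinates are made up.

```python
# Toy illustration of using a detected face as a reference point:
# past hand sightings, stored as offsets relative to the face,
# predict candidate hand locations in a new frame.

def hand_prior(face_xy, relative_offsets):
    """Candidate hand locations, computed by applying stored
    face-relative offsets to the current face position."""
    fx, fy = face_xy
    return [(fx + dx, fy + dy) for dx, dy in relative_offsets]

# Hands were previously seen below and to either side of the face:
offsets = [(-60, 120), (55, 130)]
print(hand_prior((200, 80), offsets))  # [(140, 200), (255, 210)]
```

The benefit is that the hand detector need only search (or weight more heavily) these predicted regions, so ambiguous poses get an extra, location-based vote.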

In the next part of their study, the researchers looked at another, related concept that babies learn early on but computers have trouble grasping – knowing where another person is looking. Here, the scientists took the insights they had already gained – mover events are crucial and babies are interested in faces – and added a third: People look in the direction of their hands when they first grasp an object. On the basis of these elements, the researchers created another algorithm to test the idea that babies first learn to identify the direction of a gaze by connecting faces to mover events. Indeed, the computer learned to follow the direction of even a subtle glance – for instance, the eyes alone turning toward an object – nearly as well as an adult human.
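The training signal described here pairs a detected face with the location of a mover event: when a hand first grasps an object, the person is usually looking at it, so the face-to-event direction labels the gaze. A minimal geometric sketch of that pairing (the learned appearance model for faces is omitted, and all names are hypothetical):

```python
import math

def gaze_direction(face_xy, event_xy):
    """Direction (in degrees) from a detected face to the location
    of a mover event -- the self-generated 'label' for learning
    where a person is looking. Pure geometry; x increases rightward,
    y increases upward in this toy coordinate frame."""
    dx = event_xy[0] - face_xy[0]
    dy = event_xy[1] - face_xy[1]
    return math.degrees(math.atan2(dy, dx))

print(gaze_direction((100, 100), (200, 100)))  # 0.0 (looking right)
```

Collecting many such (face image, direction) pairs would let a learner associate face and eye appearance with gaze angle, without any human ever labeling where people are looking.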

The researchers believe these models show that babies are born with certain pre-wired patterns – such as a preference for certain types of movement or visual cues. They refer to this type of understanding as proto-concepts – the building blocks with which one can begin to build an understanding of the world. Thus the basic proto-concept of a mover event can evolve into the concept of hands and of direction of gaze, and eventually give rise to even more complex ideas such as distance and depth.

This study is part of a larger endeavor known as the Digital Baby Project. The idea, says Harari, is to create models for very early cognitive processes. “On the one hand,” says Dorfman, “such theories could shed light on our understanding of human cognitive development. On the other hand, they should advance our insights into computer vision (and possibly machine learning and robotics).” These theories can then be tested in experiments with infants, as well as in computer systems.
Prof. Ullman is the incumbent of the Ruth and Samy Cohn Professorial Chair of Computer Sciences.