Computer Science Colloquium: Do Machines See What I See?

In under 60 years, computer programming has gone from the excruciating production of punch cards to algorithms that can instantly recognize people in photos. What lies ahead? At Cornell’s weekly computer science colloquium, Philip Isola, a postdoc in the electrical engineering and computer sciences department at UC Berkeley, attempted to shed light on that very topic.

“My goal is to make systems that can understand the visual world and see the same kind of richness and structure that we see. So, in the talk I was trying to convey one approach to that: learning without having expert knowledge of what you are trying to imitate,” Isola said.

“Computer vision is going through a revolution. Five years ago, people were working on classic computer vision, which is based more on principles that psychologists had developed about how human vision works: how, if you look at an image, you know where one aspect ends and another one starts. In the last five years, the question has become how the classic theories fit with modern machine learning methods, and how relevant the pre-deep-learning era is to the new era of machine learning,” said Prof. Noah Snavely, computer science, and the event’s Cornell host.

One of these more conventional techniques — supervised object recognition — is in widespread use. A learner, in this case a computer, is provided with an extensive array of images that share the same label. By analyzing the patterns that cause images to be associated with that label, the computer can then assign the correct label to a similar unlabeled image.
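The supervised recipe described above can be sketched in a few lines. The data and the nearest-centroid “learner” below are hypothetical stand-ins, not Isola’s method: each “image” is a small feature vector, and training amounts to averaging the human-labeled examples for each class.

```python
import numpy as np

# Hypothetical toy data: each "image" is an 8-dimensional feature vector,
# and every training example carries a human-supplied label.
rng = np.random.default_rng(0)
train_images = {
    "zebra": rng.normal(loc=0.0, scale=0.5, size=(20, 8)),
    "grass": rng.normal(loc=3.0, scale=0.5, size=(20, 8)),
}

# "Learning" here is just averaging the labeled examples per class.
centroids = {label: imgs.mean(axis=0) for label, imgs in train_images.items()}

def classify(image):
    """Assign an unlabeled image to the class with the nearest centroid."""
    return min(centroids, key=lambda label: np.linalg.norm(image - centroids[label]))

# A new, unlabeled image drawn near the "zebra" cluster gets the zebra label.
new_image = rng.normal(loc=0.0, scale=0.5, size=8)
print(classify(new_image))
```

The point of the sketch is the dependence it exposes: the classifier can only ever reproduce the hand-curated labels it was fed, which is exactly the limitation discussed next.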

The issue with such a technique, and indeed with all techniques that require hand-curated training data and hand-designed tasks, is that they stand in stark contrast to our natural way of learning: through raw sensory experience and by generating our own objectives.

“There’s this element of curiosity about people that we actively learn things without having to be force-fed knowledge, so those kinds of abilities to generalize from a small number of examples or to curiously explore the world are missing from computer vision and people are trying to get closer to that,” Snavely said.

Picture a zebra on a green field. How do you distinguish the zebra from all the grass around it? Even if you have never seen a zebra before, the contrast between colors should be sufficient. Isola aims to build the same intuition into machines. He does so using a measure known as Pointwise Mutual Information, which compares how often two things actually co-occur with how often they would co-occur if they were independent. In this case, a machine could contrast the high PMI between white and black with the low PMI between black and green to infer that the picture contains two distinct structures.

“We assign objects to the same pixel segment if they have high PMI [and] to different object segments if they have low PMI. The goal is to design an affinity measure that is adaptive to each image it is applied to and that’s what using PMI enables,” Isola said.

To complement this ability to process images as we do, machines need to understand and reason about the structures they ‘see’. According to Isola, one way forward is a method known as representation learning. An algorithm, the ‘autoencoder’, interprets the image and compresses its vital information into an image code vector; other algorithms can then use this vector to reconstruct the image. The goal is to have machines recognize structures contained within a larger image, for example recognizing fish in an image of a coral reef.

Despite the fair accuracy of such algorithms, key obstacles remain. Researchers are still trying to pinpoint the amount of compression needed: too much, and crucial information is lost; too little, and unnecessary features are retained.
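The encode–compress–reconstruct loop can be sketched with a linear “autoencoder.” The data and code size below are hypothetical, and a truncated SVD stands in for the trained neural encoder and decoder a real autoencoder would use; it is the closed-form linear analogue, not Isola’s system.

```python
import numpy as np

# Hypothetical data: 100 "images" of 16 pixels that secretly depend on
# only 3 underlying factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 3))
mixing = rng.normal(size=(3, 16))
images = latent @ mixing

# "Encoder": project onto the top-k principal directions.
# "Decoder": map the k-dimensional code back to pixel space.
k = 3  # size of the image code vector -- the compression knob
_, _, vt = np.linalg.svd(images, full_matrices=False)
encode = lambda x: x @ vt[:k].T   # 16 pixels -> k-dim code
decode = lambda z: z @ vt[:k]     # k-dim code -> 16 pixels

codes = encode(images)
reconstructed = decode(codes)
error = np.mean((images - reconstructed) ** 2)
print(codes.shape, error)
```

Because the toy data truly has 3 underlying factors, `k = 3` reconstructs it almost perfectly; shrinking `k` below that discards crucial information, while growing it retains redundant features — the same compression tradeoff researchers face.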

“My ultimate goal is to make systems that really have the kind of abilities that human babies have and I think this requires understanding the type of input that biological systems get, the structure of the environment and also the structure of our brains,” Isola said. “I think this requires agents that are adaptive, that explore the world and seek out doing new constructive tasks on their own. In the end, I hope this helps us understand what it means to learn to see, that this will provide a more complete understanding of intelligence.”