Psychology, physics, and geometry used to make robots more intelligent

Computer scientists have used psychology, physics, and geometry to make intelligent, advanced home robots a reality

Robots are all around us, from drones filming videos in the sky to machines serving food in restaurants and defusing bombs in emergencies.

Slowly but surely, robots are improving the quality of human life by augmenting our abilities, freeing up time, and enhancing our personal safety and wellbeing.

While existing robots are becoming more proficient at simple tasks, handling more complex requests will require further development in both mobility and intelligence.

Columbia Engineering and Toyota Research Institute computer scientists are delving into psychology, physics, and geometry to create algorithms so that robots can adapt to their surroundings and learn how to do things independently.

This work is vital to enabling robots to address new challenges stemming from an ageing society and provide better support, especially for seniors and people with disabilities.

Object permanence

A longstanding challenge in computer vision is object permanence, a well-known concept in psychology that involves understanding that the existence of an object is separate from whether it is visible at any moment.

Object permanence is fundamental if robots are to understand our ever-changing world, but most applications in computer vision ignore occlusions entirely and tend to lose track of objects that become temporarily hidden from view.

Carl Vondrick, associate professor of computer science and a recipient of the Toyota Research Institute Young Faculty award, said: “Some of the hardest problems for artificial intelligence are the easiest for humans.”

An example is how toddlers play peek-a-boo and learn that a parent does not disappear when they cover their face. Computers, on the other hand, lose track once something is blocked or hidden from view and cannot process where the object went or recall its location.

To tackle this issue, Vondrick and his team taught neural networks the basic physical concepts that come naturally to adults and children.

Similar to how a child learns physics by watching events unfold in their surroundings, the team created a machine that watches many videos to learn physical concepts.

The key idea is to train the computer to anticipate what the scene would look like in the future. By training the machine to solve this task across many examples, the machine automatically creates an internal model of how objects physically move in typical environments.

For example, when a soda can disappears from sight inside the refrigerator, the machine learns to remember it still exists because it appears again once the refrigerator door reopens.
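The idea of learning from prediction can be illustrated with a deliberately tiny sketch. This is not the team's neural network: it assumes an object moving along a single axis at constant velocity, and every name in it (make_trajectory, train_predictor, track) is hypothetical. The model has one parameter, w, and is trained only to predict the object's next movement from its last one, so the future frame itself acts as the training signal. Once trained, the same model can be rolled forward to estimate where the object is while it is hidden.

```python
def make_trajectory(start, velocity, steps):
    """Positions of an object moving at constant velocity along one axis."""
    return [start + velocity * t for t in range(steps)]


def train_predictor(trajectories, epochs=200, lr=0.05):
    """Fit w so that the next displacement is predicted as w * last displacement.

    The target is simply the next frame of the same sequence, so no
    hand-written labels are needed (a toy form of self-supervision).
    """
    w = 0.0
    for _ in range(epochs):
        for xs in trajectories:
            for t in range(1, len(xs) - 1):
                last_step = xs[t] - xs[t - 1]
                next_step = xs[t + 1] - xs[t]
                error = w * last_step - next_step
                w -= lr * error * last_step  # gradient step on squared error
    return w


def track(observations, w):
    """Fill occluded frames (None) by rolling the learned motion model forward."""
    estimates = []
    for obs in observations:
        if obs is not None:
            estimates.append(obs)  # object is visible: trust the observation
        elif len(estimates) >= 2:
            last_step = estimates[-1] - estimates[-2]
            estimates.append(estimates[-1] + w * last_step)
        else:
            raise ValueError("need two visible frames before an occlusion")
    return estimates


# Learn from a few fully visible "videos" (assumed toy data).
training = [
    make_trajectory(0.0, 1.0, 10),
    make_trajectory(5.0, -0.8, 10),
    make_trajectory(2.0, 0.5, 10),
]
w = train_predictor(training)

# The object is hidden for three frames, then reappears.
observed = [0.0, 2.0, 4.0, None, None, None, 12.0]
estimates = track(observed, w)
print(round(w, 3), [round(x, 2) for x in estimates])
```

In this sketch the learned value of w settles close to 1, which amounts to "keep moving as before", and the estimates for the hidden frames line up with the position at which the object reappears. A real system would have to learn far richer dynamics from raw video, which this toy does not attempt.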

Basile Van Hoorick, a third-year PhD student who worked with Vondrick to develop the framework that can understand occlusions as they occur, said: “I have worked with images and videos before, but getting neural networks to work well with 3D information is surprisingly tricky.”

Unlike humans, computers do not come naturally to an understanding of the three-dimensionality of our world.

Home robots

The second leap in the project was not only to convert data from cameras into 3D seamlessly, but also to reconstruct the entire configuration of the scene beyond what can be seen. This work could greatly expand the perception capabilities of home robots.

In any indoor environment, things become hidden from view all the time, so robots need to interpret their surroundings intelligently. The ‘soda can inside the refrigerator’ situation is just one of many examples.

Still, it is easy to see how any application that uses vision will benefit if robots can draw upon their memory and object-permanence reasoning skills to keep track of both objects and humans as they move around the house.

The Columbia Artificial Intelligence and Robotics (CAIR) Lab, led by Computer Science Assistant Professor Shuran Song, has been researching robotic movement in a different way.

Her research focuses on deformable, non-rigid objects that fold, bend, and change shape. When working with deformable objects, roboticists can no longer rely on the rigid body assumption, forcing them to think about physics again.

Song, also a Toyota Research Institute Young Faculty awardee, said: “In our work, we are trying to investigate how humans intuitively do things.”

Socioeconomic challenges

Dr Eric Krotkov, advisor to the University Research Program, said: “The progress that Carl Vondrick and Shuran Song have made with their research contributes directly to Toyota Research Institute’s mission.

“TRI’s research in robotics and beyond focuses on developing the capabilities and tools to address the socioeconomic challenges of an ageing society, labour shortage, and sustainable production.

“Endowing robots with the capabilities to understand occluded objects and handle deformable objects will enable them to improve the quality of life for all.”

Song and Vondrick plan to collaborate to combine their respective expertise in robotics and computer vision to create robots that assist people in the home.

By teaching machines to understand everyday objects in homes, such as clothes, food, and boxes, the technology could enable robots to assist people with mobility disabilities and improve the quality of everyday life more broadly.

By increasing the number of objects and physical concepts that can be learned by robots, the team aims to make these applications possible in the future.

Image: The self-supervised learning framework that Columbia Engineers call DextAIRity learns to perform a target task effectively through a sequence of grasping or air-based blowing actions. Guided by visual feedback, the system uses a closed-loop formulation that continuously adjusts its blowing direction. © Zhenjia Xu/Columbia Engineering.