Robot startup 1X Technologies has developed a new generative model that significantly improves the efficiency of training robotic systems in simulation. The model, which the company announced in a new blog post, tackles one of the key challenges in robotics: learning a “world model” that can predict how the world will change in response to a robot’s actions.
Given the cost and risks of training a robot directly in a physical environment, roboticists typically use simulated environments to train their control models before deploying them in the real world. However, differences between simulated and physical environments pose challenges.
“Robotics engineers typically hand-craft scenes that become ‘digital twins’ of the real world and then simulate their dynamics using rigid-body simulators like Mujoco, Bullet, or Isaac,” Eric Jang, vice president of AI at 1X Technologies, told VentureBeat. “However, digital twins are subject to physical and geometric inaccuracies, which means you end up training in one environment and deploying in another, resulting in a ‘sim2real gap.’ For example, the spring stiffness of the handle of a model of a door you downloaded from the internet likely won’t be the same as the real door you’ll be testing your robot on.”
Generative World Model
To close this gap, 1X’s new models are trained on raw sensor data collected directly from robots, learning to simulate the real world. Trained on thousands of hours of video and actuator data collected from the company’s robots, the models take the current observation of the world and predict what will happen if the robot takes a certain action.
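At its core, a world model is any function that maps a current observation and an action to a predicted next observation, fit from logged robot data rather than hand-built physics. The toy sketch below illustrates that idea only; it uses a simple linear model and synthetic data, not 1X’s actual architecture or data.

```python
import numpy as np

# Illustrative only: a world model as a learned function
# f(observation, action) -> predicted next observation,
# fit from logged (obs, action, next_obs) tuples.

rng = np.random.default_rng(0)

# Stand-in "logged robot data": unknown true dynamics s' = A s + B a.
A_true = np.array([[1.0, 0.1], [0.0, 0.9]])
B_true = np.array([[0.0], [0.5]])
states = rng.normal(size=(500, 2))
actions = rng.normal(size=(500, 1))
next_states = states @ A_true.T + actions @ B_true.T

# Fit a linear world model by least squares on the logged tuples.
X = np.hstack([states, actions])              # (obs, action) inputs
W, *_ = np.linalg.lstsq(X, next_states, rcond=None)

def world_model(obs, action):
    """Predict the next observation from the current one and an action."""
    return np.concatenate([obs, action]) @ W

pred = world_model(np.array([1.0, 0.0]), np.array([0.2]))
```

The same interface scales up to the video setting the article describes: replace the linear map with a large generative network and the state vector with camera frames and actuator readings.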
Data was collected from the EVE humanoid robot performing various mobile manipulation tasks and interacting with people in homes and offices.
“We collected all the data in each of 1X’s offices and assembled a team of android operators to help us annotate and filter the data,” Jang said. “By having the simulator learn directly from real data, the dynamics should become even closer to the real world as the amount of interaction data increases.”
The learned world model is particularly useful for simulating object interactions: a video shared by the company shows the model successfully predicting a video sequence of a robot grabbing a box. According to 1X, the model can also predict “important object interactions such as rigid bodies, the effects of dropped objects, partial observability, deformable objects (curtains, laundry), and articulated objects (doors, drawers, curtains, chairs).”
Part of the video shows the model simulating complex long-horizon tasks with deformable objects, such as folding a shirt. The model also simulates environmental dynamics, such as avoiding obstacles and keeping a safe distance from people.
Challenges with generative models
Changing environments remain a challenge: As with all simulators, generative models need to be updated as the environment in which the robot operates changes. The researchers believe that updating the models will become easier as they learn how to simulate the world.
“If the training data is out of date, the generative model itself can have sim2real gaps,” Jang said, “but because this is a fully learned simulator, the idea is that by feeding in new data from the real world, we can correct the model without having to manually tune a physics simulator.”
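The correction Jang describes amounts to refitting the learned simulator on fresh logged data when real-world dynamics drift, instead of hand-tuning an engine parameter such as spring stiffness. The sketch below is a hedged illustration of that idea with a toy linear model and made-up gains; every name and number in it is hypothetical.

```python
import numpy as np

# Hypothetical illustration: correcting a learned simulator by
# refitting on fresh real-world data after the dynamics change.

rng = np.random.default_rng(1)

def fit_world_model(states, actions, next_states):
    """Least-squares fit of f(obs, action) -> next obs."""
    X = np.hstack([states, actions])
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return W

# Old dynamics the model was originally trained on.
old_gain = 0.5
s = rng.normal(size=(300, 1))
a = rng.normal(size=(300, 1))
W_old = fit_world_model(s, a, 0.9 * s + old_gain * a)

# The real robot's response changed (e.g. a stiffer door handle).
# Rather than hand-tuning a simulator, refit on newly logged data.
new_gain = 0.8
s2 = rng.normal(size=(300, 1))
a2 = rng.normal(size=(300, 1))
W_new = fit_world_model(s2, a2, 0.9 * s2 + new_gain * a2)
```

The refit model recovers the new action gain directly from data, which is the claimed advantage over manually retuning physics parameters.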
1X’s new system is inspired by innovations like OpenAI Sora and Runway, which show that with the right training data and techniques, generative models can learn some kind of world model and maintain consistency over time.
But whereas those models were designed to generate video from text, 1X’s new model is part of a trend toward generative systems that can react to actions during generation. For example, researchers at Google recently used a similar technique to train a generative model that simulates the game DOOM. Interactive generative models open up a variety of possibilities for training robot control models and reinforcement learning systems.
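What makes such a model “interactive” is the closed loop: at each step a policy picks an action, the learned world model predicts the next observation, and that prediction is fed back to the policy, with no physics engine in the loop. The sketch below shows that loop with stand-in scalar functions; the `world_model` and `policy` here are placeholders, not 1X’s or Google’s systems.

```python
# Hypothetical closed-loop rollout in a learned simulator.
# world_model and policy are illustrative stand-ins.

def world_model(obs, action):
    # Stand-in learned dynamics: state drifts toward zero, pushed by the action.
    return 0.9 * obs + action

def policy(obs):
    # Stand-in controller: steer the observation toward a target of 1.0.
    return 0.5 * (1.0 - obs)

def rollout(obs, steps):
    """Run the policy inside the learned world model for `steps` steps."""
    trajectory = [obs]
    for _ in range(steps):
        action = policy(obs)
        obs = world_model(obs, action)  # prediction fed back as the next input
        trajectory.append(obs)
    return trajectory

traj = rollout(0.0, 50)
```

A reinforcement learning system can score trajectories like this one with a reward function and improve the policy without touching real hardware, which is the appeal for robotics.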
However, some of the challenges inherent to generative models are still evident in the system presented by 1X. Because the model does not rely on an explicitly defined world simulator, it can generate unrealistic situations. In examples shared by 1X, the model sometimes fails to predict that an object left floating in midair will fall, and objects occasionally disappear from one frame to the next. Significant work is still needed to address these challenges.
One solution is to keep collecting more data and training better models. “Over the past few years, video generative modeling has made dramatic advances, and results like OpenAI Sora suggest that considerable progress in scaling data and compute is possible,” Jang said.
At the same time, 1X encourages the community to get involved and has released the model and its weights. The company also plans to hold competitions to improve the model, with cash prizes awarded to the winners.
“We are actively researching multiple methods for world modeling and video generation,” Jang said.