
The Toyota Research Institute (TRI) recently introduced a generative AI method it calls Diffusion Policy to teach robots new dexterous skills. The aim is to create “Large Behavior Models” (LBMs) for robots, analogous to the transformative large language models (LLMs) on which conversational AI is built.
The institute has already taught robots more than 60 complex skills, such as pouring liquids and manipulating deformable objects, without writing new code, by leveraging the Diffusion Policy approach to gen AI. The approach allows robots to learn autonomously from dozens of demonstrations, making skill acquisition both rapid and highly performant.
Diffusion Policy is a type of generative AI inspired by diffusion-based image generators such as Stable Diffusion. It uses a stochastic denoising diffusion process to distill noise into predicted robot actions, in much the same way that image diffusion models create images.
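To make that analogy concrete, the following is a minimal sketch (in Python, using PyTorch) of how a reverse-diffusion sampler can refine pure noise into a short robot action trajectory. It illustrates the general denoising technique, not TRI's implementation: the `predict_noise` function is a hypothetical stand-in for the trained, observation-conditioned network, and the linear noise schedule, horizon, and action dimensions are assumptions.

```python
import torch

STEPS, HORIZON, ACTION_DIM = 50, 16, 7            # assumed sizes, not TRI's
betas = torch.linspace(1e-4, 0.02, STEPS)         # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def predict_noise(noisy_actions, t, obs):
    # Hypothetical stand-in for the trained network. A real Diffusion
    # Policy model conditions on camera images and robot state; returning
    # zeros here just keeps the sketch runnable end to end.
    return torch.zeros_like(noisy_actions)

@torch.no_grad()
def sample_actions(obs):
    """Reverse diffusion: iteratively refine Gaussian noise into actions."""
    actions = torch.randn(HORIZON, ACTION_DIM)    # start from pure noise
    for t in reversed(range(STEPS)):
        eps = predict_noise(actions, t, obs)
        # Standard DDPM update: remove the predicted noise component.
        actions = (actions - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) \
                  / alphas[t].sqrt()
        if t > 0:                                 # re-inject noise except at the final step
            actions += betas[t].sqrt() * torch.randn_like(actions)
    return actions                                # a (HORIZON, ACTION_DIM) trajectory

trajectory = sample_actions(obs=None)
print(trajectory.shape)                           # torch.Size([16, 7])
```

The output is a short sequence of actions (for example, end-effector motions) that the robot executes before re-observing the scene and sampling again.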
Diffusion is naturally stable to train, can learn from demonstrations that achieve goals in different ways, and is well suited to high-dimensional, continuous action spaces. It also provides a scalable architecture for learned robot behaviors that TRI is building on to create LBMs.
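On the training side, the usual recipe for diffusion models (assumed here; the article does not detail TRI's training code) is the mirror image of sampling: add noise to demonstrated actions at a randomly chosen diffusion step and train a network to predict that noise. A toy sketch, with a small MLP as a hypothetical stand-in for the real conditioned model:

```python
import torch
import torch.nn as nn

ACTION_DIM, OBS_DIM, STEPS = 7, 32, 50            # assumed toy dimensions
betas = torch.linspace(1e-4, 0.02, STEPS)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

# Small MLP standing in for the real observation-conditioned network.
net = nn.Sequential(nn.Linear(ACTION_DIM + OBS_DIM + 1, 128), nn.ReLU(),
                    nn.Linear(128, ACTION_DIM))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def training_step(demo_actions, demo_obs):
    """One denoising-diffusion training step on demonstrated actions."""
    b = demo_actions.shape[0]
    t = torch.randint(0, STEPS, (b,))             # random diffusion step per sample
    eps = torch.randn_like(demo_actions)          # the noise the net must recover
    ab = alpha_bars[t].unsqueeze(-1)
    noisy = ab.sqrt() * demo_actions + (1 - ab).sqrt() * eps   # forward process
    inp = torch.cat([noisy, demo_obs, t.float().unsqueeze(-1) / STEPS], dim=-1)
    loss = ((net(inp) - eps) ** 2).mean()         # simple MSE on predicted noise
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Usage with random stand-in "demonstrations":
print(training_step(torch.randn(64, ACTION_DIM), torch.randn(64, OBS_DIM)))
```

Because the objective is a plain regression on noise, training tends to be stable, and nothing in the setup forces demonstrations to be mutually consistent, which is why demonstrations that achieve a goal in different ways can be absorbed gracefully.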
“Using gen AI to teach, rather than program, behaviors offers an alternative that promises potential to scale to complex tasks without requiring infeasible amounts of human hand-engineering,” says Ben Burchfiel, a research scientist at TRI.
Using generative AI to create general-purpose large behavior models could offer an alternative, significantly more flexible and scalable approach to achieving generally capable robots. Burchfiel says the organization plans to expand the repertoire to hundreds of new skills by year-end and 1,000 by the end of 2024.
The custom-built robot platform emphasizes tactile sensing and haptic feedback, and is supported by Drake, TRI's open-source toolkit for model-based design and verification of robotics.
“Structured approaches for explicitly programming robots, sometimes called classical robotics, have been successful in highly controlled environments, such as manufacturing, or when the robot is relatively simple, such as a vacuum robot,” he explains.
However, explicit programming requires a lot of effort; human experts must not only create good behaviors, but they must also anticipate how the robot might fail and program contingencies for “edge cases”.
“This difficulty is compounded as robots must do wider varieties of tasks in more diverse environments,” Burchfiel says.
The research expands robots' capabilities for versatile interaction in unpredictable environments, potentially enabling them to support a wide range of real-world applications.
Davi Ottenheimer, vice president of trust and digital ethics at Inrupt, says that while AI, like any automation, is perceived to deliver cost and time efficiencies, the broader societal question of what is “important” deserves equal attention.
“Teaching robots with gen AI brings to mind risks from unintended outcomes,” he cautions. “Loss of control, unauthorized crossing of boundaries and norms (doing harm) are the kinds of ethical challenges more important than saving money.”
He says it’s important to scope the issue down to why people believe they will save money and time by teaching robots with AI.
“Then we can address the narrower set of issues such as complexity and skills shortages often creeping in to erase any expected savings,” he says. “The ‘automated teller machine’ revolution of banks was supposed to be important because it would save so much money it paid for itself; instead, people interacting with the machine are charged a fee to withdraw their money, unlike any human teller.”
He adds that, in his experience, the greatest challenge in designing the new robot behavior model is a function of gaps within the design team.
“When social science is missing, humanities foundation lacking, the new robot behavior model is almost certainly going to run directly into causing societal harm and inappropriate actions,” he says. “A safety sentry bot on patrol, for example, started running over small children it was supposed to protect. In another case it threw itself into a fountain.”
Burchfiel adds that one of the most challenging parts of robotics in general is that robot performance tends to be limited by the weakest link in the system, whether that is hardware reliability, the robustness of the control stack, or the algorithms powering behavior.
“That means that a good generative approach to behavior learning is, alone, not sufficient for highly performant robots,” he explains. “The rest of the system must be robust as well, and much of our success rests on the rigorous engineering of our entire system.”
He notes there has also been a learning curve for the team as they have figured out how best to teach robots.
“Just like with people, a good teacher that anticipates confusion and proactively crafts their teaching to clarify significantly improves performance,” Burchfiel says.