Shaping the Future of General-Purpose Robotics: A Conversation with Google DeepMind’s Robotics Chief

The Google DeepMind team recently unveiled Open X-Embodiment, a collaborative effort with 33 research institutes to create a groundbreaking database of robotics functionalities. The initiative has drawn parallels to ImageNet, the iconic image database founded in 2009 that now houses over 14 million images.

Quan Vuong and Pannag Sanketi, researchers at Google DeepMind, underlined the transformative potential of Open X-Embodiment. They likened it to ImageNet’s impact on computer vision, emphasizing that it could similarly advance the field of robotics. Their vision focuses on assembling a diverse dataset of robot demonstrations, a fundamental step in training a versatile model capable of controlling a variety of robots, interpreting diverse instructions, executing intricate tasks, and adapting comprehensively.


At the time of the announcement, Open X-Embodiment already boasted 500+ skills and 150,000 tasks drawn from 22 unique robot embodiments. Though not as extensive as ImageNet, this marked an auspicious beginning. DeepMind leveraged the dataset to train its RT-1-X model, which achieved a 50% higher success rate than the in-house methods devised by the contributing teams.
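For readers who want a concrete feel for what such a cross-embodiment dataset looks like, below is a minimal sketch of loading one Open X-Embodiment sub-dataset with tensorflow_datasets, assuming the data is published in the episodic RLDS format; the bucket path, dataset name, and feature keys are illustrative and should be checked against the project's documentation.

```python
# Minimal sketch: reading one Open X-Embodiment sub-dataset in RLDS
# (episodic TFDS) format. The GCS path below is illustrative.
import tensorflow_datasets as tfds

# Hypothetical example path; each contributing lab's data lives in its own builder.
BUILDER_DIR = "gs://gresearch/robotics/fractal20220817_data/0.1.0"

builder = tfds.builder_from_directory(builder_dir=BUILDER_DIR)
ds = builder.as_dataset(split="train[:10]")  # a small slice for inspection

for episode in ds.take(1):
    # Each RLDS episode is a nested dict; "steps" holds the per-timestep data.
    for step in episode["steps"].take(3):
        obs = step["observation"]   # e.g. camera images, proprioception
        action = step["action"]     # robot-specific action encoding
        print(sorted(obs.keys()))
```

Each episode bundles the per-timestep observations and actions of a single demonstration; packaging every lab's data in one episodic format is what makes mixing demonstrations from 22 different robot embodiments tractable.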

The current landscape of robotics research is characterized by exhilarating progress. Numerous teams are tackling robotics from diverse angles, and each is steadily improving its results. Although bespoke robots still hold prominence, the emergence of general-purpose robots seems increasingly tangible.

Simulation and generative AI are anticipated to be pivotal components in this journey. However, some companies may have prioritized hardware development before fully exploring the underlying technology. General-purpose robots are on the horizon, but much remains to be explored. Vincent Vanhoucke, Head of Robotics at Google DeepMind, offers insights into the company's robotics endeavors and the path that led here.

Could you shed light on the genesis of the robotics team at DeepMind?


Vincent Vanhoucke: My association with Google DeepMind is relatively recent. Initially, I was part of Google Research, which later amalgamated with DeepMind. However, the roots of robotics research at Google DeepMind run deeper. It sprouted from the realization that perception technology was rapidly advancing, particularly in computer vision and audio processing, and approaching human-level capabilities. That led us to ponder the implications of this progress in the years to come. One clear consequence was the imminent possibility of deploying robots in real-world environments, contingent upon robust perception capabilities. At the time, my work encompassed general AI, computer vision, and speech recognition, yet the potential of robotics as the next frontier in our research was unmistakable.

It appears that several members of the Everyday Robots team eventually joined your team. How did that relationship begin?

For approximately seven years, we fostered a robust collaboration with the Everyday Robots team. During this period, we embarked on a skunkworks project involving robot arms that had been discontinued and were lying idle. The venture aimed to teach these arms how to grasp objects, a problem that was relatively uncharted at the time. We rewarded successful attempts and recorded failures. This was the first successful application of machine learning to the generalized grasping problem, a significant breakthrough, and it catalyzed our focus on using machine learning to control robots, prompting a strategic shift in our research toward robotics.
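The recipe Vanhoucke describes, attempt a grasp, label the outcome, and feed the labels back as a training signal, can be sketched in a few lines. The following is a minimal, self-contained illustration of that self-supervised collection loop, not Google's actual stack: the robot and policy interfaces are placeholders.

```python
import random
from dataclasses import dataclass

@dataclass
class GraspAttempt:
    grasp_pose: tuple    # candidate (x, y, z, rotation) grasp
    success: bool        # did the gripper end up holding an object?

def propose_grasp() -> tuple:
    """Sample a candidate grasp pose; a real system would use a learned policy."""
    return (random.uniform(-0.3, 0.3), random.uniform(-0.3, 0.3),
            0.05, random.uniform(0.0, 3.14))

def execute_grasp(pose: tuple) -> bool:
    """Placeholder for the robot executing the grasp and checking the gripper."""
    return random.random() < 0.2  # early success rates are low; labels still teach

def collect_dataset(num_attempts: int) -> list[GraspAttempt]:
    data = []
    for _ in range(num_attempts):
        pose = propose_grasp()
        outcome = execute_grasp(pose)  # success/failure is the reward signal
        data.append(GraspAttempt(grasp_pose=pose, success=outcome))
    return data

# The labeled attempts become supervised training data for a grasp-success
# predictor, which in turn proposes better grasps on the next collection round.
dataset = collect_dataset(100)
print(sum(a.success for a in dataset), "successful grasps out of", len(dataset))
```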

You mentioned that the Everyday Robots team became part of your team. Can you elaborate on this transition?

A segment of the Everyday Robots team seamlessly transitioned into my team. We inherited their robot assets, which we continue to leverage. Our ongoing work builds upon the technology they pioneered and developed. Our focus has slightly evolved from the original vision of the team, with a heightened emphasis on intelligence over robot construction.

The decision to relocate your team to the Alphabet X offices is intriguing. Is there a deeper rationale for this move?

The decision to move to the Alphabet X offices primarily stemmed from practical considerations. The new location offered several advantages, including reliable Wi-Fi, a stable power supply, and spacious facilities. One would expect every Google building to have robust Wi-Fi, but these pragmatic benefits significantly influenced our decision. Additionally, our previous office's cafeteria had drawn complaints, so the presence of a superior cafe on-site was an enticing aspect of the move. There is no concealed agenda in this choice, but we relish the close collaboration with other teams at Alphabet X and are nurturing our alliances with entities like Intrinsic. The synergies and the quality of the workspace rendered it a logical choice, not to mention the appealing aesthetics of the building itself.

There appears to be some convergence with Intrinsic’s focus on no-code robotics and robotics learning. Could you delve deeper into this synergy?

Robotics has transitioned from a realm where each project necessitated distinct requirements and a unique skill set to a landscape where we endeavor to create general-purpose robotics. These general methods can be applied across diverse types of robots, whether in industrial, home, or sidewalk contexts. Our approach does not hinge on achieving a single, universal robot embodiment. Instead, we aim to refine specific robot embodiments to effectively address particular problems. Consequently, the question of whether general-purpose robots will materialize is multifaceted. While bespoke robots have been the successes thus far, the technology required for more general-purpose robots may not yet be fully mature. The future trajectory of this domain is intricately linked to technological advancements, and we are actively steering our efforts in that direction. Our recent RT-X project is a testament to that commitment: we engage with academic labs to collect data and train a substantial model to unlock novel possibilities.

Generative AI is expected to be a linchpin in the realm of robotics. Could you provide more insights into this?

Generative AI is poised to assume a central role in robotics. The emergence of large language models has prompted inquiries into their applicability to robotics, and what some initially perceived as a passing trend has evolved into a profound transformation. Large language models transcend mere language: they encode common-sense reasoning and a deep understanding of everyday scenarios. For instance, a large language model can discern the logical placement of a coffee cup, in a kitchen cupboard or on a table, differentiating sensible actions from nonsensical ones. These models possess knowledge that is challenging to encode in traditional robotics systems. We have successfully integrated this common-sense reasoning into robot planning, interactions, manipulations, and human-robot communication. An agent equipped with common-sense reasoning, capable of evaluating events in a simulated environment, is a pivotal building block for robotics.
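To make this concrete, published approaches in this vein (such as SayCan) use a language model to score candidate robot skills against an instruction, so the planner keeps only sensible ones. The sketch below illustrates the idea; llm_score is a stand-in for a real language-model call, not an actual Google DeepMind interface.

```python
def llm_score(instruction: str, action: str) -> float:
    """Stand-in for an LLM query returning how plausible `action` is as a
    step toward `instruction` (e.g. a log-likelihood from a real model)."""
    # Toy heuristic so the example runs without a model; a real system would
    # query a language model here.
    plausible = {"pick up the coffee cup", "place cup in cupboard"}
    return 1.0 if action in plausible else 0.01

def choose_next_action(instruction: str, candidates: list[str]) -> str:
    # Rank candidate skills by the language model's common-sense judgment.
    ranked = sorted(candidates, key=lambda a: llm_score(instruction, a),
                    reverse=True)
    return ranked[0]

candidates = [
    "pour coffee on the floor",   # nonsensical
    "pick up the coffee cup",     # sensible first step
    "place cup on the ceiling",   # nonsensical
]
print(choose_next_action("tidy the coffee cup away", candidates))
```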

Simulation is expected to play a pivotal role in data collection and analysis. Could you shed more light on this?

Simulation indeed constitutes a vital component, albeit accompanied by the challenge of bridging the gap between simulation and reality. Simulations offer approximations of real-world conditions, but achieving precision and faithful representation can be challenging. This entails ensuring that the simulator’s physics and visual rendering closely mirror reality. Generative AI is making strides in this realm. Rather than relying solely on physics simulators, we can generate simulations using image generation or other generative models. For instance, Amazon’s use of simulation to generate packages is a rational application. Looking ahead, generative AI could be leveraged to create potential future scenarios. Consider a scenario where a robot undertakes a specific action. Using generative models to explore and validate such scenarios before real-world implementation is an intriguing prospect.
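A common technique for narrowing the simulation-to-reality gap Vanhoucke mentions is domain randomization: varying the simulator's physics and rendering parameters across training episodes so a policy never overfits to any single approximation of reality. The sketch below shows the idea in miniature, with a placeholder rollout standing in for a real physics engine.

```python
import random
from dataclasses import dataclass

@dataclass
class SimParams:
    friction: float        # surface friction coefficient
    object_mass: float     # kg
    light_intensity: float # relative brightness of the rendered scene
    camera_jitter: float   # pixels of random camera offset

def sample_sim_params() -> SimParams:
    """Draw a freshly randomized world for each training episode."""
    return SimParams(
        friction=random.uniform(0.4, 1.2),
        object_mass=random.uniform(0.1, 2.0),
        light_intensity=random.uniform(0.5, 1.5),
        camera_jitter=random.uniform(0.0, 4.0),
    )

def run_episode(params: SimParams) -> float:
    """Placeholder rollout: a real implementation would step a physics engine."""
    return random.random()  # stand-in for the episode's return

# Training across many randomized worlds encourages a policy that treats the
# real world as just one more sample from the distribution it has already seen.
returns = [run_episode(sample_sim_params()) for _ in range(1000)]
print(f"mean return across randomized sims: {sum(returns) / len(returns):.3f}")
```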

 
