Embodied machine learning

Machine learning models in robotics and autonomous systems are often trained on small, task-specific datasets and therefore transfer poorly to new tasks. Foundation models, by contrast, are pretrained on large and diverse datasets, generalize better, and can solve problems that are not directly represented in their training data. Such models, especially multimodal ones, have the potential to significantly improve robot autonomy, from perception and human-robot interaction to planning. Vision-language models, for example, can strengthen visual recognition and support more generalizable action planning. In turn, robots that autonomously interact with their environment and update their models through real-time data collection are key to building more informed foundation models. A combination of reinforcement learning, representation learning, and language grounding could help address many of the current challenges.

The arena benefits from expertise in this area through its collaboration with Danica Kragic and KTH’s Robotics, Perception, and Learning lab, and sees potential for further interaction with WARA Robotics.

Read more about our projects