ABOUT WARA M&L – WARA Media and Language

A Bridge Between Research and Industry

WASP Research Arenas (WARA) were established to offer increased research impact and potential for industrially significant breakthroughs. Each WARA offers unique opportunities to validate and refine scientific theories in real-world settings that are relevant to industries in different fields. WARA Media & Language works closely with leading actors in creative and generative AI. By facilitating knowledge transfer between academia and industry, we contribute to greater research impact and breakthroughs for the industry.

Increasing value by strengthening collaboration

WARA Media & Language was established in late 2019 and has since collaborated with more than 25 private and public organisations, including core partners like Nvidia, Ericsson, Bonnier, SEB, and EA. More than 20 doctoral students are affiliated with the arena. We have participated in projects leading to globally visible benchmarks and publications in top journals and conferences, co-developed the first series of Swedish LLMs, and brought together professionals and researchers from STEM, Humanities, and Social Sciences around challenges and opportunities related to multimodal conversational agents. Through explorative workshops with our partners, four areas of research have been selected as focal points for our continued efforts.

Embodied machine learning

Machine learning models in robotics and autonomous systems are often trained on small, task-specific datasets and struggle with skill transfer to new tasks. In contrast, foundation models, pretrained on large datasets, show better generalization and can solve problems not directly represented in their training data. These models, especially when multimodal, have the potential to significantly improve robot autonomy, from perception and human-robot interaction to planning. Vision-language models, for example, enhance visual recognition and generalizable action planning. Furthermore, robots that autonomously interact with their environment and update their models through real-time data collection are key to building more informed foundation models. A combination of reinforcement learning, representation learning, and language grounding could help solve many current challenges. The arena benefits from expertise in this area through its collaboration with Danica Kragic and KTH’s Robotics, Perception, and Learning lab, and sees a potential for further interaction with WARA Robotics.

Research Focus Area

Graph-based models

Generative AI is a popular research area with diffusion models and normalizing flows being applied to diverse tasks, such as language, image generation, source code, gestures, and music. However, current methods often generate media without symbolic representation, such as raster images instead of vector-based ones, limiting user flexibility and making it harder for systems to maintain semantics when editing images. For example, generative systems like Midjourney may misinterpret images when asked to create variations, leading to distorted results. To address this, a two-step generation process, first producing a graph-based representation and then the surface form, could improve accuracy. Key researchers in this area include Frank Drewes, Anastasia Varava, Henrik Björklund, and Ruibo Tu.

Research Focus Area

Multimodal foundation models

The recent collaboration with RISE, NVIDIA, and AI Sweden on Large Language Models (LLMs) has been valuable for both practical and academic insights, particularly around the GPT-SW3 model series. The project has deepened understanding of user needs and challenges, with many favoring model-agnostic systems to integrate the best cost-performance LLMs. Some organizations, however, may require private, cloud-based instances to protect intellectual property. Public bodies with sensitive data, such as the Swedish Tax Agency and Swedish Armed Forces, need open models that can be hosted on premises. Looking ahead, the focus is on developing small to medium-sized foundation models, especially for multimodal data like time-series or graphs, where significant scientific and practical gains are expected. Ongoing collaboration with AI Sweden will complement this by providing larger, more versatile models. Key researchers in this area include Love Börjesson (KB Labs) and Marco Kuhlmann (LiU), with an emphasis on attracting international talent.

Research Focus Area

Interactive and Creative AI

AI is fundamentally transforming how we interact with data, necessitating the parallel development of the fields of Human-Computer Interaction and User Experience. One set of challenges concerns Trustworthy and Explainable AI. For instance, there is the question of how to convey to the user the information and assumptions on which an AI bases its decisions, and how to help the user understand which tasks fall outside the scope of the AI’s capabilities.

Another important question is human-AI teaming, which requires effective methods for interpreting and controlling semi-automatic systems. The context can be autonomous mining or forestry, but can also encompass gaming and media production workflows. To this end, we want to collaborate with international contacts acquired through the Gaming Stream, and with the researchers linked to WASP-HS. Arena members with specific expertise in this domain include Gustav Eje Henter and Konrad Tollmar. Relevant partners include Electronic Arts, King, and Motorica, with which we organize events and projects, including the GENEA Challenge.