The latest generation of large language models (LLMs) is extremely resource-hungry, to the point where training and even deploying models have become infeasible for most actors. This project proposes tackling this problem by dissecting LLMs into modules that encapsulate functional capabilities, such as individual languages or knowledge retrieval, and that can be trained and updated independently of other parts of the model. The focus will be on NLP for medium-resource languages such as Swedish and Estonian, and on evaluating how the chosen approaches affect systematic generalisation and catastrophic forgetting, in addition to assessing performance on relevant benchmarks.
The plan is to approach the problem from two different perspectives: implicit and explicit modularisation. The former will explore inductive biases that can help LLMs discover a suitable modular structure on their own. In the latter case, the goal is to identify or develop human-designed modular frameworks that improve systematic generalisation and reduce catastrophic forgetting without compromising performance. To achieve this, the work will include experimenting with different adapter types, Mixture-of-Experts variants, and model-growing techniques.
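To give a concrete sense of the explicit-modularisation direction, the sketch below shows a minimal bottleneck adapter in PyTorch: a small trainable module inserted alongside a frozen backbone layer so that, for example, a language-specific capability can be trained or swapped without updating the rest of the model. The class name, hidden sizes, and usage are illustrative assumptions, not part of the project's actual implementation.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Illustrative bottleneck adapter added next to a frozen transformer sub-layer.

    Only the adapter parameters would be trained, so a language- or
    task-specific module can be added or replaced independently of the
    backbone model (hypothetical sketch, not the project's code).
    """

    def __init__(self, hidden_size: int, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)  # project down
        self.up = nn.Linear(bottleneck_size, hidden_size)    # project back up
        self.activation = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen backbone's representation
        # largely intact, especially when the adapter starts near zero.
        return hidden_states + self.up(self.activation(self.down(hidden_states)))


if __name__ == "__main__":
    adapter = BottleneckAdapter(hidden_size=768)
    x = torch.randn(2, 16, 768)   # (batch, sequence length, hidden size)
    print(adapter(x).shape)       # torch.Size([2, 16, 768])
```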
PhD student: Kätrin Kukk, Linköping University