Language Models for Software Security

This research project uses language models to understand software artifacts such as source code, bytecode, and metadata. The aim is to strengthen software security by detecting malicious or tampered software earlier, reducing false negatives, and helping developers focus reviews where they matter most.

The current focus is on two tasks: Android malware detection and Java obfuscation detection. For Android, the project combines information from the app manifest, API call sequences, and Dalvik opcodes to build detectors that explain their decisions and are compact enough to run on mobile devices. For Java, it builds language-model-based detectors that recognize obfuscation techniques such as identifier renaming, string encryption, and control-flow rewriting. These detectors help organizations and app stores uncover hidden behavior, protect the software supply chain, and decide when deeper analysis or deobfuscation is needed. The overall goal is security tooling for developers that is trustworthy, interpretable, and efficient.
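This page does not say how the manifest, API call, and opcode information is merged. One common way to feed several views of an app to a single language model is to serialize them into one tagged token sequence. The sketch below shows that idea. It is an illustration only, not the project's method. The container `AppViews`, the function `build_model_input`, the tag tokens `[MANIFEST]`, `[API]`, and `[OPCODES]`, and the `budget_per_view` parameter are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class AppViews:
    """Hypothetical container for three views of one Android app."""
    permissions: list[str] = field(default_factory=list)  # e.g. parsed from AndroidManifest.xml
    api_calls: list[str] = field(default_factory=list)    # API call sequence, in call order
    opcodes: list[str] = field(default_factory=list)      # Dalvik opcode sequence


def build_model_input(app: AppViews, budget_per_view: int = 64) -> str:
    """Serialize the views into one tagged text sequence (hypothetical format).

    Each view is truncated to `budget_per_view` items, so a long opcode
    stream cannot crowd the manifest and API information out of a
    fixed-length model context.
    """
    sections = [
        ("[MANIFEST]", app.permissions),
        ("[API]", app.api_calls),
        ("[OPCODES]", app.opcodes),
    ]
    parts: list[str] = []
    for tag, items in sections:
        parts.append(tag)                    # marks where each view begins
        parts.extend(items[:budget_per_view])
    return " ".join(parts)


if __name__ == "__main__":
    app = AppViews(
        permissions=["android.permission.SEND_SMS"],
        api_calls=["SmsManager.sendTextMessage"],
        opcodes=["invoke-virtual", "move-result-object", "return-void"],
    )
    print(build_model_input(app, budget_per_view=2))
```

With `budget_per_view=2`, the example prints `[MANIFEST] android.permission.SEND_SMS [API] SmsManager.sendTextMessage [OPCODES] invoke-virtual move-result-object`. The last opcode is dropped by the per-view cap. A fixed per-view budget is a simple design choice. A real detector might instead weight or compress the views differently.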

PhD student: Hantang Zhang, Umeå University
