Malware Classification With Machine Learning Enhanced by Windows Kernel Emulation

Conference: Black Hat USA 2022

2022-08-11

Summary

The presentation discusses the use of machine learning in cybersecurity and how it can be applied to a wider set of data. It emphasizes that ML is a tool and not a replacement for human specialists.

Machine learning can be used in cybersecurity to analyze data and improve detection of threats
ML models can be trained using different types of data, not just PE files treated as raw byte blobs
Combining different models, such as file path and emulation models, can improve performance
ML is a tool that can be used in addition to rule-based approaches, not a replacement for human specialists
ML is not a silver bullet and should be used in conjunction with other tools and approaches
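
One way to picture the point about combining models is late fusion of per-model scores. The sketch below is a minimal illustration, not the method presented in the talk: the model names, the example probabilities, and the weights are all hypothetical, and a weighted average of log-odds is just one simple fusion rule among several.

```python
import math


def logit(p: float, eps: float = 1e-6) -> float:
    """Map a probability to log-odds, clipping to avoid infinities."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_scores(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-model log-odds, returned as a probability."""
    total = sum(weights[name] for name in scores)
    mixed = sum(weights[name] * logit(p) for name, p in scores.items()) / total
    return sigmoid(mixed)


# Hypothetical per-model malware probabilities for a single sample.
sample_scores = {"static": 0.55, "emulation": 0.80, "file_path": 0.90}
# Hypothetical weights; in practice these would be fitted on validation data.
model_weights = {"static": 1.0, "emulation": 1.0, "file_path": 0.5}

fused = fuse_scores(sample_scores, model_weights)
print(f"fused malware probability: {fused:.3f}")
```

Here a sample that the static model alone scores as borderline is pushed toward a malicious verdict by the emulation and file path scores. A learned meta-model over the individual outputs is a common alternative to fixed weights.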

The presentation notes that ML can also be applied to telemetry such as Sysmon or auditd logs, which are available to a much wider range of security practitioners. Security operations centers, not just security vendors, can therefore benefit from ML models like this.

Abstract

This session will present a hybrid machine learning architecture that simultaneously utilizes static and dynamic malware analysis methodologies. For dynamic analysis, we employ the Windows kernel emulator published by Mandiant and process the resulting emulation reports with a 1D convolutional neural network. Static analysis, in contrast, is based on the state-of-the-art ensemble model publicly released by Endgame. The hybrid model surpasses the capabilities of modern AI classifiers. We use threat intelligence data consisting of in-the-wild telemetry from 100k samples and record a detection rate of 96.70% at a fixed false positive rate of 0.1%. Additionally, we will show that contextual telemetry from a system, such as an executable's file path, can further increase detection rates. Finally, unaffiliated with any organization, we open-source our hybrid model with a convenient scikit-learn-like API for public use.
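
To make the "detection rate at a fixed false positive rate" metric concrete, the sketch below picks a score threshold from benign samples so that at most 0.1% of them are flagged, then measures the share of malware scoring above that threshold. It is an illustration under stated assumptions only: the scores are synthetic Gaussian draws, the helper names `threshold_at_fpr` and `rate_above` are hypothetical, and none of this reproduces the talk's data or results.

```python
import random


def threshold_at_fpr(benign_scores, max_fpr):
    """Return a threshold t such that at most max_fpr of benign scores exceed t."""
    ordered = sorted(benign_scores, reverse=True)
    # Number of benign samples we may flag; int() rounds down, which is conservative.
    allowed = int(max_fpr * len(ordered))
    if allowed >= len(ordered):
        return float("-inf")
    return ordered[allowed]


def rate_above(scores, threshold):
    """Fraction of scores strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


# Synthetic classifier scores, for illustration only.
rng = random.Random(0)
benign = [rng.gauss(0.2, 0.1) for _ in range(10_000)]
malware = [rng.gauss(0.8, 0.1) for _ in range(10_000)]

threshold = threshold_at_fpr(benign, max_fpr=0.001)
fpr = rate_above(benign, threshold)
detection_rate = rate_above(malware, threshold)
print(f"threshold={threshold:.3f} FPR={fpr:.4f} detection rate={detection_rate:.4f}")
```

In a realistic evaluation the threshold would be chosen on a held-out validation set and the detection rate reported on a separate test set, so the measured false positive rate may drift slightly from the target.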
