Malware Classification With Machine Learning Enhanced by Windows Kernel Emulation

Conference:  Black Hat USA 2022

2022-08-11

Summary

The presentation discusses the use of machine learning in cybersecurity and how it can be applied to a wider set of data. It emphasizes that ML is a tool and not a replacement for human specialists.
  • Machine learning can be used in cybersecurity to analyze data and improve detection of threats
  • ML models can be trained on different types of data, not just PE files treated as byte blobs
  • Combining different models, such as file path and emulation models, can improve performance
  • ML is a tool that can be used in addition to rule-based approaches, not a replacement for human specialists
  • ML is not a silver bullet and should be used in conjunction with other tools and approaches
The presentation mentions that ML can be applied to data such as Sysmon or auditd logs, which are available to a wider range of security practitioners. This means that security operations centers, not just security vendors, can benefit from models like this.
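
The point about combining models can be illustrated with a small sketch. This summary does not say how the presenters merge the outputs of their models, so the example below swaps in a deliberately simple technique: late fusion by weighted averaging of per-model malware probabilities. The function name, the model names ("static", "emulation", "file_path"), and the scores are all hypothetical.

```python
def fuse_scores(scores, weights=None):
    """Late fusion: weighted average of per-model malware probabilities.

    `scores` maps a (hypothetical) model name to that model's probability
    that the sample is malicious. `weights` optionally maps the same names
    to non-negative weights; equal weights are assumed when it is omitted.
    """
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total


# Made-up scores for one sample: the static model is unsure, while the
# emulation and file path models lean towards "malicious".
sample = {"static": 0.40, "emulation": 0.90, "file_path": 0.80}

equal_fusion = fuse_scores(sample)
# Trust the emulation model twice as much as the other two (an assumption).
weighted_fusion = fuse_scores(
    sample, {"static": 1.0, "emulation": 2.0, "file_path": 1.0}
)
```

In practice, a trained meta-model over the individual outputs is a common alternative to fixed weights, because it can learn which model to trust in which situation.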

Abstract

This session will present a hybrid machine learning architecture that simultaneously utilizes static and dynamic malware analysis methodologies. For dynamic analysis, we employ the Windows kernel emulator published by Mandiant and process the resulting emulation reports with a 1D convolutional neural network. Static analysis, in contrast, is based on the state-of-the-art ensemble model publicly released by Endgame. The hybrid architecture surpasses the capabilities of modern AI classifiers. We use threat intelligence data consisting of in-the-wild telemetry from 100k samples and record a detection rate of 96.70% at a fixed false positive rate of 0.1%. Additionally, we will show that contextual telemetry from a system, such as an executable's file path, can further increase detection rates. Finally, unaffiliated with any organization, we open-source our hybrid model with a convenient scikit-learn-like API for public use.
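
The phrase "detection rate at a fixed false positive rate" describes a two-step evaluation: first choose the score threshold that keeps false positives on benign samples within the budget, then measure how many malicious samples score above that threshold. The sketch below shows the arithmetic on made-up scores. The function names and data are hypothetical, and this is not the presenters' evaluation code.

```python
def threshold_at_fpr(benign_scores, max_fpr):
    """Lowest threshold whose false positive rate stays within max_fpr.

    A sample is flagged as malicious when its score is strictly greater
    than the returned threshold.
    """
    ranked = sorted(benign_scores, reverse=True)
    # Number of benign samples we can afford to misflag. int() rounds
    # down, so the resulting rate never exceeds max_fpr.
    allowed = int(max_fpr * len(ranked))
    if allowed >= len(ranked):
        return float("-inf")  # every benign sample may be flagged
    # Only scores strictly above ranked[allowed] are flagged, which is
    # at most `allowed` benign samples, even when scores are tied.
    return ranked[allowed]


def detection_rate(malicious_scores, threshold):
    """Fraction of malicious samples scoring strictly above the threshold."""
    flagged = sum(1 for score in malicious_scores if score > threshold)
    return flagged / len(malicious_scores)


# Made-up model outputs, for illustration only.
benign = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.90]
malicious = [0.95, 0.85, 0.65, 0.50]

threshold = threshold_at_fpr(benign, max_fpr=0.25)
rate = detection_rate(malicious, threshold)
```

With a budget of 0.1%, as in the abstract, the same procedure allows one misflagged benign file per thousand, which is why a large benign set is needed for the threshold to be meaningful.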
