The presentation discusses the problem of discrepancies in the PE file format and proposes a systematic approach to finding them through reverse engineering and modeling. The authors suggest the need for more formal specification and a publicly available reference implementation.
- Developers of security tools should model more than one version of the Windows loader to account for small quirks and edge cases that are handled differently
- The authors created a language and framework for reverse engineering efforts to build models and systematically find discrepancies in the PE file format
- The language includes input declaration statements, symbol definition statements, and predicates to model constraints on the original input
- The authors suggest the need for more clear and formal specification for the PE file format and a publicly available reference implementation
The authors found that there are three types of operations that software usually performs on input files: structural checks, compliance checks, and memory mapping. They created a language to model these operations and constraints on the original input, including input declaration statements, symbol definition statements, and predicates. The authors suggest that developers of security tools should model more than one version of the Windows loader to account for small quirks and edge cases that are handled differently. They also suggest the need for more clear and formal specification for the PE file format and a publicly available reference implementation.
This Briefing presents our research on parser differentials for the PE format. We defined a custom language to write "formal models" of various PE loaders, for different versions of Windows and reverse-engineering tools. We then built a framework that, using these models, can perform a number of analyses that aid reverse-engineering tasks.First, given a PE executable, it can determine whether a PE loader would consider it valid. This feature provides a filtering stage for dynamic malware analysis, as it can identify broken samples before running them in sandboxes. Our framework is also able to automatically generate SMT models of the various PE loaders, and it can automatically perform several powerful tasks: given a PE loader, generate a valid executable that can be loaded by it; or, it can perform "differential analysis" and automatically generate PE files that are valid for one PE loader but not for another one. This makes our framework powerful enough to perform complex tasks like "creating valid executables that reverse-engineering tools/AVs cannot correctly parse/consider invalid", thus bypassing them. Our framework can also *systematically* explore differences among various PE loaders, and it can perform "differences enumeration" and "corner-cases generation".We wrote models for the loaders Windows XP, 7, and 10, radare2, ClamAV, and yara (which we'll publicly release, alongside the framework). Surprisingly, we discovered that they all handle the PE format differently. We also performed a live hunt on VirusTotal, and we identified malware samples that actively employ these discrepancies, likely to bypass AV products.Finally, we conclude by pointing out that there is no one "correct way to parse an executable," and that analysis tools should start allowing users to select "which loader to simulate" when loading a binary.