
Automatic Protocol Reverse Engineering

Conference: Black Hat USA 2022

2022-08-10

Summary

The presentation discusses a tool that automates the reverse engineering of network protocols from executables using the L* (L-star) automata learning algorithm and symbolic execution.
  • The tool aims to extract the protocol state machine and the message formats from an executable
  • It does not assume past traffic captures or an active protocol peer
  • It relies only on the binary code, combining the L* algorithm with symbolic execution to learn the protocol state machine
  • L* is Angluin's automata learning algorithm, which learns state machines (deterministic finite automata) from queries
  • The algorithm identifies an unknown regular set from examples of members and non-members: it asks membership queries (is this sequence of messages accepted?) and equivalence queries (is this candidate state machine correct?), and refines its hypothesis with the counterexamples it receives
  • The number of queries is polynomial in the number of states of the protocol state machine and in the length of the longest counterexample
  • The tool was tested on a RAT (remote access trojan) and reverse engineered the state machine of its protocol in two minutes
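The bullets above describe the L* loop: fill an observation table with membership queries, make it closed and consistent, build a hypothesis state machine, and refine it with counterexamples. The Python sketch below is a minimal textbook version of Angluin's L*, not the presenters' implementation. Everything concrete in it is an assumption for illustration: the toy protocol in `TARGET`, its message names, and the hypothetical `find_counterexample` helper, which approximates the equivalence query by testing every message sequence up to a fixed length. In the actual tool, queries would presumably be answered through symbolic execution of the binary rather than by a lookup table.

```python
from itertools import combinations, product

# Hypothetical toy protocol; any (state, message) pair not listed rejects the trace.
ALPHABET = ("login", "ok", "err", "logout")
TARGET = {("idle", "login"): "wait", ("wait", "ok"): "session",
          ("wait", "err"): "idle", ("session", "logout"): "idle"}


def target_accepts(word):
    """Membership oracle for the toy protocol: is `word` a valid message sequence?"""
    state = "idle"
    for msg in word:
        state = TARGET.get((state, msg))
        if state is None:
            return False  # unexpected message in this state
    return True


def accepts(hypothesis, word):
    state, delta, accepting = hypothesis  # learned DFA as (start state, delta, accepting)
    for a in word:
        state = delta[(state, a)]
    return state in accepting


class LStar:
    """Angluin-style observation table: rows are prefixes S, columns are suffixes E."""

    def __init__(self, alphabet, member):
        self.alphabet, self.member = alphabet, member
        self.S, self.E = [()], [()]
        self.cache = {}  # membership answers; each distinct query is asked only once

    def mq(self, word):
        if word not in self.cache:
            self.cache[word] = self.member(word)
        return self.cache[word]

    def row(self, prefix):
        return tuple(self.mq(prefix + e) for e in self.E)

    def inconsistency(self):
        # Prefixes with equal rows must still agree after every symbol a and suffix e.
        for s1, s2 in combinations(self.S, 2):
            if self.row(s1) != self.row(s2):
                continue
            for a, e in product(self.alphabet, self.E):
                if self.mq(s1 + (a,) + e) != self.mq(s2 + (a,) + e):
                    return (a,) + e
        return None

    def close_and_make_consistent(self):
        while True:
            rows = {self.row(s) for s in self.S}
            # Closed: every one-symbol extension must share a row with some prefix in S.
            missing = next((s + (a,) for s in self.S for a in self.alphabet
                            if self.row(s + (a,)) not in rows), None)
            if missing is not None:
                self.S.append(missing)
            elif (suffix := self.inconsistency()) is not None:
                self.E.append(suffix)  # a new column that splits the two rows
            else:
                return

    def hypothesis(self):
        # One state per distinct row; E[0] == (), so row[0] says if the prefix is accepted.
        reps = {}
        for s in self.S:
            reps.setdefault(self.row(s), s)
        delta = {(r, a): self.row(s + (a,)) for r, s in reps.items() for a in self.alphabet}
        return self.row(()), delta, {r for r in reps if r[0]}


def find_counterexample(hypothesis, alphabet, member, max_len=6):
    """Stand-in equivalence query (an assumption): test every word up to max_len."""
    for n in range(max_len + 1):
        for word in product(alphabet, repeat=n):
            if accepts(hypothesis, word) != member(word):
                return word
    return None


def learn(alphabet, member, max_len=6):
    learner = LStar(alphabet, member)
    while True:
        learner.close_and_make_consistent()
        hyp = learner.hypothesis()
        cex = find_counterexample(hyp, alphabet, member, max_len)
        if cex is None:
            return hyp, len(learner.cache)  # membership queries asked by the learner
        # Angluin's refinement step: add every prefix of the counterexample to S.
        learner.S += [cex[:i] for i in range(1, len(cex) + 1) if cex[:i] not in learner.S]


if __name__ == "__main__":
    hyp, queries = learn(ALPHABET, target_accepts)
    n_states = len({state for state, _ in hyp[1]})
    print(f"learned {n_states} states using {queries} membership queries")
```

On this toy target the learner ends with four states, corresponding to idle, waiting for a reply, logged in, and an implicit reject state for unexpected messages. The sketch keeps Angluin's original counterexample rule (add every prefix to `S`), which is why it needs the explicit consistency check.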
The presenters used a toy example: a client implementing a simple protocol in which it sends a login message to the server and, depending on the server's response, follows up with a logout message. They first reverse engineered this protocol manually and then automated the process using L* and symbolic execution. They also ran the tool on an SMTP client and a RAT to evaluate its performance.
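To make the toy example concrete, the sketch below models the client as a Python function and maps which server replies drive which client behavior. The message names (`LOGIN`, `LOGOUT`), the `b"OK"` check, and the helpers `toy_client` and `explore` are hypothetical; the summary only says that the logout depends on the server's response. Brute-force enumeration over a tiny reply domain stands in for symbolic execution, which would instead derive the branch condition on the reply with a constraint solver, without enumerating inputs.

```python
from collections import defaultdict
from itertools import product


def toy_client(recv):
    """Hypothetical stand-in for the toy client: send LOGIN, read one reply,
    and send LOGOUT only if the reply starts with b"OK"."""
    sent = [b"LOGIN"]
    reply = recv()
    if reply.startswith(b"OK"):
        sent.append(b"LOGOUT")
    return sent


def explore(client, byte_alphabet=b"OKX", reply_len=2):
    """Group candidate server replies by the message sequence the client sends.

    Brute force over a tiny reply domain stands in for symbolic execution.
    """
    behaviors = defaultdict(list)
    for chars in product(byte_alphabet, repeat=reply_len):
        reply = bytes(chars)
        # recv() is called immediately inside client, so it returns this candidate reply.
        behaviors[tuple(client(lambda: reply))].append(reply)
    return dict(behaviors)


if __name__ == "__main__":
    for sent, replies in explore(toy_client).items():
        print(sent, "<-", replies)
```

Only `b"OK"` reaches the branch that sends `LOGOUT`; the other eight two-byte replies end the exchange after `LOGIN`. Each distinct behavior corresponds to a candidate transition in the client's state machine.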

Abstract

Protocol reverse engineering is the process of extracting the specification of a network protocol from the binary code that implements it. Extracting a protocol specification is useful in several security-related contexts, such as finding implementation bugs, determining conformance to a standard, or discovering a botnet's command and control (C&C) protocol.

Manual reverse engineering of a protocol can be time-consuming. We present a tool that automatically reverse engineers a protocol directly from the binary. Namely, given a binary sample, the tool automatically extracts the protocol specification, including the message formats and the protocol state machine! The tool leverages symbolic execution and automata learning algorithms. This is the first tool that extracts a protocol's specification without relying on captures of the protocol's traffic, without prior knowledge of message formats, and without assuming there is an active remote protocol peer (such as a C&C server).

This is joint work with Prof. Orna Grumberg from the Technion.
