Greybox Program Synthesis: A New Approach to Attack Dataflow Obfuscation

Conference: BlackHat USA 2021



The presentation discusses the use of program synthesis for deobfuscation, specifically for attacking dataflow obfuscation, and the challenges involved.
  • Program synthesis is the creation of a program given high-level specifications and constraints.
  • Dataflow obfuscation scrambles expressions by mixing bitwise and arithmetic operators.
  • Locating the data to deobfuscate is a manual process, while the deobfuscation itself is the main challenge.
  • There are multiple approaches to deobfuscation through synthesis, including templates, stochastic search, and enumerative search.
  • The presentation highlights the greybox synthesis algorithm, which uses symbolic execution and a synthesizer to simplify expressions.
  • The main contribution of the algorithm is the input-output oracle, which compares expressions by their input-output behavior.
  • The presentation emphasizes the importance of obfuscation in protecting valuable assets in a program, such as algorithms and data.
One example of dataflow obfuscation is the use of mixed boolean arithmetic (MBA) to scramble expressions. The challenge for reverse engineers is to recover the original expression, which is harder than attacking control flow obfuscations, where the answer sought is only a boolean. Deobfuscation poses two distinct research problems: locating the data to deobfuscate and the deobfuscation itself. Program synthesis can be used to address the latter.
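As a concrete illustration of mixed boolean arithmetic, the sketch below scrambles a plain addition with the well-known identity x + y = (x ^ y) + 2 * (x & y). The function names and the 8-bit word size are assumptions chosen for illustration and are not taken from the talk.

```python
MASK = 0xFF  # assumed 8-bit machine words, for illustration only


def original(x, y):
    """The expression a reverse engineer wants to recover."""
    return (x + y) & MASK


def obfuscated(x, y):
    """Same function, rewritten with the MBA identity
    x + y == (x ^ y) + 2 * (x & y)."""
    return ((x ^ y) + 2 * (x & y)) & MASK


def equivalent(f, g):
    """Exhaustively compare two 2-input functions over all 8-bit inputs."""
    return all(f(x, y) == g(x, y) for x in range(256) for y in range(256))


print(equivalent(original, obfuscated))  # prints True
```

Real obfuscators typically apply such rewrites repeatedly, so the resulting expression is far larger than this single step, and exhaustive comparison is no longer practical at 32 or 64 bits.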


Obfuscation is being broadly adopted for a wide range of applications, especially to protect intellectual property (IP) in the mobile ecosystem (Android, iOS) and in embedded systems at large. It is now ubiquitous, and everyone is unknowingly executing obfuscated code. With adoption it has also gained maturity and potency, making the assessment of such protections increasingly hard. It is used in a variety of contexts, from malware to famous and widely used mobile applications. In either case, the goal is to protect software secrets, communication protocols, APIs, and inner workings from reverse engineering. Thus, finding new ways to defeat evolving obfuscation schemes is becoming more and more important in this endless cat-and-mouse game.

This talk presents the latest advances in program synthesis applied to deobfuscation. It aims at demystifying this analysis technique by showing how it can be put into action on obfuscation. In particular, the Qsynthesis implementation released for this talk shows a complete end-to-end workflow that deobfuscates assembly instructions into optimized (deobfuscated) instructions and reassembles them into the binary.

More specifically, the talk presents the greybox synthesizer, which combines two core components: an I/O-based black-box synthesis using precomputed tables and a white-box AST search algorithm backed by symbolic execution. This new approach provides a very good trade-off between accuracy and speed. Various experiments to improve it, such as expression linearization, expression learning, and JIT compilation of table evaluation, will be presented with both their strengths and weaknesses against the obfuscation schemes attacked.

Among existing schemes to impede program understanding, we show results obtained on various transformations such as Mixed-Boolean-Arithmetic (MBA), arithmetic encoding, and virtualization, originating from multiple obfuscators such as Tigress, YANSOllvm, and commercial applications.

Finally, we will highlight limitations of the approach, the open research problems it raises, and various insights on how to improve the algorithm to bypass roadblocks in order to better leverage program synthesis for deobfuscation.
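The combination of a precomputed I/O table with a top-down AST search can be sketched in a few lines. This is a toy model under stated assumptions, not the Qsynthesis API: the AST is written by hand instead of being produced by symbolic execution, expressions use only two 8-bit variables named x and y, and all names (`build_table`, `simplify`, `INPUTS`) are hypothetical.

```python
import itertools

MASK = 0xFF  # assumed 8-bit words
# Fixed input vectors shared by the table and the search (arbitrary values).
INPUTS = [(0, 0), (1, 2), (3, 5), (0xFF, 1), (0x55, 0xAA),
          (17, 200), (128, 127), (42, 7)]

OPS = {
    "+": lambda a, b: (a + b) & MASK,
    "-": lambda a, b: (a - b) & MASK,
    "*": lambda a, b: (a * b) & MASK,
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
}

# An AST is "x", "y", an int constant, or a tuple (op, left, right).


def evaluate(ast, x, y):
    if ast == "x":
        return x
    if ast == "y":
        return y
    if isinstance(ast, int):
        return ast & MASK
    op, left, right = ast
    return OPS[op](evaluate(left, x, y), evaluate(right, x, y))


def size(ast):
    """Number of nodes in the AST."""
    if not isinstance(ast, tuple):
        return 1
    return 1 + size(ast[1]) + size(ast[2])


def signature(ast):
    """Black-box view of an expression: its outputs on the fixed inputs."""
    return tuple(evaluate(ast, x, y) for x, y in INPUTS)


def build_table():
    """Precompute signature -> small expression, smallest entries first."""
    leaves = ["x", "y"]
    table = {signature(leaf): leaf for leaf in leaves}
    for op, left, right in itertools.product(OPS, leaves, leaves):
        candidate = (op, left, right)
        table.setdefault(signature(candidate), candidate)
    return table


def simplify(ast, table):
    """White-box walk: replace any sub-tree whose signature matches a
    smaller table entry, otherwise recurse into its children."""
    hit = table.get(signature(ast))
    if hit is not None and size(hit) < size(ast):
        return hit
    if not isinstance(ast, tuple):
        return ast
    op, left, right = ast
    return (op, simplify(left, table), simplify(right, table))


table = build_table()
obf = ("+", ("^", "x", "y"), ("*", 2, ("&", "x", "y")))
print(simplify(obf, table))  # prints ('+', 'x', 'y')
```

Matching on a handful of input vectors only shows that two expressions agree on those inputs, so a replacement found this way is a candidate rather than a proof of equivalence. The recursion is what makes the search greybox: when the whole expression has no table entry, its sub-trees are still tried individually.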