Asura: A huge PCAP file analyzer for anomaly packets detection using massive multithreading

Conference: DEF CON 26

2018-08-01

Summary

The presentation discusses the challenges of handling the increasing internet traffic and the importance of flexibility in data mining for effective packet inspection.
  • Internet traffic is increasing exponentially, leading to challenges in packet inspection
  • Open-source data mining may not work effectively in many cases as people try to hide their activities
  • Flexibility is important in data mining and packet inspection
  • Asura is a compact and flexible application that can process large amounts of packets
  • Future work includes improving container size and incorporating GPU and TBB
The presenter shared their experience of having to inspect a large file of 200 to 300 rows during a 20-minute presentation, with 3 to 5 white wires to be inspected. They emphasized the importance of their operation and the challenges of handling the unpredictable and unorganized nature of real-world data.

Abstract

Inspecting huge traffic logs imposes a heavy burden on security analysts. Unfortunately, few research efforts have focused on scalability in analyzing very large PCAP files with reasonable computing resources. Asura is a portable and scalable PCAP file analyzer that detects anomalous packets using massive multithreading. Asura's parallel packet dump inspection is based on task-based decomposition, so it can run massive numbers of threads over a large PCAP file without the tedious parameter selection that data decomposition requires. Asura is designed to scale out on large PCAP files by using as many threads as possible. Asura works in two steps. First, Asura extracts a feature vector represented by associative containers keyed by <sourceIP, destIP> pairs. This makes the feature vector drastically smaller than the original PCAP files. In other words, Asura reduces packet dump data to the size of the set of unique <sourceIP, destIP> pairs (for example, in the experiment, the output of the first step is about 2% of the size of the original libpcap files). Second, a parallel clustering algorithm is applied to the feature vector, which is represented as {<sourceIP, destIP>, V[i]}, where V[i] is the aggregated flow vector. In this second step, Asura adopts an enhanced K-means algorithm. Concretely, two functions of K-means, (1) calculating distances and (2) relabeling points, are improved for parallel processing. In the experiment, on public PCAP datasets, Asura identified 750 packets labeled as malicious among 70 million (about 18 GB) normal packets. Inspecting the 70 million packets took a reasonable computing time of around 350-450 minutes with 1000-5000 threads on a commodity workstation.
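The first step described in the abstract, reducing a packet dump to one entry per unique <sourceIP, destIP> pair, can be illustrated with a minimal Python sketch. This is not Asura's code. It assumes the packets have already been parsed into (source IP, destination IP, length) tuples, and it assumes the aggregated flow vector holds a packet count and a byte count, since the abstract does not say what V[i] contains. The names `aggregate` and `extract_features` and the sample data are hypothetical. Each chunk of packets is handled as an independent task and the partial results are merged afterwards.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical parsed-packet records: (source IP, destination IP, length in bytes).
PACKETS = [
    ("10.0.0.1", "10.0.0.2", 60),
    ("10.0.0.1", "10.0.0.2", 1500),
    ("10.0.0.3", "10.0.0.2", 40),
    ("10.0.0.1", "10.0.0.2", 52),
    ("10.0.0.3", "10.0.0.4", 40),
]

def aggregate(chunk):
    """One task: fold a chunk of packets into {(src, dst): [packets, bytes]}."""
    flows = defaultdict(lambda: [0, 0])
    for src, dst, length in chunk:
        entry = flows[(src, dst)]
        entry[0] += 1        # assumed feature: packet count
        entry[1] += length   # assumed feature: byte count
    return flows

def extract_features(packets, chunk_size=2, workers=4):
    """Split the packets into tasks, aggregate each one, then merge the partial maps."""
    chunks = [packets[i:i + chunk_size] for i in range(0, len(packets), chunk_size)]
    merged = defaultdict(lambda: [0, 0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(aggregate, chunks):
            for pair, (count, size) in partial.items():
                merged[pair][0] += count
                merged[pair][1] += size
    return dict(merged)

features = extract_features(PACKETS)
for pair, vector in sorted(features.items()):
    print(pair, vector)
```

Five packets collapse into three entries here, which is the same kind of reduction the abstract reports at scale. In CPython, threads mainly show the task structure for CPU-bound work like this rather than providing a real speedup.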
Asura will be released under the MIT license and will be available on the author's GitHub site on the first day of DEF CON 26.
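The second step names two K-means functions that are parallelized: calculating distances and relabeling points. The sketch below is plain Lloyd-style K-means in Python, not Asura's enhanced variant, and only shows why those two functions parallelize naturally: the nearest-centroid search for each point is independent of every other point. All function names, the two-dimensional sample vectors, and the initial centroids are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centroids):
    """Distance step for one point: index of the closest centroid."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(point, centroids[k]))

def relabel(points, centroids, pool):
    """Relabel step: every point is an independent task submitted to the pool."""
    return list(pool.map(lambda p: nearest(p, centroids), points))

def update(points, labels, centroids):
    """Recompute each centroid as the mean of its members (kept as-is if empty)."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(
            sum(m[d] for m in members) / len(members) for d in range(len(old))
        ))
    return new_centroids

def kmeans(points, centroids, max_iter=20, workers=4):
    """Iterate relabel and update until the labels stop changing."""
    labels = None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_iter):
            new_labels = relabel(points, centroids, pool)
            if new_labels == labels:
                break
            labels = new_labels
            centroids = update(points, labels, centroids)
    return labels, centroids

# Hypothetical flow vectors (packet count, byte count); the last one is an outlier.
POINTS = [(1.0, 60.0), (2.0, 80.0), (1.0, 70.0), (900.0, 90000.0)]
labels, centroids = kmeans(POINTS, centroids=[POINTS[0], POINTS[3]])
print(labels)
```

With these sample vectors the outlying flow ends up alone in its own cluster, which is the kind of separation an anomaly detector built on clustering relies on. A native implementation would run the per-point tasks on real hardware threads; in CPython the thread pool mostly illustrates the decomposition.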
