
No More Secret Sauce!: How We Can Power Real Security Machine Learning Progress Through Open Algorithms and Benchmarks

Conference: BlackHat USA 2021

2021-11-11

Summary

The presentation discusses the importance of openness and open benchmarks in AI and cybersecurity. It argues that the absence of open benchmarks in cybersecurity is hindering progress, making it difficult to compare results and know which ideas work best.
  • Open benchmarks have played a crucial role in the progress of non-security machine learning fields
  • Cybersecurity lacks comparable benchmarks, leaving researchers unable to compare results or identify which ideas work best
  • The current state of practice in security and machine learning research is falling short of the standards of actual science
  • Companies claiming to use AI in cybersecurity should publish their scientific work
  • The inability to compare results is leading to substance-less marketing claims and making users cynical about security AI
The speaker presents a plot from paperswithcode.com showing accuracy on the ImageNet benchmark dataset in computer vision over the last decade, which illustrates how much an open benchmark helps a field track its own progress. In cybersecurity, by contrast, community benchmarks are largely absent, making it difficult to track progress and compare results.

Abstract

While we've recently seen game-changing machine learning breakthroughs in the domains of language, vision, and robotics, it's no secret that security ML progress remains fettered by unverifiable product claims and misleading marketing. In my talk I'll argue that to address this, we need to build a new culture of research transparency in security ML, fostering the same openness that we already bring to subfields like cryptography. Rather than claims of product "secret sauce," we need a culture of publishing our ML models, so they can be openly critiqued. And, instead of making non-reproducible claims about ML model accuracy, we should curate community benchmarks against which we demonstrate the relative efficacy of our ML approaches. In my talk, I'll lay out this argument and introduce the 20-million-sample SOREL dataset which my team has released in conjunction with a team at Reversing Labs.
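
To make the abstract's benchmark argument concrete, here is a minimal Python sketch of what a shared, reproducible evaluation could look like. The file name sorel_test_labels.csv, its sha256/is_malware columns, and the score_file stub are hypothetical placeholders rather than the SOREL release's actual layout; the point is only that when the test split and the metric are fixed and public, any two models' numbers are directly comparable.

```python
import csv

from sklearn.metrics import roc_auc_score


def score_file(sha256: str) -> float:
    # Placeholder for the model under evaluation; replace with a real scoring
    # function that maps a sample identifier to a maliciousness score in [0, 1].
    return 0.5


def evaluate(labels_csv: str = "sorel_test_labels.csv") -> float:
    # Score every sample in the shared test split and report ROC AUC, the kind
    # of single comparable number a community benchmark makes possible.
    y_true, y_score = [], []
    with open(labels_csv, newline="") as fh:
        for row in csv.DictReader(fh):
            y_true.append(int(row["is_malware"]))
            y_score.append(score_file(row["sha256"]))
    return roc_auc_score(y_true, y_score)


if __name__ == "__main__":
    print(f"ROC AUC on the shared benchmark split: {evaluate():.4f}")
```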
