The presentation discusses using abstract syntax tree (AST) features and deep learning for code attribution and de-anonymization on GitHub. It also explores how the number of files and snippets used for training affects the classifier's accuracy and confidence.
- AST features and deep learning can improve code attribution and de-anonymization accuracy
- The number of files and snippets used for training impacts accuracy and confidence levels
- Calibration curves can help determine how much confidence to place in the classifier's predictions
- Collaborative coding presents challenges for code attribution and de-anonymization
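
To make the idea of AST features concrete, here is a minimal sketch of turning source code into countable syntactic features. The summary does not say which parser, language, or feature set the presenters used, so everything here is an assumption: Python's standard `ast` module stands in for the parser, and the function name `ast_features` and the choice of node-type counts plus parent-child pairs are illustrative only.

```python
import ast
from collections import Counter


def ast_features(source: str) -> Counter:
    """Count AST node types and parent>child node-type pairs.

    Hypothetical feature set for illustration; not the presenters' method.
    """
    tree = ast.parse(source)
    features = Counter()
    for parent in ast.walk(tree):
        parent_name = type(parent).__name__
        # Unigram feature: how often each node type appears.
        features[parent_name] += 1
        # Bigram feature: how often each parent>child pairing appears.
        for child in ast.iter_child_nodes(parent):
            features[f"{parent_name}>{type(child).__name__}"] += 1
    return features


sample = "def add(a, b):\n    return a + b\n"
features = ast_features(sample)
```

Counts like these could be fed to a classifier as a feature vector per file or snippet; because they describe program structure rather than layout or naming, they are one plausible way to capture an author's style.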
The presenters wanted to validate their code attribution and de-anonymization work on real-world code, particularly in collaborative coding scenarios. They built a calibration curve to gauge how confident the classifier was in its predictions, and found that the number of files and snippets used for training affected both accuracy and confidence. They also discussed the difficulty of identifying individual authors in collaborative coding environments.
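
As a rough illustration of what a calibration curve measures, the sketch below groups predictions by reported confidence and compares each group's average confidence with its actual accuracy. The function name `calibration_curve`, the equal-width binning, and the toy input values are all assumptions made for this example; the summary does not describe how the presenters built their curve.

```python
def calibration_curve(confidences, correct, n_bins=5):
    """Return (mean confidence, accuracy, count) for each non-empty bin.

    confidences: classifier confidence in [0, 1] for each prediction.
    correct: whether each prediction named the true author.
    Hypothetical helper for illustration only.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    curve = []
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        curve.append((mean_conf, accuracy, len(members)))
    return curve


# Toy values, invented for the example (not results from the talk).
toy_confidences = [0.95, 0.90, 0.85, 0.30, 0.35]
toy_correct = [True, True, False, False, True]
curve = calibration_curve(toy_confidences, toy_correct, n_bins=2)
```

For a well-calibrated classifier, accuracy in each bin is close to the bin's mean confidence. A bin where accuracy falls well below confidence suggests the classifier is overconfident in that range.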