Aggregating Semgrep Results: Top Rules, Files, and Clusters (MVP Demo)
Introduction
Turning security tool output into actionable insight is one of the biggest challenges for developers and security engineers. In this post, I'm sharing a minimum viable product (MVP) that takes Semgrep scan outputs and visualizes the top rules, the most affected files, and clusters of related findings.
Watch the Demo
How the MVP Works
- Aggregates Semgrep Results: JSON outputs from multiple scans are collected into a single dataset, ready for analysis.
- Highlights Top Rules & Top Files: Quickly identifies the rules triggered most often and the files with the highest number of findings. This helps prioritize what to fix first.
- Clusters Related Findings: Findings are grouped into logical clusters to reveal patterns and correlations between rules and code contexts.
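The three steps above can be sketched in a few lines of Python. This is an illustration rather than the MVP's actual code. It relies on the `results`, `check_id`, and `path` fields of Semgrep's JSON output. The helper names, the sample rule IDs, and the clustering heuristic (grouping by rule and parent directory) are assumptions made for the example.

```python
import json
from collections import Counter, defaultdict
from pathlib import Path, PurePosixPath


def load_reports(paths):
    """Parse Semgrep JSON report files (e.g. produced with `semgrep --json`)."""
    return [json.loads(Path(p).read_text(encoding="utf-8")) for p in paths]


def load_findings(reports):
    """Flatten the "results" arrays of several parsed reports into one list."""
    findings = []
    for report in reports:
        findings.extend(report.get("results", []))
    return findings


def top_rules(findings, n=5):
    """Rules triggered most often, as (rule_id, count) pairs."""
    return Counter(f["check_id"] for f in findings).most_common(n)


def top_files(findings, n=5):
    """Files with the most findings, as (path, count) pairs."""
    return Counter(f["path"] for f in findings).most_common(n)


def cluster_findings(findings):
    """Group findings by (rule, parent directory).

    Assumed heuristic for illustration; the MVP may cluster differently.
    """
    clusters = defaultdict(list)
    for f in findings:
        key = (f["check_id"], str(PurePosixPath(f["path"]).parent))
        clusters[key].append(f)
    return dict(clusters)


# Hypothetical, already-parsed reports standing in for real scan output.
reports = [
    {"results": [
        {"check_id": "example.eval-use", "path": "app/views.py", "start": {"line": 10}},
        {"check_id": "example.eval-use", "path": "app/utils.py", "start": {"line": 4}},
    ]},
    {"results": [
        {"check_id": "example.debug-enabled", "path": "app/views.py", "start": {"line": 42}},
    ]},
]

findings = load_findings(reports)
clusters = cluster_findings(findings)
print(top_rules(findings))
print(top_files(findings))
print({key: len(items) for key, items in clusters.items()})
```

With real scans, `load_reports(["scan1.json", "scan2.json"])` would replace the inline sample data; those file names are placeholders.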
Why This Matters
- Provides a fast overview of large codebases.
- Reduces noise by focusing on the findings that matter most.
- Serves as the foundation for a Signal Engine: ingest → normalize → rank → export/report.
Next Steps
This MVP is just the start. The ultimate goal is a full tool, which I will call Signal Engine, that can:
- Ingest results from multiple security tools
- Normalize and deduplicate findings
- Rank findings by risk
- Export actionable reports for developers and security engineers
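To make these four stages concrete, here is a rough sketch of how such a pipeline could fit together. Signal Engine does not exist yet, so everything here is hypothetical: the `Finding` record, the function names, and the severity weights are assumptions chosen for illustration. Only the Semgrep field names (`check_id`, `path`, `start.line`, `extra.severity`) come from Semgrep's JSON output.

```python
import json
from dataclasses import asdict, dataclass

# Assumed weights for illustration; a real ranking would use richer signals.
SEVERITY_WEIGHT = {"ERROR": 3, "WARNING": 2, "INFO": 1}


@dataclass(frozen=True)
class Finding:
    """Hypothetical tool-agnostic record; frozen so instances are hashable."""
    tool: str
    rule_id: str
    path: str
    line: int
    severity: str


def normalize_semgrep(result):
    """Map one entry of Semgrep's "results" array onto the common record."""
    return Finding(
        tool="semgrep",
        rule_id=result["check_id"],
        path=result["path"],
        line=result["start"]["line"],
        severity=result.get("extra", {}).get("severity", "INFO").upper(),
    )


def deduplicate(findings):
    """Drop exact duplicates while keeping first-seen order."""
    return list(dict.fromkeys(findings))


def rank(findings):
    """Sort by descending severity weight; unknown severities sort last."""
    return sorted(
        findings,
        key=lambda f: SEVERITY_WEIGHT.get(f.severity, 0),
        reverse=True,
    )


def export_report(findings):
    """Serialize the ranked findings as a JSON report."""
    return json.dumps([asdict(f) for f in findings], indent=2)


# Hypothetical raw results, including one duplicate from a repeated scan.
raw = [
    {"check_id": "example.debug-enabled", "path": "app/views.py",
     "start": {"line": 42}, "extra": {"severity": "WARNING"}},
    {"check_id": "example.eval-use", "path": "app/utils.py",
     "start": {"line": 4}, "extra": {"severity": "ERROR"}},
    {"check_id": "example.eval-use", "path": "app/utils.py",
     "start": {"line": 4}, "extra": {"severity": "ERROR"}},
]

ranked = rank(deduplicate([normalize_semgrep(r) for r in raw]))
report = export_report(ranked)
print(report)
```

Supporting another tool would, under this design, only require one more `normalize_*` function that emits the same `Finding` record.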
The demo shows a minimal but immediately usable implementation of this approach.
Try it Yourself
If you want to explore this workflow with your own Semgrep outputs, this MVP provides a fast way to see patterns, prioritize rules, and focus on what really matters in your code security scans.
The code is AGPLv3 licensed and released here: https://github.com/thesp0nge/mvp_semgrep