Aggregating Semgrep Results: Top Rules, Files, and Clusters (MVP Demo)

Introduction

Turning security tool output into actionable insight is one of the biggest challenges for developers and security engineers. In this post, I’m sharing a minimum viable product (MVP) that takes Semgrep scan output and visualizes the top rules, the most affected files, and clusters of related findings.

Watch the Demo

How the MVP Works

Aggregates Semgrep Results: JSON outputs from multiple scans are collected into a single dataset, ready for analysis.
Highlights Top Rules & Top Files: Quickly identifies the rules triggered most often and the files with the highest number of findings. This helps prioritize what to fix first.
Clusters Related Findings: Findings are grouped into logical clusters to reveal patterns and correlations between rules and code contexts.
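
As a rough illustration of the three steps above, here is a minimal Python sketch. It is not the code from the repository. It assumes Semgrep's JSON report layout, where each entry in the top-level `results` array carries a `check_id` and a `path`. The function names are hypothetical, and the clustering heuristic (grouping by rule and parent directory) is an assumption chosen for simplicity, since the post does not specify how the MVP forms its clusters.

```python
import json
from collections import Counter, defaultdict
from pathlib import PurePosixPath


def load_findings(report_texts):
    """Merge the `results` arrays of several Semgrep JSON reports into one list."""
    findings = []
    for text in report_texts:
        report = json.loads(text)
        findings.extend(report.get("results", []))
    return findings


def top_rules(findings, n=5):
    """Return the n most frequently triggered rule ids with their counts."""
    return Counter(f["check_id"] for f in findings).most_common(n)


def top_files(findings, n=5):
    """Return the n files with the most findings, with their counts."""
    return Counter(f["path"] for f in findings).most_common(n)


def cluster_findings(findings):
    """Group findings by (rule id, parent directory).

    This grouping key is an assumed heuristic for illustration only;
    the MVP may cluster on different features.
    """
    clusters = defaultdict(list)
    for f in findings:
        directory = str(PurePosixPath(f["path"]).parent)
        clusters[(f["check_id"], directory)].append(f)
    return dict(clusters)
```

To try it, read each report produced by `semgrep --json` into a string, pass the list of strings to `load_findings`, and feed the result to the other three functions. Grouping by directory keeps the sketch dependency-free; a real implementation could cluster on richer features such as rule metadata or the matched code.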

Why This Matters

Provides a fast overview of large codebases.
Reduces noise by focusing on the findings that matter most.
Serves as the foundation for a Signal Engine: ingest → normalize → rank → export/report.

Next Steps

This MVP is just the start. The ultimate goal is a full tool, which I will call Signal Engine, that can:

Ingest results from multiple security tools
Normalize and deduplicate findings
Rank risks per finding
Export actionable reports for developers and security engineers
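
To make the planned pipeline more concrete, the normalize, deduplicate, and rank stages could look roughly like the Python sketch below. Signal Engine does not exist yet, so everything here is an assumption: the `Finding` record, the deduplication key (rule, path, line), and the severity weights are all hypothetical. Only the Semgrep field names (`check_id`, `path`, `start.line`, `extra.severity`) come from Semgrep's JSON output.

```python
from dataclasses import dataclass

# Assumed weights for illustration; a real ranking would likely use more signals.
SEVERITY_WEIGHT = {"ERROR": 3, "WARNING": 2, "INFO": 1}


@dataclass(frozen=True)
class Finding:
    """Hypothetical tool-agnostic record that every ingested result maps to."""
    tool: str
    rule_id: str
    path: str
    line: int
    severity: str


def normalize_semgrep(result):
    """Map one entry of Semgrep's `results` array to the common record."""
    return Finding(
        tool="semgrep",
        rule_id=result["check_id"],
        path=result["path"],
        line=result["start"]["line"],
        severity=result.get("extra", {}).get("severity", "INFO").upper(),
    )


def deduplicate(findings):
    """Keep the first finding for each (rule, path, line) key, preserving order."""
    seen = set()
    unique = []
    for f in findings:
        key = (f.rule_id, f.path, f.line)
        if key not in seen:
            seen.add(key)
            unique.append(f)
    return unique


def rank(findings):
    """Sort by descending severity weight; unknown severities sort last."""
    return sorted(
        findings,
        key=lambda f: SEVERITY_WEIGHT.get(f.severity, 0),
        reverse=True,
    )
```

Supporting another tool would then mean writing one more `normalize_*` function that produces the same `Finding` record, while the deduplication and ranking stages stay unchanged.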

The demo shows a minimal but immediately usable implementation of this approach.

Try it Yourself

If you want to explore this workflow with your own Semgrep outputs, this MVP provides a fast way to see patterns, prioritize rules, and focus on what really matters in your code security scans.

The code is released under the AGPLv3 license and is available here: https://github.com/thesp0nge/mvp_semgrep