Open Source
Free
- Full AI features
- All 300+ integrations
- Apache 2.0 license
- Self-hosted
IncidentFox is an open-source AI SRE platform that automatically investigates production incidents end to end. A member of the Y Combinator W2026 batch, it was founded by Chiehmin (Jimmy) Wei (ex-Roblox, ex-Meta FAIR) and Long Yi (ex-Roblox), both of whom have built distributed systems serving millions of users.
When an alert fires, IncidentFox starts an investigation in a Slack thread: it queries logs, checks pod status, and correlates the alert with recent deployments, then delivers a root cause analysis with executable fix scripts. The platform ships with 300+ prebuilt integrations, including Kubernetes, AWS, Grafana, Prometheus, Datadog, Elasticsearch, PagerDuty, and GitHub. It also auto-discovers each team's stack and generates any integrations that are missing, which the company says cuts setup time from months to under a day.
The system combines three techniques: multi-agent orchestration that routes specialist agents to sub-problems, intelligent log sampling (statistical analysis before targeted fetching), and 3-layer alert correlation (temporal, topology, semantic), which the company says reduces alert noise by 85-95%. It supports 24+ LLM providers and can be deployed as SaaS, on-prem/VPC, or fully self-hosted. The core is Apache 2.0 licensed, and the free tier has full feature parity.
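To make the idea of 3-layer alert correlation concrete, here is a minimal Python sketch. It is not IncidentFox's implementation, which this listing does not describe. Every name in it (`Alert`, `DEPENDS_ON`, `correlate`, the 5-minute window, the 0.3 similarity threshold) is a hypothetical choice for illustration. The sketch also assumes that two alerts belong together when they fire close in time and are either linked in the service dependency graph or have similar messages. Word overlap (Jaccard similarity) stands in for whatever semantic comparison a real system would use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alert:
    name: str
    service: str
    fired_at: datetime
    message: str


# Hypothetical dependency map: service -> services it calls.
DEPENDS_ON = {
    "checkout": {"payments", "cart"},
    "payments": {"postgres"},
    "cart": {"redis"},
}


def temporally_close(a, b, window=timedelta(minutes=5)):
    """Layer 1: the alerts fired within the same time window."""
    return abs(a.fired_at - b.fired_at) <= window


def topologically_linked(a, b):
    """Layer 2: same service, or one service directly depends on the other."""
    return (
        a.service == b.service
        or b.service in DEPENDS_ON.get(a.service, set())
        or a.service in DEPENDS_ON.get(b.service, set())
    )


def semantically_similar(a, b, threshold=0.3):
    """Layer 3: Jaccard word overlap as a crude stand-in for semantic similarity."""
    words_a = set(a.message.lower().split())
    words_b = set(b.message.lower().split())
    if not words_a or not words_b:
        return False
    return len(words_a & words_b) / len(words_a | words_b) >= threshold


def related(a, b):
    # Assumed rule: close in time AND (linked in topology OR similar in wording).
    return temporally_close(a, b) and (
        topologically_linked(a, b) or semantically_similar(a, b)
    )


def correlate(alerts):
    """Greedily group alerts; an alert joins the first group with a related member."""
    groups = []
    for alert in sorted(alerts, key=lambda x: x.fired_at):
        for group in groups:
            if any(related(alert, member) for member in group):
                group.append(alert)
                break
        else:
            groups.append([alert])
    return groups


t0 = datetime(2026, 3, 27, 12, 0)
alerts = [
    Alert("HighLatency", "checkout", t0, "p99 latency above threshold"),
    Alert("ConnErrors", "payments", t0 + timedelta(minutes=1), "connection errors to postgres"),
    Alert("DiskFull", "batch-reports", t0 + timedelta(minutes=2), "disk usage above 95 percent"),
    Alert("HighLatency", "checkout", t0 + timedelta(hours=2), "p99 latency above threshold"),
]
groups = correlate(alerts)
for group in groups:
    print([f"{a.service}:{a.name}" for a in group])
```

In this example the checkout and payments alerts land in one group because they fire a minute apart and checkout depends on payments. The unrelated disk alert and the much later latency alert each stay in their own group, so four alerts collapse into three notifications.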
What's included in each plan, and how the tiers compare.
Free
Custom
Contact sales for a quote
SRE and DevOps teams
IncidentFox investigates production incidents across infrastructure, while Respan monitors AI/LLM-specific issues. Together they provide comprehensive incident response covering both traditional infrastructure and AI application layers.
Last verified: March 27, 2026