Context Gateway
Free (open source)
- Go proxy
- Agent integration
- Basic compression
Compresr provides an API and open-source proxy for compressing LLM context at two levels: coarse-grained (selecting relevant chunks) and fine-grained (token-level compression within chunks). Part of YC W2026, it was founded by a team of four EPFL researchers: Ivan Zakazov (CEO, PhD dropout, published at EMNLP and NeurIPS), Oussama Gabouj (CTO, EMNLP 2025 paper on prompt compression), Berke Argin (CAIO, ex-UBS), and Kamel Charaf (COO, ex-Bell Labs).
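To make the two levels concrete, here is a toy sketch of coarse-grained selection followed by fine-grained pruning. It is not Compresr's algorithm: the keyword-overlap scoring, the stopword list, and the function names (`select_chunks`, `prune_tokens`, `compress`) are all assumptions chosen for illustration only.

```python
import re

# Assumption: a tiny hand-picked stopword list, purely for illustration.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "that", "for", "was"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def select_chunks(query, chunks, top_k=2):
    """Coarse-grained step: keep the top_k chunks sharing the most words with the query."""
    q = set(tokenize(query)) - STOPWORDS
    scored = sorted(enumerate(chunks), key=lambda ic: -len(q & set(tokenize(ic[1]))))
    keep = sorted(i for i, _ in scored[:top_k])  # restore original document order
    return [chunks[i] for i in keep]

def prune_tokens(chunk):
    """Fine-grained step: drop stopwords inside a chunk that was kept."""
    return " ".join(w for w in chunk.split() if w.lower().strip(".,") not in STOPWORDS)

def compress(query, chunks, top_k=2):
    """Apply chunk selection first, then token pruning within the survivors."""
    return [prune_tokens(c) for c in select_chunks(query, chunks, top_k)]
```

A production compressor would presumably use learned relevance and token-importance models rather than word overlap and a stopword list; the sketch only shows how the two stages compose.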
The system claims up to 200x compression on aggressive RAG workloads without quality loss, with a default 50% token reduction. Their Context Gateway is an open-source Go proxy that sits between AI agents and LLM providers, compressing tool outputs and conversation history before tokens reach the model. It integrates with Claude Code, OpenClaw, and Codex.
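The gateway idea (rewrite a request before it reaches the provider) can be sketched as a small function over a chat-style message list. This is a hypothetical illustration, not Context Gateway's code: the `role`/`content` message shape, the `compress_request` name, and the head-and-tail `truncate_middle` placeholder are assumptions, and a real compressor would replace the truncation step.

```python
def truncate_middle(text, max_chars):
    """Placeholder compressor (assumption): keep the head and tail, elide the middle."""
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    return text[:half] + " [...] " + text[-half:]

def compress_request(messages, max_chars=200, compressor=truncate_middle):
    """Return a new message list in which long tool outputs have been compressed.

    User and assistant messages pass through unchanged; the input list is not mutated.
    """
    out = []
    for msg in messages:
        if msg.get("role") == "tool" and isinstance(msg.get("content"), str):
            # Copy the message so the caller's original stays intact.
            msg = {**msg, "content": compressor(msg["content"], max_chars)}
        out.append(msg)
    return out
```

Passing the compressor in as a parameter keeps the proxy logic separate from the compression strategy, so a model-based compressor could be swapped in without touching the request handling.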
On their SEC filing benchmark (141 questions across 79 filings of up to 230K tokens each), Compresr compressed ~106K tokens to ~10.5K while improving accuracy from 72.3% to 74.5% using GPT-5.2, a reported 76% cost reduction with better results. The team's peer-reviewed publications on prompt compression at NeurIPS and EMNLP give them some of the strongest academic credentials in the compression space.
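The token figures above can be checked with a few lines of arithmetic. The inputs are the approximate values quoted in the benchmark description; the variable names are illustrative.

```python
# Approximate figures quoted for the SEC filing benchmark.
tokens_before = 106_000
tokens_after = 10_500

compression_ratio = tokens_before / tokens_after    # about 10.1x
token_reduction = 1 - tokens_after / tokens_before  # about 0.90, i.e. ~90% fewer tokens
accuracy_gain = 74.5 - 72.3                         # about 2.2 percentage points

print(f"{compression_ratio:.1f}x, {token_reduction:.1%} fewer tokens, "
      f"+{accuracy_gain:.1f} points accuracy")
```

The quoted 76% cost reduction is smaller than the roughly 90% token reduction. The listing does not break that figure down, so it is not derived here.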
What's included in each plan, and how the tiers compare.
Free (open source)
Contact for pricing
Best for: teams building RAG systems with long contexts
Compresr reduces LLM input costs through context compression while Respan monitors output quality and performance. Together they optimize both sides of the LLM call.
Top companies in RAG Frameworks you can use instead of Compresr.
- RAGFlow: deep document understanding (tables, images, multi-language)
- Unstructured: ingests 65+ file formats (PDFs, DOCX, PPTX, HTML, images, emails)
- LlamaIndex: data framework for LLM applications
- Haystack: modular RAG framework
- Reducto: vision parsing
- Pathway: Rust-powered streaming engine (millions of data points/sec)
- Carbon (Perplexity): data connectors
- Vectara
- R2R: RAG engine
- Docling: converts PDFs, DOCX, PPTX, HTML, images to structured JSON/markdown
- Chunkr
- Captain: scalable knowledge search
- WhyHow
Last verified: March 27, 2026