Compresr — RAG Frameworks Platform

Founded 2026 | San Francisco, CA | 2-10 people | Unknown

What is Compresr?

Compresr provides an API and open-source proxy for compressing LLM context at two levels: coarse-grained (selecting relevant chunks) and fine-grained (token-level compression within chunks). Part of YC W2026, it was founded by a team of four EPFL researchers: Ivan Zakazov (CEO, PhD dropout, published at EMNLP and NeurIPS), Oussama Gabouj (CTO, EMNLP 2025 paper on prompt compression), Berke Argin (CAIO, ex-UBS), and Kamel Charaf (COO, ex-Bell Labs).
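The two levels can be pictured with a toy sketch. This is not Compresr's algorithm or API: the function names, the word-overlap scoring, and the stopword list are all assumptions chosen for illustration. The coarse stage keeps only the chunks that share the most words with the query, and the fine stage drops low-information words inside the surviving chunks as a stand-in for learned token-level compression.

```python
import re

# Hypothetical stopword list; a real system would use a learned model instead.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "was", "for", "on", "that", "with"}


def words(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def select_chunks(query, chunks, top_k=2):
    """Coarse stage: keep the top_k chunks sharing the most words with the query."""
    query_words = set(words(query)) - STOPWORDS
    scored = [(len(query_words & set(words(c))), i) for i, c in enumerate(chunks)]
    # Highest overlap first; ties broken by original position.
    best = sorted(scored, key=lambda s: (-s[0], s[1]))[:top_k]
    keep = sorted(i for _, i in best)  # restore document order
    return [chunks[i] for i in keep]


def compress_chunk(chunk):
    """Fine stage: drop stopwords inside a chunk (toy token-level compression)."""
    return " ".join(
        w for w in chunk.split() if w.lower().strip(".,;:") not in STOPWORDS
    )


def compress_context(query, chunks, top_k=2):
    """Run the coarse stage, then the fine stage, and return the compressed chunks."""
    return [compress_chunk(c) for c in select_chunks(query, chunks, top_k)]


if __name__ == "__main__":
    example_chunks = [
        "The company reported revenue of 4.2 billion in the fiscal year.",
        "The weather in the region was mild for most of the spring.",
        "Revenue growth was driven by the cloud segment and by services.",
    ]
    for line in compress_context("What drove revenue growth?", example_chunks):
        print(line)
```

In this example the weather chunk is discarded at the coarse stage, and the two remaining chunks lose their filler words at the fine stage. A production system would replace both heuristics with trained models.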

The system claims up to 200x compression on aggressive RAG workloads without quality loss, with a default 50% token reduction. Their Context Gateway is an open-source Go proxy that sits between AI agents and LLM providers, compressing tool outputs and conversation history before tokens reach the model. It integrates with Claude Code, OpenClaw, and Codex.

On their SEC filing benchmark (141 questions across 79 filings of up to 230K tokens each), Compresr compressed ~106K tokens to ~10.5K, roughly a 10x reduction, while improving accuracy from 72.3% to 74.5% using GPT-5.2. The company reports this as a 76% cost reduction with better results. The team's peer-reviewed publications at NeurIPS and EMNLP on prompt compression give them the strongest academic credentials in the compression space.
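The headline figures mix two conventions: a compression ratio ("200x") and a percentage token reduction ("50%"). The short sketch below converts between them and applies the conversion to the benchmark's approximate token counts. The helper names are illustrative and the inputs are the rounded values quoted above. The source does not say how the 76% cost figure was derived, so it is not reproduced here.

```python
def compression_ratio(tokens_before, tokens_after):
    """How many times smaller the compressed context is (e.g. 10.0 means 10x)."""
    return tokens_before / tokens_after


def token_reduction(tokens_before, tokens_after):
    """Fraction of tokens removed (e.g. 0.9 means a 90% reduction)."""
    return 1 - tokens_after / tokens_before


def ratio_to_reduction(ratio):
    """Convert an 'Nx' compression ratio into a fractional token reduction."""
    return 1 - 1 / ratio


if __name__ == "__main__":
    # Rounded token counts quoted for the SEC filing benchmark.
    before, after = 106_000, 10_500
    print(f"benchmark ratio:     {compression_ratio(before, after):.1f}x")
    print(f"benchmark reduction: {token_reduction(before, after):.1%}")
    print(f"accuracy change:     {74.5 - 72.3:+.1f} points")

    # The two headline claims expressed on the same scale.
    print(f"200x ratio  -> {ratio_to_reduction(200):.1%} of tokens removed")
    print(f"2x ratio    -> {ratio_to_reduction(2):.1%} of tokens removed")
```

On these numbers the benchmark sits at about 10x, or roughly 90% of input tokens removed, which falls between the 50% default (2x) and the 200x upper bound (99.5% removed).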

Key features

Core capabilities this platform advertises.

Context compression
Accuracy preservation
Long context optimization
RAG enhancement

Strengths and tradeoffs

What this tool does well, and the limitations to keep in mind.

Pros

Strongest academic credentials in compression with NeurIPS and EMNLP publications
Four-person founding team from EPFL reduces single-founder risk
Open-source Context Gateway creates a community adoption funnel
Two-level compression (coarse + fine-grained) is more sophisticated than token-only approaches
SEC filing benchmark demonstrates real enterprise RAG improvement with measurable results

Cons

No disclosed pricing for the paid API tier
No named customers or revenue metrics shared publicly
Competes directly with The Token Company on an overlapping value proposition
The 200x compression claim applies only to aggressive workloads; the default is a 50% token reduction

Plans & pricing

What's included in each plan, and how the tiers compare.

Context Gateway

Free (open source)

Go proxy
Agent integration
Basic compression

API

Contact for pricing

Coarse + fine-grained compression
Up to 200x compression
Python SDK
Enterprise support


Common use cases

Teams building RAG systems with long contexts

Long document processing
RAG context optimization
Token-efficient retrieval
Context window management

Using Compresr with Respan

Compresr reduces LLM input costs through context compression while Respan monitors output quality and performance. Together they optimize both sides of the LLM call.

Verify compression maintains output quality using Respan evaluations
Track cost savings from Compresr alongside total LLM spend in Respan
Monitor end-to-end RAG pipeline performance from compression to response via Respan
