Docling is IBM Research's open-source document conversion toolkit, designed for AI-driven workflows that need clean, structured data from messy documents. It converts PDF, DOCX, PPTX, HTML, image, and other files into JSON or Markdown while preserving layout, tables, equations, code blocks, and lists.
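The conversion step described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the `docling` package is installed; `convert_to_markdown` and its optional `converter` argument are hypothetical names introduced here for illustration, while `DocumentConverter`, `convert`, and `export_to_markdown` come from Docling's Python API.

```python
from typing import Any, Optional


def convert_to_markdown(source: str, converter: Optional[Any] = None) -> str:
    """Convert one document (local path or URL) to a Markdown string.

    `converter` is an illustrative hook: pass any object exposing
    `convert(source)` to reuse a configured converter or to test the
    helper without Docling installed.
    """
    if converter is None:
        # Imported lazily so the module loads even where Docling is absent.
        from docling.document_converter import DocumentConverter

        converter = DocumentConverter()

    # Docling returns a conversion result whose `document` attribute
    # holds the structured representation of the input file.
    result = converter.convert(source)
    return result.document.export_to_markdown()
```

Calling `convert_to_markdown("report.pdf")`, where `report.pdf` is a placeholder path, would return the document as Markdown. For JSON-style output, the same `result.document` object can be exported with `export_to_dict()` instead. Passing the converter in as an argument is a design choice of this sketch, not a Docling requirement; it avoids reloading models when many files are converted in a loop.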
In 2025, IBM released Granite-Docling-258M, an ultra-compact open-source vision-language model purpose-built for document conversion and licensed under Apache 2.0. Unlike traditional OCR, which typically flattens a page to plain text, Granite-Docling retains the original layout structure and identifies complex elements such as tables, math, and code blocks. Its output uses DocTags, a universal markup format developed by IBM Research that captures every page element and its contextual relationships.
IBM has positioned Docling for production use: it launched the Docling OpenShift Operator with Red Hat (aimed at banks), donated the project to the Linux Foundation's Agentic AI Foundation alongside BeeAI and Data Prep Kit, and is integrating it across Red Hat and IBM Cloud document workflows. Docling is free, fully open source, and self-hostable.
Best suited to RAG and AI engineering teams that need accurate, structured ingestion of PDFs, DOCX files, and other complex documents into LLM pipelines.
Last verified: April 29, 2026