Docling is an open-source document processing toolkit developed by the AI for Knowledge team at IBM Research Zurich. Released in July 2024, it simplifies document processing by parsing diverse formats, including advanced PDF understanding, and by providing integrations with the generative AI ecosystem. The library can parse PDF, DOCX, PPTX, XLSX, HTML, WAV, MP3, WebVTT, and image formats, with capabilities that include page layout analysis, reading order detection, table structure recognition, code extraction, and formula parsing. In early 2026, IBM released Granite-Docling-258M, a production-grade vision-language model under the Apache 2.0 license. Since its first public release, the library has logged over 100 releases and crossed 37,000 GitHub stars, becoming a leading solution for preparing documents for generative AI applications. Docling is hosted as a project in the LF AI & Data Foundation.
Developers and researchers who need accurate document parsing with layout and table understanding
Top companies in RAG Frameworks you can use instead of Docling.
Companies from adjacent layers in the AI stack that work well with Docling.