MLflow — Observability, Prompts & Evals Platform

Observability, Prompts & Evals | Layer 4 | Open Source

Founded 2018 | San Francisco, CA (Linux Foundation governed) | 900+ contributors

What is MLflow?

MLflow is the leading open-source platform for managing the end-to-end machine learning lifecycle, now expanded into a comprehensive GenAI engineering platform. Created by Matei Zaharia (also the creator of Apache Spark) at Databricks in 2018 and donated to the Linux Foundation in 2020, MLflow has grown to over 20,000 GitHub stars and 60 million monthly downloads, making it one of the most widely adopted ML tools in the world.

With the release of MLflow 3.0 in June 2025, the platform underwent a major pivot to become a unified AI engineering platform for agents, LLMs, and ML models. The GenAI capabilities include OpenTelemetry-compatible tracing for LLM observability, 50+ built-in evaluation metrics with LLM-as-judge support, prompt versioning and optimization, and a built-in AI Gateway providing unified API access to all major LLM providers with rate limiting and cost control. The platform auto-traces 50+ AI frameworks including OpenAI, Anthropic, LangChain, LlamaIndex, and DSPy.
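To make the tracing idea concrete, here is a minimal, library-free sketch of what a trace records for each call: the inputs, the output, any error, and the latency. It is not MLflow's API. The names `Span`, `traced`, `RECORDED_SPANS`, and `fake_llm_call` are hypothetical and exist only for illustration, and the model call is a stand-in so the example runs offline.

```python
import functools
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Span:
    """One recorded step: what was called, with what, and how long it took."""
    name: str
    inputs: Dict[str, Any]
    output: Any = None
    error: Optional[str] = None
    duration_ms: float = 0.0


# Hypothetical in-memory sink; a real tracer would export spans to a backend.
RECORDED_SPANS: List[Span] = []


def traced(func: Callable) -> Callable:
    """Record inputs, output, errors, and latency for each call to `func`."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        span = Span(name=func.__name__, inputs={"args": args, "kwargs": kwargs})
        start = time.perf_counter()
        try:
            span.output = func(*args, **kwargs)
            return span.output
        except Exception as exc:
            span.error = repr(exc)
            raise
        finally:
            # Runs on success and failure, so every call leaves a span behind.
            span.duration_ms = (time.perf_counter() - start) * 1000
            RECORDED_SPANS.append(span)
    return wrapper


@traced
def fake_llm_call(prompt: str) -> str:
    # Stand-in for a model call (assumption: no network access needed).
    return f"echo: {prompt}"


answer = fake_llm_call("What is MLflow?")
```

Auto-tracing, as described above, removes the need to decorate functions by hand: the platform instruments supported frameworks so that comparable records are captured without changes to application code.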

MLflow is used by over 19,000 companies globally, including Fortune 500 organizations like Amazon, Microsoft, Google, and BNP Paribas. While it is 100% free and open source under the Apache 2.0 license, Databricks offers a fully managed MLflow experience integrated into its cloud data platform. MLflow's distinguishing strength is that it combines traditional MLOps capabilities (experiment tracking, model registry, deployment) with modern GenAI observability in a single platform.

Key Features

✓ OpenTelemetry-native tracing
✓ 50+ built-in eval metrics & LLM judges
✓ Prompt versioning & management
✓ Built-in AI gateway
✓ Full MLOps lifecycle (experiments, model registry, deployment)
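Prompt versioning, one of the features listed above, can be pictured as an append-only history in which every change to a template gets a new immutable version number. The sketch below is a hypothetical in-memory model of that idea, not MLflow's prompt registry API; `PromptRegistry`, `PromptVersion`, and the `summarize` prompt are names invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PromptVersion:
    """An immutable snapshot of a prompt template."""
    name: str
    version: int
    template: str
    commit_message: str


class PromptRegistry:
    """Hypothetical in-memory registry; versions are append-only."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[PromptVersion]] = {}

    def register(self, name: str, template: str, commit_message: str = "") -> PromptVersion:
        # Each registration creates the next version instead of overwriting.
        history = self._versions.setdefault(name, [])
        entry = PromptVersion(name, len(history) + 1, template, commit_message)
        history.append(entry)
        return entry

    def load(self, name: str, version: Optional[int] = None) -> PromptVersion:
        # With no version given, return the latest; otherwise return the pinned one.
        history = self._versions[name]
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise KeyError(f"{name} has no version {version}")
        return history[version - 1]


registry = PromptRegistry()
registry.register("summarize", "Summarize: {text}", "initial version")
registry.register("summarize", "Summarize in {n} sentences: {text}", "add length control")

pinned = registry.load("summarize", version=1)
latest = registry.load("summarize")
rendered = latest.template.format(n=2, text="MLflow tracks experiments.")
```

Pinning a version keeps production behavior stable while newer templates are evaluated, which is the main reason to version prompts rather than edit them in place.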

Pros & Cons

Pros

+ Truly open source with Linux Foundation governance: no vendor lock-in, Apache 2.0 license
+ Massive ecosystem with 900+ contributors and integrations with 100+ AI frameworks across Python, TypeScript, Java, and R
+ Comprehensive GenAI platform with OpenTelemetry tracing, 50+ eval metrics, prompt management, and built-in AI Gateway
+ Unmatched adoption at 60M+ monthly downloads and 19,000+ companies globally
+ Unique combination of traditional MLOps and modern GenAI observability in a single platform

Cons

- No built-in user management or RBAC in the open-source version, so teams need Databricks or custom solutions for access control
- Significant setup complexity for shared team deployments, which require proper storage backends, auth, and networking
- The best features, such as Unity Catalog integration and serverless deployment, require Databricks, creating soft vendor lock-in
- GenAI-specific UI and developer experience are less polished than in LLM-native tools like Langfuse or LangSmith

MLflow Pricing

Free trial available

Open Source: Free

✓ Full platform access
✓ Self-hosted
✓ Apache 2.0 license
✓ All GenAI features included

Databricks Standard: $0.40/DBU (usage-based)

✓ Managed MLflow
✓ Cloud-hosted tracking server
✓ Integrated with Databricks

Databricks Premium: $0.55/DBU (usage-based)

✓ Everything in Standard
✓ Serverless compute
✓ Unity Catalog integration

Databricks Enterprise: $0.65/DBU (usage-based)

✓ Everything in Premium
✓ Advanced security
✓ Compliance controls
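Because the Databricks tiers are billed per DBU, a rough estimate is simply the DBU count multiplied by the tier's rate. The sketch below works that arithmetic using the per-DBU rates listed above. The monthly figure of 1,000 DBUs is a hypothetical workload chosen for illustration, and the function name `estimate_monthly_cost` is invented here; actual bills depend on how many DBUs a workload really consumes.

```python
from decimal import Decimal

# Per-DBU rates as listed in the pricing tiers above.
RATES_PER_DBU = {
    "Standard": Decimal("0.40"),
    "Premium": Decimal("0.55"),
    "Enterprise": Decimal("0.65"),
}


def estimate_monthly_cost(tier: str, dbus_per_month: int) -> Decimal:
    """Return the usage-based cost in dollars: rate per DBU times DBUs consumed."""
    return RATES_PER_DBU[tier] * dbus_per_month


monthly_dbus = 1_000  # hypothetical consumption, not a measured figure
estimates = {
    tier: estimate_monthly_cost(tier, monthly_dbus) for tier in RATES_PER_DBU
}

for tier, cost in estimates.items():
    print(f"{tier}: ${cost:,.2f}")
```

`Decimal` is used instead of floats so the currency arithmetic stays exact.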


Common Use Cases

Best suited to ML engineers and AI teams, especially those in the Databricks ecosystem.

• LLM observability & tracing
• Automated evaluation
• Prompt optimization
• Model deployment
• Production monitoring

Using MLflow with Respan

MLflow and Respan complement each other in the AI observability stack. While MLflow provides experiment tracking, model registry, and GenAI tracing for development workflows, Respan adds production-grade LLM gateway management, cost optimization, and real-time monitoring. Teams can use MLflow for development-time evaluation and Respan for production observability.