What is RunAnywhere?

RunAnywhere provides infrastructure for deploying AI models directly on mobile and edge devices. Part of YC W2026, it was founded by Sanchit Monga (CEO, ex-Intuit, products used by 50M+ users) and Shubham Malhotra (CTO, who built MetalRT — the first complete multi-modal inference engine for Apple Silicon, ex-Amazon EC2 Spot M+ ARR).

The offering is a unified SDK (iOS Swift, Android Kotlin, React Native, Flutter) plus an enterprise control plane for managing model rollouts, policies, and analytics across thousands of devices. The SDK includes MetalRT, which achieves 668 tokens/second LLM decode on Apple Silicon, 101 ms speech-to-text latency, and 287 tokens/second vision inference, all running locally on device.
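To put the decode figure in perspective, the short calculation below converts a tokens-per-second rate into approximate generation time for a response of a given length. It is only a back-of-the-envelope sketch: the function name is made up for illustration, the 256-token response length is an assumed example, and the estimate ignores prompt processing (prefill) and any model load time, which the quoted benchmark does not cover.

```python
def decode_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Estimate decode time for a response, ignoring prefill and load time."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    # 668 tokens/second is the LLM decode rate quoted above;
    # 256 tokens is an assumed, illustrative response length.
    print(f"{decode_seconds(256, 668.0):.2f} s")
```

At the quoted rate, a 256-token reply would take roughly 0.38 seconds to decode under these assumptions.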

A key differentiator is hybrid routing: when a device cannot handle a model locally, requests automatically fall back to cloud inference. The control plane enables over-the-air (OTA) model updates without app store releases. The open-source SDKs have 10.1K GitHub stars, with demo apps live on both the iOS App Store and Google Play.
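The hybrid routing idea can be illustrated with a minimal sketch. This is not RunAnywhere's SDK or API: every name here (DeviceProfile, ModelSpec, can_run_locally, route) is hypothetical, and the capability check (free memory plus an accelerator flag) is an assumed stand-in for whatever policy the real control plane applies. It is written in Python for brevity rather than in one of the SDK's languages.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class DeviceProfile:
    """Hypothetical snapshot of what the device can offer right now."""
    free_memory_mb: int
    has_accelerator: bool


@dataclass
class ModelSpec:
    """Hypothetical description of a model's minimum requirements."""
    name: str
    required_memory_mb: int
    needs_accelerator: bool


def can_run_locally(device: DeviceProfile, model: ModelSpec) -> bool:
    """Return True when the device meets the model's stated requirements."""
    if device.free_memory_mb < model.required_memory_mb:
        return False
    if model.needs_accelerator and not device.has_accelerator:
        return False
    return True


def route(
    prompt: str,
    device: DeviceProfile,
    model: ModelSpec,
    run_local: Callable[[str], str],
    run_cloud: Callable[[str], str],
) -> Tuple[str, str]:
    """Prefer on-device inference; fall back to the cloud otherwise.

    Returns a (backend, output) pair, where backend is "local" or "cloud".
    """
    if can_run_locally(device, model):
        try:
            return "local", run_local(prompt)
        except RuntimeError:
            # Local inference failed at run time (assumed failure signal);
            # fall through to the cloud path below.
            pass
    return "cloud", run_cloud(prompt)


if __name__ == "__main__":
    model = ModelSpec("demo-llm", required_memory_mb=2048, needs_accelerator=True)
    small_phone = DeviceProfile(free_memory_mb=512, has_accelerator=True)
    backend, _ = route("hello", small_phone, model, lambda p: "local", lambda p: "cloud")
    print(backend)
```

Returning the chosen backend alongside the output makes it possible to log which path served each request, which is useful when fallback traffic needs to be monitored or billed separately.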

Key Features

✓On-device AI deployment
✓Edge orchestration
✓Scale management
✓Device-agnostic runtime

Pros & Cons

Pros

+Strong open-source traction with 10.1K GitHub stars and working mobile apps
+CTO built MetalRT from scratch with impressive on-device benchmarks
+Hybrid routing solves the key reliability problem of on-device AI
+Multi-platform SDK covers the entire mobile ecosystem
+Enterprise control plane with OTA updates creates clear monetization path

Cons

-No disclosed paying customers or revenue yet
-Dependent on device hardware capabilities and exposed to Android fragmentation
-Enterprise pricing is not transparent, which may slow adoption
-Competes with Apple Core ML and Google ML Kit, which have distribution advantages

RunAnywhere Pricing

Free trial available

Open Source SDK: Free

✓iOS, Android, React Native, Flutter
✓MetalRT runtime
✓On-device inference

Enterprise Control Plane: Contact for pricing

✓OTA model updates
✓Policy-based routing
✓Fleet analytics
✓Hybrid cloud fallback


Common Use Cases

Teams deploying AI models on edge devices

•Edge AI deployment
•On-device inference
•IoT AI applications
•Offline AI

Using RunAnywhere with Respan

RunAnywhere deploys models on edge devices while Respan monitors cloud-based LLM calls. For hybrid routing scenarios, Respan can track the cloud inference fallbacks.

✓Monitor cloud fallback LLM calls from RunAnywhere devices via Respan
✓Track hybrid routing costs across on-device and cloud inference
✓Evaluate output quality differences between on-device and cloud inference via Respan
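As a rough illustration of what tracking fallback usage and cost involves, the sketch below aggregates a list of routing records into a fallback rate and an estimated cloud spend. It does not use Respan's or RunAnywhere's APIs: the InferenceRecord shape, the summarize function, and the per-1K-token price are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class InferenceRecord:
    """Hypothetical log entry for one completed request."""
    backend: str        # "local" or "cloud"
    output_tokens: int


def summarize(
    records: Iterable[InferenceRecord],
    cloud_price_per_1k_tokens: float,
) -> Dict[str, float]:
    """Compute fallback rate and estimated cloud cost from routing records.

    On-device requests are treated as having no per-token cost (an assumption).
    """
    records = list(records)
    cloud = [r for r in records if r.backend == "cloud"]
    cloud_tokens = sum(r.output_tokens for r in cloud)
    return {
        "total_requests": len(records),
        "cloud_requests": len(cloud),
        "fallback_rate": len(cloud) / len(records) if records else 0.0,
        "cloud_cost": cloud_tokens / 1000 * cloud_price_per_1k_tokens,
    }


if __name__ == "__main__":
    log = [
        InferenceRecord("local", 100),
        InferenceRecord("cloud", 500),
        InferenceRecord("cloud", 1500),
        InferenceRecord("local", 50),
    ]
    # 0.002 per 1K tokens is a placeholder price, not a real rate.
    print(summarize(log, cloud_price_per_1k_tokens=0.002))
```

A rising fallback rate in a summary like this would suggest that more devices are failing the local capability check, which is the kind of signal worth watching alongside cost.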
