RunAnywhere provides infrastructure for deploying AI models directly on mobile and edge devices. Part of YC W2026, it was founded by Sanchit Monga (CEO, ex-Intuit, where his products reached 50M+ users) and Shubham Malhotra (CTO, ex-Amazon EC2 Spot, who built MetalRT, the first complete multi-modal inference engine for Apple Silicon).
The product is a unified SDK (iOS Swift, Android Kotlin, React Native, Flutter) plus an enterprise control plane for managing model rollouts, policies, and analytics across fleets of thousands of devices. The SDK embeds MetalRT, which reports 668 tokens/second LLM decode on Apple Silicon, 101 ms speech-to-text latency, and 287 tokens/second vision inference, all running locally on device.
A key differentiator is hybrid routing: when a device cannot handle a model locally, requests automatically fall back to cloud inference. The control plane enables over-the-air (OTA) model updates without app store releases. The open-source SDKs have 10.1K GitHub stars, with demo apps live on both the iOS App Store and Google Play.
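To make the hybrid-routing idea concrete, here is a minimal sketch of the fallback pattern. All names (HybridRouter, Engine, canRun, generate) are illustrative assumptions, not the actual RunAnywhere SDK API.

```typescript
// Illustrative sketch of hybrid routing; not the RunAnywhere SDK API.
type InferenceResult = { text: string; ranOn: "device" | "cloud" };

// Hypothetical common interface for a local or remote inference backend.
interface Engine {
  canRun(modelId: string): boolean;
  generate(modelId: string, prompt: string): Promise<string>;
}

class HybridRouter {
  constructor(private device: Engine, private cloud: Engine) {}

  async generate(modelId: string, prompt: string): Promise<InferenceResult> {
    // Prefer local inference when the device can host the model.
    if (this.device.canRun(modelId)) {
      try {
        const text = await this.device.generate(modelId, prompt);
        return { text, ranOn: "device" };
      } catch {
        // Local failure (e.g. out of memory): fall through to cloud.
      }
    }
    // Fallback path: route the request to cloud inference.
    const text = await this.cloud.generate(modelId, prompt);
    return { text, ranOn: "cloud" };
  }
}
```

The design choice this illustrates is that callers see one `generate` call; where inference actually ran is a result attribute, which is also what makes the cloud-fallback path observable to monitoring tools.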
Pricing: Free trial available
Best for: Teams deploying AI models on edge devices
RunAnywhere deploys models on edge devices, while Respan monitors cloud-based LLM calls; in hybrid routing scenarios, Respan can track the cloud inference fallbacks.
Last verified: March 27, 2026