Chapter 1 covered the five core layers every production AI app needs: a gateway, prompt management, tracing, evals, and the right agent shape. This chapter covers the next set of decisions: which extra tools to layer on top.
The honest truth about most AI stacks: teams over-engineer at v0. They add LangChain, a vector database, fine-tuned models, and an agent framework before they have a single working evaluator. Then they cannot debug, cannot evaluate, and cannot move fast.
Each section here gives you a decision rule for one piece of the stack. The default position is "you do not need this yet." You add it only when measurements (from your tracing and evals) show you do.
The seven sections
- Do you need an agent framework? LangChain, LlamaIndex, OpenAI Agents SDK, CrewAI, AutoGen. When a framework saves time, and when it adds debugging overhead.
- LLM memory: when your agent needs to remember. Stateless, session memory, and cross-session memory, and when each is worth the complexity.
- RAG and vector databases. When semantic retrieval pays off, when stuffing the prompt is faster, and which vector DB to pick.
- Should you fine-tune an LLM? When prompts and RAG hit a wall, when fine-tuning is premature, and what fine-tuning costs.
- Web search for LLM apps. When real-time information matters, when your knowledge base is enough, and which providers exist.
- Local vs. frontier models. When on-prem is required, when frontier APIs win, and the cost-and-latency math.
- The default minimum AI stack. What works for 95% of v0 builds, and why simpler is faster.
Read order
You can read in order or jump to the decision you face today. Section 7 (the default minimum stack) is the recommended starting point if you have not built anything yet.
After this chapter, Chapter 3 (AI for Customer Support) presents the first worked industry example, combining the foundation from Chapter 1 with the choices made in this chapter.
The shared theme across all seven sections: instrument first (Chapter 1.4), measure with evals (Chapter 1.5), and add complexity only when the data says you need it.
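To make "instrument first" concrete, here is a minimal sketch of the idea: wrap whatever model call you already have so every invocation is logged with its latency, before any framework enters the picture. The `traced_call` helper, the JSONL log path, and the stub model are all hypothetical names for illustration, not part of any library discussed in this book.

```python
import json
import time
import uuid


def traced_call(model_fn, prompt, log_path="traces.jsonl"):
    """Call model_fn(prompt) and append a trace record to a JSONL file.

    A minimal stand-in for a tracing layer: capture the prompt, the
    response, and the latency so later decisions are driven by data.
    """
    start = time.time()
    response = model_fn(prompt)
    record = {
        "id": str(uuid.uuid4()),
        "prompt": prompt,
        "response": response,
        "latency_s": round(time.time() - start, 3),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return response


# Stub model for illustration; swap in a real API client in practice.
def fake_model(prompt):
    return f"echo: {prompt}"


if __name__ == "__main__":
    print(traced_call(fake_model, "hello"))
```

The point of a sketch this small is that it has no dependencies to debug: once the JSONL traces show where quality or latency actually suffers, the sections above tell you which heavier tool, if any, to reach for.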
