Name: Building with Retrieval
Price: 1499 USD
Availability: InStock

The book

Most RAG systems that fail in production don't fail because of the model. They fail because of chunking decisions made in a notebook, retrieval pipelines that looked fine on twenty documents, or indexes that quietly go stale. Building with Retrieval works through the full stack: embedding models, vector stores, hybrid search, reranking, and the prompt patterns that keep answers grounded in what was actually retrieved. Marcus Hale draws on real deployment decisions: comparing dollar costs at production query volumes across Pinecone, pgvector, and Qdrant, weighing LlamaIndex against LangChain for orchestration, and treating provenance as a first-class concern through the Anthropic citations API. The book also tells you, with specifics, when RAG is the wrong solution entirely.

What you'll learn

Why chunking strategy has more impact on retrieval quality than any other single decision, and which patterns hold up under production load
The operational and cost tradeoffs between pgvector, Pinecone, Weaviate, and Qdrant — what each buys you and what each costs you at scale
When hybrid search (BM25 combined with dense vector retrieval) outperforms either approach alone, and how a reranking pass sharpens results further
How to write prompts for grounded answers, surface citations using the Anthropic citations API, and make provenance something users can actually verify
Update cadences, expiration policies, and the failure modes that follow when your index falls out of sync with your data
Evaluation methods that go beyond manual review: measuring recall, faithfulness, and answer relevance in a repeatable, automated way

Who it's for

Backend and ML engineers who have a RAG prototype working and are now facing the problems tutorials skip: retrieval precision at scale, keeping the index current, measuring whether the system is actually good, and deciding when long context or fine-tuning would serve better than retrieval.