Stage 8 — Prompts for RAG and Data Workflows
RAG is needed when pretrained knowledge alone is insufficient and the model must answer from fresh or private context.
Stage topics
- Context injection
- Chunking
- Grounding
- Reducing hallucinations via sources
Context injection
Context should be:
- relevant,
- compact,
- structured (with source metadata).
“Dump everything into the prompt” is a poor strategy.
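The rules above can be sketched as a small helper that packs chunks into a character budget and labels each passage with its source. All names here (`build_context`, the `source_id` and `text` fields) are illustrative, not a standard API:

```python
def build_context(chunks, max_chars=1500):
    """Keep only as many chunks as fit the budget; label each with its source."""
    parts = []
    used = 0
    for chunk in chunks:
        entry = f"[{chunk['source_id']}] {chunk['text']}"
        if used + len(entry) > max_chars:
            break  # stay compact: drop lower-ranked chunks over budget
        parts.append(entry)
        used += len(entry)
    return "\n\n".join(parts)

chunks = [
    {"source_id": "faq-12", "text": "Refunds are processed within 14 days."},
    {"source_id": "policy-3", "text": "Digital goods are non-refundable after download."},
]
print(build_context(chunks))
```

Because the chunks arrive already ranked, truncating at the budget keeps the most relevant evidence and discards the tail, which is usually better than squeezing everything in.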
Chunking
Data segmentation strongly affects retrieval quality.
Bad chunking:
- breaks semantic units,
- loses dependencies,
- reduces the probability that a key fact appears in the top-k results.
Good chunking balances:
- size,
- semantic coherence,
- retrieval speed.
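One way to balance size against semantic coherence is to split on paragraph boundaries first and then pack paragraphs up to a size budget. This is a minimal sketch; production pipelines usually count tokens rather than characters and add overlap between chunks:

```python
def chunk_text(text, max_chars=400):
    """Split on paragraph boundaries, then pack paragraphs into chunks
    up to max_chars, so semantic units stay intact."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)  # budget exceeded: close the chunk
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

A paragraph longer than `max_chars` still becomes its own oversized chunk here; a fuller implementation would split it further at sentence boundaries.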
Grounding
Grounding means the model should rely on the provided sources instead of inventing facts.
Useful rules:
- cite source,
- expose confidence level,
- explicitly state when evidence is missing.
Anti-hallucination pattern
- Retrieve
- Rank
- Generate with citations
- Validate against sources
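A toy version of this loop, with keyword overlap standing in for vector search and reranking, and the model call itself omitted. All function names and the `[source_id]` citation format are assumptions for illustration:

```python
import re

def retrieve(query, corpus):
    """Naive keyword retrieval: keep documents sharing a term with the query."""
    terms = set(query.lower().split())
    return [doc for doc in corpus if terms & set(doc["text"].lower().split())]

def rank(query, docs, top_k=3):
    """Order candidates by term overlap; real systems use a reranker model."""
    terms = set(query.lower().split())
    return sorted(
        docs,
        key=lambda d: len(terms & set(d["text"].lower().split())),
        reverse=True,
    )[:top_k]

def validate(answer, docs):
    """Every [source_id] cited in the answer must exist in the retrieved set."""
    cited = set(re.findall(r"\[([\w-]+)\]", answer))
    return bool(cited) and cited <= {d["source_id"] for d in docs}
```

The `validate` step only checks that citations point at real retrieved sources; checking that each claim is actually entailed by its source needs a separate verification pass.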
What the prompt must say in RAG
RAG only improves reliability when the prompt tells the model how to use retrieved material. The model must know that sources are evidence, not free background inspiration. It should answer from the supplied context, cite the source for important claims, and admit when the retrieved context is insufficient. Without these rules, retrieval can make hallucinations look more credible because the answer contains citations while the claim itself is weakly supported.
Chunking must also be explained to the model indirectly through metadata. A chunk should carry a title, source id, date or version when relevant, and enough surrounding context to make the passage understandable. The prompt should ask the model to prefer direct support over broad semantic similarity. That keeps generation closer to evidence.
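A chunk record carrying this metadata might look like the following sketch; the field names are assumptions, not a standard schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Chunk:
    text: str          # the passage itself
    title: str         # document or section title for orientation
    source_id: str     # stable id the model can cite
    version: Optional[str] = None  # date or version when freshness matters

def render(chunk):
    """Label the passage so the model sees evidence, not anonymous text."""
    header = f"[{chunk.source_id}] {chunk.title}"
    if chunk.version:
        header += f" (v{chunk.version})"
    return f"{header}\n{chunk.text}"
```

Rendering the header on its own line keeps the citation id visually attached to the passage, which makes the "cite the source" rule easier for the model to follow.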
| RAG component | Prompt responsibility | Common failure |
|---|---|---|
| Retrieved chunk | Treat as evidence with source id | Uses context as vague inspiration |
| Citation rule | Tie claims to exact source | Decorative citations |
| Missing evidence policy | Say insufficient data | Guessing with confidence |
| Validation step | Compare answer against sources | Unsupported conclusion |
Stage takeaway
RAG is not only retrieval infrastructure; it also requires prompt design that forces source-bound responses.
Beginner explanation
RAG means Retrieval-Augmented Generation. The idea is simple: before answering, the application searches relevant documents, places the retrieved fragments into the model's context, and the model answers from those fragments. This is useful when the answer must rely on current or internal data: company docs, a knowledge base, contracts, a changelog, or an FAQ.
Retrieval is the search for useful fragments. Chunking splits large documents into pieces so search can find the right semantic block. If a chunk is too small, context is lost. If it is too large, prompt context gets noisy. Ranking chooses the best fragments from retrieved candidates.
Grounding means claims in the answer should be supported by sources. If a source does not support the conclusion, the model should not answer confidently. Citations are not decoration; they make verification possible. A human or system should know which document supports each important fact.
A minimal grounding prompt can state these rules directly:

```
Answer only from the provided sources.
If the sources do not support the answer, say what data is missing.
Attach a source to each key claim.
Do not use knowledge outside the context for factual claims.
```
Mini scenarios from real projects
- Response sounds convincing but does not match sources: prompt does not require strict citation behavior.
- Full long document is injected as context and quality drops: chunking ignores semantic boundaries.
- Model cites a source but conclusion is not grounded in it: pipeline lacks grounding checks.
Fast decision rules
- For factual responses, require explicit claim-to-source grounding.
- Design chunking around semantic units, not only fixed character counts.
- If sources do not support the answer, return “insufficient data” instead of guessing.
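The third rule can be enforced mechanically. In this sketch, the claim-to-source map is assumed to come from an upstream grounding check; the function name and refusal wording are illustrative:

```python
def answer_or_refuse(draft, claim_support, min_ratio=1.0):
    """claim_support maps each key claim to the source_id backing it (or None)."""
    backed = sum(1 for s in claim_support.values() if s is not None)
    if claim_support and backed / len(claim_support) >= min_ratio:
        return draft
    # Refuse instead of guessing, and name what is missing.
    missing = [claim for claim, s in claim_support.items() if s is None]
    return "Insufficient data. Unsupported claims: " + "; ".join(missing)
```

With `min_ratio=1.0` every key claim must be backed; lowering it trades strictness for coverage, which only makes sense when unsupported claims are clearly marked in the answer.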
Self-check questions
- Why does a strong LLM response remain unreliable without grounding?
- Which failures are typical when chunking is poorly designed?
- When is refusal more correct than a “likely” answer?