
Prompt Engineering Introduction

Stage 1 — How LLMs Work and Why Prompt Engineering Exists

Prompt engineering is not a collection of magic phrases. It is a way to clearly communicate the task, context, constraints, and expected output format to the model. If a user only writes “explain Docker”, the model has to guess the reader level, goal, depth, format, and quality criteria. Sometimes it guesses well, sometimes it does not. A good prompt reduces guessing.

An LLM, or large language model, is a model that generates text from visible context. It does not “think” like a human and does not verify every fact on the internet by default. The model receives input text, splits it into tokens, and chooses the next token by probability. Then the next one, then the next one. The answer is produced from that sequence.
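The next-token loop can be sketched with a toy, hand-written probability table. This is only an illustration: a real LLM computes these probabilities with a neural network over a vocabulary of tens of thousands of tokens, but the generation loop has the same shape.

```python
import random

# Toy "model": maps the last token to candidate next tokens with
# probabilities. Hand-written for illustration only; a real LLM
# computes these probabilities with a neural network.
TOY_MODEL = {
    "<start>": {"The": 0.6, "A": 0.4},
    "The": {"cat": 0.5, "dog": 0.5},
    "A": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}

def generate(seed=0):
    """Pick the next token by probability, then repeat until <end>."""
    rng = random.Random(seed)
    token, output = "<start>", []
    while token != "<end>":
        candidates = TOY_MODEL[token]
        token = rng.choices(list(candidates), weights=list(candidates.values()))[0]
        if token != "<end>":
            output.append(token)
    return " ".join(output)
```

The same seed always produces the same sentence, which is exactly why sampling settings (covered below) matter: they control how this probabilistic choice behaves.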

[Figure: Prompt pipeline]

What tokens and context are

A token is a small piece of text. Sometimes it looks like a word, sometimes like part of a word, punctuation, or whitespace. The model does not directly operate on “human words”; it operates on token sequences. That is why both meaning and the amount of text inside the context window matter.
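Exact token counts depend on the specific tokenizer, but for rough context budgeting a character-based heuristic is common. A minimal sketch, assuming the frequently cited rule of thumb of roughly 4 characters per English token (real BPE tokenizers split differently, so never use this for exact limits):

```python
def rough_token_estimate(text: str) -> int:
    """Very rough heuristic: English text averages about 4 characters
    per token. Use only for ballpark budgeting; a real tokenizer
    (e.g. a BPE implementation) gives the exact count."""
    return max(1, len(text) // 4)
```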

The context window is the model's visible working area: system instructions, user request, conversation history, inserted documents, and previous answers. The model answers from what is inside this window plus learned patterns. If a required fact is missing from context, the model may try to complete a likely answer. This is where hallucinations appear.

A long prompt is not always better than a short one. If context contains random details, contradictions, and old decisions, the model may lose focus. Good context is not the largest possible context; it is selected context: only what is needed for the current task.
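Selecting context under a size budget can be sketched as follows. The relevance scores are assumed to come from elsewhere (for example, embedding similarity); the function name and the character-based budget are illustrative, not a standard API:

```python
def fit_context(snippets, budget_chars=2000):
    """Keep only the most relevant snippets that fit a size budget.

    `snippets` is a list of (relevance, text) pairs; how relevance is
    computed (e.g. embedding similarity) is assumed to happen elsewhere.
    """
    chosen, used = [], 0
    # Take snippets from most to least relevant, skipping any that
    # would push the assembled context past the budget.
    for relevance, text in sorted(snippets, key=lambda s: -s[0]):
        if used + len(text) <= budget_chars:
            chosen.append(text)
            used += len(text)
    return "\n\n".join(chosen)
```

The point is the policy, not the implementation: context is filtered by relevance first and size second, instead of being dumped in wholesale.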

Why the model can be confidently wrong

A hallucination is an answer that sounds plausible but is not grounded in a reliable fact from context or sources. The model does not necessarily “know that it does not know”. If the prompt demands a confident answer and gives no policy for missing information, the model often continues generating likely text.

To reduce risk, specify uncertainty behavior explicitly: “if data is missing, say what is missing”, “do not invent sources”, “separate fact from inference”. For high-accuracy tasks, provide documents, links, tables, or other verifiable input data.
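An uncertainty policy can be attached to prompts mechanically. A minimal sketch; the wording of the rules and the helper name are examples, not a standard:

```python
UNCERTAINTY_POLICY = (
    "If required data is missing, say exactly what is missing.\n"
    "Do not invent sources or links.\n"
    "Mark each claim as FACT (from the provided context) or INFERENCE."
)

def grounded_prompt(task: str, context: str) -> str:
    """Combine the task, verifiable input, and an explicit
    uncertainty policy into one prompt string."""
    return f"{task}\n\nContext:\n{context}\n\nRules:\n{UNCERTAINTY_POLICY}"
```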

What a good prompt contains

A practical prompt is like a small contract. It tells the model what to do, for whom, using which material, under which constraints, and in what shape to return the result. The clearer the contract, the less the model chooses randomly.

Prompt part | What it explains | Example
----------- | ---------------- | -------
Task | What must be done | “Explain the topic to a beginner”
Audience | Who the answer is for | “Knows Java, but not Spring”
Context | Which facts to use | “Use this documentation excerpt”
Constraints | What is forbidden or important | “Do not use lists as the main explanation”
Format | How to return the result | “Markdown, 3 sections, table, and example”

A weak prompt is: “tell me about JWT”. A better prompt is: “Explain JWT to a beginner backend developer. First explain token, header, payload, and signature. Then show where the token is sent in an HTTP request. Do not go deep into OAuth. Return the answer with a table and a short Authorization header example.” The second prompt does not make the model smarter, but it makes the task clearer.
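The five contract parts can also be assembled programmatically, which keeps prompts consistent across a codebase. A minimal sketch; `build_prompt` is a hypothetical helper, not part of any library:

```python
def build_prompt(task, audience="", context="", constraints="", format_spec=""):
    """Assemble the five 'contract' parts into one prompt string.
    Empty parts are skipped so the template stays clean."""
    parts = [
        ("Task", task),
        ("Audience", audience),
        ("Context", context),
        ("Constraints", constraints),
        ("Format", format_spec),
    ]
    return "\n".join(f"{name}: {value}" for name, value in parts if value)

# Example: the JWT prompt from the text, expressed as contract parts.
jwt_prompt = build_prompt(
    task="Explain JWT: token, header, payload, signature, then where "
         "the token is sent in an HTTP request.",
    audience="Beginner backend developer.",
    constraints="Do not go deep into OAuth.",
    format_spec="Table plus a short Authorization header example.",
)
```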

Temperature and top-p

Generation parameters control how much freedom the model has when answering. temperature rescales token probabilities before sampling: a low value makes answers more stable and conservative; a high value produces more variety and creativity but increases the risk of instability. top-p (nucleus sampling) restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches p.
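Both parameters can be demonstrated on raw scores directly. A self-contained sketch of softmax-with-temperature plus nucleus (top-p) filtering; real inference stacks run this on the GPU, but the logic is the same:

```python
import math
import random

def sample(logits, temperature=1.0, top_p=1.0, seed=None):
    """Sample one token index from raw scores (logits).

    temperature rescales scores before softmax: low -> near-greedy,
    high -> more random. top_p keeps only the smallest set of tokens
    whose cumulative probability reaches top_p (nucleus sampling).
    """
    rng = random.Random(seed)
    # Softmax with temperature (numerically stabilized by subtracting max).
    scaled = [score / max(temperature, 1e-8) for score in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [(i, e / total) for i, e in enumerate(exps)]
    # Nucleus filtering: keep the most likely tokens until their
    # cumulative probability reaches top_p.
    probs.sort(key=lambda ip: -ip[1])
    kept, cum = [], 0.0
    for i, p in probs:
        kept.append((i, p))
        cum += p
        if cum >= top_p:
            break
    ids = [i for i, _ in kept]
    weights = [p for _, p in kept]
    return rng.choices(ids, weights=weights)[0]
```

With a very low temperature the highest-scoring token wins essentially every time; with a very low top-p only the single most likely token survives the filter, so both settings push the output toward determinism.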

For production tasks where accuracy and repeatability matter, you usually want a strict prompt, low or moderate temperature, and a fixed output format. For brainstorming, you can give the model more freedom, but the final result still needs review.

System, user, and assistant roles

In chat models, messages have roles. system defines high-level behavior rules: style, safety, and product constraints. user contains the concrete task. assistant contains model responses. When instructions in the history conflict, system messages typically take priority over user messages.

For example, system may say: “answer briefly and do not invent facts.” User may ask: “make up a link to a study.” The model should follow the system constraint and not create a fake link.
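The role layout can be written out as a plain message list. The field names below follow the widely used OpenAI-style chat format; other APIs are similar but not identical, and the message contents here are examples:

```python
# A chat history as it is usually sent to a chat-completion API:
# a list of role/content pairs, oldest first.
messages = [
    {"role": "system",
     "content": "Answer briefly. Do not invent facts or links."},
    {"role": "user",
     "content": "Explain JWT to a beginner backend developer."},
    {"role": "assistant",
     "content": "A JWT is a signed token with three parts: ..."},
    {"role": "user",
     "content": "Now show the Authorization header."},
]
```

The system message appears once at the top and is not repeated per turn; the model sees it on every request because the whole list is resent each time.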

Practical example

Suppose you need an explanation of Redis for a junior developer. A weak request like “what is Redis” may produce an overly broad answer. A controlled prompt would be:

Explain Redis to a junior backend developer.
First explain key-value storage and in-memory storage.
Then show 3 common scenarios: cache, session storage, rate limiting.
Do not go into clustering.
Format: short introduction, table, one pseudocode example.
If a term needs explanation, define it before using it.

This prompt defines level, boundaries, structure, and a limit on unnecessary depth. The answer becomes more useful not because the model “understood the world”, but because the task became engineered.

Stage takeaway

An LLM generates the next token from context and probabilities. Prompt engineering exists to control that process: define goal, audience, context, constraints, and format. A good prompt does not guarantee absolute truth, but it reduces ambiguity, lowers hallucination risk, and makes the result usable.

Understanding checklist

  • I can explain that an LLM generates text token by token.
  • I understand that the context window limits visible data.
  • I understand why a model can sound confident and still be wrong.
  • I can build a prompt from task, audience, context, constraints, and format.
  • I understand when to reduce model freedom through settings and strict format.
