Prompting 1
Interactive Quiz - Only Have MCQs
Q1. (MCQ) A prompt reads: "Translate the following English text to French: 'Good morning'". Which element of a prompt is absent here?
A) Instruction B) Context C) Input Data D) Output Indicator
Answer: D
- A) Instruction — Incorrect. "Translate the following English text to French" is a clear instruction telling the model what task to perform.
- B) Context — Incorrect. While there's no heavy external background, the specification of source and target language ("English to French") provides the necessary steering context for the model.
- C) Input Data — Incorrect. "Good morning" is the explicit input the model needs to act upon.
- D) Output Indicator — Correct. The prompt never specifies the desired format or type of output. It doesn't indicate whether the response should be plain text, a JSON object, a bullet list, or include transliteration. There's no structural expectation set for the output.
Q2. (MSQ — Select ALL that apply) Which of the following are part of the anatomy of a contextual prompt?
A) The Context / Source Material B) The Output Indicator C) The Constraints D) The Query / Task E) The Role Definition
Answer: A, C, D
- A) The Context / Source Material — Correct. This is the documentation, chat history, database snippets, or situational background injected into the prompt.
- B) The Output Indicator — Incorrect. This belongs to the general "Elements of a Prompt" framework, not the specific three-part anatomy of a contextual prompt.
- C) The Constraints — Correct. Instructions on how to use or not use the provided context (e.g., "answer ONLY from the snippet") form the second component.
- D) The Query / Task — Correct. The specific question or action the user wants performed based on the context is the third component.
- E) The Role Definition — Incorrect. While a role can appear in a contextual prompt (as in the HR Assistant example), it is not one of the three core anatomical components. Role definition belongs to role/system prompting techniques.
Q3. (MCQ) A developer structures their prompt caching setup as follows:
Current Time: {{dynamic_timestamp}}
[10,000-token company policy document]
User Question: "What is the leave policy?"
On the second request with a different timestamp but identical policy and a new question, what happens?
A) The cache hits on the policy document, and only the question is reprocessed B) The entire prompt is reprocessed from scratch because the cache is invalidated C) The cache hits partially — the policy is cached but the timestamp segment is reprocessed separately D) The cache hits because the policy document is the largest static block and caching targets the heaviest segment
Answer: B
- A) — Incorrect. This would only be true if the static document appeared before the dynamic timestamp. Prompt caching uses exact prefix matching — the identical sequence must start at the absolute beginning.
- B) — Correct. Because the dynamic
{{timestamp}}is placed at the very top of the prompt, even a one-character change in it breaks the prefix match entirely. The cache is completely invalidated, and all 10,000+ tokens must be reprocessed from scratch. - C) — Incorrect. Prompt caching does not work segment-by-segment. It matches a single continuous prefix from the start. There is no partial or selective segment caching.
- D) — Incorrect. Caching does not target the "heaviest" or largest block. It strictly relies on prefix identity, regardless of token count in individual sections.
Q4. (MCQ) In prompt tuning, what exactly gets updated during the training process?
A) The attention weights across all transformer layers B) Only the final classification head of the model C) A set of continuous virtual token embeddings prepended to the input D) The token-to-vector mapping in the model's vocabulary lookup table
Answer: C
- A) — Incorrect. This describes full fine-tuning, where every layer's parameters are adjusted. In prompt tuning, the entire core LLM is frozen.
- B) — Incorrect. This describes a different PEFT approach (head tuning / linear probing), not prompt tuning. Prompt tuning operates at the input embedding layer, not the output head.
- C) — Correct. Prompt tuning prepends learnable, continuous "virtual" token vectors to the input sequence. During training, backpropagation updates only these soft prompt vectors while the rest of the model remains completely frozen.
- D) — Incorrect. The vocabulary lookup table is part of the frozen base model. Soft prompts are separate vectors that live outside the fixed vocabulary — they don't correspond to any real word and don't modify the lookup table.
Q5. (MSQ — Select ALL that apply) Which of the following are valid advantages of system prompting over embedding the same instructions inside a user prompt?
A) System prompts are immune to prompt injection attacks B) System prompts persist across an entire multi-turn conversation C) LLMs are trained to treat system-level instructions with higher priority D) System prompts make the application more robust against adversarial manipulation E) System prompts allow the model to access external APIs
Answer: B, C, D
- A) — Incorrect. The material states system prompts make the application "more robust" against prompt injection, not immune. No prompting technique provides absolute immunity.
- B) — Correct. System prompts remain active and influential across the entire session, unlike earlier user prompts which fade in relevance as the context window grows.
- C) — Correct. LLMs are trained to treat system-level instructions with higher priority than user-level instructions.
- D) — Correct. Placing constraints in the system prompt adds a layer of resilience against adversarial users attempting to override instructions.
- E) — Incorrect. System prompts define behavior and constraints — they do not grant the model new capabilities like API access. That requires tool-use or function-calling configurations.
Q6. (MCQ) A developer wants to build 50 distinct AI capabilities (sentiment analysis, code generation, legal drafting, etc.) for an enterprise platform. Which approach allows deploying all 50 using a single base model instance?
A) Full fine-tuning with 50 separate model checkpoints B) Hard prompt engineering with 50 different system prompts C) Prompt tuning with 50 swappable soft prompt files D) Retrieval-Augmented Generation with 50 separate knowledge bases
Answer: C
- A) — Incorrect. This would work functionally, but it requires hosting 50 separate multi-gigabyte model copies — the exact problem the question is trying to avoid.
- B) — Incorrect. While this uses one model, hard prompt engineering requires significant manual trial-and-error for each task and may not match the performance of trained approaches at scale. More importantly, the question targets the specific advantage described for prompt tuning: swapping lightweight files on a single frozen model.
- C) — Correct. Prompt tuning's core deployment advantage is hosting one frozen base model and swapping out tiny soft prompt vector files (often just kilobytes) per task, enabling massive multi-task deployment without duplicating the model.
- D) — Incorrect. RAG augments a model with external knowledge retrieval, but it doesn't specialize the model's core behavior for 50 fundamentally different task types like sentiment analysis vs. code generation. It addresses knowledge, not task adaptation.
Q7. (MCQ) When using contextual prompting, you instruct the model: "Answer ONLY from the provided text. If the answer is not found, say 'Not found.'" This instruction is an example of which best practice?
A) Using clear delimiters B) Enforcing a grounding guardrail C) Managing the context window D) Hyper-personalization
Answer: B
- A) — Incorrect. Clear delimiters refer to using tags like
[CONTEXT],""", or<source>to structurally separate background data from instructions. The instruction described here sets a behavioral rule, not a structural boundary. - B) — Correct. A grounding guardrail explicitly tells the model what to do when the context doesn't contain the answer. This prevents the model from hallucinating or falling back on its pre-training knowledge.
- C) — Incorrect. Managing the context window involves keeping injected information dense and relevant to avoid the "lost in the middle" effect. This instruction doesn't address context length or relevance.
- D) — Incorrect. Hyper-personalization involves feeding user profiles or historical preferences into the context. This instruction is a constraint mechanism, not a personalization one.
Q8. (MSQ — Select ALL that apply) Which of the following are true about soft prompts in prompt tuning?
A) They can be reverse-engineered into readable human language B) They exist as continuous floating-point vectors C) They map to specific tokens in the model's vocabulary D) They require the base model's weights to remain frozen during training E) They are prepended to the input sequence
Answer: B, D, E
- A) — Incorrect. This is the "Interpretability Paradox." Soft prompts exist as numbers in N-dimensional embedding space and cannot be translated back into clear human language — they appear as gibberish if decoded.
- B) — Correct. Soft prompts are raw, continuous vectors of floating-point numbers, unlike discrete hard prompt tokens.
- C) — Incorrect. This describes hard prompts. Soft prompts explicitly do not map to real words in any human language — that's their defining characteristic.
- D) — Correct. During prompt tuning, 100% of the core LLM's parameters are frozen. Only the soft prompt vectors are updated via backpropagation.
- E) — Correct. The virtual soft prompt tokens are prepended to the beginning of the input sequence before it's fed through the model.
Q9. (MCQ) Which prompting technique is described as the "manual precursor to automation" that, when scaled with code to dynamically search a database and inject relevant data, becomes a RAG pipeline?
A) System Prompting B) Role Prompting C) Contextual Prompting D) Prompt Tuning
Answer: C
- A) — Incorrect. System prompting defines behavioral rules and personas for the model. While a RAG pipeline might use system prompts, the act of injecting retrieved documents into a prompt is contextual prompting.
- B) — Incorrect. Role prompting assigns a persona to steer tone and expertise. It has no direct relationship to database retrieval or RAG.
- C) — Correct. Contextual prompting is explicitly described as the manual precursor to RAG. When you automate the process of searching a database and injecting the results as context into the prompt, you've built a RAG pipeline.
- D) — Incorrect. Prompt tuning is a parameter-efficient fine-tuning technique involving learned vector embeddings. It operates at the model training level, not at the retrieval/injection level.
Q10. (MCQ) A prompt reads:
Explain the concept of prompt engineering. Keep the explanation short,
only a few sentences, and don't be too descriptive.
What is the primary issue with this prompt?
A) It lacks an output indicator B) It contains contradictory instructions C) It is imprecise in its constraints D) It doesn't assign a role to the model
Answer: C
- A) — Incorrect. "A few sentences" and "short" are attempts at an output indicator, however vague. The absence of a formal indicator isn't the primary issue being highlighted.
- B) — Incorrect. While "explain" and "don't be too descriptive" create tension, they aren't strictly contradictory — the issue is that the boundaries are vague, not logically opposed.
- C) — Correct. "A few sentences," "short," and "don't be too descriptive" are all imprecise. How many is "a few"? How short is "short"? What counts as "too descriptive"? A better version specifies exact sentence count and target audience (e.g., "Use 2–3 sentences to explain to a high school student").
- D) — Incorrect. Not every prompt requires a role. The absence of a role is not the problem being illustrated here.
Q11. (MSQ — Select ALL that apply) Which of the following are ideal use cases for prompt caching?
A) A one-time summarization of a short email B) A coding assistant querying the same large codebase repeatedly C) A chatbot querying the same product documentation across millions of sessions D) A creative writing tool generating unique stories with no shared context E) A multi-step AI agent reusing the same tool definitions across a long execution run
Answer: B, C, E
- A) — Incorrect. A one-time short request has no repeated prefix to cache. The overhead of caching would provide zero benefit.
- B) — Correct. A coding assistant that maintains awareness of a large, unchanging code repository while answering rapid successive questions is a textbook cache-friendly scenario.
- C) — Correct. RAG applications querying the same knowledge base across millions of sessions benefit enormously — the static document context is cached and reused.
- D) — Incorrect. If every prompt has unique context with no shared prefix, no cache hit can ever occur.
- E) — Correct. Multi-step agents that pass the same extensive tool definitions, constraints, and memories back and forth during execution are ideal caching candidates.
Q12. (MCQ) In Anthropic's explicit prompt caching implementation, what does the cache_control: {"type": "ephemeral"} attribute signal to the model?
A) That the marked content should be permanently stored in the model's long-term memory B) That the marked content is the static portion whose computed token state should be temporarily cached C) That the marked content should be excluded from the model's attention computation D) That the marked content is dynamic and should never be cached
Answer: B
- A) — Incorrect. "Ephemeral" means temporary by definition. Prompt caching stores pre-computed Key-Value states for a limited time, not permanently in any "long-term memory."
- B) — Correct. The
cache_controlattribute with"type": "ephemeral"explicitly marks where the static text ends and tells the system to temporarily cache the computed token representations for that segment, so subsequent requests can reuse them. - C) — Incorrect. The cached content is still fully processed and attended to by the model. Caching optimizes re-computation, not attention exclusion.
- D) — Incorrect. This is the exact opposite — the attribute marks content for caching, not against it. The word "ephemeral" refers to the cache's temporary lifespan, not the content's dynamism.
Q13. (MCQ) You ask a model to evaluate a business proposal first as a Venture Capitalist, then as Legal Counsel, then as a Target Consumer. This technique leverages which specific advantage of role prompting?
A) Contextualizing domain expertise B) Enforcing tone and stylistic consistency C) Facilitating perspective shifting D) Adversarial evaluation
Answer: C
- A) — Incorrect. While each role does activate domain-specific knowledge, the core technique being demonstrated here is analyzing a single problem from multiple angles by swapping roles — that's perspective shifting, not just expertise activation.
- B) — Incorrect. Tone consistency is about maintaining a uniform communication style within a single role. This example deliberately changes the style across three different roles.
- C) — Correct. Perspective shifting is the explicit practice of analyzing the same problem from multiple vantage points by swapping the assigned role. Three different roles applied to one proposal is the textbook example.
- D) — Incorrect. Adversarial evaluation (red teaming) involves assigning a hostile or skeptical role to find flaws. A Venture Capitalist or Target Consumer isn't inherently adversarial — they each bring a different evaluative lens.
Q14. (MCQ) A model dumped with 200,000 tokens of loosely related documents starts ignoring critical instructions placed in the middle of the prompt. This degradation is best described as:
A) Catastrophic forgetting B) The lost in the middle effect C) Prompt injection D) The interpretability paradox
Answer: B
- A) — Incorrect. Catastrophic forgetting is a training-time phenomenon where a model loses previously learned generic capabilities after being fine-tuned on new data. It has nothing to do with runtime prompt processing.
- B) — Correct. The "lost in the middle" effect describes how a model's attention degrades on content placed in the middle of an excessively long context window. Irrelevant information clutters the prompt and causes the model to miss vital instructions that aren't at the beginning or end.
- C) — Incorrect. Prompt injection is an adversarial attack where a user deliberately embeds instructions to override the model's system prompt. The scenario describes an architectural problem, not an attack.
- D) — Incorrect. The interpretability paradox is specific to prompt tuning — it refers to the inability to translate soft prompt vectors back into human-readable language.
Q15. (MSQ — Select ALL that apply) Which of the following require a machine learning infrastructure (training pipelines, GPUs, labeled datasets) that cannot be performed in a browser-based chat interface?
A) Hard prompt engineering B) Prompt tuning C) Full fine-tuning D) Role prompting E) Contextual prompting
Answer: B, C
- A) — Incorrect. Hard prompt engineering is manual text-based iteration that works entirely within a chat interface or API playground. Zero training infrastructure required.
- B) — Correct. Prompt tuning requires labeled training datasets, ML pipelines (PyTorch, Hugging Face PEFT), and active GPU training runs. It cannot be done in a browser chat interface.
- C) — Correct. Full fine-tuning updates billions of parameters and demands substantial GPU compute, large training datasets, and dedicated infrastructure.
- D) — Incorrect. Role prompting is a text-based technique where you simply prepend a persona instruction. No training needed.
- E) — Incorrect. Contextual prompting involves manually injecting background text into your prompt — entirely achievable in any chat interface.
Q16. (MCQ) OpenAI and DeepSeek implement prompt caching as automatic/implicit, while Anthropic and Google implement it as explicit. What is the key practical difference for a developer?
A) Automatic caching is faster; explicit caching is more cost-effective B) Automatic caching requires no code changes; explicit caching requires the developer to flag static breakpoints in the prompt C) Automatic caching works only for system prompts; explicit caching works for all prompt components D) Automatic caching stores tokens permanently; explicit caching stores them ephemerally
Answer: B
- A) — Incorrect. Speed and cost depend on implementation details, not the automatic vs. explicit distinction. Both approaches achieve similar latency and cost benefits on cache hits.
- B) — Correct. With automatic caching, the provider silently applies caching when a prompt exceeds a token threshold and matches a recent prefix — no code changes needed. With explicit caching, the developer must manually flag breakpoints using attributes like
cache_controlto tell the model where static content ends. - C) — Incorrect. Neither approach is limited to system prompts. Both can cache any prefix content — system prompts, documents, conversation history, etc.
- D) — Incorrect. Both approaches use temporary caching. The permanence of cache storage is an implementation detail, not the defining distinction between automatic and explicit modes.
Q17. (MCQ) Which statement correctly captures the relationship between prompt tuning performance and model scale?
A) Prompt tuning consistently outperforms full fine-tuning regardless of model size B) Prompt tuning only works on models smaller than 1 billion parameters C) As the base model exceeds roughly 10 billion parameters, prompt tuning matches full fine-tuning performance D) Prompt tuning performance degrades as model size increases due to the frozen parameter constraint
Answer: C
- A) — Incorrect. Prompt tuning does not outperform full fine-tuning — at large scale, it matches it. On smaller models, full fine-tuning typically still has the edge.
- B) — Incorrect. This is the opposite of reality. Prompt tuning becomes more effective as models get larger, not less.
- C) — Correct. Research demonstrates that as the underlying model surpasses approximately 10+ billion parameters, prompt tuning performs comparably to traditional full fine-tuning.
- D) — Incorrect. The frozen parameter constraint does not cause degradation. Larger models have richer internal representations, which means the soft prompt vectors have more expressive power to steer — performance improves with scale, not the reverse.
Q18. (MCQ) A role prompt instructs a model to act as a "hardboiled 1940s detective." The model then produces overly dramatic, cliché-ridden prose that undermines the task's usefulness. What is the recommended mitigation?
A) Remove the role entirely and use a generic prompt B) Switch from role prompting to prompt tuning C) Combine the role prompt with strict formatting constraints or negative constraints D) Move the role definition from the user prompt into the system prompt
Answer: C
- A) — Incorrect. Removing the role discards the benefits of persona-driven output. The goal is to temper the style, not abandon the technique.
- B) — Incorrect. Prompt tuning is a completely different paradigm requiring ML infrastructure. It's not a practical mitigation for a stylistic issue in a text prompt.
- C) — Correct. The material explicitly recommends combining role prompts with strict formatting constraints or negative constraints (e.g., "Do not use melodramatic language") to keep output grounded while preserving the role's domain benefits.
- D) — Incorrect. Moving the role to the system prompt improves persistence and authority, but it doesn't inherently solve the stylistic excess problem. The detective persona will still produce dramatic prose from either location without additional constraints.