Embeddings

Interactive Quiz (MCQ and MSQ)

Q1. (MCQ) A vector database returns semantically identical results for "Q4 revenues in the 2025 report" and "Q4 revenues in the 2023 report" because the meaning of "Q4 revenues" is the same across both years. What RAG component solves this problem?

A) Re-ranking with a Cross-Encoder
B) Metadata filtering on fields like year or last_updated
C) Switching from cosine similarity to Euclidean distance
D) Increasing the embedding model's vector dimensionality

Answer: B

A) — Incorrect. A re-ranker evaluates semantic relevance more deeply, but if both documents are equally semantically relevant to "Q4 revenues," a re-ranker has no way to distinguish the correct year either — it doesn't understand chronology any better than the base embeddings without structured metadata.
B) — Correct. Vectors don't understand chronology well. Metadata enables hard filtering (e.g., year == 2025) that weeds out irrelevant documents before or during the vector search, ensuring the correct temporal match. This is a core reason metadata exists in RAG pipelines.
C) — Incorrect. Switching distance metrics changes how similarity is computed, not what the model understands. No distance metric can inject temporal awareness into an embedding that doesn't encode dates as distinct semantic features.
D) — Incorrect. Higher dimensionality captures richer semantic nuance but doesn't inherently encode structured knowledge like publication year. "Q4 revenues" in 2023 and 2025 would still produce nearly identical embeddings regardless of dimension count.
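
The hard filter described in option B can be sketched in plain Python. The toy vectors, the `year` field, and the `search` helper are illustrative assumptions, not any particular vector database's API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy index: both chunks discuss Q4 revenues, so their vectors are nearly identical.
chunks = [
    {"text": "Q4 revenues, 2023 report", "vector": [0.90, 0.10, 0.40], "year": 2023},
    {"text": "Q4 revenues, 2025 report", "vector": [0.89, 0.11, 0.41], "year": 2025},
]

def search(query_vector, year, top_k=1):
    """Apply a hard metadata filter first, then rank the survivors by similarity."""
    candidates = [c for c in chunks if c["year"] == year]
    ranked = sorted(candidates, key=lambda c: cosine(query_vector, c["vector"]), reverse=True)
    return ranked[:top_k]

query = [0.90, 0.10, 0.40]  # hypothetical embedding of "Q4 revenues"
result = search(query, year=2025)
print(result[0]["text"])
```

Without the `year` filter, the 2023 chunk would rank first in this toy data, because its vector happens to sit closest to the query.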

Q2. (MSQ — Select ALL that apply) Which of the following are primary reasons LLMs need Retrieval-Augmented Generation?

A) LLMs have a static knowledge cutoff and cannot access information beyond their training data
B) RAG eliminates the need for embedding models entirely
C) LLMs tend to hallucinate when they lack knowledge, and RAG grounds responses in retrieved facts
D) RAG enables the system to provide verifiable sources and citations

Answer: A, C, D

A) — Correct. Training an LLM takes months and millions of dollars. By the time a model is released, its knowledge is already outdated. RAG allows the model to access live databases or the internet without full retraining.
B) — Incorrect. RAG depends on embedding models. The retrieval step uses embeddings to convert queries and documents into vectors for semantic search. RAG doesn't eliminate embeddings — it's built on top of them.
C) — Correct. When a standard LLM doesn't know an answer, it tends to confidently hallucinate. RAG forces the model to ground its response in actual retrieved facts, significantly increasing accuracy.
D) — Correct. Because RAG actively pulls specific documents or web pages, it can "show its work" by providing direct links and citations so users can verify the information.

Q3. (MCQ) Cosine similarity between two embedding vectors returns a score of 0.95. If the same two vectors are normalized to unit length, what happens when you compute their dot product instead?

A) The dot product will be significantly lower than 0.95 because it accounts for magnitude
B) The dot product will be mathematically equivalent to 0.95 because normalized vectors make dot product and cosine similarity identical
C) The dot product cannot be computed on normalized vectors
D) The dot product will always return exactly 1.0 for any pair of normalized vectors

Answer: B

A) — Incorrect. This would be true for unnormalized vectors, where dot product considers magnitude and could differ from cosine similarity. But the question specifies the vectors are already normalized.
B) — Correct. If embeddings are normalized to unit length, the dot product is mathematically equivalent to cosine similarity but is computationally cheaper and faster to process. Since many modern embedding APIs output normalized vectors, this equivalence is commonly exploited in production.
C) — Incorrect. Dot product can absolutely be computed on normalized vectors — the operation (multiply corresponding elements and sum) works on any vectors regardless of their magnitude.
D) — Incorrect. A dot product of 1.0 on unit vectors would mean the vectors are identical. Two different vectors normalized to unit length will produce a dot product equal to the cosine of the angle between them — which is 0.95 in this case, not 1.0.
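
The equivalence in option B can be checked numerically. This is a minimal sketch using two made-up 3-dimensional vectors; the helper names are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [3.0, 4.0, 0.0]
b = [4.0, 3.0, 1.0]

cos_raw = cosine(a, b)                      # cosine similarity on the raw vectors
dot_raw = dot(a, b)                         # raw dot product, inflated by magnitude
dot_unit = dot(normalize(a), normalize(b))  # dot product after normalization

print(cos_raw, dot_raw, dot_unit)
```

The raw dot product differs from the cosine score, while the dot product of the normalized vectors matches it up to floating-point rounding.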

Q4. (MCQ) A production system stores 1 million embeddings of 1,536 dimensions each in Float32 format, consuming approximately 6 GB of RAM. The team applies binary quantization. What is the approximate resulting memory footprint?

A) 1.5 GB (4x reduction)
B) 750 MB (8x reduction)
C) ~187 MB (32x reduction)
D) ~94 MB (64x reduction)

Answer: C

A) — Incorrect. A 4x reduction would correspond to scalar quantization (Float32 → Int8), not binary quantization.
B) — Incorrect. An 8x reduction doesn't correspond to any standard quantization technique described in the material.
C) — Correct. Binary quantization converts each dimension to a single bit (0 or 1). A 1,536-dimensional Float32 vector requires 6,144 bytes (~6 KB); as binary, it requires only 192 bytes, a 32x memory reduction. Applied to 1 million vectors: 6 GB ÷ 32 ≈ 187.5 MB (or 192 MB if you start from the unrounded 6.144 GB). Binary quantization can also boost search speeds by up to 40x.
D) — Incorrect. A 64x reduction is in the range typically associated with Product Quantization (PQ), which splits vectors into sub-vectors and represents each one with a centroid ID. It is a different technique from binary quantization.
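
The arithmetic behind the 32x figure can be worked through directly. The sketch below uses decimal units (1 GB = 10^9 bytes) and only the numbers stated in the question.

```python
NUM_VECTORS = 1_000_000
DIMENSIONS = 1_536

float32_bytes_per_vector = DIMENSIONS * 4   # 32 bits = 4 bytes per dimension
binary_bytes_per_vector = DIMENSIONS // 8   # 1 bit per dimension, 8 bits per byte

float32_total = NUM_VECTORS * float32_bytes_per_vector
binary_total = NUM_VECTORS * binary_bytes_per_vector

print(float32_bytes_per_vector)        # 6144 bytes, about 6 KB
print(binary_bytes_per_vector)         # 192 bytes
print(float32_total / 1e9)             # 6.144 (decimal GB)
print(binary_total / 1e6)              # 192.0 (decimal MB)
print(float32_total // binary_total)   # 32
```

The quiz's ~187 MB comes from dividing the rounded 6 GB by 32; starting from the exact byte count gives 192 MB. Either way, the reduction factor is 32.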

Q5. (MCQ) In the embedding workflow, after applying binary quantization for fast initial search, the system retrieves 8 candidates when the user requested 4. It then recalculates exact scores using the original uncompressed vectors. This two-step refinement process is called:

A) Approximate Nearest Neighbor search followed by re-ranking
B) Oversampling followed by rescoring and reranking
C) Pre-filtering followed by post-filtering
D) Batch processing followed by scalar quantization

Answer: B

A) — Incorrect. While ANN is used during the initial search, "re-ranking" in the RAG context typically refers to a Cross-Encoder model evaluating query-document relevance — not the vector-level rescoring described here. The process described is a quantization refinement step, not a semantic re-ranking.
B) — Correct. Oversampling retrieves a larger pool of candidates than requested (e.g., 8 results when the user asked for 4). Rescoring and reranking then looks up the original, uncompressed vectors for that small candidate pool and recalculates exact similarity scores to produce the final, highly accurate ranking. This compensates for the precision lost during quantization.
C) — Incorrect. Pre-filtering and post-filtering refer to metadata-based narrowing of search scope, not quantization refinement. They deal with which documents to consider, not how precisely to score them.
D) — Incorrect. Batch processing is about sending multiple texts to an embedding model simultaneously, and scalar quantization is a compression technique. Neither describes the refinement workflow.
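
The oversample-then-rescore workflow can be sketched as follows. This is a brute-force illustration under stated assumptions: sign-based binarization, Hamming distance for the coarse pass, and a dot product for the exact pass. The `search` function and its `oversampling` parameter are hypothetical names, and a real system would precompute the bit vectors rather than binarize at query time.

```python
def binarize(v):
    """Binary quantization: keep only the sign of each dimension (1 if positive, else 0)."""
    return [1 if x > 0 else 0 for x in v]

def hamming(a, b):
    """Number of positions where two bit vectors differ (lower means more similar)."""
    return sum(x != y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search(query, vectors, limit, oversampling=2.0):
    # Step 1 (oversampling): fetch limit * oversampling candidates using cheap bit vectors.
    pool_size = int(limit * oversampling)
    q_bits = binarize(query)
    by_hamming = sorted(range(len(vectors)), key=lambda i: hamming(q_bits, binarize(vectors[i])))
    candidates = by_hamming[:pool_size]
    # Step 2 (rescoring): recompute exact scores on the original float vectors.
    rescored = sorted(candidates, key=lambda i: dot(query, vectors[i]), reverse=True)
    return rescored[:limit]

vectors = [
    [0.1, 0.1, -0.9, 0.1],    # same sign pattern as the query, but a weak true match
    [0.9, 0.7, 0.1, 0.3],     # strong true match that differs from the query in one sign
    [0.8, 0.9, -0.2, 0.1],
    [-0.5, 0.2, -0.3, 0.4],
    [-0.9, -0.8, 0.1, -0.2],
    [0.2, -0.6, 0.5, -0.1],
]
query = [0.9, 0.8, -0.1, 0.2]

print(search(query, vectors, limit=2, oversampling=1.0))  # [2, 0]
print(search(query, vectors, limit=2, oversampling=2.0))  # [2, 1]
```

With no oversampling, vector 0 slips into the final answer because its bit pattern matches the query. Doubling the candidate pool lets the exact rescoring promote vector 1 instead.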

Q6. (MSQ — Select ALL that apply) Which of the following correctly describe the role of a Re-ranker in a RAG pipeline?

A) It replaces the initial vector search entirely with a more accurate Cross-Encoder
B) It uses a Cross-Encoder that analyzes the query and document together, catching deep contextual nuances
C) It fixes the "lost in the middle" problem by placing the most relevant results at the top
D) It reduces noise and token costs by allowing confident trimming from many candidates to a few

Answer: B, C, D

A) — Incorrect. A re-ranker does not replace the initial vector search. It acts as a second stage that operates on the small subset returned by Stage 1. Running a Cross-Encoder over every document in the database for each query would add prohibitive latency, which is why the two-stage architecture is essential.
B) — Correct. Vector databases use Bi-Encoders where queries and documents are embedded separately, missing fine-grained details. Re-rankers use Cross-Encoders that analyze the query and document together, catching deep contextual nuances that separate embeddings miss.
C) — Correct. LLMs pay heavy attention to the beginning and end of their context, often ignoring information in the middle. Re-ranking ensures the most relevant results are placed at the very top, right where the LLM is paying attention.
D) — Correct. Instead of feeding 20 messy documents to an LLM (wasting tokens and confusing the model), a re-ranker allows confident trimming down to the top 3–5 highly precise chunks.

Q7. (MCQ) A developer embeds an entire 50-page legal contract as a single vector. A user asks about a specific clause on page 42. The system returns irrelevant results. What is the root cause?

A) The embedding model has insufficient dimensionality
B) The vector database is using the wrong distance metric
C) Embedding the entire document into one vector flattens out nuance: the specific clause's meaning gets averaged out and lost
D) The contract exceeds the model's context window, causing truncation at page 10

Answer: C

A) — Incorrect. Even with extremely high-dimensional vectors, a single embedding of an entire 50-page document will represent the general topic of the contract, not any specific clause. More dimensions capture richer nuance per concept, but they can't preserve the granularity of 50 pages in one vector.
B) — Incorrect. Switching distance metrics (cosine vs. Euclidean) wouldn't solve this. The problem is that the embedding itself doesn't contain clause-specific information, not that similarity is measured incorrectly.
C) — Correct. Embedding an entire document into a single vector flattens all the nuance — specific details get averaged out and lost. Chunking solves this by cutting text into digestible blocks so each individual block becomes its own highly specific vector embedding. The user's query about page 42 would then match the specific chunk containing that clause.
D) — Incorrect. While context window limitations are real, the material's primary point is about semantic dilution, not truncation. Even if the model could process all 50 pages, the resulting single vector would still average out the specific clause's meaning.
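
The dilution effect can be illustrated with toy numbers. The sketch assumes, purely for illustration, that a whole-document embedding behaves like the average of its section embeddings; real models do not compute a literal mean, and all vectors here are made up.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical 4-dimensional embeddings for four sections of one contract.
chunk_vectors = {
    "definitions":     [1.0, 0.0, 0.0, 0.0],
    "payment terms":   [0.0, 1.0, 0.0, 0.0],
    "confidentiality": [0.0, 0.0, 1.0, 0.0],
    "termination":     [0.0, 0.0, 0.0, 1.0],
}

# Crude stand-in for "one vector for the whole document": the mean of its parts.
n = len(chunk_vectors)
document_vector = [sum(v[i] for v in chunk_vectors.values()) / n for i in range(4)]

query = [0.0, 0.0, 0.1, 0.9]  # hypothetical query about the termination clause

whole_doc_score = cosine(query, document_vector)
chunk_score = cosine(query, chunk_vectors["termination"])
print(round(whole_doc_score, 3), round(chunk_score, 3))
```

The chunk-level score is far higher than the whole-document score, which is the gap that chunking is meant to recover.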

Q8. (MCQ) An embedding model outputs a 1,536-dimensional vector for a single text chunk. In tensor terminology, this vector is:

A) A Rank 0 Tensor (Scalar)
B) A Rank 1 Tensor (Vector)
C) A Rank 2 Tensor (Matrix)
D) A Rank 3 Tensor (Cube)

Answer: B

A) — Incorrect. A Rank 0 Tensor is a single number (scalar), like the value 5. A 1,536-dimensional embedding is a list of 1,536 numbers, not a single number.
B) — Correct. A Rank 1 Tensor is a list of numbers — a vector. A single 1,536-dimensional embedding is technically a 1D Tensor. When you send a batch of 32 sentences and each yields a 1,536-dimensional vector, the result is a 2D Tensor with shape (32, 1536).
C) — Incorrect. A Rank 2 Tensor (matrix) would be a grid with rows and columns — like a batch of embeddings. A single embedding vector has only one axis (its dimensions), not two.
D) — Incorrect. A Rank 3 Tensor has three axes (like a color image: height × width × channels). A single embedding vector has only one axis, even though that axis holds 1,536 values.

Q9. (MCQ) HNSW (Hierarchical Navigable Small World) is described as the most popular vector search algorithm. How does it organize and search vectors?

A) It hashes vectors into discrete buckets using locality-sensitive hash functions
B) It organizes vectors into a multi-layered graph where search starts at the top with long "highway" links and progressively navigates to denser lower layers
C) It groups vectors into clusters around centroids and searches only the nearest cluster
D) It recursively splits vectors into branches like a flowchart decision tree

Answer: B

A) — Incorrect. This describes Locality-Sensitive Hashing (LSH), where similar vectors are hashed into the same buckets. HNSW uses a graph structure, not hash functions.
B) — Correct. HNSW organizes vectors into a multi-layered graph. The search starts at the top layer using long "highway" links for a fast broad overview, then progressively drops to lower, denser layers to finely navigate to the closest matches. This multi-scale navigation makes HNSW both fast and accurate.
C) — Incorrect. This describes Inverted File (IVF) indexing, which uses k-means clustering. HNSW doesn't use centroids or clusters — it uses navigable graph links.
D) — Incorrect. This describes tree-based algorithms like k-d trees or ANNOY. These struggle to scale in high-dimensional spaces due to the "curse of dimensionality," which is one reason HNSW is preferred.

Q10. (MCQ) A SaaS company building an AI legal assistant for 500 law firms uses a single vector database. They store all chunks in one global HNSW index and assign a tenant_id in each chunk's metadata. During retrieval, the system performs a vector search across the entire index, then filters out non-matching tenant IDs from the results. This approach has a critical vulnerability. What is it?

A) The HNSW graph cannot store metadata alongside vectors
B) Post-filtering may return empty or irrelevant results if the correct documents weren't in the top-N of the global search
C) Metadata filters are computationally more expensive than the vector search itself
D) Tenant IDs in metadata are visible to all users by default

Answer: B

A) — Incorrect. Modern vector databases store metadata alongside vectors as a standard feature. This is not a limitation of HNSW.
B) — Correct. The described pattern is post-filtering: search the whole database first, then discard other tenants' results. If the vector search's top-N didn't include the correct documents for the target tenant (because they were outranked by similar documents from other tenants), post-filtering leaves you with zero or irrelevant results. The material explicitly warns: "Ensure your database natively supports Pre-Filtering or Single-Stage Filtering to lock down the search path during graph traversal."
C) — Incorrect. Metadata filtering is typically lightweight compared to vector similarity computation. The problem isn't computational cost — it's the ordering of operations (filtering after search vs. during search).
D) — Incorrect. Metadata values aren't exposed to end users through the API by default. The security risk is at the vector search level (accidentally traversing other tenants' nodes in the graph), not metadata visibility.
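
The failure mode in option B can be reproduced with a toy index. This brute-force sketch only illustrates the ordering of operations; real vector databases apply the filter during graph traversal, and the function names and data here are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy shared index: tenant "firm_b" has several chunks that outrank firm_a's only chunk.
index = [
    {"id": "b1", "tenant_id": "firm_b", "vector": [0.99, 0.10]},
    {"id": "b2", "tenant_id": "firm_b", "vector": [0.98, 0.12]},
    {"id": "b3", "tenant_id": "firm_b", "vector": [0.97, 0.15]},
    {"id": "a1", "tenant_id": "firm_a", "vector": [0.80, 0.50]},
]

def top_n(items, query, n):
    return sorted(items, key=lambda item: dot(query, item["vector"]), reverse=True)[:n]

def post_filter_search(query, tenant_id, n):
    """Search globally, then drop other tenants' hits (the pattern in the question)."""
    return [hit for hit in top_n(index, query, n) if hit["tenant_id"] == tenant_id]

def pre_filter_search(query, tenant_id, n):
    """Restrict the candidate set to the tenant first, then search."""
    allowed = [item for item in index if item["tenant_id"] == tenant_id]
    return top_n(allowed, query, n)

query = [1.0, 0.0]
print(post_filter_search(query, "firm_a", n=3))                         # [] because firm_b filled the top 3
print([hit["id"] for hit in pre_filter_search(query, "firm_a", n=3)])   # ['a1']
```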

Q11. (MSQ — Select ALL that apply) Fixed-size chunking with no overlap can produce which of the following problems?

A) Chunks may cut off in the middle of a critical sentence, destroying its meaning
B) The embedding model will refuse to process chunks below a minimum size
C) Adjacent chunks lose contextual continuity at their boundaries
D) The resulting embeddings will have inconsistent dimensionality

Answer: A, C

A) — Correct. Fixed-size chunking completely ignores human grammar, so a chunk might cut off right in the middle of a critical sentence, destroying the meaning.
B) — Incorrect. Embedding models don't refuse short inputs. They'll embed whatever text they receive, even if it's a sentence fragment. The issue is semantic quality, not model rejection.
C) — Correct. Without overlap, the end of one chunk and the beginning of the next share no content. If a critical concept spans the boundary, neither chunk captures the full meaning. This is why developers use sliding windows with overlap (e.g., 200-token chunks with 50-token overlap) to keep sentences intact across boundaries.
D) — Incorrect. Embedding dimensionality is determined by the model architecture, not chunk size. All chunks produce vectors of the same dimension regardless of their text length.
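
The sliding-window idea from option C can be sketched with whitespace-separated words standing in for tokens. The `chunk_tokens` helper and its parameters are illustrative; a real pipeline would use the embedding model's own tokenizer.

```python
def chunk_tokens(tokens, chunk_size, overlap=0):
    """Split a token list into fixed-size chunks; consecutive chunks share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the text
    return chunks

tokens = "the tenant may terminate this lease only after giving ninety days written notice".split()

no_overlap = chunk_tokens(tokens, chunk_size=5)
with_overlap = chunk_tokens(tokens, chunk_size=5, overlap=2)

for chunk in no_overlap:
    print(" ".join(chunk))
print("---")
for chunk in with_overlap:
    print(" ".join(chunk))
```

Without overlap, the first chunk ends at "terminate this" and the next begins at "lease", so neither holds the full phrase. With a two-token overlap, "terminate this lease only after" lands inside a single chunk.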

Q12. (MCQ) In the three-step RAG process (Retrieval → Augmentation → Generation), what happens during the "Augmentation" step?

A) The LLM is fine-tuned on the retrieved documents before generating a response
B) The original user query is appended with the retrieved information in the background before being sent to the LLM
C) The retrieved documents are re-embedded with a higher-dimensional model for better accuracy
D) The user is shown the retrieved documents and asked to select the relevant ones

Answer: B

A) — Incorrect. RAG explicitly avoids fine-tuning. The entire point is that the model can draw on up-to-date knowledge without retraining, because the new information arrives in the prompt while the weights stay unchanged. Augmentation is a prompt-level operation, not a weight-level one.
B) — Correct. The system takes the original question and appends the freshly retrieved information to it in the background. The LLM then reads this augmented package (your prompt + the newly found facts) and synthesizes a natural, coherent, and highly accurate answer. This is the "open-book exam" analogy.
C) — Incorrect. Documents are not re-embedded during augmentation. Embedding happens once during ingestion and indexing. Augmentation is about composing the prompt, not reprocessing vectors.
D) — Incorrect. The augmentation happens transparently in the background — the user never sees or manually selects retrieved documents. The system automates the entire retrieval-to-prompt pipeline.
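
Since augmentation is just prompt composition, it can be sketched as string assembly. The template wording, the `build_augmented_prompt` name, and the sample chunks are all hypothetical; production systems vary in how they format the context.

```python
def build_augmented_prompt(question, retrieved_chunks):
    """Combine the user's question with retrieved text into one prompt string."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Hypothetical retrieved chunks; in a real pipeline these come from the vector search.
retrieved = [
    "Expense reports are due on the fifth business day of each month.",
    "Late expense reports require manager approval.",
]
prompt = build_augmented_prompt("When are expense reports due?", retrieved)
print(prompt)
```

Numbering the chunks also gives the model something to cite, which supports the verifiable-sources benefit of RAG.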

Q13. (MCQ) Euclidean distance is described as having a specific weakness compared to cosine similarity in high-dimensional embedding spaces. What is this weakness?

A) Euclidean distance cannot be computed on floating-point vectors
B) Euclidean distance is sensitive to vector magnitude and becomes less reliable in very high dimensions due to the "curse of dimensionality"
C) Euclidean distance always returns negative values for dissimilar vectors
D) Euclidean distance is computationally more expensive than dot product by several orders of magnitude

Answer: B

A) — Incorrect. Euclidean distance works perfectly on floating-point vectors. It's a standard mathematical operation (square root of sum of squared differences) applicable to any numerical vectors.
B) — Correct. Unlike cosine similarity, which ignores magnitude and focuses only on direction, Euclidean distance is highly sensitive to vector magnitude. Additionally, in very high-dimensional spaces, the "curse of dimensionality" causes pairwise distances to concentrate around similar values, making Euclidean distance less reliable for distinguishing similar from dissimilar items. Cosine similarity sidesteps the magnitude problem by measuring only the angle.
C) — Incorrect. Euclidean distance is always non-negative (≥ 0), since it measures a physical straight-line distance. Lower values mean higher similarity.
D) — Incorrect. While Euclidean distance involves a square root computation that dot product doesn't, the difference is not "several orders of magnitude." Both are feasible at scale. The weakness is about reliability in high dimensions, not computational cost.
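
The magnitude sensitivity in option B shows up with two vectors that point the same way but differ in length. The vectors are made up for illustration.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

v = [1.0, 2.0, 2.0]
scaled = [3.0, 6.0, 6.0]   # same direction as v, three times the length

print(cosine(v, scaled))     # direction is unchanged, so the score stays at its maximum
print(euclidean(v, scaled))  # grows with the difference in length
```

Cosine similarity treats the two vectors as a perfect match, while Euclidean distance reports them as 6 units apart.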

Q14. (MCQ) A developer uses semantic chunking on a technical manual. Instead of splitting by character count or punctuation, the system reads sentences sequentially and creates a new chunk only when the meaning shifts significantly between consecutive sentences. What determines where these chunk boundaries are drawn?

A) The number of tokens in each sentence
B) The embedding distance between consecutive sentences, with boundaries at significant semantic shifts
C) Predefined heading-level markers in the document's HTML structure
D) A fixed overlap window that slides across the text

Answer: B

A) — Incorrect. Token count is the basis of fixed-size chunking, the simplest and least intelligent method. Semantic chunking explicitly ignores character/token counts.
B) — Correct. Semantic chunking uses an embedding model to read the text sentence by sentence, calculates the semantic distance between consecutive sentences, and draws a boundary (creates a new chunk) only when the meaning or topic shifts significantly. The boundaries are determined by semantic similarity, not structural markers.
C) — Incorrect. This describes markdown/recursive chunking, which splits by structural boundaries (paragraphs, headings). Semantic chunking uses meaning-based boundaries, not document structure.
D) — Incorrect. A sliding overlap window is a feature of fixed-size chunking to prevent context loss at boundaries. Semantic chunking doesn't use fixed windows — its chunk sizes are variable, determined by where topics naturally shift.
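
The boundary rule in option B can be sketched as a single pass over consecutive sentence embeddings. The 2-dimensional embeddings and the 0.3 threshold below are invented for illustration; real implementations obtain embeddings from a model and often pick the threshold from the distribution of distances.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def semantic_chunks(sentences, embeddings, threshold):
    """Start a new chunk whenever consecutive sentences are farther apart than `threshold`."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, curr, sentence in zip(embeddings, embeddings[1:], sentences[1:]):
        if cosine_distance(prev, curr) > threshold:
            chunks.append([sentence])    # topic shift: open a new chunk
        else:
            chunks[-1].append(sentence)  # same topic: extend the current chunk
    return chunks

sentences = [
    "Unplug the device before cleaning.",
    "Wipe the casing with a dry cloth.",
    "The warranty covers parts for two years.",
    "Labor is covered for one year.",
]
# Hypothetical embeddings: the first two point one way, the last two another.
embeddings = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]

chunks = semantic_chunks(sentences, embeddings, threshold=0.3)
for chunk in chunks:
    print(chunk)
```

The cleaning sentences and the warranty sentences end up in separate chunks of different lengths, since the only large jump in distance falls between sentences two and three.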

Q15. (MSQ — Select ALL that apply) Which of the following are valid architectural categories of vector databases?

A) Native vector databases built from the ground up for vector workloads
B) Extended traditional databases (SQL/NoSQL) with added vector search capabilities
C) Embedded databases that run inside the application process without a separate server
D) Federated databases that distribute vectors across blockchain nodes

Answer: A, B, C

A) — Correct. Native vector databases (like Pinecone, Qdrant, Milvus, Weaviate) are built from the ground up specifically to manage, search, and scale vector data.
B) — Correct. Extended databases are traditional databases that have added vector search. Examples include pgvector (PostgreSQL), MongoDB, Cassandra, and Redis with vector index support. They allow storing embeddings alongside regular application data.
C) — Correct. Embedded databases (like Chroma and LanceDB) run directly inside the application's process without requiring a separate server, ideal for local development, edge computing, and rapid prototyping.
D) — Incorrect. Blockchain-based federated vector databases are not mentioned as a category. The three categories are native, extended, and embedded.

Q16. (MCQ) In a production RAG pipeline, the "Metadata Enrichment" trick involves prepending critical metadata directly to the text string before generating the embedding vector. For example: "Document: IT Manual | Section: Router Reset | Text: To reset the corporate router...". Why is this done?

A) To increase the token count of the chunk so it exceeds the embedding model's minimum threshold
B) To ensure the embedding model bakes the document context directly into the mathematical vector, improving retrieval relevance
C) To replace the need for a separate metadata dictionary in the vector database
D) To compress the metadata into fewer dimensions during quantization

Answer: B

A) — Incorrect. Embedding models don't have minimum token thresholds that need to be exceeded. The trick is about enriching semantic content, not meeting size requirements.
B) — Correct. By prepending context like document name and section header directly into the text before embedding, the embedding model explicitly bakes the document context right into the mathematical vector. This means when someone searches for "router reset IT manual," the vector itself captures that contextual association, improving retrieval accuracy.
C) — Incorrect. The metadata dictionary is still stored separately for filtering purposes. Enriching the text before embedding complements — not replaces — structured metadata. You still need filterable fields like last_updated and department.
D) — Incorrect. Quantization is a post-embedding compression step that operates on the vector values. Text prepended before embedding doesn't affect quantization behavior.
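
The enrichment step amounts to building the string that gets embedded. The sketch follows the "Document | Section | Text" layout from the question; the `enrich_chunk` helper, the sample chunk text, and the record layout are hypothetical.

```python
def enrich_chunk(document, section, text):
    """Prepend document and section labels so they become part of the embedded text."""
    return f"Document: {document} | Section: {section} | Text: {text}"

chunk = {
    "text": "Hold the reset button until the status light blinks.",  # hypothetical chunk text
    "metadata": {"document": "IT Manual", "section": "Router Reset", "department": "IT"},
}

text_to_embed = enrich_chunk(
    chunk["metadata"]["document"],
    chunk["metadata"]["section"],
    chunk["text"],
)

# The enriched string goes to the embedding model; the metadata dict is still stored for filtering.
record = {"text_to_embed": text_to_embed, "metadata": chunk["metadata"]}
print(text_to_embed)
```

Keeping `metadata` on the record alongside the enriched text reflects the point in option C: enrichment complements structured filters rather than replacing them.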

Q17. (MCQ) A Bi-Encoder (used in vector databases) and a Cross-Encoder (used in re-rankers) process queries and documents differently. What is the architectural distinction?

A) Bi-Encoders are larger models while Cross-Encoders are smaller and faster
B) Bi-Encoders embed queries and documents separately, while Cross-Encoders process the query and document together as a joint input
C) Bi-Encoders work only on text while Cross-Encoders work on multimodal data
D) Bi-Encoders produce continuous vectors while Cross-Encoders produce binary classifications

Answer: B

A) — Incorrect. The opposite is true regarding speed. Cross-Encoders are computationally heavier and slower because they process query-document pairs together. Bi-Encoders are faster because they can pre-compute document embeddings independently.
B) — Correct. Bi-Encoders embed queries and documents separately into independent vectors, then compare them using distance metrics. This is fast but misses fine-grained query-document interactions. Cross-Encoders analyze the query and document together as a single concatenated input, enabling deep contextual understanding of how specifically the document answers the query.
C) — Incorrect. Both encoder types can theoretically process various modalities. The distinction is architectural (separate vs. joint encoding), not modality-based.
D) — Incorrect. Cross-Encoders typically produce a relevance score (a continuous value), not just binary classifications. The output is a ranking score that enables re-ordering results by relevance.

Q18. (MCQ) An Auto-Retrieval / Self-Querying system receives the user input: "Show me the security protocols updated after February 2026." An LLM parses this into a structured query payload with both a semantic query and a metadata filter. What advantage does this pattern offer over purely manual filter construction?

A) It eliminates the need for a vector database entirely
B) It allows non-technical users to leverage precise metadata filtering through natural language without filling out complex search forms
C) It guarantees the LLM will never misinterpret the user's filter criteria
D) It replaces the embedding-based search with keyword-only search

Answer: B

A) — Incorrect. The structured query is sent to the vector database — it's a query construction layer, not a replacement for the database itself.
B) — Correct. Auto-Retrieval places an LLM in front of the vector database to parse natural human speech into structured query payloads. This gives users precise metadata filtering without forcing them to fill out complex search forms or understand metadata schemas. The user speaks naturally; the LLM handles the translation.
C) — Incorrect. LLMs can absolutely misinterpret filter criteria — natural language is inherently ambiguous. The pattern improves usability, not guarantees correctness. Edge cases and ambiguous queries may still produce incorrect filters.
D) — Incorrect. The parsed payload includes both a semantic query vector ("security protocols") and a metadata filter (for example, last_updated > 2026-02-28, reading "after February 2026" as after the last day of that month). Embedding-based search is preserved, not replaced.

Q19. (MCQ) Matryoshka Representation Learning (MRL) is mentioned as an optimization for embedding models. What does it allow?

A) Training multiple separate embedding models of decreasing size
B) Truncating embedding vectors to much smaller dimensions while barely losing search accuracy
C) Compressing vectors into binary format without any accuracy loss
D) Automatically selecting the best distance metric for a given dataset

Answer: B

A) — Incorrect. MRL doesn't train separate models. It's a technique applied to a single model that produces vectors structured so that their leading dimensions capture the most important information.
B) — Correct. MRL allows you to truncate vectors to much smaller sizes (like 256 dimensions from 3072) to drastically reduce storage costs while barely losing any search accuracy. This is listed as a key optimization under vector dimensionality and storage selection criteria.
C) — Incorrect. Binary quantization (not MRL) compresses to 0s and 1s, and it does involve some accuracy trade-off. MRL is about dimensional truncation, not binary compression.
D) — Incorrect. MRL has nothing to do with distance metric selection. It's about creating vectors where meaningful information is concentrated in the leading dimensions.
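
On the consumer side, using an MRL-trained embedding at a smaller size is a slice followed by re-normalization. The sketch uses a made-up 8-dimensional vector in place of a 3072-dimensional one. The re-normalization step is a common practice assumed here so that dot product and cosine similarity stay interchangeable, and truncation only preserves accuracy when the model was trained with MRL.

```python
import math

def truncate_and_normalize(vector, dims):
    """Keep the first `dims` values and rescale the result to unit length."""
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Hypothetical embedding whose leading dimensions carry most of the signal.
full = [0.61, -0.48, 0.39, 0.30, -0.08, 0.05, 0.03, -0.02]
short = truncate_and_normalize(full, dims=4)

print(len(full), len(short))
print(short)
```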

Q20. (MCQ) A company needs multi-tenant isolation in their RAG pipeline for government clients with strict SLAs. Which isolation pattern is most appropriate?

A) Metadata filter-based isolation using tenant_id in a shared index
B) Namespace/partition isolation within a single database instance
C) Database-level separation with a dedicated database instance per tenant
D) Post-filtering after a global vector search

Answer: C

A) — Incorrect. Metadata filtering provides only logical separation in a shared index. If a code bug occurs, cross-tenant data leakage is possible. This is classified as "Medium" security, best suited for B2C apps or large pools of small users — not government clients with strict SLAs.
B) — Incorrect. Namespace isolation is stronger (virtual separation at the storage layer) and suited for standard B2B SaaS products. However, it still shares underlying infrastructure, which may not satisfy government-level compliance or eliminate the "noisy neighbor" problem.
C) — Correct. Database-level separation provides complete physical isolation, so a bug in shared filtering logic cannot bleed data across tenants. It also solves the "noisy neighbor" problem where one tenant's heavy API usage slows down the system for others. Despite being expensive to scale, it's the recommended pattern for enterprise/government clients with strict SLAs and compliance requirements.
D) — Incorrect. Post-filtering is explicitly warned against as the worst pattern — it can return empty or irrelevant results and offers no security guarantees. It's the opposite of what government clients require.

Q21. (MSQ — Select ALL that apply) Dense vectors and sparse vectors serve different purposes in a vector database. Which of the following correctly distinguish them?

A) Dense vectors capture abstract semantic meaning and intent, finding relevant results even without exact keyword matches
B) Sparse vectors represent traditional keyword-based search with most dimensions being zero
C) Dense vectors are always smaller in dimensionality than sparse vectors
D) Combining both in a hybrid search approach maximizes retrieval accuracy

Answer: A, B, D

A) — Correct. Dense vectors are generated by embedding models and excel at capturing abstract semantic meaning and intent. They find relevant results even when exact keywords aren't used, because they operate in learned semantic space.
B) — Correct. Sparse vectors represent keyword-based techniques (like term-frequency algorithms). They may have tens of thousands of dimensions representing an entire vocabulary, but only a tiny fraction contain non-zero values representing the specific words present in a document.
C) — Incorrect. The opposite is typically true. Dense vectors commonly have hundreds to a few thousand dimensions (e.g., 1,536), while sparse vectors can have tens of thousands of dimensions (representing the full vocabulary). "Dense" refers to most dimensions being non-zero, not to having fewer dimensions.
D) — Correct. Modern search systems often utilize a combination of both dense and sparse vectors to maximize retrieval accuracy — an approach known as hybrid search. Dense captures semantics; sparse handles exact keyword matching.
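
The sparse representation and a simple hybrid score can be sketched as follows. The weighted sum with an `alpha` parameter is one common fusion choice assumed here for illustration; the tiny vocabulary, the dense score of 0.2, and the function names are hypothetical, and real systems typically normalize the two scores to comparable ranges first.

```python
def sparse_vector(text, vocabulary):
    """Term counts keyed by vocabulary index; absent words are implicitly zero."""
    counts = {}
    for word in text.lower().split():
        if word in vocabulary:
            idx = vocabulary[word]
            counts[idx] = counts.get(idx, 0) + 1
    return counts

def sparse_dot(a, b):
    """Dot product that touches only the non-zero entries."""
    return sum(value * b[idx] for idx, value in a.items() if idx in b)

def hybrid_score(dense_score, sparse_score, alpha=0.5):
    """Weighted sum of the two signals; alpha=1.0 is dense only, alpha=0.0 is sparse only."""
    return alpha * dense_score + (1 - alpha) * sparse_score

vocabulary = {"error": 0, "code": 1, "e4012": 2, "router": 3, "reset": 4}
doc_sparse = sparse_vector("Router error code E4012", vocabulary)
query_sparse = sparse_vector("e4012", vocabulary)

keyword_score = sparse_dot(query_sparse, doc_sparse)  # exact match on the rare token
dense_score = 0.2  # hypothetical: the dense model finds the bare code only weakly similar
combined = hybrid_score(dense_score, keyword_score)
print(doc_sparse, keyword_score, combined)
```

An exact identifier such as an error code is the kind of query where the sparse side rescues a weak dense score.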

Q22. (MCQ) A PyTorch developer sends a batch of 32 sentences to an embedding model. Each sentence yields a 1,536-dimensional vector. The returned object has .shape of (32, 1536). A subsequent matrix multiplication expects input shape (1536, 32). The operation crashes. In tensor terminology, what is this common bug called?

A) A quantization error
B) A shape mismatch
C) The curse of dimensionality
D) A residual vector error

Answer: B

A) — Incorrect. Quantization errors arise from compressing vector precision (Float32 → Int8). The crash here is about incompatible tensor dimensions, not precision loss.
B) — Correct. Shape mismatches — where tensor dimensions don't align with the expected mathematical operations — are described as a very common source of bugs in PyTorch, TensorFlow, and NumPy. Checking .shape (and sometimes .ndim) is one of the most important debugging techniques. The developer needs to transpose (32, 1536) to (1536, 32) before the multiplication.
C) — Incorrect. The curse of dimensionality refers to distance metrics becoming less reliable in very high-dimensional spaces. It's a statistical phenomenon, not a runtime error from mismatched tensor shapes.
D) — Incorrect. Residual vectors are the mathematical difference between a data vector and its cluster centroid in IVF indexing. They have nothing to do with tensor shape errors during batch processing.
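
The bug and its fix can be reproduced without any tensor library. The sketch below is a plain-Python stand-in, so `shape`, `matmul`, and `transpose` are hypothetical helpers rather than PyTorch or NumPy functions, and the (8, 1536) `weights` matrix is an assumed left operand. In those libraries the equivalent check is the `.shape` attribute, and the transpose is available as `.T`.

```python
def shape(matrix):
    """Return (rows, cols) for a rectangular list of lists."""
    return (len(matrix), len(matrix[0]))

def matmul(a, b):
    """Multiply two matrices, refusing inputs whose inner dimensions disagree."""
    if shape(a)[1] != shape(b)[0]:
        raise ValueError(f"shape mismatch: {shape(a)} @ {shape(b)}")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

batch = [[0.0] * 1536 for _ in range(32)]    # 32 embeddings, shape (32, 1536)
weights = [[0.0] * 1536 for _ in range(8)]   # hypothetical left operand, shape (8, 1536)

try:
    matmul(weights, batch)                   # (8, 1536) @ (32, 1536): inner sizes differ
except ValueError as err:
    print(err)

result = matmul(weights, transpose(batch))   # (8, 1536) @ (1536, 32)
print(shape(result))
```

Printing the shapes before the multiplication makes the mismatch obvious: the right operand must have 1,536 rows, which the transposed batch provides.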