Instruction: Explain how you would divide searchable content from structured metadata constraints.
Context: Checks whether the candidate can explain the core concept clearly and connect it to real production decisions.
The way I'd think about it is this: I put information in the index when I want semantic retrieval to reason over it as content. I keep information in metadata filters when it represents a hard constraint or structured dimension like tenant, language, region, document type, effective date, or publication state.
The mistake is trying to make metadata do semantic work or making embeddings carry routing and permission logic. If the question is what a policy means, that text belongs in the index. If the question is which policies this user is allowed to see, that belongs in filters.
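As a rough illustration of that split, here is a minimal sketch in which metadata is applied as an exact-match filter first and relevance is scored only over the documents that survive it. Everything here is hypothetical: the `Doc` class, the `matches`, `score`, and `search` helpers, and the sample data are illustrative names, and the token-overlap score is only a stand-in for whatever vector or lexical retriever a real system would use.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    text: str  # indexed content: the part retrieval scores for meaning
    metadata: dict = field(default_factory=dict)  # hard constraints, never scored


def matches(doc, filters):
    """True only if every filter key equals the document's metadata value."""
    return all(doc.metadata.get(key) == value for key, value in filters.items())


def score(query, text):
    """Toy lexical relevance: fraction of query terms present in the text.

    Assumption: this stands in for a real vector or BM25 retriever.
    """
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    return len(query_terms & text_terms) / len(query_terms) if query_terms else 0.0


def search(docs, query, filters, k=3):
    # Step 1: metadata narrows the search space (tenant, state, ...).
    allowed = [d for d in docs if matches(d, filters)]
    # Step 2: the index decides relevance inside the allowed space.
    ranked = sorted(allowed, key=lambda d: score(query, d.text), reverse=True)
    return ranked[:k]


DOCS = [
    Doc("a", "refund policy for annual plans", {"tenant": "acme", "state": "published"}),
    Doc("b", "refund policy for annual plans and trials", {"tenant": "globex", "state": "published"}),
    Doc("c", "travel expense policy", {"tenant": "acme", "state": "published"}),
    Doc("d", "refund policy draft for annual plans", {"tenant": "acme", "state": "draft"}),
]

results = search(DOCS, "refund policy annual", {"tenant": "acme", "state": "published"})
result_ids = [d.doc_id for d in results]
```

Documents "b" and "d" are textually close to the query but never reach the ranking step, because the tenant and publication-state constraints are enforced before relevance is considered rather than being left for the scorer to infer.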
I also think about update patterns. Metadata such as publication state or region often changes more frequently than the text itself, so it should be adjustable without re-embedding. The indexed text should be the part whose meaning needs vector or lexical retrieval. A good rule is that metadata narrows the search space, while the index determines relevance inside the allowed space.
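The update-pattern point can be sketched as a store that keeps vectors and metadata in separate maps, so a metadata change never touches the embedding. This is an assumption-laden toy, not any particular vector database's API: `Store`, `upsert_text`, `update_metadata`, and `fake_embed` are hypothetical names, and `fake_embed` is a hash-based placeholder rather than a real embedding model.

```python
import hashlib


def fake_embed(text):
    """Placeholder embedding: deterministic bytes from a hash, not a real model."""
    return hashlib.sha256(text.encode("utf-8")).digest()[:8]


class Store:
    """Hypothetical store that keeps vectors and metadata independently."""

    def __init__(self):
        self.vectors = {}
        self.metadata = {}
        self.embed_calls = 0  # counts how often the (expensive) embed step runs

    def upsert_text(self, doc_id, text, metadata):
        # Text changed, so the meaning may have changed: re-embed.
        self.vectors[doc_id] = fake_embed(text)
        self.embed_calls += 1
        self.metadata[doc_id] = dict(metadata)

    def update_metadata(self, doc_id, **changes):
        # Only a structured dimension changed: leave the vector untouched.
        self.metadata[doc_id].update(changes)


store = Store()
store.upsert_text(
    "policy-7",
    "Refunds are issued within 30 days.",
    {"state": "draft", "region": "eu"},
)
vector_before = store.vectors["policy-7"]

# Publishing the document is a metadata-only change.
store.update_metadata("policy-7", state="published")
```

After the publish step, the vector for "policy-7" is the same object as before and the embed counter has not moved, which is the property that makes frequent metadata changes cheap.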
A weak answer is, "Put everything in embeddings and let the retriever figure it out." That collapses semantic relevance, routing, and access control into one layer and makes the system harder to reason about.
easy