What should live in the index and what should stay in metadata filters?

Instruction: Explain how you would divide searchable content from structured metadata constraints.

Context: Checks whether the candidate can explain the core concept clearly and connect it to real production decisions.

Example Answer

The way I'd think about it is this: I put information in the index when I want retrieval to match on its meaning as content. I keep information in metadata filters when it represents a hard constraint or a structured dimension such as tenant, language, region, document type, effective date, or publication state.

The mistake is trying to make metadata do semantic work or making embeddings carry routing and permission logic. If the question is what a policy means, that text belongs in the index. If the question is which policies this user is allowed to see, that belongs in filters.

I also think about update patterns. Metadata often changes more frequently than the text it describes, so it should be adjustable without re-embedding. The indexed text should be the part whose meaning needs vector or lexical retrieval. A good rule is that metadata narrows the search space, while the index determines relevance inside the allowed space.
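
To make that rule concrete, here is a minimal sketch in plain Python. It is not any particular vector database's API: the `Chunk` class, the `search` function, the two-dimensional toy vectors, and the `tenant` and `state` field names are all assumptions chosen for illustration. Exact-match metadata filters run first as hard constraints, and cosine similarity ranks only what survives.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Chunk:
    text: str                       # content whose meaning is searched
    vector: list[float]             # embedding of `text` (toy values, assumed)
    metadata: dict[str, str] = field(default_factory=dict)  # hard constraints


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(chunks: list[Chunk], query_vector: list[float],
           filters: dict[str, str], top_k: int = 3) -> list[Chunk]:
    # Step 1: metadata narrows the search space with exact-match constraints.
    allowed = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in filters.items())
    ]
    # Step 2: the index decides relevance, but only inside the allowed space.
    ranked = sorted(allowed, key=lambda c: cosine(c.vector, query_vector),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical corpus: three refund-related chunks and one unrelated chunk.
chunks = [
    Chunk("Refunds are issued within 14 days.", [0.9, 0.1],
          {"tenant": "acme", "state": "published"}),
    Chunk("Refund policy draft: 30-day window.", [0.8, 0.2],
          {"tenant": "acme", "state": "draft"}),
    Chunk("Refunds require manager approval.", [0.85, 0.15],
          {"tenant": "globex", "state": "published"}),
    Chunk("Office hours are 9 to 5.", [0.1, 0.9],
          {"tenant": "acme", "state": "published"}),
]

query = [1.0, 0.0]  # stands in for an embedded "refund" question
filters = {"tenant": "acme", "state": "published"}

before = [c.text for c in search(chunks, query, filters)]

# Publishing the draft is a metadata-only change; its vector is untouched.
chunks[1].metadata["state"] = "published"
after = [c.text for c in search(chunks, query, filters)]
```

In this sketch the other tenant's chunk never reaches the ranking step, however similar its vector is to the query, and the draft becomes retrievable the moment its `state` changes, with no re-embedding. Embedding tenant or publication state into the text instead would only make those chunks less likely to appear, not impossible.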

Common Poor Answer

A weak answer is, "Put everything in embeddings and let the retriever figure it out." That collapses semantic relevance, routing, and access control into one layer and makes the system harder to reason about.

Related Questions