What makes indirect prompt injection hard to catch?

Question

Checks whether the candidate can explain the core concept clearly and connect it to real production decisions. Explain why indirect prompt injection is a difficult class of failure.

Accepted Answer

Example Answer

The way I'd think about it is this: Indirect prompt injection is hard because the dangerous instruction is not coming from the user in a clean obvious way. It is embedded in documents, web pages, files, or other retrieved content that the system may treat as useful context.

That creates ambiguity about what is data and what is control. The content may look perfectly relevant to the task while still trying to steer the system toward an unsafe tool call or policy violation.

It is also hard because the malicious effect may emerge only after the content is combined with memory, tools, or another step in the workflow. The injection is often compositional, not isolated.

Common Poor Answer

A weak answer is saying indirect prompt injection is just harder because it is more subtle. The core issue is that the attack hides inside seemingly relevant context.

What makes indirect prompt injection hard to catch?

Example Answer

Common Poor Answer

Related Questions