A RAG system for analysts becomes unusable when documents contain charts, tables, and scanned PDFs. What would you change first?

Instruction: Explain how you would handle multimodal and low-quality source material in RAG.

Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery.

I would start by fixing document representation, not by demanding a smarter generator. Charts, tables, and scanned PDFs are exactly where many RAG pipelines collapse because the retrieval system never gets a usable representation of the evidence.
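To make "usable representation" concrete, here is a minimal sketch of one possible approach for tables: serialize each row together with its caption and column headers, so a retrieved row still makes sense on its own. The `Chunk` class, the `table_to_chunks` helper, the modality labels, and the sample values are all hypothetical illustrations, not something the answer itself specifies.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str      # the string that gets embedded and indexed
    modality: str  # hypothetical label, e.g. "table_row", "chart", "ocr_text"
    source: str    # where the evidence came from, for citation


def table_to_chunks(caption, headers, rows, source):
    """Turn each table row into a self-describing chunk.

    Repeating the caption and column headers in every row keeps the row
    meaningful when it is retrieved without the rest of the table.
    """
    chunks = []
    for row in rows:
        cells = "; ".join(f"{h}: {v}" for h, v in zip(headers, row))
        chunks.append(
            Chunk(text=f"{caption} | {cells}", modality="table_row", source=source)
        )
    return chunks


# Made-up example values, for illustration only.
chunks = table_to_chunks(
    caption="Quarterly revenue (USD millions)",
    headers=["Quarter", "Revenue"],
    rows=[["Q1", "12.4"], ["Q2", "13.1"]],
    source="report.pdf#page=7",
)
print(chunks[0].text)
```

The same idea would extend to charts, for example by indexing a textual description of the chart alongside its caption, but the right serialization depends on the questions analysts actually ask.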

For scanned PDFs, I would evaluate OCR quality and page-structure extraction...
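One way to put a number on OCR quality is character error rate (CER): the edit distance between the OCR output and a hand-checked transcription, divided by the transcription length. The sketch below assumes a small set of manually transcribed sample pages is available; the function names, the sample strings, and the 5% threshold are illustrative assumptions, not part of the original answer.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)


def pages_needing_review(samples, threshold=0.05):
    """Return ids of pages whose CER exceeds the (assumed) threshold.

    samples: iterable of (page_id, reference_text, ocr_text) tuples.
    """
    return [page_id for page_id, ref, ocr in samples
            if character_error_rate(ref, ocr) > threshold]


# Made-up sample pages: the second one has typical OCR confusions.
samples = [
    ("p1", "Total revenue 2023", "Total revenue 2023"),
    ("p2", "Total revenue 2023", "Tota1 rcvenue 2O23"),
]
flagged = pages_needing_review(samples)
print(flagged)
```

Pages that land above the threshold are candidates for re-scanning, a different OCR engine, or exclusion from the index, since garbled text degrades retrieval before the generator ever sees it.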
