Instruction: Define an n-gram and provide examples of its use in NLP.
Context: This question checks the candidate's understanding of a basic concept in NLP that is crucial for tasks like text prediction and classification.
The way I'd explain it in an interview is this: An n-gram is a contiguous sequence of n tokens, such as words or characters. A unigram has one token, a bigram has two, and a trigram has three.
N-grams are useful because they capture local context. For example, a bigram model captures that "New" and "York" frequently occur together, distinguishing the phrase "New York" from the two words in isolation. They are simple but still valuable in language modeling, search, feature engineering, and text classification.
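To make the definition concrete, here is a minimal sketch of n-gram extraction in Python; the function name `ngrams` and the sample sentence are illustrative, not part of any particular library:

```python
def ngrams(tokens, n):
    """Return every contiguous sequence of n tokens as a tuple."""
    # Slide a window of size n across the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I live in New York".split()
print(ngrams(tokens, 1))  # unigrams: one token each
print(ngrams(tokens, 2))  # bigrams: includes ('New', 'York') as a unit
print(ngrams(tokens, 3))  # trigrams: three-token windows
```

Note the trade-off this makes visible: larger n captures more context but produces fewer, sparser sequences from the same text, which is why bigrams and trigrams are the most common choices in practice.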
What matters in an interview is not only knowing the definition, but being able to connect it back to how it changes modeling, evaluation, or deployment decisions in practice.
A weak answer defines n-grams as groups of words without explaining why sequence length and local context matter.