Instruction: Define an n-gram and provide examples of its use in NLP.
Context: This question checks the candidate's understanding of a basic concept in NLP that is crucial for tasks like text prediction and classification.
The way I'd explain it in an interview is this: An n-gram is a contiguous sequence of n tokens, such as words or characters. A unigram has one token, a bigram has two, and a trigram has three.
N-grams are useful because they capture local context. For example, a bigram model captures that "New" and "York" frequently occur together, distinguishing the phrase "New York" from the two words in isolation. They are simple but still valuable in language modeling, search, feature engineering, and text classification.
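To make the definition concrete, here is a minimal sketch of n-gram extraction in Python; the function name `ngrams` and the sample sentence are illustrative, not part of any particular library:

```python
def ngrams(tokens, n):
    """Return every contiguous sequence of n tokens as a tuple."""
    # Slide a window of size n across the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I live in New York".split()
print(ngrams(tokens, 1))  # unigrams: one token each
print(ngrams(tokens, 2))  # bigrams: includes ('New', 'York') as a unit
print(ngrams(tokens, 3))  # trigrams: three-token windows
```

Note the trade-off this makes visible: larger n captures more context but produces fewer, sparser sequences from the same text, which is why bigrams and trigrams are the most common choices in practice.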
What matters in an interview is not only knowing the definition, but being able to connect it back to how it changes modeling, evaluation, or deployment decisions in practice.
A weak answer defines n-grams as groups of words without explaining why sequence length and local context matter.