Instruction: Compare and contrast these two variations of gradient descent, including their advantages and disadvantages.
Context: This question aims to assess the candidate's understanding of nuanced optimization algorithms and their practical implications in training machine learning models.
The way I'd explain it in an interview is this: Stochastic gradient descent (SGD) updates the model using one example at a time, which makes it noisy but very responsive. Mini-batch gradient descent averages the gradient over a small batch of examples per update, which smooths out much of that noise while still updating far more often than full-batch gradient descent. SGD's advantages are very cheap per-update cost and noise that can help escape shallow local minima and saddle points; its disadvantages are a jagged loss curve, slower convergence near the optimum, and poor use of vectorized hardware. Mini-batch gradient descent's advantages are smoother, more stable convergence and efficient use of GPUs and vectorized operations; its main disadvantages are an extra hyperparameter to tune (the batch size) and the observation that very large batches can sometimes hurt generalization. In practice, mini-batch is the default, and "SGD" in most deep learning libraries actually refers to the mini-batch variant.
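The contrast above can be sketched in a few lines of code. This is a minimal, hypothetical illustration on a one-parameter linear regression (the data, learning rates, and batch size are assumptions chosen for the demo, not prescribed values): the only structural difference between the two optimizers is how many examples feed each gradient step.

```python
import numpy as np

# Hypothetical toy data for a 1-D linear regression y ~ 3x.
rng = np.random.default_rng(0)
X = rng.standard_normal(200)
y = 3.0 * X + 0.1 * rng.standard_normal(200)

def grad(w, xb, yb):
    """Gradient of mean squared error mean((w*x - y)^2) w.r.t. w."""
    return 2.0 * np.mean((w * xb - yb) * xb)

def sgd(w=0.0, lr=0.05, epochs=5):
    """Stochastic GD: one example per update -- noisy, frequent, cheap steps."""
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            w -= lr * grad(w, X[i:i + 1], y[i:i + 1])
    return w

def minibatch_gd(w=0.0, lr=0.1, epochs=5, batch_size=32):
    """Mini-batch GD: average the gradient over a small batch -- smoother steps."""
    for _ in range(epochs):
        idx = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            b = idx[start:start + batch_size]
            w -= lr * grad(w, X[b], y[b])
    return w

print(sgd(), minibatch_gd())  # both estimates approach the true slope of 3.0
```

Note the trade-off visible even here: `sgd` performs 200 updates per epoch on single examples, while `minibatch_gd` performs about 7 updates per epoch, each averaged over 32 examples and therefore much less noisy (and, on real hardware, vectorized).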