Instruction: Compare and contrast these two variations of gradient descent, including their advantages and disadvantages.
Context: This question aims to assess the candidate's understanding of nuanced optimization algorithms and their practical implications in training machine learning models.
The way I'd explain it in an interview is this: Stochastic gradient descent (SGD) updates the model using one example at a time, which makes it noisy but very responsive. Mini-batch gradient descent averages the gradient over a small batch of examples per update, which smooths out much of that noise while still updating far more often than full-batch gradient descent. SGD's advantages are very cheap per-update cost and noise that can help escape shallow local minima and saddle points; its disadvantages are a jagged loss curve, slower convergence near the optimum, and poor use of vectorized hardware. Mini-batch gradient descent's advantages are smoother, more stable convergence and efficient use of GPUs and vectorized operations; its main disadvantages are an extra hyperparameter to tune (the batch size) and the observation that very large batches can sometimes hurt generalization. In practice, mini-batch is the default, and "SGD" in most deep learning libraries actually refers to the mini-batch variant.
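The contrast above can be sketched in a few lines of code. This is a minimal, hypothetical illustration on a one-parameter linear regression (the data, learning rates, and batch size are assumptions chosen for the demo, not prescribed values): the only structural difference between the two optimizers is how many examples feed each gradient step.

```python
import numpy as np

# Hypothetical toy data for a 1-D linear regression y ~ 3x.
rng = np.random.default_rng(0)
X = rng.standard_normal(200)
y = 3.0 * X + 0.1 * rng.standard_normal(200)

def grad(w, xb, yb):
    """Gradient of mean squared error mean((w*x - y)^2) w.r.t. w."""
    return 2.0 * np.mean((w * xb - yb) * xb)

def sgd(w=0.0, lr=0.05, epochs=5):
    """Stochastic GD: one example per update -- noisy, frequent, cheap steps."""
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            w -= lr * grad(w, X[i:i + 1], y[i:i + 1])
    return w

def minibatch_gd(w=0.0, lr=0.1, epochs=5, batch_size=32):
    """Mini-batch GD: average the gradient over a small batch -- smoother steps."""
    for _ in range(epochs):
        idx = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            b = idx[start:start + batch_size]
            w -= lr * grad(w, X[b], y[b])
    return w

print(sgd(), minibatch_gd())  # both estimates approach the true slope of 3.0
```

Note the trade-off visible even here: `sgd` performs 200 updates per epoch on single examples, while `minibatch_gd` performs about 7 updates per epoch, each averaged over 32 examples and therefore much less noisy (and, on real hardware, vectorized).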