In Week 1#Gradient Descent, we learned Batch Gradient Descent, which uses all of the training examples for every single update. Computing the derivative term $\frac{\partial}{\partial\theta_j}J(\theta)$ is expensive, because every step sums the differences over all $m$ samples.
Stochastic Gradient Descent defines the cost function slightly differently, per example: $$\text{cost}(\theta, (x^{(i)}, y^{(i)})) = \frac{1}{2}(h_{\theta}(x^{(i)}) - y^{(i)})^2$$ The overall cost function is then $$J_{\text{train}}(\theta) = \frac{1}{m} \sum_{i=1}^m \text{cost}(\theta, (x^{(i)}, y^{(i)}))$$ which is equivalent to the Batch Gradient Descent cost function.
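The two cost definitions above can be sketched in code. This is a minimal illustration, assuming a linear hypothesis $h_\theta(x) = \theta^\top x$; the function names are made up for the example:

```python
import numpy as np

def cost_single(theta, x_i, y_i):
    """Per-example cost: (1/2) * (h_theta(x_i) - y_i)^2."""
    return 0.5 * (x_i @ theta - y_i) ** 2

def J_train(theta, X, y):
    """Overall cost J_train: average of the per-example costs over all m examples."""
    m = X.shape[0]
    return sum(cost_single(theta, X[i], y[i]) for i in range(m)) / m

# Tiny check: with theta = [0] all predictions are 0,
# so J_train = mean(0.5 * y^2) = (2 + 8 + 18) / 3.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
print(J_train(np.zeros(1), X, y))  # → 9.333...
```

Averaging the per-example costs recovers exactly the batch cost function, which is why the two formulations optimize the same objective.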
The steps are:
1. Randomly shuffle the training examples.
2. For $i = 1, \dots, m$: update $\theta_j := \theta_j - \alpha\,(h_{\theta}(x^{(i)}) - y^{(i)})\,x_j^{(i)}$ for every $j$.

In Batch Gradient Descent, the derivative term is $\frac{1}{m} \sum\limits_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}$: we sum all the differences before making one update. In Stochastic Gradient Descent, we instead use one example at a time, $(h_{\theta}(x^{(i)}) - y^{(i)})x_j^{(i)}$, looping over all $m$ examples.
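The steps above can be sketched as follows. This is a minimal Stochastic Gradient Descent loop for linear regression, assuming $h_\theta(x) = \theta^\top x$; the function name and hyperparameter values are illustrative, not from the original notes:

```python
import numpy as np

def sgd_linear_regression(X, y, alpha=0.01, epochs=100, seed=0):
    """Stochastic Gradient Descent for linear regression.

    Performs one parameter update per training example, instead of
    summing over all m examples before each update as in batch GD.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        for i in rng.permutation(m):       # step 1: randomly shuffle the examples
            error = X[i] @ theta - y[i]    # h_theta(x^(i)) - y^(i)
            theta -= alpha * error * X[i]  # step 2: update using this one example
    return theta

# Fit y = 2x: theta should converge near [2.0].
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
print(sgd_linear_regression(X, y))
```

Note that each inner-loop iteration touches only a single example, so one pass over the data makes $m$ cheap updates rather than one expensive one.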
Comparison with Batch Gradient Descent