The hypothesis is hθ(x) = θ0 + θ1x, and this means y is modelled as a linear function of x, with parameters θ0 (the intercept) and θ1 (the slope).
This kind of function is a linear regression with one variable, also called univariate linear regression.
A cost function lets us figure out how to fit the best straight line to our data
Choosing values for θ0 and θ1 (the parameters)
We want to choose the parameters so that hθ(x), given the xs in our training set, produces outputs that are as close to the actual ys as possible
To formalize this: we want to choose θ0 and θ1 to minimize the cost function
J(θ0, θ1) = (1/2m) · Σ from i=1 to m of (hθ(x^(i)) − y^(i))²
i.e. minimize the average squared difference between the prediction hθ(x^(i)) and the actual value y^(i) over the m training examples
To summarize what the cost function does:
Hypothesis - is like your prediction machine: throw in an x value, get a predicted y value out
Cost - is a way to, using your training data, determine values for your θs which make the hypothesis as accurate as possible
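As a minimal sketch of these two pieces (assuming NumPy and a tiny made-up training set), the hypothesis and the squared error cost could look like this:

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x: the straight-line prediction."""
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    """Squared error cost J(theta0, theta1) = (1/2m) * sum((h(x_i) - y_i)^2)."""
    m = len(y)
    errors = hypothesis(theta0, theta1, x) - y
    return np.sum(errors ** 2) / (2 * m)

# Hypothetical training set that lies exactly on the line y = 0.5 + 1.0 * x
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([1.5, 2.5, 3.5])
print(cost(0.5, 1.0, x_train, y_train))  # 0.0: a perfect fit gives zero cost
print(cost(0.0, 1.0, x_train, y_train))  # 0.125: worse parameters give a higher cost
```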
What does this all mean? It means we repeatedly update each parameter: θj := θj − α · ∂/∂θj J(θ0, θ1), for j = 0 and j = 1, until convergence
Derivative term: ∂/∂θj J(θ0, θ1), the slope of the cost function with respect to θj; we will derive it later
How this gradient descent algorithm is implemented
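The key implementation detail is that θ0 and θ1 must be updated simultaneously: compute both new values from the old parameters before assigning either. A minimal sketch of that pattern (g0 and g1 are placeholders for the partial derivatives, which are derived further below):

```python
def gradient_descent_step(theta0, theta1, g0, g1, alpha):
    """One simultaneous gradient descent update.

    g0, g1: partial derivatives of J with respect to theta0 and theta1,
            both evaluated at the OLD (theta0, theta1).
    """
    temp0 = theta0 - alpha * g0
    temp1 = theta1 - alpha * g1
    return temp0, temp1  # assign both only after both have been computed
```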
Understanding the algorithm
To understand the algorithm better, go back to a simpler function where we minimize just one parameter: min over θ1 of J(θ1), where θ1 is a real number
Two key terms in the algorithm
Alpha term (α): the learning rate, which controls how big a step we take on each update
How does gradient descent converge with a fixed step size α?
The intuition behind the convergence is that the derivative term, d/dθ1 J(θ1), approaches 0 as we approach the bottom of our convex function. At the minimum the derivative is exactly 0, and thus we get θ1 := θ1 − α·0, i.e. θ1 := θ1, so the parameter stops changing. Even with a fixed α, the steps automatically get smaller as the derivative shrinks near the minimum.
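A tiny sketch of this effect on a made-up convex function, J(θ1) = (θ1 − 3)², whose derivative is 2(θ1 − 3); the learning rate α is fixed, yet the steps shrink on their own because the derivative shrinks:

```python
theta1 = 10.0   # arbitrary starting point
alpha = 0.1     # fixed learning rate, never changed

for _ in range(50):
    derivative = 2 * (theta1 - 3)          # d/dtheta1 of (theta1 - 3)^2
    theta1 = theta1 - alpha * derivative   # step size = alpha * derivative

print(theta1)  # ~3.0: converged to the minimum even though alpha stayed fixed
```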
Apply gradient descent to minimize the squared error cost function
Now we have a partial derivative: ∂/∂θj J(θ0, θ1) = ∂/∂θj [ (1/2m) · Σ from i=1 to m of (hθ(x^(i)) − y^(i))² ]
So we need to determine the derivative for each parameter:
For j = 0: ∂/∂θ0 J(θ0, θ1) = (1/m) · Σ from i=1 to m of (hθ(x^(i)) − y^(i))
For j = 1: ∂/∂θ1 J(θ0, θ1) = (1/m) · Σ from i=1 to m of (hθ(x^(i)) − y^(i)) · x^(i)
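Putting these derivatives into the update rule gives batch gradient descent for linear regression. A minimal sketch, assuming NumPy and hypothetical data and learning-rate values:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iterations=2000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x with the squared error cost."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = (theta0 + theta1 * x) - y   # h_theta(x^(i)) - y^(i) for every example
        grad0 = errors.sum() / m             # (1/m) * sum of errors           (j = 0)
        grad1 = (errors * x).sum() / m       # (1/m) * sum of errors * x^(i)   (j = 1)
        # Simultaneous update of both parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Hypothetical data lying roughly on the line y = 1 + 2x
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([1.1, 2.9, 5.2, 6.8])
print(gradient_descent(x_train, y_train))  # roughly (1.1, 1.9): close to the underlying line
```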
How does it work?
Matrix: a rectangular array of numbers, written between square brackets.
Dimension of matrix: number of rows x number of columns
Aij = the “i, j entry”: the value in the i-th row and j-th column.
Vector: An n x 1 matrix.
By convention, uppercase letters are used for matrices and lowercase letters for vectors.
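A quick sketch of these definitions in NumPy (the specific numbers are just illustrative); note that the Aij notation above is 1-indexed, while NumPy indexes from 0:

```python
import numpy as np

A = np.array([[1402,  191],
              [1371,  821],
              [ 949, 1437],
              [ 147, 1448]])  # a 4 x 2 matrix: 4 rows, 2 columns

print(A.shape)   # (4, 2) -- dimension = rows x columns
print(A[0, 1])   # 191 -- the "1,2 entry": row 1, column 2 in the 1-indexed notation

y = np.array([[460], [232], [315]])  # an n x 1 matrix (n = 3), i.e. a vector
print(y.shape)   # (3, 1)
```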