Unit 2: Derivatives of multivariable functions

Partial Derivative and Gradient

Introduction to partial derivatives

  • For a multivariable function like $f(x, y) = x^2 y$, computing partial derivatives looks something like this: $\frac{\partial f}{\partial x} = 2xy$ and $\frac{\partial f}{\partial y} = x^2$ (a quick SymPy check follows this list).
  • The symbol $\partial$, called “del”, is used to distinguish partial derivatives from ordinary single-variable derivatives.
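A minimal SymPy sketch (not part of the original notes; the function is just the example above) for checking these partial derivatives:

```python
# Compute the partial derivatives of f(x, y) = x**2 * y symbolically.
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y

df_dx = sp.diff(f, x)  # treat y as a constant -> 2*x*y
df_dy = sp.diff(f, y)  # treat x as a constant -> x**2

print(df_dx)  # 2*x*y
print(df_dy)  # x**2
```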

Formal Definition

  • $$\frac{\partial f}{\color{green}{\partial x}}(x_0, y_0) = \lim_{h \to 0} \frac{ f(x_0 \color{green}{+ h}, y_0) - f(x_0, y_0) }{ \color{green}{h} }$$

| Symbol | Informal understanding | Formal understanding |
| --- | --- | --- |
| $\partial x$ | A tiny nudge in the $x$ direction. | A limiting variable $h$ which goes to $0$ and is added to the first component of the function’s input. |
| $\partial f$ | The resulting change in the output of $f$ after the nudge. | The difference between $f(x_0 + h, y_0)$ and $f(x_0, y_0)$, taken in the same limit as $h \to 0$. |
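A minimal numeric sketch of the limit definition above (the helper and test values are illustrative, not from the notes): nudge only the first input by a small $h$ and divide by $h$.

```python
# Approximate the partial derivative with respect to x at (x0, y0)
# straight from the limit definition, using a small finite h.
def partial_x(f, x0, y0, h=1e-6):
    return (f(x0 + h, y0) - f(x0, y0)) / h

f = lambda x, y: x**2 * y
print(partial_x(f, 3.0, 2.0))  # ~ 12.0, matching df/dx = 2*x*y at (3, 2)
```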

Second partial derivatives

  • Notation: $\frac{\partial^2 f}{\partial x^2} = f_{xx}$, and $\frac{\partial^2 f}{\partial y \, \partial x} = (f_x)_y = f_{xy}$.
  • The second partial derivatives which involve multiple distinct input variables, such as $f_{\color{red}{y}\color{blue}{x}}$ and $f_{\color{blue}{x}\color{red}{y}}$, are called “mixed partial derivatives”.

Symmetry of second derivatives

  • The two mixed partial derivatives are equal: $f_{xy} = f_{yx}$.
  • This is Schwarz’s theorem (also called Clairaut’s theorem), which states that the symmetry of second derivatives always holds at a point if the second partial derivatives are continuous around that point.
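A minimal SymPy sketch of this symmetry (the example function is an arbitrary smooth choice, not from the notes):

```python
# Check that the two mixed partial derivatives agree for a smooth function.
import sympy as sp

x, y = sp.symbols('x y')
f = sp.sin(x * y) + x**3 * y**2

f_xy = sp.diff(f, x, y)  # differentiate w.r.t. x first, then y
f_yx = sp.diff(f, y, x)  # differentiate w.r.t. y first, then x

print(sp.simplify(f_xy - f_yx))  # 0 -> Schwarz/Clairaut symmetry
```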

Higher order derivatives

  • The order of differentiation is indicated by the order of the terms in the denominator, read from right to left; for example, $\frac{\partial^3 f}{\partial z \, \partial y \, \partial x}$ means differentiate with respect to $x$ first, then $y$, then $z$.
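A small SymPy sketch of this right-to-left reading (the example function is illustrative only): `sp.diff(f, x, x, y)` differentiates with respect to $x$ twice and then $y$, i.e. $\frac{\partial^3 f}{\partial y \, \partial x \, \partial x}$.

```python
# Higher-order derivative: the argument order in diff() is the order of
# differentiation, matching the denominator read from right to left.
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 * sp.exp(y)

third = sp.diff(f, x, x, y)  # x first (twice), then y
print(third)                 # 6*x*exp(y)
```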

The gradient

  • The gradient of a function $f$, denoted $\nabla f$, packages all of its partial derivatives into a vector, e.g. $\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix}$ for a two-variable function (see the sketch after this list).

  • The most important thing to remember about the gradient:

    • The gradient of $f$, evaluated at an input $(x_0, y_0)$, points in the direction of steepest ascent.
    • The gradient is perpendicular to contour lines.
  • Example differential operators
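A minimal SymPy sketch of building and evaluating the gradient vector (the function and the evaluation point are illustrative choices, not from the notes):

```python
# Build grad(f) for f(x, y) = x**2 * y and evaluate it at a point; the
# resulting vector is the direction of steepest ascent at that point.
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y

grad_f = sp.Matrix([sp.diff(f, x), sp.diff(f, y)])  # [2*x*y, x**2]
print(grad_f.subs({x: 1, y: 2}))                    # Matrix([[4], [1]])
```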

Directional derivatives

  • If you have a multivariable function $f(x, y)$ and a vector $\vec{\textbf{v}}$ in the function’s input space, the directional derivative of $f$ along $\vec{\textbf{v}}$ tells you the rate at which $f$ changes while the input moves with velocity vector $\vec{\textbf{v}}$.
  • The notation is $\nabla_{\vec{\textbf{v}}} f$, and it is computed by taking the dot product between the gradient of $f$ and the vector $\vec{\textbf{v}}$, that is, $\nabla f \cdot \vec{\textbf{v}}$ (see the sketch after this list).
  • Remember: if the directional derivative is used to compute slope, either $\vec{\textbf{v}}$ must be a unit vector or you must remember to divide by $\lVert \vec{\textbf{v}} \rVert$ at the end.
    • This is because the slope of a graph in the direction of $\vec{\textbf{v}}$ depends only on the direction of $\vec{\textbf{v}}$, not on the magnitude $\lVert \vec{\textbf{v}} \rVert$.
  • Alternate definition of directional derivative: $$\nabla_{ \vec{ \textbf{v} } } f = \lim_{h \to 0} \frac{ f(x + h \vec{ \textbf{v} }) - f(x) }{ h \color{green}{\lVert \vec{ \textbf{v} } \rVert} }$$
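A minimal NumPy sketch of the dot-product formula (the example function, point, and vector are illustrative only): compute $\nabla f \cdot \vec{\textbf{v}}$, and divide by $\lVert \vec{\textbf{v}} \rVert$ when a slope is wanted.

```python
# Directional derivative of f along v via the dot product with grad(f).
import numpy as np

def grad_f(x, y):
    # gradient of the example function f(x, y) = x**2 * y
    return np.array([2 * x * y, x ** 2])

v = np.array([3.0, 4.0])
g = grad_f(1.0, 2.0)                # [4.0, 1.0]

rate = g @ v                        # nabla_v f = grad(f) . v      -> 16.0
slope = g @ v / np.linalg.norm(v)   # slope in the direction of v  -> 3.2
print(rate, slope)
```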

Why does the gradient point in the direction of steepest ascent?

  • $$\nabla_{\hat{u}} f(x_0, y_0) = \underbrace{ \hat{u} \cdot \nabla f(x_0, y_0) }_{ \text{Maximize this quantity} }$$
    • This is the dot product of two vectors.
  • And the Cauchy–Schwarz inequality tells us:
    • Let $x, y \in \mathbb{R}^n$; then $|x \cdot y| \le \lVert x \rVert \lVert y \rVert$.
    • Moreover, $|x \cdot y| = \lVert x \rVert \lVert y \rVert$ if and only if $x = cy$ for some $c \in \mathbb{R}$.
  • So the unit vector $\hat{u}$ that maximizes this dot product, i.e. the direction of steepest ascent, is the unit vector pointing in the direction of $\nabla f(x_0, y_0)$.
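A minimal numeric sanity check of this argument (the gradient value is an arbitrary illustrative vector): among many unit vectors $\hat{u}$, the dot product $\hat{u} \cdot \nabla f$ is largest for the one pointing along the gradient.

```python
# Scan unit vectors around the circle and find the one maximizing u . g.
import numpy as np

g = np.array([4.0, 1.0])                                    # grad f at some point
angles = np.linspace(0, 2 * np.pi, 1000)
units = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # unit vectors

best = units[np.argmax(units @ g)]          # direction maximizing u . g
print(best, g / np.linalg.norm(g))          # both ~ [0.970, 0.243]
```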

Differentiating vector-valued functions

Derivatives of vector-valued functions

  • $$\frac{d}{dt}\begin{bmatrix} x(t) \\ y(t) \end{bmatrix} = \begin{bmatrix} x'(t) \\ y'(t) \end{bmatrix}$$
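A minimal SymPy sketch of this component-wise rule (the curve $\vec{r}(t) = [\cos t, \sin t]$ is just an illustrative choice):

```python
# Differentiate a parametric (vector-valued) curve component by component.
import sympy as sp

t = sp.symbols('t')
r = sp.Matrix([sp.cos(t), sp.sin(t)])

print(r.diff(t))  # Matrix([[-sin(t)], [cos(t)]])
```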

Curvature

Multivariable chain rule, simple version

Partial derivatives of parametric surfaces

Words

  • nudge [nʌdʒ] n. a gentle push (with the elbow); a persistent nagger; vt. to push gently; to prod along; to nag at; vi. to give a gentle push; to edge forward; to nag
  • parametrization [pə,ræmitrai’zeiʃən, -tri’z-] n. (math) parametrization; the act of expressing something in terms of parameters; (computing) parameterization
  • parallelogram [,pærə’leləɡræm] n. parallelogram (a quadrilateral with two pairs of parallel sides)
  • magnitude ['mæɡnitju:d] n. size; order of magnitude; (seismology) magnitude of an earthquake; importance; (astronomy) brightness