In mathematics, the Hessian matrix (or simply the Hessian) is a square matrix of second-order partial derivatives of a scalar-valued function, or scalar field. It describes the local curvature of a function of several variables. Hessian matrices are often used in optimization problems, notably in Newton's method (also called the Newton-Raphson method). For a function $f$ of the variables $x, y, z, \ldots$, the Hessian is:
$$
\mathbf{H} f=\left[ \begin{array}{cccc}
\frac{\partial^{2} f}{\partial x^{2}} & \frac{\partial^{2} f}{\partial x \partial y} & \frac{\partial^{2} f}{\partial x \partial z} & \cdots \\
\frac{\partial^{2} f}{\partial y \partial x} & \frac{\partial^{2} f}{\partial y^{2}} & \frac{\partial^{2} f}{\partial y \partial z} & \cdots \\
\frac{\partial^{2} f}{\partial z \partial x} & \frac{\partial^{2} f}{\partial z \partial y} & \frac{\partial^{2} f}{\partial z^{2}} & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{array}\right]
$$
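To make the optimization use mentioned in the introduction concrete, here is one common way the Newton step is written. This is a sketch that is not part of the original text: the iterate symbol $x_k$ is introduced here for illustration, and the formula assumes the Hessian is invertible at $x_k$.

$$
x_{k+1} = x_{k} - \left[\mathbf{H} f(x_{k})\right]^{-1} \nabla f(x_{k})
$$

In this form the Hessian plays the role that the second derivative plays in the one-variable Newton-Raphson update.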
Example 1: Computing a Hessian
Problem: Compute the Hessian of $f(x, y)=x^{3}-2 x y-y^{6}$.
Solution:
First compute both partial derivatives:
$$
f_{x}(x, y)=\frac{\partial}{\partial x}\left(x^{3}-2 x y-y^{6}\right)=3 x^{2}-2 y
$$

$$
f_{y}(x, y)=\frac{\partial}{\partial y}\left(x^{3}-2 x y-y^{6}\right)=-2 x-6 y^{5}
$$
With these, we compute all four second partial derivatives:
$$
f_{x x}(x, y)=\frac{\partial}{\partial x}\left(3 x^{2}-2 y\right)=6 x
$$

$$
f_{x y}(x, y)=\frac{\partial}{\partial y}\left(3 x^{2}-2 y\right)=-2
$$

$$
f_{y x}(x, y)=\frac{\partial}{\partial x}\left(-2 x-6 y^{5}\right)=-2
$$

$$
f_{y y}(x, y)=\frac{\partial}{\partial y}\left(-2 x-6 y^{5}\right)=-30 y^{4}
$$
The Hessian matrix in this case is a $2 \times 2$ matrix with these functions as entries:

$$
\mathbf{H} f(x, y)=\left[ \begin{array}{cc}
f_{x x}(x, y) & f_{x y}(x, y) \\
f_{y x}(x, y) & f_{y y}(x, y)
\end{array}\right]=\left[ \begin{array}{cc}
6 x & -2 \\
-2 & -30 y^{4}
\end{array}\right]
$$
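As a sanity check on the result of Example 1, the sketch below compares the derived Hessian with a central finite-difference approximation. It uses only the Python standard library. The helper names (`analytic_hessian`, `numeric_hessian`), the step size `h`, and the test point $(1, 2)$ are assumptions chosen for illustration, not part of the original example.

```python
def f(x, y):
    """Function from Example 1: f(x, y) = x^3 - 2xy - y^6."""
    return x**3 - 2 * x * y - y**6


def analytic_hessian(x, y):
    """Hessian derived in Example 1: [[6x, -2], [-2, -30y^4]]."""
    return [[6 * x, -2.0], [-2.0, -30 * y**4]]


def numeric_hessian(func, point, h=1e-4):
    """Approximate the Hessian of func at point with central differences.

    The step size h is an illustrative choice, not a tuned value.
    """
    n = len(point)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Evaluate func with coordinates i and j moved by +/- h.
                p = list(point)
                p[i] += di * h
                p[j] += dj * h
                return func(*p)

            # Four-point stencil for the second partial derivative
            # with respect to coordinates i and j.
            hess[i][j] = (
                shifted(1, 1) - shifted(1, -1)
                - shifted(-1, 1) + shifted(-1, -1)
            ) / (4 * h * h)
    return hess


point = (1.0, 2.0)
print("analytic:", analytic_hessian(*point))
print("numeric: ", numeric_hessian(f, point))
```

At the point $(1, 2)$ the formula gives the entries $6$, $-2$, $-2$ and $-480$; the numerical approximation should agree with these up to a small discretization error.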
Example 2
Problem: Consider the function $f(x)=x^{\top} A x+b^{\top} x+c$, where $A$ is an $n \times n$ matrix, $b$ is a vector of length $n$, and $c$ is a constant.
Determine the gradient of $f$: $\nabla f(x)$.
Determine the Hessian of $f$: $H_{f}(x)$.
Solution:
Compute the gradient $\nabla f(x)$, writing every term as a column vector (so the row vector $x^{\top} A$ is transposed to $A^{\top} x$):

$$
\begin{aligned}
\nabla f(x)&=\underbrace{\frac{\partial x^{\top}}{\partial x}\cdot (Ax)+x^{\top}\cdot \frac{\partial (Ax)}{\partial x}}_{\text{product rule}}+\frac{\partial b^{\top}x}{\partial x}+\frac{\partial c}{\partial x}\\
&= Ax + (x^{\top} A)^{\top}+b \\
&= Ax + A^{\top} x + b \\
&= (A+A^{\top})x + b
\end{aligned}
$$
Compute the Hessian $H_{f}(x)$:

$$
H_{f}(x) = \frac{\partial \nabla f(x)}{\partial x} = A + A^{\top}
$$
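The two results of Example 2 can be checked numerically. The sketch below, again using only the Python standard library, builds the quadratic function from a small non-symmetric matrix and compares the formulas $(A+A^{\top})x+b$ and $A+A^{\top}$ with finite-difference approximations. The concrete values of `A`, `b`, `c` and `x`, the helper names, and the step size `h` are all assumptions made for illustration.

```python
def quadratic(A, b, c):
    """Return f(x) = x^T A x + b^T x + c for list-based A, b and scalar c."""
    n = len(b)

    def f(x):
        xAx = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
        bx = sum(b[i] * x[i] for i in range(n))
        return xAx + bx + c

    return f


def gradient_formula(A, b, x):
    """Gradient from Example 2: (A + A^T) x + b."""
    n = len(b)
    return [
        sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) + b[i]
        for i in range(n)
    ]


def hessian_formula(A):
    """Hessian from Example 2: A + A^T (independent of x)."""
    n = len(A)
    return [[A[i][j] + A[j][i] for j in range(n)] for i in range(n)]


def numeric_gradient(f, x, h=1e-3):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def numeric_hessian(f, x, h=1e-3):
    """Central-difference approximation of the Hessian of f at x."""
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up, down = list(x), list(x)
        up[j] += h
        down[j] -= h
        g_up = numeric_gradient(f, up, h)
        g_down = numeric_gradient(f, down, h)
        for i in range(n):
            # Column j holds the derivative of the gradient along x_j.
            hess[i][j] = (g_up[i] - g_down[i]) / (2 * h)
    return hess


# Illustrative values (assumptions); A is deliberately not symmetric.
A = [[1, 2], [0, 3]]
b = [1, -1]
c = 5
x = [0.5, -2.0]

f = quadratic(A, b, c)
print("gradient formula:", gradient_formula(A, b, x))
print("gradient numeric:", numeric_gradient(f, x))
print("Hessian formula: ", hessian_formula(A))
print("Hessian numeric: ", numeric_hessian(f, x))
```

Because `A` is not symmetric here, the check distinguishes $A + A^{\top}$ from $2A$; the two coincide only when $A$ is symmetric.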