In mathematics, the Hessian matrix or Hessian is a square matrix of second-order partial derivatives of a scalar-valued function, or scalar field. It describes the local curvature of a function of many variables. Hessian matrices are often used in optimization, for example in Newton's method (also called the Newton-Raphson method).

$$\mathbf{H} f=\left[ \begin{array}{cccc} \frac{\partial^{2} f}{\partial x^{2}} & \frac{\partial^{2} f}{\partial x \partial y} & \frac{\partial^{2} f}{\partial x \partial z} & \cdots \\ \frac{\partial^{2} f}{\partial y \partial x} & \frac{\partial^{2} f}{\partial y^{2}} & \frac{\partial^{2} f}{\partial y \partial z} & \cdots \\ \frac{\partial^{2} f}{\partial z \partial x} & \frac{\partial^{2} f}{\partial z \partial y} & \frac{\partial^{2} f}{\partial z^{2}} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{array}\right]$$

Example 1: Computing a Hessian

Problem: Compute the Hessian of $f(x, y)=x^{3}-2 x y-y^{6}$.


First compute both partial derivatives:

$$f_{x}(x, y)=\frac{\partial}{\partial x}\left(x^{3}-2 x y-y^{6}\right)=3 x^{2}-2 y$$

$$f_{y}(x, y)=\frac{\partial}{\partial y}\left(x^{3}-2 x y-y^{6}\right)=-2 x-6 y^{5}$$

With these, we compute all four second partial derivatives:

$$f_{x x}(x, y)=\frac{\partial}{\partial x}\left(3 x^{2}-2 y\right)=6 x$$

$$f_{x y}(x, y)=\frac{\partial}{\partial y}\left(3 x^{2}-2 y\right)=-2$$

$$f_{y x}(x, y)=\frac{\partial}{\partial x}\left(-2 x-6 y^{5}\right)=-2$$

$$f_{y y}(x, y)=\frac{\partial}{\partial y}\left(-2 x-6 y^{5}\right)=-30 y^{4}$$

The Hessian matrix in this case is a $2 \times 2$ matrix with these functions as entries:

$$\mathbf{H} f(x, y)=\left[ \begin{array}{cc} f_{x x}(x, y) & f_{x y}(x, y) \\ f_{y x}(x, y) & f_{y y}(x, y) \end{array}\right]=\left[ \begin{array}{cc} 6 x & -2 \\ -2 & -30 y^{4} \end{array}\right]$$
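The result of Example 1 can be sanity-checked numerically. The following is a minimal sketch, not part of the original example: it compares the analytic entries derived above with central finite-difference approximations. The function names `hessian_analytic` and `hessian_numeric`, the step size `h`, and the evaluation point $(1, 0.5)$ are all illustrative assumptions.

```python
def f(x, y):
    # The function from Example 1.
    return x**3 - 2 * x * y - y**6


def hessian_analytic(x, y):
    # Entries derived by hand in Example 1.
    return [[6 * x, -2.0],
            [-2.0, -30 * y**4]]


def hessian_numeric(func, x, y, h=1e-4):
    # Central second differences; h is an assumed step size.
    fxx = (func(x + h, y) - 2 * func(x, y) + func(x - h, y)) / h**2
    fyy = (func(x, y + h) - 2 * func(x, y) + func(x, y - h)) / h**2
    fxy = (func(x + h, y + h) - func(x + h, y - h)
           - func(x - h, y + h) + func(x - h, y - h)) / (4 * h**2)
    # The mixed partials agree for this smooth function, so fxy fills both slots.
    return [[fxx, fxy],
            [fxy, fyy]]


# Compare the two at an arbitrarily chosen point.
H_exact = hessian_analytic(1.0, 0.5)
H_approx = hessian_numeric(f, 1.0, 0.5)
for row_exact, row_approx in zip(H_exact, H_approx):
    print(row_exact, row_approx)
```

The two matrices should agree up to the finite-difference error, which depends on the chosen `h`.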

Example 2

Problem: Consider the function $f(x)=x^{\top} A x+b^{\top} x+c$, where $A$ is an $n \times n$ matrix, $b$ is a vector of length $n$, and $c$ is a constant.

  1. Determine the gradient of $f$: $\nabla f(x)$.
  2. Determine the Hessian of $f$: $H_{f}(x)$.


  1. Compute the gradient $\nabla f(x)$:

$$\begin{aligned} \nabla f(x)&=\underbrace{\frac{\partial x^{\top}}{\partial x}\cdot (Ax)+\left(x^{\top}\cdot \frac{\partial (Ax)}{\partial x}\right)^{\top}}_{\text{product rule}}+\frac{\partial (b^{\top}x)}{\partial x}+\frac{\partial c}{\partial x}\\ &= Ax + \left(x^{\top} A\right)^{\top}+b \\ &= Ax + A^{\top} x + b \\ &= (A+A^{\top})x + b \end{aligned}$$

Here the term $x^{\top} A$ is a row vector, so it is transposed to $A^{\top} x$ to match the column-vector form of the gradient. The constant $c$ contributes nothing.

  2. Compute the Hessian $H_{f}(x)$:

$$H_{f}(x) = \frac{\partial \nabla f(x)}{\partial x} = A + A^{\top}$$
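The closed forms from Example 2 can also be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original solution: the matrix `A` (deliberately non-symmetric), the vector `b`, the constant `c`, the point `x0`, the step size `h`, and all function names are made up for the check.

```python
def quad(A, b, c, x):
    # f(x) = x^T A x + b^T x + c, written out with explicit sums.
    n = len(x)
    quad_term = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    lin_term = sum(b[i] * x[i] for i in range(n))
    return quad_term + lin_term + c


def grad_analytic(A, b, x):
    # Closed form from Example 2: (A + A^T) x + b.
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) + b[i]
            for i in range(n)]


def hessian_analytic(A):
    # Closed form from Example 2: A + A^T (independent of x).
    n = len(A)
    return [[A[i][j] + A[j][i] for j in range(n)] for i in range(n)]


def grad_numeric(A, b, c, x, h=1e-5):
    # Central differences of f, one coordinate at a time.
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((quad(A, b, c, xp) - quad(A, b, c, xm)) / (2 * h))
    return g


# Hypothetical test data; A is chosen non-symmetric on purpose.
A = [[1.0, 2.0],
     [0.0, 3.0]]
b = [1.0, -1.0]
c = 4.0
x0 = [0.5, -2.0]

print("gradient (closed form):", grad_analytic(A, b, x0))
print("gradient (numeric):    ", grad_numeric(A, b, c, x0))
print("Hessian  (closed form):", hessian_analytic(A))
```

Choosing a non-symmetric `A` matters here: for symmetric $A$ the Hessian reduces to $2A$, which would hide a mistake such as dropping the transpose.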