Unconstrained Optimization

2.1. Unconstrained Optimization#

2.1.1. Definitions#

In this chapter, we begin our study of optimization algorithms by focusing on the unconstrained case, where the optimization variable can be chosen from the entire space \(\RR^d\). That is, \(\vw\) can be chosen from the entire \(\RR^d\) space.

The general form of an unconstrained optimization problem is:

\[\min_{\vw \in \RR^d} \quad f(\vw)\]

where \(\vw\) is the optimization variable; \(f: \RR^d \to \RR\) is the objective function; There are no explicit constraints on \(\vw\).

Let us begin by discussing the definition of local and global optimal solutions for problem

Definition 2.1 (Local and Global Optimal Solutions)

Let \(f: \RR^d \to \RR\) be a continuous function.

Local Minimum: \(\vw^*\) is a local minimizer if there exists \(\epsilon > 0\) such that:

\[ f(\vw) \ge f(\vw^*) \quad \forall \vw \in \RR^d \text{ with } \|\vw - \vw^*\| \le \epsilon \]
Global Minimum: \(\hat{\vw}\) is a global minimizer if:

\[ f(\vw) \ge f(\hat{\vw}) \quad \forall \vw \in \RR^d \]

Local optimality describes the function’s behavior in a neighborhood, while global optimality describes the best solution across the entire space.

Now let us talk about whether global or local optimal solution exists. Interestingly, even if the objective function is continuous, a global minimum may not be attained unless we impose additional structure. For example:

\[\inf_{x \in \RR} \exp(-|x|) = 0\]

But the minimum is never achieved because \(\exp(-|x|) \to 0\) as \(|x| \to \infty\).

A standard result from real analysis helps clarify when minimizers exist:

Theorem 2.1 (Weierstrass Theorem)

If \(f: \RR^d \to \RR\) is continuous and its level set, defined below:

\[\{ \vw \in \RR^d : f(\vw) \le f(\vw_0) \}\]

is compact (i.e., closed and bounded) for some point \(\vw_0\), then \(f\) achieves its minimum on \(\RR^d\).

2.1.2. Optimality Conditions#

Now we know when the solutions exist, let’s discuss how to identify them. Towards this end, we will discuss both sufficient and necessary conditions for a solution to be a local optimal solution.

Theorem 2.2 (First-order and second-order necessary condition)

If \(f\) is continuously differentiable, and let \(\vw^*\) be a local minimizer. Then the following holds:

\[\nabla f(\vw^*) = 0.\]

This first-order necessary condition defines a (first-order) stationary points. They are candidates for global/local minimum, maximum solutions, as well as saddle points.

Further, suppose that \(f\) is twice continuously differentiable. Suppose \(\vw^*\) is a local minimizer, then:

\[ \nabla f(\vw^*) = 0, \quad \nabla^2 f(\vw^*) \succeq 0. \]

This second-order necessary condition, together with the first-order condition, defines a (second-order) stationary points. They are candidates for global and local minimum solutions.

Proof (First-order Necessary Condition)

First, consider the scalar case where \(d=1\).

By local minimality, there exists \(\epsilon>0\) such that \(f(x)\ge f(x^*)\) whenever \(|x-x^*|\le \epsilon\).

For \(x\downarrow x^*\) (i.e., \(x>x^*\) and \(x\to x^*\)), we have \(f(x)-f(x^*)\ge 0\) and \(x-x^*>0\), hence

\[ \lim_{x\downarrow x^*}\frac{f(x)-f(x^*)}{x-x^*}\;\ge\;0. \]

For \(x\uparrow x^*\) (i.e., \(x<x^*\) and \(x\to x^*\)), we have \(f(x)-f(x^*)\ge 0\) and \(x-x^*<0\), hence

\[ \lim_{x\uparrow x^*}\frac{f(x)-f(x^*)}{x-x^*}\;\le\;0. \]

If \(f\) is differentiable at \(x^*\), the two one‑sided limits equal \(f'(x^*)\). Therefore both inequalities imply that the following holds:

\[ f'(x^*)=0. \]

Then let us consider the vector case where \(d\ge 2\). Fix any direction \(h\in\RR^d\) and define the scalar function \(g(\alpha)=f(\,\vw^*+\alpha h\,)\). Because \(\vw^*\) is a local minimizer of \(f\), we have that \(\alpha=0\) is a local minimizer of \(g\).
Note that \(f\) is \(\mathcal C^1\), \(g\) is differentiable and

\[ g'(0)=\nabla f(\vw^*)^\top h. \]

Since \(g(\cdot)\) is unconstrained and differentiable scalar function, then the fact that at \(\alpha=0\) it achieves local minimum implies \(g'(0)=0\) (see the first part of this proof), hence

\[ \nabla f(\vw^*)^\top d=0\quad\text{for all } d\in\RR^d. \]

The only vector orthogonal to every direction \(d\) is the zero vector, so \(\nabla f(\vw^*)=\vzero\).

Proof (Second-Order Necessary Condition)

Again, first consider the scalar case where \(d=1\). From the first-order necessary condition, \(f'(x^*)=0\).
Since \(x^*\) is a local minimum solution, for \(x\) sufficiently close to \(x^*\) we have \(f(x)\ge f(x^*)\). Consider

\[ q(x)\;=\;\frac{\,f(x)-f(x^*)-f'(x^*)(x-x^*)\,}{(x-x^*)^2}. \]

Since \(f'(x^*)=0\) and \(f(x)\ge f(x^*)\), we get \(q(x)\ge 0\) for all \(x\neq x^*\) sufficiently close to \(x^*\).
Because \(f\) is \(\mathcal C^2\) at \(x^*\), the limit exists and

\[ \lim_{x\to x^*} q(x)\;=\;\frac{1}{2}f''(x^*). \]

Taking limits preserves the inequality, hence \(\tfrac{1}{2}f''(x^*)\ge 0\), i.e., \(f''(x^*)\ge 0\).

Then consider the vector case where \(d\ge 2\). Fix any direction \(h\in\RR^d\) and define \(g(\alpha)=f(\vw^*+\alpha \vh)\).
Since \(\vw^*\) is a local minimizer of \(f\), \(\alpha=0\) is a local minimizer of the scalar \(\mathcal C^2\) function \(g\).
From the first-order result, \(g'(0)=\nabla f(\vw^*)^\top \vh=0\). By the chain rule,

\[ g''(0)= h^\top \nabla^2 f(\vw^*)\, \vh. \]

From the first part of the proof, for scalar functions, if \(\alpha^* =0\) is a local min of a twice differentiable scalar function, then \(g''(0)\ge 0\). Because \(h\) is arbitrary, \(h^\top \nabla^2 f(\vw^*) \vh \ge 0\) for all \(\vh\), i.e., \(\nabla^2 f(\vw^*)\succeq 0\).

Next, we discuss sufficient optimality conditions. It is important to note that sufficient conditions are always more strict compared to necessary conditions.

Theorem 2.3 (Second-Order Sufficient Condition)

Let \(f:\RR^d\to\RR\) be \(\mathcal C^2\) near \(\vw^*\).
Suppose that \(f(\cdot)\) is twice continuously differentiable, then if the following two relations hold:

\[\nabla f(\vw^*)=\vzero, \quad \nabla^2 f(\vw^*) \succ 0,\]

then \(\vw^*\) is a strict local minimizer of \(f\).

Proof (Second-order Sufficient Conditions)

If \(d=1\), by Taylor expansion, for \(x\) close to \(w^*\) there exists some \(\xi\) between \(w\) and \(w^*\) such that

\[ f(w) = f(w^*) + f'(w^*)(w-w^*) + \tfrac12 f''(\xi)\,(w-w^*)^2. \]

This is also known as the (scalar version of) Mean Value Theorem (MVT).

From the first-order condition, \(f'(w^*)=0\). By continuity of \(f''\) and the fact that \(f''(w^*)>0\), there exists a neighborhood \(U\) of \(w^*\) such that \(f''(\xi) \ge \mu > 0\) for all \(\xi\in U\).
Thus, for \(w \in U\setminus\{w^*\}\),

\[ f(w) - f(w^*) \;\ge\; \tfrac{\mu}{2}(w-w^*)^2 \;>\; 0, \]

which proves \(w^*\) is a strict local minimizer.

If \(d\ge 2\), by the multivariate Taylor’s theorem, for \(\vh\) sufficiently small there exists \(\theta\in(0,1)\) such that

\[ f(\vw^*+\vh) = f(\vw^*) + \nabla f(\vw^*)^\top \vh + \tfrac12 \vh^\top \nabla^2 f(\vw^*+\theta\vh)\, \vh. \]

This is also known as the (vector version of) MVT.

From the first-order condition, \(\nabla f(\vw^*)=\vzero\). Since \(\nabla^2 f(\vw^*) \succ 0\) and is continuous, there exists \(\mu>0\) and a neighborhood \(\mathcal N\) of \(\vw^*\) such that \(\nabla^2 f(\vw) \succeq \mu I\) for all \(\vw\in\mathcal N\). Therefore, for \(\vh\) small with \(\vw^*+\vh\in\mathcal N\),

\[ f(\vw^*+\vh) - f(\vw^*) \;\ge\; \tfrac{\mu}{2}\|\vh\|^2 \;>\; 0 \quad (\vh \neq 0), \]

which proves \(\vw^*\) is a strict local minimizer.

2.1.2.1. Examples: Using Optimality Conditions#

Example 2.1 (Scalar quadratic (global minimum))

Let

\[ f(w) = (w-2)^2. \]

First-order condition:

\[ f'(w) = 2(w-2) = 0 \quad\Rightarrow\quad w^* = 2. \]

Second-order condition:

\[ f''(w) = 2 > 0. \]

Therefore, our conclusion is that \(w^*=2\) is a strict local (and global) minimizer.

Example 2.2 (Scalar quartic (flat at minimum))

Let

\[ f(w) = w^4. \]

First-order condition: \(f'(w) = 4w^3 = 0 \quad\Rightarrow\quad w^* = 0\).
Second-order condition: \(f''(w) = 12w^2 \quad\Rightarrow\quad f''(0) = 0\).
Necessary conditions hold, but sufficient condition is inconclusive. Checking directly: \(f(w) \ge 0\) for all \(x\), so \(w^* = 0\) is indeed a global minimum.

Example 2.3 (Scalar cubic (saddle / inflection point))

Let

\[ f(w) = w^3. \]

First-order: \(f'(w) = 3w^2 = 0 \quad\Rightarrow\quad w^* = 0\).
Second-order: \(f''(w) = 6x \quad\Rightarrow\quad f''(0) = 0\).
Test inconclusive. Checking: \(f(w) < 0\) for \(w < 0\) and \(f(w) > 0\) for \(w > 0\). This implies that \(w^* = 0\) is not a (local)minimum.

Example 2.4 (Multivariate quadratic (positive definite Hessian))

Let

\[ f(\vw) = \tfrac12 \vw^\top Q \vw - \vb^\top \vw, \]

where

\[\begin{split} Q = \begin{bmatrix} 2 & 0 \\ 0 & 4 \end{bmatrix}, \quad \vb = \begin{bmatrix} 2 \\ 4 \end{bmatrix}. \end{split}\]

First-order: \(\nabla f(\vw) = Q\vw - \vb = 0 \quad\Rightarrow\quad \vw^* = Q^{-1} \vb = (1, 1)^\top\).
Second-order: \(\nabla^2 f(\vw) = Q \succ 0\) (positive definite).

Then our conclusion is that, \(\vw^* = (1,1)^\top\) is a strict local (and global) minimizer.

Example 2.5 (Multivariate saddle point)

Let

\[ f(w_1, w_2) = w_1^2 - w_2^2. \]

First-order: \(\nabla f = (2w_1, -2w_2)^\top = 0 \quad\Rightarrow\quad (w_1, w_2) = (0,0)\).
Second-order: \(\nabla^2 f = \begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix}\). Noe that this matrix is neither a positive nor a negative definite matrix.

Then the conclusion is that, \((0,0)\) is a saddle point, not a minimum.

Now we see both the necessary and sufficient conditions, a natural questions is: Why Are Optimality Conditions Important? Below we list a few reasons:

They provide verification tools — necessary conditions rule out non-optimal points; sufficient conditions certify local optima.
They are used to design stopping criteria in algorithms: e.g., when \(\|\nabla f(\vw)\| \le \text{tolerance}\).
They guide algorithmic design: most algorithms search for points satisfying \(\nabla f(\vw) = 0\) (approximately).
In convex problems, first-order conditions are sufficient, making them very powerful. More to come later in Subsection Convexity, Concavity, and Related Concepts

At this point, one might ask the following question: our discussion so far are only about conditions characterizing the local optimal solutions; how to characterize the globally optimal ones?

It turns out that this question is not that easy to answer, as in general, a locally optimaly solution needs not be a globally optimal. The simplest answer is that, one can just use the definition of the global optimal solution in Definition 2.1, however this is not practical since this requires to check all the solutions in \(\RR^d\). In practice, one has to further study the type of problems in order to characterize the global optimal solutions.

Fortunately, for some special classes of optimization problems, every local optimal solution is guaranteed to be global. Knowing these cases is important because it means checking local optimality conditions is enough to guarantee global optimality.

Unconstrained Quadratic Programs (UQPs)
Convex Problems

Theorem 2.4 (UQPs: Local ⇒ Global (and uniqueness when \(Q\succ 0\)))

Consider the quadratic problem

\[ f(\vw)=\tfrac12\,\vw^\top Q\,\vw+\vb^\top \vw + c,\quad Q=Q^\top. \]

Then every local optimal solution is also globally optimal.

The argument for the about result is simple. Note that the first- and second-order necessary conditions are

\[\nabla f(\vw)=Q\vw+\vb=0, \quad \nabla^2f(\vw) = Q\succeq 0.\]

So every local optimal solution has to satisfy the above two conditions. However, note that if \(Q\succeq 0\), then the problem is convex (see Property 2.1), therefore local min is also global min (see Theorem 2.5). The above necessary condition is also sufficient. If \(Q\succ 0\), the minimizer is unique (strong convexity).

We will talk about convex problems in the next subsection.

2.1.4. Practice Problems: Applying Optimality Conditions#

Try the following problems. For each, (a) find all stationary points by solving the first-order condition, (b) apply the second-order condition to classify them, and (c) verify by direct inspection if needed.

Problem 1 (Scalar cubic-quadratic)

\[ f(x) = x^3 - 3x^2 + 4 \]

Problem 2 (Scalar quartic with multiple minima)

\[ f(x) = x^4 - 4x^2 \]

Problem 3 (Multivariate quadratic)

\[\begin{split} f(\vw) = \tfrac12 \vw^\top \begin{bmatrix} 4 & 1 \\ 1 & 2 \end{bmatrix} \vw - \begin{bmatrix} 4 \\ 2 \end{bmatrix}^\top \vw \end{split}\]

Problem 4 (Saddle point in \(\RR^2\))

\[ f(x_1, x_2) = x_1^2 - 4x_2^2 \]

Problem 5 (Convex norm function)

\[ f(\vw) = \|\vw - \vone\|_2^2 \]

Problem 6 (Flat region)

\[\begin{split} f(x) = \begin{cases} 0 & |x| \le 1 \\ (x-1)^2 & x > 1 \\ (x+1)^2 & x < -1 \end{cases} \end{split}\]

Note: Not differentiable at \(x=\pm 1\).

Unconstrained Optimization

Contents

2.1. Unconstrained Optimization#

2.1.1. Definitions#

2.1.2. Optimality Conditions#

2.1.2.1. Examples: Using Optimality Conditions#

2.1.3. Convexity, Concavity, and Related Concepts#

2.1.4. Practice Problems: Applying Optimality Conditions#