Local Optimization
Overview
Local optimization finds a point where an objective function f(x) achieves a minimum value relative to all nearby points in the search space. Unlike global optimization, which seeks the absolute minimum across the entire domain, local methods focus on identifying a local minimum—a point where the function value is lower than at all neighboring points, though not necessarily the lowest possible value overall.
Local optimization is fundamental to countless applications: training neural networks, calibrating econometric models, fitting nonlinear regression models, tuning control systems, and solving engineering design problems. The efficiency of local methods makes them practical for high-dimensional problems where global search would be computationally prohibitive.
The defining characteristic of local optimization is that the solution depends on the starting point. For smooth, convex functions, any local minimum is also the global minimum, and the initial guess matters little. However, for non-convex functions with multiple valleys and peaks, different starting points may converge to different local minima. This sensitivity requires careful initialization and sometimes multiple runs from different starting values.
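This sensitivity is easy to demonstrate with a one-dimensional double-well function (the function below is a hypothetical example chosen for illustration):

```python
from scipy.optimize import minimize

# A double-well function with local minima near x = +1 and x = -1;
# the tilt term 0.2*x makes the left well slightly deeper (the global minimum).
f = lambda x: (x[0]**2 - 1)**2 + 0.2 * x[0]

res_right = minimize(f, x0=[2.0])    # starts in the right basin
res_left = minimize(f, x0=[-2.0])    # starts in the left basin

print(res_right.x, res_right.fun)    # local minimum near x ≈ 0.97
print(res_left.x, res_left.fun)      # global minimum near x ≈ -1.03
```

Both runs succeed, but only the second finds the global minimum; a multistart strategy would compare the objective values and keep the best.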
Derivative-Free Methods
Derivative-free optimization (also called “direct search” or “pattern search”) algorithms rely solely on function evaluations without computing or approximating gradients. These methods are essential when:
- The objective function is noisy or discontinuous
- Derivatives are unavailable (e.g., black-box simulations, legacy code)
- Computing gradients is expensive relative to function evaluations
- The function is non-smooth (e.g., containing abs(), max(), or conditional logic)
The Nelder-Mead simplex method maintains a simplex (a geometric shape with n+1 vertices in n dimensions) and iteratively reflects, expands, or contracts the simplex based on function values at its vertices. It’s robust and requires no derivative information, making it popular for exploratory optimization. However, it can stagnate on ill-conditioned problems and may require many function evaluations for high-dimensional spaces.
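A minimal sketch with SciPy, using the classic Rosenbrock test function (an illustrative choice, not part of the original text):

```python
import numpy as np
from scipy.optimize import minimize

# Rosenbrock function: smooth, but with a narrow curved valley that
# stresses the simplex's reflect/expand/contract moves.
def rosen(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

res = minimize(rosen, x0=[-1.2, 1.0], method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-8})
print(res.x, res.nfev)  # converges to ≈ [1, 1], with many function evaluations
```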
Powell’s method performs sequential one-dimensional minimizations along a set of conjugate directions, updating these directions to accelerate convergence. It exploits the structure of the problem without computing gradients and often outperforms Nelder-Mead on smooth functions.
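SciPy exposes this as `method="Powell"`; a quick sketch on an ill-scaled quadratic (an illustrative function):

```python
import numpy as np
from scipy.optimize import minimize

# Powell's method: sequential line minimizations along conjugate directions,
# no gradients required.
f = lambda x: (x[0] - 1)**2 + 10 * (x[1] + 2)**2

res = minimize(f, x0=[5.0, 5.0], method="Powell")
print(res.x)  # ≈ [1, -2]
```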
For single-variable problems, Brent’s method combines the robustness of golden-section search with the speed of inverse parabolic interpolation. It’s the gold standard for univariate minimization when derivatives are unavailable.
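In SciPy, Brent's method is the default for `minimize_scalar`; a minimal example (the quadratic is an illustrative choice):

```python
from scipy.optimize import minimize_scalar

# Brent's method: golden-section robustness plus inverse parabolic
# interpolation speed, no derivatives needed.
res = minimize_scalar(lambda x: (x - 2)**2 + 1, method="brent")
print(res.x, res.fun)  # x ≈ 2.0, f ≈ 1.0
```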
Gradient-Based Methods
Gradient-based optimization uses first-order derivative information (the gradient \nabla f(x)) and sometimes second-order information (the Hessian H(x)) to guide the search toward a minimum. These methods converge much faster than derivative-free approaches on smooth functions, typically requiring far fewer iterations.
The BFGS algorithm (Broyden–Fletcher–Goldfarb–Shanno) is a quasi-Newton method that builds an approximation to the Hessian matrix using gradient information from successive iterations. It offers superlinear convergence without the computational cost of computing and inverting the full Hessian. BFGS is the default choice for unconstrained smooth optimization and is remarkably robust even on moderately ill-conditioned problems. The method is described in detail in Numerical Optimization by Nocedal and Wright.
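A sketch with SciPy, supplying an analytic gradient via `jac` so BFGS avoids finite differences (the Rosenbrock function and its gradient are a standard test case, used here for illustration):

```python
import numpy as np
from scipy.optimize import minimize

def rosen(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def rosen_grad(x):
    # Exact gradient of the Rosenbrock function.
    return np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
        200 * (x[1] - x[0]**2),
    ])

res = minimize(rosen, x0=[-1.2, 1.0], method="BFGS", jac=rosen_grad)
print(res.x, res.nit)  # ≈ [1, 1], typically in far fewer iterations than Nelder-Mead
```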
L-BFGS-B (Limited-memory BFGS with Bounds) extends BFGS to handle box constraints (x_{lb} \leq x \leq x_{ub}) while using limited memory to store Hessian approximations. This makes it suitable for large-scale problems with thousands of variables.
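Box constraints are passed via `bounds`; in this illustrative example the unconstrained minimum lies outside the box, so the solution lands on the bound:

```python
import numpy as np
from scipy.optimize import minimize

# Quadratic with unconstrained minimum at [2, 3], outside the box,
# so L-BFGS-B clips the solution to the upper bounds.
f = lambda x: (x[0] - 2)**2 + (x[1] - 3)**2
bounds = [(0.0, 1.5), (0.0, 1.5)]

res = minimize(f, x0=[0.5, 0.5], method="L-BFGS-B", bounds=bounds)
print(res.x)  # ≈ [1.5, 1.5]
```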
Sequential Quadratic Programming (SQP) is the industry standard for nonlinear constrained optimization. It iteratively solves a quadratic programming subproblem that approximates the original nonlinear program (NLP), updating the approximation with each iteration. SQP handles both equality and inequality constraints and converges rapidly near the solution under mild regularity conditions.
Automatic Differentiation
Automatic differentiation (AD) computes exact derivatives by systematically applying the chain rule to elementary operations in a computational graph. Unlike numerical differentiation (finite differences), AD produces derivatives at machine precision without truncation error. Unlike symbolic differentiation, AD scales efficiently to large, complex programs.
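The idea can be sketched in a few lines with forward-mode dual numbers (a toy illustration of the chain-rule mechanics, not a production AD system):

```python
# Minimal forward-mode AD: each Dual carries (value, derivative) and
# arithmetic propagates both through the chain rule exactly.
class Dual:
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x + 1   # f'(x) = 6x + 2

x = Dual(2.0, 1.0)    # seed derivative 1 at x = 2
y = f(x)
print(y.val, y.der)   # 17.0, 14.0 — exact, no truncation error
```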
CasADi is an open-source framework specifically designed for AD-based optimization. It constructs a symbolic expression graph from mathematical expressions, applies AD to generate exact gradients and Hessians, and interfaces with high-performance solvers such as IPOPT, alongside its own SQP-based methods. CasADi is particularly powerful for optimal control problems, trajectory optimization, and model predictive control, where derivatives are essential but manual computation is impractical.
The key advantage of AD is efficiency: with reverse-mode AD, the full gradient can be computed at a cost that is a small constant multiple of one function evaluation, independent of the number of variables. This makes it feasible to optimize functions with thousands or even millions of parameters, and it is central to modern deep learning, where frameworks like PyTorch and TensorFlow rely on reverse-mode AD (backpropagation) to train neural networks.
Constrained vs. Unconstrained Optimization
Unconstrained optimization seeks to minimize f(x) over the entire space \mathbb{R}^n. The optimality condition is \nabla f(x^*) = 0 (the gradient vanishes at the minimum), and the Hessian H(x^*) must be positive semidefinite. Methods like BFGS and Nelder-Mead are designed for this setting.
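The first-order condition can be verified numerically after a solve; a sketch using SciPy's finite-difference gradient check on an illustrative quadratic:

```python
import numpy as np
from scipy.optimize import minimize, approx_fprime

# Unconstrained quadratic with minimum at [3, -1].
f = lambda x: (x[0] - 3)**2 + (x[1] + 1)**2

res = minimize(f, x0=[0.0, 0.0])          # default method (BFGS) for this setting
grad = approx_fprime(res.x, f, 1e-8)      # numerical gradient at the solution
print(res.x, grad)                        # gradient ≈ [0, 0] at the minimizer
```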
Constrained optimization adds restrictions:
- Box constraints: x_{lb} \leq x \leq x_{ub} restrict variables to intervals
- Linear constraints: Ax = b or Ax \leq b
- Nonlinear constraints: g(x) = 0 (equality) or h(x) \leq 0 (inequality)
Constrained problems require specialized algorithms. L-BFGS-B and TNC handle box constraints by projecting iterates onto the feasible region. SLSQP (Sequential Least Squares Programming) and SQP methods handle general nonlinear constraints using Lagrange multipliers and KKT conditions.
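A sketch of a nonlinear equality constraint with SLSQP in SciPy (the objective and constraint are illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize

# Minimize x0^2 + x1^2 subject to x0 * x1 = 1.
# By symmetry the solution in the positive quadrant is x = [1, 1].
f = lambda x: x[0]**2 + x[1]**2
cons = {"type": "eq", "fun": lambda x: x[0] * x[1] - 1}

res = minimize(f, x0=[2.0, 0.5], method="SLSQP", constraints=[cons])
print(res.x)  # ≈ [1, 1]
```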
Native Excel capabilities
Excel provides several built-in tools for local optimization:
Solver Add-in: Excel’s primary optimization tool implements a variant of the Generalized Reduced Gradient (GRG) algorithm for nonlinear programming. It handles unconstrained and constrained problems with up to 200 decision variables and supports bounds, linear constraints, and nonlinear constraints. The GRG algorithm uses numerical finite differences to approximate gradients, which can be slow and numerically unstable for poorly scaled or noisy functions. Solver is adequate for small-scale problems but lacks the efficiency and robustness of modern quasi-Newton methods.
Goal Seek: A simplified tool for single-variable problems where you want to find the input that produces a specific output. It uses a basic bisection-like method or secant method and is limited to one decision variable and one target cell. Suitable for simple what-if analysis but not true optimization.
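Goal Seek's task—finding the input that hits a target output—is equivalent to root-finding on model(x) − target. A sketch of that analogy using SciPy's `brentq` (this illustrates the equivalence, not Excel's internal algorithm; the model function is hypothetical):

```python
from scipy.optimize import brentq

# Goal Seek analogue: find x such that model(x) equals the target,
# i.e. a root of model(x) - target on a bracketing interval.
model = lambda x: x**3 + x      # monotone, so the root is unique
target = 10.0

x = brentq(lambda v: model(v) - target, a=0.0, b=5.0)
print(x, model(x))              # model(x) ≈ 10.0 at x = 2
```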
Limitations: Excel Solver uses outdated algorithms compared to modern libraries like SciPy and NLopt. The 200-variable limit excludes most real-world data science and machine learning applications. Numerical gradient approximation can fail on noisy, discontinuous, or ill-scaled functions. Solver also lacks support for automatic differentiation, limiting its efficiency on problems where exact derivatives could be computed.
Python-based functions offer access to state-of-the-art algorithms (BFGS, L-BFGS-B, SQP), automatic differentiation frameworks (CasADi), and the ability to handle problems with thousands of variables.
Third-party Excel add-ins
Frontline Solvers (Premium Solver, Analytic Solver): Developed by the creators of Excel’s built-in Solver, these commercial add-ins offer higher variable limits (up to millions), faster algorithms, and support for stochastic programming. Premium Solver includes GRG2, SQP, and barrier methods for nonlinear optimization.
OpenSolver: An open-source add-in that extends Excel with COIN-OR optimization engines, particularly strong for linear and mixed-integer programming. While primarily focused on LP/MILP, it provides an alternative interface for calling external solvers.
AIMMS: An advanced optimization modeling platform that integrates with Excel for large-scale optimization and decision support. AIMMS supports a wide range of solvers and is used in operations research and supply chain optimization.
Tools
| Tool | Description |
|---|---|
| CA_MINIMIZE | Minimize a multivariate function using CasADi with automatic differentiation. |
| MINIMIZE | Minimize a multivariate function using SciPy’s minimize routine. |
| MINIMIZE_SCALAR | Minimize a single-variable function using SciPy’s minimize_scalar. |