Least Squares

Overview

Least squares is a mathematical optimization technique for finding the parameters of a model that best fit a set of observations by minimizing the sum of the squared differences (residuals) between observed and predicted values. Originally developed by Carl Friedrich Gauss and Adrien-Marie Legendre in the early 19th century, least squares has become the cornerstone of regression analysis, parameter estimation, and curve fitting across virtually every scientific and engineering discipline.

The fundamental principle is elegantly simple: given a model f(x; \theta) with parameters \theta and observed data pairs (x_i, y_i), find the parameter values that minimize the residual sum of squares (RSS):

S(\theta) = \sum_{i=1}^{n} \left(y_i - f(x_i; \theta)\right)^2

When the model is linear in its parameters, the solution is analytical and exact. When the model is nonlinear, iterative numerical optimization algorithms are required. The resulting parameter estimates have well-understood statistical properties: under the Gauss-Markov conditions (zero-mean, uncorrelated errors with constant variance), the linear least-squares estimator is unbiased and has minimum variance among all linear unbiased estimators, and when the errors are additionally normally distributed it provides the foundation for hypothesis testing and confidence interval construction.
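
As a minimal illustration of the principle, the residual sum of squares for an assumed exponential-decay model can be minimized directly with a general-purpose optimizer. This is only a sketch: the model form, starting values, and synthetic data are illustrative, not part of any tool in this category.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data from an assumed model y = a*exp(-b*x) plus noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 50)
y = 2.5 * np.exp(-1.3 * x) + 0.05 * rng.standard_normal(x.size)

def rss(theta):
    """Residual sum of squares S(theta) for f(x; a, b) = a*exp(-b*x)."""
    a, b = theta
    return np.sum((y - a * np.exp(-b * x)) ** 2)

# Generic iterative minimization of S(theta) from an initial guess
result = minimize(rss, x0=[1.0, 1.0])
print(result.x)  # estimated [a, b], close to [2.5, 1.3]
```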

Applications Across Disciplines

Least squares is ubiquitous in data analysis:

  • Experimental Physics: Fitting decay curves, resonance peaks, and calibration functions. High-energy physics relies heavily on robust minimizers like Minuit for complex likelihood fitting.
  • Chemistry and Spectroscopy: Peak deconvolution in chromatography and spectroscopy (Gaussian, Lorentzian, and Voigt profiles). Reaction kinetics modeling.
  • Engineering: System identification, parameter tuning for control systems, and material property characterization.
  • Economics and Finance: Econometric modeling, time-series forecasting, and risk analysis.
  • Machine Learning: Linear regression, polynomial regression, and the training of neural networks (where gradient descent minimizes mean squared error).

Theoretical Foundations

The elegance of least squares stems from its deep connection to probability theory. When measurement errors are independent and normally distributed with constant variance, the least-squares estimator is equivalent to the maximum likelihood estimator. This provides a rigorous statistical framework for inference.

Uncertainty Quantification: After fitting, the covariance matrix of parameter estimates can be computed from the Jacobian of the residuals:

\text{Cov}(\hat{\theta}) = \sigma^2 (J^T J)^{-1}

where \sigma^2 = RSS / (n - p) is the estimated variance of the residuals, n is the number of observations, and p is the number of parameters. The diagonal elements of this matrix provide the standard errors of each parameter, which are critical for constructing confidence intervals and performing hypothesis tests.
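
A sketch of this calculation, assuming the same illustrative exponential-decay model and using the residual Jacobian returned by scipy.optimize.least_squares:

```python
import numpy as np
from scipy.optimize import least_squares

# Synthetic data for an assumed model f(x; a, b) = a*exp(-b*x)
rng = np.random.default_rng(1)
x = np.linspace(0.0, 4.0, 40)
y = 2.5 * np.exp(-1.3 * x) + 0.05 * rng.standard_normal(x.size)

def residuals(theta):
    a, b = theta
    return y - a * np.exp(-b * x)

fit = least_squares(residuals, x0=[1.0, 1.0])

n, p = x.size, fit.x.size
rss = np.sum(fit.fun ** 2)                            # residual sum of squares at the minimum
sigma2 = rss / (n - p)                                # estimated residual variance
cov = sigma2 * np.linalg.inv(fit.jac.T @ fit.jac)     # Cov(theta_hat) = sigma^2 (J^T J)^{-1}
stderr = np.sqrt(np.diag(cov))                        # standard errors of each parameter
print(fit.x, stderr)
```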

Weighted Least Squares: When observations have different precision, weighted least squares incorporates known measurement uncertainties \sigma_i:

\chi^2(\theta) = \sum_{i=1}^{n} \frac{(y_i - f(x_i; \theta))^2}{\sigma_i^2}

Minimizing \chi^2 gives more influence to precise measurements and provides the correct statistical treatment of heteroscedastic data.
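
A minimal sketch of this weighting, assuming per-point uncertainties \sigma_i are known and using the same illustrative model: dividing each residual by \sigma_i makes the sum of squares equal to \chi^2(\theta).

```python
import numpy as np
from scipy.optimize import least_squares

# Heteroscedastic synthetic data: each point has its own known uncertainty sigma_i
rng = np.random.default_rng(2)
x = np.linspace(0.0, 4.0, 30)
sigma = 0.02 + 0.1 * x                       # assumed per-point measurement uncertainties
y = 2.5 * np.exp(-1.3 * x) + sigma * rng.standard_normal(x.size)

def weighted_residuals(theta):
    """Entries (y_i - f(x_i; theta)) / sigma_i, so the sum of squares is chi^2(theta)."""
    a, b = theta
    return (y - a * np.exp(-b * x)) / sigma

fit = least_squares(weighted_residuals, x0=[1.0, 1.0])
chi2 = np.sum(fit.fun ** 2)                  # chi^2 at the minimum
print(fit.x, chi2 / (x.size - fit.x.size))   # reduced chi^2 should be near 1
```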

Optimization Algorithms

For linear models (linear in parameters), the solution is obtained directly by solving the normal equations or using matrix decomposition techniques (QR decomposition, Singular Value Decomposition).
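
For example, a straight-line model can be solved directly with NumPy, either through an SVD-based solver or the normal equations (synthetic data and names are illustrative):

```python
import numpy as np

# Straight-line model y = c0 + c1*x: linear in the parameters, so the fit is direct
rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 25)
y = 1.0 + 0.5 * x + 0.2 * rng.standard_normal(x.size)

A = np.column_stack([np.ones_like(x), x])        # design matrix

# Preferred: SVD-based solver (numerically stable)
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

# Equivalent normal equations (A^T A) theta = A^T y, less stable for ill-conditioned A
coeffs_ne = np.linalg.solve(A.T @ A, A.T @ y)
print(coeffs, coeffs_ne)
```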

For nonlinear models, iterative algorithms are required (a usage sketch follows this list):

  • Levenberg-Marquardt algorithm: A hybrid method that interpolates between the Gauss-Newton method (fast convergence near the minimum) and gradient descent (robust far from the minimum). This is the default solver in many implementations.
  • Trust Region Reflective (TRF): Handles bound constraints efficiently by solving a sequence of trust-region subproblems.
  • Dogbox: A variation of trust-region methods optimized for box-constrained problems.
  • MIGRAD algorithm: A sophisticated variable-metric method (a variant of the Davidon-Fletcher-Powell algorithm) used in particle physics. It builds an approximation to the inverse Hessian matrix to guide the search.
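
The first three methods are available through SciPy's scipy.optimize.least_squares interface; the sketch below (synthetic data and an illustrative model) shows how a method is selected. MIGRAD is accessed through iminuit, illustrated later in this section.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(4)
x = np.linspace(0.0, 4.0, 30)
y = 2.5 * np.exp(-1.3 * x) + 0.05 * rng.standard_normal(x.size)

def residuals(theta):
    a, b = theta
    return y - a * np.exp(-b * x)

# Levenberg-Marquardt: fast for unconstrained problems
fit_lm = least_squares(residuals, x0=[1.0, 1.0], method="lm")

# Trust Region Reflective: supports simple bounds on the parameters
fit_trf = least_squares(residuals, x0=[1.0, 1.0], method="trf",
                        bounds=([0.0, 0.0], [10.0, 10.0]))

# Dogbox: alternative trust-region scheme for box-constrained problems
fit_dog = least_squares(residuals, x0=[1.0, 1.0], method="dogbox",
                        bounds=([0.0, 0.0], [10.0, 10.0]))
print(fit_lm.x, fit_trf.x, fit_dog.x)
```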

Automatic Differentiation: Modern implementations increasingly use automatic differentiation to compute exact Jacobians and Hessians directly from the model's computational graph, eliminating the truncation error of finite-difference approximations and improving convergence.
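
A small sketch of the idea using CasADi: the residual vector is built as a symbolic expression, and the exact Jacobian is obtained from the expression graph. The model and data are illustrative.

```python
import casadi as ca
import numpy as np

# Build a symbolic residual vector for f(x; a, b) = a*exp(-b*x) on fixed xdata
xdata = np.linspace(0.0, 4.0, 10)
ydata = 2.5 * np.exp(-1.3 * xdata)

theta = ca.SX.sym("theta", 2)                        # parameters [a, b]
resid = ca.vertcat(*[ydata[i] - theta[0] * ca.exp(-theta[1] * xdata[i])
                     for i in range(xdata.size)])

J = ca.jacobian(resid, theta)                        # exact Jacobian from the expression graph
jac_fn = ca.Function("jac", [theta], [J])            # compiled callable
print(jac_fn([2.5, 1.3]))                            # evaluate at a candidate theta
```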

Comparison of Approaches

The functions in this category represent different levels of abstraction and specialization:

General-Purpose Flexible Fitting

CURVE_FIT wraps SciPy’s curve_fit and provides maximum flexibility. You define your model as a Python lambda function, allowing arbitrary mathematical expressions. This is ideal when you have a custom model derived from theory or when existing prebuilt models don’t match your needs. The trade-off is that you must provide reasonable initial guesses and have some understanding of the model’s parameters.
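
At the SciPy level, the pattern looks like the following sketch; the model, initial guesses, and data are illustrative, and the exact CURVE_FIT spreadsheet signature may differ.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(5)
xdata = np.linspace(0.0, 4.0, 50)
ydata = 2.5 * np.exp(-1.3 * xdata) + 0.05 * rng.standard_normal(xdata.size)

# Arbitrary model as a lambda: first argument is x, remaining arguments are parameters
model = lambda x, a, b: a * np.exp(-b * x)

popt, pcov = curve_fit(model, xdata, ydata, p0=[1.0, 1.0])   # p0: initial guesses
perr = np.sqrt(np.diag(pcov))                                # standard errors
print(popt, perr)
```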

CA_CURVE_FIT uses the CasADi optimization framework with automatic differentiation. Instead of approximating derivatives numerically, CasADi constructs a symbolic computational graph and computes exact derivatives. This leads to faster convergence and higher accuracy, particularly for complex models with many parameters. CasADi also supports multiple solver backends and advanced features like sensitivity analysis.
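
The following sketch shows the underlying CasADi workflow, not the CA_CURVE_FIT interface itself: the objective is built symbolically and handed to an NLP solver, which receives exact derivatives from the expression graph. It assumes an IPOPT build is available, as in the standard casadi wheels; model and data are illustrative.

```python
import casadi as ca
import numpy as np

rng = np.random.default_rng(6)
xdata = np.linspace(0.0, 4.0, 30)
ydata = 2.5 * np.exp(-1.3 * xdata) + 0.05 * rng.standard_normal(xdata.size)

theta = ca.SX.sym("theta", 2)                        # [a, b]
resid = ca.vertcat(*[ydata[i] - theta[0] * ca.exp(-theta[1] * xdata[i])
                     for i in range(xdata.size)])
objective = ca.sumsqr(resid)                         # residual sum of squares

# IPOPT receives exact first and second derivatives built from the symbolic graph
solver = ca.nlpsol("fit", "ipopt", {"x": theta, "f": objective})
solution = solver(x0=[1.0, 1.0])
print(solution["x"])
```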

MINUIT_FIT leverages the iminuit library, a Python interface to the CERN ROOT Minuit2 minimizer. Minuit is the gold standard in high-energy physics for parameter estimation because of its robustness, careful handling of numerical precision, and sophisticated uncertainty estimation (including MINOS for asymmetric errors). If you need publication-quality parameter uncertainties or are fitting complex models with correlated parameters, MINUIT_FIT is the tool of choice.
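
A sketch of the underlying iminuit workflow (not the MINUIT_FIT spreadsheet interface; model, data, and uncertainties are illustrative):

```python
import numpy as np
from iminuit import Minuit
from iminuit.cost import LeastSquares

rng = np.random.default_rng(7)
x = np.linspace(0.0, 4.0, 40)
yerr = np.full(x.size, 0.05)                         # assumed known uncertainties
y = 2.5 * np.exp(-1.3 * x) + yerr * rng.standard_normal(x.size)

def model(x, a, b):
    return a * np.exp(-b * x)

cost = LeastSquares(x, y, yerr, model)               # chi^2 cost function
m = Minuit(cost, a=1.0, b=1.0)
m.migrad()                                           # MIGRAD minimization
m.hesse()                                            # parabolic (symmetric) errors
m.minos()                                            # asymmetric confidence intervals
print(m.values, m.errors, m.merrors)
```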

Predefined Scientific Models

LM_FIT provides access to a comprehensive catalog of 27 validated models through the lmfit package. Instead of writing mathematical expressions, you select models by name (e.g., “gaussian”, “exponential”, “lorentzian”). lmfit handles initial parameter guessing intelligently, applies physical constraints (e.g., positive widths), and supports model composition (e.g., adding a Gaussian peak to a linear baseline). This is the most user-friendly option for common scientific workflows like peak fitting in spectroscopy or chromatography, exponential decay fitting, or polynomial regression.
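
A sketch of the underlying lmfit workflow for a composite model, a Gaussian peak on a linear baseline (not the LM_FIT spreadsheet interface; data, prefixes, and parameter values are illustrative):

```python
import numpy as np
from lmfit.models import GaussianModel, LinearModel

# Synthetic spectrum: Gaussian peak on a linear baseline
rng = np.random.default_rng(8)
x = np.linspace(0.0, 10.0, 200)
y = (3.0 * np.exp(-((x - 5.0) ** 2) / (2 * 0.4 ** 2))
     + 0.1 * x + 0.5 + 0.05 * rng.standard_normal(x.size))

# Model composition: named, validated models added together
peak = GaussianModel(prefix="g_")
baseline = LinearModel(prefix="bg_")
model = peak + baseline

params = peak.guess(y, x=x)                  # automatic initial guesses for the peak
params.update(baseline.guess(y, x=x))        # baseline guesses from a simple linear fit

result = model.fit(y, params, x=x)
print(result.fit_report())
```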

Native Excel Capabilities

Excel provides several built-in tools for least-squares fitting:

  • LINEST Function: Performs multiple linear regression using the least-squares method. Returns regression coefficients, standard errors, R², and residual statistics. Limited to models that are linear in parameters (polynomials, multiple linear regression).
  • Trendline Feature: Excel charts support adding trendlines (linear, exponential, logarithmic, polynomial, power) directly to scatter plots. This provides quick visual fitting but limited control over parameters and no uncertainty estimates.
  • Solver Add-in: The Solver can minimize an RSS objective function by adjusting parameters. This is flexible but manual: you must construct the model, compute residuals, and set up the optimization problem yourself.
  • Analysis ToolPak: The regression tool in the Analysis ToolPak performs multiple linear regression with comprehensive diagnostic output (ANOVA table, residual plots).

Limitations: Excel’s native tools are restricted to linear regression and a small set of simple nonlinear forms (exponential, power, logarithmic). Fitting custom nonlinear models requires manual Solver setup, and Excel does not provide standard errors for nonlinear fits without additional calculation. The Python functions in this category offer access to state-of-the-art nonlinear optimization algorithms, automatic differentiation, uncertainty quantification, and dozens of validated scientific models.

Third-Party Excel Add-ins

  • XLSTAT: A comprehensive statistical add-in that includes advanced regression capabilities, nonlinear regression with custom equations, and robust curve fitting with diagnostic plots.
  • TableCurve 2D: Specialized software for automated curve fitting. It can test thousands of model equations against your data and rank them by goodness-of-fit. Ideal for exploratory data analysis when the functional form is unknown.
  • SigmaPlot: Scientific graphing and analysis software with extensive nonlinear regression capabilities, including dynamic curve fitting within plots and automatic report generation.

Tools

Tool          Description
CA_CURVE_FIT  Fit an arbitrary symbolic model to data using CasADi and automatic differentiation.
CURVE_FIT     Fit a model expression to xdata, ydata using scipy.optimize.curve_fit.
LM_FIT        Fit data using lmfit’s built-in models with optional model composition.
MINUIT_FIT    Fit an arbitrary model expression to data using iminuit least-squares minimization with uncertainty estimates.