Curve Fitting
Overview
Curve fitting is the process of constructing a mathematical function that best approximates a series of data points. At its heart, curve fitting transforms empirical observations into predictive models, enabling interpolation, extrapolation, and scientific insight. The fundamental question is deceptively simple: given a set of (x, y) pairs, what function f(x) best captures the underlying relationship?
This discipline bridges theory and experiment across virtually every quantitative field. In chemistry and biochemistry, curve fitting extracts kinetic parameters from reaction rates and binding assays. In engineering, it models system responses, material properties, and signal characteristics. In economics and finance, it reveals trends, cycles, and forecast trajectories. The ubiquity of curve fitting reflects a deeper truth: real-world phenomena rarely present themselves as clean equations—we must infer them from noisy, incomplete data.
The Fitting Process
All curve fitting involves three core elements:
The Model: A mathematical expression $f(x; \theta)$ parameterized by $\theta = (\theta_1, \theta_2, \ldots, \theta_n)$. This could be a simple line ($y = mx + b$), a nonlinear function (e.g., Michaelis-Menten), or an arbitrary user-defined expression.
The Objective: A loss function that quantifies the quality of fit. The most common is the sum of squared residuals (SSR) over the $N$ data points:
$$\text{SSR}(\theta) = \sum_{i=1}^{N} [y_i - f(x_i; \theta)]^2$$
Minimizing SSR yields the least-squares estimate $\hat{\theta}$.
The Solver: An optimization algorithm that searches the parameter space $\theta$ to minimize the objective. Models that are linear in their parameters have closed-form solutions; nonlinear models require iterative methods that start from an initial guess.
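The three elements can be seen together in a minimal Python sketch, assuming NumPy and SciPy are available. The exponential model, the synthetic noise-free data, and the use of scipy.optimize.minimize with Nelder-Mead as a general-purpose solver are illustrative assumptions. They are not how the fitting functions described here are implemented.

```python
import numpy as np
from scipy.optimize import minimize

# Model: f(x; theta) with theta = (a, k); an exponential decay chosen for illustration
def model(x, theta):
    a, k = theta
    return a * np.exp(-k * x)

# Objective: SSR(theta) = sum_i [y_i - f(x_i; theta)]^2
def ssr(theta, x, y):
    residuals = y - model(x, theta)
    return np.sum(residuals**2)

# Synthetic, noise-free data generated from a = 2.0, k = 0.5 (illustrative only)
x = np.linspace(0.0, 5.0, 20)
y = model(x, (2.0, 0.5))

# Solver: a general-purpose iterative minimizer started from an initial guess
result = minimize(
    ssr,
    x0=[1.0, 1.0],
    args=(x, y),
    method="Nelder-Mead",
    options={"xatol": 1e-8, "fatol": 1e-12, "maxiter": 5000},
)
theta_hat = result.x  # least-squares estimate of (a, k)
```

A dedicated least-squares solver such as Levenberg-Marquardt exploits the sum-of-squares structure and usually converges faster. A generic minimizer is used here only to keep the model, the objective, and the solver visibly separate.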
Least Squares Methods
Least squares regression is the workhorse of curve fitting. When the model is linear in its parameters (e.g., polynomial regression), the solution is analytical via normal equations or matrix decomposition. When the model is nonlinear (e.g., exponential decay, sigmoid growth), iterative algorithms are required.
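For a model that is linear in its parameters, the closed-form route can be sketched with NumPy. This example assumes a quadratic and noise-free data, both for illustration. It solves the normal equations $(X^\top X)\beta = X^\top y$ directly and, for comparison, calls np.linalg.lstsq. That function relies on a singular value decomposition and is the safer choice when the design matrix $X$ is ill-conditioned.

```python
import numpy as np

# Noise-free data from y = 1 + 2x + 3x^2 (illustrative)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 3.0 * x**2

# Design matrix: one column per parameter (1, x, x^2); the model is linear in beta
X = np.column_stack([np.ones_like(x), x, x**2])

# Normal equations: (X^T X) beta = X^T y
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Matrix decomposition (SVD-based least squares), more robust numerically
beta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
```

Both approaches return the coefficients in one step with no initial guess, which is what distinguishes linear least squares from the iterative nonlinear case.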
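CURVE_FIT: The standard nonlinear least-squares fitter, powered by scipy.optimize.curve_fit, which minimizes SSR for arbitrary model expressions. By default, scipy uses the Levenberg-Marquardt algorithm for unconstrained problems and switches to the trust-region reflective method when parameter bounds are supplied.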
LM_FIT: Built on the lmfit library, which extends curve fitting with parameter constraints, model composition (adding/multiplying models), and richer uncertainty quantification.
MINUIT_FIT: Leverages iminuit, the Python interface to the MINUIT minimizer developed at CERN for particle physics. It provides detailed uncertainty estimates, including the Hessian-based covariance matrix and the parameter correlation matrix.
CA_CURVE_FIT: Uses CasADi for automatic differentiation, which computes exact gradients efficiently for complex symbolic models. Well suited to computationally intensive fits and to calibrating models inside a larger optimization problem.
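The following sketch shows how scipy.optimize.curve_fit behaves; it is the solver behind CURVE_FIT and the pre-built models. It fits an assumed exponential decay to synthetic noisy data twice: once without bounds (Levenberg-Marquardt) and once with bounds (trust-region reflective). The one-sigma parameter uncertainties are the square roots of the diagonal of the returned covariance matrix. The model, noise level, and bounds are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_decay(x, a, k):
    # Amplitude a and rate constant k
    return a * np.exp(-k * x)

# Synthetic data: true a = 2.0, k = 0.5, plus small Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 30)
y = exp_decay(x, 2.0, 0.5) + rng.normal(0.0, 0.02, x.size)

# No bounds: curve_fit defaults to method="lm" (Levenberg-Marquardt)
popt, pcov = curve_fit(exp_decay, x, y, p0=[1.0, 1.0])

# With bounds: curve_fit defaults to method="trf" (trust-region reflective)
popt_b, pcov_b = curve_fit(
    exp_decay, x, y, p0=[1.0, 1.0], bounds=([0.0, 0.0], [10.0, 5.0])
)

# One-sigma standard errors from the covariance matrix
perr = np.sqrt(np.diag(pcov))
```

When the bounds do not bind at the optimum, both methods converge to essentially the same estimate. Bounds matter when the unconstrained optimum is physically meaningless, for example a negative rate constant.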
Pre-Built Models for Domain-Specific Applications
While general-purpose fitting functions accept arbitrary model expressions, many scientific and engineering domains use canonical functional forms repeatedly. Pre-built model functions streamline workflows by encapsulating domain expertise:
Exponential Models (EXP_GROWTH, EXP_DECAY, EXP_ADVANCED): Model radioactive decay, population dynamics, capacitor discharge, and other processes exhibiting constant relative rates of change.
Sigmoid Growth Models (GROWTH_SIGMOID): Capture S-shaped growth curves with saturation, such as logistic growth, Gompertz curves, and Richards functions. Widely used in biology, epidemiology, and market adoption forecasting.
Enzyme Kinetics (ENZYME_BASIC, ENZYME_INHIBIT): Fit Michaelis-Menten, competitive/noncompetitive inhibition, and Hill equation models. Essential for biochemistry and pharmacology.
Dose-Response Curves (DOSE_RESPONSE): Model the relationship between stimulus and biological response, including EC50/IC50 calculations critical for drug development and toxicology.
Adsorption Isotherms (ADSORPTION): Fit Langmuir, Freundlich, and BET models to describe surface adsorption phenomena in materials science and environmental engineering.
Binding Models (BINDING_MODEL): Quantify molecular interactions and equilibrium constants in biophysics and analytical chemistry.
Peak Functions (SPECTRO_PEAKS, CHROMA_PEAKS, PEAK_ASYM): Fit Gaussian, Lorentzian, Voigt profiles, and asymmetric peaks for spectroscopy (NMR, IR, Raman) and chromatography.
Waveforms (WAVEFORM): Fit sinusoidal, damped oscillatory, and periodic functions for signal processing and vibration analysis.
Polynomial and Power Laws (POLY_BASIC, GROWTH_POWER): Model polynomial trends and power-law relationships (scaling, allometry).
Statistical Distributions (STAT_DISTRIB, STAT_PARETO): Fit probability distributions to empirical data for statistical inference.
Rheology (RHEOLOGY): Model the flow and deformation behavior of complex fluids (Newtonian, Bingham plastic, power-law fluids).
Other Specialized Domains: AGRICULTURE, ELECTRO_ION, MISC_PIECEWISE provide curated models for niche applications.
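The canonical forms behind several of these domain families can be written as ordinary Python functions and fitted with scipy.optimize.curve_fit. The sketch below uses standard textbook parameterizations of the Michaelis-Menten, Hill, four-parameter logistic, and Langmuir models. The pre-built functions may name or order their parameters differently, and the substrate data are synthetic.

```python
import numpy as np
from scipy.optimize import curve_fit

# Textbook forms; parameterizations may differ from the pre-built functions

def michaelis_menten(s, vmax, km):
    # Reaction rate as a function of substrate concentration s
    return vmax * s / (km + s)

def hill(s, vmax, k, n):
    # Cooperative binding; reduces to Michaelis-Menten when n = 1
    return vmax * s**n / (k**n + s**n)

def four_param_logistic(x, bottom, top, ec50, hill_slope):
    # Sigmoidal dose-response; the response is halfway between bottom and top at x = ec50
    return bottom + (top - bottom) / (1.0 + (ec50 / x) ** hill_slope)

def langmuir(c, q_max, k_l):
    # Monolayer adsorption isotherm: loading approaches q_max at high concentration
    return q_max * k_l * c / (1.0 + k_l * c)

# Synthetic, noise-free enzyme data generated with Vmax = 10, Km = 3
s = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
v = michaelis_menten(s, 10.0, 3.0)

popt, pcov = curve_fit(michaelis_menten, s, v, p0=[5.0, 1.0])
vmax_hat, km_hat = popt
```

Encoding these forms once means the user supplies only data and, optionally, starting values. Reasonable default initial guesses (here, roughly the maximum observed rate for Vmax) are much of the domain knowledge that pre-built models package.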
Model Selection and Validation
Choosing the right model is both an art and a science. Overfitting occurs when a model captures noise rather than signal (high variance, poor generalization). Underfitting occurs when the model is too simple to capture the true relationship (high bias). Tools for model selection include:
- Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC): Penalize complexity while rewarding goodness-of-fit.
- Cross-validation: Assess predictive performance on held-out data.
- Residual analysis: Examine the residuals $y_i - f(x_i; \hat{\theta})$ for patterns. Randomly scattered residuals suggest an adequate model, while trends or curvature indicate systematic deviations.
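AIC and BIC can be computed directly from the SSR of a least-squares fit. The sketch below assumes independent Gaussian errors and drops additive constants, which gives $\text{AIC} = N\ln(\text{SSR}/N) + 2k$ and $\text{BIC} = N\ln(\text{SSR}/N) + k\ln N$. Here $k$ is the number of fitted coefficients; some conventions also count the noise variance. The example compares polynomial fits of increasing degree on synthetic data generated from an assumed quadratic. Lower scores are better.

```python
import numpy as np

def aic_bic(ssr, n, k):
    # Gaussian-error least squares with additive constants dropped
    aic = n * np.log(ssr / n) + 2 * k
    bic = n * np.log(ssr / n) + k * np.log(n)
    return aic, bic

# Synthetic data from an assumed quadratic with Gaussian noise
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 40)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0.0, 0.05, x.size)

scores = {}
for degree in (1, 2, 5):
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)  # inspect these for systematic patterns
    ssr = np.sum(residuals**2)
    scores[degree] = aic_bic(ssr, x.size, degree + 1)

# Degree with the lowest BIC
best_by_bic = min(scores, key=lambda d: scores[d][1])
```

The degree-5 fit always has the smallest SSR because it has the most freedom. The penalty terms are what allow the criteria to prefer a simpler model that generalizes better. Because its $\ln N$ penalty exceeds 2 once $N \geq 8$, BIC favors simpler models more strongly than AIC.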
Native Excel Capabilities
Excel provides several built-in curve fitting tools, but they are primarily limited to simple cases:
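LINEST: Fits models that are linear in their parameters using least squares, including polynomials when powers of x are supplied as separate predictor columns. Returns regression coefficients and statistics (such as standard errors and $R^2$) but cannot fit models that are nonlinear in their parameters.
LOGEST: Fits exponential models of the form $y = b \cdot m^x$ by linearizing via a logarithmic transformation. Because the least-squares fit is performed on $\ln y$ rather than on $y$, it weights the data differently from a direct nonlinear fit and requires all $y > 0$.
Trendline: Adds fitted curves to charts (linear, polynomial, exponential, logarithmic, power). Provides $R^2$ values and equation display, but lacks flexibility for custom models or parameter constraints.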
Solver Add-in: Can minimize the sum of squared residuals for arbitrary models but requires manual setup of objective cells and is cumbersome for routine fitting tasks.
Limitations: Apart from Solver, which supports constraints but requires manual setup, native Excel tools cannot fit complex nonlinear models. None of them report parameter uncertainties for nonlinear fits, propagate uncertainty, or support model composition, and they lack dedicated nonlinear least-squares algorithms such as Levenberg-Marquardt or trust-region methods.
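The effect of LOGEST's log transformation can be reproduced outside Excel. This sketch assumes NumPy and SciPy and uses synthetic data with additive noise. It fits $y = b \cdot m^x$ once by linear least squares on $\ln y$, the LOGEST approach, and once by direct nonlinear least squares on $y$. The two estimates generally differ because they minimize different objectives, and only the direct fit minimizes SSR on the original scale.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_model(x, b, m):
    # Same functional form as LOGEST: y = b * m**x
    return b * m**x

# Synthetic data with additive noise (illustrative); true b = 2.0, m = 1.3
rng = np.random.default_rng(2)
x = np.arange(1.0, 11.0)
y = exp_model(x, 2.0, 1.3) + rng.normal(0.0, 0.5, x.size)

# LOGEST-style fit: least squares on ln(y) = ln(b) + x * ln(m); needs y > 0
slope, intercept = np.polyfit(x, np.log(y), 1)
b_log, m_log = np.exp(intercept), np.exp(slope)

# Direct nonlinear least squares on y, started from the log-linear estimate
(b_nl, m_nl), pcov = curve_fit(exp_model, x, y, p0=[b_log, m_log])

# Compare both estimates using SSR on the original scale
ssr_log = np.sum((y - exp_model(x, b_log, m_log)) ** 2)
ssr_nl = np.sum((y - exp_model(x, b_nl, m_nl)) ** 2)
```

Using the log-linear result as the starting point is a common pattern. It is cheap, it is usually close, and it avoids poor initial guesses for the nonlinear solver.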
Third-Party Excel Add-ins
XLSTAT: Comprehensive statistical software with advanced regression capabilities, including nonlinear regression, weighted least squares, and robust fitting methods.
SigmaPlot: Offers extensive curve fitting with a library of over 100 built-in equations, automatic initial parameter estimation, and detailed goodness-of-fit statistics. Popular in scientific research.
TableCurve 2D/3D (SYSTAT): Automatically tests thousands of model equations to find the best fit. Ideal for exploratory data analysis when the functional form is unknown.
DataFit: Specializes in nonlinear regression with an extensive model library and statistical output. Allows custom model definition.
Least Squares
| Tool | Description |
|---|---|
| CA_CURVE_FIT | Fit an arbitrary symbolic model to data using CasADi and automatic differentiation. |
| CURVE_FIT | Fit a model expression to xdata, ydata using scipy.optimize.curve_fit. |
| LM_FIT | Fit data using lmfit’s built-in models with optional model composition. |
| MINUIT_FIT | Fit an arbitrary model expression to data using iminuit least-squares minimization with uncertainty estimates. |
Models
Each model function fits its family of models using scipy.optimize.curve_fit; see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html for solver details.

| Tool | Description |
|---|---|
| ADSORPTION | Fits adsorption isotherm models (Langmuir, Freundlich, BET). |
| AGRICULTURE | Fits curated agriculture models. |
| BINDING_MODEL | Fits molecular binding and equilibrium models. |
| CHROMA_PEAKS | Fits chromatography peak models. |
| DOSE_RESPONSE | Fits dose-response models, including EC50/IC50 estimation. |
| ELECTRO_ION | Fits curated electro-ion models. |
| ENZYME_BASIC | Fits basic enzyme kinetics models. |
| ENZYME_INHIBIT | Fits enzyme inhibition models (competitive, noncompetitive). |
| EXP_ADVANCED | Fits advanced exponential models. |
| EXP_DECAY | Fits exponential decay models. |
| EXP_GROWTH | Fits exponential growth models. |
| GROWTH_POWER | Fits power-law growth models. |
| GROWTH_SIGMOID | Fits sigmoid growth models (logistic, Gompertz, Richards). |
| MISC_PIECEWISE | Fits miscellaneous and piecewise models. |
| PEAK_ASYM | Fits asymmetric peak models. |
| POLY_BASIC | Fits basic polynomial models. |
| RHEOLOGY | Fits rheology models (Newtonian, Bingham plastic, power-law fluids). |
| SPECTRO_PEAKS | Fits spectroscopy peak models. |
| STAT_DISTRIB | Fits statistical distribution models. |
| STAT_PARETO | Fits Pareto distribution models. |
| WAVEFORM | Fits waveform models (sinusoidal, damped oscillatory, periodic). |