CURVE_FIT
Overview
The CURVE_FIT function performs nonlinear least squares regression to fit a user-defined model function to observed data. This is essential for modeling relationships that cannot be captured by simple linear regression, such as exponential decay, power laws, or Gaussian distributions.
This implementation wraps scipy.optimize.curve_fit from the SciPy library. The function minimizes the sum of squared residuals between the observed data and the model predictions:
\min_{\theta} \sum_{i=1}^{M} \left( y_i - f(x_i, \theta) \right)^2
where f(x, \theta) is the model function, \theta represents the parameters to be optimized, and (x_i, y_i) are the data points.
The function supports three optimization algorithms via the curve_fit_method parameter:
- Levenberg-Marquardt (lm): The default method for unconstrained problems. It combines gradient descent and Gauss-Newton methods, making it robust for a wide range of problems. Cannot be used with parameter bounds.
- Trust Region Reflective (trf): Suitable for large-scale problems and supports box constraints on parameters. This method is automatically selected when bounds are provided.
- Dogleg Box (dogbox): A trust-region method particularly effective when the number of observations is small relative to the number of parameters.
The function returns fitted parameter values along with their standard errors. Standard errors are computed from the covariance matrix as \sigma_i = \sqrt{\text{diag}(C)_i}, where C is the estimated covariance matrix of the parameters. These errors represent the uncertainty in parameter estimates based on a linear approximation around the optimal solution.
The model expression is specified as a Python lambda string (e.g., lambda x, a, b: a * x**b for a power law). The function automatically extracts parameter names from the lambda signature and supports common mathematical functions including trigonometric, exponential, and logarithmic operations.
For more details on the underlying algorithm and advanced options, see the SciPy curve_fit documentation and the related least_squares function.
This example function is provided as-is without any representation of accuracy.
Excel Usage
=CURVE_FIT(model_expr, xdata, ydata, guess, bounds, curve_fit_method)
model_expr(str, required): Model expression as a Python lambda function string (e.g. lambda x, a, b for a power law)xdata(list[list], required): Independent variable values as a 2D list (column vector with at least two rows)ydata(list[list], required): Dependent variable values as a 2D list matching xdata lengthguess(list[list], optional, default: null): Initial parameter values as a single-row 2D list (optional)bounds(list[list], optional, default: null): Parameter bounds as lower and upper values (optional)curve_fit_method(str, optional, default: null): Solver algorithm (lm, trf, or dogbox) (optional)
Returns (list[list] or str): 2D list where the first row contains parameter names (list[str]), the second row contains the fitted values (list[float]), and the third row contains standard errors (list[float], zeros if unavailable). Returns an error message string on failure.
Examples
Example 1: Power law fit y = a * x^b
Inputs:
| model_expr | xdata | ydata |
|---|---|---|
| lambda x, a, b: a * x**b | 1 | 2 |
| 2 | 5.6568542495 | |
| 3 | 10.3923048454 | |
| 4 | 16 | |
| 5 | 22.360679775 |
Excel formula:
=CURVE_FIT("lambda x, a, b: a * x**b", {1;2;3;4;5}, {2;5.6568542495;10.3923048454;16;22.360679775})
Expected output:
| a | b |
|---|---|
| 2 | 1.5 |
| 0 | 0 |
Example 2: Exponential association with TRF method
Inputs:
| model_expr | xdata | ydata | curve_fit_method |
|---|---|---|---|
| lambda x, A, B: A * (1 - exp(-B * x)) | 0 | 0 | trf |
| 1 | 2.5918177932 | ||
| 2 | 4.5118836391 | ||
| 3 | 5.9343034026 | ||
| 4 | 6.9880578809 | ||
| 5 | 7.7686983985 |
Excel formula:
=CURVE_FIT("lambda x, A, B: A * (1 - exp(-B * x))", {0;1;2;3;4;5}, {0;2.5918177932;4.5118836391;5.9343034026;6.9880578809;7.7686983985}, "trf")
Expected output:
| A | B |
|---|---|
| 10 | 0.3 |
| 0 | 0 |
Example 3: Gaussian peak with initial guess
Inputs:
| model_expr | xdata | ydata | guess | ||
|---|---|---|---|---|---|
| lambda x, A, x0, sigma: A * exp(-((x - x0)**2) / (2*sigma**2)) | -3 | 0.0555449827 | 4.5 | 0.2 | 0.9 |
| -2 | 0.6766764162 | ||||
| -1 | 3.0326532986 | ||||
| 0 | 5 | ||||
| 1 | 3.0326532986 | ||||
| 2 | 0.6766764162 | ||||
| 3 | 0.0555449827 |
Excel formula:
=CURVE_FIT("lambda x, A, x0, sigma: A * exp(-((x - x0)**2) / (2*sigma**2))", {-3;-2;-1;0;1;2;3}, {0.0555449827;0.6766764162;3.0326532986;5;3.0326532986;0.6766764162;0.0555449827}, {4.5,0.2,0.9})
Expected output:
| A | x0 | sigma |
|---|---|---|
| 5 | 0 | 1 |
| 0 | 0 | 0 |
Example 4: Sine wave fit with bounds using dogbox
Inputs:
| model_expr | xdata | ydata | guess | bounds | curve_fit_method | ||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| lambda x, A, omega, phi, y0: A * sin(omega*x + phi) + y0 | 0 | 0 | 1.5 | 1 | 0 | 0 | 0 | 0.5 | -0.5 | -0.5 | dogbox |
| 1 | 1.7320508076 | 3 | 2.5 | 0.5 | 0.5 | ||||||
| 2 | 1.7320508076 | ||||||||||
| 3 | 0 | ||||||||||
| 4 | -1.7320508076 | ||||||||||
| 5 | -1.7320508076 | ||||||||||
| 6 | 0 |
Excel formula:
=CURVE_FIT("lambda x, A, omega, phi, y0: A * sin(omega*x + phi) + y0", {0;1;2;3;4;5;6}, {0;1.7320508076;1.7320508076;0;-1.7320508076;-1.7320508076;0}, {1.5,1,0,0}, {0,0.5,-0.5,-0.5;3,2.5,0.5,0.5}, "dogbox")
Expected output:
| A | omega | phi | y0 |
|---|---|---|---|
| 2 | 1.0472 | 0 | 0 |
| 0 | 0 | 0 | 0 |
Python Code
from scipy.optimize import curve_fit as scipy_curve_fit
from scipy import special as sc
import math
import inspect
import numpy as np
import re
def curve_fit(model_expr, xdata, ydata, guess=None, bounds=None, curve_fit_method=None):
"""
Fit a model expression to xdata, ydata using scipy.optimize.curve_fit.
See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
This example function is provided as-is without any representation of accuracy.
Args:
model_expr (str): Model expression as a Python lambda function string (e.g. lambda x, a, b for a power law)
xdata (list[list]): Independent variable values as a 2D list (column vector with at least two rows)
ydata (list[list]): Dependent variable values as a 2D list matching xdata length
guess (list[list], optional): Initial parameter values as a single-row 2D list (optional) Default is None.
bounds (list[list], optional): Parameter bounds as lower and upper values (optional) Default is None.
curve_fit_method (str, optional): Solver algorithm (lm, trf, or dogbox) (optional) Valid options: Levenberg-Marquardt (LM), Trust Region Reflective (TRF), Dogleg Box (DOGBOX). Default is None.
Returns:
list[list] or str: 2D list where the first row contains parameter names (list[str]), the second row contains the fitted values (list[float]), and the third row contains standard errors (list[float], zeros if unavailable). Returns an error message string on failure.
"""
def _ensure_2d_list(arg):
"""
Validate that arg is a 2D list with at least two rows and return a 1-D numpy
array (float64) of the first column values.
"""
if not isinstance(arg, list) or len(arg) < 2:
raise ValueError("must be a 2D list with at least two rows")
vals = []
for i, row in enumerate(arg):
if not isinstance(row, list) or len(row) == 0:
raise ValueError(f"row {i} must be a non-empty list")
try:
vals.append(float(row[0]))
except Exception:
raise ValueError(f"row {i} contains non-numeric value")
return np.asarray(vals, dtype=np.float64)
if not isinstance(model_expr, str):
return "Invalid input: model_expr must be a string."
if model_expr.strip() == "":
return "Invalid input: model_expr must be a non-empty string."
if "x" not in model_expr:
return "Invalid input: model_expr must reference variable x (e.g., 'lambda x, a, b: a * x**b')."
# Auto-convert caret (^) to double asterisk (**) for exponentiation ONCE
model_expr = re.sub(r'\^', '**', model_expr)
# Build safe globals dictionary following func_arg_spec.md pattern
safe_globals = {
"math": math,
"np": np,
"numpy": np,
"sc": sc,
"__builtins__": {},
}
# Add all math module functions
safe_globals.update({
name: getattr(math, name)
for name in dir(math)
if not name.startswith("_")
})
# Add common numpy/math function aliases
safe_globals.update({
"sin": np.sin,
"cos": np.cos,
"tan": np.tan,
"asin": np.arcsin,
"arcsin": np.arcsin,
"acos": np.arccos,
"arccos": np.arccos,
"atan": np.arctan,
"arctan": np.arctan,
"sinh": np.sinh,
"cosh": np.cosh,
"tanh": np.tanh,
"exp": np.exp,
"log": np.log,
"ln": np.log,
"log10": np.log10,
"sqrt": np.sqrt,
"abs": np.abs,
"pow": np.power,
"pi": math.pi,
"e": math.e,
"inf": math.inf,
"nan": math.nan,
})
# Evaluate the model expression to get the callable
try:
model_func = eval(model_expr, safe_globals)
except Exception as exc:
return f"Error evaluating model_expr: {exc}"
if not callable(model_func):
return "Invalid input: model_expr must evaluate to a callable function."
# Extract parameter names from the model function signature (skip the first arg, which is 'x')
try:
sig = inspect.signature(model_func)
param_names = [p.name for i, p in enumerate(sig.parameters.values()) if i > 0]
except Exception:
return "Internal error: unable to inspect model function signature."
n_params = len(param_names)
try:
x_array = _ensure_2d_list(xdata)
except ValueError as e:
return f"Invalid input for xdata: {e}"
try:
y_array = _ensure_2d_list(ydata)
except ValueError as e:
return f"Invalid input for ydata: {e}"
if x_array.shape[0] != y_array.shape[0]:
return "Invalid input: xdata and ydata must have the same number of rows."
# Helper function to normalize 2D list to handle Excel's conversion of single-element 2D lists
def normalize_to_2d_list(value):
"""Convert a scalar or 1D list to a 2D list for consistent processing."""
if not isinstance(value, list):
return [[value]]
return value
# Process initial guess
p0 = None
if guess is not None:
guess = normalize_to_2d_list(guess)
if not isinstance(guess, list) or len(guess) == 0 or not isinstance(guess[0], list):
return f"Invalid input: guess must be a 2D list (e.g. [[{','.join(param_names)}]])."
try:
p0_raw = guess[0]
except Exception as e:
return f"Invalid guess structure: {e}"
try:
p0_arr = np.asarray(p0_raw, dtype=np.float64).ravel()
p0 = tuple(float(x) for x in p0_arr.tolist())
except Exception as e:
return f"Invalid guess values: {e}"
if len(p0) != n_params:
return f"Invalid input: guess must contain {n_params} values matching the model parameters."
else:
# scipy requires p0 to determine number of parameters
# Use 1.0 as default for all parameters as a reasonable starting point
p0 = tuple([1.0] * n_params)
# Process bounds
bounds_tuple = None
if bounds is not None:
if not isinstance(bounds, list) or len(bounds) != 2:
return f"Invalid input: bounds must be a 2D list [[lower1,...],[upper1,...]]."
if not isinstance(bounds[0], list) or not isinstance(bounds[1], list):
return f"Invalid input: bounds must be a 2D list [[lower1,...],[upper1,...]]."
try:
raw_lower = bounds[0]
raw_upper = bounds[1]
if len(raw_lower) != n_params or len(raw_upper) != n_params:
return f"Invalid input: bounds must contain {n_params} values for lower and {n_params} for upper."
lower = []
upper = []
for v in raw_lower:
if v is None:
lower.append(-np.inf)
else:
lower.append(float(v))
for v in raw_upper:
if v is None:
upper.append(np.inf)
else:
upper.append(float(v))
except Exception as e:
return f"Invalid bounds values: {e}"
bounds_tuple = (np.asarray(lower, dtype=np.float64), np.asarray(upper, dtype=np.float64))
# Validate method parameter
allowed_methods = {"lm", "trf", "dogbox"}
method_value = None
if curve_fit_method is not None:
if isinstance(curve_fit_method, str):
method_candidate = curve_fit_method.strip().lower()
else:
return "Invalid input: curve_fit_method must be one of 'lm', 'trf', or 'dogbox'."
if method_candidate not in allowed_methods:
return "Invalid input: curve_fit_method must be one of 'lm', 'trf', or 'dogbox'."
method_value = method_candidate
# Validate method vs bounds compatibility
if method_value == 'lm' and bounds_tuple is not None:
return "Invalid input: curve_fit_method='lm' cannot be used with bounds; use 'trf' or 'dogbox' instead."
# With lm, number of observations (M) must be >= number of parameters (N)
if method_value == 'lm' and x_array.shape[0] < n_params:
return f"Invalid input: curve_fit_method='lm' requires number of observations >= number of parameters ({x_array.shape[0]} < {n_params})."
# Wrap the model to ensure outputs are float64
def _model(x, *params):
try:
result = model_func(x, *params)
return np.asarray(result, dtype=np.float64)
except Exception as e:
raise ValueError(f"Model evaluation error: {e}")
try:
curve_fit_kwargs = {'maxfev': 10000}
if p0 is not None:
curve_fit_kwargs['p0'] = p0
if bounds_tuple is not None:
curve_fit_kwargs['bounds'] = bounds_tuple
if method_value is not None:
curve_fit_kwargs['method'] = method_value
popt, pcov = scipy_curve_fit(_model, x_array, y_array, **curve_fit_kwargs)
except Exception as e:
return f"Error during curve_fit: {e}"
try:
fitted_vals = [float(v) for v in popt]
except Exception as e:
return f"Result parsing error: {e}"
for v in fitted_vals:
if math.isnan(v) or math.isinf(v):
return "Fitting produced invalid numeric values (NaN or inf)."
# Calculate standard errors from covariance matrix
# If covariance cannot be estimated, return zeros for standard errors
std_errors = [0.0] * n_params
try:
if pcov is not None and np.isfinite(pcov).all():
std_errors = [float(v) for v in np.sqrt(np.diag(pcov))]
except Exception:
# Keep default zeros if calculation fails
pass
return [param_names, fitted_vals, std_errors]