CA_CURVE_FIT
Overview
The CA_CURVE_FIT function performs nonlinear least-squares curve fitting using CasADi, an open-source tool for nonlinear optimization and algorithmic differentiation. Unlike traditional curve fitting methods that require explicit derivative formulas, this function leverages automatic differentiation (AD) to compute exact gradients of arbitrary symbolic expressions, enabling robust optimization of user-defined models.
CasADi implements both forward and reverse mode AD on expression graphs to construct gradients, Jacobians, and Hessians. This approach eliminates numerical differentiation errors and significantly improves solver convergence. The function uses CasADi’s sequential quadratic programming (SQP) solver to minimize the sum of squared residuals between observed data and model predictions.
The objective function minimizes the weighted residual sum of squares:
\min_{\theta} \sum_{i=1}^{n} w_i \left( y_i - f(x_i, \theta) \right)^2
where \theta represents the model parameters, f(x, \theta) is the user-defined model expression, y_i are observed values, x_i are predictor variables, and w_i are optional weights.
The function supports arbitrary mathematical expressions using common operations (exp, log, sqrt, sin, cos, tan, abs) and constants (pi, e). Parameter bounds can be specified to constrain the optimization, and weighted least-squares is available for heteroscedastic data. Standard errors are computed from the covariance matrix derived from the Jacobian at the optimal solution. For more details on CasADi’s optimization capabilities, see the CasADi documentation and the GitHub repository.
This example function is provided as-is without any representation of accuracy.
Excel Usage
=CA_CURVE_FIT(model_expr, x_data, y_data, param_names, param_guess, param_bounds, weights)
model_expr(str, required): The model expression as a string (e.g., “ax[0] + bx[0]**2 + c”).x_data(list[list], required): The predictor (independent) variables as a 2D list where each row is a sample.y_data(list[list], required): The observed (dependent) variables as a 2D list where each row is [[y_i]].param_names(list[list], required): The parameter names in the model expression as a 2D list.param_guess(list[list], required): Initial guess for the parameters as a 2D list.param_bounds(list[list], optional, default: null): Optional parameter bounds as a 2D list [[lower_1, lower_2, …], [upper_1, upper_2, …]].weights(list[list], optional, default: null): Optional data point weights for weighted least-squares as a 2D list [[w_1], [w_2], …].
Returns (list[list]): 2D list [param_names, values, std_errors], or error message string.
Examples
**Example 1: Linear fit y = a*x + b**
Inputs:
| model_expr | x_data | y_data | param_names | param_guess | ||
|---|---|---|---|---|---|---|
| a*x[0] + b | 1 | 3 | a | b | 1 | 1 |
| 2 | 5 | |||||
| 3 | 7 | |||||
| 4 | 9 |
Excel formula:
=CA_CURVE_FIT("a*x[0] + b", {1;2;3;4}, {3;5;7;9}, {"a","b"}, {1,1})
Expected output:
| a | b |
|---|---|
| 2 | 1 |
| 4.864753555590493e-16 | 1.3322676295501877e-15 |
Example 2: Quadratic fit y = ax^2 + bx + c
Inputs:
| model_expr | x_data | y_data | param_names | param_guess | ||||
|---|---|---|---|---|---|---|---|---|
| a*x[0]**2 + b*x[0] + c | 0 | 1 | a | b | c | 1 | 1 | 1 |
| 1 | 2 | |||||||
| 2 | 5 | |||||||
| 3 | 10 |
Excel formula:
=CA_CURVE_FIT("a*x[0]**2 + b*x[0] + c", {0;1;2;3}, {1;2;5;10}, {"a","b","c"}, {1,1,1})
Expected output:
| a | b | c |
|---|---|---|
| 1 | 0 | 1 |
| 1.7235296186091101e-15 | 5.395501143821959e-15 | 3.359777747954008e-15 |
Example 3: Linear fit with parameter bounds
Inputs:
| model_expr | x_data | y_data | param_names | param_guess | param_bounds | |||
|---|---|---|---|---|---|---|---|---|
| m*x[0] + c | 0 | 0.5 | m | c | 1 | 0.5 | 0.5 | 0 |
| 1 | 1.5 | 2 | 1 | |||||
| 2 | 2.5 | |||||||
| 3 | 3.5 |
Excel formula:
=CA_CURVE_FIT("m*x[0] + c", {0;1;2;3}, {0.5;1.5;2.5;3.5}, {"m","c"}, {1,0.5}, {0.5,0;2,1})
Expected output:
| m | c |
|---|---|
| 1.000000000251489 | 0.4999999996221758 |
| 1.7782997487017766e-10 | 3.326894195314123e-10 |
Example 4: Weighted linear fit with equal weights
Inputs:
| model_expr | x_data | y_data | param_names | param_guess | weights | ||
|---|---|---|---|---|---|---|---|
| a*x[0] + b | 1 | 2 | a | b | 2 | 0 | 1 |
| 2 | 4 | 1 | |||||
| 3 | 6 | 1 | |||||
| 4 | 8 | 1 | |||||
| 5 | 10 | 1 |
Excel formula:
=CA_CURVE_FIT("a*x[0] + b", {1;2;3;4;5}, {2;4;6;8;10}, {"a","b"}, {2,0}, {1;1;1;1;1})
Expected output:
| a | b |
|---|---|
| 2 | 0 |
| 0 | 0 |
Python Code
import casadi as ca
import math
import numpy as np
def ca_curve_fit(model_expr, x_data, y_data, param_names, param_guess, param_bounds=None, weights=None):
"""
Fit an arbitrary symbolic model to data using CasADi and automatic differentiation.
See: https://web.casadi.org/
This example function is provided as-is without any representation of accuracy.
Args:
model_expr (str): The model expression as a string (e.g., "a*x[0] + b*x[0]**2 + c").
x_data (list[list]): The predictor (independent) variables as a 2D list where each row is a sample.
y_data (list[list]): The observed (dependent) variables as a 2D list where each row is [[y_i]].
param_names (list[list]): The parameter names in the model expression as a 2D list.
param_guess (list[list]): Initial guess for the parameters as a 2D list.
param_bounds (list[list], optional): Optional parameter bounds as a 2D list [[lower_1, lower_2, ...], [upper_1, upper_2, ...]]. Default is None.
weights (list[list], optional): Optional data point weights for weighted least-squares as a 2D list [[w_1], [w_2], ...]. Default is None.
Returns:
list[list]: 2D list [param_names, values, std_errors], or error message string.
"""
def to2d(x):
"""Convert scalar to 2D list if needed."""
return [[x]] if not isinstance(x, list) else x
# Normalize single-element 2D lists to scalars for these parameters
param_names = to2d(param_names)
param_guess = to2d(param_guess)
try:
# Validate inputs
if not isinstance(model_expr, str) or not model_expr.strip():
return "Invalid input: model_expr must be a non-empty string"
if not isinstance(x_data, list) or len(x_data) < 2:
return "Invalid input: x_data must be a 2D list with at least 2 data points"
if not isinstance(y_data, list) or len(y_data) < 2:
return "Invalid input: y_data must be a 2D list with at least 2 data points"
# Validate x_data structure
if not isinstance(x_data[0], list) or len(x_data[0]) == 0:
return "Invalid input: x_data rows must be non-empty lists"
# Validate y_data structure
if not isinstance(y_data[0], list) or len(y_data[0]) == 0:
return "Invalid input: y_data rows must be non-empty lists"
if len(x_data) != len(y_data):
return "Invalid input: x_data and y_data must have the same number of rows"
# Extract parameter names from the 2D list
if not param_names or not param_names[0]:
return "Invalid input: param_names cannot be empty"
pnames = param_names[0] if isinstance(param_names[0], list) else param_names
pnames = [str(p) for p in pnames] # Ensure all parameter names are strings
n_params = len(pnames)
# Extract initial guess from 2D list
if not param_guess or not param_guess[0]:
return "Invalid input: param_guess cannot be empty"
p0_list = param_guess[0] if isinstance(param_guess[0], list) else param_guess
if len(p0_list) != n_params:
return f"Invalid input: param_guess must have {n_params} values"
# Convert expression: replace ^ with **
expr_str = str(model_expr).replace("^", "**")
# Create symbolic variables
x_sym = ca.SX.sym("x", len(x_data[0]))
p_sym = ca.SX.sym("p", n_params)
# Create namespace for expression evaluation with math functions
namespace = {
"x": x_sym,
"exp": ca.exp,
"log": ca.log,
"sqrt": ca.sqrt,
"sin": ca.sin,
"cos": ca.cos,
"tan": ca.tan,
"abs": ca.fabs,
"pi": math.pi,
"e": math.e,
}
for i, pname in enumerate(pnames):
namespace[str(pname)] = p_sym[i]
# Parse and evaluate the model expression
try:
model_fn = ca.Function("model", [x_sym, p_sym], [eval(expr_str, {"__builtins__": {}}, namespace)])
except Exception as e:
return f"Invalid model expression: {str(e)}"
# Extract numeric data
try:
x_vals = [[float(val) for val in row] for row in x_data]
y_vals = [float(row[0]) for row in y_data]
p0_vals = [float(v) for v in p0_list]
except (ValueError, TypeError, IndexError) as e:
return f"Invalid numeric input: {e}"
# Check for NaN or inf in input data
for row in x_vals:
for val in row:
if math.isnan(val) or math.isinf(val):
return "Invalid input: x_data contains NaN or inf"
for val in y_vals:
if math.isnan(val) or math.isinf(val):
return "Invalid input: y_data contains NaN or inf"
for val in p0_vals:
if math.isnan(val) or math.isinf(val):
return "Invalid input: param_guess contains NaN or inf"
# Handle weights
if weights is not None:
try:
w_vals = [float(row[0]) for row in weights]
if len(w_vals) != len(y_vals):
return "Invalid input: weights must have the same length as y_data"
for w in w_vals:
if w <= 0 or math.isnan(w) or math.isinf(w):
return "Invalid input: weights must be positive and not NaN/inf"
except (ValueError, TypeError, IndexError):
return "Invalid input: weights must be a 2D list of positive numbers"
else:
w_vals = [1.0] * len(y_vals)
# Handle bounds
p_lower = [-float('inf')] * n_params
p_upper = [float('inf')] * n_params
if param_bounds is not None:
try:
if not isinstance(param_bounds, list) or len(param_bounds) != 2:
return "Invalid input: param_bounds must be a 2D list [[lower], [upper]]"
if not isinstance(param_bounds[0], list) or not isinstance(param_bounds[1], list):
return "Invalid input: param_bounds must be a 2D list [[lower], [upper]]"
if len(param_bounds[0]) != n_params or len(param_bounds[1]) != n_params:
return f"Invalid input: param_bounds must have {n_params} values for each bound"
for i, (lb, ub) in enumerate(zip(param_bounds[0], param_bounds[1])):
if lb is not None:
lb_val = float(lb)
if not math.isfinite(lb_val):
return f"Invalid input: lower bound for parameter {pnames[i]} must be finite"
p_lower[i] = lb_val
if ub is not None:
ub_val = float(ub)
if not math.isfinite(ub_val):
return f"Invalid input: upper bound for parameter {pnames[i]} must be finite"
p_upper[i] = ub_val
if p_lower[i] > p_upper[i]:
return f"Invalid input: lower bound > upper bound for parameter {pnames[i]}"
except (ValueError, TypeError) as e:
return f"Invalid input: param_bounds must contain numeric values ({str(e)})"
# Build optimization problem
p = ca.SX.sym("p", n_params)
cost = 0
for i, (x_row, y_val, w) in enumerate(zip(x_vals, y_vals, w_vals)):
y_pred = model_fn(x_row, p)
residual = y_pred - y_val
cost += w * residual ** 2
# Create NLP with appropriate solver options
nlp = {"x": p, "f": cost}
opts = {
'qpsol': 'qrqp',
'qpsol_options': {'print_iter': False, 'print_header': False},
'print_time': 0
}
try:
solver = ca.nlpsol("solver", "sqpmethod", nlp, opts)
except Exception as e:
return f"Solver initialization error: {str(e)}"
# Solve
try:
result = solver(x0=p0_vals, lbx=p_lower, ubx=p_upper)
p_opt_raw = result["x"]
# Convert CasADi DM to Python list
p_opt = [float(p_opt_raw[i]) for i in range(n_params)]
except Exception as e:
return f"Solver error: {str(e)}"
# Verify results
for i, val in enumerate(p_opt):
if math.isnan(val) or math.isinf(val):
return f"Optimization produced invalid value for parameter {pnames[i]}: {val}"
# Calculate standard errors from Jacobian
std_errors = None
try:
# Create Jacobian function for residuals
p_sym = ca.SX.sym("p", n_params)
residuals = ca.SX([])
for x_row, y_val, w in zip(x_vals, y_vals, w_vals):
y_pred = model_fn(x_row, p_sym)
weighted_residual = ca.sqrt(w) * (y_pred - y_val)
residuals = ca.vertcat(residuals, weighted_residual)
# Create Jacobian function
jacobian_func = ca.Function("jacobian", [p_sym], [ca.jacobian(residuals, p_sym)])
# Evaluate Jacobian at optimal parameters
J = jacobian_func(p_opt)
J_dense = J.full()
# Compute covariance matrix and standard errors
# Standard error formula: sqrt(diag((J^T * J)^-1 * RSS / (n - p)))
J_array = np.array(J_dense, dtype=float)
n_data = len(y_vals)
degrees_of_freedom = max(1, n_data - n_params)
try:
JtJ = np.dot(J_array.T, J_array)
# Check if matrix is well-conditioned and full rank
if np.linalg.matrix_rank(JtJ) == JtJ.shape[0] and np.isfinite(JtJ).all():
# Calculate residual sum of squares at optimal parameters
rss = 0.0
for x_row, y_val in zip(x_vals, y_vals):
y_pred_val = float(model_fn(x_row, p_opt))
rss += (y_pred_val - y_val) ** 2
# Calculate variance estimate
variance = rss / degrees_of_freedom
# Calculate covariance matrix
cov_matrix = np.linalg.inv(JtJ) * variance
if np.isfinite(cov_matrix).all():
std_errors = [float(np.sqrt(max(0.0, cov_matrix[i, i]))) for i in range(n_params)]
except (np.linalg.LinAlgError, ZeroDivisionError):
# Matrix is singular or computation failed
pass
except Exception:
# If standard error calculation fails, continue without it
pass
return [pnames, p_opt, std_errors] if std_errors else [pnames, p_opt]
except Exception as e:
return f"Unexpected error: {str(e)}"