CA_CURVE_FIT

Overview

The CA_CURVE_FIT function performs nonlinear least-squares curve fitting using CasADi, an open-source tool for nonlinear optimization and algorithmic differentiation. Unlike curve-fitting routines that rely on hand-coded derivatives or finite-difference approximations, this function uses automatic differentiation (AD) to compute exact gradients of arbitrary symbolic expressions, enabling robust optimization of user-defined models.

CasADi implements both forward and reverse mode AD on expression graphs to construct gradients, Jacobians, and Hessians. This approach avoids the truncation error of numerical differentiation and typically improves solver convergence. The function uses CasADi’s sequential quadratic programming (SQP) solver to minimize the sum of squared residuals between observed data and model predictions.
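
As a minimal, standalone illustration (separate from CA_CURVE_FIT itself), the sketch below uses the standard CasADi Python API to build a symbolic expression and obtain its exact derivative by AD; the variable names are illustrative only:

import casadi as ca

# Symbolic variable and expression
x = ca.SX.sym("x")
f = ca.exp(-x) * ca.sin(x)

# Exact symbolic derivative via automatic differentiation:
# df/dx = exp(-x) * (cos(x) - sin(x))
df = ca.jacobian(f, x)

# Wrap value and derivative in a callable function and evaluate at x = 1.0
f_fun = ca.Function("f", [x], [f, df])
val, grad = f_fun(1.0)
print(float(val), float(grad))  # derivative is exact, no finite-difference error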

The objective function minimizes the weighted residual sum of squares:

\min_{\theta} \sum_{i=1}^{n} w_i \left( y_i - f(x_i, \theta) \right)^2

where \theta represents the model parameters, f(x, \theta) is the user-defined model expression, y_i are observed values, x_i are predictor variables, and w_i are optional weights.
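
For example, with the linear model f(x, \theta) = a x + b and unit weights (as in Example 1 below), the objective reduces to

\min_{a,b} \sum_{i=1}^{n} \left( y_i - (a x_i + b) \right)^2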

The function supports arbitrary mathematical expressions using common operations (exp, log, sqrt, sin, cos, tan, abs) and constants (pi, e). Parameter bounds can be specified to constrain the optimization, and weighted least-squares is available for heteroscedastic data. Standard errors are computed from the covariance matrix derived from the Jacobian at the optimal solution. For more details on CasADi’s optimization capabilities, see the CasADi documentation and the GitHub repository.
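
The standard errors reported in the third output row use the usual Gauss-Newton covariance approximation, matching the computation in the Python code below: with J the Jacobian of the weighted residuals at the optimum, RSS the residual sum of squares at the fitted parameters, n the number of data points, and p the number of parameters,

\mathrm{SE}(\theta_j) = \sqrt{\left[ \left( J^\top J \right)^{-1} \, \frac{\mathrm{RSS}}{n - p} \right]_{jj}}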

This example function is provided as-is without any representation of accuracy.

Excel Usage

=CA_CURVE_FIT(model_expr, x_data, y_data, param_names, param_guess, param_bounds, weights)
  • model_expr (str, required): The model expression as a string (e.g., "a*x[0] + b*x[0]**2 + c").
  • x_data (list[list], required): The predictor (independent) variables as a 2D list where each row is a sample.
  • y_data (list[list], required): The observed (dependent) variables as a 2D list where each row is [y_i].
  • param_names (list[list], required): The parameter names in the model expression as a 2D list.
  • param_guess (list[list], required): Initial guess for the parameters as a 2D list.
  • param_bounds (list[list], optional, default: null): Optional parameter bounds as a 2D list [[lower_1, lower_2, …], [upper_1, upper_2, …]].
  • weights (list[list], optional, default: null): Optional data point weights for weighted least-squares as a 2D list [[w_1], [w_2], …].

Returns (list[list]): 2D list [param_names, values, std_errors] (the std_errors row is omitted if it cannot be computed), or an error message string.

Examples

Example 1: Linear fit y = a*x + b

Inputs:

model_expr    x_data    y_data    param_names    param_guess
a*x[0] + b    1         3         a  b           1  1
              2         5
              3         7
              4         9

Excel formula:

=CA_CURVE_FIT("a*x[0] + b", {1;2;3;4}, {3;5;7;9}, {"a","b"}, {1,1})

Expected output:

a b
2 1
4.864753555590493e-16 1.3322676295501877e-15

Example 2: Quadratic fit y = a*x^2 + b*x + c

Inputs:

model_expr                x_data    y_data    param_names    param_guess
a*x[0]**2 + b*x[0] + c    0         1         a  b  c        1  1  1
                          1         2
                          2         5
                          3         10

Excel formula:

=CA_CURVE_FIT("a*x[0]**2 + b*x[0] + c", {0;1;2;3}, {1;2;5;10}, {"a","b","c"}, {1,1,1})

Expected output:

a b c
1 0 1
1.7235296186091101e-15 5.395501143821959e-15 3.359777747954008e-15

Example 3: Linear fit with parameter bounds

Inputs:

model_expr    x_data    y_data    param_names    param_guess    param_bounds
m*x[0] + c    0         0.5       m  c           1  0.5         0.5  0
              1         1.5                                      2    1
              2         2.5
              3         3.5

Excel formula:

=CA_CURVE_FIT("m*x[0] + c", {0;1;2;3}, {0.5;1.5;2.5;3.5}, {"m","c"}, {1,0.5}, {0.5,0;2,1})

Expected output:

m c
1.000000000251489 0.4999999996221758
1.7782997487017766e-10 3.326894195314123e-10

Example 4: Weighted linear fit with equal weights

Inputs:

model_expr    x_data    y_data    param_names    param_guess    weights
a*x[0] + b    1         2         a  b           2  0           1
              2         4                                        1
              3         6                                        1
              4         8                                        1
              5         10                                       1

Excel formula:

=CA_CURVE_FIT("a*x[0] + b", {1;2;3;4;5}, {2;4;6;8;10}, {"a","b"}, {2,0}, {1;1;1;1;1})

Expected output:

a b
2 0
0 0

Python Code

import casadi as ca
import math
import numpy as np

def ca_curve_fit(model_expr, x_data, y_data, param_names, param_guess, param_bounds=None, weights=None):
    """
    Fit an arbitrary symbolic model to data using CasADi and automatic differentiation.

    See: https://web.casadi.org/

    This example function is provided as-is without any representation of accuracy.

    Args:
        model_expr (str): The model expression as a string (e.g., "a*x[0] + b*x[0]**2 + c").
        x_data (list[list]): The predictor (independent) variables as a 2D list where each row is a sample.
        y_data (list[list]): The observed (dependent) variables as a 2D list where each row is [y_i].
        param_names (list[list]): The parameter names in the model expression as a 2D list.
        param_guess (list[list]): Initial guess for the parameters as a 2D list.
        param_bounds (list[list], optional): Optional parameter bounds as a 2D list [[lower_1, lower_2, ...], [upper_1, upper_2, ...]]. Default is None.
        weights (list[list], optional): Optional data point weights for weighted least-squares as a 2D list [[w_1], [w_2], ...]. Default is None.

    Returns:
        list[list]: 2D list [param_names, values, std_errors] (the std_errors row is omitted if it cannot be computed), or an error message string.
    """
    def to2d(x):
        """Convert scalar to 2D list if needed."""
        return [[x]] if not isinstance(x, list) else x

    # Normalize scalar inputs to 2D lists for these parameters
    param_names = to2d(param_names)
    param_guess = to2d(param_guess)

    try:
        # Validate inputs
        if not isinstance(model_expr, str) or not model_expr.strip():
            return "Invalid input: model_expr must be a non-empty string"
        if not isinstance(x_data, list) or len(x_data) < 2:
            return "Invalid input: x_data must be a 2D list with at least 2 data points"
        if not isinstance(y_data, list) or len(y_data) < 2:
            return "Invalid input: y_data must be a 2D list with at least 2 data points"

        # Validate x_data structure
        if not isinstance(x_data[0], list) or len(x_data[0]) == 0:
            return "Invalid input: x_data rows must be non-empty lists"

        # Validate y_data structure
        if not isinstance(y_data[0], list) or len(y_data[0]) == 0:
            return "Invalid input: y_data rows must be non-empty lists"

        if len(x_data) != len(y_data):
            return "Invalid input: x_data and y_data must have the same number of rows"

        # Extract parameter names from the 2D list
        if not param_names or not param_names[0]:
            return "Invalid input: param_names cannot be empty"
        pnames = param_names[0] if isinstance(param_names[0], list) else param_names
        pnames = [str(p) for p in pnames]  # Ensure all parameter names are strings
        n_params = len(pnames)

        # Extract initial guess from 2D list
        if not param_guess or not param_guess[0]:
            return "Invalid input: param_guess cannot be empty"
        p0_list = param_guess[0] if isinstance(param_guess[0], list) else param_guess
        if len(p0_list) != n_params:
            return f"Invalid input: param_guess must have {n_params} values"

        # Convert expression: replace ^ with **
        expr_str = str(model_expr).replace("^", "**")

        # Create symbolic variables
        x_sym = ca.SX.sym("x", len(x_data[0]))
        p_sym = ca.SX.sym("p", n_params)

        # Create namespace for expression evaluation with math functions
        namespace = {
            "x": x_sym,
            "exp": ca.exp,
            "log": ca.log,
            "sqrt": ca.sqrt,
            "sin": ca.sin,
            "cos": ca.cos,
            "tan": ca.tan,
            "abs": ca.fabs,
            "pi": math.pi,
            "e": math.e,
        }
        for i, pname in enumerate(pnames):
            namespace[str(pname)] = p_sym[i]

        # Parse and evaluate the model expression
        try:
            model_fn = ca.Function("model", [x_sym, p_sym], [eval(expr_str, {"__builtins__": {}}, namespace)])
        except Exception as e:
            return f"Invalid model expression: {str(e)}"

        # Extract numeric data
        try:
            x_vals = [[float(val) for val in row] for row in x_data]
            y_vals = [float(row[0]) for row in y_data]
            p0_vals = [float(v) for v in p0_list]
        except (ValueError, TypeError, IndexError) as e:
            return f"Invalid numeric input: {e}"

        # Check for NaN or inf in input data
        for row in x_vals:
            for val in row:
                if math.isnan(val) or math.isinf(val):
                    return "Invalid input: x_data contains NaN or inf"
        for val in y_vals:
            if math.isnan(val) or math.isinf(val):
                return "Invalid input: y_data contains NaN or inf"
        for val in p0_vals:
            if math.isnan(val) or math.isinf(val):
                return "Invalid input: param_guess contains NaN or inf"

        # Handle weights
        if weights is not None:
            try:
                w_vals = [float(row[0]) for row in weights]
                if len(w_vals) != len(y_vals):
                    return "Invalid input: weights must have the same length as y_data"
                for w in w_vals:
                    if w <= 0 or math.isnan(w) or math.isinf(w):
                        return "Invalid input: weights must be positive and not NaN/inf"
            except (ValueError, TypeError, IndexError):
                return "Invalid input: weights must be a 2D list of positive numbers"
        else:
            w_vals = [1.0] * len(y_vals)

        # Handle bounds
        p_lower = [-float('inf')] * n_params
        p_upper = [float('inf')] * n_params
        if param_bounds is not None:
            try:
                if not isinstance(param_bounds, list) or len(param_bounds) != 2:
                    return "Invalid input: param_bounds must be a 2D list [[lower], [upper]]"
                if not isinstance(param_bounds[0], list) or not isinstance(param_bounds[1], list):
                    return "Invalid input: param_bounds must be a 2D list [[lower], [upper]]"
                if len(param_bounds[0]) != n_params or len(param_bounds[1]) != n_params:
                    return f"Invalid input: param_bounds must have {n_params} values for each bound"
                for i, (lb, ub) in enumerate(zip(param_bounds[0], param_bounds[1])):
                    if lb is not None:
                        lb_val = float(lb)
                        if not math.isfinite(lb_val):
                            return f"Invalid input: lower bound for parameter {pnames[i]} must be finite"
                        p_lower[i] = lb_val
                    if ub is not None:
                        ub_val = float(ub)
                        if not math.isfinite(ub_val):
                            return f"Invalid input: upper bound for parameter {pnames[i]} must be finite"
                        p_upper[i] = ub_val
                    if p_lower[i] > p_upper[i]:
                        return f"Invalid input: lower bound > upper bound for parameter {pnames[i]}"
            except (ValueError, TypeError) as e:
                return f"Invalid input: param_bounds must contain numeric values ({str(e)})"

        # Build optimization problem
        p = ca.SX.sym("p", n_params)
        cost = 0
        for i, (x_row, y_val, w) in enumerate(zip(x_vals, y_vals, w_vals)):
            y_pred = model_fn(x_row, p)
            residual = y_pred - y_val
            cost += w * residual ** 2

        # Create NLP with appropriate solver options
        nlp = {"x": p, "f": cost}
        opts = {
            'qpsol': 'qrqp',
            'qpsol_options': {'print_iter': False, 'print_header': False},
            'print_time': 0
        }

        try:
            solver = ca.nlpsol("solver", "sqpmethod", nlp, opts)
        except Exception as e:
            return f"Solver initialization error: {str(e)}"

        # Solve
        try:
            result = solver(x0=p0_vals, lbx=p_lower, ubx=p_upper)
            p_opt_raw = result["x"]
            # Convert CasADi DM to Python list
            p_opt = [float(p_opt_raw[i]) for i in range(n_params)]
        except Exception as e:
            return f"Solver error: {str(e)}"

        # Verify results
        for i, val in enumerate(p_opt):
            if math.isnan(val) or math.isinf(val):
                return f"Optimization produced invalid value for parameter {pnames[i]}: {val}"

        # Calculate standard errors from Jacobian
        std_errors = None
        try:
            # Create Jacobian function for residuals
            p_sym = ca.SX.sym("p", n_params)
            residuals = ca.SX([])
            for x_row, y_val, w in zip(x_vals, y_vals, w_vals):
                y_pred = model_fn(x_row, p_sym)
                weighted_residual = ca.sqrt(w) * (y_pred - y_val)
                residuals = ca.vertcat(residuals, weighted_residual)

            # Create Jacobian function
            jacobian_func = ca.Function("jacobian", [p_sym], [ca.jacobian(residuals, p_sym)])

            # Evaluate Jacobian at optimal parameters
            J = jacobian_func(p_opt)
            J_dense = J.full()

            # Compute covariance matrix and standard errors
            # Standard error formula: sqrt(diag((J^T * J)^-1 * RSS / (n - p)))
            J_array = np.array(J_dense, dtype=float)
            n_data = len(y_vals)
            degrees_of_freedom = max(1, n_data - n_params)

            try:
                JtJ = np.dot(J_array.T, J_array)
                # Check if matrix is well-conditioned and full rank
                if np.linalg.matrix_rank(JtJ) == JtJ.shape[0] and np.isfinite(JtJ).all():
                    # Calculate residual sum of squares at optimal parameters
                    rss = 0.0
                    for x_row, y_val in zip(x_vals, y_vals):
                        y_pred_val = float(model_fn(x_row, p_opt))
                        rss += (y_pred_val - y_val) ** 2

                    # Calculate variance estimate
                    variance = rss / degrees_of_freedom

                    # Calculate covariance matrix
                    cov_matrix = np.linalg.inv(JtJ) * variance

                    if np.isfinite(cov_matrix).all():
                        std_errors = [float(np.sqrt(max(0.0, cov_matrix[i, i]))) for i in range(n_params)]
            except (np.linalg.LinAlgError, ZeroDivisionError):
                # Matrix is singular or computation failed
                pass
        except Exception:
            # If standard error calculation fails, continue without it
            pass

        return [pnames, p_opt, std_errors] if std_errors else [pnames, p_opt]

    except Exception as e:
        return f"Unexpected error: {str(e)}"

Online Calculator