CURVE_FIT

Overview

The CURVE_FIT function is an Excel-oriented wrapper around SciPy’s scipy.optimize.curve_fit, enabling nonlinear least-squares fits by selecting a named model and providing paired x/y observations. Model functions, default bounds, and heuristic initial guesses are sourced from the repository’s curve_fit_models.json, so users can choose among numerous prebuilt functional forms without writing Python code. The optimization minimizes the residual sum of squares for the selected model:

\begin{align*} S(\theta) &= \sum_{i=1}^{M} \left(y_i - f(x_i;\theta)\right)^2 \\ f_{\text{Box-Lucas}}(x; a, b) &= a\left(1 - e^{-b x}\right) \\ f_{\text{Boltzmann}}(x; A_1, A_2, x_0, dx) &= \frac{A_1 - A_2}{1 + e^{(x-x_0)/dx}} + A_2 \\ f_{\text{ExpGrow2}}(x; y_0, A_1, t_1, A_2, t_2) &= y_0 + A_1 e^{x/t_1} + A_2 e^{x/t_2} \end{align*}

For more details on solver behavior, see the SciPy documentation . This example function is provided as-is without any representation of accuracy.

Usage

To use the function in Excel:


=CURVE_FIT(model_id, xdata, ydata, [initial_guess], [bounds], [curve_fit_method])

model_id (string (enum), required): Identifier of the model to fit. Valid options: Allometric1, Allometric2, Asym2Sig, Asymptotic1, BET, BETMod, Belehradek, Beta, BiDoseResp, BiHill, Bigaussian, Bingham, Biphasic, BlNeld, BlNeldSmp, BoltzIV, Boltzmann, BoxLucas1, BoxLucas1Mod, BoxLucas2, Bradley, CCE, Carreau, Chapman, CompInhib, Constant, Cross, Cubic, Curve Fitting Functions, Dhyperbl, DoseResp, DoubleBoltzmann, ECS, Exp1P1, Exp1P3Md, Exp1P4, Exp1P4Md, Exp1p2, Exp1p2md, Exp1p3, Exp2P, Exp2PMod1, Exp2PMod2, Exp3P1, Exp3P1Md, Exp3P2, ExpAssoc, ExpDec1, ExpDec2, ExpDec3, ExpDecay1, ExpDecay2, ExpDecay3, ExpGro1, ExpGro2, ExpGro3, ExpGrow1, ExpGrow2, ExpGrow3Dec2, ExpGrowDec, ExpLinear, Exponential, ExponentialCDF, Extreme, FarazdaghiHarris, FreundlichEXT, GCAS, GammaCDF, Gauss, GaussAmp, GaussMod, Gaussian, GaussianLorentz, Giddings, Goldman, Gumbel, Gunary, Herschel, Hill, Hill1, HillBurk, Holliday, Holliday1, Hyperbl, HyperbolaGen, HyperbolaMod, InvsPoly, Langevin, LangevinMod, LangmuirEXT1, LangmuirEXT2, Laplace, Line, Line3, LineExp, LineMod, Log2P1, Log2P2, Log3P1, LogNormal, Logarithm, Logistic, Logistic5, Logistpk, LognormalCDF, Lorentz, MYEGA, MichaelisMenten, MixedModelInhib, MnMolecular, MnMolecular1, Nelder, NoncompInhib, OneSiteBind, OneSiteComp, PWL2, PWL3, Parabola, Pareto, Pareto2, Peak Analyzer Functions, PearsonIV, PearsonVII, Poisson, Poly, Poly4, Poly5, Pow2P1, Pow2P2, Pow2P3, Power, Power0, Power1, Power2, PsdVoigt1, PsdVoigt2, Pulse, Rational0, Rational1, Rational2, Rational3, Rational4, Rational5, Rayleigh, Reciprocal, Reciprocal0, Reciprocal1, ReciprocalMod, RectHyperbola, SGompertz, SLogistic1, SLogistic2, SLogistic3, SRichards1, SRichards2, SWeibull1, SWeibull2, SawtoothWave, Shah, Sine, SineDamp, SineSqr, SquareWave, SquareWaveMod, Step, Stirling, SubstrateInhib, TwoSiteBind, TwoSiteComp, UncompInhib, VFT, Voigt, Weibull, Weibull3, WeibullCDF, YldFert, YldFert1.
xdata (2D list, required): Column vector with at least two rows containing the independent variable values. The first column is used; additional columns are ignored.
ydata (2D list, required): Column vector with at least two rows containing observed dependent variable values aligned with xdata.
initial_guess (2D list, optional): Single-row table specifying initial parameter values in the order expected by the selected model. If omitted, the wrapper computes a heuristic guess.
bounds (2D list, optional): Two-row table where the first row contains parameter lower bounds and the second row contains upper bounds. Use large magnitudes for effectively unbounded parameters.
curve_fit_method (string (enum), optional): Solver algorithm. Valid options: Levenberg-Marquardt (LM), Trust Region Reflective (TRF), Dogleg Box (DOGBOX). LM ignores bounds; TRF and DOGBOX support bound constraints.

The function returns a 2D list where the first row contains parameter names and the second row contains the fitted parameter values (floats). If validation fails or SciPy reports an error, a descriptive error message string is returned instead.

Examples

Example 1: Allometric Power Law Fit

Inputs:

xdata	ydata
1.0	2.000
2.0	5.657
3.0	10.392
4.0	16.000
5.0	22.361

Excel formula:


=CURVE_FIT("Allometric1", {1;2;3;4;5}, {2;5.656854;10.392305;16;22.36068})

Expected output:

a	b
2.000	1.500

The fitted curve recovers the original amplitude and exponent used to generate the synthetic data.

Example 2: Two-Phase Association with TRF Solver

Inputs:

xdata	ydata	method
0.0	0.000	trf
1.0	2.592
2.0	4.512
3.0	5.934
4.0	6.988
5.0	7.769

Excel formula:


=CURVE_FIT("ExpAssoc", {0;1;2;3;4;5}, {0;2.591818;4.511884;5.934303;6.988058;7.768698},,,"trf")

Expected output:

A	B
10.000	0.300

Using the trust-region reflective solver reproduces the amplitude and rate constants for the exponential association model.

Example 3: Gaussian Peak with Custom Initial Guess

Inputs:

xdata	ydata	initial_guess
-3.0	0.056	4.500	0.200	0.900
-2.0	0.677
-1.0	3.033
0.0	5.000
1.0	3.033
2.0	0.677
3.0	0.056

Excel formula:


=CURVE_FIT("Gauss", {-3;-2;-1;0;1;2;3}, {0.055545;0.676676;3.032653;5;3.032653;0.676676;0.055545}, {4.5,0.2,0.9})

Expected output:

A	x0	sigma
5.000	0.000	1.000

Supplying a reasonable initial guess helps the optimizer converge quickly to the true Gaussian parameters.

Example 4: Sine Wave with Bounds and DOGBOX Solver

Inputs:

xdata	ydata	initial_guess				bounds				method
0.0	0.000	1.500	1.000	0.000	0.000	0.000	0.500	-0.500	-0.500	dogbox
1.0	1.732					3.000	2.500	0.500	0.500
2.0	1.732
3.0	0.000
4.0	-1.732
5.0	-1.732
6.0	0.000

Excel formula:


=CURVE_FIT("Sine", {0;1;2;3;4;5;6}, {0;1.732051;1.732051;0;-1.732051;-1.732051;0}, {1.5,1,0,0}, {0,0.5,-0.5,-0.5;3,2.5,0.5,0.5}, "dogbox")

Expected output:

A	omega	phi	y0
2.000	1.047	0.000	0.000

The bounded DOGBOX solve maintains amplitude and offset limits while recovering the sine wave’s true parameters.

Python Code


from typing import List, Union, Optional
from scipy.optimize import curve_fit as scipy_curve_fit
from scipy import special as sc
import math
import inspect
import numpy as np
import json
from urllib.request import urlopen, Request
from urllib.error import URLError, HTTPError
import os
 
Number = Union[float, int]
 
def curve_fit(model_id: str,
                     xdata: List[List[Number]],
                     ydata: List[List[Number]],
                     initial_guess: Optional[List[List[Number]]] = None,
                     bounds: Optional[List[List[Number]]] = None,
                     curve_fit_method: Optional[str] = None) -> Union[List[List[Union[str, float]]], str]:
    """
    Fit a model specified by model_id to xdata, ydata using scipy.optimize.curve_fit.
 
    Args:
        model_id: String identifier for the model (e.g. "exp_gro_two", "exp_grow_two").
        xdata: 2D list where each row is a sample; must have at least two rows.
        ydata: 2D list matching xdata in length; each row contains the y value in its first element.
        initial_guess: Optional 2D list representing initial guess for parameters.
        bounds: Optional 2D list [[lower_list],[upper_list]] for parameters.
        curve_fit_method: Optional method string passed to curve_fit (e.g., 'trf', 'dogbox', 'lm').
 
    Returns:
        2D list where first row is parameter names and second row are the fitted values, or an error string on failure.
 
    This example function is provided as-is without any representation of accuracy.
    """
    def _ensure_2d_list(arg: Union[List[List[Number]], object]) -> np.ndarray:
        """
        Validate that arg is a 2D list with at least two rows and return a 1-D numpy
        array (float64) of the first column values. This wrapper intentionally does
        NOT support vector predictors (multi-column x rows) for now and will raise
        on single-sample inputs.
        """
        if not isinstance(arg, list) or len(arg) < 2:
            raise ValueError("must be a 2D list with at least two rows")
        vals = []
        for i, row in enumerate(arg):
            if not isinstance(row, list) or len(row) == 0:
                raise ValueError(f"row {i} must be a non-empty list")
            # We only accept the first element of each row as the scalar value
            try:
                vals.append(float(row[0]))
            except Exception:
                raise ValueError(f"row {i} contains non-numeric value")
        arr = np.asarray(vals, dtype=np.float64)
        return arr
 
    # Persistent cache attached to the function object so it survives across calls
    if not hasattr(curve_fit, "_cached_models"):
        curve_fit._cached_models = None
 
    def _load_models():
        """Load model definitions from JSON file and combine with function implementations."""
        # Use the persistent cache on the function object
        if curve_fit._cached_models is not None:
            return curve_fit._cached_models
 
        model_metadata = None
        
        # Prefer a local JSON file (useful for tests / offline development). If not
        # present, fall back to the remote URL used in production.
        try:
            local_path = os.path.join(os.path.dirname(__file__), "curve_fit_models.json")
            if os.path.exists(local_path):
                try:
                    with open(local_path, "r", encoding="utf-8") as fh:
                        model_metadata = json.load(fh)
                except ValueError as exc:
                    raise RuntimeError(f"Failed to parse local curve fit models file {local_path}: {exc}") from exc
        except NameError:
            # __file__ is not defined (e.g., in Pyodide), skip local file loading
            pass
        
        if model_metadata is None:
            json_url = "https://preview.python-functions.pages.dev/resources/solvers/optimization/curve_fitting/curve_fit/curve_fit_models.json"
            try:
                request = Request(json_url, headers={"User-Agent": "python-functions/curve-fit-loader"})
                with urlopen(request) as response:
                    payload = response.read().decode("utf-8")
                model_metadata = json.loads(payload)
            except (HTTPError, URLError, ValueError) as exc:
                raise RuntimeError(f"Failed to fetch curve fit models from {json_url}: {exc}") from exc
 
        def normalize_bounds(bounds_value):
            if not bounds_value:
                return None
            lower = bounds_value.get("lower")
            upper = bounds_value.get("upper")
            if lower is None or upper is None:
                return None
            return [list(lower), list(upper)]
 
        # Build lookup from JSON metadata only. Each JSON entry should include
        # 'model' and 'guess' as strings which we will eval below.
        metadata_lookup = {}
        for entry in model_metadata:
            name = entry.get("name")
            if not name:
                continue
            # Ensure keys exist; if model/guess/bounds are missing, keep them as None
            entry.setdefault("model", None)
            entry.setdefault("guess", None)
            entry.setdefault("bounds", None)
            metadata_lookup[name] = entry
 
        eval_globals = {"np": np, "sc": sc, "math": math}
        combined_models = {}
 
        for name, metadata in metadata_lookup.items():
            model_code = metadata.get("model")
            guess_code = metadata.get("guess")
            if not model_code or not guess_code:
                continue
            try:
                model_callable = eval(model_code, eval_globals)
                guess_callable = eval(guess_code, eval_globals)
            except Exception as exc:
                raise ValueError(f"Failed to evaluate model '{name}': {exc}") from exc
 
            combined_models[name] = {
                "model": model_callable,
                "guess": guess_callable,
                "bounds": normalize_bounds(metadata.get("bounds")),
                "equation": metadata.get("equation", ""),
                "source": metadata.get("source", ""),
                "description": metadata.get("description", ""),
                "category": metadata.get("category", ""),
                "sample_image": metadata.get("sample_image", ""),
                "model_source": model_code,
                "guess_source": guess_code,
            }
 
        curve_fit._cached_models = combined_models
        return curve_fit._cached_models
 
    try:
        available_models = _load_models()
    except Exception as exc:
        return f"Model loading error: {exc}"
    if model_id not in available_models:
        supported_models = sorted(available_models.keys())
        preview = ", ".join(supported_models[:10])
        if len(supported_models) > 10:
            preview = f"{preview}, ..."
        return f"Invalid model_id: {model_id}. Must be one of: {preview}"
    model_info = available_models[model_id]
    model_func = model_info["model"]
    guess_func = model_info["guess"]
    # Derive parameter names from the model function signature (skip the first arg, typically 'x')
    try:
        sig = inspect.signature(model_func)
        # parameters preserves order; skip parameter 0 (x)
        param_names = [p.name for i, p in enumerate(sig.parameters.values()) if i > 0]
    except Exception:
        return "Internal error: unable to inspect model function signature."
    n_params = len(param_names)
 
    try:
        x_arr = _ensure_2d_list(xdata)
    except ValueError as e:
        return f"Invalid input for xdata: {e}"
    try:
        y_arr = _ensure_2d_list(ydata)
    except ValueError as e:
        return f"Invalid input for ydata: {e}"
 
    if x_arr.shape[0] != y_arr.shape[0]:
        return "Invalid input: xdata and ydata must have the same number of rows."
 
    # Already ensured float64 in _ensure_2d_list
    xa = x_arr
    ya = y_arr
 
    # Derive initial guess (p0) and normalize to a 1-D tuple of float64
    if initial_guess is not None:
        if not isinstance(initial_guess, list) or len(initial_guess) == 0 or not isinstance(initial_guess[0], list):
            return f"Invalid input: initial_guess must be a 2D list (e.g. [[{','.join(param_names)}]])."
        try:
            p0_raw = initial_guess[0]
        except Exception as e:
            return f"Invalid initial_guess structure: {e}"
    else:
        # allow guess_func to return array-like or tuple/list
        try:
            p0_raw = guess_func(xa, ya)
        except Exception as e:
            return f"Initial guess generation error: {e}"
 
    # Normalize p0 to a flat tuple of float64 and validate length
    try:
        p0_arr = np.asarray(p0_raw, dtype=np.float64).ravel()
        p0 = tuple(float(x) for x in p0_arr.tolist())
    except Exception as e:
        return f"Invalid initial_guess values: {e}"
    if len(p0) != n_params:
        return f"Invalid input: initial_guess must contain {n_params} values."
 
    # Bounds handling: prefer user-provided bounds, otherwise model-specific defaults if present
    model_bounds = model_info.get('bounds') if isinstance(model_info, dict) else None
    if bounds is None and model_bounds is not None:
        bounds = model_bounds
 
    # If bounds provided (either by user or model defaults), validate and normalize
    if bounds is not None:
        if not isinstance(bounds, list) or len(bounds) != 2 or not isinstance(bounds[0], list) or not isinstance(bounds[1], list):
            return f"Invalid input: bounds must be a 2D list like [[lower1,...],[upper1,...]]."
        try:
            raw_lower = bounds[0]
            raw_upper = bounds[1]
            if not isinstance(raw_lower, list) or not isinstance(raw_upper, list):
                return f"Invalid input: bounds must be a 2D list like [[lower1,...],[upper1,...]]."
            if len(raw_lower) != n_params or len(raw_upper) != n_params:
                return f"Invalid input: bounds must contain {n_params} values for lower and {n_params} for upper."
            lower = []
            upper = []
            # Map None to -inf for lower and +inf for upper; otherwise convert to float
            for v in raw_lower:
                if v is None:
                    lower.append(-np.inf)
                else:
                    lower.append(float(v))
            for v in raw_upper:
                if v is None:
                    upper.append(np.inf)
                else:
                    upper.append(float(v))
        except Exception as e:
            return f"Invalid bounds values: {e}"
        # Convert to numpy arrays (preserve original behavior otherwise)
        bounds_tuple = (np.asarray(lower, dtype=np.float64), np.asarray(upper, dtype=np.float64))
    else:
        bounds_tuple = None
 
    allowed_methods = {"lm", "trf", "dogbox"}
    method_value = None
    if curve_fit_method is not None:
        if isinstance(curve_fit_method, str):
            method_candidate = curve_fit_method.strip().lower()
        else:
            return "Invalid input: curve_fit_method must be one of 'lm', 'trf', or 'dogbox'."
        if method_candidate not in allowed_methods:
            return "Invalid input: curve_fit_method must be one of 'lm', 'trf', or 'dogbox'."
        method_value = method_candidate
 
    # method vs bounds validation: Levenberg-Marquardt (lm) cannot handle bounds
    if method_value == 'lm' and bounds_tuple is not None:
        return "Invalid input: curve_fit_method='lm' cannot be used with bounds; use 'trf' or 'dogbox' instead."
 
    # With lm, number of observations (M) must be >= number of parameters (N)
    if method_value == 'lm' and xa.shape[0] < n_params:
        return f"Invalid input: curve_fit_method='lm' requires number of observations >= number of parameters ({xa.shape[0]} < {n_params})."
 
    # Wrap the model to ensure outputs are float64 (SciPy recommends float64)
    def _model(x, *params):
        try:
            return np.asarray(model_func(x, *params), dtype=np.float64)
        except Exception as e:
            # Return exception as a string via our interface
            raise
 
    try:
        curve_fit_kwargs = dict(p0=p0)
        if bounds_tuple is not None:
            curve_fit_kwargs['bounds'] = bounds_tuple
        if method_value is not None:
            curve_fit_kwargs['method'] = method_value
        curve_fit_kwargs['maxfev'] = 10000
        popt, pcov = scipy_curve_fit(_model, xa, ya, **curve_fit_kwargs)
    except Exception as e:
        return f"curve_fit error: {e}"
 
    try:
        fitted_vals = [float(v) for v in popt]
    except Exception as e:
        return f"Result parsing error: {e}"
    for v in fitted_vals:
        if math.isnan(v) or math.isinf(v):
            return "Fitting produced invalid numeric values (NaN or inf)."
    return [param_names, fitted_vals]