TRIM_INCOMPLETE

This function trims a data series by marking early and late periods of low data completeness for exclusion.

It computes a rolling completeness score and returns a boolean array (True meaning the data point is kept) with False values up to the first complete day, True between the first and last complete days, and False thereafter.

Excel Usage

=TRIM_INCOMPLETE(times, values, minimum_completeness, days, freq)
  • times (list[list], required): Timestamps in ISO8601 format.
  • values (list[list], required): Numeric data values corresponding to the times.
  • minimum_completeness (float, optional, default: 0.333333): The minimum required daily completeness score.
  • days (int, optional, default: 10): The consecutive complete days needed before transitioning.
  • freq (str, optional, default: null): Expected interval between samples as a pandas frequency string (e.g. ‘15min’). If blank, it is inferred.

Returns (list[list]): 2D list of booleans (True to keep, False to trim), or an error string.

Example 1: Identifies trailing gaps based on completeness

Inputs:

times values minimum_completeness days freq
2024-01-01T00:00:00Z 10 0.5 1 1D
2024-01-02T00:00:00Z 15
2024-01-03T00:00:00Z 20
2024-01-04T00:00:00Z

Excel formula:

=TRIM_INCOMPLETE({"2024-01-01T00:00:00Z";"2024-01-02T00:00:00Z";"2024-01-03T00:00:00Z";"2024-01-04T00:00:00Z"}, {10;15;20;""}, 0.5, 1, "1D")

Expected output:

Result
true
true
true
false
Example 2: Keep all values when completeness threshold is always met

Inputs:

times values minimum_completeness days freq
2024-01-05T00:00:00Z 10 1 1 1D
2024-01-06T00:00:00Z 11
2024-01-07T00:00:00Z 12

Excel formula:

=TRIM_INCOMPLETE({"2024-01-05T00:00:00Z";"2024-01-06T00:00:00Z";"2024-01-07T00:00:00Z"}, {10;11;12}, 1, 1, "1D")

Expected output:

Result
true
true
true
Example 3: Require multiple complete days before keep window starts

Inputs:

times values minimum_completeness days freq
2024-01-08T00:00:00Z 0.5 2 1D
2024-01-09T00:00:00Z 10
2024-01-10T00:00:00Z 10
2024-01-11T00:00:00Z 10

Excel formula:

=TRIM_INCOMPLETE({"2024-01-08T00:00:00Z";"2024-01-09T00:00:00Z";"2024-01-10T00:00:00Z";"2024-01-11T00:00:00Z"}, {"";10;10;10}, 0.5, 2, "1D")

Expected output:

Result
false
true
true
true
Example 4: Handle scalar time and value inputs

Inputs:

times values minimum_completeness days freq
2024-01-12T00:00:00Z 9 0.5 1 1D

Excel formula:

=TRIM_INCOMPLETE("2024-01-12T00:00:00Z", 9, 0.5, 1, "1D")

Expected output:

true

Python Code

Show Code
import pandas as pd
from pvanalytics.quality.gaps import trim_incomplete as result_func
from pvanalytics.quality.gaps import start_stop_dates

def trim_incomplete(times, values, minimum_completeness=0.333333, days=10, freq=None):
    """
    Return a boolean mask to trim leading and trailing low-completeness periods from a PV series.

    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.quality.gaps.trim_incomplete.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        times (list[list]): Timestamps in ISO8601 format.
        values (list[list]): Numeric data values corresponding to the times.
        minimum_completeness (float, optional): The minimum required daily completeness score. Default is 0.333333.
        days (int, optional): The consecutive complete days needed before transitioning. Default is 10.
        freq (str, optional): Expected interval between samples as a pandas frequency string (e.g. '15min'). If blank, it is inferred. Default is None.

    Returns:
        list[list]: 2D list of booleans (True to keep, False to trim), or an error string.
    """
    try:
        def flatten_str(data):
            if not isinstance(data, list): return [str(data)]
            return [str(val) for row in data for val in (row if isinstance(row, list) else [row]) if val != ""]

        def flatten_num(data):
            if not isinstance(data, list): return [float(data)]
            flat = []
            for row in data:
                row = row if isinstance(row, list) else [row]
                for val in row:
                    if val == "": flat.append(float('nan'))
                    else: flat.append(float(val))
            return flat

        time_list = flatten_str(times)
        val_list = flatten_num(values)

        if len(time_list) != len(val_list):
            return "Error: times and values must have the same length"
        if len(time_list) == 0:
            return "Error: input arrays cannot be empty"

        dt_idx = pd.DatetimeIndex(time_list)
        series = pd.Series(val_list, index=dt_idx)

        min_comp = float(minimum_completeness) if minimum_completeness is not None else 0.333333
        d = int(days) if days is not None else 10
        f = str(freq) if freq is not None and str(freq).strip() != "" else None

        # pvanalytics requires at least 3 timestamps to infer frequency even if a
        # frequency is provided. For short series, compute the completeness/trim
        # logic ourselves to avoid this limitation.
        if len(series.index) < 3:
            if f is None:
                return "Error: Need at least 3 dates to infer frequency"

            try:
                offset = pd.tseries.frequencies.to_offset(f)
                seconds_per_sample = float(offset.delta.total_seconds())
            except Exception:
                return "Error: Invalid freq"

            daily_counts = series.resample('D').count()
            daily_completeness = (daily_counts * seconds_per_sample) / (1440 * 60)
            daily_good = daily_completeness >= min_comp

            start, end = start_stop_dates(daily_good, days=d)
            mask = pd.Series(False, index=series.index)
            if start:
                mask.loc[start:end] = True
            return [[bool(v)] for v in mask]

        # For longer series, defer to the library implementation.
        res = result_func(series, minimum_completeness=min_comp, days=d, freq=f)

        return [[bool(v)] for v in res]
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

Timestamps in ISO8601 format.
Numeric data values corresponding to the times.
The minimum required daily completeness score.
The consecutive complete days needed before transitioning.
Expected interval between samples as a pandas frequency string (e.g. '15min'). If blank, it is inferred.