COMPLETENESS_SCORE

This function computes a daily data completeness score.

The completeness score for a given day is the fraction of that day covered by valid (non-missing) data. Each valid reading is credited with a duration equal to the expected frequency if one is specified, or otherwise to the timestamp spacing of the series. For example, a 24-hour time series with 30-minute spacing and 24 non-missing values contains 12 hours of valid data, yielding a completeness score of 0.5.
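The arithmetic in the example above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pvanalytics implementation: the helper name `daily_completeness` and its signature are hypothetical, and missing readings are represented as `None` or NaN.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta


def daily_completeness(times, values, spacing):
    """Hypothetical helper: fraction of each day covered by non-missing values.

    Each non-missing value is credited with `spacing` worth of valid data.
    """
    valid_counts = defaultdict(int)
    for t, v in zip(times, values):
        day = t.date()
        valid_counts[day] += 0  # register the day even if all its data is missing
        is_missing = v is None or (isinstance(v, float) and math.isnan(v))
        if not is_missing:
            valid_counts[day] += 1
    one_day = timedelta(days=1)
    # int * timedelta gives a timedelta; timedelta / timedelta gives a float.
    return {day: n * spacing / one_day for day, n in valid_counts.items()}


# 24 hours at 30-minute spacing: 48 samples, every other one missing.
start = datetime(2024, 1, 1)
spacing = timedelta(minutes=30)
times = [start + i * spacing for i in range(48)]
values = [1.0 if i % 2 == 0 else None for i in range(48)]

scores = daily_completeness(times, values, spacing)
print(scores[start.date()])  # 24 valid values * 0.5 h / 24 h = 0.5
```

The sketch counts valid readings per calendar day and multiplies by the sample spacing, which is the same "duration per valid reading" idea described above.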

Excel Usage

=COMPLETENESS_SCORE(times, values, freq, keep_index)
  • times (list[list], required): Timestamps in ISO8601 format.
  • values (list[list], required): Numeric data values corresponding to the times.
  • freq (str, optional, default: null): Expected interval between samples as a pandas frequency string (e.g. '15min'). If blank, the interval is inferred from the timestamps, which requires at least three of them.
  • keep_index (bool, optional, default: true): Whether to align the output with the input time array. If true, each daily score is repeated for every input timestamp that falls on that day; if false, one score per day is returned.

Returns (list[list]): 2D list of completeness floats, one per input value (or one per day when keep_index is false), or an error string.
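The two output shapes selected by keep_index can be sketched as follows. This is an illustration only: `align_scores` is a hypothetical helper, and the daily scores are taken as given inputs (the values mirror Example 3 below) rather than computed.

```python
from datetime import date, datetime, timedelta


def align_scores(times, daily_scores, keep_index=True):
    """Hypothetical helper showing the two output shapes."""
    if keep_index:
        # One row per input timestamp, each carrying its day's score.
        return [[daily_scores[t.date()]] for t in times]
    # One row per day, in chronological order.
    return [[daily_scores[d]] for d in sorted(daily_scores)]


# Two days of 6-hourly timestamps (eight samples in total).
times = [datetime(2024, 1, 3) + timedelta(hours=6 * i) for i in range(8)]
daily_scores = {date(2024, 1, 3): 1.0, date(2024, 1, 4): 0.75}

per_sample = align_scores(times, daily_scores, keep_index=True)
per_day = align_scores(times, daily_scores, keep_index=False)
print(per_sample)  # four rows of [1.0] followed by four rows of [0.75]
print(per_day)     # [[1.0], [0.75]]
```

With keep_index set to true the result can sit next to the input column row for row; with false it is a compact per-day summary.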

Example 1: Day with missing points returns partial completeness

Inputs:

times                 values   freq  keep_index
2024-01-01T00:00:00Z  10       6H    true
2024-01-01T06:00:00Z  (blank)
2024-01-01T12:00:00Z  15
2024-01-01T18:00:00Z  (blank)

Excel formula:

=COMPLETENESS_SCORE({"2024-01-01T00:00:00Z";"2024-01-01T06:00:00Z";"2024-01-01T12:00:00Z";"2024-01-01T18:00:00Z"}, {10;"";15;""}, "6H", TRUE)

Expected output:

Result
0.5
0.5
0.5
0.5

Example 2: Day with all present values returns full completeness

Inputs:

times                 values  freq  keep_index
2024-01-02T00:00:00Z  5       12H   true
2024-01-02T12:00:00Z  7

Excel formula:

=COMPLETENESS_SCORE({"2024-01-02T00:00:00Z";"2024-01-02T12:00:00Z"}, {5;7}, "12H", TRUE)

Expected output:

Result
1
1

Example 3: Return daily index scores when keep index is false

Inputs:

times                 values   freq  keep_index
2024-01-03T00:00:00Z  1        6H    false
2024-01-03T06:00:00Z  1
2024-01-03T12:00:00Z  1
2024-01-03T18:00:00Z  1
2024-01-04T00:00:00Z  2
2024-01-04T06:00:00Z  (blank)
2024-01-04T12:00:00Z  2
2024-01-04T18:00:00Z  2

Excel formula:

=COMPLETENESS_SCORE({"2024-01-03T00:00:00Z";"2024-01-03T06:00:00Z";"2024-01-03T12:00:00Z";"2024-01-03T18:00:00Z";"2024-01-04T00:00:00Z";"2024-01-04T06:00:00Z";"2024-01-04T12:00:00Z";"2024-01-04T18:00:00Z"}, {1;1;1;1;2;"";2;2}, "6H", FALSE)

Expected output:

Result
1
0.75

Example 4: Handle scalar timestamp and value inputs

Inputs:

times values freq keep_index
2024-01-05T00:00:00Z 10 1D true

Excel formula:

=COMPLETENESS_SCORE("2024-01-05T00:00:00Z", 10, "1D", TRUE)

Expected output:

Result
1

Python Code

import pandas as pd
from pvanalytics.quality.gaps import completeness_score as result_func

def completeness_score(times, values, freq=None, keep_index=True):
    """
    Calculate a data completeness score for each day from a timestamped PV series.

    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.quality.gaps.completeness_score.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        times (list[list]): Timestamps in ISO8601 format.
        values (list[list]): Numeric data values corresponding to the times.
        freq (str, optional): Expected interval between samples as a pandas frequency string (e.g. '15min'). If blank, it is inferred. Default is None.
        keep_index (bool, optional): Whether to return values padded to align with the input time array. If True, replicates daily scores to input resolution. Default is True.

    Returns:
        list[list]: 2D list of completeness floats corresponding to each input value, or an error string.
    """
    try:
        def flatten_str(data):
            # Accept a scalar or a (possibly ragged) 2D list; drop blank cells.
            if not isinstance(data, list):
                return [str(data)]
            return [
                str(val)
                for row in data
                for val in (row if isinstance(row, list) else [row])
                if val is not None and val != ""
            ]

        def flatten_num(data):
            # Accept a scalar or a 2D list; blank cells become NaN (missing).
            if not isinstance(data, list):
                return [float(data)]
            flat = []
            for row in data:
                row = row if isinstance(row, list) else [row]
                for val in row:
                    if val is None or val == "":
                        flat.append(float('nan'))
                    else:
                        flat.append(float(val))
            return flat

        def unwrap_scalar(x):
            if isinstance(x, list) and len(x) == 1 and isinstance(x[0], list) and len(x[0]) == 1:
                return x[0][0]
            return x

        time_list = flatten_str(times)
        val_list = flatten_num(values)

        if len(time_list) != len(val_list):
            return "Error: times and values must have the same length"
        if len(time_list) == 0:
            return "Error: input arrays cannot be empty"

        dt_idx = pd.DatetimeIndex(time_list)
        series = pd.Series(val_list, index=dt_idx)

        freq_val = unwrap_scalar(freq)
        keep_val = unwrap_scalar(keep_index)

        f = str(freq_val) if freq_val is not None and str(freq_val).strip() != "" else None
        keep = bool(keep_val) if keep_val is not None else True

        # pvanalytics uses pandas.infer_freq, which requires at least 3 timestamps.
        # For short series with an explicit freq, compute completeness manually.
        if f is not None and len(time_list) < 3:
            # f is already a string here; a malformed value raises and is
            # reported by the outer exception handler.
            td = pd.to_timedelta(f)

            if td <= pd.Timedelta(0):
                return "Error: Invalid frequency"

            df = pd.DataFrame({'value': series})
            df['day'] = df.index.normalize()

            scores = {}
            for day, group in df.groupby('day'):
                count = int(group['value'].notna().sum())
                score = float(count * td / pd.Timedelta(days=1))
                scores[day] = min(score, 1.0)

            if keep:
                return [[scores.get(day, "")] for day in df['day']]
            else:
                return [[scores.get(day, "")] for day in sorted(scores.keys())]

        res = result_func(series, freq=f, keep_index=keep)

        # With keep_index the result is aligned to the input timestamps;
        # otherwise it is indexed by day. Both convert to a column the same way.
        return [[float(v) if not pd.isna(v) else ""] for v in res]
    except Exception as e:
        return f"Error: {str(e)}"
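The handling of blank cells in the listing above can be tried in isolation. The sketch below is a standalone approximation, assuming that an Excel column range reaches Python as a 2D list with one single-cell row per spreadsheet row and that blank cells arrive as empty strings or `None`; `flatten_values` is a hypothetical name, not part of pvanalytics.

```python
import math


def flatten_values(data):
    """Hypothetical helper: flatten a scalar or 2D list, mapping blanks to NaN."""
    if not isinstance(data, list):
        data = [[data]]  # treat a scalar as a 1x1 range
    flat = []
    for row in data:
        for cell in (row if isinstance(row, list) else [row]):
            blank = cell is None or cell == ""
            flat.append(math.nan if blank else float(cell))
    return flat


# The values column from Example 1: {10;"";15;""}
column = [[10], [""], [15], [""]]
flat = flatten_values(column)
valid = sum(not math.isnan(v) for v in flat)
print(flat)   # [10.0, nan, 15.0, nan]
print(valid)  # 2
```

Two valid readings at 6-hour spacing cover 12 of 24 hours, which matches the 0.5 reported in Example 1.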
