COMPLETENESS_SCORE
This function computes a daily data completeness score.
The completeness score for a given day is the fraction of that day covered by valid (non-missing) data. Each valid reading is credited with a duration equal to the timestamp spacing of the series, or to the expected frequency if one is specified. For example, a 24-hour time series with 30-minute spacing and 24 non-missing values translates to 12 hours of valid data, yielding a completeness score of 0.5.
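The arithmetic in that example can be sketched with the standard library alone. This is a minimal illustration of the definition, not the pvanalytics implementation; the helper name `day_completeness` is hypothetical, and the cap at 1.0 is an assumption borrowed from the short-series branch of the Python code on this page.

```python
from datetime import datetime, timedelta


def day_completeness(readings, spacing):
    """Fraction of one 24-hour day covered by non-missing readings.

    readings: list of (timestamp, value) pairs for a single day; None marks a gap.
    spacing: timedelta credited to each valid reading.
    """
    valid = sum(1 for _, value in readings if value is not None)
    covered = valid * spacing
    # Cap at 1.0 (assumption, mirroring the short-series branch on this page).
    return min(covered / timedelta(days=1), 1.0)


# 48 half-hourly slots in one day, every other one missing: 24 valid readings.
start = datetime(2024, 1, 1)
step = timedelta(minutes=30)
readings = [(start + i * step, 1.0 if i % 2 == 0 else None) for i in range(48)]

score = day_completeness(readings, step)
print(score)  # 0.5
```

Here 24 valid readings times 30 minutes gives 12 hours, and 12 hours divided by 24 hours gives 0.5.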
Excel Usage
=COMPLETENESS_SCORE(times, values, freq, keep_index)
times (list[list], required): Timestamps in ISO8601 format.
values (list[list], required): Numeric data values corresponding to the times.
freq (str, optional, default: null): Expected interval between samples as a pandas frequency string (e.g. '15min'). If blank, it is inferred.
keep_index (bool, optional, default: true): Whether to return values padded to align with the input time array. If True, replicates daily scores to input resolution.
Returns (list[list]): 2D list of completeness floats corresponding to each input value, or an error string.
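The effect of keep_index can be illustrated with a short standard-library sketch. It assumes the daily scores are already computed and only shows how they are either replicated to each input timestamp or returned once per day; the helper name `align_scores` is hypothetical and not part of pvanalytics or of this function.

```python
from datetime import date, datetime


def align_scores(timestamps, daily_scores, keep_index=True):
    """Return one score per input timestamp, or one score per day."""
    if keep_index:
        # Replicate each day's score to every timestamp that falls on that day.
        return [daily_scores[ts.date()] for ts in timestamps]
    # Otherwise return a single score per day, in chronological order.
    return [daily_scores[day] for day in sorted(daily_scores)]


# Two days sampled every 6 hours, with daily scores of 1.0 and 0.75.
timestamps = [datetime(2024, 1, 3, h) for h in (0, 6, 12, 18)]
timestamps += [datetime(2024, 1, 4, h) for h in (0, 6, 12, 18)]
daily_scores = {date(2024, 1, 3): 1.0, date(2024, 1, 4): 0.75}

padded = align_scores(timestamps, daily_scores, keep_index=True)
per_day = align_scores(timestamps, daily_scores, keep_index=False)
print(padded)   # [1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 0.75, 0.75]
print(per_day)  # [1.0, 0.75]
```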
Example 1: Day with missing points returns partial completeness
Inputs:
| times | values | freq | keep_index |
|---|---|---|---|
| 2024-01-01T00:00:00Z | 10 | 6H | true |
| 2024-01-01T06:00:00Z | | | |
| 2024-01-01T12:00:00Z | 15 | | |
| 2024-01-01T18:00:00Z | | | |
Excel formula:
=COMPLETENESS_SCORE({"2024-01-01T00:00:00Z";"2024-01-01T06:00:00Z";"2024-01-01T12:00:00Z";"2024-01-01T18:00:00Z"}, {10;"";15;""}, "6H", TRUE)
Expected output:
| Result |
|---|
| 0.5 |
| 0.5 |
| 0.5 |
| 0.5 |
Example 2: Day with all present values returns full completeness
Inputs:
| times | values | freq | keep_index |
|---|---|---|---|
| 2024-01-02T00:00:00Z | 5 | 12H | true |
| 2024-01-02T12:00:00Z | 7 | | |
Excel formula:
=COMPLETENESS_SCORE({"2024-01-02T00:00:00Z";"2024-01-02T12:00:00Z"}, {5;7}, "12H", TRUE)
Expected output:
| Result |
|---|
| 1 |
| 1 |
Example 3: Return daily index scores when keep index is false
Inputs:
| times | values | freq | keep_index |
|---|---|---|---|
| 2024-01-03T00:00:00Z | 1 | 6H | false |
| 2024-01-03T06:00:00Z | 1 | | |
| 2024-01-03T12:00:00Z | 1 | | |
| 2024-01-03T18:00:00Z | 1 | | |
| 2024-01-04T00:00:00Z | 2 | | |
| 2024-01-04T06:00:00Z | | | |
| 2024-01-04T12:00:00Z | 2 | | |
| 2024-01-04T18:00:00Z | 2 | | |
Excel formula:
=COMPLETENESS_SCORE({"2024-01-03T00:00:00Z";"2024-01-03T06:00:00Z";"2024-01-03T12:00:00Z";"2024-01-03T18:00:00Z";"2024-01-04T00:00:00Z";"2024-01-04T06:00:00Z";"2024-01-04T12:00:00Z";"2024-01-04T18:00:00Z"}, {1;1;1;1;2;"";2;2}, "6H", FALSE)
Expected output:
| Result |
|---|
| 1 |
| 0.75 |
Example 4: Handle scalar timestamp and value inputs
Inputs:
| times | values | freq | keep_index |
|---|---|---|---|
| 2024-01-05T00:00:00Z | 10 | 1D | true |
Excel formula:
=COMPLETENESS_SCORE("2024-01-05T00:00:00Z", 10, "1D", TRUE)
Expected output:
1
Python Code
import pandas as pd
from pvanalytics.quality.gaps import completeness_score as result_func


def completeness_score(times, values, freq=None, keep_index=True):
    """
    Calculate a data completeness score for each day from a timestamped PV series.

    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.quality.gaps.completeness_score.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        times (list[list]): Timestamps in ISO8601 format.
        values (list[list]): Numeric data values corresponding to the times.
        freq (str, optional): Expected interval between samples as a pandas frequency string (e.g. '15min'). If blank, it is inferred. Default is None.
        keep_index (bool, optional): Whether to return values padded to align with the input time array. If True, replicates daily scores to input resolution. Default is True.

    Returns:
        list[list]: 2D list of completeness floats corresponding to each input value, or an error string.
    """
    try:
        def flatten_str(data):
            # Flatten a scalar or 2D list into a list of strings, skipping blank cells.
            if not isinstance(data, list):
                return [str(data)]
            return [str(val) for row in data for val in (row if isinstance(row, list) else [row]) if val != ""]

        def flatten_num(data):
            # Flatten a scalar or 2D list into floats; blank cells become NaN.
            if not isinstance(data, list):
                return [float(data)]
            flat = []
            for row in data:
                row = row if isinstance(row, list) else [row]
                for val in row:
                    if val == "":
                        flat.append(float('nan'))
                    else:
                        flat.append(float(val))
            return flat

        def unwrap_scalar(x):
            # Unwrap a 1x1 2D list to its single value; leave anything else unchanged.
            if isinstance(x, list) and len(x) == 1 and isinstance(x[0], list) and len(x[0]) == 1:
                return x[0][0]
            return x

        time_list = flatten_str(times)
        val_list = flatten_num(values)

        if len(time_list) != len(val_list):
            return "Error: times and values must have the same length"
        if len(time_list) == 0:
            return "Error: input arrays cannot be empty"

        dt_idx = pd.DatetimeIndex(time_list)
        series = pd.Series(val_list, index=dt_idx)

        freq_val = unwrap_scalar(freq)
        keep_val = unwrap_scalar(keep_index)
        f = str(freq_val) if freq_val is not None and str(freq_val).strip() != "" else None
        keep = bool(keep_val) if keep_val is not None else True

        # pvanalytics uses pandas.infer_freq, which requires at least 3 timestamps.
        # For short series with an explicit freq, compute completeness manually.
        if f is not None and len(time_list) < 3:
            td = pd.to_timedelta(f)
            if td <= pd.Timedelta(0):
                return "Error: Invalid frequency"
            df = pd.DataFrame({'value': series})
            df['day'] = df.index.normalize()
            scores = {}
            for day, group in df.groupby('day'):
                # Each non-missing value contributes one interval of length td.
                count = int(group['value'].notna().sum())
                score = float(count * td / pd.Timedelta(days=1))
                scores[day] = min(score, 1.0)
            if keep:
                # One score per input timestamp.
                return [[scores.get(day, "")] for day in df['day']]
            # One score per day.
            return [[scores.get(day, "")] for day in sorted(scores.keys())]

        res = result_func(series, freq=f, keep_index=keep)
        # With keep_index the result aligns with the input timestamps; otherwise it
        # is indexed by day. Either way, convert the Series to a 2D list.
        return [[float(v) if not pd.isna(v) else ""] for v in res]
    except Exception as e:
        return f"Error: {str(e)}"
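The short-series branch in the code above exists because pandas cannot infer a frequency from fewer than three timestamps. The sketch below, which assumes only that pandas is installed, shows that limitation and the manual arithmetic for the two-sample day from Example 2. It uses the lowercase '12h' spelling; the variable names are illustrative.

```python
import pandas as pd

# Two timestamps are too few for pandas to infer a frequency.
short_index = pd.DatetimeIndex(["2024-01-02T00:00:00Z", "2024-01-02T12:00:00Z"])
try:
    pd.infer_freq(short_index)
    inference_failed = False
except ValueError:
    inference_failed = True

# Manual score: number of valid readings times the explicit interval,
# divided by one day and capped at 1.0.
td = pd.to_timedelta("12h")
valid_count = 2
manual_score = min(float(valid_count * td / pd.Timedelta(days=1)), 1.0)

print(inference_failed)  # True
print(manual_score)      # 1.0
```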