TRIM_INCOMPLETE
This function trims a data series by marking early and late periods of low data completeness for exclusion.
It computes a rolling completeness score and returns a boolean array (True meaning the data point is kept) with False values up to the first complete day, True between the first and last complete days, and False thereafter.
Excel Usage
=TRIM_INCOMPLETE(times, values, minimum_completeness, days, freq)
times(list[list], required): Timestamps in ISO8601 format.values(list[list], required): Numeric data values corresponding to the times.minimum_completeness(float, optional, default: 0.333333): The minimum required daily completeness score.days(int, optional, default: 10): The consecutive complete days needed before transitioning.freq(str, optional, default: null): Expected interval between samples as a pandas frequency string (e.g. ‘15min’). If blank, it is inferred.
Returns (list[list]): 2D list of booleans (True to keep, False to trim), or an error string.
Example 1: Identifies trailing gaps based on completeness
Inputs:
| times | values | minimum_completeness | days | freq |
|---|---|---|---|---|
| 2024-01-01T00:00:00Z | 10 | 0.5 | 1 | 1D |
| 2024-01-02T00:00:00Z | 15 | |||
| 2024-01-03T00:00:00Z | 20 | |||
| 2024-01-04T00:00:00Z |
Excel formula:
=TRIM_INCOMPLETE({"2024-01-01T00:00:00Z";"2024-01-02T00:00:00Z";"2024-01-03T00:00:00Z";"2024-01-04T00:00:00Z"}, {10;15;20;""}, 0.5, 1, "1D")
Expected output:
| Result |
|---|
| true |
| true |
| true |
| false |
Example 2: Keep all values when completeness threshold is always met
Inputs:
| times | values | minimum_completeness | days | freq |
|---|---|---|---|---|
| 2024-01-05T00:00:00Z | 10 | 1 | 1 | 1D |
| 2024-01-06T00:00:00Z | 11 | |||
| 2024-01-07T00:00:00Z | 12 |
Excel formula:
=TRIM_INCOMPLETE({"2024-01-05T00:00:00Z";"2024-01-06T00:00:00Z";"2024-01-07T00:00:00Z"}, {10;11;12}, 1, 1, "1D")
Expected output:
| Result |
|---|
| true |
| true |
| true |
Example 3: Require multiple complete days before keep window starts
Inputs:
| times | values | minimum_completeness | days | freq |
|---|---|---|---|---|
| 2024-01-08T00:00:00Z | 0.5 | 2 | 1D | |
| 2024-01-09T00:00:00Z | 10 | |||
| 2024-01-10T00:00:00Z | 10 | |||
| 2024-01-11T00:00:00Z | 10 |
Excel formula:
=TRIM_INCOMPLETE({"2024-01-08T00:00:00Z";"2024-01-09T00:00:00Z";"2024-01-10T00:00:00Z";"2024-01-11T00:00:00Z"}, {"";10;10;10}, 0.5, 2, "1D")
Expected output:
| Result |
|---|
| false |
| true |
| true |
| true |
Example 4: Handle scalar time and value inputs
Inputs:
| times | values | minimum_completeness | days | freq |
|---|---|---|---|---|
| 2024-01-12T00:00:00Z | 9 | 0.5 | 1 | 1D |
Excel formula:
=TRIM_INCOMPLETE("2024-01-12T00:00:00Z", 9, 0.5, 1, "1D")
Expected output:
true
Python Code
Show Code
import pandas as pd
from pvanalytics.quality.gaps import trim_incomplete as result_func
from pvanalytics.quality.gaps import start_stop_dates
def trim_incomplete(times, values, minimum_completeness=0.333333, days=10, freq=None):
"""
Return a boolean mask to trim leading and trailing low-completeness periods from a PV series.
See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.quality.gaps.trim_incomplete.html
This example function is provided as-is without any representation of accuracy.
Args:
times (list[list]): Timestamps in ISO8601 format.
values (list[list]): Numeric data values corresponding to the times.
minimum_completeness (float, optional): The minimum required daily completeness score. Default is 0.333333.
days (int, optional): The consecutive complete days needed before transitioning. Default is 10.
freq (str, optional): Expected interval between samples as a pandas frequency string (e.g. '15min'). If blank, it is inferred. Default is None.
Returns:
list[list]: 2D list of booleans (True to keep, False to trim), or an error string.
"""
try:
def flatten_str(data):
if not isinstance(data, list): return [str(data)]
return [str(val) for row in data for val in (row if isinstance(row, list) else [row]) if val != ""]
def flatten_num(data):
if not isinstance(data, list): return [float(data)]
flat = []
for row in data:
row = row if isinstance(row, list) else [row]
for val in row:
if val == "": flat.append(float('nan'))
else: flat.append(float(val))
return flat
time_list = flatten_str(times)
val_list = flatten_num(values)
if len(time_list) != len(val_list):
return "Error: times and values must have the same length"
if len(time_list) == 0:
return "Error: input arrays cannot be empty"
dt_idx = pd.DatetimeIndex(time_list)
series = pd.Series(val_list, index=dt_idx)
min_comp = float(minimum_completeness) if minimum_completeness is not None else 0.333333
d = int(days) if days is not None else 10
f = str(freq) if freq is not None and str(freq).strip() != "" else None
# pvanalytics requires at least 3 timestamps to infer frequency even if a
# frequency is provided. For short series, compute the completeness/trim
# logic ourselves to avoid this limitation.
if len(series.index) < 3:
if f is None:
return "Error: Need at least 3 dates to infer frequency"
try:
offset = pd.tseries.frequencies.to_offset(f)
seconds_per_sample = float(offset.delta.total_seconds())
except Exception:
return "Error: Invalid freq"
daily_counts = series.resample('D').count()
daily_completeness = (daily_counts * seconds_per_sample) / (1440 * 60)
daily_good = daily_completeness >= min_comp
start, end = start_stop_dates(daily_good, days=d)
mask = pd.Series(False, index=series.index)
if start:
mask.loc[start:end] = True
return [[bool(v)] for v in mask]
# For longer series, defer to the library implementation.
res = result_func(series, minimum_completeness=min_comp, days=d, freq=f)
return [[bool(v)] for v in res]
except Exception as e:
return f"Error: {str(e)}"Online Calculator
Timestamps in ISO8601 format.
Numeric data values corresponding to the times.
The minimum required daily completeness score.
The consecutive complete days needed before transitioning.
Expected interval between samples as a pandas frequency string (e.g. '15min'). If blank, it is inferred.