Data Quality

Overview

Introduction

Photovoltaic data quality is the discipline of testing whether time-series measurements from a PV system are physically plausible, internally consistent, and complete enough for analysis, forecasting, performance assessment, and warranty-grade reporting. In practical terms, it sits between raw telemetry and downstream decisions: if the quality layer is weak, every KPI built on top of it is unstable. PV systems produce rich streams of irradiance, temperature, power, and status signals; however, those streams are vulnerable to sensor drift, communication outages, timestamp misalignment, logger freezes, and environmental edge cases. Data-quality methods convert these imperfections into explicit, auditable flags so analysts can separate trustworthy observations from suspect ones.

From a mathematical perspective, PV data quality combines three broad families of checks:

  1. Range and physics checks: values must lie within bounds implied by geometry and extraterrestrial irradiance.
  2. Relational checks: variables that should cohere (for example, irradiance components) are tested for consistency.
  3. Coverage and continuity checks: timestamps and sampling density must support intended calculations.

These ideas map closely to established quality-control frameworks in atmospheric and solar monitoring, including QCRad concepts and broader sensor QA/QC principles from the standard solar-monitoring literature. In this category, Boardflare implements production-ready checks through pvanalytics, an open-source PV data-quality library from the PV performance community. Because the calculators are exposed in spreadsheet-friendly form, teams can apply scientific QA logic without writing bespoke code pipelines.

This matters to business users because financial and operational outcomes are sensitive to small data errors. A biased irradiance sensor can make a healthy plant look underperforming; missing intervals can distort contractual availability metrics; stale values can hide inverter trips; and untrimmed commissioning periods can depress annualized performance ratio. The net effect is avoidable uncertainty in O&M prioritization, investor reporting, and root-cause analysis. A disciplined quality workflow, by contrast, improves confidence intervals, reduces false alarms, and shortens time-to-diagnosis when problems are real.

Within this category, the tools span the full quality chain: irradiance limit checks (CHECK_GHI_QCRAD, CHECK_DHI_QCRAD, CHECK_DNI_QCRAD), component consistency validation (CHECK_IRRAD_QCRAD), temporal regularity (SPACING), missingness quantification (COMPLETENESS_SCORE), edge-period trimming (TRIM_INCOMPLETE), sensor-freeze detection (STALE_VALUES_ROUND), and weather-corrected fleet KPI estimation (PERF_RATIO_NREL). Together they support a “quality-first” analytics architecture where each downstream metric can be tied back to transparent inclusion rules.

When to Use It

Use this category when the core job is not “compute a metric quickly,” but “compute a metric you can trust.” In operating PV portfolios, many disputes are not about formula choice; they are about whether the input data is valid for the formula at all. The calculators here are especially useful when teams need reproducible quality decisions across many sites, long historical windows, or mixed data infrastructures.

One common scenario is monthly performance reporting for asset managers. The analyst must publish PR trends, identify underperformers, and justify exclusions to owners or lenders. A practical workflow is to verify timestamp cadence with SPACING, quantify missingness with COMPLETENESS_SCORE, exclude low-quality edges with TRIM_INCOMPLETE, flag physically impossible irradiance points via CHECK_GHI_QCRAD, CHECK_DHI_QCRAD, and CHECK_DNI_QCRAD, then compute weather-corrected KPI using PERF_RATIO_NREL. This sequence creates a defensible audit trail: each dropped interval is explained by a specific rule, rather than manual judgment.

Another high-value scenario is SCADA troubleshooting and anomaly triage. During a suspected communications issue, values may appear flat for hours even though the plant is changing. STALE_VALUES_ROUND detects repeated rounded values characteristic of frozen sensors or logger buffering artifacts. In parallel, SPACING reveals irregular logging intervals, and COMPLETENESS_SCORE surfaces the effective coverage loss day by day. Engineers can then distinguish “real plant stability” from “telemetry malfunction,” reducing unnecessary field dispatches.

A third scenario is bankable energy-model backtesting and model calibration. Modelers need high-integrity irradiance decomposition relationships to evaluate transposition, inverter clipping assumptions, and loss factors. CHECK_IRRAD_QCRAD is central here because it tests whether measured GHI, DHI, and DNI satisfy expected component relationships under valid solar geometry. Combined with range screens (CHECK_GHI_QCRAD, CHECK_DHI_QCRAD, CHECK_DNI_QCRAD), this reduces bias from unphysical points that would otherwise skew calibration coefficients.

These tools are also useful during commissioning handoff and data migration projects. When historical feeds are replatformed, timestamp shifts and null-handling differences often appear. A quick run of SPACING and COMPLETENESS_SCORE can detect structural issues before they contaminate dashboards. If the first and last weeks are unstable due to onboarding gaps, TRIM_INCOMPLETE can define the “analysis-valid” window without ad hoc date cuts.

At an organizational level, teams should apply these checks whenever decisions involve cost, risk transfer, or contractual accountability: availability guarantees, underperformance claims, O&M incentive calculations, and degradation studies. In all those cases, a standardized QA layer is cheaper than arguing later about whether the input data should have been included.

How It Works

The category is implemented with pvanalytics, and its weather-corrected KPI follows NREL methodology as described in the pvanalytics metrics documentation. Conceptually, the calculators apply boolean masks and scoring functions to indexed time series. Each mask marks valid/invalid records under a specific assumption; masks can then be combined using logical operations to produce a final analysis set.

For irradiance limits, QCRad-style checks use solar geometry and extraterrestrial irradiance as dynamic bounds. A simplified representation is:

L(\theta_z, E_0) < I(t) < U(\theta_z, E_0)

where I(t) is measured irradiance (GHI, DHI, or DNI), \theta_z is solar zenith, and E_0 is extraterrestrial normal irradiance. The lower bound enforces non-negative/physically plausible minima, and the upper bound scales with conditions that limit what sunlight can realistically deliver at the sensor plane. Boardflare exposes these checks as CHECK_GHI_QCRAD, CHECK_DHI_QCRAD, and CHECK_DNI_QCRAD, each supporting “physical” and “extreme” limit modes.

Component consistency uses the irradiance closure identity (with geometry projection):

GHI \approx DNI \cos(\theta_z) + DHI

The calculator CHECK_IRRAD_QCRAD evaluates whether this relation is sufficiently satisfied and whether diffuse fraction behavior is reasonable under domain-specific criteria. This is powerful because each individual channel could pass its own range test yet still be inconsistent as a trio, indicating calibration mismatch, shading artifacts, tracker metadata issues, or timestamp offsets.
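The closure identity can be verified with a quick numeric check (values chosen for illustration, consistent with the worked example later on this page):

```python
import math

# Component closure: GHI should approximately equal DNI * cos(zenith) + DHI.
zenith_deg = 30.0
dni = 1039.23          # W/m^2
dhi = 100.0            # W/m^2
ghi_measured = 1000.0  # W/m^2

ghi_from_components = dni * math.cos(math.radians(zenith_deg)) + dhi
residual = ghi_measured - ghi_from_components
print(round(ghi_from_components, 1), round(residual, 1))  # → 1000.0 0.0
```

A large residual at valid zenith angles points to exactly the calibration, shading, or timestamp problems described above.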

Temporal integrity is handled in two layers. First, SPACING checks that adjacent timestamps match expected interval \Delta t:

t_i - t_{i-1} = \Delta t
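A minimal pandas sketch of this cadence test (timestamps and frequency are illustrative, not the pvanalytics implementation):

```python
import pandas as pd

ts = pd.DatetimeIndex([
    "2024-06-01 10:00", "2024-06-01 10:05",
    "2024-06-01 10:15",  # one 5-minute record skipped here
    "2024-06-01 10:20",
])
expected = pd.Timedelta("5min")

# The first interval has no predecessor; treat it as conforming.
deltas = ts.to_series().diff()
ok = deltas.isna() | (deltas == expected)
print(ok.tolist())  # → [True, True, False, True]
```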

Second, COMPLETENESS_SCORE computes daily coverage as observed valid duration divided by expected duration. In simplified form:

C_d = \frac{\sum_{i \in d} \mathbf{1}_{\text{valid},i} \cdot \Delta t}{24\,\text{h}}

where C_d \in [0,1] is the day-level completeness score. This decouples “are timestamps regular?” from “how much valid data exists?”, both of which are necessary for trustworthy aggregation.
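The C_d formula above can be sketched directly in pandas (a simplified illustration of the idea, not the pvanalytics implementation):

```python
import pandas as pd
import numpy as np

# One day of 30-minute data: 48 expected slots, 24 non-missing values.
idx = pd.date_range("2024-06-01", periods=48, freq="30min")
values = pd.Series(np.arange(48.0), index=idx)
values.iloc[24:] = np.nan  # second half of the day is missing

freq = pd.Timedelta("30min")
valid_duration = values.notna().groupby(values.index.date).sum() * freq
completeness = valid_duration / pd.Timedelta("24h")
print(completeness.iloc[0])  # → 0.5
```

Twenty-four valid 30-minute readings represent 12 hours of coverage, so the day scores 0.5.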

Edge trimming then uses completeness history to define analysis windows. TRIM_INCOMPLETE finds sustained periods meeting a minimum completeness threshold and masks sparse startup/shutdown tails. Operationally, this prevents short incomplete segments from disproportionately affecting monthly or annual KPIs.

Stale-value detection, via STALE_VALUES_ROUND, identifies contiguous runs where rounded values repeat for at least a window length. If x_t is rounded signal value and window size is w, stale conditions are detected where:

x_t = x_{t-1} = \cdots = x_{t-w+1}

under user-selected marking logic (tail, end, or all). This captures a common real-world failure mode: sensors or loggers that stop updating numerically but continue emitting timestamps.
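A simplified run-length sketch of this detection (rounding precision and window size are illustrative; this version flags run tails, similar in spirit to the ‘tail’ marking mode):

```python
import pandas as pd

x = pd.Series([100.04, 100.02, 250.0, 250.0, 250.0, 250.0, 180.0])
window = 3
rounded = x.round(1)

# A point is stale if it and the preceding window-1 rounded values are identical.
same_as_prev = rounded.eq(rounded.shift())
stale = same_as_prev.rolling(window - 1).sum().eq(window - 1)
print(stale.tolist())  # → [False, False, False, False, True, True, False]
```

Note that 100.04 and 100.02 round to the same value but do not form a long enough run, while the four-point 250.0 plateau is flagged from its third point onward.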

Finally, PERF_RATIO_NREL produces weather-corrected PR by normalizing AC output against irradiance and temperature-adjusted reference behavior. The exact implementation follows NREL-style methodology in pvanalytics, including SAPM temperature model parameters. A conceptual form is:

PR_{wc} = \frac{\sum P_{AC}}{\sum \left(P_{dc,ref}\,\frac{G_{POA}}{G_{ref}}\,f_T\right)}

where f_T applies temperature correction using parameters such as a, b, \Delta T, and \gamma_{pdc}. The business value is comparability: weather-corrected PR is less sensitive to meteorological variation, so trend changes more often reflect asset performance than weather noise.
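As a toy arithmetic illustration of this formula (all numbers and the temperature coefficient are invented for the example; PERF_RATIO_NREL implements the full NREL/pvanalytics methodology, including SAPM cell-temperature modeling):

```python
import numpy as np

# Toy hourly data (illustrative values only).
p_ac = np.array([680.0, 760.0, 720.0])    # measured AC power, kW
g_poa = np.array([800.0, 950.0, 880.0])   # POA irradiance, W/m^2
t_cell = np.array([45.0, 50.0, 48.0])     # cell temperature, deg C

p_dc_ref = 1000.0   # DC nameplate, kW
g_ref = 1000.0      # reference irradiance, W/m^2
t_ref = 25.0        # reference cell temperature, deg C
gamma_pdc = -0.004  # power temperature coefficient, 1/deg C (assumed)

f_t = 1 + gamma_pdc * (t_cell - t_ref)       # temperature correction
expected = p_dc_ref * (g_poa / g_ref) * f_t  # weather-adjusted expected power
pr_wc = p_ac.sum() / expected.sum()
print(round(pr_wc, 3))  # → 0.904
```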

An important assumption across all methods is that metadata (timezone, sensor orientation, naming conventions, units) is coherent. Quality checks can catch many errors, but they cannot fully compensate for fundamentally misconfigured source systems. For robust operations, teams should pair these calculators with governance controls: fixed unit standards, explicit null policies, and consistent timestamp localization.

Practical Example

Consider a utility-scale PV plant where an analyst must publish a monthly operations memo and answer a recurring question: “Did PR decline due to plant issues, or due to data artifacts?” The raw feed includes 5-minute irradiance components (GHI, DHI, DNI), ambient temperature, wind speed, and AC power. Over the month, there are several logger interruptions and a suspected irradiance sensor freeze.

Step 1 is structural validation. The analyst runs SPACING on timestamps with 5min frequency to detect interval discontinuities. Any nonconforming points are flagged for investigation and are excluded from downstream KPI windows. This catches a subtle but common issue: duplicated or skipped records after SCADA reconnect.

Step 2 is coverage quantification. Using COMPLETENESS_SCORE, the analyst computes day-level completeness, optionally preserving original index granularity for easier joins. Days with severe data loss are visible immediately (for example, C_d < 0.7). Instead of manually inspecting logs, the team gets an objective daily coverage metric suitable for governance thresholds.

Step 3 is edge-window hygiene. With TRIM_INCOMPLETE, the analyst trims leading and trailing low-completeness periods caused by maintenance and communications cutover. This avoids contaminating “month-to-date PR” with sparse endpoints that do not represent normal operation.

Step 4 is irradiance plausibility screening. The analyst applies CHECK_GHI_QCRAD, CHECK_DHI_QCRAD, and CHECK_DNI_QCRAD with physical limits for routine reporting and optionally extreme limits for forensic review. This removes unphysical spikes, negative daytime irradiance, and implausible upper-bound excursions.

Step 5 is relational consistency testing. CHECK_IRRAD_QCRAD evaluates whether component closure approximately holds. During the suspected freeze window, the check reveals repeated diffuse patterns inconsistent with DNI\cos(\theta_z)+DHI, confirming instrumentation artifacts rather than true irradiance behavior.

Step 6 is stale-run detection. STALE_VALUES_ROUND is applied to key channels (often irradiance and power) with a window tuned to operational cadence (for 5-minute data, maybe 6–12 points). This identifies frozen-value plateaus that would otherwise bias averages.

Step 7 is KPI computation on the quality-controlled subset. The final mask is built from logical conjunction of spacing validity, acceptable completeness window, irradiance range passes, consistency flags, and stale-value exclusions. PERF_RATIO_NREL is then computed on this curated dataset.
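The Step 7 conjunction can be sketched as a logical AND across the individual flag columns (column names are illustrative placeholders for the calculator outputs):

```python
import pandas as pd

flags = pd.DataFrame({
    "spacing_ok":     [True, True,  True,  False, True],
    "complete_ok":    [True, True,  True,  True,  True],
    "ghi_ok":         [True, False, True,  True,  True],
    "consistency_ok": [True, True,  True,  True,  True],
    "not_stale":      [True, True,  False, True,  True],
})

include = flags.all(axis=1)  # a record is kept only if it passes every rule
print(include.tolist())      # → [True, False, False, False, True]
print(f"{(~include).mean():.0%} of records excluded")  # → 60% of records excluded
```

Keeping the per-rule columns alongside the final mask is what makes the exclusion counts in the memo auditable.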

In the memo, the analyst reports both raw and quality-controlled PR, plus counts of excluded intervals by rule. This framing changes the conversation from subjective “data looked odd” to objective “3.8% of records excluded: 1.2% spacing failures, 0.9% stale runs, 1.7% irradiance QC violations.” Stakeholders gain confidence because method and impact are transparent, repeatable, and comparable month to month.

Compared with traditional Excel-only workflows, this approach is faster and more robust. Instead of building fragile custom formulas for each site, teams reuse standardized calculators with well-defined semantics from pvanalytics. The result is lower analyst variance, easier peer review, and better portability across portfolios.

How to Choose

Choose calculators based on the question being answered and where uncertainty enters the pipeline: timestamps, missingness, physical plausibility, cross-channel consistency, instrument behavior, or final KPI normalization.

graph TD
    A[Start: PV dataset ready for analysis] --> B{Are timestamps regular?}
    B -- No --> C[Use SPACING]
    B -- Yes --> D{Is coverage sufficient by day?}
    C --> D
    D -- No --> E[Use COMPLETENESS_SCORE and TRIM_INCOMPLETE]
    D -- Yes --> F{Are irradiance values physically plausible?}
    E --> F
    F -- No --> G[Use CHECK_GHI_QCRAD / CHECK_DHI_QCRAD / CHECK_DNI_QCRAD]
    F -- Yes --> H{Do components agree?}
    G --> H
    H -- No --> I[Use CHECK_IRRAD_QCRAD]
    H -- Yes --> J{Any frozen or stuck sensors?}
    I --> J
    J -- Yes --> K[Use STALE_VALUES_ROUND]
    J -- No --> L[Compute weather-corrected KPI]
    K --> L
    L --> M[Use PERF_RATIO_NREL]

Function | Best use case | Inputs required | Strengths | Caveats
SPACING | Detect irregular or broken timestamp cadence | Time index + expected freq | Fast structural gate before all other tests | Does not quantify missingness severity by itself
COMPLETENESS_SCORE | Measure effective day-level data availability | Time index + values (+ optional freq) | Produces interpretable 0-1 coverage metric | High completeness does not guarantee physical correctness
TRIM_INCOMPLETE | Exclude weak leading/trailing periods | Time index + values + threshold settings | Standardizes analysis window selection | Threshold choice should align with reporting policy
CHECK_GHI_QCRAD | Screen GHI for physical/extreme bounds | GHI, zenith, extraterrestrial irradiance | Domain-aware dynamic limits | Needs accurate zenith and units
CHECK_DHI_QCRAD | Screen DHI plausibility | DHI, zenith, extraterrestrial irradiance | Captures diffuse-channel outliers | Same metadata dependencies as other QCRad checks
CHECK_DNI_QCRAD | Screen DNI plausibility | DNI, zenith, extraterrestrial irradiance | Catches direct-beam spikes/invalid points | Sensitive to sensor alignment and soiling effects
CHECK_IRRAD_QCRAD | Validate GHI-DHI-DNI closure consistency | Zenith + all three irradiance components | Detects cross-channel mismatch missed by single-channel bounds | Must interpret night/out-of-domain handling correctly
STALE_VALUES_ROUND | Identify frozen sensors/loggers via repeated values | Signal series + window/rounding/mark rules | Excellent for telemetry fault detection | Needs sensible window setting for sample rate
PERF_RATIO_NREL | Compute weather-corrected plant performance | POA irradiance, weather, AC power, rating | Produces normalized KPI suitable for trend comparison | Quality of inputs strongly determines interpretability

A practical selection pattern is to run tools in this order: structural checks, coverage checks, physics checks, relational checks, stale checks, then KPI computation. In other words, PERF_RATIO_NREL should usually be the endpoint, not the starting point. If data quality is uncertain, compute PR twice (raw and filtered) and report the delta; large deltas often indicate data-process problems rather than plant degradation.

For policy design, teams can define a tiered approach: a “monitoring tier” with milder thresholds for early warning, and a “reporting tier” with stricter exclusions for contractual numbers. The same calculators support both tiers by changing limit modes and completeness requirements. This aligns well with a dual-pillar strategy: one pillar for broad operational visibility, another for high-confidence decision support.

In summary, pick each function based on the failure mode you need to control, then combine outputs into a transparent inclusion mask. The category is strongest when used as a system, not a menu of isolated checks.

CHECK_DHI_QCRAD

This function tests for lower and upper limits on Diffuse Horizontal Irradiance (DHI) using the QCRad criteria.

The test is applied to each DHI value. A reading passes if it is strictly greater than the lower bound and less than the upper bound. The bounds are defined by either the ‘physical’ or ‘extreme’ QCRad criteria.

Excel Usage

=CHECK_DHI_QCRAD(dhi, solar_zenith, dni_extra, limits)
  • dhi (list[list], required): Diffuse horizontal irradiance (W/m^2).
  • solar_zenith (list[list], required): Solar zenith angle (degrees).
  • dni_extra (list[list], required): Extraterrestrial normal irradiance (W/m^2).
  • limits (str, optional, default: “physical”): QCRad QC limits (“physical” or “extreme”).

Returns (list[list]): 2D list of booleans (True if passed limits), or an error string.

Example 1: Check DHI physical limits pass and fail cases

Inputs:

dhi solar_zenith dni_extra limits
100 30 1367 physical
-10 30 1367
1500 30 1367

Excel formula:

=CHECK_DHI_QCRAD({100;-10;1500}, {30;30;30}, {1367;1367;1367}, "physical")

Expected output:

Result
true
false
false
Example 2: Evaluate DHI values with extreme limits setting

Inputs:

dhi solar_zenith dni_extra limits
80 20 1367 extreme
400 45 1367
1100 60 1367

Excel formula:

=CHECK_DHI_QCRAD({80;400;1100}, {20;45;60}, {1367;1367;1367}, "extreme")

Expected output:

Result
true
true
false
Example 3: Handle scalar inputs as single observations

Inputs:

dhi solar_zenith dni_extra limits
200 35 1367 physical

Excel formula:

=CHECK_DHI_QCRAD(200, 35, 1367, "physical")

Expected output:

true

Example 4: Process a row vector of DHI observations

Inputs:

dhi: {50, 150, 300}
solar_zenith: {10, 30, 50}
dni_extra: {1367, 1367, 1367}
limits: physical

Excel formula:

=CHECK_DHI_QCRAD({50,150,300}, {10,30,50}, {1367,1367,1367}, "physical")

Expected output:

Result
true
true
true

Python Code

Show Code
import pandas as pd
from pvanalytics.quality.irradiance import check_dhi_limits_qcrad as result_func

def check_dhi_qcrad(dhi, solar_zenith, dni_extra, limits='physical'):
    """
    Return a pass/fail QC flag array for each DHI reading against QCRad physical or extreme limits.

    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.quality.irradiance.check_dhi_limits_qcrad.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        dhi (list[list]): Diffuse horizontal irradiance (W/m^2).
        solar_zenith (list[list]): Solar zenith angle (degrees).
        dni_extra (list[list]): Extraterrestrial normal irradiance (W/m^2).
        limits (str, optional): QCRad QC limits. Valid options: 'physical', 'extreme'. Default is 'physical'.

    Returns:
        list[list]: 2D list of booleans (True if passed limits), or an error string.
    """
    try:
        def flatten(data):
            if not isinstance(data, list):
                return [float(data)]
            flat = []
            for row in data:
                if isinstance(row, list):
                    for val in row:
                        if val != "":
                            flat.append(float(val))
                elif row != "":
                    flat.append(float(row))
            return flat

        dhi_s = pd.Series(flatten(dhi))
        zenith_s = pd.Series(flatten(solar_zenith))
        dni_extra_s = pd.Series(flatten(dni_extra))

        n = len(dhi_s)
        if not (len(zenith_s) == n and len(dni_extra_s) == n):
            return "Error: All input arrays must have the same length"
        if n == 0:
            return "Error: Arrays cannot be empty"

        lim = str(limits) if limits is not None else "physical"
        if lim not in ["physical", "extreme"]:
            return "Error: limits must be either 'physical' or 'extreme'"

        res = result_func(dhi_s, zenith_s, dni_extra_s, limits=lim)

        return [[bool(v)] for v in res]
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

Diffuse horizontal irradiance (W/m^2).
Solar zenith angle (degrees).
Extraterrestrial normal irradiance (W/m^2).
QCRAD QC limits.

CHECK_DNI_QCRAD

This function tests for lower and upper limits on Direct Normal Irradiance (DNI) using the QCRad criteria.

The test is applied to each DNI value. A reading passes if it is strictly greater than the lower bound and less than the upper bound. The bounds are defined by either the ‘physical’ or ‘extreme’ QCRad criteria.

Excel Usage

=CHECK_DNI_QCRAD(dni, solar_zenith, dni_extra, limits)
  • dni (list[list], required): Direct normal irradiance (W/m^2).
  • solar_zenith (list[list], required): Solar zenith angle (degrees).
  • dni_extra (list[list], required): Extraterrestrial normal irradiance (W/m^2).
  • limits (str, optional, default: “physical”): QCRad QC limits (“physical” or “extreme”).

Returns (list[list]): 2D list of booleans (True if passed limits), or an error string.

Example 1: Check DNI physical limits pass and fail cases

Inputs:

dni solar_zenith dni_extra limits
800 30 1367 physical
-10 30 1367
1500 30 1367

Excel formula:

=CHECK_DNI_QCRAD({800;-10;1500}, {30;30;30}, {1367;1367;1367}, "physical")

Expected output:

Result
true
false
false
Example 2: Evaluate DNI values with extreme limits setting

Inputs:

dni solar_zenith dni_extra limits
500 15 1367 extreme
900 30 1367
1200 55 1367

Excel formula:

=CHECK_DNI_QCRAD({500;900;1200}, {15;30;55}, {1367;1367;1367}, "extreme")

Expected output:

Result
true
true
false
Example 3: Handle scalar DNI inputs as single observations

Inputs:

dni solar_zenith dni_extra limits
700 25 1367 physical

Excel formula:

=CHECK_DNI_QCRAD(700, 25, 1367, "physical")

Expected output:

true

Example 4: Process a row vector of DNI observations

Inputs:

dni: {300, 600, 900}
solar_zenith: {10, 20, 40}
dni_extra: {1367, 1367, 1367}
limits: physical

Excel formula:

=CHECK_DNI_QCRAD({300,600,900}, {10,20,40}, {1367,1367,1367}, "physical")

Expected output:

Result
true
true
true

Python Code

Show Code
import pandas as pd
from pvanalytics.quality.irradiance import check_dni_limits_qcrad as result_func

def check_dni_qcrad(dni, solar_zenith, dni_extra, limits='physical'):
    """
    Return a pass/fail QC flag array for each DNI reading against QCRad physical or extreme limits.

    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.quality.irradiance.check_dni_limits_qcrad.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        dni (list[list]): Direct normal irradiance (W/m^2).
        solar_zenith (list[list]): Solar zenith angle (degrees).
        dni_extra (list[list]): Extraterrestrial normal irradiance (W/m^2).
        limits (str, optional): QCRad QC limits. Valid options: 'physical', 'extreme'. Default is 'physical'.

    Returns:
        list[list]: 2D list of booleans (True if passed limits), or an error string.
    """
    try:
        def flatten(data):
            if not isinstance(data, list):
                return [float(data)]
            flat = []
            for row in data:
                if isinstance(row, list):
                    for val in row:
                        if val != "":
                            flat.append(float(val))
                elif row != "":
                    flat.append(float(row))
            return flat

        dni_s = pd.Series(flatten(dni))
        zenith_s = pd.Series(flatten(solar_zenith))
        dni_extra_s = pd.Series(flatten(dni_extra))

        n = len(dni_s)
        if not (len(zenith_s) == n and len(dni_extra_s) == n):
            return "Error: All input arrays must have the same length"
        if n == 0:
            return "Error: Arrays cannot be empty"

        lim = str(limits) if limits is not None else "physical"
        if lim not in ["physical", "extreme"]:
            return "Error: limits must be either 'physical' or 'extreme'"

        res = result_func(dni_s, zenith_s, dni_extra_s, limits=lim)

        return [[bool(v)] for v in res]
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

Direct normal irradiance (W/m^2).
Solar zenith angle (degrees).
Extraterrestrial normal irradiance (W/m^2).
QCRAD QC limits.

CHECK_GHI_QCRAD

This function tests for lower and upper limits on Global Horizontal Irradiance (GHI) using the QCRad criteria.

The test is applied to each GHI value. A reading passes if it is strictly greater than the lower bound and less than the upper bound. The bounds are defined by either the ‘physical’ or ‘extreme’ QCRad criteria.

Excel Usage

=CHECK_GHI_QCRAD(ghi, solar_zenith, dni_extra, limits)
  • ghi (list[list], required): Global horizontal irradiance (W/m^2).
  • solar_zenith (list[list], required): Solar zenith angle (degrees).
  • dni_extra (list[list], required): Extraterrestrial normal irradiance (W/m^2).
  • limits (str, optional, default: “physical”): QCRad QC limits (“physical” or “extreme”).

Returns (list[list]): 2D list of booleans (True if passed limits), or an error string.

Example 1: Check GHI physical limits pass and fail cases

Inputs:

ghi solar_zenith dni_extra limits
1000 30 1367 physical
-10 30 1367
1500 90 1367

Excel formula:

=CHECK_GHI_QCRAD({1000;-10;1500}, {30;30;90}, {1367;1367;1367}, "physical")

Expected output:

Result
true
false
false
Example 2: Evaluate GHI values with extreme limits setting

Inputs:

ghi solar_zenith dni_extra limits
200 10 1367 extreme
800 35 1367
1300 70 1367

Excel formula:

=CHECK_GHI_QCRAD({200;800;1300}, {10;35;70}, {1367;1367;1367}, "extreme")

Expected output:

Result
true
true
false
Example 3: Handle scalar GHI inputs as single observations

Inputs:

ghi solar_zenith dni_extra limits
600 40 1367 physical

Excel formula:

=CHECK_GHI_QCRAD(600, 40, 1367, "physical")

Expected output:

true

Example 4: Process a row vector of GHI observations

Inputs:

ghi: {150, 450, 900}
solar_zenith: {15, 25, 45}
dni_extra: {1367, 1367, 1367}
limits: physical

Excel formula:

=CHECK_GHI_QCRAD({150,450,900}, {15,25,45}, {1367,1367,1367}, "physical")

Expected output:

Result
true
true
true

Python Code

Show Code
import pandas as pd
from pvanalytics.quality.irradiance import check_ghi_limits_qcrad as result_func

def check_ghi_qcrad(ghi, solar_zenith, dni_extra, limits='physical'):
    """
    Return a pass/fail QC flag array for each GHI reading against QCRad physical or extreme limits.

    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.quality.irradiance.check_ghi_limits_qcrad.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        ghi (list[list]): Global horizontal irradiance (W/m^2).
        solar_zenith (list[list]): Solar zenith angle (degrees).
        dni_extra (list[list]): Extraterrestrial normal irradiance (W/m^2).
        limits (str, optional): QCRad QC limits. Valid options: 'physical', 'extreme'. Default is 'physical'.

    Returns:
        list[list]: 2D list of booleans (True if passed limits), or an error string.
    """
    try:
        def flatten(data):
            if not isinstance(data, list):
                return [float(data)]
            flat = []
            for row in data:
                if isinstance(row, list):
                    for val in row:
                        if val != "":
                            flat.append(float(val))
                elif row != "":
                    flat.append(float(row))
            return flat

        ghi_s = pd.Series(flatten(ghi))
        zenith_s = pd.Series(flatten(solar_zenith))
        dni_extra_s = pd.Series(flatten(dni_extra))

        n = len(ghi_s)
        if not (len(zenith_s) == n and len(dni_extra_s) == n):
            return "Error: All input arrays must have the same length"
        if n == 0:
            return "Error: Arrays cannot be empty"

        lim = str(limits) if limits is not None else "physical"
        if lim not in ["physical", "extreme"]:
            return "Error: limits must be either 'physical' or 'extreme'"

        res = result_func(ghi_s, zenith_s, dni_extra_s, limits=lim)

        return [[bool(v)] for v in res]
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

Global horizontal irradiance (W/m^2).
Solar zenith angle (degrees).
Extraterrestrial normal irradiance (W/m^2).
QCRAD QC limits.

CHECK_IRRAD_QCRAD

This function checks the consistency of GHI, DHI, and DNI using the QCRad criteria.

The algorithm validates that the measured GHI is consistent with the calculated component sum (DNI * cos(zenith) + DHI) and verifies that the ratio of DHI to GHI falls within reasonable bounds. The output is a 2D array with one row per point and two columns: the first flags component consistency, the second flags the diffuse-ratio limits.

Excel Usage

=CHECK_IRRAD_QCRAD(solar_zenith, ghi, dhi, dni, outside_domain)
  • solar_zenith (list[list], required): Solar zenith angle (degrees).
  • ghi (list[list], required): Global horizontal irradiance (W/m^2).
  • dhi (list[list], required): Diffuse horizontal irradiance (W/m^2).
  • dni (list[list], required): Direct normal irradiance (W/m^2).
  • outside_domain (bool, optional, default: false): Value to return when tests are not applicable (e.g. night time).

Returns (list[list]): 2D list [[components_consistent, diffuse_ratio_pass]], or an error string.

Example 1: Check irradiance component consistency

Inputs:

solar_zenith ghi dhi dni outside_domain
30 1000 100 1039.23 false
30 0 0 0
90 1000 10 900

Excel formula:

=CHECK_IRRAD_QCRAD({30;30;90}, {1000;0;1000}, {100;0;10}, {1039.23;0;900}, FALSE)

Expected output:

Result
true true
false false
false true
Example 2: Use outside domain flag for non-applicable points

Inputs:

solar_zenith ghi dhi dni outside_domain
95 0 0 0 true
92 5 3 5
89 20 10 50

Excel formula:

=CHECK_IRRAD_QCRAD({95;92;89}, {0;5;20}, {0;3;10}, {0;5;50}, TRUE)

Expected output:

Result
true true
true true
true true
Example 3: Handle scalar irradiance consistency inputs

Inputs:

solar_zenith ghi dhi dni outside_domain
30 900 100 923.76 false

Excel formula:

=CHECK_IRRAD_QCRAD(30, 900, 100, 923.76, FALSE)

Expected output:

Result
true true
Example 4: Process row-vector irradiance consistency input

Inputs:

solar_zenith: {20, 40, 60}
ghi: {900, 700, 400}
dhi: {120, 150, 180}
dni: {830, 720, 440}
outside_domain: false

Excel formula:

=CHECK_IRRAD_QCRAD({20,40,60}, {900,700,400}, {120,150,180}, {830,720,440}, FALSE)

Expected output:

Result
true true
true true
true true

Python Code

Show Code
import pandas as pd
from pvanalytics.quality.irradiance import check_irradiance_consistency_qcrad as result_func

def check_irrad_qcrad(solar_zenith, ghi, dhi, dni, outside_domain=False):
    """
    Return consistency flags for irradiance-component balance from GHI, DNI, DHI, and zenith ranges.

    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.quality.irradiance.check_irradiance_consistency_qcrad.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        solar_zenith (list[list]): Solar zenith angle (degrees).
        ghi (list[list]): Global horizontal irradiance (W/m^2).
        dhi (list[list]): Diffuse horizontal irradiance (W/m^2).
        dni (list[list]): Direct normal irradiance (W/m^2).
        outside_domain (bool, optional): Value to return when tests are not applicable (e.g. night time). Default is False.

    Returns:
        list[list]: 2D list [[components_consistent, diffuse_ratio_pass]], or an error string.
    """
    try:
        def flatten(data):
            if not isinstance(data, list):
                return [float(data)]
            flat = []
            for row in data:
                if isinstance(row, list):
                    for val in row:
                        if val != "":
                            flat.append(float(val))
                elif row != "":
                    flat.append(float(row))
            return flat

        zenith_s = pd.Series(flatten(solar_zenith))
        ghi_s = pd.Series(flatten(ghi))
        dhi_s = pd.Series(flatten(dhi))
        dni_s = pd.Series(flatten(dni))

        n = len(zenith_s)
        if not (len(ghi_s) == n and len(dhi_s) == n and len(dni_s) == n):
            return "Error: All input arrays must have the same length"
        if n == 0:
            return "Error: Arrays cannot be empty"

        out_dom = bool(outside_domain) if outside_domain is not None else False

        comp, ratio = result_func(zenith_s, ghi_s, dhi_s, dni_s, outside_domain=out_dom)

        result = []
        for c, r in zip(comp, ratio):
            result.append([bool(c), bool(r)])
        return result
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

Solar zenith angle (degrees).
Global horizontal irradiance (W/m^2).
Diffuse horizontal irradiance (W/m^2).
Direct normal irradiance (W/m^2).
Value to return when tests are not applicable (e.g. night time).

COMPLETENESS_SCORE

This function computes a daily data completeness score.

The completeness score for a given day is the fraction of the day for which there is valid data (non-missing values). The time duration attributed to each valid reading is equal to the timestamp spacing of the series, or the specified expected frequency. For example, a 24-hour time series with 30-minute spacing and 24 non-missing values translates to 12 hours of valid data, yielding a completeness score of 0.5.
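The scoring arithmetic can be reproduced directly in pandas. The sketch below mirrors the worked example above (30-minute spacing, 24 of 48 slots present) and is an illustration of the calculation, not the pvanalytics implementation:

```python
import pandas as pd

# 24-hour day at 30-minute spacing: 48 slots, of which 24 hold data.
idx = pd.date_range("2024-01-01", periods=48, freq="30min")
values = pd.Series([10.0 if i % 2 == 0 else float("nan") for i in range(48)], index=idx)

seconds_per_sample = pd.Timedelta("30min").total_seconds()
valid_seconds = values.notna().resample("D").sum() * seconds_per_sample
completeness = valid_seconds / 86400  # fraction of the day covered by valid data
print(float(completeness.iloc[0]))  # 0.5
```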

Excel Usage

=COMPLETENESS_SCORE(times, values, freq, keep_index)
  • times (list[list], required): Timestamps in ISO8601 format.
  • values (list[list], required): Numeric data values corresponding to the times.
  • freq (str, optional, default: null): Expected interval between samples as a pandas frequency string (e.g. '15min'). If blank, it is inferred.
  • keep_index (bool, optional, default: true): Whether to return values padded to align with the input time array. If True, replicates daily scores to input resolution.

Returns (list[list]): 2D list of completeness floats corresponding to each input value, or an error string.

Example 1: Day with missing points returns partial completeness

Inputs:

times values freq keep_index
2024-01-01T00:00:00Z 10 6H true
2024-01-01T06:00:00Z
2024-01-01T12:00:00Z 15
2024-01-01T18:00:00Z

Excel formula:

=COMPLETENESS_SCORE({"2024-01-01T00:00:00Z";"2024-01-01T06:00:00Z";"2024-01-01T12:00:00Z";"2024-01-01T18:00:00Z"}, {10;"";15;""}, "6H", TRUE)

Expected output:

Result
0.5
0.5
0.5
0.5
Example 2: Day with all present values returns full completeness

Inputs:

times values freq keep_index
2024-01-02T00:00:00Z 5 12H true
2024-01-02T12:00:00Z 7

Excel formula:

=COMPLETENESS_SCORE({"2024-01-02T00:00:00Z";"2024-01-02T12:00:00Z"}, {5;7}, "12H", TRUE)

Expected output:

Result
1
1
Example 3: Return daily index scores when keep index is false

Inputs:

times values freq keep_index
2024-01-03T00:00:00Z 1 6H false
2024-01-03T06:00:00Z 1
2024-01-03T12:00:00Z 1
2024-01-03T18:00:00Z 1
2024-01-04T00:00:00Z 2
2024-01-04T06:00:00Z
2024-01-04T12:00:00Z 2
2024-01-04T18:00:00Z 2

Excel formula:

=COMPLETENESS_SCORE({"2024-01-03T00:00:00Z";"2024-01-03T06:00:00Z";"2024-01-03T12:00:00Z";"2024-01-03T18:00:00Z";"2024-01-04T00:00:00Z";"2024-01-04T06:00:00Z";"2024-01-04T12:00:00Z";"2024-01-04T18:00:00Z"}, {1;1;1;1;2;"";2;2}, "6H", FALSE)

Expected output:

Result
1
0.75
Example 4: Handle scalar timestamp and value inputs

Inputs:

times values freq keep_index
2024-01-05T00:00:00Z 10 1D true

Excel formula:

=COMPLETENESS_SCORE("2024-01-05T00:00:00Z", 10, "1D", TRUE)

Expected output:

1

Python Code

Show Code
import pandas as pd
from pvanalytics.quality.gaps import completeness_score as result_func

def completeness_score(times, values, freq=None, keep_index=True):
    """
    Calculate a data completeness score for each day from a timestamped PV series.

    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.quality.gaps.completeness_score.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        times (list[list]): Timestamps in ISO8601 format.
        values (list[list]): Numeric data values corresponding to the times.
        freq (str, optional): Expected interval between samples as a pandas frequency string (e.g. '15min'). If blank, it is inferred. Default is None.
        keep_index (bool, optional): Whether to return values padded to align with the input time array. If True, replicates daily scores to input resolution. Default is True.

    Returns:
        list[list]: 2D list of completeness floats corresponding to each input value, or an error string.
    """
    try:
        def flatten_str(data):
            if not isinstance(data, list): return [str(data)]
            return [str(val) for row in data for val in (row if isinstance(row, list) else [row]) if val != ""]

        def flatten_num(data):
            if not isinstance(data, list): return [float(data)]
            flat = []
            for row in data:
                row = row if isinstance(row, list) else [row]
                for val in row:
                    if val == "": flat.append(float('nan'))
                    else: flat.append(float(val))
            return flat

        def unwrap_scalar(x):
            if isinstance(x, list) and len(x) == 1 and isinstance(x[0], list) and len(x[0]) == 1:
                return x[0][0]
            return x

        time_list = flatten_str(times)
        val_list = flatten_num(values)

        if len(time_list) != len(val_list):
            return "Error: times and values must have the same length"
        if len(time_list) == 0:
            return "Error: input arrays cannot be empty"

        dt_idx = pd.DatetimeIndex(time_list)
        series = pd.Series(val_list, index=dt_idx)

        freq_val = unwrap_scalar(freq)
        keep_val = unwrap_scalar(keep_index)

        f = str(freq_val) if freq_val is not None and str(freq_val).strip() != "" else None
        keep = bool(keep_val) if keep_val is not None else True

        # pvanalytics uses pandas.infer_freq, which requires at least 3 timestamps.
        # For short series with an explicit freq, compute completeness manually.
        if f is not None and len(time_list) < 3:
            try:
                td = pd.to_timedelta(f)
            except Exception:
                return "Error: Invalid frequency"

            if td <= pd.Timedelta(0):
                return "Error: Invalid frequency"

            df = pd.DataFrame({'value': series})
            df['day'] = df.index.normalize()

            scores = {}
            for day, group in df.groupby('day'):
                count = int(group['value'].notna().sum())
                score = float(count * td / pd.Timedelta(days=1))
                scores[day] = min(score, 1.0)

            if keep:
                return [[scores.get(day, "")] for day in df['day']]
            else:
                return [[scores.get(day, "")] for day in sorted(scores.keys())]

        res = result_func(series, freq=f, keep_index=keep)

        # res is indexed per-timestamp when keep_index=True and per-day otherwise;
        # either way, emit one row per element.
        return [[float(v) if not pd.isna(v) else ""] for v in res]
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

Timestamps in ISO8601 format.
Numeric data values corresponding to the times.
Expected interval between samples as a pandas frequency string (e.g. '15min'). If blank, it is inferred.
Whether to return values padded to align with the input time array. If True, replicates daily scores to input resolution.

PERF_RATIO_NREL

This function computes the NREL weather-corrected Performance Ratio (PR) of a PV system.

It aggregates incident irradiance, ambient temperature, wind speed, and AC power over the provided data ranges. The calculation uses the SAPM temperature model and a weighted reference temperature derived from the data to weather-correct the PR, allowing for more stable long-term performance tracking.
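A compact sketch of the assumed calculation: SAPM cell temperature, a POA-weighted reference temperature derived from the data, and temperature-corrected expected DC power in the denominator. This is an illustration of the formula, with the helper name and the exact form treated as assumptions, not the pvanalytics implementation:

```python
import math

def pr_nrel_sketch(poa, temp_air, wind, pac, pdc_ref,
                   a=-3.56, b=-0.075, delta_t=3.0, gamma_pdc=-0.00433):
    """Hypothetical sketch of the NREL weather-corrected performance ratio."""
    # SAPM cell temperature: T_cell = G*exp(a + b*WS) + T_air + (G/1000)*deltaT
    t_cell = [g * math.exp(a + b * ws) + ta + (g / 1000.0) * delta_t
              for g, ta, ws in zip(poa, temp_air, wind)]
    # POA-weighted reference cell temperature derived from the data
    t_ref = sum(g * t for g, t in zip(poa, t_cell)) / sum(poa)
    # Temperature-corrected expected DC power for each interval
    p_exp = [pdc_ref * (g / 1000.0) * (1 - gamma_pdc * (t_ref - t))
             for g, t in zip(poa, t_cell)]
    return sum(pac) / sum(p_exp)

pr = pr_nrel_sketch([1000, 800, 900], [25, 20, 22], [2, 1, 3], [9.5, 7.6, 8.5], 10)
print(round(pr, 6))  # 0.948148, matching Example 1
```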

Excel Usage

=PERF_RATIO_NREL(poa_global, temp_air, wind_speed, pac, pdc_ref, a, b, delta_t, gamma_pdc)
  • poa_global (list[list], required): Total incident irradiance (W/m^2).
  • temp_air (list[list], required): Ambient dry bulb temperature (C).
  • wind_speed (list[list], required): Wind speed at 10 meters height (m/s).
  • pac (list[list], required): AC power output (kW).
  • pdc_ref (float, required): PV array STC nameplate rating (kW).
  • a (float, optional, default: -3.56): SAPM temperature model parameter a (unitless).
  • b (float, optional, default: -0.075): SAPM temperature model parameter b (s/m).
  • delta_t (float, optional, default: 3): SAPM temperature model parameter deltaT (C).
  • gamma_pdc (float, optional, default: -0.00433): Power temperature coefficient (1/C).

Returns (float): Weather-corrected Performance Ratio (float), or an error string.

Example 1: Calculate PR with nominal arrays

Inputs:

poa_global temp_air wind_speed pac pdc_ref
1000 25 2 9.5 10
800 20 1 7.6
900 22 3 8.5

Excel formula:

=PERF_RATIO_NREL({1000;800;900}, {25;20;22}, {2;1;3}, {9.5;7.6;8.5}, 10)

Expected output:

0.948148

Example 2: Calculate PR with custom SAPM and temperature coefficient parameters

Inputs:

poa_global temp_air wind_speed pac pdc_ref a b delta_t gamma_pdc
950 30 1.5 8.9 10 -3.7 -0.08 2.5 -0.004
850 28 2.5 8
780 24 3 7.1

Excel formula:

=PERF_RATIO_NREL({950;850;780}, {30;28;24}, {1.5;2.5;3}, {8.9;8;7.1}, 10, -3.7, -0.08, 2.5, -0.004)

Expected output:

0.930233

Example 3: Handle scalar weather and power inputs

Inputs:

poa_global temp_air wind_speed pac pdc_ref
1000 25 2 9.2 10

Excel formula:

=PERF_RATIO_NREL(1000, 25, 2, 9.2, 10)

Expected output:

0.92

Example 4: Process row-vector arrays for PR calculation

Inputs:

poa_global temp_air wind_speed pac pdc_ref
{700,900,1000} {18,22,27} {2,2.5,1.5} {6.1,8.2,9} 10

Excel formula:

=PERF_RATIO_NREL({700,900,1000}, {18,22,27}, {2,2.5,1.5}, {6.1,8.2,9}, 10)

Expected output:

0.896154

Python Code

Show Code
import pandas as pd
from pvanalytics.metrics import performance_ratio_nrel as result_func

def perf_ratio_nrel(poa_global, temp_air, wind_speed, pac, pdc_ref, a=-3.56, b=-0.075, delta_t=3, gamma_pdc=-0.00433):
    """
    Calculate the NREL weather-corrected performance ratio from irradiance, temperature, and power data.

    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.metrics.performance_ratio_nrel.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        poa_global (list[list]): Total incident irradiance (W/m^2).
        temp_air (list[list]): Ambient dry bulb temperature (C).
        wind_speed (list[list]): Wind speed at 10 meters height (m/s).
        pac (list[list]): AC power output (kW).
        pdc_ref (float): PV array STC nameplate rating (kW).
        a (float, optional): SAPM temperature model parameter a (unitless). Default is -3.56.
        b (float, optional): SAPM temperature model parameter b (s/m). Default is -0.075.
        delta_t (float, optional): SAPM temperature model parameter deltaT (C). Default is 3.
        gamma_pdc (float, optional): Power temperature coefficient (1/C). Default is -0.00433.

    Returns:
        float: Weather-corrected Performance Ratio (float), or an error string.
    """
    try:
        def flatten(data):
            if not isinstance(data, list):
                return [float(data)]
            flat = []
            for row in data:
                if isinstance(row, list):
                    for val in row:
                        if val != "":
                            flat.append(float(val))
                elif row != "":
                    flat.append(float(row))
            return flat

        # Convert inputs to pandas Series for pvanalytics
        poa_s = pd.Series(flatten(poa_global))
        temp_s = pd.Series(flatten(temp_air))
        wind_s = pd.Series(flatten(wind_speed))
        pac_s = pd.Series(flatten(pac))

        n = len(poa_s)
        if not (len(temp_s) == n and len(wind_s) == n and len(pac_s) == n):
            return "Error: Input ranges must have the same number of valid elements"

        if n == 0:
            return "Error: Input arrays cannot be empty"

        pdc_ref_val = float(pdc_ref)
        a_val = float(a) if a is not None else -3.56
        b_val = float(b) if b is not None else -0.075
        dT_val = float(delta_t) if delta_t is not None else 3.0
        gamma_val = float(gamma_pdc) if gamma_pdc is not None else -0.00433

        pr = result_func(
            poa_s, temp_s, wind_s, pac_s, pdc_ref_val,
            a=a_val, b=b_val, deltaT=dT_val, gamma_pdc=gamma_val
        )
        return float(pr)
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

Total incident irradiance (W/m^2).
Ambient dry bulb temperature (C).
Wind speed at 10 meters height (m/s).
AC power output (kW).
PV array STC nameplate rating (kW).
SAPM temperature model parameter a (unitless).
SAPM temperature model parameter b (s/m).
SAPM temperature model parameter deltaT (C).
Power temperature coefficient (1/C).

SPACING

This function verifies the time intervals between consecutive timestamps in a series.

It returns True for each timestamp whose difference from the preceding timestamp exactly matches the specified pandas frequency string (e.g. '15min' or '1H'); the first timestamp, which has no predecessor, is reported as True. This check is used to identify irregular logging intervals or data gaps.
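Conceptually this is a first-difference comparison against the expected interval. A pandas sketch using Example 2's timestamps (an illustration, not the pvanalytics implementation):

```python
import pandas as pd

idx = pd.DatetimeIndex(["2024-01-01T00:00:00Z",
                        "2024-01-01T01:00:00Z",
                        "2024-01-01T03:00:00Z"])
expected = pd.Timedelta("1h")
deltas = idx.to_series().diff()
# The first timestamp has no predecessor, so it is treated as correctly spaced.
flags = deltas.isna() | (deltas == expected)
print([bool(v) for v in flags])  # [True, True, False], matching Example 2
```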

Excel Usage

=SPACING(times, freq)
  • times (list[list], required): Timestamps in ISO8601 format.
  • freq (str, required): Expected interval between samples as a pandas frequency string (e.g. '15min').

Returns (list[list]): 2D list of booleans (True if spacing matches), or an error string.

Example 1: Check regular 1-hour spacing

Inputs:

times freq
2024-01-01T00:00:00Z 1H
2024-01-01T01:00:00Z
2024-01-01T02:00:00Z

Excel formula:

=SPACING({"2024-01-01T00:00:00Z";"2024-01-01T01:00:00Z";"2024-01-01T02:00:00Z"}, "1H")

Expected output:

Result
true
true
true
Example 2: Identify mismatch in irregular timestamp spacing

Inputs:

times freq
2024-01-01T00:00:00Z 1H
2024-01-01T01:00:00Z
2024-01-01T03:00:00Z

Excel formula:

=SPACING({"2024-01-01T00:00:00Z";"2024-01-01T01:00:00Z";"2024-01-01T03:00:00Z"}, "1H")

Expected output:

Result
true
true
false
Example 3: Handle a scalar timestamp input

Inputs:

times freq
2024-01-01T00:00:00Z 1H

Excel formula:

=SPACING("2024-01-01T00:00:00Z", "1H")

Expected output:

true

Example 4: Validate regular 15-minute intervals

Inputs:

times freq
2024-01-01T00:00:00Z 15min
2024-01-01T00:15:00Z
2024-01-01T00:30:00Z
2024-01-01T00:45:00Z

Excel formula:

=SPACING({"2024-01-01T00:00:00Z";"2024-01-01T00:15:00Z";"2024-01-01T00:30:00Z";"2024-01-01T00:45:00Z"}, "15min")

Expected output:

Result
true
true
true
true

Python Code

Show Code
import pandas as pd
from pvanalytics.quality.time import spacing as result_func

def spacing(times, freq):
    """
    Check that the spacing between timestamps conforms to an expected frequency.

    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.quality.time.spacing.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        times (list[list]): Timestamps in ISO8601 format.
        freq (str): Expected interval between samples as a pandas frequency string (e.g. '15min').

    Returns:
        list[list]: 2D list of booleans (True if spacing matches), or an error string.
    """
    try:
        def flatten_str(data):
            if not isinstance(data, list): return [str(data)]
            return [str(val) for row in data for val in (row if isinstance(row, list) else [row]) if val != ""]

        time_list = flatten_str(times)

        if len(time_list) == 0:
            return "Error: input array cannot be empty"

        dt_idx = pd.DatetimeIndex(time_list)

        f = str(freq).strip()
        if not f:
            return "Error: freq must be provided"

        res = result_func(dt_idx, freq=f)

        return [[bool(v)] for v in res]
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

Timestamps in ISO8601 format.
Expected interval between samples as a pandas frequency string (e.g. '15min').

STALE_VALUES_ROUND

This function detects "stale" or repeating sensor values that occur when an instrument becomes stuck.

A point is marked as stale if it belongs to a contiguous sequence (of length at least window) of readings that are identical when rounded to the specified number of decimal places.
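Run detection of this kind can be sketched with a groupby over runs of equal rounded values. The helper below is hypothetical, not the pvanalytics algorithm; it reproduces the 'tail' marking shown in Example 2 (every element of a qualifying run except the first):

```python
import pandas as pd

def stale_tail_mask(x, window=3, decimals=2):
    """Hypothetical sketch: flag runs of >= window equal rounded values ('tail' style)."""
    r = pd.Series(x).round(decimals)
    run_id = (r != r.shift()).cumsum()             # label each run of equal values
    run_len = r.groupby(run_id).transform("size")  # length of the run each point is in
    first_of_run = r != r.shift()
    return (run_len >= window) & ~first_of_run

print([bool(v) for v in stale_tail_mask([5, 5, 5, 6], window=3, decimals=0)])
# [False, True, True, False], matching Example 2
```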

Excel Usage

=STALE_VALUES_ROUND(x, window, decimals, mark)
  • x (list[list], required): Numeric data values.
  • window (int, optional, default: 6): Number of consecutive identical elements to be considered stale.
  • decimals (int, optional, default: 3): Number of decimal places to round to before comparing.
  • mark (str, optional, default: "tail"): How much of the sequence to mark True. One of 'tail', 'end', or 'all'.

Returns (list[list]): 2D list of booleans (True if stale), or an error string.

Example 1: Detect repeating values over a window

Inputs:

x window decimals mark
1.001 3 2 all
1.001
1.001
1.001
2

Excel formula:

=STALE_VALUES_ROUND({1.001;1.001;1.001;1.001;2}, 3, 2, "all")

Expected output:

Result
true
true
true
true
false
Example 2: Detect stale values with tail marking mode

Inputs:

x window decimals mark
5 3 0 tail
5
5
6

Excel formula:

=STALE_VALUES_ROUND({5;5;5;6}, 3, 0, "tail")

Expected output:

Result
false
true
true
false
Example 3: Detect stale values with end marking mode

Inputs:

x window decimals mark
2.2 3 1 end
2.2
2.2
2.2
2.2

Excel formula:

=STALE_VALUES_ROUND({2.2;2.2;2.2;2.2;2.2}, 3, 1, "end")

Expected output:

Result
false
false
true
true
true
Example 4: Handle scalar stale-value input

Inputs:

x window decimals mark
7.5 3 2 all

Excel formula:

=STALE_VALUES_ROUND(7.5, 3, 2, "all")

Expected output:

false

Python Code

Show Code
import pandas as pd
from pvanalytics.quality.gaps import stale_values_round as result_func

def stale_values_round(x, window=6, decimals=3, mark='tail'):
    """
    Identify stale or stuck-sensor values by rounding and checking for repeating sequences.

    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.quality.gaps.stale_values_round.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        x (list[list]): Numeric data values.
        window (int, optional): Number of consecutive identical elements to be considered stale. Default is 6.
        decimals (int, optional): Number of decimal places to round to before comparing. Default is 3.
        mark (str, optional): How much of the sequence to mark True. One of 'tail', 'end', or 'all'. Default is 'tail'.

    Returns:
        list[list]: 2D list of booleans (True if stale), or an error string.
    """
    try:
        def flatten_num(data):
            if not isinstance(data, list): return [float(data)]
            flat = []
            for row in data:
                row = row if isinstance(row, list) else [row]
                for val in row:
                    if val == "": flat.append(float('nan'))
                    else: flat.append(float(val))
            return flat

        val_list = flatten_num(x)

        if len(val_list) == 0:
            return "Error: input array cannot be empty"

        series = pd.Series(val_list)

        w = int(window) if window is not None else 6
        dec = int(decimals) if decimals is not None else 3
        m = str(mark) if mark is not None else "tail"

        if m not in ["tail", "end", "all"]:
            return "Error: mark must be 'tail', 'end', or 'all'"

        res = result_func(series, window=w, decimals=dec, mark=m)

        return [[bool(v)] for v in res]
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

Numeric data values.
Number of consecutive identical elements to be considered stale.
Number of decimal places to round to before comparing.
How much of the sequence to mark True. One of 'tail', 'end', or 'all'.

TRIM_INCOMPLETE

This function trims a data series by marking early and late periods of low data completeness for exclusion.

It computes a daily completeness score and returns a boolean mask (True meaning the data point is kept): False before the first complete day, True from the first through the last complete day, and False thereafter.
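A minimal sketch of the keep-window logic on Example 1's daily-sampled data (an illustration, not the pvanalytics implementation; at one sample per day, a day is complete exactly when its value is present):

```python
import pandas as pd

idx = pd.DatetimeIndex(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"])
values = pd.Series([10.0, 15.0, 20.0, float("nan")], index=idx)

complete = values.notna()             # daily completeness at one sample per day
keep = pd.Series(False, index=idx)
good_days = complete[complete].index  # days meeting the completeness threshold
if len(good_days) > 0:
    keep.loc[good_days[0]:good_days[-1]] = True  # keep first..last complete day
print([bool(v) for v in keep])  # [True, True, True, False], matching Example 1
```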

Excel Usage

=TRIM_INCOMPLETE(times, values, minimum_completeness, days, freq)
  • times (list[list], required): Timestamps in ISO8601 format.
  • values (list[list], required): Numeric data values corresponding to the times.
  • minimum_completeness (float, optional, default: 0.333333): The minimum required daily completeness score.
  • days (int, optional, default: 10): The number of consecutive complete days required at the start and end of the kept window.
  • freq (str, optional, default: null): Expected interval between samples as a pandas frequency string (e.g. '15min'). If blank, it is inferred.

Returns (list[list]): 2D list of booleans (True to keep, False to trim), or an error string.

Example 1: Identify trailing gaps based on completeness

Inputs:

times values minimum_completeness days freq
2024-01-01T00:00:00Z 10 0.5 1 1D
2024-01-02T00:00:00Z 15
2024-01-03T00:00:00Z 20
2024-01-04T00:00:00Z

Excel formula:

=TRIM_INCOMPLETE({"2024-01-01T00:00:00Z";"2024-01-02T00:00:00Z";"2024-01-03T00:00:00Z";"2024-01-04T00:00:00Z"}, {10;15;20;""}, 0.5, 1, "1D")

Expected output:

Result
true
true
true
false
Example 2: Keep all values when completeness threshold is always met

Inputs:

times values minimum_completeness days freq
2024-01-05T00:00:00Z 10 1 1 1D
2024-01-06T00:00:00Z 11
2024-01-07T00:00:00Z 12

Excel formula:

=TRIM_INCOMPLETE({"2024-01-05T00:00:00Z";"2024-01-06T00:00:00Z";"2024-01-07T00:00:00Z"}, {10;11;12}, 1, 1, "1D")

Expected output:

Result
true
true
true
Example 3: Require multiple complete days before keep window starts

Inputs:

times values minimum_completeness days freq
2024-01-08T00:00:00Z "" 0.5 2 1D
2024-01-09T00:00:00Z 10
2024-01-10T00:00:00Z 10
2024-01-11T00:00:00Z 10

Excel formula:

=TRIM_INCOMPLETE({"2024-01-08T00:00:00Z";"2024-01-09T00:00:00Z";"2024-01-10T00:00:00Z";"2024-01-11T00:00:00Z"}, {"";10;10;10}, 0.5, 2, "1D")

Expected output:

Result
false
true
true
true
Example 4: Handle scalar time and value inputs

Inputs:

times values minimum_completeness days freq
2024-01-12T00:00:00Z 9 0.5 1 1D

Excel formula:

=TRIM_INCOMPLETE("2024-01-12T00:00:00Z", 9, 0.5, 1, "1D")

Expected output:

true

Python Code

Show Code
import pandas as pd
from pvanalytics.quality.gaps import trim_incomplete as result_func
from pvanalytics.quality.gaps import start_stop_dates

def trim_incomplete(times, values, minimum_completeness=0.333333, days=10, freq=None):
    """
    Return a boolean mask to trim leading and trailing low-completeness periods from a PV series.

    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.quality.gaps.trim_incomplete.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        times (list[list]): Timestamps in ISO8601 format.
        values (list[list]): Numeric data values corresponding to the times.
        minimum_completeness (float, optional): The minimum required daily completeness score. Default is 0.333333.
        days (int, optional): The number of consecutive complete days required at the start and end of the kept window. Default is 10.
        freq (str, optional): Expected interval between samples as a pandas frequency string (e.g. '15min'). If blank, it is inferred. Default is None.

    Returns:
        list[list]: 2D list of booleans (True to keep, False to trim), or an error string.
    """
    try:
        def flatten_str(data):
            if not isinstance(data, list): return [str(data)]
            return [str(val) for row in data for val in (row if isinstance(row, list) else [row]) if val != ""]

        def flatten_num(data):
            if not isinstance(data, list): return [float(data)]
            flat = []
            for row in data:
                row = row if isinstance(row, list) else [row]
                for val in row:
                    if val == "": flat.append(float('nan'))
                    else: flat.append(float(val))
            return flat

        time_list = flatten_str(times)
        val_list = flatten_num(values)

        if len(time_list) != len(val_list):
            return "Error: times and values must have the same length"
        if len(time_list) == 0:
            return "Error: input arrays cannot be empty"

        dt_idx = pd.DatetimeIndex(time_list)
        series = pd.Series(val_list, index=dt_idx)

        min_comp = float(minimum_completeness) if minimum_completeness is not None else 0.333333
        d = int(days) if days is not None else 10
        f = str(freq) if freq is not None and str(freq).strip() != "" else None

        # pvanalytics requires at least 3 timestamps to infer frequency even if a
        # frequency is provided. For short series, compute the completeness/trim
        # logic ourselves to avoid this limitation.
        if len(series.index) < 3:
            if f is None:
                return "Error: Need at least 3 dates to infer frequency"

            try:
                offset = pd.tseries.frequencies.to_offset(f)
                seconds_per_sample = pd.Timedelta(offset).total_seconds()
            except Exception:
                return "Error: Invalid freq"

            daily_counts = series.resample('D').count()
            daily_completeness = (daily_counts * seconds_per_sample) / 86400  # seconds per day
            daily_good = daily_completeness >= min_comp

            start, end = start_stop_dates(daily_good, days=d)
            mask = pd.Series(False, index=series.index)
            if start is not None:
                mask.loc[start:end] = True
            return [[bool(v)] for v in mask]

        # For longer series, defer to the library implementation.
        res = result_func(series, minimum_completeness=min_comp, days=d, freq=f)

        return [[bool(v)] for v in res]
    except Exception as e:
        return f"Error: {str(e)}"

Online Calculator

Timestamps in ISO8601 format.
Numeric data values corresponding to the times.
The minimum required daily completeness score.
The number of consecutive complete days required at the start and end of the kept window.
Expected interval between samples as a pandas frequency string (e.g. '15min'). If blank, it is inferred.