```mermaid
graph TD
A[Start: PV dataset ready for analysis] --> B{Are timestamps regular?}
B -- No --> C[Use SPACING]
B -- Yes --> D{Is coverage sufficient by day?}
C --> D
D -- No --> E[Use COMPLETENESS_SCORE and TRIM_INCOMPLETE]
D -- Yes --> F{Are irradiance values physically plausible?}
E --> F
F -- No --> G[Use CHECK_GHI_QCRAD / CHECK_DHI_QCRAD / CHECK_DNI_QCRAD]
F -- Yes --> H{Do components agree?}
G --> H
H -- No --> I[Use CHECK_IRRAD_QCRAD]
H -- Yes --> J{Any frozen or stuck sensors?}
I --> J
J -- Yes --> K[Use STALE_VALUES_ROUND]
J -- No --> L[Compute weather-corrected KPI]
K --> L
L --> M[Use PERF_RATIO_NREL]
```
Data Quality
Overview
Introduction

Photovoltaic data quality is the discipline of testing whether time-series measurements from a PV system are physically plausible, internally consistent, and complete enough for analysis, forecasting, performance assessment, and warranty-grade reporting. In practical terms, it sits between raw telemetry and downstream decisions: if the quality layer is weak, every KPI built on top of it is unstable. PV systems produce rich streams of irradiance, temperature, power, and status signals; however, those streams are vulnerable to sensor drift, communication outages, timestamp misalignment, logger freezes, and environmental edge cases. Data-quality methods convert these imperfections into explicit, auditable flags so analysts can separate trustworthy observations from suspect ones.
From a mathematical perspective, PV data quality combines three broad families of checks:
- Range and physics checks: values must lie within bounds implied by geometry and extraterrestrial irradiance.
- Relational checks: variables that should cohere (for example, irradiance components) are tested for consistency.
- Coverage and continuity checks: timestamps and sampling density must support intended calculations.
These ideas map closely to established quality-control frameworks in atmospheric and solar monitoring, including QCRad concepts and broader sensor QA/QC principles described in general data-quality references. In this category, Boardflare implements production-ready checks through pvanalytics, an open-source PV data-quality library from the PV performance community. Because the calculators are exposed in spreadsheet-friendly form, teams can apply scientific QA logic without writing bespoke code pipelines.
This matters to business users because financial and operational outcomes are sensitive to small data errors. A biased irradiance sensor can make a healthy plant appear to underperform; missing intervals can distort contractual availability metrics; stale values can hide inverter trips; and untrimmed commissioning periods can depress annualized performance ratio. The net effect is avoidable uncertainty in O&M prioritization, investor reporting, and root-cause analysis. A disciplined quality workflow, by contrast, improves confidence intervals, reduces false alarms, and shortens time-to-diagnosis when problems are real.
Within this category, the tools span the full quality chain: irradiance limit checks (CHECK_GHI_QCRAD, CHECK_DHI_QCRAD, CHECK_DNI_QCRAD), component consistency validation (CHECK_IRRAD_QCRAD), temporal regularity (SPACING), missingness quantification (COMPLETENESS_SCORE), edge-period trimming (TRIM_INCOMPLETE), sensor-freeze detection (STALE_VALUES_ROUND), and weather-corrected fleet KPI estimation (PERF_RATIO_NREL). Together they support a “quality-first” analytics architecture where each downstream metric can be tied back to transparent inclusion rules.
When to Use It

Use this category when the core job is not “compute a metric quickly,” but “compute a metric you can trust.” In operating PV portfolios, many disputes are not about formula choice; they are about whether the input data is valid for the formula at all. The calculators here are especially useful when teams need reproducible quality decisions across many sites, long historical windows, or mixed data infrastructures.
One common scenario is monthly performance reporting for asset managers. The analyst must publish PR trends, identify underperformers, and justify exclusions to owners or lenders. A practical workflow is to verify timestamp cadence with SPACING, quantify missingness with COMPLETENESS_SCORE, exclude low-quality edges with TRIM_INCOMPLETE, flag physically impossible irradiance points via CHECK_GHI_QCRAD, CHECK_DHI_QCRAD, and CHECK_DNI_QCRAD, then compute weather-corrected KPI using PERF_RATIO_NREL. This sequence creates a defensible audit trail: each dropped interval is explained by a specific rule, rather than manual judgment.
Another high-value scenario is SCADA troubleshooting and anomaly triage. During a suspected communications issue, values may appear flat for hours even though the plant is changing. STALE_VALUES_ROUND detects repeated rounded values characteristic of frozen sensors or logger buffering artifacts. In parallel, SPACING reveals irregular logging intervals, and COMPLETENESS_SCORE surfaces the effective coverage loss day by day. Engineers can then distinguish “real plant stability” from “telemetry malfunction,” reducing unnecessary field dispatches.
A third scenario is bankable energy-model backtesting and model calibration. Modelers need high-integrity irradiance decomposition relationships to evaluate transposition, inverter clipping assumptions, and loss factors. CHECK_IRRAD_QCRAD is central here because it tests whether measured GHI, DHI, and DNI satisfy expected component relationships under valid solar geometry. Combined with range screens (CHECK_GHI_QCRAD, CHECK_DHI_QCRAD, CHECK_DNI_QCRAD), this reduces bias from unphysical points that would otherwise skew calibration coefficients.
These tools are also useful during commissioning handoff and data migration projects. When historical feeds are replatformed, timestamp shifts and null-handling differences often appear. A quick run of SPACING and COMPLETENESS_SCORE can detect structural issues before they contaminate dashboards. If the first and last weeks are unstable due to onboarding gaps, TRIM_INCOMPLETE can define the “analysis-valid” window without ad hoc date cuts.
At an organizational level, teams should apply these checks whenever decisions involve cost, risk transfer, or contractual accountability: availability guarantees, underperformance claims, O&M incentive calculations, and degradation studies. In all those cases, a standardized QA layer is cheaper than arguing later about whether the input data should have been included.
How It Works

The category is implemented with pvanalytics; its one KPI calculator follows NREL methodology as documented in the pvanalytics metrics reference. Conceptually, the calculators apply boolean masks and scoring functions to indexed time series. Each mask marks valid/invalid records under a specific assumption; masks can then be combined using logical operations to produce a final analysis set.
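The mask-combination idea can be sketched in plain pandas, independent of any particular check (the mask names here are hypothetical, chosen only for illustration):

```python
import pandas as pd

# Three hypothetical per-record masks: True means the record passed that check.
spacing_ok = pd.Series([True, True, False, True])
range_ok = pd.Series([True, False, True, True])
stale_ok = pd.Series([True, True, True, True])

# The final analysis set is the logical AND of all masks:
# a record is kept only if every check passed.
final_mask = spacing_ok & range_ok & stale_ok

values = pd.Series([810.0, 1510.0, 805.0, 790.0])
clean = values[final_mask]  # keeps only rows where all checks passed
```

Because each mask encodes exactly one assumption, the final inclusion rule stays auditable: any excluded record can be traced back to the specific check that rejected it.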
For irradiance limits, QCRad-style checks use solar geometry and extraterrestrial irradiance as dynamic bounds. A simplified representation is:
$$L(\theta_z, E_0) < I(t) < U(\theta_z, E_0)$$
where I(t) is measured irradiance (GHI, DHI, or DNI), \theta_z is solar zenith, and E_0 is extraterrestrial normal irradiance. The lower bound enforces non-negative/physically plausible minima, and the upper bound scales with conditions that limit what sunlight can realistically deliver at the sensor plane. Boardflare exposes these checks as CHECK_GHI_QCRAD, CHECK_DHI_QCRAD, and CHECK_DNI_QCRAD, each supporting “physical” and “extreme” limit modes.
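To illustrate how such dynamic bounds behave, here is a simplified GHI screen using the commonly cited QCRad physical limits (upper bound $1.5\,E_0 \cos(\theta_z)^{1.2} + 100$ W/m², lower bound $-4$ W/m²). This is a sketch of the concept, not the pvanalytics implementation:

```python
import numpy as np

def ghi_physical_ok(ghi, zenith_deg, dni_extra):
    """Simplified QCRad-style physical screen for GHI (illustrative only)."""
    mu0 = np.maximum(np.cos(np.radians(zenith_deg)), 0.0)
    upper = 1.5 * dni_extra * mu0 ** 1.2 + 100.0  # ceiling scales with sun position
    lower = -4.0                                  # small negative sensor offsets allowed
    return (ghi > lower) & (ghi < upper)

flags = ghi_physical_ok(np.array([1000.0, -10.0, 2000.0]),
                        np.array([30.0, 30.0, 30.0]),
                        np.array([1367.0, 1367.0, 1367.0]))
# a plausible midday reading passes; -10 and 2000 W/m^2 fail
```

Note how the same measured value can pass at low zenith and fail at high zenith: the bound is a function of geometry, not a fixed constant.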
Component consistency uses the irradiance closure identity (with geometry projection):
$$GHI \approx DNI \cos(\theta_z) + DHI$$
The calculator CHECK_IRRAD_QCRAD evaluates whether this relation is sufficiently satisfied and whether diffuse fraction behavior is reasonable under domain-specific criteria. This is powerful because each individual channel could pass its own range test yet still be inconsistent as a trio, indicating calibration mismatch, shading artifacts, tracker metadata issues, or timestamp offsets.
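The closure idea can be sketched as a residual test; the fixed 8% tolerance below is an illustrative assumption (the QCRad criteria use zenith- and magnitude-dependent bands), so treat this as a conceptual sketch rather than the library's logic:

```python
import numpy as np

def component_closure(ghi, dhi, dni, zenith_deg, rtol=0.08):
    """Illustrative closure test: does GHI match DNI*cos(zenith) + DHI within rtol?"""
    component_sum = dni * np.cos(np.radians(zenith_deg)) + dhi
    return np.abs(ghi - component_sum) <= rtol * component_sum

ok = component_closure(ghi=np.array([1000.0, 1000.0]),
                       dhi=np.array([100.0, 10.0]),
                       dni=np.array([1039.23, 900.0]),
                       zenith_deg=np.array([30.0, 90.0]))
# first point closes (1039.23*cos 30 + 100 = 1000); second does not
```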
Temporal integrity is handled in two layers. First, SPACING checks that adjacent timestamps match expected interval \Delta t:
$$t_i - t_{i-1} = \Delta t$$
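In pandas terms, this spacing test amounts to differencing the timestamp index (a minimal sketch, not the SPACING implementation):

```python
import pandas as pd

times = pd.DatetimeIndex([
    "2024-01-01 00:00", "2024-01-01 00:05",
    "2024-01-01 00:15",  # a 5-minute record was skipped before this point
    "2024-01-01 00:20",
])
expected = pd.Timedelta("5min")

# A timestamp conforms if its gap to the previous timestamp equals the expected
# interval; the first point has no predecessor and passes by convention.
gaps = times.to_series().diff()
spacing_ok = gaps.isna() | (gaps == expected)
# the point after the skipped record is flagged
```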
Second, COMPLETENESS_SCORE computes daily coverage as observed valid duration divided by expected duration. In simplified form:
$$C_d = \frac{\sum_{i \in d} \mathbf{1}_{\text{valid},i} \cdot \Delta t}{24\,\text{h}}$$
where C_d \in [0,1] is the day-level completeness score. This decouples “are timestamps regular?” from “how much valid data exists?”, both of which are necessary for trustworthy aggregation.
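The day-level score can be sketched directly from that formula; this mirrors the worked example in the COMPLETENESS_SCORE reference below (four 6-hour slots, two valid) but is a conceptual sketch, not the library code:

```python
import pandas as pd

# Four 6-hour slots in one day, two of which are missing (NaN).
times = pd.date_range("2024-01-01", periods=4, freq=pd.Timedelta(hours=6))
values = pd.Series([10.0, None, 15.0, None], index=times)

freq = pd.Timedelta(hours=6)
# Valid duration per day = (# non-missing samples) * sample spacing, in hours.
valid_hours = values.notna().groupby(values.index.date).sum() * (freq / pd.Timedelta(hours=1))
completeness = valid_hours / 24.0
# 2 valid readings x 6 h = 12 h of 24 h -> completeness 0.5 for 2024-01-01
```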
Edge trimming then uses completeness history to define analysis windows. TRIM_INCOMPLETE finds sustained periods meeting a minimum completeness threshold and masks sparse startup/shutdown tails. Operationally, this prevents short incomplete segments from disproportionately affecting monthly or annual KPIs.
Stale-value detection, via STALE_VALUES_ROUND, identifies contiguous runs where rounded values repeat for at least a window length. If x_t is rounded signal value and window size is w, stale conditions are detected where:
$$x_t = x_{t-1} = \cdots = x_{t-w+1}$$
under user-selected marking logic (tail, end, or all). This captures a common real-world failure mode: sensors or loggers that stop updating numerically but continue emitting timestamps.
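A minimal sketch of the "tail" marking variant, using rolling comparison of rounded values (illustrative only, not the STALE_VALUES_ROUND implementation):

```python
import pandas as pd

def stale_tail_flags(series, window=3, decimals=1):
    """Flag points that complete a run of `window` identical rounded values."""
    rounded = series.round(decimals)
    same_as_prev = rounded.eq(rounded.shift())
    # A point is stale (tail marking) if the previous window-1 steps were all repeats.
    return same_as_prev.astype(float).rolling(window - 1).sum().eq(window - 1)

flags = stale_tail_flags(pd.Series([812.4, 655.0, 655.0, 655.0, 703.2]))
# only the point completing the three-sample frozen run is flagged
```

Rounding first is the key trick: it catches loggers that repeat a quantized value exactly while ignoring benign sub-resolution noise.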
Finally, PERF_RATIO_NREL produces weather-corrected PR by normalizing AC output against irradiance and temperature-adjusted reference behavior. The exact implementation follows NREL-style methodology in pvanalytics, including SAPM temperature model parameters. A conceptual form is:
$$PR_{wc} = \frac{\sum P_{AC}}{\sum \left(P_{dc,ref}\,\frac{G_{POA}}{G_{ref}}\,f_T\right)}$$
where f_T applies temperature correction using parameters such as a, b, \Delta T, and \gamma_{pdc}. The business value is comparability: weather-corrected PR is less sensitive to meteorological variation, so trend changes more often reflect asset performance than weather noise.
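The conceptual formula can be sketched as follows; this sketch takes cell temperature as a direct input and uses a linear temperature correction, whereas the actual NREL/pvanalytics method derives cell temperature from the SAPM model parameters:

```python
import numpy as np

def weather_corrected_pr(p_ac, g_poa, t_cell, p_dc0, gamma_pdc,
                         g_ref=1000.0, t_ref=25.0):
    """Conceptual weather-corrected PR sketch (not the NREL implementation)."""
    # Expected DC output scaled by irradiance and linear temperature correction.
    expected = p_dc0 * (g_poa / g_ref) * (1.0 + gamma_pdc * (t_cell - t_ref))
    return np.sum(p_ac) / np.sum(expected)

pr = weather_corrected_pr(
    p_ac=np.array([95.0, 47.5]),      # measured AC power, kW
    g_poa=np.array([1000.0, 500.0]),  # plane-of-array irradiance, W/m^2
    t_cell=np.array([25.0, 25.0]),    # cell temperature, deg C
    p_dc0=100.0,                      # DC nameplate rating, kW
    gamma_pdc=-0.004,                 # power temperature coefficient, 1/deg C
)
# -> 0.95 (142.5 kW measured vs 150 kW irradiance-scaled expectation)
```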
An important assumption across all methods is that metadata (timezone, sensor orientation, naming conventions, units) is coherent. Quality checks can catch many errors, but they cannot fully compensate for fundamentally misconfigured source systems. For robust operations, teams should pair these calculators with governance controls: fixed unit standards, explicit null policies, and consistent timestamp localization.
Practical Example

Consider a utility-scale PV plant where an analyst must publish a monthly operations memo and answer a recurring question: “Did PR decline due to plant issues, or due to data artifacts?” The raw feed includes 5-minute irradiance components (GHI, DHI, DNI), ambient temperature, wind speed, and AC power. Over the month, there are several logger interruptions and a suspected irradiance sensor freeze.
Step 1 is structural validation. The analyst runs SPACING on timestamps with 5min frequency to detect interval discontinuities. Any nonconforming points are flagged for investigation and are excluded from downstream KPI windows. This catches a subtle but common issue: duplicated or skipped records after SCADA reconnect.
Step 2 is coverage quantification. Using COMPLETENESS_SCORE, the analyst computes day-level completeness, optionally preserving original index granularity for easier joins. Days with severe data loss are visible immediately (for example, C_d < 0.7). Instead of manually inspecting logs, the team gets an objective daily coverage metric suitable for governance thresholds.
Step 3 is edge-window hygiene. With TRIM_INCOMPLETE, the analyst trims leading and trailing low-completeness periods caused by maintenance and communications cutover. This avoids contaminating “month-to-date PR” with sparse endpoints that do not represent normal operation.
Step 4 is irradiance plausibility screening. The analyst applies CHECK_GHI_QCRAD, CHECK_DHI_QCRAD, and CHECK_DNI_QCRAD with physical limits for routine reporting and optionally extreme limits for forensic review. This removes unphysical spikes, negative daytime irradiance, and implausible upper-bound excursions.
Step 5 is relational consistency testing. CHECK_IRRAD_QCRAD evaluates whether component closure approximately holds. During the suspected freeze window, the check reveals repeated diffuse patterns inconsistent with DNI\cos(\theta_z)+DHI, confirming instrumentation artifacts rather than true irradiance behavior.
Step 6 is stale-run detection. STALE_VALUES_ROUND is applied to key channels (often irradiance and power) with a window tuned to operational cadence (for 5-minute data, maybe 6–12 points). This identifies frozen-value plateaus that would otherwise bias averages.
Step 7 is KPI computation on the quality-controlled subset. The final mask is built from logical conjunction of spacing validity, acceptable completeness window, irradiance range passes, consistency flags, and stale-value exclusions. PERF_RATIO_NREL is then computed on this curated dataset.
In the memo, the analyst reports both raw and quality-controlled PR, plus counts of excluded intervals by rule. This framing changes the conversation from subjective “data looked odd” to objective “3.8% of records excluded: 1.2% spacing failures, 0.9% stale runs, 1.7% irradiance QC violations.” Stakeholders gain confidence because method and impact are transparent, repeatable, and comparable month to month.
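The exclusion accounting in such a memo can be produced mechanically from the per-rule masks; the counts below are hypothetical, chosen only to illustrate the bookkeeping:

```python
import pandas as pd

# Hypothetical per-rule failure masks over 1000 records (True = excluded by that rule).
fails = pd.DataFrame({
    "spacing": [True] * 12 + [False] * 988,
    "stale": [False] * 12 + [True] * 9 + [False] * 979,
    "irradiance_qc": [False] * 21 + [True] * 17 + [False] * 962,
})

excluded_any = fails.any(axis=1)  # a record is excluded if any rule rejects it
summary = {rule: f"{100 * fails[rule].mean():.1f}%" for rule in fails}
summary["total_excluded"] = f"{100 * excluded_any.mean():.1f}%"
# per-rule and overall exclusion rates, ready to paste into the memo
```

Reporting per-rule rates alongside the total also exposes overlap: when rules reject the same records, the total is less than the sum of the parts.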
Compared with traditional Excel-only workflows, this approach is faster and more robust. Instead of building fragile custom formulas for each site, teams reuse standardized calculators with well-defined semantics from pvanalytics. The result is lower analyst variance, easier peer review, and better portability across portfolios.
How to Choose

Choose calculators based on the question being answered and where uncertainty enters the pipeline: timestamps, missingness, physical plausibility, cross-channel consistency, instrument behavior, or final KPI normalization.
| Function | Best use case | Inputs required | Strengths | Caveats |
|---|---|---|---|---|
| SPACING | Detect irregular or broken timestamp cadence | Time index + expected freq | Fast structural gate before all other tests | Does not quantify missingness severity by itself |
| COMPLETENESS_SCORE | Measure effective day-level data availability | Time index + values (+ optional freq) | Produces interpretable 0–1 coverage metric | High completeness does not guarantee physical correctness |
| TRIM_INCOMPLETE | Exclude weak leading/trailing periods | Time index + values + threshold settings | Standardizes analysis window selection | Threshold choice should align with reporting policy |
| CHECK_GHI_QCRAD | Screen GHI for physical/extreme bounds | GHI, zenith, extraterrestrial irradiance | Domain-aware dynamic limits | Needs accurate zenith and units |
| CHECK_DHI_QCRAD | Screen DHI plausibility | DHI, zenith, extraterrestrial irradiance | Captures diffuse-channel outliers | Same metadata dependencies as other QCRad checks |
| CHECK_DNI_QCRAD | Screen DNI plausibility | DNI, zenith, extraterrestrial irradiance | Catches direct-beam spikes/invalid points | Sensitive to sensor alignment and soiling effects |
| CHECK_IRRAD_QCRAD | Validate GHI-DHI-DNI closure consistency | Zenith + all three irradiance components | Detects cross-channel mismatch missed by single-channel bounds | Must interpret night/out-of-domain handling correctly |
| STALE_VALUES_ROUND | Identify frozen sensors/loggers via repeated values | Signal series + window/rounding/mark rules | Excellent for telemetry fault detection | Needs sensible window setting for sample rate |
| PERF_RATIO_NREL | Compute weather-corrected plant performance | POA irradiance, weather, AC power, rating | Produces normalized KPI suitable for trend comparison | Quality of inputs strongly determines interpretability |
A practical selection pattern is to run tools in this order: structural checks, coverage checks, physics checks, relational checks, stale checks, then KPI computation. In other words, PERF_RATIO_NREL should usually be the endpoint, not the starting point. If data quality is uncertain, compute PR twice (raw and filtered) and report the delta; large deltas often indicate data-process problems rather than plant degradation.
For policy design, teams can define a tiered approach: a “monitoring tier” with milder thresholds for early warning, and a “reporting tier” with stricter exclusions for contractual numbers. The same calculators support both tiers by changing limit modes and completeness requirements: one tier provides broad operational visibility, the other high-confidence decision support.
In summary, pick each function based on the failure mode you need to control, then combine outputs into a transparent inclusion mask. The category is strongest when used as a system, not a menu of isolated checks.
CHECK_DHI_QCRAD
This function tests for lower and upper limits on Diffuse Horizontal Irradiance (DHI) using the QCRad criteria.
The test is applied to each DHI value. A reading passes if it is strictly greater than the lower bound and less than the upper bound. The bounds can be either ‘physical’ or ‘extreme’ criteria.
Excel Usage
=CHECK_DHI_QCRAD(dhi, solar_zenith, dni_extra, limits)
- dhi (list[list], required): Diffuse horizontal irradiance (W/m^2).
- solar_zenith (list[list], required): Solar zenith angle (degrees).
- dni_extra (list[list], required): Extraterrestrial normal irradiance (W/m^2).
- limits (str, optional, default: "physical"): QCRad QC limits.
Returns (list[list]): 2D list of booleans (True if passed limits), or an error string.
Example 1: Check DHI physical limits pass and fail cases
Inputs:
| dhi | solar_zenith | dni_extra | limits |
|---|---|---|---|
| 100 | 30 | 1367 | physical |
| -10 | 30 | 1367 | |
| 1500 | 30 | 1367 |
Excel formula:
=CHECK_DHI_QCRAD({100;-10;1500}, {30;30;30}, {1367;1367;1367}, "physical")
Expected output:
| Result |
|---|
| true |
| false |
| false |
Example 2: Evaluate DHI values with extreme limits setting
Inputs:
| dhi | solar_zenith | dni_extra | limits |
|---|---|---|---|
| 80 | 20 | 1367 | extreme |
| 400 | 45 | 1367 | |
| 1100 | 60 | 1367 |
Excel formula:
=CHECK_DHI_QCRAD({80;400;1100}, {20;45;60}, {1367;1367;1367}, "extreme")
Expected output:
| Result |
|---|
| true |
| true |
| false |
Example 3: Handle scalar inputs as single observations
Inputs:
| dhi | solar_zenith | dni_extra | limits |
|---|---|---|---|
| 200 | 35 | 1367 | physical |
Excel formula:
=CHECK_DHI_QCRAD(200, 35, 1367, "physical")
Expected output:
true
Example 4: Process a row vector of DHI observations
Inputs:
| Parameter | Values |
|---|---|
| dhi | 50, 150, 300 |
| solar_zenith | 10, 30, 50 |
| dni_extra | 1367, 1367, 1367 |
| limits | physical |
Excel formula:
=CHECK_DHI_QCRAD({50,150,300}, {10,30,50}, {1367,1367,1367}, "physical")
Expected output:
| Result |
|---|
| true |
| true |
| true |
Python Code
```python
import pandas as pd
from pvanalytics.quality.irradiance import check_dhi_limits_qcrad as result_func

def check_dhi_qcrad(dhi, solar_zenith, dni_extra, limits='physical'):
    """
    Return a pass/fail QC flag array for each DHI reading against QCRad physical or extreme limits.

    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.quality.irradiance.check_dhi_limits_qcrad.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        dhi (list[list]): Diffuse horizontal irradiance (W/m^2).
        solar_zenith (list[list]): Solar zenith angle (degrees).
        dni_extra (list[list]): Extraterrestrial normal irradiance (W/m^2).
        limits (str, optional): QCRad QC limits. Valid options: 'physical', 'extreme'. Default is 'physical'.

    Returns:
        list[list]: 2D list of booleans (True if passed limits), or an error string.
    """
    try:
        def flatten(data):
            # Accept scalars, 1D lists, or 2D lists and return a flat list of floats.
            if not isinstance(data, list):
                return [float(data)]
            flat = []
            for row in data:
                if isinstance(row, list):
                    for val in row:
                        if val != "":
                            flat.append(float(val))
                elif row != "":
                    flat.append(float(row))
            return flat

        dhi_s = pd.Series(flatten(dhi))
        zenith_s = pd.Series(flatten(solar_zenith))
        dni_extra_s = pd.Series(flatten(dni_extra))
        n = len(dhi_s)
        if not (len(zenith_s) == n and len(dni_extra_s) == n):
            return "Error: All input arrays must have the same length"
        if n == 0:
            return "Error: Arrays cannot be empty"
        lim = str(limits) if limits is not None else "physical"
        if lim not in ["physical", "extreme"]:
            return "Error: limits must be either 'physical' or 'extreme'"
        res = result_func(dhi_s, zenith_s, dni_extra_s, limits=lim)
        return [[bool(v)] for v in res]
    except Exception as e:
        return f"Error: {str(e)}"
```
CHECK_DNI_QCRAD
This function tests for lower and upper limits on Direct Normal Irradiance (DNI) using the QCRad criteria.
The test is applied to each DNI value. A reading passes if it is strictly greater than the lower bound and less than the upper bound. The bounds can be either ‘physical’ or ‘extreme’ criteria.
Excel Usage
=CHECK_DNI_QCRAD(dni, solar_zenith, dni_extra, limits)
- dni (list[list], required): Direct normal irradiance (W/m^2).
- solar_zenith (list[list], required): Solar zenith angle (degrees).
- dni_extra (list[list], required): Extraterrestrial normal irradiance (W/m^2).
- limits (str, optional, default: "physical"): QCRad QC limits.
Returns (list[list]): 2D list of booleans (True if passed limits), or an error string.
Example 1: Check DNI physical limits pass and fail cases
Inputs:
| dni | solar_zenith | dni_extra | limits |
|---|---|---|---|
| 800 | 30 | 1367 | physical |
| -10 | 30 | 1367 | |
| 1500 | 30 | 1367 |
Excel formula:
=CHECK_DNI_QCRAD({800;-10;1500}, {30;30;30}, {1367;1367;1367}, "physical")
Expected output:
| Result |
|---|
| true |
| false |
| false |
Example 2: Evaluate DNI values with extreme limits setting
Inputs:
| dni | solar_zenith | dni_extra | limits |
|---|---|---|---|
| 500 | 15 | 1367 | extreme |
| 900 | 30 | 1367 | |
| 1200 | 55 | 1367 |
Excel formula:
=CHECK_DNI_QCRAD({500;900;1200}, {15;30;55}, {1367;1367;1367}, "extreme")
Expected output:
| Result |
|---|
| true |
| true |
| false |
Example 3: Handle scalar DNI inputs as single observations
Inputs:
| dni | solar_zenith | dni_extra | limits |
|---|---|---|---|
| 700 | 25 | 1367 | physical |
Excel formula:
=CHECK_DNI_QCRAD(700, 25, 1367, "physical")
Expected output:
true
Example 4: Process a row vector of DNI observations
Inputs:
| Parameter | Values |
|---|---|
| dni | 300, 600, 900 |
| solar_zenith | 10, 20, 40 |
| dni_extra | 1367, 1367, 1367 |
| limits | physical |
Excel formula:
=CHECK_DNI_QCRAD({300,600,900}, {10,20,40}, {1367,1367,1367}, "physical")
Expected output:
| Result |
|---|
| true |
| true |
| true |
Python Code
```python
import pandas as pd
from pvanalytics.quality.irradiance import check_dni_limits_qcrad as result_func

def check_dni_qcrad(dni, solar_zenith, dni_extra, limits='physical'):
    """
    Return a pass/fail QC flag array for each DNI reading against QCRad physical or extreme limits.

    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.quality.irradiance.check_dni_limits_qcrad.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        dni (list[list]): Direct normal irradiance (W/m^2).
        solar_zenith (list[list]): Solar zenith angle (degrees).
        dni_extra (list[list]): Extraterrestrial normal irradiance (W/m^2).
        limits (str, optional): QCRad QC limits. Valid options: 'physical', 'extreme'. Default is 'physical'.

    Returns:
        list[list]: 2D list of booleans (True if passed limits), or an error string.
    """
    try:
        def flatten(data):
            # Accept scalars, 1D lists, or 2D lists and return a flat list of floats.
            if not isinstance(data, list):
                return [float(data)]
            flat = []
            for row in data:
                if isinstance(row, list):
                    for val in row:
                        if val != "":
                            flat.append(float(val))
                elif row != "":
                    flat.append(float(row))
            return flat

        dni_s = pd.Series(flatten(dni))
        zenith_s = pd.Series(flatten(solar_zenith))
        dni_extra_s = pd.Series(flatten(dni_extra))
        n = len(dni_s)
        if not (len(zenith_s) == n and len(dni_extra_s) == n):
            return "Error: All input arrays must have the same length"
        if n == 0:
            return "Error: Arrays cannot be empty"
        lim = str(limits) if limits is not None else "physical"
        if lim not in ["physical", "extreme"]:
            return "Error: limits must be either 'physical' or 'extreme'"
        res = result_func(dni_s, zenith_s, dni_extra_s, limits=lim)
        return [[bool(v)] for v in res]
    except Exception as e:
        return f"Error: {str(e)}"
```
CHECK_GHI_QCRAD
This function tests for lower and upper limits on Global Horizontal Irradiance (GHI) using the QCRad criteria.
The test is applied to each GHI value. A reading passes if it is strictly greater than the lower bound and less than the upper bound. The bounds can be either ‘physical’ or ‘extreme’ criteria.
Excel Usage
=CHECK_GHI_QCRAD(ghi, solar_zenith, dni_extra, limits)
- ghi (list[list], required): Global horizontal irradiance (W/m^2).
- solar_zenith (list[list], required): Solar zenith angle (degrees).
- dni_extra (list[list], required): Extraterrestrial normal irradiance (W/m^2).
- limits (str, optional, default: "physical"): QCRad QC limits.
Returns (list[list]): 2D list of booleans (True if passed limits), or an error string.
Example 1: Check GHI physical limits pass and fail cases
Inputs:
| ghi | solar_zenith | dni_extra | limits |
|---|---|---|---|
| 1000 | 30 | 1367 | physical |
| -10 | 30 | 1367 | |
| 1500 | 90 | 1367 |
Excel formula:
=CHECK_GHI_QCRAD({1000;-10;1500}, {30;30;90}, {1367;1367;1367}, "physical")
Expected output:
| Result |
|---|
| true |
| false |
| false |
Example 2: Evaluate GHI values with extreme limits setting
Inputs:
| ghi | solar_zenith | dni_extra | limits |
|---|---|---|---|
| 200 | 10 | 1367 | extreme |
| 800 | 35 | 1367 | |
| 1300 | 70 | 1367 |
Excel formula:
=CHECK_GHI_QCRAD({200;800;1300}, {10;35;70}, {1367;1367;1367}, "extreme")
Expected output:
| Result |
|---|
| true |
| true |
| false |
Example 3: Handle scalar GHI inputs as single observations
Inputs:
| ghi | solar_zenith | dni_extra | limits |
|---|---|---|---|
| 600 | 40 | 1367 | physical |
Excel formula:
=CHECK_GHI_QCRAD(600, 40, 1367, "physical")
Expected output:
true
Example 4: Process a row vector of GHI observations
Inputs:
| Parameter | Values |
|---|---|
| ghi | 150, 450, 900 |
| solar_zenith | 15, 25, 45 |
| dni_extra | 1367, 1367, 1367 |
| limits | physical |
Excel formula:
=CHECK_GHI_QCRAD({150,450,900}, {15,25,45}, {1367,1367,1367}, "physical")
Expected output:
| Result |
|---|
| true |
| true |
| true |
Python Code
```python
import pandas as pd
from pvanalytics.quality.irradiance import check_ghi_limits_qcrad as result_func

def check_ghi_qcrad(ghi, solar_zenith, dni_extra, limits='physical'):
    """
    Return a pass/fail QC flag array for each GHI reading against QCRad physical or extreme limits.

    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.quality.irradiance.check_ghi_limits_qcrad.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        ghi (list[list]): Global horizontal irradiance (W/m^2).
        solar_zenith (list[list]): Solar zenith angle (degrees).
        dni_extra (list[list]): Extraterrestrial normal irradiance (W/m^2).
        limits (str, optional): QCRad QC limits. Valid options: 'physical', 'extreme'. Default is 'physical'.

    Returns:
        list[list]: 2D list of booleans (True if passed limits), or an error string.
    """
    try:
        def flatten(data):
            # Accept scalars, 1D lists, or 2D lists and return a flat list of floats.
            if not isinstance(data, list):
                return [float(data)]
            flat = []
            for row in data:
                if isinstance(row, list):
                    for val in row:
                        if val != "":
                            flat.append(float(val))
                elif row != "":
                    flat.append(float(row))
            return flat

        ghi_s = pd.Series(flatten(ghi))
        zenith_s = pd.Series(flatten(solar_zenith))
        dni_extra_s = pd.Series(flatten(dni_extra))
        n = len(ghi_s)
        if not (len(zenith_s) == n and len(dni_extra_s) == n):
            return "Error: All input arrays must have the same length"
        if n == 0:
            return "Error: Arrays cannot be empty"
        lim = str(limits) if limits is not None else "physical"
        if lim not in ["physical", "extreme"]:
            return "Error: limits must be either 'physical' or 'extreme'"
        res = result_func(ghi_s, zenith_s, dni_extra_s, limits=lim)
        return [[bool(v)] for v in res]
    except Exception as e:
        return f"Error: {str(e)}"
```
CHECK_IRRAD_QCRAD
This function checks the consistency of GHI, DHI, and DNI using the QCRad criteria.
The algorithm validates that the measured GHI is consistent with the calculated component sum (DNI * cos(zenith) + DHI) and verifies that the ratio of DHI to GHI falls within reasonable bounds. The output is a 2D array with two columns for each point: the first representing component consistency, the second representing diffuse ratio limits.
Excel Usage
=CHECK_IRRAD_QCRAD(solar_zenith, ghi, dhi, dni, outside_domain)
- solar_zenith (list[list], required): Solar zenith angle (degrees).
- ghi (list[list], required): Global horizontal irradiance (W/m^2).
- dhi (list[list], required): Diffuse horizontal irradiance (W/m^2).
- dni (list[list], required): Direct normal irradiance (W/m^2).
- outside_domain (bool, optional, default: false): Value to return when tests are not applicable (e.g. night time).
Returns (list[list]): 2D list [[components_consistent, diffuse_ratio_pass]], or an error string.
Example 1: Check irradiance component consistency
Inputs:
| solar_zenith | ghi | dhi | dni | outside_domain |
|---|---|---|---|---|
| 30 | 1000 | 100 | 1039.23 | false |
| 30 | 0 | 0 | 0 | |
| 90 | 1000 | 10 | 900 |
Excel formula:
=CHECK_IRRAD_QCRAD({30;30;90}, {1000;0;1000}, {100;0;10}, {1039.23;0;900}, FALSE)
Expected output:
| Result | |
|---|---|
| true | true |
| false | false |
| false | true |
Example 2: Use outside domain flag for non-applicable points
Inputs:
| solar_zenith | ghi | dhi | dni | outside_domain |
|---|---|---|---|---|
| 95 | 0 | 0 | 0 | true |
| 92 | 5 | 3 | 5 | |
| 89 | 20 | 10 | 50 |
Excel formula:
=CHECK_IRRAD_QCRAD({95;92;89}, {0;5;20}, {0;3;10}, {0;5;50}, TRUE)
Expected output:
| Result | |
|---|---|
| true | true |
| true | true |
| true | true |
Example 3: Handle scalar irradiance consistency inputs
Inputs:
| solar_zenith | ghi | dhi | dni | outside_domain |
|---|---|---|---|---|
| 30 | 900 | 100 | 923.76 | false |
Excel formula:
=CHECK_IRRAD_QCRAD(30, 900, 100, 923.76, FALSE)
Expected output:
| Result | |
|---|---|
| true | true |
Example 4: Process row-vector irradiance consistency input
Inputs:
| Parameter | Values |
|---|---|
| solar_zenith | 20, 40, 60 |
| ghi | 900, 700, 400 |
| dhi | 120, 150, 180 |
| dni | 830, 720, 440 |
| outside_domain | false |
Excel formula:
=CHECK_IRRAD_QCRAD({20,40,60}, {900,700,400}, {120,150,180}, {830,720,440}, FALSE)
Expected output:
| Result | |
|---|---|
| true | true |
| true | true |
| true | true |
Python Code
```python
import pandas as pd
from pvanalytics.quality.irradiance import check_irradiance_consistency_qcrad as result_func

def check_irrad_qcrad(solar_zenith, ghi, dhi, dni, outside_domain=False):
    """
    Return consistency flags for irradiance-component balance from GHI, DNI, DHI, and zenith ranges.

    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.quality.irradiance.check_irradiance_consistency_qcrad.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        solar_zenith (list[list]): Solar zenith angle (degrees).
        ghi (list[list]): Global horizontal irradiance (W/m^2).
        dhi (list[list]): Diffuse horizontal irradiance (W/m^2).
        dni (list[list]): Direct normal irradiance (W/m^2).
        outside_domain (bool, optional): Value to return when tests are not applicable (e.g. night time). Default is False.

    Returns:
        list[list]: 2D list [[components_consistent, diffuse_ratio_pass]], or an error string.
    """
    try:
        def flatten(data):
            # Accept scalars, 1D lists, or 2D lists and return a flat list of floats.
            if not isinstance(data, list):
                return [float(data)]
            flat = []
            for row in data:
                if isinstance(row, list):
                    for val in row:
                        if val != "":
                            flat.append(float(val))
                elif row != "":
                    flat.append(float(row))
            return flat

        zenith_s = pd.Series(flatten(solar_zenith))
        ghi_s = pd.Series(flatten(ghi))
        dhi_s = pd.Series(flatten(dhi))
        dni_s = pd.Series(flatten(dni))
        n = len(zenith_s)
        if not (len(ghi_s) == n and len(dhi_s) == n and len(dni_s) == n):
            return "Error: All input arrays must have the same length"
        if n == 0:
            return "Error: Arrays cannot be empty"
        out_dom = bool(outside_domain) if outside_domain is not None else False
        comp, ratio = result_func(zenith_s, ghi_s, dhi_s, dni_s, outside_domain=out_dom)
        # Pair the two flag series column-wise for spreadsheet output.
        result = []
        for c, r in zip(comp, ratio):
            result.append([bool(c), bool(r)])
        return result
    except Exception as e:
        return f"Error: {str(e)}"
```
COMPLETENESS_SCORE
This function computes a daily data completeness score.
The completeness score for a given day is the fraction of the day for which there is valid data (non-missing values). The time duration attributed to each valid reading is equal to the timestamp spacing of the series, or the specified expected frequency. For example, a 24-hour time series with 30-minute spacing and 24 non-missing values translates to 12 hours of valid data, yielding a completeness score of 0.5.
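The arithmetic behind the worked example above can be reproduced in a few lines of plain Python. This is only a sketch of the scoring rule, not the pvanalytics implementation:

```python
from datetime import timedelta

# Worked example: a 24-hour day sampled at 30-minute spacing with
# 24 non-missing readings.
spacing = timedelta(minutes=30)       # duration attributed to each valid reading
valid_readings = 24
covered = valid_readings * spacing    # 12 hours of valid data
score = covered / timedelta(days=1)   # 0.5
```

The score is simply (valid readings × sample spacing) divided by 24 hours, capped at 1.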
Excel Usage
=COMPLETENESS_SCORE(times, values, freq, keep_index)
- times (list[list], required): Timestamps in ISO8601 format.
- values (list[list], required): Numeric data values corresponding to the times.
- freq (str, optional, default: null): Expected interval between samples as a pandas frequency string (e.g. '15min'). If blank, it is inferred.
- keep_index (bool, optional, default: true): Whether to return values padded to align with the input time array. If True, replicates daily scores to input resolution.
Returns (list[list]): 2D list of completeness floats corresponding to each input value, or an error string.
Example 1: Day with missing points returns partial completeness
Inputs:
| times | values | freq | keep_index |
|---|---|---|---|
| 2024-01-01T00:00:00Z | 10 | 6H | true |
| 2024-01-01T06:00:00Z | |||
| 2024-01-01T12:00:00Z | 15 | ||
| 2024-01-01T18:00:00Z |
Excel formula:
=COMPLETENESS_SCORE({"2024-01-01T00:00:00Z";"2024-01-01T06:00:00Z";"2024-01-01T12:00:00Z";"2024-01-01T18:00:00Z"}, {10;"";15;""}, "6H", TRUE)
Expected output:
| Result |
|---|
| 0.5 |
| 0.5 |
| 0.5 |
| 0.5 |
Example 2: Day with all present values returns full completeness
Inputs:
| times | values | freq | keep_index |
|---|---|---|---|
| 2024-01-02T00:00:00Z | 5 | 12H | true |
| 2024-01-02T12:00:00Z | 7 |
Excel formula:
=COMPLETENESS_SCORE({"2024-01-02T00:00:00Z";"2024-01-02T12:00:00Z"}, {5;7}, "12H", TRUE)
Expected output:
| Result |
|---|
| 1 |
| 1 |
Example 3: Return daily index scores when keep index is false
Inputs:
| times | values | freq | keep_index |
|---|---|---|---|
| 2024-01-03T00:00:00Z | 1 | 6H | false |
| 2024-01-03T06:00:00Z | 1 | ||
| 2024-01-03T12:00:00Z | 1 | ||
| 2024-01-03T18:00:00Z | 1 | ||
| 2024-01-04T00:00:00Z | 2 | ||
| 2024-01-04T06:00:00Z | |||
| 2024-01-04T12:00:00Z | 2 | ||
| 2024-01-04T18:00:00Z | 2 |
Excel formula:
=COMPLETENESS_SCORE({"2024-01-03T00:00:00Z";"2024-01-03T06:00:00Z";"2024-01-03T12:00:00Z";"2024-01-03T18:00:00Z";"2024-01-04T00:00:00Z";"2024-01-04T06:00:00Z";"2024-01-04T12:00:00Z";"2024-01-04T18:00:00Z"}, {1;1;1;1;2;"";2;2}, "6H", FALSE)
Expected output:
| Result |
|---|
| 1 |
| 0.75 |
Example 4: Handle scalar timestamp and value inputs
Inputs:
| times | values | freq | keep_index |
|---|---|---|---|
| 2024-01-05T00:00:00Z | 10 | 1D | true |
Excel formula:
=COMPLETENESS_SCORE("2024-01-05T00:00:00Z", 10, "1D", TRUE)
Expected output:
1
Python Code
import pandas as pd
from pvanalytics.quality.gaps import completeness_score as result_func
def completeness_score(times, values, freq=None, keep_index=True):
    """
    Calculate a data completeness score for each day from a timestamped PV series.
    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.quality.gaps.completeness_score.html
    This example function is provided as-is without any representation of accuracy.
    Args:
        times (list[list]): Timestamps in ISO8601 format.
        values (list[list]): Numeric data values corresponding to the times.
        freq (str, optional): Expected interval between samples as a pandas frequency string (e.g. '15min'). If blank, it is inferred. Default is None.
        keep_index (bool, optional): Whether to return values padded to align with the input time array. If True, replicates daily scores to input resolution. Default is True.
    Returns:
        list[list]: 2D list of completeness floats corresponding to each input value, or an error string.
    """
    try:
        def flatten_str(data):
            if not isinstance(data, list):
                return [str(data)]
            return [str(val) for row in data for val in (row if isinstance(row, list) else [row]) if val != ""]
        def flatten_num(data):
            if not isinstance(data, list):
                return [float(data)]
            flat = []
            for row in data:
                row = row if isinstance(row, list) else [row]
                for val in row:
                    flat.append(float('nan') if val == "" else float(val))
            return flat
        def unwrap_scalar(x):
            if isinstance(x, list) and len(x) == 1 and isinstance(x[0], list) and len(x[0]) == 1:
                return x[0][0]
            return x
        time_list = flatten_str(times)
        val_list = flatten_num(values)
        if len(time_list) != len(val_list):
            return "Error: times and values must have the same length"
        if len(time_list) == 0:
            return "Error: input arrays cannot be empty"
        dt_idx = pd.DatetimeIndex(time_list)
        series = pd.Series(val_list, index=dt_idx)
        freq_val = unwrap_scalar(freq)
        keep_val = unwrap_scalar(keep_index)
        f = str(freq_val) if freq_val is not None and str(freq_val).strip() != "" else None
        keep = bool(keep_val) if keep_val is not None else True
        # pvanalytics uses pandas.infer_freq, which requires at least 3 timestamps.
        # For short series with an explicit freq, compute completeness manually.
        if f is not None and len(time_list) < 3:
            td = pd.to_timedelta(f)
            if td <= pd.Timedelta(0):
                return "Error: Invalid frequency"
            df = pd.DataFrame({'value': series})
            df['day'] = df.index.normalize()
            scores = {}
            for day, group in df.groupby('day'):
                count = int(group['value'].notna().sum())
                scores[day] = min(float(count * td / pd.Timedelta(days=1)), 1.0)
            if keep:
                return [[scores.get(day, "")] for day in df['day']]
            return [[scores.get(day, "")] for day in sorted(scores.keys())]
        # With keep_index=True the result aligns with the input timestamps;
        # otherwise it is indexed by day. Either way it is a Series of floats.
        res = result_func(series, freq=f, keep_index=keep)
        return [[float(v) if not pd.isna(v) else ""] for v in res]
    except Exception as e:
        return f"Error: {str(e)}"
PERF_RATIO_NREL
This function computes the NREL weather-corrected Performance Ratio (PR) of a PV system.
It aggregates incident irradiance, ambient temperature, wind speed, and AC power over the provided data ranges. The calculation uses the SAPM temperature model and a weighted reference temperature derived from the data to weather-correct the PR, allowing for more stable long-term performance tracking.
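The correction can be sketched in plain Python: compute a SAPM cell temperature per sample, derive an irradiance-weighted reference temperature from the data, then divide total AC energy by the temperature-corrected expected DC energy. This is a simplified re-derivation of the documented formula, not the pvanalytics implementation; the helper name `weather_corrected_pr` is ours:

```python
import math

def weather_corrected_pr(poa, tair, wind, pac, pdc_ref,
                         a=-3.56, b=-0.075, delta_t=3.0, gamma=-0.00433):
    """Sketch of the NREL weather-corrected PR; inputs are equal-length lists."""
    # SAPM cell temperature for each sample
    tcell = [g * math.exp(a + b * w) + t + (g / 1000.0) * delta_t
             for g, t, w in zip(poa, tair, wind)]
    # Irradiance-weighted reference temperature derived from the data
    tref = sum(g * tc for g, tc in zip(poa, tcell)) / sum(poa)
    # Temperature-corrected expected power, summed over all samples
    expected = sum(pdc_ref * (g / 1000.0) * (1 + gamma * (tc - tref))
                   for g, tc in zip(poa, tcell))
    return sum(pac) / expected

# Single sample: tcell equals tref, the correction vanishes, and PR = 9.2 / 10.
pr = weather_corrected_pr([1000], [25], [2], [9.2], 10)   # 0.92
```

Because the reference temperature is the irradiance-weighted mean of the cell temperatures, the correction terms cancel in aggregate and the denominator reduces to nameplate energy at the observed insolation; the correction redistributes weight across samples rather than changing the total.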
Excel Usage
=PERF_RATIO_NREL(poa_global, temp_air, wind_speed, pac, pdc_ref, a, b, delta_t, gamma_pdc)
- poa_global (list[list], required): Total incident irradiance (W/m^2).
- temp_air (list[list], required): Ambient dry bulb temperature (C).
- wind_speed (list[list], required): Wind speed at 10 meters height (m/s).
- pac (list[list], required): AC power output (kW).
- pdc_ref (float, required): PV array STC nameplate rating (kW).
- a (float, optional, default: -3.56): SAPM temperature model parameter a (unitless).
- b (float, optional, default: -0.075): SAPM temperature model parameter b (s/m).
- delta_t (float, optional, default: 3): SAPM temperature model parameter deltaT (C).
- gamma_pdc (float, optional, default: -0.00433): Power temperature coefficient (1/C).
Returns (float): Weather-corrected Performance Ratio, or an error string.
Example 1: Calculate PR with nominal arrays
Inputs:
| poa_global | temp_air | wind_speed | pac | pdc_ref |
|---|---|---|---|---|
| 1000 | 25 | 2 | 9.5 | 10 |
| 800 | 20 | 1 | 7.6 | |
| 900 | 22 | 3 | 8.5 |
Excel formula:
=PERF_RATIO_NREL({1000;800;900}, {25;20;22}, {2;1;3}, {9.5;7.6;8.5}, 10)
Expected output:
0.948148
Example 2: Calculate PR with custom SAPM and temperature coefficient parameters
Inputs:
| poa_global | temp_air | wind_speed | pac | pdc_ref | a | b | delta_t | gamma_pdc |
|---|---|---|---|---|---|---|---|---|
| 950 | 30 | 1.5 | 8.9 | 10 | -3.7 | -0.08 | 2.5 | -0.004 |
| 850 | 28 | 2.5 | 8 | |||||
| 780 | 24 | 3 | 7.1 |
Excel formula:
=PERF_RATIO_NREL({950;850;780}, {30;28;24}, {1.5;2.5;3}, {8.9;8;7.1}, 10, -3.7, -0.08, 2.5, -0.004)
Expected output:
0.930233
Example 3: Handle scalar weather and power inputs
Inputs:
| poa_global | temp_air | wind_speed | pac | pdc_ref |
|---|---|---|---|---|
| 1000 | 25 | 2 | 9.2 | 10 |
Excel formula:
=PERF_RATIO_NREL(1000, 25, 2, 9.2, 10)
Expected output:
0.92
Example 4: Process row-vector arrays for PR calculation
Inputs:
| poa_global | temp_air | wind_speed | pac | pdc_ref | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 700 | 900 | 1000 | 18 | 22 | 27 | 2 | 2.5 | 1.5 | 6.1 | 8.2 | 9 | 10 |
Excel formula:
=PERF_RATIO_NREL({700,900,1000}, {18,22,27}, {2,2.5,1.5}, {6.1,8.2,9}, 10)
Expected output:
0.896154
Python Code
import pandas as pd
from pvanalytics.metrics import performance_ratio_nrel as result_func
def perf_ratio_nrel(poa_global, temp_air, wind_speed, pac, pdc_ref, a=-3.56, b=-0.075, delta_t=3, gamma_pdc=-0.00433):
    """
    Calculate the NREL weather-corrected performance ratio from irradiance, temperature, and power data.
    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.metrics.performance_ratio_nrel.html
    This example function is provided as-is without any representation of accuracy.
    Args:
        poa_global (list[list]): Total incident irradiance (W/m^2).
        temp_air (list[list]): Ambient dry bulb temperature (C).
        wind_speed (list[list]): Wind speed at 10 meters height (m/s).
        pac (list[list]): AC power output (kW).
        pdc_ref (float): PV array STC nameplate rating (kW).
        a (float, optional): SAPM temperature model parameter a (unitless). Default is -3.56.
        b (float, optional): SAPM temperature model parameter b (s/m). Default is -0.075.
        delta_t (float, optional): SAPM temperature model parameter deltaT (C). Default is 3.
        gamma_pdc (float, optional): Power temperature coefficient (1/C). Default is -0.00433.
    Returns:
        float: Weather-corrected Performance Ratio, or an error string.
    """
    try:
        def flatten(data):
            if not isinstance(data, list):
                return [float(data)]
            flat = []
            for row in data:
                if isinstance(row, list):
                    for val in row:
                        if val != "":
                            flat.append(float(val))
                elif row != "":
                    flat.append(float(row))
            return flat
        # Convert inputs to pandas Series for pvanalytics
        poa_s = pd.Series(flatten(poa_global))
        temp_s = pd.Series(flatten(temp_air))
        wind_s = pd.Series(flatten(wind_speed))
        pac_s = pd.Series(flatten(pac))
        n = len(poa_s)
        if not (len(temp_s) == n and len(wind_s) == n and len(pac_s) == n):
            return "Error: Input ranges must have the same number of valid elements"
        if n == 0:
            return "Error: Input arrays cannot be empty"
        pdc_ref_val = float(pdc_ref)
        a_val = float(a) if a is not None else -3.56
        b_val = float(b) if b is not None else -0.075
        dT_val = float(delta_t) if delta_t is not None else 3.0
        gamma_val = float(gamma_pdc) if gamma_pdc is not None else -0.00433
        pr = result_func(
            poa_s, temp_s, wind_s, pac_s, pdc_ref_val,
            a=a_val, b=b_val, deltaT=dT_val, gamma_pdc=gamma_val
        )
        return float(pr)
    except Exception as e:
        return f"Error: {str(e)}"
SPACING
This function verifies the time intervals between consecutive timestamps in a series.
It returns True when the difference between a timestamp and the preceding one exactly matches the specified pandas frequency string (e.g. '15min' or '1H'); the first timestamp, which has no predecessor, is always marked True. This check identifies irregular logging intervals and data gaps.
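The underlying check is a pairwise difference between consecutive timestamps. A plain-Python sketch of the rule (not the library call) using Example 2's irregular series:

```python
from datetime import datetime, timedelta

times = [datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 1), datetime(2024, 1, 1, 3)]
expected = timedelta(hours=1)

# The first timestamp has no predecessor, so it is treated as conforming.
flags = [True] + [b - a == expected for a, b in zip(times, times[1:])]
# flags -> [True, True, False]: the 2-hour jump to 03:00 fails the check.
```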
Excel Usage
=SPACING(times, freq)
- times (list[list], required): Timestamps in ISO8601 format.
- freq (str, required): Expected interval between samples as a pandas frequency string (e.g. '15min').
Returns (list[list]): 2D list of booleans (True if spacing matches), or an error string.
Example 1: Check regular 1-hour spacing
Inputs:
| times | freq |
|---|---|
| 2024-01-01T00:00:00Z | 1H |
| 2024-01-01T01:00:00Z | |
| 2024-01-01T02:00:00Z |
Excel formula:
=SPACING({"2024-01-01T00:00:00Z";"2024-01-01T01:00:00Z";"2024-01-01T02:00:00Z"}, "1H")
Expected output:
| Result |
|---|
| true |
| true |
| true |
Example 2: Identify mismatch in irregular timestamp spacing
Inputs:
| times | freq |
|---|---|
| 2024-01-01T00:00:00Z | 1H |
| 2024-01-01T01:00:00Z | |
| 2024-01-01T03:00:00Z |
Excel formula:
=SPACING({"2024-01-01T00:00:00Z";"2024-01-01T01:00:00Z";"2024-01-01T03:00:00Z"}, "1H")
Expected output:
| Result |
|---|
| true |
| true |
| false |
Example 3: Handle a scalar timestamp input
Inputs:
| times | freq |
|---|---|
| 2024-01-01T00:00:00Z | 1H |
Excel formula:
=SPACING("2024-01-01T00:00:00Z", "1H")
Expected output:
true
Example 4: Validate regular 15-minute intervals
Inputs:
| times | freq |
|---|---|
| 2024-01-01T00:00:00Z | 15min |
| 2024-01-01T00:15:00Z | |
| 2024-01-01T00:30:00Z | |
| 2024-01-01T00:45:00Z |
Excel formula:
=SPACING({"2024-01-01T00:00:00Z";"2024-01-01T00:15:00Z";"2024-01-01T00:30:00Z";"2024-01-01T00:45:00Z"}, "15min")
Expected output:
| Result |
|---|
| true |
| true |
| true |
| true |
Python Code
import pandas as pd
from pvanalytics.quality.time import spacing as result_func
def spacing(times, freq):
    """
    Check that the spacing between timestamps conforms to an expected frequency.
    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.quality.time.spacing.html
    This example function is provided as-is without any representation of accuracy.
    Args:
        times (list[list]): Timestamps in ISO8601 format.
        freq (str): Expected interval between samples as a pandas frequency string (e.g. '15min').
    Returns:
        list[list]: 2D list of booleans (True if spacing matches), or an error string.
    """
    try:
        def flatten_str(data):
            if not isinstance(data, list):
                return [str(data)]
            return [str(val) for row in data for val in (row if isinstance(row, list) else [row]) if val != ""]
        time_list = flatten_str(times)
        if len(time_list) == 0:
            return "Error: input array cannot be empty"
        dt_idx = pd.DatetimeIndex(time_list)
        f = str(freq).strip()
        if not f:
            return "Error: freq must be provided"
        res = result_func(dt_idx, freq=f)
        return [[bool(v)] for v in res]
    except Exception as e:
        return f"Error: {str(e)}"
STALE_VALUES_ROUND
This function detects “stale” or repeating sensor values that occur when an instrument becomes stuck.
A point is marked as stale if it belongs to a contiguous sequence (of length at least window) of readings that are identical when rounded to the specified number of decimal places.
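For the mark='all' case, the rule reduces to a run-length scan over rounded values. The sketch below is a simplified illustration, not the library's implementation (which also supports the 'tail' and 'end' marking modes):

```python
def stale_all(values, window=6, decimals=3):
    """Mark every member of each run of >= window equal (rounded) values."""
    rounded = [round(v, decimals) for v in values]
    flags = [False] * len(values)
    i = 0
    while i < len(rounded):
        j = i
        while j + 1 < len(rounded) and rounded[j + 1] == rounded[i]:
            j += 1                                  # extend the run of equal values
        if j - i + 1 >= window:                     # run long enough to be stale
            flags[i:j + 1] = [True] * (j - i + 1)
        i = j + 1
    return flags

print(stale_all([1.001, 1.001, 1.001, 1.001, 2], window=3, decimals=2))
# [True, True, True, True, False]
```

Rounding first means values that differ only below the chosen precision (here 1.001 vs 1.00) still count as repeats, which is how a logger emitting near-identical noise around a stuck reading gets caught.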
Excel Usage
=STALE_VALUES_ROUND(x, window, decimals, mark)
- x (list[list], required): Numeric data values.
- window (int, optional, default: 6): Number of consecutive identical elements to be considered stale.
- decimals (int, optional, default: 3): Number of decimal places to round to before comparing.
- mark (str, optional, default: 'tail'): How much of the sequence to mark True. One of 'tail', 'end', or 'all'.
Returns (list[list]): 2D list of booleans (True if stale), or an error string.
Example 1: Detect repeating values over a window
Inputs:
| x | window | decimals | mark |
|---|---|---|---|
| 1.001 | 3 | 2 | all |
| 1.001 | |||
| 1.001 | |||
| 1.001 | |||
| 2 |
Excel formula:
=STALE_VALUES_ROUND({1.001;1.001;1.001;1.001;2}, 3, 2, "all")
Expected output:
| Result |
|---|
| true |
| true |
| true |
| true |
| false |
Example 2: Detect stale values with tail marking mode
Inputs:
| x | window | decimals | mark |
|---|---|---|---|
| 5 | 3 | 0 | tail |
| 5 | |||
| 5 | |||
| 6 |
Excel formula:
=STALE_VALUES_ROUND({5;5;5;6}, 3, 0, "tail")
Expected output:
| Result |
|---|
| false |
| true |
| true |
| false |
Example 3: Detect stale values with end marking mode
Inputs:
| x | window | decimals | mark |
|---|---|---|---|
| 2.2 | 3 | 1 | end |
| 2.2 | |||
| 2.2 | |||
| 2.2 | |||
| 2.2 |
Excel formula:
=STALE_VALUES_ROUND({2.2;2.2;2.2;2.2;2.2}, 3, 1, "end")
Expected output:
| Result |
|---|
| false |
| false |
| true |
| true |
| true |
Example 4: Handle scalar stale-value input
Inputs:
| x | window | decimals | mark |
|---|---|---|---|
| 7.5 | 3 | 2 | all |
Excel formula:
=STALE_VALUES_ROUND(7.5, 3, 2, "all")
Expected output:
false
Python Code
import pandas as pd
from pvanalytics.quality.gaps import stale_values_round as result_func
def stale_values_round(x, window=6, decimals=3, mark='tail'):
    """
    Identify stale or stuck-sensor values by rounding and checking for repeating sequences.
    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.quality.gaps.stale_values_round.html
    This example function is provided as-is without any representation of accuracy.
    Args:
        x (list[list]): Numeric data values.
        window (int, optional): Number of consecutive identical elements to be considered stale. Default is 6.
        decimals (int, optional): Number of decimal places to round to before comparing. Default is 3.
        mark (str, optional): How much of the sequence to mark True. One of 'tail', 'end', or 'all'. Default is 'tail'.
    Returns:
        list[list]: 2D list of booleans (True if stale), or an error string.
    """
    try:
        def flatten_num(data):
            if not isinstance(data, list):
                return [float(data)]
            flat = []
            for row in data:
                row = row if isinstance(row, list) else [row]
                for val in row:
                    flat.append(float('nan') if val == "" else float(val))
            return flat
        val_list = flatten_num(x)
        if len(val_list) == 0:
            return "Error: input array cannot be empty"
        series = pd.Series(val_list)
        w = int(window) if window is not None else 6
        dec = int(decimals) if decimals is not None else 3
        m = str(mark) if mark is not None else "tail"
        if m not in ("tail", "end", "all"):
            return "Error: mark must be 'tail', 'end', or 'all'"
        res = result_func(series, window=w, decimals=dec, mark=m)
        return [[bool(v)] for v in res]
    except Exception as e:
        return f"Error: {str(e)}"
TRIM_INCOMPLETE
This function trims a data series by marking early and late periods of low data completeness for exclusion.
It computes a daily completeness score and returns a boolean array (True meaning the data point is kept): False before the first run of at least `days` sufficiently complete days, True from the start of that run through the end of the last such run, and False thereafter.
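Given a per-day completeness verdict, the keep/trim mask reduces to finding the first and last runs of `days` consecutive complete days. The sketch below assumes the daily verdicts are already computed (the library derives them from the completeness score); the helper name `trim_mask` is ours:

```python
def trim_mask(daily_complete, days=1):
    """True between the first and last runs of `days` consecutive complete days."""
    def first_run_start(flags):
        run = 0
        for i, ok in enumerate(flags):
            run = run + 1 if ok else 0
            if run >= days:
                return i - days + 1     # index where the qualifying run begins
        return None                     # no qualifying run exists

    n = len(daily_complete)
    start = first_run_start(daily_complete)
    end_from_back = first_run_start(daily_complete[::-1])
    mask = [False] * n
    if start is not None and end_from_back is not None:
        end = n - 1 - end_from_back     # last day of the last qualifying run
        mask[start:end + 1] = [True] * (end - start + 1)
    return mask

print(trim_mask([True, True, True, False], days=1))
# [True, True, True, False]
```

Scanning the reversed list with the same helper finds the last qualifying run, so leading and trailing incomplete stretches are trimmed symmetrically while gaps in the middle are kept.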
Excel Usage
=TRIM_INCOMPLETE(times, values, minimum_completeness, days, freq)
- times (list[list], required): Timestamps in ISO8601 format.
- values (list[list], required): Numeric data values corresponding to the times.
- minimum_completeness (float, optional, default: 0.333333): The minimum required daily completeness score.
- days (int, optional, default: 10): The consecutive complete days needed before transitioning.
- freq (str, optional, default: null): Expected interval between samples as a pandas frequency string (e.g. '15min'). If blank, it is inferred.
Returns (list[list]): 2D list of booleans (True to keep, False to trim), or an error string.
Example 1: Identifies trailing gaps based on completeness
Inputs:
| times | values | minimum_completeness | days | freq |
|---|---|---|---|---|
| 2024-01-01T00:00:00Z | 10 | 0.5 | 1 | 1D |
| 2024-01-02T00:00:00Z | 15 | |||
| 2024-01-03T00:00:00Z | 20 | |||
| 2024-01-04T00:00:00Z |
Excel formula:
=TRIM_INCOMPLETE({"2024-01-01T00:00:00Z";"2024-01-02T00:00:00Z";"2024-01-03T00:00:00Z";"2024-01-04T00:00:00Z"}, {10;15;20;""}, 0.5, 1, "1D")
Expected output:
| Result |
|---|
| true |
| true |
| true |
| false |
Example 2: Keep all values when completeness threshold is always met
Inputs:
| times | values | minimum_completeness | days | freq |
|---|---|---|---|---|
| 2024-01-05T00:00:00Z | 10 | 1 | 1 | 1D |
| 2024-01-06T00:00:00Z | 11 | |||
| 2024-01-07T00:00:00Z | 12 |
Excel formula:
=TRIM_INCOMPLETE({"2024-01-05T00:00:00Z";"2024-01-06T00:00:00Z";"2024-01-07T00:00:00Z"}, {10;11;12}, 1, 1, "1D")
Expected output:
| Result |
|---|
| true |
| true |
| true |
Example 3: Require multiple complete days before keep window starts
Inputs:
| times | values | minimum_completeness | days | freq |
|---|---|---|---|---|
| 2024-01-08T00:00:00Z | 0.5 | 2 | 1D | |
| 2024-01-09T00:00:00Z | 10 | |||
| 2024-01-10T00:00:00Z | 10 | |||
| 2024-01-11T00:00:00Z | 10 |
Excel formula:
=TRIM_INCOMPLETE({"2024-01-08T00:00:00Z";"2024-01-09T00:00:00Z";"2024-01-10T00:00:00Z";"2024-01-11T00:00:00Z"}, {"";10;10;10}, 0.5, 2, "1D")
Expected output:
| Result |
|---|
| false |
| true |
| true |
| true |
Example 4: Handle scalar time and value inputs
Inputs:
| times | values | minimum_completeness | days | freq |
|---|---|---|---|---|
| 2024-01-12T00:00:00Z | 9 | 0.5 | 1 | 1D |
Excel formula:
=TRIM_INCOMPLETE("2024-01-12T00:00:00Z", 9, 0.5, 1, "1D")
Expected output:
true
Python Code
import pandas as pd
from pvanalytics.quality.gaps import trim_incomplete as result_func
from pvanalytics.quality.gaps import start_stop_dates
def trim_incomplete(times, values, minimum_completeness=0.333333, days=10, freq=None):
    """
    Return a boolean mask to trim leading and trailing low-completeness periods from a PV series.
    See: https://pvanalytics.readthedocs.io/en/stable/generated/pvanalytics.quality.gaps.trim_incomplete.html
    This example function is provided as-is without any representation of accuracy.
    Args:
        times (list[list]): Timestamps in ISO8601 format.
        values (list[list]): Numeric data values corresponding to the times.
        minimum_completeness (float, optional): The minimum required daily completeness score. Default is 0.333333.
        days (int, optional): The consecutive complete days needed before transitioning. Default is 10.
        freq (str, optional): Expected interval between samples as a pandas frequency string (e.g. '15min'). If blank, it is inferred. Default is None.
    Returns:
        list[list]: 2D list of booleans (True to keep, False to trim), or an error string.
    """
    try:
        def flatten_str(data):
            if not isinstance(data, list):
                return [str(data)]
            return [str(val) for row in data for val in (row if isinstance(row, list) else [row]) if val != ""]
        def flatten_num(data):
            if not isinstance(data, list):
                return [float(data)]
            flat = []
            for row in data:
                row = row if isinstance(row, list) else [row]
                for val in row:
                    flat.append(float('nan') if val == "" else float(val))
            return flat
        time_list = flatten_str(times)
        val_list = flatten_num(values)
        if len(time_list) != len(val_list):
            return "Error: times and values must have the same length"
        if len(time_list) == 0:
            return "Error: input arrays cannot be empty"
        dt_idx = pd.DatetimeIndex(time_list)
        series = pd.Series(val_list, index=dt_idx)
        min_comp = float(minimum_completeness) if minimum_completeness is not None else 0.333333
        d = int(days) if days is not None else 10
        f = str(freq) if freq is not None and str(freq).strip() != "" else None
        # pvanalytics requires at least 3 timestamps to infer frequency even if a
        # frequency is provided. For short series, compute the completeness/trim
        # logic ourselves to avoid this limitation.
        if len(series.index) < 3:
            if f is None:
                return "Error: Need at least 3 dates to infer frequency"
            try:
                offset = pd.tseries.frequencies.to_offset(f)
                seconds_per_sample = pd.Timedelta(offset).total_seconds()
            except Exception:
                return "Error: Invalid freq"
            daily_counts = series.resample('D').count()
            daily_completeness = (daily_counts * seconds_per_sample) / (1440 * 60)
            daily_good = daily_completeness >= min_comp
            start, end = start_stop_dates(daily_good, days=d)
            mask = pd.Series(False, index=series.index)
            if start is not None:
                mask.loc[start:end] = True
            return [[bool(v)] for v in mask]
        # For longer series, defer to the library implementation.
        res = result_func(series, minimum_completeness=min_comp, days=d, freq=f)
        return [[bool(v)] for v in res]
    except Exception as e:
        return f"Error: {str(e)}"