Experiment Log: FAMILY_SATELLITE_PRICE_PREDICTION

Family Overview

Status: PENDING VALIDATION
Created: 2025-08-23
Hypothesis: Satellite vegetation indices provide a 12-16 week price forecast advantage through early detection of planting and crop stress

Prior Evidence Base

Validated Results (from VALIDATION_REPORT.md)

69.4% improvement at 4-week horizon (satellite_enhanced_model.py)
40.5% improvement at 16-week horizon (integrated_satellite_baseline_model.py)
Uses REAL Sentinel-2 data from Zarr stores
All 4 standard baselines (persistent, seasonal_naive, ar2, historical_mean) implemented correctly
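For reference, the four baselines named above can be sketched in a few lines. This is a minimal illustration, not the code in the repository: the function names, the 52-week season, and the demeaned least-squares AR(2) fit are all assumptions.

```python
from statistics import mean

def persistent(history, horizon):
    """Carry the last observed price forward."""
    return history[-1]

def historical_mean(history, horizon):
    """Forecast the mean of all observed prices."""
    return mean(history)

def seasonal_naive(history, horizon, season=52):
    """Use the value observed at the same seasonal position (season length assumed)."""
    idx = len(history) - 1 + horizon - season
    while idx > len(history) - 1:      # step back whole seasons until observed
        idx -= season
    return history[idx] if idx >= 0 else history[-1]  # fall back if history is too short

def ar2(history, horizon):
    """AR(2) on the demeaned series, fitted by least squares, iterated forward."""
    mu = mean(history)
    z = [v - mu for v in history]
    rng = range(2, len(z))
    s11 = sum(z[t - 1] ** 2 for t in rng)
    s22 = sum(z[t - 2] ** 2 for t in rng)
    s12 = sum(z[t - 1] * z[t - 2] for t in rng)
    r1 = sum(z[t] * z[t - 1] for t in rng)
    r2 = sum(z[t] * z[t - 2] for t in rng)
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:               # degenerate fit: fall back to persistence
        return history[-1]
    a1 = (r1 * s22 - r2 * s12) / det   # Cramer's rule on the 2x2 normal equations
    a2 = (s11 * r2 - s12 * r1) / det
    prev2, prev1 = z[-2], z[-1]
    for _ in range(horizon):
        prev2, prev1 = prev1, a1 * prev1 + a2 * prev2
    return mu + prev1
```

Each function takes the price history up to the forecast origin and returns a single point forecast for the requested horizon, so all four can be evaluated in the same rolling loop.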

Critical Issues Identified

Small sample size: Only 157-268 weekly samples after merging
Missing 2020 data: Gap in satellite time series
Negative R²: All models show poor absolute fit (R² between -3.5 and -25.9)
No statistical tests: Diebold-Mariano (DM), Harvey-Leybourne-Newbold (HLN), and TOST validation are missing
BRP mask failures: Parcel boundary application errors
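A negative R² means the forecasts have a larger squared error than predicting the mean of the evaluation window. It can therefore coexist with a relative improvement over baselines that fit even worse. The sketch below uses the standard definition; the function name and the toy numbers are illustrative only.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Predicting the window mean gives exactly 0; a constant bias of +10 on a
# low-variance window pushes R² far below zero.
at_mean = r_squared([10, 12, 14], [12, 12, 12])
biased = r_squared([10, 12, 14], [20, 22, 24])
```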

Experimental Plan

Phase 1: Data Validation

[ ] Verify Zarr store accessibility and content
[ ] Test BRP mask generation for all years
[ ] Quantify data overlap between satellite and prices
[ ] Add 2020 data if possible
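One way to quantify the satellite/price overlap is to map both date lists to ISO weeks and count the shared weeks per year. This is a sketch under assumptions: the function names are hypothetical, and how dates are read from the Zarr store or the price API is out of scope here.

```python
from collections import Counter
from datetime import date

def iso_week(d):
    """Return (ISO year, ISO week) for a date."""
    iso = d.isocalendar()
    return (iso[0], iso[1])

def weekly_overlap(scene_dates, price_dates):
    """Count the ISO weeks covered by both satellite scenes and price records."""
    scene_weeks = {iso_week(d) for d in scene_dates}
    price_weeks = {iso_week(d) for d in price_dates}
    shared = scene_weeks & price_weeks
    per_year = Counter(year for year, _ in shared)
    return {
        "scene_weeks": len(scene_weeks),
        "price_weeks": len(price_weeks),
        "shared_weeks": len(shared),
        "per_year": dict(per_year),
    }

# Toy input: the 2020 price week has no matching scene.
report = weekly_overlap(
    [date(2019, 7, 1), date(2019, 7, 3), date(2021, 7, 5)],
    [date(2019, 7, 5), date(2020, 7, 6), date(2021, 7, 9)],
)
```

A year missing from `per_year` flags a gap such as the 2020 one noted above.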

Phase 2: Variant Implementation

[ ] Variant a: NDVI-only baseline (15% SESOI)
[ ] Variant b: Multi-index ensemble (25% SESOI)
[ ] Variant c: Integrated features (35% SESOI)
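For the NDVI-only variant, the index is NDVI = (NIR - Red) / (NIR + Red), with B08 as NIR and B04 as red for Sentinel-2. The sketch below averages NDVI over parcel pixels and skips pixels whose scene classification (SCL) is not vegetation (4) or not-vegetated (5). The flat-list inputs, the function name, and the choice of valid SCL classes are assumptions, not the pipeline's actual interface.

```python
VALID_SCL = {4, 5}  # assumed: keep vegetation and not-vegetated, drop cloud/shadow/water

def parcel_mean_ndvi(red, nir, scl, parcel_mask):
    """Mean NDVI over pixels inside the parcel mask with a valid SCL class."""
    values = []
    for r, n, c, inside in zip(red, nir, scl, parcel_mask):
        if not inside or c not in VALID_SCL or (r + n) == 0:
            continue  # outside parcel, masked class, or undefined ratio
        values.append((n - r) / (n + r))
    return sum(values) / len(values) if values else None

# Toy scene: pixel 2 is cloud (SCL 9), pixel 3 is outside the parcel,
# pixel 4 has zero reflectance in both bands.
ndvi = parcel_mean_ndvi(
    red=[0.1, 0.2, 0.1, 0.0],
    nir=[0.5, 0.6, 0.5, 0.0],
    scl=[4, 9, 4, 4],
    parcel_mask=[True, True, False, True],
)
```

Returning `None` for a fully masked scene keeps cloudy dates from entering the weekly feature series as zeros.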

Phase 3: Statistical Validation

[ ] Diebold-Mariano tests vs all baselines
[ ] Harvey-Leybourne-Newbold correction
[ ] TOST equivalence testing
[ ] FDR correction for multiple comparisons
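The Diebold-Mariano statistic with the Harvey-Leybourne-Newbold small-sample correction can be sketched as follows. Assumptions: absolute-error loss, a rectangular kernel with h-1 autocovariance lags, and a hypothetical function name. The p-value is left out because it needs a Student-t distribution with n-1 degrees of freedom, which the standard library does not provide.

```python
import math

def dm_hln(errors_model, errors_baseline, horizon, loss=abs):
    """Return (HLN-corrected DM statistic, degrees of freedom).

    Positive values mean the model has lower average loss than the baseline.
    """
    d = [loss(b) - loss(m) for m, b in zip(errors_model, errors_baseline)]
    n = len(d)
    d_bar = sum(d) / n

    def autocov(k):
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    # Long-run variance of the loss differential for an h-step forecast
    lrv = autocov(0) + 2 * sum(autocov(k) for k in range(1, horizon))
    if lrv <= 0:
        raise ValueError("non-positive long-run variance; sample too short for this horizon")
    dm = d_bar / math.sqrt(lrv / n)
    # HLN correction factor, compared against a t distribution with n-1 df
    correction = math.sqrt((n + 1 - 2 * horizon + horizon * (horizon - 1) / n) / n)
    return dm * correction, n - 1

stat, df = dm_hln([1, 1, 1, 1], [2, 3, 2, 3], horizon=1)
```

With only 157-268 weekly samples and horizons up to 16 weeks, the rectangular-kernel variance can turn non-positive, which is why the sketch raises instead of returning a statistic.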

Phase 4: Regime Analysis

[ ] Early planting detection (2022, 2025)
[ ] High volatility period performance
[ ] Growing vs storage season differences

Data Source Verification

Sentinel-2 Zarr Store

# Path: lake_31UFU_medium.zarr
# Scenes: 850+
# Date range: 2015-07-06 to 2023-08-05
# Missing: 2020
# Bands: B02-B12, SCL

BRP Parcel Data

# Interface: BRPApi().get_consumption_potato_mask()
# Years: 2015-2019, 2021-2023
# Crop code: 2014 (consumption potatoes)

Price Data

# Interface: BoerderijApi().get_data(product_id="NL.157.2086")
# Frequency: Weekly
# Range: 2000-2023
# Records: 633

Notes on Methodology

Critical Requirements (per SOP)

USE ONLY REAL DATA - No synthetic/mock data allowed ✅
All 4 standard baselines required - persistent, seasonal_naive, ar2, historical_mean ✅
Compare against the strongest baseline - report results against the baseline with the lowest MAE
Statistical significance required - DM test with HLN correction
MLflow logging mandatory - Track all experiments
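The strongest-baseline requirement can be made concrete as below: compute the MAE for every baseline, pick the lowest, and report the model's improvement relative to that one only. Function names and the toy numbers are illustrative assumptions.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def improvement_over_strongest(actual, model_pred, baseline_preds):
    """Return (name of lowest-MAE baseline, % MAE improvement of the model over it)."""
    baseline_mae = {name: mae(actual, pred) for name, pred in baseline_preds.items()}
    strongest = min(baseline_mae, key=baseline_mae.get)
    model_mae = mae(actual, model_pred)
    pct = 100 * (baseline_mae[strongest] - model_mae) / baseline_mae[strongest]
    return strongest, pct

name, pct = improvement_over_strongest(
    actual=[10, 10, 10, 10],
    model_pred=[11, 9, 11, 9],
    baseline_preds={
        "persistent": [12, 12, 12, 12],
        "historical_mean": [14, 14, 14, 14],
    },
)
```

The resulting percentage is the figure to compare against the variant's SESOI; an improvement measured against a weaker baseline would overstate the gain.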

Expected Performance Benchmarks

Validated 7.6% baseline (from hypothesis_registry.md): LightGBM at 12-week horizon
Target improvement: 15-35% depending on variant
Critical horizon: 12 weeks (optimal for quarterly planning)

Experiment Results

To be populated after implementation

Decision Log

To be populated after experiments complete

Codex Validation — 2025-11-10

Files Reviewed

run.py
experiment.md
hypothesis.yml

Findings

Planning only. The family outlines data validation steps but has no recorded runs; experiment.md states “To be populated after implementation.”
Real-data usage not demonstrated. Without execution, we cannot confirm that Sentinel/BRP/Boerderij feeds were processed.
No baseline comparison. There are no metrics or statistical tests showing that the satellite features beat the price-only baselines.

Verdict

NOT VALIDATED – Until the code is executed with real data and produces statistically significant gains over the mandatory baselines, this family remains unvalidated.
