Hypotheses
Experiment Log: FAMILY_SATELLITE_PRICE_PREDICTION
**Status**: PENDING VALIDATION
**Created**: 2025-08-23
**Hypothesis**: Satellite vegetation indices provide a 12-16 week price forecast advantage through early detection of planting and crop stress
Experiment Notes
Family Overview
Status: PENDING VALIDATION
Created: 2025-08-23
Hypothesis: Satellite vegetation indices provide a 12-16 week price forecast advantage through early detection of planting and crop stress
Prior Evidence Base
Validated Results (from VALIDATION_REPORT.md)
- 69.4% improvement at 4-week horizon (satellite_enhanced_model.py)
- 40.5% improvement at 16-week horizon (integrated_satellite_baseline_model.py)
- Uses REAL Sentinel-2 data from Zarr stores
- All 4 standard baselines (persistent, seasonal_naive, ar2, historical_mean) implemented correctly
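
The four baselines are named here but not defined in this log. A minimal sketch of one common reading follows. It assumes `persistent` carries the last observed price forward, `seasonal_naive` uses a 52-week season, `ar2` is an AR(2) model fitted by ordinary least squares, and `historical_mean` is the mean of the training window. All signatures are hypothetical and are not taken from satellite_enhanced_model.py.

```python
def persistent(history, horizon):
    """Carry the last observed price forward."""
    return history[-1]

def seasonal_naive(history, horizon, season=52):
    """Price observed exactly one season before the target week."""
    idx = len(history) - 1 + horizon - season
    if horizon > season or idx < 0:
        raise ValueError("need at least one full season before the target week")
    return history[idx]

def historical_mean(history, horizon):
    """Mean of all prices in the training window."""
    return sum(history) / len(history)

def _solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def ar2(history, horizon):
    """Fit y_t = c + a1*y_{t-1} + a2*y_{t-2} by OLS, then iterate `horizon` steps."""
    if len(history) < 5:
        raise ValueError("need at least 5 observations to fit 3 parameters")
    rows = [(1.0, history[t - 1], history[t - 2]) for t in range(2, len(history))]
    targets = history[2:]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    c, a1, a2 = _solve(xtx, xty)
    prev, last = history[-2], history[-1]
    for _ in range(horizon):
        prev, last = last, c + a1 * last + a2 * prev
    return last
```

Giving every baseline the same `(history, horizon)` signature keeps the later comparison loop uniform, even where a baseline ignores the horizon.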
Critical Issues Identified
- Small sample size: Only 157-268 weekly samples after merging
- Missing 2020 data: Gap in satellite time series
- Negative R²: All models show poor absolute fit (-3.5 to -25.9)
- No statistical tests: Missing Diebold-Mariano (DM), Harvey-Leybourne-Newbold (HLN), and TOST equivalence validation
- BRP mask failures: Parcel boundary application errors
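
For reading the negative R² values: R² is 1 minus the ratio of the model's squared error to the squared error of predicting the evaluation-sample mean, so it drops below zero whenever the model does worse than that constant. It measures absolute fit, while the improvement percentages are relative to the baselines, so the two can disagree. The sketch below uses made-up numbers purely to show the arithmetic; `r_squared` is a hypothetical helper name.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Illustrative values only, not experiment data.
actual = [10.0, 12.0, 14.0]
print(r_squared(actual, [14.0, 12.0, 10.0]))  # trend predicted in the wrong direction
print(r_squared(actual, [12.0, 12.0, 12.0]))  # predicting the mean
```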
Experimental Plan
Phase 1: Data Validation
- [ ] Verify Zarr store accessibility and content
- [ ] Test BRP mask generation for all years
- [ ] Quantify data overlap between satellite and prices
- [ ] Add 2020 data if possible
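
One way to quantify the satellite/price overlap is to bucket both date series by ISO week and intersect them. This is a sketch only: it assumes scene acquisition dates and price observation dates are already available as `datetime.date` lists, and `overlap_report` is a hypothetical helper, not part of BRPApi or BoerderijApi.

```python
from datetime import date

def iso_week(d):
    """Return (ISO year, ISO week); the ISO year can differ from the calendar year near New Year."""
    iso = d.isocalendar()
    return (iso[0], iso[1])

def overlap_report(scene_dates, price_dates):
    """Count weeks that have both a satellite scene and a price record."""
    scene_weeks = {iso_week(d) for d in scene_dates}
    price_weeks = {iso_week(d) for d in price_dates}
    scene_years = {year for year, _ in scene_weeks}
    span = range(min(scene_years), max(scene_years) + 1)
    return {
        "scene_weeks": len(scene_weeks),
        "price_weeks": len(price_weeks),
        "shared_weeks": len(scene_weeks & price_weeks),
        "missing_scene_years": [y for y in span if y not in scene_years],
    }
```

Run against the real date lists, `shared_weeks` gives the usable sample size and `missing_scene_years` lists any full-year gaps in the satellite series.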
Phase 2: Variant Implementation
- [ ] Variant a: NDVI-only baseline (15% SESOI, the smallest effect size of interest)
- [ ] Variant b: Multi-index ensemble (25% SESOI)
- [ ] Variant c: Integrated features (35% SESOI)
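
For Variant a, NDVI is (NIR - Red) / (NIR + Red), which for Sentinel-2 means bands B08 and B04. The sketch below computes a parcel-level mean over flattened pixel lists. Keeping only SCL classes 4 (vegetation) and 5 (not vegetated) is an assumption about how cloud and shadow pixels should be screened, and `mean_parcel_ndvi` is a hypothetical name; the log does not say how the existing models build this feature.

```python
VALID_SCL = {4, 5}  # assumed clear-land classes; clouds, shadows, water and snow are dropped

def mean_parcel_ndvi(red, nir, scl, parcel_mask):
    """Mean NDVI over pixels that are inside the parcel mask and classified as clear land.

    red, nir    : B04 and B08 reflectances, one value per pixel
    scl         : Scene Classification Layer class per pixel
    parcel_mask : True where the pixel falls inside a potato parcel
    """
    values = []
    for r, n, s, inside in zip(red, nir, scl, parcel_mask):
        if inside and s in VALID_SCL and (n + r) > 0:
            values.append((n - r) / (n + r))
    # None signals that no usable pixel survived the masks for this scene.
    return sum(values) / len(values) if values else None
```

Returning `None` for fully masked scenes keeps empty weeks explicit, rather than letting them enter the feature table as zeros.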
Phase 3: Statistical Validation
- [ ] Diebold-Mariano tests vs all baselines
- [ ] Harvey-Leybourne-Newbold correction
- [ ] TOST equivalence testing
- [ ] FDR correction for multiple comparisons
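
A minimal sketch of the Diebold-Mariano statistic with the Harvey-Leybourne-Newbold small-sample correction, assuming absolute-error loss and a long-run variance built from autocovariances up to lag h-1. The function name and sign convention (positive favours the model) are choices made for this example. The corrected statistic is compared against a Student t distribution with n-1 degrees of freedom; the p-value is left out because the standard library has no t distribution.

```python
from math import sqrt

def dm_hln_statistic(errors_model, errors_baseline, horizon, power=1):
    """HLN-corrected Diebold-Mariano statistic for h-step-ahead forecast errors.

    power=1 gives absolute-error loss, power=2 squared-error loss.
    A positive value means the model has the lower average loss.
    """
    d = [abs(b) ** power - abs(m) ** power
         for m, b in zip(errors_model, errors_baseline)]
    n = len(d)
    d_bar = sum(d) / n

    def autocov(k):
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    # Loss differentials of h-step forecasts are assumed autocorrelated up to lag h-1.
    long_run_var = autocov(0) + 2 * sum(autocov(k) for k in range(1, horizon))
    if long_run_var <= 0:
        # This truncated estimator is not guaranteed to stay positive in small samples.
        raise ValueError("non-positive long-run variance estimate")

    dm = d_bar / sqrt(long_run_var / n)
    correction = sqrt((n + 1 - 2 * horizon + horizon * (horizon - 1) / n) / n)
    return dm * correction
```

With only 157-268 weekly samples and horizons up to 16 weeks, the correction factor and the variance check both matter. A non-positive variance estimate should be reported rather than silently clipped.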
Phase 4: Regime Analysis
- [ ] Early planting detection (2022, 2025)
- [ ] High volatility period performance
- [ ] Growing vs storage season differences
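
The regime comparisons reduce to computing an error metric per labelled subset. The sketch below groups absolute errors by a regime label. The April to September growing window is an illustrative assumption and is not defined anywhere in this log; `season_label` and `mae_by_regime` are hypothetical names.

```python
from collections import defaultdict
from datetime import date

def season_label(d, growing_months=range(4, 10)):
    """Label a forecast date; the April-September window is an assumption."""
    return "growing" if d.month in growing_months else "storage"

def mae_by_regime(dates, actual, predicted, label=season_label):
    """Mean absolute error per regime label."""
    abs_errors = defaultdict(list)
    for d, a, p in zip(dates, actual, predicted):
        abs_errors[label(d)].append(abs(a - p))
    return {regime: sum(errs) / len(errs) for regime, errs in abs_errors.items()}
```

Passing a different `label` function (for example one that flags high-volatility weeks) reuses the same grouping for the other Phase 4 items.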
Data Source Verification
Sentinel-2 Zarr Store
- Path: lake_31UFU_medium.zarr
- Scenes: 850+
- Date range: 2015-07-06 to 2023-08-05
- Missing: 2020
- Bands: B02-B12, SCL
BRP Parcel Data
- Interface: BRPApi().get_consumption_potato_mask()
- Years: 2015-2019, 2021-2023
- Crop code: 2014 (consumption potatoes)
Price Data
- Interface: BoerderijApi().get_data(product_id="NL.157.2086")
- Frequency: Weekly
- Range: 2000-2023
- Records: 633
Notes on Methodology
Critical Requirements (per SOP)
- USE ONLY REAL DATA - No synthetic/mock data allowed ✅
- All 4 standard baselines required - persistent, seasonal_naive, ar2, historical_mean ✅
- Compare against the strongest baseline - Report results against the baseline with the lowest MAE
- Statistical significance required - DM test with HLN correction
- MLflow logging mandatory - Track all experiments
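
The strongest-baseline requirement can be sketched as follows, assuming "improvement" means the percentage reduction in MAE relative to the lowest-MAE baseline. The helper names are hypothetical, and the numbers in the usage example are illustrative only.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def improvement_vs_strongest(actual, model_pred, baseline_preds):
    """Return (name of lowest-MAE baseline, % MAE reduction of the model against it)."""
    baseline_mae = {name: mae(actual, pred) for name, pred in baseline_preds.items()}
    strongest = min(baseline_mae, key=baseline_mae.get)
    reduction = 100 * (baseline_mae[strongest] - mae(actual, model_pred)) / baseline_mae[strongest]
    return strongest, reduction

# Illustrative values only, not experiment data.
actual = [10, 20, 30]
baselines = {"persistent": [12, 18, 32], "historical_mean": [20, 20, 20]}
print(improvement_vs_strongest(actual, [11, 19, 31], baselines))
```

Reporting against the minimum rather than the average baseline keeps a weak baseline from inflating the headline improvement.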
Expected Performance Benchmarks
- Validated 7.6% baseline (from hypothesis_registry.md): LightGBM at 12-week horizon
- Target improvement: 15-35% depending on variant
- Critical horizon: 12 weeks (optimal for quarterly planning)
Experiment Results
To be populated after implementation
Decision Log
To be populated after experiments complete
Codex validation
Codex Validation — 2025-11-10
Files Reviewed
- run.py
- experiment.md
- hypothesis.yml
Findings
- Planning only. The family outlines data validation steps but has no recorded runs; experiment.md states “To be populated after implementation.”
- Real-data usage not demonstrated. Without execution, we cannot confirm that Sentinel/BRP/Boerderij feeds were processed.
- No baseline comparison. There are no metrics or statistical tests showing that the satellite features beat the price-only baselines.
Verdict
NOT VALIDATED – Until the code is executed with real data and produces statistically significant gains over the mandatory baselines, this family remains unvalidated.