Note: this experiment has not yet been Codex-validated. Treat the findings as provisional.

Experiment Log: FAMILY_VEGETATION_PEAK_FORECASTING

Testing vegetation peak timing forecasting using real Sentinel-2 satellite data from Zarr stores to predict potato price movements based on growing season vegetation patterns.

Last updated: 2025-12-01
Repo path: hypotheses/FAMILY_VEGETATION_PEAK_FORECASTING
Codex file: Present

Experiment Notes

Overview

Family ID: FAMILY_VEGETATION_PEAK_FORECASTING
Created: 2025-08-20
Status: Active Development

Data Sources

  • Price Data: belgian_potato_prices_verified.csv (Belgian potato prices, monthly, 2020-2021)
  • Satellite Data: lake_31UFU_test10.zarr (Real Sentinel-2 data with B02, B03, B04, B08, SCL bands)
  • Coverage: Netherlands/Belgium agricultural region
  • Temporal: Multiple growing seasons available in Zarr data

Variants

Variant A: NDVI Peak Timing Analysis

Hypothesis: Peak NDVI date during May-August predicts harvest pressure and pricing patterns.

Features:
  • ndvi_peak_date_doy: Day of year when NDVI reaches maximum
  • ndvi_peak_intensity: Maximum NDVI value during growing season
  • days_to_peak: Days from season start (May 1) to peak
  • peak_sharpness: Concentration measure of NDVI peak
  • pre_peak_stress: Early season stress indicators

Model: RandomForestRegressor
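
The Variant A features can be sketched as follows. This is a minimal illustration, not the family's implementation: the function name `ndvi_peak_features`, the input layout (a list of `(date, ndvi)` regional means for one year), and the definition of `peak_sharpness` as peak minus season mean are all assumptions. `pre_peak_stress` is left out because the log does not define it.

```python
from datetime import date
from statistics import mean


def ndvi_peak_features(series, season_start=(5, 1), season_end=(8, 31)):
    """Derive peak-timing features from (date, ndvi) observations of one year.

    The May 1 season start comes from the log; the August 31 end and the
    sharpness definition are assumptions made for this sketch.
    """
    year = series[0][0].year
    start = date(year, *season_start)
    end = date(year, *season_end)

    # Keep only observations inside the growing season.
    season = [(d, v) for d, v in series if start <= d <= end]
    if not season:
        raise ValueError("no observations inside the growing season")

    peak_date, peak_value = max(season, key=lambda dv: dv[1])
    season_mean = mean(v for _, v in season)

    return {
        "ndvi_peak_date_doy": peak_date.timetuple().tm_yday,
        "ndvi_peak_intensity": peak_value,
        "days_to_peak": (peak_date - start).days,
        # Hypothetical sharpness: how far the peak rises above the season mean.
        "peak_sharpness": peak_value - season_mean,
    }
```

Observations outside May-August are ignored, so a high NDVI value in September does not count as the peak.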

Variant B: EVI Growing Season Intensity

Hypothesis: Maximum EVI and growing season intensity predict quality premiums.

Features:
  • evi_max: Maximum EVI during growing season (May-August)
  • evi_growing_season_integral: Total EVI accumulation over growing season
  • evi_peak_date_doy: Day of year for EVI peak
  • evi_variability: Within-season EVI variance
  • evi_late_season_decline: August-September EVI decline rate

Model: GradientBoostingRegressor
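
One possible reading of the Variant B features is sketched below. The function name `evi_season_features`, the trapezoidal integral over days, the use of population variance, and the decline rate as (first minus last August-September value) per day are assumptions. The log names the features but does not fix their formulas.

```python
from datetime import date
from statistics import pvariance


def evi_season_features(series):
    """Compute EVI features from a date-sorted list of (date, evi) pairs."""
    season = [(d, v) for d, v in series if 5 <= d.month <= 8]
    if len(season) < 2:
        raise ValueError("need at least two May-August observations")

    # Trapezoidal accumulation of EVI over the season, in EVI-days.
    integral = sum(
        (d1 - d0).days * (v0 + v1) / 2.0
        for (d0, v0), (d1, v1) in zip(season, season[1:])
    )
    peak_date, peak_value = max(season, key=lambda dv: dv[1])

    # Late-season decline: average EVI loss per day across August-September.
    late = [(d, v) for d, v in series if d.month in (8, 9)]
    decline = None
    if len(late) >= 2 and late[-1][0] > late[0][0]:
        decline = (late[0][1] - late[-1][1]) / (late[-1][0] - late[0][0]).days

    return {
        "evi_max": peak_value,
        "evi_growing_season_integral": integral,
        "evi_peak_date_doy": peak_date.timetuple().tm_yday,
        "evi_variability": pvariance([v for _, v in season]),
        "evi_late_season_decline": decline,
    }
```

The decline feature needs observations from September, so the input series has to extend past the May-August window used for the other features.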

Variant C: Peak-to-Harvest Lag Analysis

Hypothesis: Combined NDVI/EVI peak timing provides harvest timing intelligence and price impact prediction.

Features:
  • ndvi_peak_date_doy: NDVI peak timing
  • evi_peak_date_doy: EVI peak timing
  • peak_convergence: Temporal distance between NDVI and EVI peaks
  • estimated_harvest_date: Harvest date estimate based on peak timing + historical lags
  • harvest_timing_score: Early/optimal/late harvest classification
  • seasonal_position: Relative position of peaks within typical growing season

Model: Ensemble (RandomForest + GradientBoosting)
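
The derived Variant C features could look like the sketch below. Every numeric default here is a placeholder chosen for illustration, not a value from the log or an agronomic reference: `lag_days=90`, `typical_harvest_doy=270`, and `tolerance_days=10` are hypothetical, as are the function name `peak_lag_features` and the choice to measure lags from the midpoint of the two peaks.

```python
from datetime import date, timedelta


def peak_lag_features(ndvi_peak, evi_peak, lag_days=90,
                      typical_harvest_doy=270, tolerance_days=10):
    """Combine NDVI and EVI peak dates into harvest-timing features."""
    # Temporal distance between the two peaks, in days.
    peak_convergence = abs((ndvi_peak - evi_peak).days)

    # Midpoint of the two peaks, then a fixed (placeholder) peak-to-harvest lag.
    midpoint = min(ndvi_peak, evi_peak) + timedelta(days=peak_convergence // 2)
    estimated_harvest = midpoint + timedelta(days=lag_days)

    # Classify the estimate relative to a (placeholder) typical harvest day.
    offset = estimated_harvest.timetuple().tm_yday - typical_harvest_doy
    if offset < -tolerance_days:
        score = "early"
    elif offset > tolerance_days:
        score = "late"
    else:
        score = "optimal"

    # Position of the midpoint within May 1 to August 31, from 0.0 to 1.0.
    season_start = date(midpoint.year, 5, 1)
    season_end = date(midpoint.year, 8, 31)
    seasonal_position = (midpoint - season_start).days / (season_end - season_start).days

    return {
        "peak_convergence": peak_convergence,
        "estimated_harvest_date": estimated_harvest,
        "harvest_timing_score": score,
        "seasonal_position": seasonal_position,
    }
```

In a real run the lag and the typical harvest day would have to come from the "historical lags" the feature list mentions. Those are not recorded in this log.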

Implementation Approach

Vegetation Processing Pipeline

  1. Load Zarr Data: Access real Sentinel-2 bands (B02, B03, B04, B08, SCL)
  2. Cloud Masking: Use SCL layer to exclude cloudy pixels (SCL >= 8, which covers cloud medium and high probability and also drops thin cirrus and snow/ice)
  3. Calculate Indices:
     • NDVI = (B08 - B04) / (B08 + B04)
     • EVI = 2.5 * (B08 - B04) / (B08 + 6 * B04 - 7.5 * B02 + 1)
  4. Spatial Aggregation: Regional mean over agricultural areas
  5. Temporal Processing: Extract May-August growing season data
  6. Peak Detection: Identify peak dates and characteristics
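
The masking, index, and aggregation steps above can be illustrated per pixel with the standard library alone. This is a sketch under stated assumptions: the `pixels` layout (a list of dicts) and the function names are hypothetical, and reflectances are assumed to be already scaled to the 0-1 range, which the constant `+ 1` in the EVI denominator requires. A real pipeline would more likely vectorize this over the Zarr arrays.

```python
from statistics import mean

CLOUD_SCL_MIN = 8  # rule from the log: drop pixels with SCL >= 8


def ndvi(b08, b04):
    """NDVI = (B08 - B04) / (B08 + B04); None when the denominator is zero."""
    denom = b08 + b04
    return (b08 - b04) / denom if denom != 0 else None


def evi(b08, b04, b02):
    """EVI = 2.5 * (B08 - B04) / (B08 + 6*B04 - 7.5*B02 + 1)."""
    denom = b08 + 6.0 * b04 - 7.5 * b02 + 1.0
    return 2.5 * (b08 - b04) / denom if denom != 0 else None


def regional_indices(pixels):
    """Mean NDVI and EVI over the cloud-free pixels of one acquisition.

    `pixels` is a hypothetical list of dicts with keys B02, B04, B08
    (reflectance, assumed scaled to 0-1) and SCL (scene class code).
    """
    clear = [p for p in pixels if p["SCL"] < CLOUD_SCL_MIN]

    ndvi_vals = [ndvi(p["B08"], p["B04"]) for p in clear]
    ndvi_vals = [v for v in ndvi_vals if v is not None]
    evi_vals = [evi(p["B08"], p["B04"], p["B02"]) for p in clear]
    evi_vals = [v for v in evi_vals if v is not None]

    return {
        "n_clear": len(clear),
        "ndvi_mean": mean(ndvi_vals) if ndvi_vals else None,
        "evi_mean": mean(evi_vals) if evi_vals else None,
    }
```

Returning `n_clear` alongside the means makes it possible to discard acquisitions where too few pixels survive the cloud mask.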

Baseline Validation

All experiments MUST validate against the 4 mandatory corrected baselines:
  • persistent: Current price predicts future (corrected implementation)
  • seasonal_naive: Same period previous year (52-week lag)
  • ar2: Autoregressive order 2 with trend
  • historical_mean: Average of all historical values (alias for persistent)
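
Three of the four baselines are simple enough to sketch directly. The function signatures are assumptions, and these are not the "corrected" implementations the log refers to, which are not shown here. `historical_mean` follows the stated definition (average of all history) rather than the "alias for persistent" remark, and `ar2` is omitted because it needs a fitted regression.

```python
from statistics import mean


def persistent(history, horizon):
    """Forecast every horizon with the last observed price."""
    return history[-1]


def seasonal_naive(history, horizon, season_length=52):
    """Forecast with the value observed one season before the target week.

    The target week sits `horizon` steps after the last observation, so the
    matching week of the previous year is `season_length` steps before that.
    """
    idx = len(history) - 1 + horizon - season_length
    if not 0 <= idx < len(history):
        raise ValueError("history too short (or horizon too long) for this lag")
    return history[idx]


def historical_mean(history, horizon):
    """Forecast every horizon with the mean of all observed prices."""
    return mean(history)
```

Each function only sees `history`, so calling it with the training slice of a cross-validation fold keeps future prices out of the forecast.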

Evaluation Framework

  • Rolling-origin CV: Minimum 52 weeks training, 4-week steps
  • Horizons: 4, 8, 12 weeks (focus on 8-12 for optimal performance)
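  • Statistical Tests: Diebold-Mariano (DM), Harvey-Leybourne-Newbold (HLN) small-sample correction, TOST equivalence testing, FDR correction
  • SESOI (smallest effect size of interest): 5% minimum improvement threshold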
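
The rolling-origin scheme can be sketched as a small generator. It assumes an expanding training window and weekly observations indexed from 0. The log gives the minimum training length and the step size but does not say whether the window expands or slides, and the name `rolling_origin_splits` is hypothetical.

```python
def rolling_origin_splits(n_obs, horizon, min_train=52, step=4):
    """Yield (train_indices, target_index) pairs for rolling-origin CV.

    Training uses observations [0, train_end); the target lies `horizon`
    steps after the last training observation. The origin then advances
    by `step` observations until the target would fall outside the data.
    """
    train_end = min_train
    while train_end - 1 + horizon < n_obs:
        yield range(train_end), train_end - 1 + horizon
        train_end += step


# Usage: list the forecast targets for an 8-week horizon on 70 weeks of data.
targets = [target for _, target in rolling_origin_splits(70, horizon=8)]
```

With 70 weekly observations and an 8-week horizon this produces three folds, which shows how quickly the fold count shrinks on short price series.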

Expected Results

Based on registry findings showing optimal performance at longer horizons:
  • 4-week horizon: Moderate improvement (10-15%), short-term harvest impacts
  • 8-week horizon: Strong improvement (25-35%), optimal seasonal signal
  • 12-week horizon: Maximum improvement (40-50%), full seasonal advantage
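
To make the percentages concrete, relative improvement over a baseline can be computed as below and compared with the 5% SESOI. The choice of mean absolute error as the metric and both function names are assumptions, since the log does not say which error measure the improvement figures refer to.

```python
from statistics import mean


def relative_improvement(model_errors, baseline_errors):
    """Fractional MAE reduction of the model relative to the baseline."""
    model_mae = mean(abs(e) for e in model_errors)
    baseline_mae = mean(abs(e) for e in baseline_errors)
    if baseline_mae == 0:
        raise ValueError("baseline MAE is zero; improvement is undefined")
    return (baseline_mae - model_mae) / baseline_mae


def exceeds_sesoi(improvement, sesoi=0.05):
    """True when the improvement meets the smallest effect size of interest."""
    return improvement >= sesoi
```

Clearing the SESOI only shows that the effect is large enough to matter. Whether it is statistically distinguishable from the baseline is a separate question for the tests listed in the evaluation framework.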

Experiment Runs

Results will be appended here as experiments complete


Status: Ready for implementation and testing

Next Actions:
  1. Implement VegetationPeakProcessor with real Zarr data
  2. Create peak detection algorithms
  3. Run all three variants at multiple horizons
  4. Validate against corrected baselines
  5. Document actual performance achieved

Codex Validation

Codex Validation — 2025-11-10

Files Reviewed

  • hypothesis.yml
  • hypothesis.md
  • experiment.md
  • config/*.yaml

Findings

  1. No executable pipeline. The family provides only configuration documents; there is no run.py, dataset builder, or notebook.
  2. Real-data claims unverified. With no code, we cannot confirm that Sentinel/BRP/price feeds were actually pulled.
  3. Baseline comparison absent. No experiment logs or MLflow runs exist, so there is no evidence that the vegetation-peak features outperform a price-only baseline.

Verdict

NOT VALIDATED – This family has not progressed beyond planning. It remains unvalidated until real data is ingested and statistically significant improvements over the standard baselines are demonstrated.