Experiment Log: FAMILY_VEGETATION_PEAK_FORECASTING

Overview

Tests whether vegetation peak timing, derived from real Sentinel-2 satellite data in Zarr stores, can forecast potato price movements from growing-season vegetation patterns.

Family ID: FAMILY_VEGETATION_PEAK_FORECASTING
Created: 2025-08-20
Status: Active Development

Data Sources

Price Data: belgian_potato_prices_verified.csv (Belgian potato prices, monthly, 2020-2021)
Satellite Data: lake_31UFU_test10.zarr (Real Sentinel-2 data with B02, B03, B04, B08, SCL bands)
Coverage: Netherlands/Belgium agricultural region
Temporal: Multiple growing seasons available in Zarr data

Variants

Variant A: NDVI Peak Timing Analysis

Hypothesis: Peak NDVI date during May-August predicts harvest pressure and pricing patterns.

Features:
- ndvi_peak_date_doy: Day of year when NDVI reaches its maximum
- ndvi_peak_intensity: Maximum NDVI value during the growing season
- days_to_peak: Days from season start (May 1) to the peak
- peak_sharpness: Concentration measure of the NDVI peak
- pre_peak_stress: Early-season stress indicators

Model: RandomForestRegressor
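A minimal sketch of how the Variant A timing features could be derived from one season of regional mean NDVI values. The function name, the input layout (a list of date/value pairs) and the sharpness definition (peak divided by seasonal mean) are assumptions for illustration; the log does not define peak_sharpness or pre_peak_stress precisely, so the latter is left out.

```python
from datetime import date


def ndvi_peak_features(observations, season_start=(5, 1)):
    """Peak features from (date, regional mean NDVI) pairs of one growing season."""
    if not observations:
        raise ValueError("need at least one cloud-free observation")
    peak_date, peak_value = max(observations, key=lambda obs: obs[1])
    start = date(peak_date.year, *season_start)  # May 1 of the same year
    values = [value for _, value in observations]
    mean_value = sum(values) / len(values)
    return {
        "ndvi_peak_date_doy": peak_date.timetuple().tm_yday,
        "ndvi_peak_intensity": peak_value,
        "days_to_peak": (peak_date - start).days,
        # Assumed definition: ratio of the peak to the seasonal mean.
        "peak_sharpness": peak_value / mean_value if mean_value else float("nan"),
    }


# Synthetic values for illustration only, not measurements.
series = [
    (date(2021, 5, 10), 0.35),
    (date(2021, 6, 9), 0.62),
    (date(2021, 7, 4), 0.81),
    (date(2021, 8, 3), 0.70),
]
features = ndvi_peak_features(series)
```

With sparse cloud-free acquisitions the raw maximum is sensitive to single noisy scenes, so smoothing the series before taking the maximum may be worth testing.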

Variant B: EVI Growing Season Intensity

Hypothesis: Maximum EVI and growing season intensity predict quality premiums.

Features:
- evi_max: Maximum EVI during the growing season (May-August)
- evi_growing_season_integral: Total EVI accumulation over the growing season
- evi_peak_date_doy: Day of year of the EVI peak
- evi_variability: Within-season EVI variance
- evi_late_season_decline: August-September EVI decline rate

Model: GradientBoostingRegressor
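The Variant B intensity features can be sketched in the same way. This assumes the integral is taken with the trapezoidal rule over elapsed days (which tolerates irregular revisit gaps) and that variability means population variance; both choices, and the function name, are illustrative rather than taken from the log.

```python
from datetime import date
from statistics import pvariance


def evi_season_features(observations):
    """Intensity features from (date, regional mean EVI) pairs of one season."""
    obs = sorted(observations, key=lambda o: o[0])
    if len(obs) < 2:
        raise ValueError("need at least two observations to integrate")
    values = [value for _, value in obs]
    # Trapezoidal rule: average of neighbouring values times the gap in days.
    integral = sum(
        0.5 * (v0 + v1) * (d1 - d0).days
        for (d0, v0), (d1, v1) in zip(obs, obs[1:])
    )
    peak_date, peak_value = max(obs, key=lambda o: o[1])
    return {
        "evi_max": peak_value,
        "evi_growing_season_integral": integral,
        "evi_peak_date_doy": peak_date.timetuple().tm_yday,
        "evi_variability": pvariance(values),
    }


# Synthetic values for illustration only.
evi_series = [
    (date(2021, 5, 1), 0.2),
    (date(2021, 5, 31), 0.4),
    (date(2021, 6, 30), 0.6),
    (date(2021, 7, 30), 0.4),
]
evi_features = evi_season_features(evi_series)
```

The integral is expressed in EVI-days, so seasons with different observation windows are only comparable if the window is fixed first.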

Variant C: Peak-to-Harvest Lag Analysis

Hypothesis: Combined NDVI/EVI peak timing provides harvest timing intelligence and price impact prediction.

Features:
- ndvi_peak_date_doy: NDVI peak timing
- evi_peak_date_doy: EVI peak timing
- peak_convergence: Temporal distance between the NDVI and EVI peaks
- estimated_harvest_date: Harvest date estimate based on peak timing plus historical lags
- harvest_timing_score: Early/optimal/late harvest classification
- seasonal_position: Relative position of the peaks within a typical growing season

Model: Ensemble (RandomForest + GradientBoosting)
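A rough sketch of the Variant C lag features follows. The 60-day peak-to-harvest lag and the day-of-year window used for the early/optimal/late classification are hypothetical placeholders, as is the choice to anchor the estimate on the later of the two peaks; the log refers to "historical lags" without giving values, so these would need to be replaced with calibrated numbers.

```python
from datetime import date, timedelta


def peak_to_harvest_features(ndvi_peak, evi_peak, lag_days=60,
                             optimal_doy=(250, 280)):
    """Combine NDVI and EVI peak dates into harvest timing features.

    lag_days and optimal_doy are placeholder assumptions, not values
    from the experiment log.
    """
    convergence = abs((ndvi_peak - evi_peak).days)
    # Assumption: estimate harvest from the later of the two peaks.
    harvest = max(ndvi_peak, evi_peak) + timedelta(days=lag_days)
    harvest_doy = harvest.timetuple().tm_yday
    if harvest_doy < optimal_doy[0]:
        score = "early"
    elif harvest_doy > optimal_doy[1]:
        score = "late"
    else:
        score = "optimal"
    return {
        "peak_convergence": convergence,
        "estimated_harvest_date": harvest,
        "harvest_timing_score": score,
    }


lag_features = peak_to_harvest_features(date(2021, 7, 4), date(2021, 6, 28))
```

Tree-based models need the categorical harvest_timing_score encoded numerically (for example as -1, 0, 1) before fitting.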

Implementation Approach

Vegetation Processing Pipeline

Load Zarr Data: Access real Sentinel-2 bands (B02, B03, B04, B08, SCL)
Cloud Masking: Use the SCL layer to exclude pixels with SCL >= 8 (medium- and high-probability cloud, thin cirrus, snow/ice)
Calculate Indices:
NDVI = (B08 - B04) / (B08 + B04)
EVI = 2.5 * (B08 - B04) / (B08 + 6 * B04 - 7.5 * B02 + 1), with bands expressed as reflectance in the 0-1 range
Spatial Aggregation: Regional mean over agricultural areas
Temporal Processing: Extract May-August growing season data
Peak Detection: Identify peak dates and characteristics
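The masking, index and aggregation steps above can be sketched per scene as follows. This is a plain-Python illustration on flattened pixel lists, assuming the band values are already converted to reflectance in the 0-1 range; the function names are hypothetical, and a real run would more likely vectorize the same arithmetic over the Zarr arrays.

```python
def ndvi(nir, red):
    """NDVI = (B08 - B04) / (B08 + B04)."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """EVI = 2.5 * (B08 - B04) / (B08 + 6 * B04 - 7.5 * B02 + 1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def is_clear(scl):
    """Keep pixels below the SCL >= 8 exclusion threshold used in this log."""
    return scl < 8


def regional_means(b02, b04, b08, scl):
    """Mean NDVI and EVI over the clear pixels of one scene.

    Inputs are equally long sequences of per-pixel values.
    Returns None when every pixel is masked.
    """
    clear = [
        (blue, red, nir)
        for blue, red, nir, s in zip(b02, b04, b08, scl)
        # Also drop pixels whose NDVI denominator would be zero.
        if is_clear(s) and (nir + red) > 0
    ]
    if not clear:
        return None
    n = len(clear)
    mean_ndvi = sum(ndvi(nir, red) for _, red, nir in clear) / n
    mean_evi = sum(evi(nir, red, blue) for blue, red, nir in clear) / n
    return mean_ndvi, mean_evi


# Three synthetic pixels: two clear (SCL 4 and 5), one cloudy (SCL 9).
mean_ndvi, mean_evi = regional_means(
    b02=[0.04, 0.05, 0.50],
    b04=[0.05, 0.10, 0.50],
    b08=[0.45, 0.30, 0.50],
    scl=[4, 5, 9],
)
```

Returning None for fully masked scenes lets the temporal step skip those dates instead of averaging in placeholder values.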

Baseline Validation

All experiments MUST validate against the 4 mandatory corrected baselines:
- persistent: Current price predicts future (corrected implementation)
- seasonal_naive: Same period previous year (52-week lag)
- ar2: Autoregressive order 2 with trend
- historical_mean: Average of all historical values (recorded here as an alias for persistent, although an all-history average and a last-value forecast are different predictors; to be confirmed against the baseline code)
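For reference, three of the baselines can be sketched in a few lines. This is not the project's corrected implementation, only an illustration of the definitions given above, assuming a weekly price series ordered oldest to newest; ar2 is omitted because it needs a fitted model, and historical_mean is computed here as a plain average, following its description.

```python
def persistent(history, horizon):
    """Last observed price carried forward to any horizon."""
    return history[-1]


def seasonal_naive(history, horizon, season_length=52):
    """Price observed one season (52 weeks) before the target week."""
    # The target week sits `horizon` steps after the last observation.
    source = len(history) - 1 + horizon - season_length
    if not 0 <= source < len(history):
        raise ValueError("history too short or horizon longer than one season")
    return history[source]


def historical_mean(history, horizon):
    """Average of all observed prices."""
    return sum(history) / len(history)


# Synthetic weekly series for illustration: 60 values, 100 to 159.
prices = list(range(100, 160))
forecasts = {
    "persistent": persistent(prices, 4),
    "seasonal_naive": seasonal_naive(prices, 4),
    "historical_mean": historical_mean(prices, 4),
}
```

Because the price file is described as monthly while the lag is given in weeks, the series would need to be on a weekly grid (or the lag changed to 12 periods) before these functions apply as written.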

Evaluation Framework

Rolling-origin CV: Minimum 52 weeks training, 4-week steps
Horizons: 4, 8, 12 weeks (focus on 8-12 for optimal performance)
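Statistical Tests: Diebold-Mariano (DM) test with the Harvey-Leybourne-Newbold (HLN) small-sample correction, two one-sided tests (TOST) for equivalence, false discovery rate (FDR) correction across comparisons
SESOI (smallest effect size of interest): 5% minimum improvement threshold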
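The rolling-origin scheme and the SESOI check can be sketched as below. The log does not say whether the training window expands or slides, nor which error metric the 5% applies to; this sketch assumes an expanding window and a relative reduction in MAE against a baseline, and the function names are illustrative.

```python
def rolling_origin_splits(n_obs, min_train=52, step=4, horizon=8):
    """Yield (train_indices, target_index) pairs for rolling-origin CV.

    Assumes an expanding training window that starts at min_train weeks
    and grows by `step` weeks per fold.
    """
    origin = min_train
    while origin + horizon <= n_obs:
        # Train on weeks [0, origin); forecast the week `horizon` steps ahead.
        yield range(origin), origin + horizon - 1
        origin += step


def relative_improvement(model_mae, baseline_mae):
    """Fractional MAE reduction of the model relative to a baseline."""
    return (baseline_mae - model_mae) / baseline_mae


def meets_sesoi(model_mae, baseline_mae, sesoi=0.05):
    """True when the improvement reaches the smallest effect size of interest."""
    return relative_improvement(model_mae, baseline_mae) >= sesoi


splits = list(rolling_origin_splits(72, horizon=8))
targets = [target for _, target in splits]
```

With 72 weekly observations and an 8-week horizon this yields only four folds, which is worth keeping in mind when interpreting DM test results on short price histories.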

Expected Results

Based on registry findings showing optimal performance at longer horizons:
- 4-week horizon: Moderate improvement (10-15%) - short-term harvest impacts
- 8-week horizon: Strong improvement (25-35%) - optimal seasonal signal
- 12-week horizon: Maximum improvement (40-50%) - full seasonal advantage

Experiment Runs

Results will be appended here as experiments complete

Status: Ready for implementation and testing

Next Actions:
1. Implement VegetationPeakProcessor with real Zarr data
2. Create peak detection algorithms
3. Run all three variants at multiple horizons
4. Validate against corrected baselines
5. Document actual performance achieved
