Experiments

FAMILY_SATELLITE_ANOMALY_DETECTION - Experimental Design

Last updated: 2025-12-01
Repo path: experiments/FAMILY_SATELLITE_ANOMALY_DETECTION
Codex file: present

Experiment notes

Hypothesis

Core Hypothesis: While normal vegetation variation does not predict continuous potato prices (as shown by the FAMILY_SATELLITE_NDVI_REAL and FAMILY_MULTI_SPECTRAL_INDICES failures), extreme satellite-visible anomalies that represent catastrophic events (droughts, floods, frost) do precede major price spikes. By focusing on binary classification of extreme events rather than continuous regression, satellite data can provide early warning of price disruptions.

Key Innovation: From Regression to Anomaly Detection

Previous Failed Approaches

  • FAMILY_SATELLITE_NDVI_REAL: -3.0% (tried continuous price prediction)
  • FAMILY_MULTI_SPECTRAL_INDICES: -18.9% (tried multi-index ensemble)
  • Common failure: Weak continuous signal overwhelmed by price persistence

New Approach: Binary Classification

Instead of predicting exact prices, classify periods as:

  • NORMAL: Price change < 10% over next 4 weeks
  • EXTREME: Price change > 20% over next 4 weeks

Periods with changes between 10% and 20% fall in neither class, leaving a buffer band between the labels.

This dramatically simplifies the problem and focuses on what matters for risk management.
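
A minimal pandas sketch of this labeling rule, assuming a weekly price series and reading the EXTREME threshold as an upward 4-week forward return (whether downward crashes also count is not specified above); function and variable names are illustrative:

```python
import pandas as pd

def label_forward_returns(prices: pd.Series) -> pd.Series:
    """Label each week from its 4-week forward return.

    NORMAL  : |return| < 10%
    EXTREME : return > 20% (read here as an upward spike)
    Weeks in between stay unlabeled (NA) and can be dropped before training.
    """
    fwd = prices.shift(-4) / prices - 1.0   # p[t+4] / p[t] - 1
    labels = pd.Series(pd.NA, index=prices.index, dtype="object")
    labels[fwd.abs() < 0.10] = "NORMAL"
    labels[fwd > 0.20] = "EXTREME"
    return labels
```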

Historical Context: Known Extreme Events

Verified Price Spikes in Dutch Potato Market

  1. 2018 Drought: Prices jumped from €15 to €50+/100kg
  2. 2022 Heat Wave: Prices reached €60+/100kg
  3. 2016 Flooding: Regional production losses
  4. Late Frost Events: Sudden supply shocks

These events ARE visible from space as extreme NDVI drops, unusual spectral signatures, or extended cloud cover patterns.

Experimental Design

Data Sources (100% REAL)

1. Satellite Data

  • Primary: /Users/sethvanderbijl/PitchAI Code/potato_supply/lake_31UFU_medium_polder.zarr
  • Bands: B02-B12, SCL for cloud detection
  • Period: 2015-2024 (capturing multiple extreme events)
  • Processing: Weekly aggregation with anomaly detection (see the sketch below)
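
A minimal sketch of the weekly aggregation step, assuming the zarr store exposes the Sentinel-2 bands as variables named B04, B08, and SCL on a (time, y, x) grid; the actual layout of the store may differ:

```python
import xarray as xr

# Store path and band names follow the notes above; layout is an assumption.
ds = xr.open_zarr("lake_31UFU_medium_polder.zarr")

# Mask clouds via the SCL band: 3 = cloud shadow, 8/9 = cloud, 10 = cirrus.
cloudy = ds["SCL"].isin([3, 8, 9, 10])
red = ds["B04"].where(~cloudy)
nir = ds["B08"].where(~cloudy)

# NDVI = (NIR - Red) / (NIR + Red), averaged spatially, then weekly means.
ndvi = (nir - red) / (nir + red)
weekly_ndvi = ndvi.mean(dim=["y", "x"]).resample(time="1W").mean()
```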

2. Price Data

  • Source: BoerderijApi (product_id="NL.157.2086")
  • Period: 2015-2024 weekly prices
  • Labels: Binary classification based on 4-week forward returns

3. Parcel Data

  • Source: BRPApi for consumption potato masks
  • Years: 2015-2024
  • Purpose: Focus analysis on actual potato growing areas

Feature Engineering

Anomaly Detection Features

  1. Z-Score Anomalies (see the sketch after this list)
     • NDVI z-score relative to a 3-year rolling mean
     • Flag values more than 2 standard deviations from normal

  2. Temporal Derivatives
     • Week-over-week NDVI change rate
     • Sudden drops indicate acute stress

  3. Spatial Coherence
     • Percentage of parcels showing stress
     • Distinguishes widespread from localized anomalies

  4. Multi-Index Divergence
     • NDVI-NDWI divergence (drought signature)
     • NDVI-thermal decoupling (heat stress)

  5. Cloud Patterns
     • Extended cloud cover duration (storm proxy)
     • Sudden clear-to-cloudy transitions
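
A minimal sketch of the z-score feature (item 1 above), assuming a weekly NDVI series like the one produced in the aggregation step; the 156-week window and the column names are illustrative:

```python
import pandas as pd

def ndvi_zscore_features(weekly_ndvi: pd.Series,
                         window_weeks: int = 156) -> pd.DataFrame:
    """Z-score each week's NDVI against a rolling ~3-year baseline."""
    baseline = weekly_ndvi.rolling(window_weeks, min_periods=52)
    z = (weekly_ndvi - baseline.mean()) / baseline.std()
    return pd.DataFrame({
        "ndvi": weekly_ndvi,
        "zscore": z,
        # Anomaly flag: more than 2 standard deviations from normal.
        "anomaly": z.abs() > 2.0,
    })
```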

Temporal Windows

  • Lead time: 2-4 weeks before price spike
  • Aggregation: 2-week rolling windows
  • Seasonality: Growing season (Apr-Oct) focus

Model Architecture

Variant A: Threshold-Based Anomaly Detection

  • Simple z-score thresholds calibrated on 2018 drought
  • If NDVI z-score < -2 AND coverage > 50% parcels → EXTREME
  • Fast, interpretable, deployable (see the sketch below)
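
A minimal sketch of this rule, assuming a feature frame with the illustrative columns zscore and stressed_parcel_frac:

```python
import pandas as pd

def variant_a_alert(features: pd.DataFrame) -> pd.Series:
    """Threshold rule: NDVI z-score below -2 while more than half of the
    parcels show stress -> EXTREME, otherwise NORMAL."""
    extreme = (features["zscore"] < -2.0) & (features["stressed_parcel_frac"] > 0.5)
    return extreme.map({True: "EXTREME", False: "NORMAL"})
```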

Variant B: Machine Learning Classification

  • Random Forest binary classifier
  • Features: All anomaly metrics
  • Class balancing: SMOTE for rare extreme events
  • Cross-validation: Time-series aware splits (see the sketch below)
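
A minimal training sketch under these choices, with stand-in data in place of the real feature matrix; hyperparameters are illustrative, not tuned:

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Stand-in data: ~500 weeks of anomaly features with rare (~10%) extremes.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 6))               # anomaly-metric feature matrix
y = (rng.random(500) < 0.10).astype(int)    # 1 = EXTREME, 0 = NORMAL

# SMOTE sits inside the pipeline so oversampling happens only on each
# training fold, never on the held-out fold; k_neighbors is kept small
# because extreme weeks are rare.
clf = Pipeline([
    ("smote", SMOTE(k_neighbors=2, random_state=0)),
    ("forest", RandomForestClassifier(n_estimators=500, random_state=0)),
])

# Time-series aware splits: each fold trains on the past, tests on the future.
scores = cross_val_score(clf, X, y, cv=TimeSeriesSplit(n_splits=5),
                         scoring="precision", error_score=0.0)
print("precision per fold:", scores)
```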

Variant C: Ensemble Anomaly Scoring

  • Isolation Forest for unsupervised anomaly detection
  • Local Outlier Factor for density-based anomalies
  • One-Class SVM for boundary detection
  • Ensemble voting for final classification (see the sketch below)
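
A minimal sketch of the ensemble vote, with an illustrative contamination rate rather than a tuned one:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

def ensemble_anomaly_vote(X: np.ndarray, contamination: float = 0.1) -> np.ndarray:
    """Flag a week as EXTREME when at least 2 of the 3 detectors agree.

    All three detectors return -1 for outliers and +1 for inliers.
    """
    votes = np.stack([
        IsolationForest(contamination=contamination, random_state=0).fit_predict(X),
        LocalOutlierFactor(contamination=contamination).fit_predict(X),
        OneClassSVM(nu=contamination).fit_predict(X),
    ])
    return (votes == -1).sum(axis=0) >= 2
```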

Success Metrics

Primary Metric: Precision for EXTREME Class

  • Target: 70% precision (7 out of 10 alerts are real)
  • Rationale: Better to have high precision than high recall
  • Business value: Avoid false alarms that erode trust

Secondary Metrics

  • Recall: Aim for 50% (catch half of extreme events)
  • Lead Time: Average days before price spike
  • F1 Score: Balance of precision and recall

Business Metrics

  • Value per Alert: €50,000 for 1000-ton operation
  • Cost of False Positive: €5,000 (unnecessary hedging)
  • Cost of False Negative: €100,000 (unhedged exposure); a worked example follows below
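
A small worked example under these figures, assuming a true alert captures the full per-alert value:

```python
# Expected value of a single alert at the 70% precision target:
#   0.70 * EUR 50,000 (true alert)  -  0.30 * EUR 5,000 (false alert)
value_per_alert = 0.70 * 50_000 - 0.30 * 5_000
print(f"EUR {value_per_alert:,.0f}")  # EUR 33,500
# A missed event still costs EUR 100,000, which is why recall, while
# secondary, cannot be ignored entirely.
```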

Validation Strategy

Known Events Testing

  1. 2018 Drought: Must detect by June (prices spiked in August)
  2. 2022 Heat Wave: Must detect by July
  3. 2020 COVID Shock: Should NOT trigger (not weather-related)

Cross-Validation

  • Training: 2015-2020 (including 2018 drought)
  • Validation: 2021-2022 (including 2022 heat wave)
  • Test: 2023-2024 (out-of-sample performance)

Baseline Comparisons

  1. Random Classifier: ~10% precision expected
  2. Always-Normal: 0% recall but no false positives
  3. Weather-Only: Temperature/precipitation thresholds
  4. Price Momentum: Using price volatility alone

Implementation Plan

Phase 1: Data Preparation (Week 1)

  1. Extract satellite data for all growing seasons
  2. Calculate weekly anomaly metrics
  3. Label price data with NORMAL/EXTREME classes
  4. Handle class imbalance

Phase 2: Model Development (Week 2)

  1. Implement Variant A (threshold-based)
  2. Train Variant B (ML classifier)
  3. Build Variant C (ensemble)
  4. Optimize hyperparameters

Phase 3: Validation (Week 3)

  1. Test on known extreme events
  2. Calculate business metrics
  3. Generate alert timeline
  4. Document findings

Risk Mitigation

Technical Risks

  • Cloud cover: Use thermal bands or radar as backup
  • Data gaps: Interpolate missing weeks (see the sketch below)
  • Overfitting: Strict time-series cross-validation
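
A minimal gap-filling sketch, assuming a weekly NDVI series with a DatetimeIndex; capping the gap length keeps long outages from being silently smoothed over:

```python
import pandas as pd

def fill_weekly_gaps(weekly_ndvi: pd.Series, max_gap_weeks: int = 3) -> pd.Series:
    """Linearly interpolate short gaps; longer outages stay NaN so they
    are not mistaken for real observations."""
    regular = weekly_ndvi.resample("1W").mean()   # enforce a regular weekly grid
    return regular.interpolate(method="time", limit=max_gap_weeks)
```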

Business Risks

  • Alert fatigue: Limit to max 3 alerts per season
  • Regional bias: Validate across multiple growing regions
  • Market regime changes: Regular model retraining

Expected Outcomes

Success Scenario (70% likely)

  • 70% precision on extreme event detection
  • 2-3 week average lead time
  • €150,000 annual value for 1000-ton operation

Partial Success (20% likely)

  • 50% precision but high recall
  • Useful as one signal among many
  • Requires human validation

Failure Scenario (10% likely)

  • Cannot distinguish weather extremes from market extremes
  • Too many false positives
  • Pivot to pure weather-based approach

Key Differentiators from Failed Experiments

  1. Binary vs Continuous: Simplifies signal detection
  2. Extremes vs Normals: Focuses on strong signals only
  3. Anomaly vs Absolute: Relative changes more robust
  4. Multi-Modal: Combines spectral, temporal, spatial
  5. Business-Aligned: Precision over accuracy

Conclusion

This experiment pivots from failed continuous prediction to focused anomaly detection. By targeting only extreme events with major price impact, we dramatically improve the signal-to-noise ratio and create a deployable early warning system.

The approach is:

  • Scientifically sound: Extreme weather events DO impact prices
  • Technically feasible: Anomaly detection is well-established
  • Business valuable: Even a 50% detection rate has huge value
  • Data-driven: Uses the same REAL data sources with a different approach

Success would provide the first satellite-based early warning system for potato price spikes in the Netherlands.

Codex validation

  • Files inspected: run_experiment.py, result_real.md, result.md.
  • Data integrity: PASS – the rerun ingests Boerderij prices, BRP parcel masks, and Sentinel-2 scenes from data/zarr_stores/lake_31UFU_medium.zarr (850 timestamps) with no synthetic fallbacks; all features are derived from actual NDVI/SCL data.
  • Feature benefit vs baseline: FAIL – none of the variants meets the 70% precision requirement: Variant A reaches 19.6% precision / 68.3% recall, Variant B’s ML classifier collapses to 0% precision/recall, and Variant C’s ensemble yields only 20% precision. Each still trails even the naive “always normal” baseline.
  • Verdict: INVALID – satellite anomaly detection remains refuted; it cannot reliably predict price spikes or outperform trivial baselines in the current implementation.