Experiment Log: FAMILY_CROSS_COMMODITY_SUBSTITUTION_DYNAMICS
This family tests whether Dutch farmers' cross-commodity substitution decisions during planting and harvest windows create predictable price impacts through forward-looking market anticipation.
Overview
Experiment Run: 2025-11-11
Verdict: REFUTED (all variants)
- Variant A – Planting Decision Signals: best model ridge, RMSE 17.3 vs persistent baseline 11.1 (−56 %). DM p = 0.002, HLN p = 0.003 → significantly worse.
- Variant B – Harvest Timing Optimization: best model ridge, RMSE 15.6 vs persistent 11.1 (−41 %). DM p = 0.519 (no evidence of improvement).
- Variant C – Dynamic Land Allocation: best model random_forest, RMSE 16.4 vs persistent 11.1 (−48 %). DM p = 0.107.
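The percentages in parentheses read as relative RMSE improvement over the persistent baseline, (baseline − model) / baseline, so a negative value means the model is worse. This definition is inferred from the figures, which it reproduces, rather than taken from run.py. A minimal sketch of that arithmetic together with the 12 % SESOI check from the configuration, using hypothetical function and variable names:

```python
SESOI = 0.12  # smallest effect size of interest: 12 % relative RMSE improvement


def relative_improvement(model_rmse: float, baseline_rmse: float) -> float:
    """Fractional RMSE improvement over a baseline; negative means the model is worse."""
    if baseline_rmse <= 0:
        raise ValueError("baseline RMSE must be positive")
    return (baseline_rmse - model_rmse) / baseline_rmse


def meets_sesoi(model_rmse: float, baseline_rmse: float, sesoi: float = SESOI) -> bool:
    """True only if the model beats the baseline by at least the SESOI."""
    return relative_improvement(model_rmse, baseline_rmse) >= sesoi


# Best-model RMSEs reported above, each against the persistent baseline (11.1).
best_models = {"A": 17.3, "B": 15.6, "C": 16.4}
for variant, rmse in best_models.items():
    imp = relative_improvement(rmse, 11.1)
    print(f"Variant {variant}: {imp:+.0%}, meets SESOI: {meets_sesoi(rmse, 11.1)}")
# Variant A: -56%, meets SESOI: False
# Variant B: -41%, meets SESOI: False
# Variant C: -48%, meets SESOI: False
```

Because 11.1 and the model RMSEs are themselves rounded, recomputed percentages can differ from the logged ones by a point in edge cases.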
Configuration
- Data Sources: Boerderij NL.157.2086 potatoes (weekly), Eurostat APRI_AP_CRPOUTA Dutch soft wheat (prod_veg 01110000), CBS 85676NED rotation proxies
- Period: 2020-01-06 → 2024-12-30 (161 aligned weeks remain after rows with missing features are dropped)
- Targets: spot prices 30 and 60 days ahead (targets shifted 4 and 8 weeks forward)
- CV: TimeSeriesSplit (5 folds, 8-week test windows)
- Baselines: the standard set (persistent, seasonal_naive, AR2, historical_mean)
- SESOI (smallest effect size of interest): 12 % relative RMSE improvement
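The target and fold layout above can be sketched in plain Python. This is an illustration under assumptions, not run.py: targets are taken as the weekly price shifted 4 or 8 rows forward, the folds mirror the expanding-window layout of scikit-learn's TimeSeriesSplit(n_splits=5, test_size=8) (the last 5 × 8 rows become consecutive test blocks), and only two of the four baselines are shown. All helper names are hypothetical, and the price series is a toy example.

```python
from __future__ import annotations

from statistics import fmean


def make_target(prices: list[float], horizon_weeks: int) -> list[float | None]:
    """Price `horizon_weeks` rows ahead; None where the future is not observed."""
    n = len(prices)
    return [prices[t + horizon_weeks] if t + horizon_weeks < n else None for t in range(n)]


def rolling_splits(n_samples: int, n_splits: int = 5, test_size: int = 8):
    """Expanding-window folds: train on everything before each test block."""
    first_test = n_samples - n_splits * test_size
    if first_test <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_splits):
        start = first_test + k * test_size
        yield list(range(start)), list(range(start, start + test_size))


def rmse(pred: list[float], actual: list[float]) -> float:
    return fmean((p - a) ** 2 for p, a in zip(pred, actual)) ** 0.5


if __name__ == "__main__":
    prices = [100.0 + (t % 13) for t in range(170)]  # toy series, not Boerderij data
    target = make_target(prices, horizon_weeks=4)  # 30-day horizon
    rows = [t for t, y in enumerate(target) if y is not None]  # keep aligned weeks only
    for train_idx, test_idx in rolling_splits(len(rows)):
        train = [rows[i] for i in train_idx]
        test = [rows[i] for i in test_idx]
        actual = [target[t] for t in test]
        persistent = [prices[t] for t in test]  # random walk: forecast y(t+h) = y(t)
        hist_mean = [fmean(target[t] for t in train)] * len(test)  # training-target mean
        print(f"train={len(train)} persistent={rmse(persistent, actual):.2f} "
              f"historical_mean={rmse(hist_mean, actual):.2f}")
```

Under this layout, 161 aligned weeks leave only the last 40 weeks as scored test data, which is one reason the statistical power is limited.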
Results Summary
- Variant A: Every planting-window model underperforms the persistent baseline by 40–65 % in RMSE, and the DM/HLN tests give no support for improvement (the best model is significantly worse). Weather-adjusted substitution ratios add no signal.
- Variant B: Harvest features yield only +4–7 % practical gains over persistent, but DM/HLN p-values exceed 0.10, so the gains have no statistical support.
- Variant C: Combined land-allocation signals still trail the baseline, with relative RMSE gaps of −48 % at the 30-day horizon and −155 % at the 60-day horizon.
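DM and HLN refer to the Diebold–Mariano test of equal predictive accuracy and the Harvey–Leybourne–Newbold small-sample correction. A minimal standard-library sketch follows, assuming squared-error loss, a two-sided alternative, and the usual long-run variance with h − 1 autocovariances; run.py may make different choices. The HLN statistic is meant to be compared with a Student t distribution with T − 1 degrees of freedom (for example via scipy.stats.t), which the standard library lacks, so only the normal-approximation p-value for DM is returned. The function name dm_hln_test is hypothetical.

```python
import math
from statistics import fmean


def dm_hln_test(e_model: list[float], e_base: list[float], h: int = 1):
    """Diebold-Mariano test on squared-error loss, plus the HLN-corrected statistic.

    Returns (dm_stat, dm_p_two_sided, hln_stat). Positive statistics mean the
    model's losses exceed the baseline's, i.e. the model is worse.
    """
    if len(e_model) != len(e_base):
        raise ValueError("error series must have equal length")
    d = [m * m - b * b for m, b in zip(e_model, e_base)]  # loss differential
    T = len(d)
    if T < 2 or not 1 <= h < T:
        raise ValueError("need T >= 2 and 1 <= h < T")
    d_bar = fmean(d)

    def autocov(k: int) -> float:
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, T)) / T

    # h-step forecasts overlap, so include autocovariances up to lag h - 1
    long_run_var = autocov(0) + 2 * sum(autocov(k) for k in range(1, h))
    if long_run_var <= 0:
        raise ValueError("non-positive long-run variance; statistic undefined")
    dm = d_bar / math.sqrt(long_run_var / T)
    p_two_sided = math.erfc(abs(dm) / math.sqrt(2))  # normal approximation
    hln = dm * math.sqrt((T + 1 - 2 * h + h * (h - 1) / T) / T)
    return dm, p_two_sided, hln


if __name__ == "__main__":
    model_errors = [2.0, -3.0, 1.0, -2.0]  # toy forecast errors, not experiment output
    base_errors = [1.0, 1.0, 1.0, 1.0]
    dm, p, hln = dm_hln_test(model_errors, base_errors, h=1)
    print(f"DM={dm:.3f} p={p:.3f} HLN={hln:.3f}")
```

For the 4- and 8-week targets, h would be 4 or 8 on weekly rows, which widens the variance estimate to account for overlapping forecast errors.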
Critical Findings
- Data Availability Fixed, Signal Still Missing. The Eurostat wheat series is now ingested at weekly frequency, yet the substitution features remain noise: cross-commodity ratios do not beat a random-walk baseline.
- Baseline Dominance. Persistent and AR(2) baselines dominate at every horizon; DM/HLN p-values either stay well above 0.05 or show that the models are significantly worse.
- Feature Volatility. Rolling averages and ratios introduce heavy edge effects; even after forward-filling and clipping infinite values, CV folds show large variance, which leads to negative improvements.
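The edge effects and clipping described under Feature Volatility can be illustrated with a small standard-library sketch: forward-fill each leg, form the ratio, clip infinite values, then smooth with a trailing window. The helper names, the clip bound of 1e6, and the window length are illustrative assumptions, not values taken from run.py.

```python
from __future__ import annotations

import math


def forward_fill(values: list[float | None]) -> list[float | None]:
    """Carry the last real observation forward; leading gaps stay None."""
    out, last = [], None
    for v in values:
        if v is not None and not math.isnan(v):
            last = v
        out.append(last)
    return out


def ratio_feature(num, den, clip: float = 1e6):
    """num / den with division by zero mapped to +/-inf and then clipped to +/-clip."""
    out = []
    for a, b in zip(num, den):
        if a is None or b is None or (a == 0 and b == 0):
            out.append(None)  # undefined ratio
            continue
        r = a / b if b != 0 else math.copysign(math.inf, a)
        out.append(max(-clip, min(clip, r)))
    return out


def rolling_mean(values, window: int):
    """Trailing mean; None until the window is full or while it contains a gap."""
    out = []
    for t in range(len(values)):
        chunk = values[max(0, t - window + 1): t + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
        else:
            out.append(sum(chunk) / window)
    return out


if __name__ == "__main__":
    potato = forward_fill([None, 20.0, None, 22.0, 24.0, 23.0])  # toy weekly prices
    wheat = forward_fill([19.0, 20.0, 0.0, 21.0, None, 22.0])
    smoothed = rolling_mean(ratio_feature(potato, wheat), window=3)
    usable = sum(v is not None for v in smoothed)
    print(smoothed, f"usable rows: {usable}/{len(smoothed)}")
```

In the toy run, the first three rows are lost to the window and the single zero wheat price still yields a clipped extreme that dominates two rolling means, one way volatility survives clipping.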
Provenance
- Git SHA: current working tree (MLflow run IDs recorded under mlruns/FAMILY_CROSS_COMMODITY_SUBSTITUTION_DYNAMICS)
- Data Versions: Boerderij API git:current, Eurostat APRI_AP_CRPOUTA cached 2025-11-11, CBS 85676NED harvest-time transform
- Code: hypotheses/FAMILY_CROSS_COMMODITY_SUBSTITUTION_DYNAMICS/run.py (real-data modeling pipeline)
Decision Log
- 2025-11-11: Replaced the synthetic wheat proxy with Eurostat prices and reran all variants. Despite real cross-commodity data, every model remained worse than the persistent baseline; the family stays REFUTED until a materially better modeling strategy is found.
Caveats and Limitations
- Weak Cross-Commodity Signal: Even with Eurostat wheat in place, weekly ratios are extremely noisy, and smoothing them reduces the usable sample count.
- Short Window: Only five seasons (2020–2024) remain after aligning planting and harvest windows, which limits the number of CV folds and the statistical power.
- Single-Market Target: All models still forecast Dutch prices only; without multi-country targets, substitution shocks may be muted.
Next Actions
- Keep family marked REFUTED until a model beats the persistent baseline with DM/HLN significance.
- Investigate richer cross-commodity panels (e.g., sugar beet, onions) or Eurostat input-cost indices to strengthen the signal.
- Reframe the task around forecasting the potato-minus-wheat spread rather than absolute price levels, to reduce baseline dominance.
- Extend the history back before 2015 once the wheat series is backfilled, so that rolling CV covers more than five seasons.
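A spread target along the lines of the reframing above could be built as follows. This sketch assumes the potato and wheat prices are already aligned on the same weekly grid and expressed in comparable units; wheat_to_potato_units is a hypothetical conversion factor, since the log does not state the units of either series, and the helper names are likewise illustrative.

```python
from __future__ import annotations


def spread_series(potato, wheat, wheat_to_potato_units: float = 1.0):
    """Weekly potato-minus-wheat spread; None where either leg is missing."""
    return [
        None if p is None or w is None else p - w * wheat_to_potato_units
        for p, w in zip(potato, wheat)
    ]


def spread_change_target(spread, horizon_weeks: int):
    """Change in the spread over the horizon, s[t + h] - s[t]; None if undefined."""
    n = len(spread)
    out: list[float | None] = []
    for t in range(n):
        j = t + horizon_weeks
        if j >= n or spread[t] is None or spread[j] is None:
            out.append(None)
        else:
            out.append(spread[j] - spread[t])
    return out


if __name__ == "__main__":
    s = spread_series([22.0, 24.0, 23.0, 25.0, 26.0], [20.0, 21.0, 21.0, 22.0, 22.0])  # toy data
    y = spread_change_target(s, horizon_weeks=4)  # 30-day horizon on weekly rows
    print(s, y)
```

With a change target of this kind the persistent baseline forecasts zero change, so a model only scores by anticipating the direction and size of spread moves.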
Codex Validation (2025-11-10)
Files Reviewed
run.py, config/*.yaml, experiment.md
Findings
- Real wheat data restored. run.py now fetches Dutch soft-wheat prices (prod_veg=01110000) directly from Eurostat APRI_AP_CRPOUTA, resamples them to weekly frequency, and refuses to proceed if the series is empty; no fabricated multipliers remain.
- Real potatoes unchanged. NL.157.2086 prices still come straight from Boerderij; the merge now forward-fills only real Eurostat observations.
- Full modeling now executes. The rerun trains Ridge/ElasticNet/GBM/RF models with rolling CV, compares them to all four baselines, and logs DM/HLN/TOST statistics.
- Still worse than baselines. Every horizon shows negative or negligible improvements (e.g., Variant A 30‑day: RMSE 17.3 vs persistent 11.1, DM p = 0.002), so the hypothesis remains refuted despite the real-data hookup.
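The resampling behavior described in the first two findings (weekly frequency, forward-fill from real observations only, refusal on an empty series) can be sketched with the standard library. The Monday-anchored grid is an assumption that happens to match the experiment period, which starts and ends on a Monday; the function name to_weekly is hypothetical, and the Eurostat download itself is left out.

```python
from __future__ import annotations

import datetime as dt


def to_weekly(
    observations: dict[dt.date, float], start: dt.date, end: dt.date
) -> list[tuple[dt.date, float | None]]:
    """Forward-fill dated observations onto a Monday-anchored weekly grid.

    Weeks before the first observation stay None (no backfilling), and an empty
    input raises instead of silently producing a placeholder series.
    """
    if not observations:
        raise ValueError("wheat series is empty; refusing to proceed")
    obs = sorted(observations.items())
    week = start - dt.timedelta(days=start.weekday())  # snap to Monday
    out: list[tuple[dt.date, float | None]] = []
    i, last = 0, None
    while week <= end:
        # consume every observation dated on or before this week's Monday
        while i < len(obs) and obs[i][0] <= week:
            last = obs[i][1]
            i += 1
        out.append((week, last))
        week += dt.timedelta(weeks=1)
    return out


if __name__ == "__main__":
    toy = {dt.date(2020, 1, 1): 200.0, dt.date(2020, 1, 15): 210.0}  # toy prices
    for week, price in to_weekly(toy, dt.date(2020, 1, 6), dt.date(2020, 1, 27)):
        print(week.isoformat(), price)
    # 2020-01-06 200.0
    # 2020-01-13 200.0
    # 2020-01-20 210.0
    # 2020-01-27 210.0
```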
Verdict
NOT VALIDATED – Inputs are real (Boerderij potatoes + Eurostat wheat) and the experiment now produces DM/HLN/TOST metrics, but every variant performs worse than the strongest baseline, so the family stays refuted.