Note: this experiment has not yet been Codex-validated. Treat the findings as preliminary indications.

Experiment Log: FAMILY_CROSS_COMMODITY_SUBSTITUTION_DYNAMICS

FAMILY_CROSS_COMMODITY_SUBSTITUTION_DYNAMICS

This family tests whether Dutch farmers' cross-commodity substitution decisions during planting and harvest windows create predictable price impacts through forward-looking market anticipation.

Last updated: 2025-12-01
Repo path: hypotheses/FAMILY_CROSS_COMMODITY_SUBSTITUTION_DYNAMICS
Codex file: present

Experiment notes

Experiment Run: 2025-11-11

Verdict: REFUTED (all variants)

  • Variant A – Planting Decision Signals: best model ridge, RMSE 17.3 vs persistent baseline 11.1 (−56 % relative improvement, i.e. worse). DM p = 0.002, HLN p = 0.003 → significantly worse than the baseline.
  • Variant B – Harvest Timing Optimization: best model ridge, RMSE 15.6 vs persistent 11.1 (−41 %). DM p = 0.519 → no evidence of improvement.
  • Variant C – Dynamic Land Allocation: best model random_forest, RMSE 16.4 vs persistent 11.1 (−48 %). DM p = 0.107 → not significant.
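The percentages in parentheses read as relative RMSE improvements over the persistent baseline. The minimal sketch below assumes the definition improvement = 1 − RMSE_model / RMSE_baseline, which reproduces the reported −56 %, −41 % and −48 %, and assumes the 12 % SESOI from the configuration is applied to that same quantity. The function names are hypothetical and are not taken from run.py.

```python
# Relative RMSE improvement of a model over a baseline:
#   improvement = 1 - RMSE_model / RMSE_baseline
# Positive values mean the model beats the baseline; negative values mean it is worse.
SESOI = 0.12  # smallest effect size of interest: 12 % relative RMSE improvement


def relative_improvement(rmse_model: float, rmse_baseline: float) -> float:
    """Return the fractional RMSE improvement of the model over the baseline."""
    if rmse_baseline <= 0:
        raise ValueError("baseline RMSE must be positive")
    return 1.0 - rmse_model / rmse_baseline


def meets_sesoi(rmse_model: float, rmse_baseline: float, sesoi: float = SESOI) -> bool:
    """True only if the improvement reaches the practical threshold."""
    return relative_improvement(rmse_model, rmse_baseline) >= sesoi


# Best-model RMSEs from the 2025-11-11 run, each against persistent RMSE 11.1.
reported = {"A": 17.3, "B": 15.6, "C": 16.4}
for variant, rmse in reported.items():
    imp = relative_improvement(rmse, 11.1)
    print(f"Variant {variant}: {imp:+.0%}  meets SESOI: {meets_sesoi(rmse, 11.1)}")
```

Running it prints −56 %, −41 % and −48 % for variants A, B and C, and all three fail the SESOI check.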

Configuration

  • Data Sources: Boerderij NL.157.2086 potatoes (weekly), Eurostat APRI_AP_CRPOUTA wheat (01110000), CBS 85676NED rotation proxies
  • Period: 2020-01-06 → 2024-12-30 (161 aligned weeks remain after dropping rows with incomplete features)
  • Targets: 30‑day and 60‑day ahead spot prices (4‑ and 8‑week shifts)
  • CV: TimeSeriesSplit (5 folds, 8-week test windows)
  • Baselines: persistent, seasonal_naive, AR2, historical_mean
  • SESOI (smallest effect size of interest): 12 % relative RMSE improvement over the baseline
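The configuration above can be sketched roughly as follows. This is a hypothetical illustration, not run.py. It assumes a weekly price Series, builds the 4-week-ahead target with a shift, and scores the four named baselines under scikit-learn's TimeSeriesSplit with 5 folds and 8-week test windows. It reads "seasonal_naive" as the price one year (52 weeks) before the target week and "AR2" as a direct least-squares regression of the target on the two latest prices. The `gap=h` argument, which stops training targets from reaching into the test window, is our addition; the notes do not say whether run.py uses it. The demo series is a synthetic random walk, not project data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

H = 4  # horizon in weeks for the 30-day target; use 8 for the 60-day target


def make_frame(price: pd.Series, h: int = H) -> pd.DataFrame:
    """Pair each week's information with the price h weeks ahead (hypothetical helper)."""
    df = pd.DataFrame({"y_now": price, "y_lag1": price.shift(1)})
    df["y_seasonal"] = price.shift(52 - h)  # price in the same week one year before the target
    df["target"] = price.shift(-h)          # h-week-ahead price to forecast
    return df.dropna()


def baseline_rmse(df: pd.DataFrame, h: int = H, n_splits: int = 5, test_size: int = 8) -> dict:
    """Mean test-fold RMSE of the four baselines (hypothetical helper)."""
    scores = {"persistent": [], "seasonal_naive": [], "historical_mean": [], "ar2": []}
    tscv = TimeSeriesSplit(n_splits=n_splits, test_size=test_size, gap=h)
    for train_idx, test_idx in tscv.split(df):
        train, test = df.iloc[train_idx], df.iloc[test_idx]
        y = test["target"].to_numpy()
        # Direct AR(2): least-squares fit of the h-step target on the two latest prices.
        X_tr = np.column_stack([np.ones(len(train)), train["y_now"], train["y_lag1"]])
        coef, *_ = np.linalg.lstsq(X_tr, train["target"].to_numpy(), rcond=None)
        X_te = np.column_stack([np.ones(len(test)), test["y_now"], test["y_lag1"]])
        preds = {
            "persistent": test["y_now"].to_numpy(),
            "seasonal_naive": test["y_seasonal"].to_numpy(),
            "historical_mean": np.full(len(test), train["target"].mean()),
            "ar2": X_te @ coef,
        }
        for name, p in preds.items():
            scores[name].append(float(np.sqrt(np.mean((y - p) ** 2))))
    return {name: float(np.mean(v)) for name, v in scores.items()}


# Synthetic random walk on the same weekly grid as the notes (2020-01-06 to 2024-12-30).
weeks = pd.date_range("2020-01-06", periods=261, freq="W-MON")
rng = np.random.default_rng(0)
price = pd.Series(20 + np.cumsum(rng.normal(0, 1, len(weeks))), index=weeks)
print(baseline_rmse(make_frame(price)))
```

A model's fold RMSEs would be compared against these baseline scores the same way. The 52-week seasonal lag alone costs the first 48 rows at a 4-week horizon, which is one reason weekly samples shrink quickly once features and targets are aligned.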

Results Summary

  • Variant A: Every planting-window model underperforms the persistent baseline by 40–65 % in RMSE, and DM/HLN show the best model is significantly worse. Weather-adjusted substitution ratios add no signal.
  • Variant B: Harvest features yield only +4–7 % practical gains over the persistent baseline, below the 12 % SESOI, and DM/HLN p-values exceed 0.10, so the gains have no statistical backing.
  • Variant C: Combined land-allocation signals still trail the persistent baseline (relative RMSE improvement of −48 % at the 30-day horizon and −155 % at the 60-day horizon).
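The DM and HLN p-values quoted in these results come from comparing forecast errors, and a test of that kind can be sketched as follows. This is a minimal sketch under stated assumptions: squared-error loss, loss differential d_t = e_model² − e_base² (so a positive statistic means the model is worse), long-run variance built from autocovariances up to lag h − 1, two-sided p-values, and the Harvey-Leybourne-Newbold correction compared against a Student-t with T − 1 degrees of freedom. The notes do not say which loss, sidedness or variance estimator run.py uses, and `dm_hln_test` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def dm_hln_test(e_model, e_base, h: int = 1):
    """Diebold-Mariano test on squared-error loss plus the HLN small-sample correction.

    d_t = e_model_t**2 - e_base_t**2, so a positive statistic means the model is worse.
    Returns (dm_stat, dm_p, hln_stat, hln_p) with two-sided p-values.
    """
    e_model = np.asarray(e_model, dtype=float)
    e_base = np.asarray(e_base, dtype=float)
    d = e_model**2 - e_base**2
    T = d.size
    d_bar = d.mean()
    # Long-run variance: autocovariances of d up to lag h-1 (rectangular window).
    dc = d - d_bar
    gamma = [np.dot(dc[k:], dc[: T - k]) / T for k in range(h)]
    long_run_var = gamma[0] + 2.0 * sum(gamma[1:])
    if long_run_var <= 0:
        raise ValueError("non-positive long-run variance; DM statistic undefined")
    dm = d_bar / np.sqrt(long_run_var / T)
    dm_p = 2.0 * stats.norm.sf(abs(dm))
    # HLN rescaling, evaluated against Student-t with T-1 degrees of freedom.
    k = np.sqrt((T + 1 - 2 * h + h * (h - 1) / T) / T)
    hln = k * dm
    hln_p = 2.0 * stats.t.sf(abs(hln), df=T - 1)
    return dm, dm_p, hln, hln_p


rng = np.random.default_rng(1)
e_base = rng.normal(0.0, 1.0, 40)   # synthetic baseline errors
e_model = rng.normal(0.0, 1.6, 40)  # synthetic, noisier model errors
print(dm_hln_test(e_model, e_base, h=1))
```

The demo uses h = 1; for the 30- and 60-day targets one would pass h = 4 or h = 8. With the rectangular window the long-run variance can turn negative for short samples, which is why the sketch raises instead of returning a statistic.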

Critical Findings

  1. Data Availability Fixed, Signal Still Missing. We now ingest the Eurostat wheat series and resample it to weekly frequency, yet the substitution features remain noise: cross-commodity ratios do not outperform a random-walk (persistent) baseline.
  2. Baseline Dominance. Persistent and AR(2) baselines dominate every horizon; DM/HLN p-values either stay above 0.05 or show that the models are significantly worse.
  3. Feature Volatility. Rolling averages and ratios introduce heavy edge effects; even after forward-filling and clipping infinities, the CV folds show large fold-to-fold variance and the improvements stay negative.
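The cleanup steps named under Feature Volatility can be illustrated with a small sketch. "Clipping infinities" is ambiguous; this sketch reads it as replacing ±inf with NaN before forward-filling (capping at a finite bound would be the alternative). The function name, the 4-week window and the toy prices are hypothetical.

```python
import numpy as np
import pandas as pd


def ratio_features(potato: pd.Series, wheat: pd.Series, window: int = 4) -> pd.DataFrame:
    """Hypothetical potato/wheat ratio features with the cleanup steps named above."""
    ratio = potato / wheat  # x / 0 gives +/-inf, 0 / 0 gives NaN
    # Treat infinities as missing, then carry the last finite ratio forward.
    ratio = ratio.replace([np.inf, -np.inf], np.nan).ffill()
    # Rolling mean: the first window - 1 rows have no full window (the edge effect).
    smooth = ratio.rolling(window).mean()
    return pd.DataFrame({"ratio": ratio, f"ratio_ma{window}": smooth})


potato = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 14.0])
wheat = pd.Series([20.0, 0.0, 22.0, 0.0, 24.0, 25.0])
feats = ratio_features(potato, wheat)
print(feats)
print(f"rows left after dropna: {len(feats.dropna())} of {len(feats)}")
```

In this toy input two of six ratios are infinite and get replaced by the previous value, and the 4-week rolling mean leaves only three usable rows. Longer windows lose proportionally more history at the start of each series.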

Provenance

  • Git SHA: current working tree (MLflow run IDs recorded under mlruns/FAMILY_CROSS_COMMODITY_SUBSTITUTION_DYNAMICS)
  • Data Versions: Boerderij API git:current, Eurostat APRI_AP_CRPOUTA cached 2025‑11‑11, CBS 85676NED harvest-time transform
  • Code: hypotheses/FAMILY_CROSS_COMMODITY_SUBSTITUTION_DYNAMICS/run.py (real-data modeling pipeline)

Decision Log

  • 2025-11-11: Replaced synthetic wheat proxy with Eurostat prices and reran all variants. Despite real cross-commodity data, every model remained worse than the persistent baseline; family stays REFUTED until a materially better modeling strategy is found.

Caveats and Limitations

  1. Weak Cross-Commodity Signal: Even with Eurostat wheat in place, weekly ratios are extremely noisy, and smoothing them reduces the usable sample count.
  2. Short Window: Only five seasons (2020‑2024) remain after aligning planting/harvest windows → limited CV folds and low power.
  3. Single-Market Target: All models still forecast Dutch prices only; without multi-country targets, substitution shocks may be muted.

Next Actions

  1. Keep family marked REFUTED until a model beats the persistent baseline with DM/HLN significance.
  2. Investigate richer cross-commodity panels (e.g., sugar beet, onions) or Eurostat input-cost indices to strengthen the signal.
  3. Reframe the task around spread forecasting (potato – wheat) rather than absolute price levels to reduce baseline dominance.
  4. Extend the history back before 2015 once the wheat series is backfilled, so that rolling CV covers more than five seasons.
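The spread reframing in item 3 could start from a target like the one sketched below. This is hypothetical and makes two assumptions the notes do not confirm: the potato and wheat series already share one weekly index, and both are expressed in the same price unit (converting units would be a prerequisite). The persistent baseline on the spread simply forecasts the future spread with today's spread.

```python
import pandas as pd


def spread_frame(potato: pd.Series, wheat: pd.Series, h: int = 4) -> pd.DataFrame:
    """Hypothetical potato-minus-wheat spread with its h-week-ahead target."""
    spread = potato - wheat  # assumes both series share the weekly index and price unit
    df = pd.DataFrame({"spread": spread, "target": spread.shift(-h)}).dropna()
    # Persistent baseline on the spread: forecast the target with today's spread.
    df["persistent_error"] = df["target"] - df["spread"]
    return df


weeks = pd.date_range("2024-01-01", periods=6, freq="W-MON")
potato = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0], index=weeks)
wheat = pd.Series(5.0, index=weeks)
print(spread_frame(potato, wheat))
```

Because pandas aligns the subtraction on the index, weeks present in only one series would produce NaN spreads and be dropped rather than silently mismatched.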

Codex validation

Codex Validation — 2025-11-10

Files Reviewed

  • run.py
  • config/*.yaml
  • experiment.md

Findings

  1. Real wheat data restored. run.py now fetches Dutch soft-wheat prices (prod_veg=01110000) directly from Eurostat APRI_AP_CRPOUTA, resamples them to weekly frequency, and refuses to proceed if the series is empty—no fabricated multipliers remain.
  2. Real potatoes unchanged. NL.157.2086 prices still come straight from Boerderij; the merge now forward-fills only real Eurostat observations.
  3. Full modeling now executes. The rerun trains Ridge/ElasticNet/GBM/RF models with rolling CV, compares them to all four baselines, and logs DM/HLN/TOST statistics.
  4. Still worse than baselines. Every horizon shows negative or negligible improvements (e.g., Variant A 30‑day: RMSE 17.3 vs persistent 11.1, DM p = 0.002), so the hypothesis remains refuted despite the real-data hookup.
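The wheat alignment described in finding 1 can be sketched as follows. This is a minimal hypothetical sketch. It assumes the Eurostat series has already been fetched into a date-indexed pandas Series (the fetch is not shown, and no Eurostat client API is implied). It places each real observation on the weekly potato dates with `reindex(method="ffill")` rather than a resample call, which may differ from run.py, and it raises on an empty series as the finding describes.

```python
import pandas as pd


def align_wheat_to_weeks(potato_weekly: pd.Series, wheat: pd.Series) -> pd.DataFrame:
    """Put a lower-frequency wheat price series on the weekly potato grid (hypothetical)."""
    wheat = wheat.dropna()
    if wheat.empty:
        raise ValueError("Eurostat wheat series is empty; refusing to continue")
    # Each week carries the most recent real wheat observation; nothing is interpolated.
    wheat_weekly = wheat.sort_index().reindex(potato_weekly.index, method="ffill")
    df = pd.DataFrame({"potato": potato_weekly, "wheat": wheat_weekly})
    return df.dropna()  # weeks before the first wheat observation are dropped


weeks = pd.date_range("2024-01-01", periods=10, freq="W-MON")
potato = pd.Series(range(10), index=weeks, dtype=float)
wheat = pd.Series(
    [20.0, 21.0, 22.0],
    index=pd.to_datetime(["2024-01-01", "2024-02-01", "2024-03-01"]),
)
aligned = align_wheat_to_weeks(potato, wheat)
print(aligned)
```

Forward-filling a lower-frequency series this way produces a step function on the weekly grid, so week-to-week movement in any potato/wheat ratio comes almost entirely from the potato side between wheat updates.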

Verdict

NOT VALIDATED – Inputs are real (Boerderij potatoes + Eurostat wheat) and the experiment now produces DM/HLN/TOST metrics, but every variant performs worse than the strongest baseline, so the family stays refuted.