Note: this experiment has not yet been Codex-validated. Treat the findings as preliminary indications.

Hypotheses

FAMILY_EUROPEAN_FERTILIZER_CRISIS - Experimental Design

FAMILY_EUROPEAN_FERTILIZER_CRISIS

This family exploits the 2022 European fertilizer crisis as a natural experiment to test how regional input costs transmit to Dutch potato prices. The crisis produced large regional divergences (the German fertilizer series peaked at 216.8, while French costs rose by about 40%), which provides an opportunity to test input cost transmission mechanisms using REAL DATA from European statistical agencies.

Last updated
2025-12-01
Repo path
hypotheses/FAMILY_EUROPEAN_FERTILIZER_CRISIS
Codex file
Missing

Experiment notes

Critical Data Policy

MANDATORY: This experiment uses ONLY REAL DATA from verified repository interfaces:

  • NO synthetic, mock, or dummy data allowed
  • NO generated random numbers or fake datasets
  • NO placeholder values or example data
  • ALL data must be from actual repository interfaces documented in DATA_SOURCE_INDEX

Data Sources (REAL DATA ONLY)

Target Variable

  • Dutch Potato Prices: BoerderijApi.get_data(product_id="NL.157.2086")
  • Interface: src/sources/boerderij_nl/boerderij_nl_api.py
  • Expected: 260+ weekly observations
  • Version: git_sha_ae5b033

Feature Variables

  • German Fertilizer: DestatisAPI via Eurostat fallback - 1,142+ observations with 2022 crisis peak (216.8)
  • French Fertilizer: INSEEPublicAPI via Eurostat fallback - 1,534+ observations with +40% post-2022 impact
  • EU Harmonized: EurostatAPI APRI_PI15_INQ - harmonized fertilizer indices across countries
  • Weather Controls: OpenMeteoApi (lat=52.55, lon=5.55) - GDD and precipitation for production isolation

Mandatory Baseline Requirements

ALL EXPERIMENTS MUST INCLUDE 4 STANDARD BASELINES:

  1. persistent: Current value for next period (random walk)
  2. seasonal_naive: Same period previous year (52-week lag)
  3. ar2: Autoregressive order 2 with trend
  4. historical_mean: Average of all historical values

  • Implementation: MUST use get_standard_baselines() from experiments._shared.baselines
  • Comparison: Against strongest baseline (lowest error)
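
To make the baseline definitions concrete, here is a minimal sketch of three of the four baselines as one-step-ahead forecasts, plus the "strongest baseline" selection. It is not the repository's get_standard_baselines(); all function names below are hypothetical, and ar2 is left out because it needs a regression fit.

```python
from statistics import fmean


def persistent(history):
    """Random-walk forecast: repeat the last observed value."""
    return history[-1]


def seasonal_naive(history, season=52):
    """Forecast the next week with the value observed `season` weeks before it."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    return history[-season]


def historical_mean(history):
    """Forecast with the average of all observed values."""
    return fmean(history)


def strongest_baseline(errors):
    """Pick the baseline name with the lowest error (hypothetical helper).

    `errors` maps a baseline name to its error metric, e.g. MAPE.
    """
    return min(errors, key=errors.get)
```

Note that persistent and historical_mean give different forecasts whenever the last value differs from the long-run average, so they are scored as separate baselines.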

Experimental Design

Rolling-Origin Cross-Validation

  • Method: Time series rolling origin
  • Training Window: Minimum 365 days (1 year)
  • Step Size: 7 days (weekly progression)
  • Test Periods: Maximum 10 horizons
  • Validation Period: Latest 6 months held out
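
The rolling-origin scheme above can be sketched as an index generator over weekly observations. This is an illustration under stated assumptions: the training window expands from the minimum size, 365 days is treated as 52 weekly observations, a 7-day step is one observation, and "maximum 10 horizons" is read as up to 10 steps ahead per origin. The function name is hypothetical.

```python
def rolling_origin_splits(n_obs, min_train=52, step=1, max_horizon=10):
    """Yield (train_indices, test_indices) for rolling-origin evaluation.

    Assumes weekly data sorted by time and an expanding training window.
    """
    origin = min_train
    while origin < n_obs:
        # The test block is truncated at the end of the available data.
        test_end = min(origin + max_horizon, n_obs)
        yield range(0, origin), range(origin, test_end)
        origin += step
```

To respect the held-out validation period, n_obs would be the series length minus the final 6 months (roughly 26 weekly observations, which is also an assumption).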

Natural Experiment Period

  • Crisis Period: 2022 Q1-Q4 (primary analysis window)
  • Pre-Crisis Baseline: 2020-2021 (normal cost relationships)
  • Post-Crisis Recovery: 2023-2024 (normalization patterns)
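
The three periods can be attached to each observation as a label, which also makes it easy to report crisis-period metrics separately. A minimal sketch, assuming calendar-year boundaries; the function and label names are hypothetical.

```python
from datetime import date


def crisis_regime(day):
    """Label a date with the natural-experiment period it falls in."""
    if 2020 <= day.year <= 2021:
        return "pre_crisis"
    if day.year == 2022:
        return "crisis"
    if 2023 <= day.year <= 2024:
        return "post_crisis"
    return "outside_design"


# Example call with an arbitrary mid-2022 date.
label = crisis_regime(date(2022, 6, 15))
```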

Statistical Testing Protocol

Primary Tests (MANDATORY)

  1. Diebold-Mariano Test: Forecast accuracy comparison vs each baseline
  2. Harvey-Leybourne-Newbold Correction: Small sample adjustment for DM test
  3. TOST Equivalence Test: Statistical equivalence testing vs SESOI
  4. Benjamini-Hochberg FDR: Multiple testing correction across variants
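
As an illustration of tests 1, 2, and 4, the sketch below computes the Diebold-Mariano statistic with the Harvey-Leybourne-Newbold small-sample correction, and applies the Benjamini-Hochberg step-up rule to a list of p-values. Squared-error loss is an assumption here (the design does not fix the loss function), and the function names are hypothetical. The corrected statistic is compared against a Student t distribution with n-1 degrees of freedom; the p-value lookup is left to a statistics library.

```python
import math


def dm_hln_statistic(errors_model, errors_baseline, h=1):
    """Return (HLN-corrected DM statistic, degrees of freedom).

    Negative values favour the model. `h` is the forecast horizon.
    """
    # Loss differential under squared-error loss (an assumption).
    d = [a * a - b * b for a, b in zip(errors_model, errors_baseline)]
    n = len(d)
    d_bar = sum(d) / n

    def autocov(k):
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    # Long-run variance with autocovariances up to lag h-1.
    lrv = autocov(0) + 2 * sum(autocov(k) for k in range(1, h))
    if lrv <= 0:
        raise ValueError("non-positive long-run variance estimate")
    dm = d_bar / math.sqrt(lrv / n)
    correction = math.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    return dm * correction, n - 1


def benjamini_hochberg(p_values, alpha=0.05):
    """Return one reject flag per hypothesis under BH FDR control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            max_rank = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= max_rank
    return reject
```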

SESOI (Smallest Effect Size of Interest)

  • MAPE Improvement: 15.0% vs strongest baseline (business significance threshold)
  • Practical Threshold: 20.0% for strong business impact
  • Alpha Levels: Primary 0.05, Equivalence 0.10, FDR 0.05
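
A short sketch of how the SESOI thresholds could be applied. It assumes the 15.0% and 20.0% figures are relative MAPE reductions against the strongest baseline (not percentage points), which the design does not state explicitly. Function and label names are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def relative_improvement(mape_model, mape_baseline):
    """Relative MAPE reduction versus the baseline, in percent."""
    return 100 * (mape_baseline - mape_model) / mape_baseline


def effect_size_label(improvement_pct, sesoi=15.0, strong=20.0):
    """Classify an improvement against the SESOI and the strong-impact threshold."""
    if improvement_pct >= strong:
        return "strong"
    if improvement_pct >= sesoi:
        return "meets_sesoi"
    return "below_sesoi"
```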

Regime Testing (Crisis Detection)

  • Structural Breaks: Bai-Perron test (max 2 breaks, 15% trim)
  • Crisis Detection: CUSUM test (0.05 significance)
  • Regime Switching: Hamilton Markov-switching (2 states)

Variant Specifications

Variant A: German-Dutch Cost Transmission

  • Focus: Direct transmission from German crisis peak (216.8) to Dutch prices
  • Features: German fertilizer lags (4w, 6w, 8w), crisis indicators, peak proximity, DE-NL differentials
  • Hypothesis: German cost crisis creates production disadvantage → reduced German supply → Dutch price increase

Variant B: French-Dutch Competitive Advantage

  • Focus: French moderate rise (+40%) creates competitive advantage vs German extreme costs
  • Features: French fertilizer lags (4w, 6w), FR-DE differentials, competitive indices, advantage indicators
  • Hypothesis: French relative advantage vs German costs → competitive pressure on Dutch positioning

Variant C: Regional Cost Arbitrage

  • Focus: Multi-country cost differentials create arbitrage opportunities affecting Dutch prices
  • Features: Regional CV, max differentials, arbitrage indices, convergence indicators, multi-country pressure
  • Hypothesis: Regional cost arbitrage opportunities → Dutch price adjustments through competitive pressure

Feature Engineering Requirements

Crisis Indicators (2022 Natural Experiment)

  • German Crisis Peak: Binary indicator for periods near 216.8 peak
  • French Moderate Rise: Gradient indicator for +40% cost increase pattern
  • Regional Divergence: Coefficient of variation across DE/FR fertilizer costs
  • Crisis Proximity: Distance measures from peak crisis periods
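
Two of these indicators are simple enough to sketch directly. The 10% tolerance that defines "near the peak" and the use of the population standard deviation are assumptions, as are the function names; only the 216.8 peak value comes from the design.

```python
from statistics import fmean, pstdev

GERMAN_PEAK = 216.8  # German 2022 crisis peak quoted in the design


def regional_cv(de_value, fr_value):
    """Coefficient of variation across the DE and FR fertilizer series for one period."""
    values = [de_value, fr_value]
    return pstdev(values) / fmean(values)


def near_german_peak(de_value, tolerance=0.10):
    """Binary indicator: True when the German series is within `tolerance` of its peak."""
    return de_value >= GERMAN_PEAK * (1 - tolerance)
```

Both functions take one period's values as input and would be applied row by row to the real, weekly-aligned series.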

Lag Structure (Cost Transmission Timing)

  • Primary Lags: 4-8 weeks (based on planting-to-price transmission)
  • Secondary Lags: 2-12 weeks (robustness testing)
  • Seasonal Interactions: Crisis timing × planting season effects
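
The lag structure amounts to shifting each fertilizer series by a fixed number of weeks before joining it to the price series. A minimal sketch on plain lists, assuming the input is already weekly and sorted by time; the names are hypothetical.

```python
def lagged(series, lag):
    """Shift a weekly series by `lag` weeks; positions without history become None."""
    n = len(series)
    lag = min(lag, n)
    return [None] * lag + list(series[: n - lag])


def build_lag_features(series, lags=(4, 6, 8)):
    """Return one lagged copy of the series per lag, keyed like 'lag_4w'."""
    return {f"lag_{k}w": lagged(series, k) for k in lags}
```

Passing lags=range(2, 13) would cover the secondary 2-12 week range used for robustness testing.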

Weather Controls (Production Cost Isolation)

  • Growing Degree Days: Cumulative GDD (base 10°C) for production normalization
  • Precipitation: Weekly/monthly totals for drought stress controls
  • Temperature Stress: Extreme temperature indicators affecting production costs
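
Cumulative GDD with a 10°C base can be computed from daily maximum and minimum temperatures using the standard averaging formula. The sketch below assumes daily (t_max, t_min) pairs in °C as input; the function names are hypothetical.

```python
def daily_gdd(t_max, t_min, base=10.0):
    """Growing degree days for one day: mean temperature above the base, floored at zero."""
    return max(0.0, (t_max + t_min) / 2 - base)


def cumulative_gdd(daily_temps, base=10.0):
    """Running GDD total over a sequence of (t_max, t_min) pairs."""
    total = 0.0
    running = []
    for t_max, t_min in daily_temps:
        total += daily_gdd(t_max, t_min, base)
        running.append(total)
    return running
```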

Quality Controls

Data Quality Requirements

  • Minimum Observations: 200 per data source
  • Maximum Missing Rate: 5% per variable
  • Outlier Detection: Modified Z-score (threshold 3.0)
  • Data Freshness: Maximum 30 days old for current data
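
The first three checks can be sketched with the standard library. The modified Z-score uses the median and the median absolute deviation (MAD) with the usual 0.6745 scaling. Treating "minimum observations" as the count of non-missing values, and returning zero scores when the MAD is zero, are assumptions; the function names are hypothetical.

```python
from statistics import median


def missing_rate(values):
    """Share of entries that are None."""
    return sum(v is None for v in values) / len(values)


def modified_z_scores(values):
    """Modified Z-scores based on the median and MAD (no missing values allowed)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: more than half the values equal the median.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, threshold=3.0):
    """True where the absolute modified Z-score exceeds the threshold."""
    return [abs(z) > threshold for z in modified_z_scores(values)]


def passes_quality(values, min_obs=200, max_missing=0.05):
    """Check the observation-count and missing-rate requirements for one source."""
    present = [v for v in values if v is not None]
    return len(present) >= min_obs and missing_rate(values) <= max_missing
```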

Temporal Alignment

  • Frequency: Weekly alignment for all variables
  • Missing Data: Linear interpolation for gaps <7 days
  • Holiday Adjustments: European holiday calendar alignment
  • Business Day Corrections: Convert to business week alignment

MLflow Logging Requirements

Mandatory Logging

  • Data Versions: Git SHA, API versions, data vintage dates
  • All 4 Baselines: standard baselines (persistent, seasonal_naive, ar2, historical_mean)
  • Feature Importance: Variable importance rankings and coefficients
  • Crisis Period Analysis: Separate metrics for 2022 crisis period
  • Regional Analysis: Country-specific transmission coefficients

Artifact Requirements

  • CV Results: Complete cross-validation results CSV
  • Plots: Time series plots, residual analysis, crisis period highlighting
  • Feature Analysis: Correlation matrices, lag structure analysis
  • Model Diagnostics: Residual plots, prediction intervals

Verdict Criteria

SUPPORTED Requirements

  • Statistical Significance: p < 0.05 on DM+HLN test vs strongest baseline
  • Practical Significance: >15% MAPE improvement vs strongest baseline
  • Crisis Validation: Significant effects during 2022 crisis period
  • Mechanism Validation: Lag structure consistent with cost transmission theory

REFUTED Criteria

  • No Improvement: Performance worse than any standard baseline
  • Wrong Direction: Price effects opposite to cost transmission predictions
  • No Crisis Signal: No significant effects during 2022 natural experiment period

INCONCLUSIVE Criteria

  • Marginal Significance: 0.05 < p < 0.15 with practical effect size
  • Data Limitations: Insufficient observations for robust testing
  • Mixed Results: Variants show conflicting evidence
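
One possible reading of the three verdict blocks as a decision rule is sketched below. The order of the checks, the use of a negative improvement to mean "worse than baseline", and the fall-through to INCONCLUSIVE for every case the criteria do not cover are all assumptions; the function and parameter names are hypothetical.

```python
def verdict(p_value, improvement_pct, crisis_significant, lags_consistent,
            wrong_direction=False, alpha=0.05, sesoi=15.0):
    """Map test results to SUPPORTED, REFUTED, or INCONCLUSIVE.

    `improvement_pct` is the MAPE improvement versus the strongest baseline.
    """
    # Any single refutation criterion is treated as decisive (assumption).
    if improvement_pct < 0 or wrong_direction or not crisis_significant:
        return "REFUTED"
    # SUPPORTED requires all four requirements at once.
    if p_value < alpha and improvement_pct > sesoi and lags_consistent:
        return "SUPPORTED"
    return "INCONCLUSIVE"
```

Data limitations and conflicting variants are not modelled here; they would have to be judged outside this function.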

Expected Timeline

  • Data Collection: 1-2 days (API calls and validation)
  • Feature Engineering: 2-3 days (crisis indicators, lag structures)
  • Model Implementation: 2-3 days (3 variants with full baseline testing)
  • Analysis and Documentation: 1-2 days (results interpretation and verdict)
  • Total: 6-10 days for complete family evaluation

Risk Factors

Data Risks

  • API Reliability: European statistics APIs may have downtime
  • Data Vintage: Fertilizer data may have publication lags
  • Crisis Period Coverage: The crisis window covers only 2022, so crisis-period estimates rest on roughly 52 weekly observations

Methodological Risks

  • Lag Specification: Optimal lag structure may vary by country
  • Structural Breaks: Crisis may create regime changes affecting model stability
  • Confounding: Other 2022 events (energy crisis, war) may confound fertilizer effects

Computational Risks

  • Feature Complexity: Regional differential calculations may be computationally intensive
  • Cross-Validation: Multiple variants with regime testing increases computation time

Success Metrics

Scientific Success

  • Mechanism Validation: Clear evidence of regional cost transmission during crisis
  • Natural Experiment: Successful exploitation of 2022 crisis as identification strategy
  • Methodological Innovation: First systematic regional fertilizer cost analysis in repository

Business Success

  • Forecast Improvement: 15-25% improvement over standard baselines
  • Crisis Preparedness: Framework for detecting future input cost shocks
  • Regional Intelligence: Understanding of cross-border cost dynamics for Dutch market positioning

Notes

This hypothesis exploits a documented natural experiment (the 2022 fertilizer crisis) using verified REAL DATA from multiple European statistical agencies. The large regional cost divergences (a German peak of 216.8 vs a French increase of about 40%) are expected to provide strong identification power for testing the regional cost transmission mechanisms that affect Dutch potato market dynamics.

No Codex summary

Add codex_validated.md to document the status.