Hypotheses
FAMILY_EUROPEAN_FERTILIZER_CRISIS - Experimental Design
FAMILY_EUROPEAN_FERTILIZER_CRISIS
This family exploits the 2022 European fertilizer crisis as a natural experiment to test regional cost transmission effects on Dutch potato prices. The crisis created large regional divergences (German peak of 216.8 vs a French increase of +40%), providing an opportunity to test input cost transmission mechanisms using REAL DATA from European statistical agencies.
Experiment Notes
Critical Data Policy
MANDATORY: This experiment uses ONLY REAL DATA from verified repository interfaces:
- NO synthetic, mock, or dummy data allowed
- NO generated random numbers or fake datasets
- NO placeholder values or example data
- ALL data must be from actual repository interfaces documented in DATA_SOURCE_INDEX
Data Sources (REAL DATA ONLY)
Target Variable
- Dutch Potato Prices: BoerderijApi.get_data(product_id="NL.157.2086")
- Interface: src/sources/boerderij_nl/boerderij_nl_api.py
- Expected: 260+ weekly observations
- Version: git_sha_ae5b033
Feature Variables
- German Fertilizer: DestatisAPI via Eurostat fallback - 1,142+ observations with 2022 crisis peak (216.8)
- French Fertilizer: INSEEPublicAPI via Eurostat fallback - 1,534+ observations with +40% post-2022 impact
- EU Harmonized: EurostatAPI APRI_PI15_INQ - harmonized fertilizer indices across countries
- Weather Controls: OpenMeteoApi (lat=52.55, lon=5.55) - GDD and precipitation for production isolation
Mandatory Baseline Requirements
ALL EXPERIMENTS MUST INCLUDE 4 STANDARD BASELINES:
1. persistent: Current value for next period (random walk)
2. seasonal_naive: Same period previous year (52-week lag)
3. ar2: Autoregressive order 2 with trend
4. historical_mean: Average of all historical values (alias for persistent)
Implementation: MUST use get_standard_baselines() from experiments._shared.baselines
Comparison: Against strongest baseline (lowest error)
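To make the baseline definitions concrete, the following is a minimal sketch of what each of the four forecasts computes for a one-step-ahead prediction on a weekly price series. It is illustrative only: the function names are hypothetical, NumPy is an assumed dependency, and it is not the repository's get_standard_baselines(), which experiments must still call. The sketch follows the stated definition of historical_mean (an average of all history) rather than treating it as an alias.

```python
import numpy as np

# Hypothetical helper names; the real implementation lives in
# experiments._shared.baselines and may differ.

def persistent(y):
    """Random-walk forecast: the next value equals the last observed value."""
    return float(y[-1])

def seasonal_naive(y, season=52):
    """Forecast for the next week is the value observed `season` weeks before it."""
    if len(y) < season:
        raise ValueError("need at least one full season of history")
    return float(y[-season])

def historical_mean(y):
    """Forecast is the average of all observed values."""
    return float(np.mean(y))

def ar2_with_trend(y):
    """One-step forecast from an OLS fit of y_t = c + b*t + a1*y_{t-1} + a2*y_{t-2}."""
    y = np.asarray(y, dtype=float)
    t = np.arange(2, len(y))
    X = np.column_stack([np.ones(len(t)), t, y[1:-1], y[:-2]])
    coef, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
    x_next = np.array([1.0, len(y), y[-1], y[-2]])
    return float(x_next @ coef)
```

The strongest baseline is then whichever of the four has the lowest error over the cross-validation folds.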
Experimental Design
Rolling-Origin Cross-Validation
- Method: Time series rolling origin
- Training Window: Minimum 365 days (1 year)
- Step Size: 7 days (weekly progression)
- Test Periods: Maximum 10 horizons
- Validation Period: Latest 6 months held out
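The split logic above can be sketched as a small generator. This is one reading of the settings, not the repository's implementation: it assumes weekly observations (so 365 days is roughly 52 rows and 6 months roughly 26 rows), an expanding training window, and that "Maximum 10 horizons" means forecasting up to 10 weeks ahead from each origin. The function name and parameters are hypothetical.

```python
def rolling_origin_splits(n_obs, min_train=52, step=1, max_horizon=10, holdout=26):
    """Yield (train_range, test_range) index pairs for rolling-origin CV.

    The final `holdout` observations are reserved for validation and are
    never included in any training or test range.
    """
    usable = n_obs - holdout
    origin = min_train
    while origin < usable:
        # Truncate the test range so it never reaches into the holdout.
        test_end = min(origin + max_horizon, usable)
        yield range(0, origin), range(origin, test_end)
        origin += step
```

Each yielded pair trains on everything before the origin and tests on the weeks that follow it, so no future observation enters a training window.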
Natural Experiment Period
- Crisis Period: 2022 Q1-Q4 (primary analysis window)
- Pre-Crisis Baseline: 2020-2021 (normal cost relationships)
- Post-Crisis Recovery: 2023-2024 (normalization patterns)
Statistical Testing Protocol
Primary Tests (MANDATORY)
- Diebold-Mariano Test: Forecast accuracy comparison vs each baseline
- Harvey-Leybourne-Newbold Correction: Small sample adjustment for DM test
- TOST Equivalence Test: Statistical equivalence testing vs SESOI
- Benjamini-Hochberg FDR: Multiple testing correction across variants
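As a reference for the first two tests, here is a minimal sketch of the Diebold-Mariano statistic with the Harvey-Leybourne-Newbold small-sample correction. It assumes squared-error loss and the usual rectangular window with h-1 autocovariances; the function name is hypothetical. It returns the corrected statistic and the degrees of freedom, and the p-value would then come from a Student t distribution with n-1 degrees of freedom (for example via scipy.stats.t.sf).

```python
import math

def dm_hln_statistic(e_model, e_base, h=1):
    """Return (HLN-corrected DM statistic, degrees of freedom).

    e_model, e_base: forecast errors of the model and the baseline.
    h: forecast horizon in steps. A negative statistic means the model
    has the lower squared-error loss.
    """
    d = [a * a - b * b for a, b in zip(e_model, e_base)]  # loss differential
    n = len(d)
    mean_d = sum(d) / n

    def autocov(k):
        return sum((d[t] - mean_d) * (d[t - k] - mean_d) for t in range(k, n)) / n

    # Long-run variance of the loss differential.
    lrv = autocov(0) + 2 * sum(autocov(k) for k in range(1, h))
    if lrv <= 0:
        raise ValueError("non-positive long-run variance; statistic undefined")
    dm = mean_d / math.sqrt(lrv / n)
    hln = math.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    return dm * hln, n - 1
```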
SESOI (Smallest Effect Size of Interest)
- MAPE Improvement: 15.0% vs strongest baseline (business significance threshold)
- Practical Threshold: 20.0% for strong business impact
- Alpha Levels: Primary 0.05, Equivalence 0.10, FDR 0.05
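The thresholds above can be checked with a few lines of code. The sketch below assumes the 15% SESOI is a relative reduction in MAPE (the text does not say whether it is relative or in percentage points) and applies the standard Benjamini-Hochberg step-up rule at the FDR level of 0.05. All function names are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def relative_improvement(mape_model, mape_baseline):
    """Percent reduction in MAPE relative to the baseline (assumed SESOI definition)."""
    return 100.0 * (mape_baseline - mape_model) / mape_baseline

def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * q.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected
```

With three variants, benjamini_hochberg would receive the three DM+HLN p-values measured against the strongest baseline.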
Regime Testing (Crisis Detection)
- Structural Breaks: Bai-Perron test (max 2 breaks, 15% trim)
- Crisis Detection: CUSUM test (0.05 significance)
- Regime Switching: Hamilton Markov-switching (2 states)
Variant Specifications
Variant A: German-Dutch Cost Transmission
- Focus: Direct transmission from German crisis peak (216.8) to Dutch prices
- Features: German fertilizer lags (4w, 6w, 8w), crisis indicators, peak proximity, DE-NL differentials
- Hypothesis: German cost crisis creates production disadvantage → reduced German supply → Dutch price increase
Variant B: French-Dutch Competitive Advantage
- Focus: French moderate rise (+40%) creates competitive advantage vs German extreme costs
- Features: French fertilizer lags (4w, 6w), FR-DE differentials, competitive indices, advantage indicators
- Hypothesis: French relative advantage vs German costs → competitive pressure on Dutch positioning
Variant C: Regional Cost Arbitrage
- Focus: Multi-country cost differentials create arbitrage opportunities affecting Dutch prices
- Features: Regional CV, max differentials, arbitrage indices, convergence indicators, multi-country pressure
- Hypothesis: Regional cost arbitrage opportunities → Dutch price adjustments through competitive pressure
Feature Engineering Requirements
Crisis Indicators (2022 Natural Experiment)
- German Crisis Peak: Binary indicator for periods near 216.8 peak
- French Moderate Rise: Gradient indicator for +40% cost increase pattern
- Regional Divergence: Coefficient of variation across DE/FR fertilizer costs
- Crisis Proximity: Distance measures from peak crisis periods
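Two of the indicators above can be sketched as follows. This is a hedged illustration, not the repository's feature code. It assumes the German and French indices are already aligned period by period, uses the population standard deviation for the coefficient of variation, and treats "near the peak" as a symmetric window whose width is a free parameter. All names are hypothetical.

```python
import statistics

def regional_cv(de_index, fr_index):
    """Per-period coefficient of variation across the DE and FR fertilizer indices."""
    out = []
    for de, fr in zip(de_index, fr_index):
        values = [de, fr]
        out.append(statistics.pstdev(values) / statistics.mean(values))
    return out

def peak_proximity(index, window=4):
    """Binary flag: 1 if a period lies within `window` periods of the series maximum."""
    peak_pos = max(range(len(index)), key=lambda i: index[i])
    return [1 if abs(i - peak_pos) <= window else 0 for i in range(len(index))]

def periods_from_peak(index):
    """Signed distance in periods from the series maximum (negative before the peak)."""
    peak_pos = max(range(len(index)), key=lambda i: index[i])
    return [i - peak_pos for i in range(len(index))]
```

One caveat: locating the peak on the full series uses information that was not available before the peak occurred. Inside rolling-origin cross-validation, these features would need to be computed from the training window only to avoid look-ahead leakage.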
Lag Structure (Cost Transmission Timing)
- Primary Lags: 4-8 weeks (based on planting-to-price transmission)
- Secondary Lags: 2-12 weeks (robustness testing)
- Seasonal Interactions: Crisis timing × planting season effects
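The lag structure can be built with a simple shift. The sketch below assumes the fertilizer series has already been placed on the same weekly grid as the target. The function names and the feature-naming scheme are hypothetical.

```python
def lagged(series, lag):
    """Shift a series back by `lag` periods; entries without history become None."""
    return [None if i < lag else series[i - lag] for i in range(len(series))]

def build_lag_features(series, lags=(4, 6, 8), name="de_fert"):
    """Return a dict mapping feature names to lagged copies of the series."""
    return {f"{name}_lag{k}w": lagged(series, k) for k in lags}
```

The secondary range (2-12 weeks) can be covered by passing lags=range(2, 13) for the robustness runs.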
Weather Controls (Production Cost Isolation)
- Growing Degree Days: Cumulative GDD (base 10°C) for production normalization
- Precipitation: Weekly/monthly totals for drought stress controls
- Temperature Stress: Extreme temperature indicators affecting production costs
Quality Controls
Data Quality Requirements
- Minimum Observations: 200 per data source
- Maximum Missing Rate: 5% per variable
- Outlier Detection: Modified Z-score (threshold 3.0)
- Data Freshness: Maximum 30 days old for current data
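The first three checks can be sketched as below. The modified Z-score uses the median and the median absolute deviation (MAD) with the conventional 0.6745 scaling constant. Missing values are assumed to be represented as None, and all function names are hypothetical.

```python
import statistics

def missing_rate(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)

def modified_z_outliers(values, threshold=3.0):
    """Indices whose modified Z-score, 0.6745 * (x - median) / MAD, exceeds the threshold."""
    clean = [v for v in values if v is not None]
    med = statistics.median(clean)
    mad = statistics.median([abs(v - med) for v in clean])
    if mad == 0:
        return []  # score undefined when at least half the values equal the median
    return [
        i for i, v in enumerate(values)
        if v is not None and abs(0.6745 * (v - med) / mad) > threshold
    ]

def passes_quality(values, min_obs=200, max_missing=0.05):
    """Check the minimum-observation and maximum-missing-rate requirements."""
    n_valid = sum(v is not None for v in values)
    return n_valid >= min_obs and missing_rate(values) <= max_missing
```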
Temporal Alignment
- Frequency: Weekly alignment for all variables
- Missing Data: Linear interpolation for gaps <7 days
- Holiday Adjustments: European holiday calendar alignment
- Business Day Corrections: Convert to business week alignment
MLflow Logging Requirements
Mandatory Logging
- Data Versions: Git SHA, API versions, data vintage dates
- All 4 Baselines: Standard baselines (persistent, seasonal_naive, ar2, historical_mean)
- Feature Importance: Variable importance rankings and coefficients
- Crisis Period Analysis: Separate metrics for 2022 crisis period
- Regional Analysis: Country-specific transmission coefficients
Artifact Requirements
- CV Results: Complete cross-validation results CSV
- Plots: Time series plots, residual analysis, crisis period highlighting
- Feature Analysis: Correlation matrices, lag structure analysis
- Model Diagnostics: Residual plots, prediction intervals
Verdict Criteria
SUPPORTED Requirements
- Statistical Significance: p < 0.05 on DM+HLN test vs strongest baseline
- Practical Significance: >15% MAPE improvement vs strongest baseline
- Crisis Validation: Significant effects during 2022 crisis period
- Mechanism Validation: Lag structure consistent with cost transmission theory
REFUTED Criteria
- No Improvement: Performance worse than any standard baseline
- Wrong Direction: Price effects opposite to cost transmission predictions
- No Crisis Signal: No significant effects during 2022 natural experiment period
INCONCLUSIVE Criteria
- Marginal Significance: 0.05 < p < 0.15 with practical effect size
- Data Limitations: Insufficient observations for robust testing
- Mixed Results: Variants show conflicting evidence
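One possible way to encode the verdict rules is sketched below. How the criteria combine is an assumption: here any single REFUTED condition is decisive, SUPPORTED requires all four conditions, and every other outcome (including marginal p-values) is INCONCLUSIVE. The function and parameter names are hypothetical.

```python
def verdict(p_value, mape_improvement_pct, crisis_significant, lags_consistent,
            worse_than_a_baseline=False, wrong_direction=False):
    """Map test outcomes to SUPPORTED, REFUTED, or INCONCLUSIVE.

    p_value: DM+HLN p-value vs the strongest baseline.
    mape_improvement_pct: MAPE improvement vs the strongest baseline, in percent.
    """
    # Assumed: any one REFUTED criterion is enough to refute.
    if worse_than_a_baseline or wrong_direction or not crisis_significant:
        return "REFUTED"
    # Assumed: all SUPPORTED criteria must hold together.
    if p_value < 0.05 and mape_improvement_pct > 15.0 and lags_consistent:
        return "SUPPORTED"
    return "INCONCLUSIVE"
```

A verdict per variant can then be compared across Variants A to C; conflicting verdicts correspond to the Mixed Results case.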
Expected Timeline
- Data Collection: 1-2 days (API calls and validation)
- Feature Engineering: 2-3 days (crisis indicators, lag structures)
- Model Implementation: 2-3 days (3 variants with full baseline testing)
- Analysis and Documentation: 1-2 days (results interpretation and verdict)
- Total: 6-10 days for complete family evaluation
Risk Factors
Data Risks
- API Reliability: European statistics APIs may have downtime
- Data Vintage: Fertilizer data may have publication lags
- Crisis Period Coverage: Limited to 2022 observation density
Methodological Risks
- Lag Specification: Optimal lag structure may vary by country
- Structural Breaks: Crisis may create regime changes affecting model stability
- Confounding: Other 2022 events (energy crisis, war) may confound fertilizer effects
Computational Risks
- Feature Complexity: Regional differential calculations may be computationally intensive
- Cross-Validation: Multiple variants with regime testing increases computation time
Success Metrics
Scientific Success
- Mechanism Validation: Clear evidence of regional cost transmission during crisis
- Natural Experiment: Successful exploitation of 2022 crisis as identification strategy
- Methodological Innovation: First systematic regional fertilizer cost analysis in repository
Business Success
- Forecast Improvement: 15-25% improvement over standard baselines
- Crisis Preparedness: Framework for detecting future input cost shocks
- Regional Intelligence: Understanding of cross-border cost dynamics for Dutch market positioning
Notes
This hypothesis represents a unique opportunity to exploit a documented natural experiment (2022 fertilizer crisis) using verified REAL DATA from multiple European statistical agencies. The massive regional cost divergences (German 216.8 peak vs French +40%) provide unprecedented identification power for testing regional cost transmission mechanisms that affect Dutch potato market dynamics.