Hypotheses
FAMILY_SEED_POTATO_FORWARD_SIGNALS: Experiment Log
FAMILY_SEED_POTATO_FORWARD_SIGNALS
Testing seed potato prices as 3-4 month forward indicators for consumption potato prices through cost transmission, acreage decisions, and market expectations. Seed potatoes represent 25-30% of production costs and embed forward-looking supply information. This hypothesis uses REAL DATA ONLY from repository interfaces.
Experiment Notes
Hypothesis Origins
- FAMILY_SEASONAL_PLANTING: INCONCLUSIVE but showed planting period critical
- FAMILY_PRODUCTION_CYCLE: Used production data but overlooked seed price signals
- Data Discovery: BoerderijApi contains seed potato prices (NL.157.seed codes) previously unutilized
- Industry Evidence: Farmers report seed prices drive planting decisions; 25-30% of total costs
- Academic Basis: Haile et al. (2016) input prices and supply response; Kouyaté et al. (2016) seed systems
Experiment Design
- Method: Rolling-origin cross-validation
- Training Window: 365 days minimum
- Step Size: 7 days (weekly)
- Test Window: 60 days maximum
- Baselines: ALL mandatory standard baselines (persistent, seasonal_naive, ar2, historical_mean)
- REAL DATA ONLY: BoerderijApi + CBS API
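The fold layout implied by these settings can be sketched as follows. This is a minimal illustration, not the runner's actual code: the function name `rolling_origin_splits`, the half-open date ranges, and the choice of an expanding (rather than sliding) training window are assumptions, and the date range is hypothetical.

```python
from datetime import date, timedelta


def rolling_origin_splits(start, end, min_train_days=365, step_days=7, max_test_days=60):
    """Yield (train_start, origin, test_end) tuples with half-open date ranges.

    Training covers [train_start, origin); testing covers [origin, test_end).
    ASSUMPTION: the training window expands from `start` (365 days is a minimum).
    """
    origin = start + timedelta(days=min_train_days)
    while origin < end:
        # The test window is capped at max_test_days and truncated at the end of the data.
        test_end = min(origin + timedelta(days=max_test_days), end)
        yield start, origin, test_end
        origin += timedelta(days=step_days)


# Hypothetical date range, chosen only to show the fold layout.
folds = list(rolling_origin_splits(date(2020, 1, 1), date(2021, 3, 1)))
for train_start, origin, test_end in folds[:3]:
    print(train_start, origin, test_end, (test_end - origin).days)
```

Because the step (7 days) is much shorter than the test window (up to 60 days), consecutive test windows overlap heavily, which matters when forecast errors are later compared statistically.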
Data Sources (REAL DATA ONLY)
- Seed Prices: BoerderijApi - NL.157.seed varieties (legacy=true) - git:current
- Consumption Prices: BoerderijApi - NL.157.2086 consumption potatoes - git:current
- Planted Area: CBS API - Table 85677NED for acreage validation - git:current
- Biological Lag: 16 weeks (3-4 months) from planting to harvest
Experiment Runs
Variant A: Simple Forward Signal Model
- Status: Not started
- Model: RandomForest with seed prices at 3-4 month lags
- Features: Seed prices at 12-16 week lags, momentum, seasonal flags
- Horizons: 30-day, 60-day
- Mechanism: Direct forward price transmission
- Expected: 20-25% improvement over seasonal_naive
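As a rough sketch of how the lagged seed-price features for Variant A might be assembled from a weekly series: the function name, the feature names, and the 4-week momentum definition are assumptions, and the toy series below is a placeholder that only demonstrates the indexing (it is not data). Seasonal flags are omitted because they require calendar dates.

```python
def lagged_features(seed_prices, lags=range(12, 17), momentum_weeks=4):
    """Build one feature row per week from a weekly seed-price list.

    Each row holds the seed price 12..16 weeks earlier plus a momentum term,
    defined here (ASSUMPTION) as the change over `momentum_weeks` weeks,
    measured at the shortest lag so that no future information is used.
    """
    lags = list(lags)
    shortest, longest = min(lags), max(lags)
    first_t = max(longest, shortest + momentum_weeks)
    rows = []
    for t in range(first_t, len(seed_prices)):
        row = {"t": t}
        for lag in lags:
            row[f"seed_lag_{lag}"] = seed_prices[t - lag]
        row["seed_momentum"] = (
            seed_prices[t - shortest] - seed_prices[t - shortest - momentum_weeks]
        )
        rows.append(row)
    return rows


# Placeholder series (0.0, 1.0, ..., 19.0) used only to make the index arithmetic visible.
toy_series = [float(i) for i in range(20)]
rows = lagged_features(toy_series)
print(rows[0])
```

Every feature for week `t` uses observations at least 12 weeks old, which is what would make the signal usable at the 30-day and 60-day horizons.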
Variant B: Cost Transmission Ratio Model
- Status: Not started
- Model: GradientBoosting with seed/consumption price ratios
- Features: Price ratios, profitability signals, margin expectations
- Horizons: 30-day, 60-day
- Innovation: Ratio analysis reveals supply response incentives
- Expected: 23-28% improvement (captures economic decision-making)
Variant C: Dynamic Expectations Model
- Status: Not started
- Model: Ensemble (GB 0.4, RF 0.4, Ridge 0.2) with market expectations
- Features: Seed momentum, volatility, correlation dynamics, harvest expectations
- Horizons: 30-day, 60-day
- Complexity: Forward-looking market sentiment from seed trading
- Expected: 25-30% improvement (highest due to expectation embedding)
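The ensemble weights listed for Variant C (GB 0.4, RF 0.4, Ridge 0.2) amount to a weighted average of the three models' predictions. A minimal sketch of that blending step, assuming each model has already produced a prediction list of equal length; the dictionary keys and the function name `blend` are illustrative, and the numbers are placeholders:

```python
WEIGHTS = {"gb": 0.4, "rf": 0.4, "ridge": 0.2}


def blend(predictions, weights=WEIGHTS):
    """Return the weighted average of per-model prediction lists."""
    if set(predictions) != set(weights):
        raise ValueError("predictions and weights must cover the same models")
    lengths = {len(p) for p in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all prediction lists must have the same length")
    total = sum(weights.values())  # normalise in case the weights do not sum to 1
    n = lengths.pop()
    return [
        sum(weights[m] * predictions[m][i] for m in weights) / total
        for i in range(n)
    ]


# Placeholder predictions for two forecast dates.
blended = blend({"gb": [10.0, 12.0], "rf": [20.0, 12.0], "ridge": [30.0, 12.0]})
print(blended)
```

If the models are scikit-learn estimators, `sklearn.ensemble.VotingRegressor` with its `weights` argument performs the same averaging while also handling fitting.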
Statistical Tests
- Diebold-Mariano test with Harvey-Leybourne-Newbold correction
- TOST equivalence test with SESOI = 20% improvement
- Granger causality test for seed→consumption transmission
- Cross-correlation analysis for optimal lag identification
- FDR correction for multiple comparisons
- ALL 4 standard baselines (persistent, seasonal_naive, ar2, historical_mean) included
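For reference, the Diebold-Mariano statistic with the Harvey-Leybourne-Newbold small-sample correction can be computed as sketched below. The squared-error loss, the function name, and the input layout (two aligned lists of forecast errors) are assumptions; `horizon` is counted in forecast steps, and the error values are placeholders.

```python
import math


def dm_hln_statistic(errors_model, errors_baseline, horizon=1):
    """Diebold-Mariano statistic with the Harvey-Leybourne-Newbold correction.

    Uses squared-error loss (ASSUMPTION). A negative value means the model's
    loss is lower than the baseline's. Compare against Student-t with n-1 df.
    """
    if len(errors_model) != len(errors_baseline):
        raise ValueError("error series must be aligned")
    d = [em ** 2 - eb ** 2 for em, eb in zip(errors_model, errors_baseline)]
    n = len(d)
    if n <= horizon:
        raise ValueError("need more observations than the forecast horizon")
    mean_d = sum(d) / n

    def autocov(k):
        return sum((d[t] - mean_d) * (d[t - k] - mean_d) for t in range(k, n)) / n

    # Long-run variance: h-step-ahead errors are autocorrelated up to lag h-1.
    lrv = autocov(0) + 2 * sum(autocov(k) for k in range(1, horizon))
    if lrv <= 0:
        raise ValueError("non-positive long-run variance estimate")
    dm = mean_d / math.sqrt(lrv / n)
    correction = math.sqrt((n + 1 - 2 * horizon + horizon * (horizon - 1) / n) / n)
    return dm * correction


# Placeholder errors: the model is worse than a perfect baseline here.
stat = dm_hln_statistic([1, 2, 1, 2], [0, 0, 0, 0], horizon=1)
print(round(stat, 4))
```

A p-value would come from the Student-t distribution with n-1 degrees of freedom (for example `scipy.stats.t.sf`), and the resulting p-values across variants, horizons, and baselines are what the FDR correction would then adjust.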
Transmission Mechanism Analysis
- Biological lag: 16 weeks (planting to harvest)
- Cost share: 25-30% of production costs
- Transmission rate: ~60% of seed cost changes pass through
- Seed rate: 2,500 kg/ha typical planting density
- Yield: 45 tons/ha average
- Minimum margin: 25% for planting decision
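To make the cost-transmission arithmetic concrete, the parameters above can be combined into a per-tonne effect. This is a back-of-the-envelope sketch: it assumes the ~60% pass-through applies to the seed cost per tonne of harvested potatoes, and the 0.10 EUR/kg seed price change is a hypothetical input, not an observed value.

```python
SEED_RATE_KG_PER_HA = 2500   # typical planting density (from the list above)
YIELD_T_PER_HA = 45          # average yield (from the list above)
PASS_THROUGH = 0.60          # share of seed cost changes passed through


def implied_price_change_per_tonne(seed_price_change_eur_per_kg):
    """Translate a seed price change (EUR/kg) into an implied
    consumption price change (EUR per tonne of harvested potatoes)."""
    cost_change_per_ha = seed_price_change_eur_per_kg * SEED_RATE_KG_PER_HA
    cost_change_per_tonne = cost_change_per_ha / YIELD_T_PER_HA
    return PASS_THROUGH * cost_change_per_tonne


# Hypothetical scenario: seed potatoes become 0.10 EUR/kg more expensive.
effect = implied_price_change_per_tonne(0.10)
print(f"{effect:.2f} EUR/tonne")
```

Under these assumptions, a 0.10 EUR/kg rise adds 250 EUR/ha in seed cost, about 5.56 EUR per tonne of output, of which roughly 3.33 EUR per tonne would be passed through.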
Verdicts
Data Availability Assessment - 2025-08-19
Verdict: DATA_UNAVAILABLE
Issue: Insufficient seed potato price data for weekly/monthly forecasting
Data Investigation Results:
- BoerderijApi: NO seed potato product codes found (checked NL.157.seed, NL.157.5200, NL.157.pootgoed)
- Eurostat: Found seed potato prices (product code 05200000) but ANNUAL frequency only
- Only 10 records available (2020-2022)
- Annual data insufficient for 30/60-day forecasting horizons
- Cannot support 3-4 month biological lag analysis at weekly resolution
Conclusion: FAMILY_SEED_POTATO_FORWARD_SIGNALS cannot be tested without weekly/monthly seed potato price data. The hypothesis requires one of the following:
1. Weekly seed potato prices from market sources (not available)
2. Monthly seed potato prices from statistical offices (not found)
3. Alternative proxy data for seed potato costs (would violate REAL_DATA_ONLY policy)
Recommendation: Archive hypothesis until appropriate data source is identified.
HE Notes
- Created 2025-08-18 to exploit newly discovered seed potato price data
- First analysis using seed prices for forward signaling
- 3-4 month biological lag provides genuine forward visibility
- Seed costs are second largest input after land
- All variants use ONLY REAL DATA from BoerderijApi
- SESOI set at 20% due to strong forward information content
- Critical for early harvest price anticipation
Decision Log
(To be updated after experiment completion)
Codex Validation
Codex Validation — 2025-11-10
Files Reviewed
- run_experiment.py
- experiment.md
- hypothesis.yml
Findings
- Real seed index integrated. The runner now imports the INSEE IPAMPA “Semences et plants” series via pynsee and resamples it to weekly frequency instead of fabricating seed prices from consumption data.
- Consumption data also real. Boerderij weekly consumption prices continue to serve as the target series, so all inputs come from verified sources.
- Experiments still pending. experiment.md remains “DATA_UNAVAILABLE,” meaning no cross-validation or baseline comparisons have been executed yet.
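The findings do not say how the runner resamples the index to weekly frequency. One leakage-safe option is a forward fill, where each week carries the most recent index value dated on or before it. The sketch below assumes that approach; the function name, the input layout, and the index values are hypothetical and do not reflect the runner's actual code or real INSEE figures.

```python
from bisect import bisect_right
from datetime import date


def forward_fill_to_weekly(monthly_obs, week_dates):
    """Map a sorted list of (date, value) observations onto weekly dates.

    Each week receives the latest observation dated on or before it, or None
    if no observation exists yet. No interpolation is done, so a week never
    sees a value from a later month.
    """
    obs_dates = [d for d, _ in monthly_obs]
    weekly = []
    for week in week_dates:
        i = bisect_right(obs_dates, week)
        weekly.append(monthly_obs[i - 1][1] if i > 0 else None)
    return weekly


# Placeholder index values, for illustration only.
monthly_obs = [(date(2024, 1, 1), 100.0), (date(2024, 2, 1), 102.0)]
week_dates = [date(2023, 12, 25), date(2024, 1, 1), date(2024, 1, 29), date(2024, 2, 5)]
weekly = forward_fill_to_weekly(monthly_obs, week_dates)
print(weekly)
```

Linear interpolation between monthly points would look smoother but would leak next month's value into earlier weeks. If the index is published with a delay, dating each observation by its release date rather than its reference month keeps the cross-validation honest.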
Verdict
NOT VALIDATED – The synthetic fallback is gone and the code is connected to real INSEE + Boerderij data, but until the experiments are actually run and compared to price-only baselines, the hypothesis remains unvalidated.