FAMILY_STORAGE_INFORMATION_ASYMMETRY: Experiment Log

Overview

This family tests information asymmetry mechanisms in storage markets. The hypothesis is that operators hold private knowledge about quality deterioration, storage costs, and optimal release timing, and that this knowledge produces predictable patterns 2-6 weeks ahead of spot price movements.

Hypothesis Origins

Prior experiments:
FAMILY_WEATHER_ACCUMULATION (92.4% improvement) demonstrated value of cumulative patterns
FAMILY_STORAGE_DECAY/OPTIMIZATION failed by ignoring information dynamics
FAMILY_CBS_NOWCASTING showed market inefficiency in processing information
Industry catalyst:
October 2024 storage crisis, where operators knew of problems 3-4 weeks before the price spike
February 2023 quality issues, discovered by operators weeks before the market
Trader observations: "Watch releases, not announcements"
Academic basis:
Kyle (1985) informed trader models
Working (1949) storage with private information
Information economics (Akerlof 1970)

Experiment Design

Method: Rolling-origin cross-validation
Initial window: 365 days (1 year minimum)
Step size: 7 days (weekly)
Test windows: 10 horizons maximum
Refit frequency: Every 4 weeks
Baselines: ALL 4 MANDATORY - persistent, seasonal_naive, ar2, historical_mean
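
The rolling-origin scheme can be sketched as follows. This is a minimal illustration, not the code used in the runs: it assumes weekly observations (so the 7-day step is one observation and a 52-week initial window stands in for the 365 days), an expanding training window, and hypothetical function names. The AR(2) baseline is omitted for brevity, and refitting is not modeled because these baselines have no fitted state.

```python
from statistics import mean


def rolling_origin_splits(n_obs, initial_window, step, horizon):
    """Yield (origin, target) index pairs for an expanding-window rolling origin.

    `origin` is the number of training observations; `target` is the index of
    the observation `horizon` steps after the last training observation.
    """
    origin = initial_window
    while origin + horizon - 1 < n_obs:
        yield origin, origin + horizon - 1
        origin += step


def persistent(train, horizon):
    # Random-walk forecast: carry the last observed value forward.
    return train[-1]


def historical_mean(train, horizon):
    # Mean of everything observed so far.
    return mean(train)


def seasonal_naive(train, horizon, season=52):
    # Value observed one season (52 weeks, an assumption) before the target.
    return train[len(train) + horizon - 1 - season]


def mape(actual, predicted):
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


def evaluate(series, forecaster, initial_window=52, step=1, horizon=4):
    """MAPE of `forecaster` over all rolling-origin folds of `series`."""
    actual, predicted = [], []
    for origin, target in rolling_origin_splits(len(series), initial_window, step, horizon):
        train = series[:origin]  # only data available at the forecast origin
        actual.append(series[target])
        predicted.append(forecaster(train, horizon))
    return mape(actual, predicted)
```

Slicing `series[:origin]` inside the loop keeps every forecast free of look-ahead, which matters here because the hypothesis is about lead time.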

Data Sources (REAL DATA ONLY - NO SYNTHETIC/MOCK DATA)

CBS API: Table 84506NED - monthly storage stocks (git:current)
Boerderij.nl API:
NL.157.2086 (consumption potatoes) - weekly prices
NL.157.2083 (fries potatoes) - weekly prices for quality spreads
Open-Meteo API: Hourly temperature/humidity at 52.55°N, 5.55°E
Version control: All sources at git:current, pinned at experiment runtime

Experiment Runs

Variant A: Storage Release Signaling

Status: Completed 2025-08-18
- Model: Gradient Boosting
- Features: Release volumes (1-2w lags), acceleration, small release indicators, cumulative 4w patterns
- Information lead: 2-4 weeks
- Target: Test if early small releases signal larger movements
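
The Variant A feature set could be built along these lines. This is a sketch under stated assumptions: `releases` is a hypothetical weekly release-volume series, acceleration is taken as the second difference of lagged releases, and a "small release" is defined (arbitrarily) as a positive release below half the trailing 4-week mean. None of these definitions are confirmed by the log.

```python
def release_features(releases, t, small_fraction=0.5):
    """Features for forecasting at week t, using only releases up to week t-1.

    `releases` is a hypothetical list of weekly release volumes; the
    `small_fraction` threshold is an illustrative assumption.
    """
    if t < 4 or t > len(releases):
        raise ValueError("need at least 4 past weeks and t within the series")
    lag1, lag2, lag3 = releases[t - 1], releases[t - 2], releases[t - 3]
    cumulative_4w = sum(releases[t - 4:t])
    trailing_mean = cumulative_4w / 4
    return {
        "release_lag1": lag1,
        "release_lag2": lag2,
        # Second difference: change in the week-over-week change.
        "acceleration": (lag1 - lag2) - (lag2 - lag3),
        "small_release": 1 if 0 < lag1 < small_fraction * trailing_mean else 0,
        "cumulative_4w": cumulative_4w,
    }
```

Every feature is indexed strictly before week `t`, so the same function can be called at each rolling origin without leaking the target week.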

Variant B: Quality Information Asymmetry

Status: Completed 2025-08-18
- Model: Random Forest
- Features: Quality spreads, spread widening rate, temperature/humidity stress, storage duration
- Information lead: 2-6 weeks
- Target: Test if operators know deterioration before market discovery
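
A minimal sketch of the two spread features, assuming the quality spread is the fries price minus the consumption price (the log only says it is a price differential, so the sign convention is an assumption) and the widening rate is the average weekly change over a trailing window. Function names are hypothetical.

```python
def quality_spread(consumption_prices, fries_prices):
    """Weekly spread between the two Boerderij.nl series (assumed: fries - consumption)."""
    return [f - c for c, f in zip(consumption_prices, fries_prices)]


def widening_rate(spread, t, window=4):
    """Average weekly change in the spread over the `window` weeks ending at t-1."""
    if t - 1 - window < 0:
        raise ValueError("not enough history for the requested window")
    return (spread[t - 1] - spread[t - 1 - window]) / window
```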

Variant C: Inventory Position Revelation

Status: Completed 2025-08-18
- Model: Ensemble (GB + RF + Ridge)
- Features: Large holder proxy, release clustering, inventory drawdown, market power, strategic timing
- Information lead: 3-4 weeks
- Target: Test if large holder positioning reveals through cumulative patterns

Statistical Tests

Diebold-Mariano test with Harvey-Leybourne-Newbold correction
TOST equivalence test with SESOI = 10% improvement
Chow test for storage season regime breaks (Oct, Jan)
CUSUM for gradual information revelation
FDR correction for multiple comparisons
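
For reference, the Diebold-Mariano statistic with the Harvey-Leybourne-Newbold small-sample correction can be computed as below. This is a sketch, not the implementation used in the runs: it assumes an absolute-error loss and a rectangular-kernel long-run variance with `horizon - 1` autocovariance lags, and the function name is hypothetical. The p-value would then come from a Student-t distribution with `n - 1` degrees of freedom (for example `scipy.stats.t.sf`, if SciPy is available).

```python
import math


def dm_hln(errors_model, errors_baseline, horizon, loss=abs):
    """Return (HLN-corrected DM statistic, degrees of freedom).

    A negative statistic means the model has lower average loss than the baseline.
    """
    d = [loss(a) - loss(b) for a, b in zip(errors_model, errors_baseline)]
    n = len(d)
    d_bar = sum(d) / n

    def autocov(k):
        return sum((d[i] - d_bar) * (d[i - k] - d_bar) for i in range(k, n)) / n

    # h-step-ahead forecast errors are serially correlated up to lag h-1.
    long_run_var = autocov(0) + 2 * sum(autocov(k) for k in range(1, horizon))
    if long_run_var <= 0:
        raise ValueError("non-positive long-run variance; statistic undefined")

    dm = d_bar / math.sqrt(long_run_var / n)
    # Harvey-Leybourne-Newbold small-sample correction factor.
    correction = math.sqrt((n + 1 - 2 * horizon + horizon * (horizon - 1) / n) / n)
    return dm * correction, n - 1
```

With only 20 folds per variant, the correction and the t reference distribution matter more than they would in a long sample.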

Verdicts

Run 2025-08-18: All Variants - Initial Implementation

Variant A: Storage Release Signaling

Data Versions:
- Boerderij.nl prices: 2021-01-01 to 2024-12-31 (168 weekly observations)
- CBS storage data: Not available (used price volatility proxies)
- Git SHA: exp/FAMILY_SEASONAL_PLANTING/variants_abc

Rolling CV Results:
- Training window: 52 weeks minimum
- Test periods: 20 folds
- Horizon: 4 weeks ahead

Model Performance:
- Model MAPE: 22.36%
- Model RMSE: Not calculated
- Directional accuracy: Not calculated

Baseline Comparison:
- Model: MAPE = 22.36%
- Persistent baseline: MAPE = 22.04% (improvement: -1.4%)
- Seasonal naive baseline: MAPE = 40.16% (improvement: +44.3%)
- AR2 baseline: MAPE = 23.56% (improvement: +5.1%)
- Naive baseline: MAPE = 22.04% (improvement: -1.4%); identical to persistent, and the historical_mean baseline listed in the design is not reported
- Strongest competitor: persistent (22.04%)
- Primary improvement: -1.4% vs persistent baseline
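
The improvement figures appear consistent with a relative MAPE reduction, (baseline - model) / baseline. The sketch below recomputes the Variant A values from the rounded MAPEs reported above; the formula is inferred from the numbers rather than stated in the log, and small differences in the last digit are expected because the log's figures were presumably computed from unrounded values.

```python
def improvement_pct(baseline_mape, model_mape):
    """Relative MAPE reduction in percent; negative means the model is worse."""
    return 100.0 * (baseline_mape - model_mape) / baseline_mape


MODEL_MAPE = 22.36  # Variant A, as reported
BASELINES = {"persistent": 22.04, "seasonal_naive": 40.16, "ar2": 23.56}

for name, baseline in BASELINES.items():
    print(f"{name}: {improvement_pct(baseline, MODEL_MAPE):+.2f}%")
```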

Statistical Tests:
- DM test vs persistent: p-value = 0.7551 (not significant)
- SESOI threshold: 10%
- Practical significance: NO

Verdict: REFUTED (worse than baseline)

Caveats:
- Model performed worse than simple persistent baseline
- No actual storage volume data available (CBS table didn't have the expected data)
- Used price volatility as proxy for release patterns, which may not capture true information signals
- Information lead of 2-4 weeks not validated due to poor performance

Variant B: Quality Information Asymmetry

Data Versions:
- Boerderij.nl consumption prices: 2021-01-01 to 2024-12-31
- Boerderij.nl fries prices: 2021-01-01 to 2024-12-31
- Quality spread calculated from price differential
- Weather data: Failed to load (connection issue)
- Git SHA: exp/FAMILY_SEASONAL_PLANTING/variants_abc

Rolling CV Results:
- Training window: 52 weeks minimum
- Test periods: 20 folds
- Horizon: 4 weeks ahead

Model Performance:
- Model MAPE: 23.72%
- Random Forest with quality spread features

Baseline Comparison:
- Model: MAPE = 23.72%
- Persistent baseline: MAPE = 22.04% (improvement: -7.6%)
- Seasonal naive baseline: MAPE = 40.16% (improvement: +40.9%)
- AR2 baseline: MAPE = 23.56% (improvement: -0.7%)
- Naive baseline: MAPE = 22.04% (improvement: -7.6%); identical to persistent, and the historical_mean baseline listed in the design is not reported
- Strongest competitor: persistent (22.04%)
- Primary improvement: -7.6% vs persistent baseline

Statistical Tests:
- DM test vs persistent: p-value = 0.9875 (not significant)
- SESOI threshold: 12% (the experiment design specifies 10%)
- Practical significance: NO

Verdict: REFUTED (worse than baseline)

Caveats:
- Quality spread signal not predictive at 2-6 week horizon
- Weather data unavailable, limiting deterioration modeling
- Model MAPE was 7.6% worse than the persistent baseline, although the difference is not statistically significant (DM p = 0.9875)
- Information asymmetry hypothesis not supported by empirical evidence

Variant C: Inventory Position Revelation

Data Versions:
- Boerderij.nl NL prices: 2021-01-01 to 2024-12-31
- Attempted BE/DE cross-market data but no overlapping observations in date range
- Market microstructure proxied through bid-ask spreads
- Git SHA: exp/FAMILY_SEASONAL_PLANTING/variants_abc

Rolling CV Results:
- Training window: 52 weeks minimum
- Test periods: 20 folds
- Horizon: 4 weeks ahead

Model Performance:
- Ensemble model (GB 40% + RF 40% + Ridge 20%)
- Model MAPE: 22.89%
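
The 40/40/20 weighting amounts to a fixed weighted average of the three component forecasts. A minimal sketch, assuming each component has already produced a point forecast for the same target week; the dictionary keys and function name are hypothetical:

```python
ENSEMBLE_WEIGHTS = {"gb": 0.4, "rf": 0.4, "ridge": 0.2}  # weights as reported in the log


def ensemble_forecast(predictions, weights=ENSEMBLE_WEIGHTS):
    """Combine per-model point forecasts with fixed weights that sum to 1."""
    if set(predictions) != set(weights):
        raise ValueError("predictions and weights must cover the same models")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * predictions[name] for name in weights)
```

For example, component forecasts of 20.0 (GB), 22.0 (RF), and 25.0 (Ridge) combine to 0.4 * 20.0 + 0.4 * 22.0 + 0.2 * 25.0 = 21.8.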

Baseline Comparison:
- Model: MAPE = 22.89%
- Persistent baseline: MAPE = 22.04% (improvement: -3.8%)
- Seasonal naive baseline: MAPE = 40.16% (improvement: +43.0%)
- AR2 baseline: MAPE = 23.56% (improvement: +2.9%)
- Naive baseline: MAPE = 22.04% (improvement: -3.8%); identical to persistent, and the historical_mean baseline listed in the design is not reported
- Strongest competitor: persistent (22.04%)
- Primary improvement: -3.8% vs persistent baseline

Statistical Tests:
- DM test vs persistent: p-value = 0.7312 (not significant)
- SESOI threshold: 11% (the experiment design specifies 10%)
- Practical significance: NO

Verdict: REFUTED (worse than baseline)

Caveats:
- Cross-market data not available for the test period
- Large holder positioning proxied through price patterns only
- Ensemble model still underperformed the persistent baseline
- Information revelation hypothesis not supported
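
The design lists an FDR correction across comparisons but does not name the procedure. Assuming Benjamini-Hochberg, the adjustment applied to the three reported DM p-values (0.7551, 0.9875, 0.7312) can be sketched as follows; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


dm_p_values = [0.7551, 0.9875, 0.7312]  # Variants A, B, C vs persistent
print(benjamini_hochberg(dm_p_values))
```

Since none of the raw p-values is anywhere near conventional significance levels, the adjustment cannot change any verdict here; it would only matter if a later run produced borderline results.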

MLflow Experiment: FAMILY_STORAGE_INFORMATION_ASYMMETRY
Artifacts: CV results saved as CSV files

HE Notes

Created 2025-08-18 focusing on INFORMATION ASYMMETRY not physical processes
Key innovation: Exploits revelation patterns over 2-6 week windows
Differentiator: Unlike failed storage families, focuses on information timing advantages
All variants use ONLY REAL DATA from repository interfaces
Critical validation periods: 2024 Q4 crisis, 2023 Q1 quality issues
Expected 10-15% improvement through information lead time advantages

Decision Log

2025-08-18: Initial Run - All Variants REFUTED

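Summary: All three variants of the storage information asymmetry hypothesis were refuted. Every model had a higher MAPE than the simple persistent baseline (by 1.4% to 7.6%), although none of the differences was statistically significant (DM p-values between 0.73 and 0.99).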

Key Findings:
1. Data Limitations: CBS storage volume data was not available in the expected format. Had to use price volatility as proxy for storage releases, which likely doesn't capture true information signals.
2. Baseline Strength: The persistent baseline (random walk) proved remarkably strong for 4-week ahead predictions, suggesting limited predictability from information asymmetry features.
3. Missing Cross-Market Data: International price data (BE/DE) had no overlapping observations with the test period, limiting cross-market information flow analysis.
4. Weather Data Issues: Connection problems prevented incorporation of temperature/humidity stress features for quality deterioration modeling.

Lessons Learned:
- Information asymmetry may exist but not manifest in predictable price patterns at 2-6 week horizons
- Quality spreads (consumption vs fries prices) showed no predictive value for future spot prices in this run, which lacked the weather stress features
- Market microstructure proxies from bid-ask spreads were insufficient to capture large holder positioning

Next Steps:
1. Data Enhancement: Need actual CBS storage volume data (may be in different table)
2. Shorter Horizons: Information advantage may be more relevant at 1-2 week horizons rather than 4 weeks
3. Alternative Hypothesis: Consider that markets may be more efficient than expected in incorporating storage information

Verdict: FAMILY_STORAGE_INFORMATION_ASYMMETRY hypothesis REFUTED across all variants. The hypothesized information revelation patterns over 2-6 week windows do not provide predictive power beyond simple baselines.

Codex Validation — 2025-11-10

Files Reviewed

Findings

Verdict