Experiments
FAMILY_SATELLITE_STORAGE_INFERENCE: Complete Experimental Framework
**CRITICAL**: This experiment will use REAL DATA ONLY from repository interfaces. NO synthetic/mock data allowed.
Executive Summary
The FAMILY_SATELLITE_STORAGE_INFERENCE experiment framework validates the hypothesis that satellite vegetation indices during critical growth periods (60-70 DAP) combined with storage depletion modeling can predict 3-6 month aggregated potato prices at quarterly and strategic planning horizons.
Key Innovation: Unlike immediate yield-price prediction models, this framework focuses on storage quality inference from growth-season satellite stress indicators, targeting longer-horizon aggregated price forecasts essential for quarterly planning and strategic decision-making.
Building on Proven Success
FAMILY_SEASONAL_PLANTING Foundation
Established Performance:
- 36.9% improvement at 4 weeks (STRONGLY SUPPORTED)
- 24.2% improvement at 8 weeks (STRONGLY SUPPORTED)
- 7.0% improvement at 12 weeks (MODERATELY SUPPORTED)
Key Lessons Applied:
1. Diminishing returns with horizon: Performance decreases as forecast horizon increases
2. Satellite signal optimization: 60-70 DAP critical growth window identified
3. Real data methodology: Complete resolution of data leakage and validation issues
4. BRP masking success: Effective parcel-level potato area identification
Performance Expectations
Based on the FAMILY_SEASONAL_PLANTING diminishing returns pattern, extended to longer horizons:
- 3-month aggregated prices: Target 10-20% improvement
- 6-month aggregated prices: Target 5-15% improvement
- Strategic focus: Quarterly/semi-annual planning rather than tactical decisions
Hypothesis Origins and Provenance
Academic Foundation
- Storage quality prediction: Van Der Velde et al. (2012) - Crop monitoring predicts post-harvest characteristics
- Vegetation stress impacts: Lobell & Burke (2010) - Satellite-derived stress affects storage potential
- Dutch potato storage patterns: Haverkort & Hillier (2011) - Quality-based release timing
Industry Context
- FIWAP April stock surveys: Storage facilities report quality-based release patterns
- Processing industry feedback: McCain, Lamb Weston report quality variations affect contract timing
- Dutch trader observations: Vegetation stress during growth predicts storage season price volatility
Prior Experimental Evidence
- FAMILY_SEASONAL_PLANTING: Validated satellite-price mechanism at shorter horizons
- Storage quality indicators: April 1st stock surveys show quality-price relationships
- Growth period criticality: 60-70 DAP established as optimal vegetation monitoring window
Core Hypothesis and Mechanisms
Primary Hypothesis
Satellite vegetation indices during critical growth periods (60-70 DAP) combined with storage depletion modeling can predict 3-6 month aggregated potato prices at 3, 6, and 9 month forward horizons with meaningful business impact.
Expected Mechanisms
1. Growth Stress → Storage Quality Pathway
- NDVI stress events during tuber development (Jun-Jul) predict storage potential
- Lower vegetation health → Reduced storage longevity → Earlier market release
- Higher stress levels → Accelerated deterioration → Compressed storage season
2. Quality → Seasonal Pricing Dynamics
- Better storage quality = Longer retention capability = Lower spring prices
- Reduced quality = Forced early sales = Lower autumn prices, higher spring prices
- Storage duration variability = Predictable seasonal price patterns
3. Depletion Modeling Framework
- Initial quality assessment (satellite-derived) → Storage release curves
- Weather-adjusted deterioration → Monthly depletion estimates
- Market timing prediction → Aggregated price forecasts
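The depletion pathway above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the experiment's implementation: the function names, the exponential decay form, and the parameters `base_rate` and `stress_sensitivity` are all hypothetical and would need to be fitted to real storage data.

```python
import math


def monthly_remaining_stock(stress_score, months=8, base_rate=0.10,
                            stress_sensitivity=0.5):
    """Fraction of the initial stock still in storage after each month.

    Hypothetical model: stock decays exponentially, and a higher
    growth-season stress score (0 = none, 1 = severe) speeds up the decay.
    """
    if not 0.0 <= stress_score <= 1.0:
        raise ValueError("stress_score must be in [0, 1]")
    rate = base_rate * (1.0 + stress_sensitivity * stress_score)
    return [math.exp(-rate * month) for month in range(1, months + 1)]


def monthly_release(remaining):
    """Share of the initial stock released to the market in each month."""
    released = []
    previous = 1.0  # the full stock is in storage at the start
    for value in remaining:
        released.append(previous - value)
        previous = value
    return released


healthy = monthly_remaining_stock(stress_score=0.0)
stressed = monthly_remaining_stock(stress_score=1.0)
```

In this toy setup a stressed crop leaves storage faster in every month, which is the "compressed storage season" effect the hypothesis relies on.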
Experimental Design and Implementation
Target Variables (3-6 Month Aggregated Prices)
3-Month Aggregated Predictions
- Target horizons: 3, 6, 9 months forward
- Aggregation method: Rolling 3-month mean prices (quarterly planning)
- Business application: Quarterly budget planning, contract negotiation timing
6-Month Aggregated Predictions
- Target horizons: 3, 6, 9 months forward
- Aggregation method: Rolling 6-month mean prices (strategic planning)
- Business application: Annual planning, storage facility investment decisions
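A forward-looking aggregated target can be sketched as follows. The helper name and the `lead_weeks` convention (the window starts `lead_weeks` after the forecast week, so it never includes the current price) are assumptions made for illustration; the text does not pin down the exact window alignment.

```python
from statistics import fmean


def forward_mean_targets(prices, lead_weeks, window_weeks):
    """Mean of the `window_weeks` prices starting `lead_weeks` ahead of week t.

    Returns None for weeks whose future window is incomplete, so those
    rows can be dropped instead of silently using a shorter window.
    """
    if lead_weeks < 1:
        raise ValueError("lead_weeks must be >= 1 so only future prices are used")
    targets = []
    for t in range(len(prices)):
        window = prices[t + lead_weeks:t + lead_weeks + window_weeks]
        targets.append(fmean(window) if len(window) == window_weeks else None)
    return targets


# Illustrative numbers only, used to exercise the function (not experiment data)
example = forward_mean_targets([1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                               lead_weeks=1, window_weeks=3)
```

With 4 weeks per month, `window_weeks=12` would correspond to the 3-month aggregate and `window_weeks=24` to the 6-month aggregate.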
Variant Structure
Variant A: Growth Stress Quality Assessment (SESOI: 15%)
- Focus: Pure satellite stress indicators during 60-70 DAP
- Features: NDVI stress events, vegetation decline rates, drought indicators
- Model: Random Forest with temporal stress aggregations
- Horizon: 3, 6, 9 months (3-month aggregated targets)
Variant B: Storage Depletion Modeling (SESOI: 20%)
- Focus: Quality-based storage release curve modeling
- Features: Initial stress assessment + storage condition modeling
- Model: Ensemble combining stress detection with depletion curves
- Horizon: 3, 6, 9 months (6-month aggregated targets)
Variant C: Integrated Storage Intelligence (SESOI: 25%)
- Focus: Complete storage season intelligence framework
- Features: Satellite stress + April stocks + weather + seasonal patterns
- Model: Gradient Boosting with multi-source feature integration
- Horizon: 3, 6, 9 months (both 3 and 6-month aggregated targets)
Data Sources and Architecture
100% REAL DATA SOURCES
1. Satellite Data: Sentinel-2 Zarr Store
data_source:
  name: sentinel_zarr
  interface: "xr.open_zarr"
  path: "data/zarr_stores/lake_31UFU_medium.zarr"
  version: "2015-2023 historical"
  verification: "75GB real satellite imagery, no synthetic data"
2. Price Data: Full 25-Year History
data_source:
  name: price_data
  interface: "BoerderijApi().get_data()"
  product_id: "NL.157.2086"
  coverage: "2000-2024, 25 years weekly data"
  verification: "Real market data, €4.12-€57.50 range validated"
3. Storage Intelligence: April Stock Surveys
data_source:
  name: stock_surveys
  interface: "StockAPI()"
  coverage: "FIWAP Belgium 2010-2025, CNIPT France 2022-2024"
  verification: "Real survey data, contract/free market splits"
4. Weather Context: Storage Conditions
data_source:
  name: weather_data
  interface: "OpenMeteoApi()"
  variables: "temperature, humidity, precipitation for storage modeling"
  verification: "Real meteorological observations, no synthetic generation"
5. Field Boundaries: Spatial Aggregation
data_source:
  name: parcel_data
  interface: "BRPApi().get_consumption_potato_mask()"
  verification: "Real Dutch agricultural parcel boundaries, 304 hectares validated"
Methodology and Statistical Framework
Time Series Validation (NO DATA LEAKAGE)
Building on FAMILY_SEASONAL_PLANTING corrected methodology:
# CORRECT walk-forward validation (FAMILY_SEASONAL_PLANTING verified)
horizon_weeks = horizon_months * 4  # 4 weeks per month
# Stop early enough that every target window is complete
for i in range(min_train_size, n_samples - horizon_weeks + 1):
    train_data = df[:i]  # Only historical data
    target_period = df[i:i + horizon_weeks]
    aggregated_target = target_period['price'].mean()  # 3 or 6-month mean
    # Predict aggregated price without using any future data
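The index bookkeeping behind such a loop can be sketched as a small generator. The name `walk_forward_splits` and the embargo rule (the last `horizon_weeks` rows before the forecast origin are excluded from fitting, because their forward-looking targets would overlap the evaluation window) are assumptions for illustration, chosen to be conservative.

```python
def walk_forward_splits(n_samples, min_train_size, horizon_weeks, step=1):
    """Yield (fit_end, origin, target_end) index triples.

    Rows [0, fit_end) may be used for fitting, the forecast is issued at
    row `origin`, and the target aggregates rows [origin, target_end).
    """
    if min_train_size <= horizon_weeks:
        raise ValueError("min_train_size must exceed horizon_weeks")
    # Stop early enough that every target window is complete
    for origin in range(min_train_size, n_samples - horizon_weeks + 1, step):
        fit_end = origin - horizon_weeks  # embargo against target overlap
        yield fit_end, origin, origin + horizon_weeks


splits = list(walk_forward_splits(n_samples=20, min_train_size=10,
                                  horizon_weeks=4, step=5))
```

Here `splits` holds two folds, with forecast origins at rows 10 and 15.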
Mandatory Baseline Testing (ALL 4 REQUIRED)
from experiments._shared.baselines import get_standard_baselines
baseline_forecasts = get_standard_baselines(
    train_data=train_df,
    horizon=horizon_weeks,  # 12, 24, 36 weeks for 3, 6, 9 month horizons
    target_col='aggregated_price'
)
# Must include: persistent, seasonal_naive, ar2, historical_mean
# Compare against STRONGEST baseline (lowest MAE)
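For readers without access to the repository helper, the four named baselines and the "strongest baseline" rule can be approximated as below. This is a stand-in sketch, not the code behind `get_standard_baselines`: the function signatures are hypothetical, and the AR(2) coefficients are estimated with the Yule-Walker equations as one simple choice among several.

```python
from statistics import fmean


def persistent(history, horizon):
    return history[-1]


def seasonal_naive(history, horizon, season_length=52):
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    idx = len(history) + horizon - 1  # position of the forecast target
    while idx >= len(history):
        idx -= season_length  # step back whole seasons into the history
    return history[idx]


def historical_mean(history, horizon):
    return fmean(history)


def ar2(history, horizon):
    mu = fmean(history)
    dev = [y - mu for y in history]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        return mu  # constant series: nothing to model
    r1 = sum(a * b for a, b in zip(dev[1:], dev[:-1])) / c0
    r2 = sum(a * b for a, b in zip(dev[2:], dev[:-2])) / c0
    phi1 = r1 * (1 - r2) / (1 - r1 * r1)  # Yule-Walker solution
    phi2 = (r2 - r1 * r1) / (1 - r1 * r1)
    prev, last = dev[-2], dev[-1]
    for _ in range(horizon):  # iterate one-step forecasts
        prev, last = last, phi1 * last + phi2 * prev
    return mu + last


def strongest_baseline(forecasts_by_name, actuals):
    """Return (name, MAE) of the baseline with the lowest MAE."""
    scores = {name: fmean([abs(f - a) for f, a in zip(fc, actuals)])
              for name, fc in forecasts_by_name.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]
```

Comparing a model against whichever baseline wins here, rather than a fixed one, keeps the reported improvement from being inflated by a weak reference.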
Statistical Validation Framework
- Diebold-Mariano Test: Statistical significance of improvement
- Harvey-Leybourne-Newbold Correction: Small sample adjustment
- TOST Equivalence Test: Practical significance validation
- FDR Correction: Multiple comparison adjustment
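The first two items can be sketched in a few lines. The function below computes the Diebold-Mariano statistic on absolute-error loss and applies the Harvey-Leybourne-Newbold small-sample factor. The function name is hypothetical, and the p-value uses a normal approximation to stay within the standard library; HLN recommend a Student-t distribution with n-1 degrees of freedom, which matters for small samples.

```python
import math
from statistics import NormalDist, fmean


def dm_test_hln(errors_model, errors_baseline, horizon=1):
    """Return (HLN-corrected DM statistic, approximate two-sided p-value).

    A negative statistic means the model has the lower absolute-error loss.
    """
    d = [abs(m) - abs(b) for m, b in zip(errors_model, errors_baseline)]
    n = len(d)
    d_bar = fmean(d)

    def autocov(k):
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    # Long-run variance of the loss differential using lags 0..horizon-1
    long_run_var = autocov(0) + 2 * sum(autocov(k) for k in range(1, horizon))
    if long_run_var <= 0:
        raise ValueError("non-positive variance estimate; test is undefined")
    dm = d_bar / math.sqrt(long_run_var / n)
    hln = math.sqrt((n + 1 - 2 * horizon + horizon * (horizon - 1) / n) / n)
    stat = dm * hln
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))  # normal approximation
    return stat, p_value
```

For multi-step forecasts, `horizon` should match the number of steps ahead, since overlapping forecast errors are autocorrelated up to that lag.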
Technical Implementation Plan
Phase 1: Data Pipeline Construction (Week 1)
Satellite Processing Pipeline
import xarray as xr
from datetime import datetime

# 1. Load real satellite data (NO synthetics)
sentinel_data = xr.open_zarr("data/zarr_stores/lake_31UFU_medium.zarr")
# 2. Apply BRP masking for potato parcels
potato_mask = brp_api.get_consumption_potato_mask(
    bbox=netherlands_bbox,
    date_range=(datetime(2015, 1, 1), datetime(2023, 12, 31)),
    target_crs='EPSG:32631'
)
# 3. Extract vegetation indices for 60-70 DAP period
ndvi_growth = calculate_growth_period_ndvi(
    sentinel_data, potato_mask,
    dap_start=60, dap_end=70
)
# 4. Detect stress events and quality indicators from the growth-period NDVI
stress_indicators = detect_growth_stress(
    ndvi_growth,
    weather_context=openmeteo_api.get_weather_data()
)
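Step 3 rests on two small calculations: the NDVI formula and the translation of a planting date into a 60-70 DAP calendar window. The sketch below shows both in isolation. The helper names and the `(date, nir, red)` observation tuples are hypothetical and do not reflect how `calculate_growth_period_ndvi` is actually implemented; for Sentinel-2, NIR and red correspond to bands B08 and B04.

```python
from datetime import date, timedelta


def ndvi(nir, red):
    """Normalized difference vegetation index, or None if undefined."""
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


def dap_window(planting_date, dap_start=60, dap_end=70):
    """Calendar dates spanning dap_start..dap_end days after planting."""
    return (planting_date + timedelta(days=dap_start),
            planting_date + timedelta(days=dap_end))


def mean_ndvi_in_window(observations, planting_date, dap_start=60, dap_end=70):
    """Average NDVI over observations that fall inside the DAP window."""
    start, end = dap_window(planting_date, dap_start, dap_end)
    values = [ndvi(nir, red) for day, nir, red in observations
              if start <= day <= end]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None


# A planting date of 20 April puts the window at 19-29 June
window = dap_window(date(2023, 4, 20))
```

Returning None when no cloud-free observation falls inside the window makes missing coverage explicit instead of producing a misleading zero.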
Price Aggregation Pipeline
# Load full 25-year price history (REAL data only)
price_data = boerderij_api.get_data(
    product_id="NL.157.2086",
    date_range=("2000-01-01", "2024-12-31"),
    legacy=True
)
# Create 3-month and 6-month aggregated price series.
# rolling() is a trailing window: the value at week t averages the weeks up to
# and including t, so shift the series forward by the forecast horizon before
# using it as a prediction target.
price_data['price_3m_agg'] = price_data['avg_price'].rolling(12).mean()  # 12 weeks = 3 months
price_data['price_6m_agg'] = price_data['avg_price'].rolling(24).mean()  # 24 weeks = 6 months
Phase 2: Feature Engineering (Week 2)
Variant A: Growth Stress Features
growth_stress_features = {
    'ndvi_stress_events': count_stress_periods(ndvi_data, threshold=0.05),
    'vegetation_decline_rate': calculate_decline_velocity(ndvi_data),
    'drought_indicator': detect_drought_stress(weather_data),
    'heat_stress_accumulation': calculate_heat_stress_index(weather_data),
    'recovery_potential': assess_post_stress_recovery(ndvi_data)
}
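One plausible reading of the stress-event count is sketched below. The definition used here is an assumption, since the semantics of `count_stress_periods` and its `threshold` are not spelled out: a stress event is a maximal run of consecutive observations in which NDVI falls by more than `threshold` from one observation to the next.

```python
def count_stress_events(ndvi_series, threshold=0.05):
    """Count runs of consecutive NDVI drops larger than `threshold`."""
    events = 0
    in_event = False
    for previous, current in zip(ndvi_series, ndvi_series[1:]):
        dropping = (previous - current) > threshold
        if dropping and not in_event:
            events += 1  # a new run of declines starts here
        in_event = dropping
    return events


# Illustrative values only: two separate declines, so two events
example_events = count_stress_events([0.80, 0.82, 0.70, 0.60, 0.62, 0.50])
```

Counting runs rather than individual drops avoids treating one prolonged decline as several independent stress events.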
Variant B: Storage Depletion Features
# Compute intermediate values first so later features can reuse them
quality_score = assess_harvest_quality(growth_stress_features)
storage_duration = predict_storage_life(quality_score)
depletion_features = {
    'initial_quality_score': quality_score,
    'storage_deterioration_rate': model_deterioration_curve(quality_score, weather_data),
    'expected_storage_duration': storage_duration,
    'release_timing_probability': estimate_release_curves(storage_duration),
    'market_pressure_buildup': model_accumulating_release_pressure()
}
Variant C: Integrated Intelligence Features
integrated_features = {
    **growth_stress_features,
    **depletion_features,
    'april_stock_tightness': stock_api.get_market_tightness_indicator(),
    'seasonal_patterns': create_storage_season_features(),
    'price_momentum': calculate_price_trends(price_data),
    'cross_market_signals': get_belgian_german_indicators()
}
Phase 3: Model Development and Validation (Week 3)
Model Architecture by Variant
from sklearn.ensemble import (
    GradientBoostingRegressor, RandomForestRegressor, VotingRegressor
)
from xgboost import XGBRegressor

# Variant A: Random Forest (stress focus)
variant_a_model = RandomForestRegressor(
    n_estimators=200, max_depth=8,
    random_state=42, n_jobs=-1
)
# Variant B: Ensemble (depletion modeling)
variant_b_model = VotingRegressor([
    ('rf', RandomForestRegressor()),
    ('gbm', GradientBoostingRegressor()),
    ('xgb', XGBRegressor())
])
# Variant C: Gradient Boosting (integrated intelligence)
variant_c_model = GradientBoostingRegressor(
    n_estimators=300, learning_rate=0.05,
    max_depth=6, subsample=0.8,
    random_state=42
)
Cross-Validation Framework
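# Walk-forward validation with NO data leakage
cv_results = []
# 3-year initial window (156 weeks), quarterly steps (12 weeks);
# stop early enough that every test window is complete
for train_end in range(156, len(data) - horizon_weeks + 1, 12):
    train_data = data[:train_end]
    test_data = data[train_end:train_end + horizon_weeks]
    # Train on historical data only. Assuming targets are forward-looking
    # aggregates, drop the last horizon_weeks rows so that no training
    # target overlaps the test window.
    fit_end = train_end - horizon_weeks
    model.fit(train_features[:fit_end], train_targets[:fit_end])
    # Predict the aggregated price from the features known at train_end
    # (slice rather than index so the input stays two-dimensional)
    aggregated_prediction = model.predict(test_features[train_end:train_end + 1])[0]
    actual_aggregated = test_data['price_aggregated'].mean()
    cv_results.append({
        'prediction': aggregated_prediction,
        'actual': actual_aggregated,
        'horizon': horizon_weeks
    })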
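Once the fold records are collected, the headline metric is the percentage MAE reduction relative to a baseline. A minimal sketch, assuming records shaped like the `cv_results` entries above and a list of baseline forecasts aligned fold by fold (the function name and the baseline list are illustrative):

```python
from statistics import fmean


def mae_improvement_pct(cv_results, baseline_predictions):
    """Percent reduction in MAE of the model relative to the baseline."""
    model_mae = fmean([abs(r['prediction'] - r['actual']) for r in cv_results])
    baseline_mae = fmean([abs(b - r['actual'])
                          for b, r in zip(baseline_predictions, cv_results)])
    if baseline_mae == 0:
        raise ValueError("baseline MAE is zero; improvement is undefined")
    return 100 * (baseline_mae - model_mae) / baseline_mae


# Illustrative numbers only, used to exercise the function
folds = [
    {'prediction': 22, 'actual': 20, 'horizon': 12},
    {'prediction': 27, 'actual': 30, 'horizon': 12},
]
improvement = mae_improvement_pct(folds, baseline_predictions=[25, 25])
```

A negative result means the model is worse than the baseline, which should be reported as such rather than clipped to zero.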
Expected Performance and Business Impact
Performance Targets by Variant
Variant A: Growth Stress Quality Assessment
- Expected Improvement: 10-15% over strongest baseline
- SESOI Threshold: 15% (realistic for 3-month aggregated targets)
- Business Application: Quarterly inventory planning
- Value Estimate: €10-15 per 100kg accuracy improvement
Variant B: Storage Depletion Modeling
- Expected Improvement: 15-20% over strongest baseline
- SESOI Threshold: 20% (higher complexity justified)
- Business Application: Storage facility operations planning
- Value Estimate: €15-20 per 100kg accuracy improvement
Variant C: Integrated Storage Intelligence
- Expected Improvement: 20-25% over strongest baseline
- SESOI Threshold: 25% (comprehensive framework)
- Business Application: Strategic 6-month planning and investment decisions
- Value Estimate: €20-25 per 100kg accuracy improvement
ROI Analysis for 10,000 Ton Operation
Conservative Performance Scenario (10% improvement)
- Annual value: €100,000 (10,000 tons × €1/100kg; the €10/100kg rate quoted for Variant A would give €1,000,000)
- Development cost: €75,000 (25 person-days × €3,000/day)
- Payback period: 9 months
- 3-year net value: €225,000 (3 × €100,000 − €75,000, undiscounted)
Target Performance Scenario (20% improvement)
- Annual value: €200,000 (10,000 tons × €2/100kg; a €20/100kg rate would give €2,000,000)
- 5-year NPV: €750,000 (assuming 10% discount rate)
- Strategic advantage: First-mover advantage in storage intelligence
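The payback and net-value figures for the conservative scenario follow from the stated annual value and development cost. The sketch below uses hypothetical helper names. It defaults to no discounting because the 3-year figure of €225,000 matches only with a zero discount rate; passing `discount_rate` gives a conventional NPV with cash flows at the end of each year.

```python
def payback_months(annual_value, development_cost):
    """Months until cumulative value covers the development cost."""
    return 12 * development_cost / annual_value


def net_present_value(annual_value, development_cost, years, discount_rate=0.0):
    """Discounted sum of yearly values minus the upfront development cost."""
    present_value = sum(annual_value / (1 + discount_rate) ** year
                        for year in range(1, years + 1))
    return present_value - development_cost


conservative_payback = payback_months(100_000, 75_000)
conservative_net = net_present_value(100_000, 75_000, years=3)
```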
Risk Assessment and Mitigation
Technical Risks
1. Cloud Cover Limitations (MEDIUM RISK)
- Issue: Netherlands 49% clear scenes (validated in FAMILY_SEASONAL_PLANTING)
- Impact: Reduced satellite signal quality during critical growth period
- Mitigation: Multi-temporal compositing, cloud gap interpolation, weather-based adjustments
2. Storage Quality Signal Strength (HIGH RISK)
- Issue: Indirect relationship between growth stress and storage outcomes
- Impact: Weaker predictive signal than direct yield-price relationships
- Mitigation: Ensemble approach combining multiple stress indicators, validation with storage facility data
3. Aggregation Smoothing Effects (MEDIUM RISK)
- Issue: 3-6 month price aggregation reduces volatility and predictability
- Impact: Lower signal-to-noise ratio compared to weekly predictions
- Mitigation: Longer validation periods, seasonal regime modeling, volatility-adjusted metrics
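Of the mitigations listed for cloud cover, gap interpolation is the simplest to illustrate. The sketch below fills cloud-masked NDVI values by linear interpolation between the nearest valid neighbours. It assumes equally spaced observations and a hypothetical convention of `None` for cloudy scenes; gaps at the start or end of the series are left unfilled rather than extrapolated.

```python
def fill_cloud_gaps(values):
    """Linearly interpolate None entries that lie between valid values."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


# Illustrative values only: the two interior gaps are filled, the edges are not
series = fill_cloud_gaps([None, 0.6, None, None, 0.9, None])
```

With roughly half of scenes cloudy, long gaps are likely, so it may be worth capping the gap length that is interpolated.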
Data Risks
1. Limited Historical Storage Data (HIGH RISK)
- Issue: FIWAP data only available 2010-2025 (16 years)
- Impact: Insufficient storage-price relationship validation
- Mitigation: Focus on satellite-price direct relationships, proxy storage indicators
2. Temporal Alignment Challenges (MEDIUM RISK)
- Issue: Growth period (Jun-Jul) to storage season (Oct-May) alignment
- Impact: Complex feature engineering required for proper temporal sequencing
- Mitigation: Careful DAP calculation, multiple growth windows tested
3. Market Efficiency Risk (LOW RISK)
- Issue: Storage quality information may already be priced into markets
- Impact: Reduced predictive value of satellite-derived quality assessments
- Mitigation: Focus on quality assessment timing advantages, private storage data integration
Limitations and Future Development
Current Framework Limitations
1. Single-Region Coverage
- Limitation: Netherlands only (31UFU tile coverage)
- Impact: Cannot capture cross-border storage effects
- Future Development: Belgium (31UDS) and Germany (32UMA) integration
2. Aggregated Quality Assessment
- Limitation: Area-weighted average stress indicators
- Impact: Missing spatial heterogeneity in storage quality
- Future Development: Parcel-level quality scoring, facility-specific modeling
3. Storage Facility Integration Gap
- Limitation: No direct storage facility data integration
- Impact: Proxy-based storage condition modeling
- Future Development: IoT sensor integration, facility partnership data
Enhancement Roadmap (2025-2026)
Phase 1: Multi-Region Expansion (Q1 2025)
- Belgian FIWAP data integration
- German BLE storage statistics
- Cross-border arbitrage opportunities
Phase 2: Facility-Level Intelligence (Q2 2025)
- Storage facility temperature/humidity data
- Individual facility quality assessments
- Facility-specific release timing models
Phase 3: Real-Time Quality Monitoring (Q3 2025)
- IoT sensor integration
- Continuous quality assessment updates
- Dynamic storage duration adjustments
Phase 4: Supply Chain Intelligence (Q4 2025)
- Processing facility demand integration
- Contract timing optimization
- End-to-end supply chain prediction
Success Criteria and Deliverables
Statistical Success Criteria
Primary Success (ANY variant achieves)
- MAE improvement ≥ SESOI threshold vs strongest baseline
- Statistical significance: p < 0.05 after Diebold-Mariano + HLN correction
- Practical significance: TOST equivalence test confirmation
- Consistency: Improvement across ≥ 2/3 CV folds
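The primary criteria can be combined into a single check, sketched below with a hypothetical function name. It covers the SESOI threshold, the significance level, and fold consistency; the TOST confirmation is not modelled here and would need to be supplied as an additional condition.

```python
def meets_primary_criteria(improvement_pct, sesoi_pct, p_value,
                           fold_improvements, alpha=0.05):
    """True if the variant clears the SESOI, significance and consistency bars."""
    if not fold_improvements:
        return False
    folds_better = sum(1 for value in fold_improvements if value > 0)
    # At least two thirds of folds must beat the baseline (integer arithmetic)
    consistent = 3 * folds_better >= 2 * len(fold_improvements)
    return improvement_pct >= sesoi_pct and p_value < alpha and consistent


# Illustrative numbers only
passes = meets_primary_criteria(16.0, 15.0, 0.01, [20.0, 5.0, -3.0])
```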
Secondary Success Indicators
- Directional accuracy ≥ 60% for aggregated price movements
- Volatility prediction: Capture seasonal storage volatility patterns
- Regime robustness: Performance maintained across storage seasons
Business Success Criteria
Deployment Readiness
- Model interpretability: Feature importance traceable to business logic
- Production scalability: Processing time < 30 minutes per update
- Integration feasibility: Compatible with existing systems architecture
- Risk management: Confidence intervals and uncertainty quantification
Value Demonstration
- ROI validation: Demonstrated value > development costs
- Competitive advantage: Unique storage intelligence capability
- Stakeholder buy-in: Clear business case for operational deployment
Technical Deliverables
Core Artifacts
- Complete experimental implementation (all 3 variants)
- MLflow experiment tracking with full reproducibility
- Statistical validation reports (DM, HLN, TOST results)
- Business impact assessment with ROI calculations
- Production deployment architecture documentation
Documentation Suite
- Methodology documentation with data leakage prevention
- Feature engineering cookbook for storage quality assessment
- Model interpretation guide for business stakeholders
- Deployment playbook for production implementation
- Performance monitoring framework for ongoing validation
Conclusion
The FAMILY_SATELLITE_STORAGE_INFERENCE experiment framework represents a strategic evolution from tactical satellite price prediction to strategic storage intelligence. Building on the proven success of FAMILY_SEASONAL_PLANTING (36.9%, 24.2%, and 7.0% improvements at 4, 8, and 12 weeks), this framework targets the critical gap between immediate yield prediction and longer-term strategic planning.
Key Innovation Summary
- Storage Quality Inference: First satellite-based storage potential assessment framework
- Aggregated Price Targets: 3-6 month aggregated predictions for strategic planning
- Depletion Modeling: Quality-based storage release curve prediction
- Real Data Methodology: 100% real data with validated time series methodology
- Business-Aligned Horizons: Quarterly/strategic planning focus vs tactical decisions
Strategic Value Proposition
This framework addresses a critical market intelligence gap: how growth-season stress indicators predict storage-season market dynamics at strategic planning horizons. Unlike existing tactical models focused on immediate price movements, this approach enables:
- Quarterly business planning with 3-month aggregated price forecasts
- Storage facility optimization based on quality-duration modeling
- Strategic inventory decisions using 6-month aggregated predictions
- Risk management enhancement through seasonal volatility prediction
Success Pathway
With expected 10-25% improvements over baseline forecasting, this framework provides:
- €100,000-250,000 annual value for 10,000 ton operations
- Strategic competitive advantage in storage season planning
- Foundation for supply chain intelligence expansion
- Proven methodology scalable to other crops and regions
The FAMILY_SATELLITE_STORAGE_INFERENCE framework transforms satellite monitoring from tactical yield prediction to strategic storage intelligence, enabling data-driven decision-making across quarterly and annual planning cycles.
Experiment Framework Status: COMPLETE - Ready for Implementation
Data Sources: 100% REAL - NO synthetic data
Methodology: Validated - NO data leakage
Expected Timeline: 3 weeks implementation + 2 weeks validation
Business Impact: High - Strategic planning transformation potential
Codex Validation
- Files inspected: cross_validation_real_zarr.py, experiment.md, cross_validation_real_zarr_results.json
- Data integrity: PASS. The rerun loads Boerderij prices plus Sentinel-2 scenes from data/zarr_stores/lake_31UFU_small.zarr and filters clouds without any synthetic fallbacks (cross_validation_real_zarr.py:1-150).
- Feature benefit vs price-only baseline: FAIL. Leave-one-year-out CV (latest rerun 2025-11-11) produced only +3.6% mean MAE improvement vs the persistence baseline with σ≈50%; half the years (2023/2024) were worse than baseline by −20% to −102% (cross_validation_real_zarr_results.json).
- Verdict: INVALID. Despite using real satellite data, the current storage inference pipeline does not consistently beat the mandatory baselines and cannot be considered validated.