FAMILY_PARCEL_DYNAMICS: Experiment Log
Testing how annual parcel boundary dynamics for consumption potatoes create predictable price impacts through supply signaling, efficiency effects, and regional production shifts.
Hypothesis Origins
- Prior experiments: None. This is the first hypothesis family to use BRP parcel-level dynamics; no earlier experiment has explored this data source
- Industry catalyst: 2024 parcel consolidation in Flevoland, with small-farmer exits visible in parcel counts
- Academic basis: Spatial economics (Krugman 1991), farm size efficiency (Key & Roberts 2009), fragmentation impacts (Latruffe & Piet 2014)
- Data opportunity: BRP API provides annual parcel boundaries back to 2015, enabling year-over-year dynamics tracking
Experiment Design
- Method: Rolling-origin cross-validation
- Initial window: 156 weeks (3 years)
- Step size: 4 weeks
- Test windows: 52 weeks (1 year)
- Refit frequency: Every 12 weeks (quarterly)
- Baselines: Naive seasonal, ARIMA, linear trend
- REAL DATA ONLY: BRP parcel boundaries, Boerderij.nl prices, CBS validation, Open-Meteo weather
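The fold layout implied by the settings above can be sketched as follows. This is a minimal illustration, not the project's CV framework: the `Fold` type and `rolling_origin_folds` function are hypothetical names, and the expanding training window (all weeks before the origin) is an assumption, since the design only states an initial window.

```python
from typing import Iterator, NamedTuple


class Fold(NamedTuple):
    train_end: int   # training uses weeks [0, train_end) (assumed expanding window)
    test_start: int  # first test week (inclusive)
    test_end: int    # last test week (exclusive)
    refit: bool      # True when the model is re-estimated at this origin


def rolling_origin_folds(
    n_weeks: int,
    initial_window: int = 156,
    step: int = 4,
    test_window: int = 52,
    refit_every: int = 12,
) -> Iterator[Fold]:
    """Yield forecast origins every `step` weeks, refitting every `refit_every` weeks."""
    origin = initial_window
    last_refit = None
    while origin + test_window <= n_weeks:
        # Refit at the first origin, then whenever 12+ weeks have passed.
        refit = last_refit is None or origin - last_refit >= refit_every
        if refit:
            last_refit = origin
        yield Fold(origin, origin, origin + test_window, refit)
        origin += step
```

With five years of weekly prices (`n_weeks=260`), this yields 14 origins, of which 5 trigger a refit; between refits the previously fitted model would be reused.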
Data Sources (REAL DATA ONLY)
- BRP API: BRPApi().get_consumption_potato_mask() for years 2015-2025 - git:current
- Boerderij.nl API: Product NL.157.2086 (consumption potatoes) - git:current
- CBS API: Table 85676NED for validation - version 2024-Q4
- Open-Meteo API: Weather context at 52.55°N, 5.55°E - git:current
- NO synthetic, mock, or dummy data permitted
Experiment Runs
Variant A: Area Expansion Signals
- Status: Not started
- Model: Ridge regression with area change features
- Features: yoy_area_change_pct, area_trend_3y, area_acceleration, area_volatility, price_lags
- Horizons: 30-day, 60-day ahead
- Target: Test if 5-10% area increase leads to 3-7% price decrease
- Expected: >3% MASE improvement over baseline
- Implementation: Calculate annual area from BRP masks, track year-over-year changes
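One possible reading of the Variant A area features is sketched below. The log names the features but does not define them, so every formula here is an assumption: the trend is taken as the mean annual change over three years, acceleration as the change in YoY percentage, and volatility as the population standard deviation of the YoY percentages. The function name `area_features` is hypothetical.

```python
from statistics import pstdev


def area_features(area_ha_by_year: dict, year: int) -> dict:
    """Area-change features for `year`, using only that year and earlier."""
    years = sorted(y for y in area_ha_by_year if y <= year)
    if years[-3:] != [year - 2, year - 1, year]:
        raise ValueError("need three consecutive years ending at `year`")
    # YoY percentage change for every year whose predecessor is present.
    yoy = {
        y: 100.0 * (area_ha_by_year[y] - area_ha_by_year[y - 1]) / area_ha_by_year[y - 1]
        for y in years
        if y - 1 in area_ha_by_year
    }
    return {
        "yoy_area_change_pct": yoy[year],
        # Assumed definition: mean change per year across the 3-year span.
        "area_trend_3y": (area_ha_by_year[year] - area_ha_by_year[year - 2]) / 2.0,
        # Assumed definition: change in the YoY percentage.
        "area_acceleration": yoy[year] - yoy[year - 1],
        # Assumed definition: spread of the YoY percentages seen so far.
        "area_volatility": pstdev(list(yoy.values())),
    }
```

Restricting the inputs to `year` and earlier keeps the features free of look-ahead when they are later joined to prices.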
Variant B: Parcel Fragmentation
- Status: Not started
- Model: Random forest with fragmentation features
- Features: avg_parcel_size, parcel_count, fragmentation_index, size_distribution_gini, small_parcel_ratio, price_lags
- Horizons: 30-day, 60-day ahead
- Target: Test if increased fragmentation leads to 2-5% price increase
- Expected: >3% MASE improvement over baseline
- Implementation: Analyze individual parcel geometries from BRP API
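The fragmentation features could be computed from a list of parcel areas roughly as follows. The log does not define these features, so the choices here are assumptions: `fragmentation_index` is taken as a Simpson-type index (one minus the sum of squared area shares), the Gini coefficient uses the standard sorted-values formula, and the 2 ha cut-off for "small" parcels is a placeholder. The function name is hypothetical.

```python
def fragmentation_features(parcel_areas_ha, small_threshold_ha=2.0):
    """Summarise one year's parcel-size distribution (areas in hectares)."""
    areas = sorted(a for a in parcel_areas_ha if a > 0)
    n = len(areas)
    if n == 0:
        raise ValueError("no parcels with positive area")
    total = sum(areas)
    # Gini on ascending values: 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n
    gini = 2.0 * sum(i * a for i, a in enumerate(areas, start=1)) / (n * total) - (n + 1) / n
    # Assumed Simpson-type index: 0 for a single parcel, near 1 for many equal ones.
    simpson = 1.0 - sum((a / total) ** 2 for a in areas)
    return {
        "parcel_count": n,
        "avg_parcel_size": total / n,
        "fragmentation_index": simpson,
        "size_distribution_gini": gini,
        # Threshold is an assumption, not a value from the experiment design.
        "small_parcel_ratio": sum(1 for a in areas if a < small_threshold_ha) / n,
    }
```

Parcel areas would come from the BRP geometries after projection to a metric CRS; that step is outside this sketch.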
Variant C: Regional Shifts
- Status: Not started
- Model: Gradient boosting with regional distribution features
- Features: regional_concentration, distance_weighted_shift, new_region_ratio, transport_cost_proxy, regional_volatility, price_lags
- Horizons: 30-day, 60-day ahead
- Target: Test if >5% regional shift increases price volatility by 4-8%
- Expected: >3% MASE improvement over baseline
- Implementation: Grid-based (10km × 10km) analysis of production distribution
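A minimal sketch of the grid-based regional analysis, under stated assumptions: parcels are represented by centroid coordinates in metres (as they would be in a projected CRS such as EPSG:32631) plus an area, `regional_concentration` is taken to be a Herfindahl-type sum of squared cell shares, and `new_region_ratio` is taken to be the share of this year's area in cells that had none the year before. The log defines neither feature; all function names are hypothetical.

```python
from collections import defaultdict
from math import floor

CELL_M = 10_000  # 10 km grid cell, as in the Variant C design


def area_by_cell(parcels):
    """Sum area per grid cell; `parcels` is an iterable of (x_m, y_m, area_ha)."""
    cells = defaultdict(float)
    for x, y, area in parcels:
        cells[(floor(x / CELL_M), floor(y / CELL_M))] += area
    return dict(cells)


def regional_concentration(cells):
    """Assumed definition: Herfindahl index of cell area shares (1 = one cell)."""
    total = sum(cells.values())
    return sum((a / total) ** 2 for a in cells.values())


def new_region_ratio(prev_cells, curr_cells):
    """Assumed definition: share of current area in cells empty last year."""
    total = sum(curr_cells.values())
    new_area = sum(a for c, a in curr_cells.items() if prev_cells.get(c, 0.0) == 0.0)
    return new_area / total
```

Making `CELL_M` a parameter would make it straightforward to repeat the analysis at several grid sizes.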
Implementation Notes
BRP Data Processing
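# Example: Calculate year-over-year area change
from datetime import date

import numpy as np

from src.sources.brp.brp_api.brp import BRPApi

brp = BRPApi()
bbox = (5.505, 52.505, 5.595, 52.595)  # Example region (lon/lat degrees)

# Get masks for two consecutive years
mask_2023 = brp.get_consumption_potato_mask(
    bbox, (date(2023, 1, 1), date(2023, 12, 31)),
    grid_shape=(1000, 1000), target_crs='EPSG:32631'
)
mask_2024 = brp.get_consumption_potato_mask(
    bbox, (date(2024, 1, 1), date(2024, 12, 31)),
    grid_shape=(1000, 1000), target_crs='EPSG:32631'
)

# Calculate areas, assuming each grid cell is 10m × 10m = 0.01 ha.
# Caveat: that holds only if the 1000 × 1000 grid spans 10 km × 10 km. This bbox
# is 0.09° per side, roughly 6 km east-west by 10 km north-south at 52.55°N, so
# absolute hectares may be overstated. The factor cancels in the YoY percentage.
HA_PER_CELL = 0.01
area_2023_ha = np.sum(mask_2023) * HA_PER_CELL
area_2024_ha = np.sum(mask_2024) * HA_PER_CELL

# Year-over-year change (undefined when the base year has no potato area)
if area_2023_ha == 0:
    raise ValueError("No consumption potato area in 2023 for this bbox")
yoy_change_pct = ((area_2024_ha - area_2023_ha) / area_2023_ha) * 100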
Key Risks and Mitigations
- Data availability: BRP data only available annually - mitigate by using multiple years
- Spatial resolution: Parcel boundaries may have errors - validate against CBS totals
- Temporal alignment: Annual parcel data vs weekly prices - use appropriate lags
- Regional definition: Grid-based approach may miss administrative boundaries - test multiple grid sizes
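The temporal-alignment mitigation can be illustrated with an as-of lookup: each weekly price observation only sees the most recent annual feature value that had already been released. This is a sketch under assumptions. The release dates and feature values below are placeholders for illustration, not actual BRP publication dates or experiment data, and `latest_available` is a hypothetical helper.

```python
from bisect import bisect_right
from datetime import date


def latest_available(feature_by_release: dict, as_of: date):
    """Return the newest feature value released on or before `as_of`, else None."""
    releases = sorted(feature_by_release)
    i = bisect_right(releases, as_of)
    return feature_by_release[releases[i - 1]] if i else None


# Placeholder release dates and values, for illustration only.
yoy_by_release = {date(2023, 8, 1): 1.5, date(2024, 8, 1): -0.7}

week_before_2024_release = latest_available(yoy_by_release, date(2024, 3, 4))
week_after_2024_release = latest_available(yoy_by_release, date(2024, 8, 5))
```

Keying the lookup on release date rather than crop year is what prevents look-ahead: a price week in March 2024 still receives the 2023 value.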
Next Steps
- Implement data extraction pipeline for BRP year-over-year comparisons
- Create feature engineering functions for each variant
- Set up rolling CV framework with proper data versioning
- Run baseline models for comparison
- Execute experiments and record results
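For the baseline comparison, the naive seasonal forecast and the MASE metric used in the variant targets could look like the sketch below. It uses the usual definition of MASE (forecast MAE divided by the in-sample MAE of the seasonal naive forecast); the function names and the 52-week default season are assumptions, not taken from the project code.

```python
def seasonal_naive(y_train, horizon, season=52):
    """Forecast each step with the value observed one season earlier."""
    start = len(y_train) - season
    return [y_train[start + (h % season)] for h in range(horizon)]


def mase(y_train, y_test, y_pred, season=52):
    """Mean absolute scaled error against the in-sample seasonal naive MAE."""
    if len(y_train) <= season:
        raise ValueError("training series must be longer than one season")
    scale = sum(
        abs(y_train[t] - y_train[t - season]) for t in range(season, len(y_train))
    ) / (len(y_train) - season)
    if scale == 0:
        raise ValueError("seasonal naive error is zero; MASE is undefined")
    mae = sum(abs(a - p) for a, p in zip(y_test, y_pred)) / len(y_test)
    return mae / scale


def mase_improvement_pct(mase_baseline, mase_model):
    """Relative MASE reduction versus the baseline, in percent."""
    return 100.0 * (mase_baseline - mase_model) / mase_baseline
```

A variant would meet the ">3% MASE improvement" target when `mase_improvement_pct` exceeds 3 for the chosen baseline.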
Decision Log
- 2025-08-17: Hypothesis family created based on unexplored BRP parcel dynamics opportunity
Simplified Experiment Results - 2025-08-17 11:44
- Data Source: REAL BRP parcel data from multiple regions
- Years Analyzed: 2020-2024
- Total Parcels Processed: 34301
Parcel Statistics:
- Average parcels per year: 6860
- Average area (ha): 86597
- YoY parcel change: 1.2%
Model Results:
- Variant A: MAE 32.94 EUR/100kg, RMSE 34.41 EUR/100kg, MAPE 90.4% (training samples: 2, test samples: 2)
- Variant B: MAE 19.79 EUR/100kg, RMSE 20.40 EUR/100kg, MAPE 54.8% (training samples: 2, test samples: 2)
- Variant C: MAE 16.63 EUR/100kg, RMSE 17.97 EUR/100kg, MAPE 44.8% (training samples: 2, test samples: 2)
Verdict:
- Variant A: WEAK (MAPE=90.4%)
- Variant B: WEAK (MAPE=54.8%)
- Variant C: WEAK (MAPE=44.8%)
Note: Simplified analysis using annual aggregates and real parcel counts.
Verdict v1 — 2025-08-17
Label: INCONCLUSIVE
Scope: Netherlands consumption potatoes, annual parcel dynamics
Effect: Regional shift features show potential (MAPE ~45%) but insufficient data for statistical significance
Stats: Limited to 5 annual observations; DM test not feasible with n=2 test samples
Data/Code: git=current; BRP parcels (2020-2024, n=34,301), Boerderij.nl prices
Notes: Successfully extracted REAL parcel data from BRP API. Need weekly alignment and longer time series for proper evaluation. Regional concentration (Variant C) shows most promise.
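For context on why the DM (Diebold-Mariano) test was ruled out, the standard form of the statistic is given below. The notation is generic rather than taken from the project code: $e_{1,t}$ and $e_{2,t}$ are the two models' forecast errors, $L$ is the loss function, $h$ is the forecast horizon, and $\hat{\gamma}_k$ is the lag-$k$ sample autocovariance of the loss differential $d_t$.

```latex
d_t = L(e_{1,t}) - L(e_{2,t}), \qquad
\bar{d} = \frac{1}{n} \sum_{t=1}^{n} d_t, \qquad
\mathrm{DM} = \frac{\bar{d}}
  {\sqrt{\left( \hat{\gamma}_0 + 2 \sum_{k=1}^{h-1} \hat{\gamma}_k \right) / n}}
```

With $n = 2$ test samples the variance in the denominator is estimated from two loss differentials, so the asymptotic normal approximation behind the test cannot be relied on.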