Note: this experiment has not yet been Codex-validated. Treat the findings as provisional indications.

Hypotheses

FAMILY_PARCEL_DYNAMICS: Experiment Log

FAMILY_PARCEL_DYNAMICS

Testing how annual parcel boundary dynamics for consumption potatoes create predictable price impacts through supply signaling, efficiency effects, and regional production shifts.

Last updated: 2025-12-01
Repo path: hypotheses/FAMILY_PARCEL_DYNAMICS
Codex file: Missing

Experiment notes

Hypothesis Origins

  • Prior experiments: None. This is the first family to use parcel-level dynamics from the BRP (Basisregistratie Gewaspercelen, the Dutch crop parcel registry); no earlier experiment has explored this data source
  • Industry catalyst: 2024 parcel consolidation in Flevoland, small farmer exits visible in parcel counts
  • Academic basis: Spatial economics (Krugman 1991), farm size efficiency (Key & Roberts 2009), fragmentation impacts (Latruffe & Piet 2014)
  • Data opportunity: BRP API provides annual parcel boundaries back to 2015, enabling year-over-year dynamics tracking

Experiment Design

  • Method: Rolling-origin cross-validation
  • Initial window: 156 weeks (3 years)
  • Step size: 4 weeks
  • Test window: 52 weeks (1 year)
  • Refit frequency: Every 12 weeks (roughly quarterly)
  • Baselines: Naive seasonal, ARIMA, linear trend
  • REAL DATA ONLY: BRP parcel boundaries, Boerderij.nl prices, CBS validation, Open-Meteo weather
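The window settings above can be turned into fold boundaries with a small generator. This is a minimal sketch, not the project's CV framework: the `Fold` class and `rolling_origin_folds` function are hypothetical names, and it assumes an expanding training window (all weeks before the origin) and a refit whenever the origin has advanced by a multiple of 12 weeks.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Fold:
    train_end: int   # training uses weeks [0, train_end)
    test_start: int  # first test week (inclusive)
    test_end: int    # last test week (exclusive)
    refit: bool      # True when the model is refit at this origin


def rolling_origin_folds(
    n_weeks: int,
    initial: int = 156,
    step: int = 4,
    test: int = 52,
    refit_every: int = 12,
) -> Iterator[Fold]:
    """Yield expanding-window folds over a weekly series of length n_weeks."""
    origin = initial
    while origin + test <= n_weeks:
        weeks_since_start = origin - initial
        yield Fold(
            train_end=origin,
            test_start=origin,
            test_end=origin + test,
            # Assumption: refit when the origin has moved a multiple of refit_every weeks
            refit=weeks_since_start % refit_every == 0,
        )
        origin += step
```

With a 4-week step and a 12-week refit interval, every third fold triggers a refit; the folds in between reuse the most recently fitted model.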

Data Sources (REAL DATA ONLY)

  • BRP API: BRPApi().get_consumption_potato_mask() for years 2015-2025 - git:current
  • Boerderij.nl API: Product NL.157.2086 (consumption potatoes) - git:current
  • CBS API: Table 85676NED for validation - version 2024-Q4
  • Open-Meteo API: Weather context at 52.55°N, 5.55°E - git:current
  • NO synthetic, mock, or dummy data permitted

Experiment Runs

Variant A: Area Expansion Signals

  • Status: Not started
  • Model: Ridge regression with area change features
  • Features: yoy_area_change_pct, area_trend_3y, area_acceleration, area_volatility, price_lags
  • Horizons: 30-day and 60-day ahead
  • Target: Test whether a 5-10% area increase leads to a 3-7% price decrease
  • Expected: >3% MASE improvement over baseline
  • Implementation: Calculate annual area from BRP masks and track year-over-year changes
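The log names the Variant A features but does not define them. The sketch below shows one plausible set of definitions, all of which are assumptions: the trend is a least-squares slope over the last three years, acceleration is the change in the year-over-year percentage, and volatility is the sample standard deviation of the last three year-over-year changes. `area_features` is a hypothetical helper name.

```python
import numpy as np


def area_features(area_ha):
    """Compute area-change features for the most recent year.

    area_ha: annual potato areas in hectares, ordered by year (at least 4 values).
    """
    a = np.asarray(area_ha, dtype=float)
    if a.size < 4:
        raise ValueError("need at least 4 annual observations")
    if np.any(a[:-1] <= 0):
        raise ValueError("areas must be positive to compute percentage changes")

    yoy = np.diff(a) / a[:-1] * 100.0  # year-over-year change in percent
    # Assumed definition: least-squares slope (ha/year) over the last 3 years
    trend_3y = np.polyfit(np.arange(3), a[-3:], 1)[0]
    return {
        "yoy_area_change_pct": float(yoy[-1]),
        "area_trend_3y": float(trend_3y),
        # Assumed definition: change in the YoY percentage (percentage points)
        "area_acceleration": float(yoy[-1] - yoy[-2]),
        # Assumed definition: sample std of the last 3 YoY changes
        "area_volatility": float(np.std(yoy[-3:], ddof=1)),
    }
```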

Variant B: Parcel Fragmentation

  • Status: Not started
  • Model: Random forest with fragmentation features
  • Features: avg_parcel_size, parcel_count, fragmentation_index, size_distribution_gini, small_parcel_ratio, price_lags
  • Horizons: 30-day and 60-day ahead
  • Target: Test whether increased fragmentation leads to a 2-5% price increase
  • Expected: >3% MASE improvement over baseline
  • Implementation: Analyze individual parcel geometries from BRP API
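The fragmentation features could be derived from a list of parcel areas as sketched below. The definitions are assumptions, since the log does not specify them: the fragmentation index is taken to be one minus the sum of squared area shares, and "small" parcels are those under a hypothetical 2 ha threshold. The Gini coefficient uses the standard rank-based formula on sorted sizes.

```python
import numpy as np


def fragmentation_features(parcel_ha, small_threshold_ha=2.0):
    """Summarise the parcel size distribution for one year.

    parcel_ha: areas of individual parcels in hectares.
    small_threshold_ha: assumed cut-off below which a parcel counts as small.
    """
    sizes = np.sort(np.asarray(parcel_ha, dtype=float))
    if sizes.size == 0 or np.any(sizes <= 0):
        raise ValueError("need at least one parcel, all with positive area")

    n = sizes.size
    total = sizes.sum()
    ranks = np.arange(1, n + 1)
    # Rank-based Gini coefficient: 0 means all parcels are the same size
    gini = 2.0 * np.sum(ranks * sizes) / (n * total) - (n + 1) / n
    shares = sizes / total
    return {
        "parcel_count": int(n),
        "avg_parcel_size": float(total / n),
        "size_distribution_gini": float(gini),
        "small_parcel_ratio": float(np.mean(sizes < small_threshold_ha)),
        # Assumed definition: 1 - sum of squared area shares (higher = more fragmented)
        "fragmentation_index": float(1.0 - np.sum(shares ** 2)),
    }
```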

Variant C: Regional Shifts

  • Status: Not started
  • Model: Gradient boosting with regional distribution features
  • Features: regional_concentration, distance_weighted_shift, new_region_ratio, transport_cost_proxy, regional_volatility, price_lags
  • Horizons: 30-day and 60-day ahead
  • Target: Test whether a >5% regional shift increases price volatility by 4-8%
  • Expected: >3% MASE improvement over baseline
  • Implementation: Grid-based (10km × 10km) analysis of production distribution
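One way to make "regional concentration" and "regional shift" concrete is to work with the share of potato area in each grid cell. The sketch below assumes concentration is a Herfindahl-style sum of squared shares and that the shift is the percentage of area share that moved between cells from one year to the next; neither definition comes from the log, and the function names are hypothetical.

```python
import numpy as np


def cell_shares(area_by_cell):
    """Convert per-cell areas (e.g. one value per 10 km grid cell) to shares."""
    a = np.asarray(area_by_cell, dtype=float)
    total = a.sum()
    if total <= 0:
        raise ValueError("total area must be positive")
    return a / total


def regional_concentration(area_by_cell):
    """Assumed definition: sum of squared shares (1/n if uniform, 1 if one cell)."""
    return float(np.sum(cell_shares(area_by_cell) ** 2))


def regional_shift_pct(prev_area_by_cell, curr_area_by_cell):
    """Assumed definition: percent of area share that moved between cells.

    Both inputs must list the same cells in the same order.
    """
    prev = cell_shares(prev_area_by_cell)
    curr = cell_shares(curr_area_by_cell)
    if prev.shape != curr.shape:
        raise ValueError("both years must use the same grid")
    return float(0.5 * np.abs(curr - prev).sum() * 100.0)
```

Because both measures work on shares, they are insensitive to a change in total area, which keeps Variant C separate from the area expansion signal in Variant A.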

Implementation Notes

BRP Data Processing

# Example: Calculate year-over-year area change
from src.sources.brp.brp_api.brp import BRPApi
from datetime import date
import numpy as np

brp = BRPApi()
bbox = (5.505, 52.505, 5.595, 52.595)  # Example region

# Get masks for two consecutive years
mask_2023 = brp.get_consumption_potato_mask(
    bbox, (date(2023,1,1), date(2023,12,31)),
    grid_shape=(1000, 1000), target_crs='EPSG:32631'
)
mask_2024 = brp.get_consumption_potato_mask(
    bbox, (date(2024,1,1), date(2024,12,31)),
    grid_shape=(1000, 1000), target_crs='EPSG:32631'
)

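# Calculate areas. The 0.01 ha factor assumes 10 m × 10 m pixels, which only
# holds if the 1000 × 1000 grid covers exactly 10 km × 10 km on the ground;
# verify the real pixel size for this bbox before trusting absolute areas.
PIXEL_AREA_HA = 0.01  # 10 m × 10 m = 100 m² = 0.01 ha
area_2023_ha = np.sum(mask_2023) * PIXEL_AREA_HA
area_2024_ha = np.sum(mask_2024) * PIXEL_AREA_HA

# Year-over-year change; guard against an empty 2023 mask
if area_2023_ha == 0:
    raise ValueError("No consumption potato area found for 2023")
yoy_change_pct = (area_2024_ha - area_2023_ha) / area_2023_ha * 100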
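The 10 m pixel size in the example is an assumption worth checking, because a 0.09° × 0.09° box at this latitude is not square on the ground. The helper below is a rough sketch that estimates the pixel area from the bounding box and grid shape. It assumes the bbox is ordered (min_lon, min_lat, max_lon, max_lat), that the grid spans the bbox evenly, and a spherical approximation of about 111,320 m per degree; how `BRPApi` actually resamples to `EPSG:32631` is not documented here, so treat the result as a sanity check only. `pixel_area_ha` is a hypothetical name.

```python
import math


def pixel_area_ha(bbox, grid_shape, metres_per_degree=111_320.0):
    """Estimate the ground area of one grid pixel in hectares.

    bbox: (min_lon, min_lat, max_lon, max_lat) in degrees (assumed order).
    grid_shape: (rows, cols) of the raster mask (assumed order).
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    rows, cols = grid_shape
    mid_lat = math.radians((min_lat + max_lat) / 2)
    # A degree of longitude shrinks with the cosine of the latitude
    width_m = (max_lon - min_lon) * metres_per_degree * math.cos(mid_lat)
    height_m = (max_lat - min_lat) * metres_per_degree
    pixel_m2 = (width_m / cols) * (height_m / rows)
    return pixel_m2 / 10_000.0  # 1 ha = 10,000 m²
```

Under these assumptions the example bbox gives roughly 0.006 ha per pixel instead of 0.01 ha. The year-over-year percentage is unaffected, since a constant factor cancels in the ratio, but absolute hectares would be overstated.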

Key Risks and Mitigations

  1. Data availability: BRP data only available annually - mitigate by using multiple years
  2. Spatial resolution: Parcel boundaries may have errors - validate against CBS totals
  3. Temporal alignment: Annual parcel data vs weekly prices - use appropriate lags
  4. Regional definition: Grid-based approach may miss administrative boundaries - test multiple grid sizes

Next Steps

  1. Implement data extraction pipeline for BRP year-over-year comparisons
  2. Create feature engineering functions for each variant
  3. Set up rolling CV framework with proper data versioning
  4. Run baseline models for comparison
  5. Execute experiments and record results
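Since each variant is judged on a ">3% MASE improvement over baseline", the baseline step needs a MASE implementation. The sketch below scales the out-of-sample MAE by the in-sample MAE of a seasonal naive forecast. The 52-week default season and the function names are assumptions.

```python
from typing import Sequence


def mase(
    y_train: Sequence[float],
    y_true: Sequence[float],
    y_pred: Sequence[float],
    season: int = 52,
) -> float:
    """Mean absolute scaled error relative to an in-sample seasonal naive forecast."""
    if len(y_train) <= season:
        raise ValueError("training series must be longer than one season")
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    # In-sample errors of the seasonal naive forecast y[t] = y[t - season]
    naive_errors = [
        abs(y_train[t] - y_train[t - season]) for t in range(season, len(y_train))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("seasonal naive error is zero; MASE is undefined")

    mae = sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)
    return mae / scale


def mase_improvement_pct(baseline_mase: float, model_mase: float) -> float:
    """Relative MASE reduction of the model versus the baseline, in percent."""
    return (baseline_mase - model_mase) / baseline_mase * 100.0
```

A MASE below 1 means the model beats the seasonal naive forecast on average; the experiment threshold corresponds to `mase_improvement_pct` returning more than 3.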

Decision Log

  • 2025-08-17: Hypothesis family created based on unexplored BRP parcel dynamics opportunity

Simplified Experiment Results - 2025-08-17 11:44

Data Source: REAL BRP parcel data from multiple regions
Years Analyzed: 2020-2024
Total Parcels Processed: 34301

Parcel Statistics:

  • Average parcels per year: 6860
  • Average area (ha): 86597
  • YoY parcel change: 1.2%

Model Results:

  • Variant A: MAE 32.94 EUR/100kg, RMSE 34.41 EUR/100kg, MAPE 90.4% (2 training samples, 2 test samples)
  • Variant B: MAE 19.79 EUR/100kg, RMSE 20.40 EUR/100kg, MAPE 54.8% (2 training samples, 2 test samples)
  • Variant C: MAE 16.63 EUR/100kg, RMSE 17.97 EUR/100kg, MAPE 44.8% (2 training samples, 2 test samples)

Verdict:

  • Variant A: WEAK (MAPE=90.4%)
  • Variant B: WEAK (MAPE=54.8%)
  • Variant C: WEAK (MAPE=44.8%)

Note: Simplified analysis using annual aggregates and real parcel counts.

Verdict v1 — 2025-08-17

Label: INCONCLUSIVE
Scope: Netherlands consumption potatoes, annual parcel dynamics
Effect: Regional shift features show potential (MAPE ~45%) but insufficient data for statistical significance
Stats: Limited to 5 annual observations; a Diebold-Mariano (DM) test is not feasible with n=2 test samples
Data/Code: git=current; BRP parcels (2020-2024, n=34,301), Boerderij.nl prices
Notes: Successfully extracted REAL parcel data from BRP API. Need weekly alignment and longer time series for proper evaluation. Regional concentration (Variant C) shows most promise.

No Codex summary

Add codex_validated.md to document the status.