Yield Prediction Model - Open Source Demo

Overview

This project demonstrates machine learning models that predict vineyard yield at harvest time from remote sensing data, weather information, soil properties, and agronomic attributes.

The model predicts:

  • TCH (Tons of Grapes per Hectare): vineyard yield at harvest

Files

  • training_features_anonymized.csv - Dataset for model training (926 rows × 589 columns)
  • BuildModels_open_source.ipynb - Self-contained notebook for training prediction models

BuildModels_open_source.ipynb - Quick Start Guide

What It Does

This notebook trains machine learning models to predict vineyard yield (TCH) at harvest time.

Input

  • File: training_features_anonymized.csv (926 harvest observations)
  • Features: Satellite data, weather, soil properties, and crop characteristics

Process

  1. Load and prepare data

    • Read CSV file
    • Encode categorical variables (variety, rootstock type)
  2. Train models using Leave-One-Season-Out Cross-Validation

    • For each season: train on all other seasons, test on held-out season
    • Algorithm: LightGBM with 31 leaves and 100 trees
    • Remove outliers: keep only observations with TCH between 0.1 and 60 tons/ha
  3. Evaluate performance

    • Calculate metrics: RMSE, MAE, R², MAPE
    • Generate scatter plots and feature importance charts
  4. Save final models

    • Train on complete dataset
    • Export as .joblib files for future use
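The Leave-One-Season-Out loop and the metrics from steps 2 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's code: it assumes a `season` column identifying the harvest season and a `TCH` target column (hypothetical names), and it takes a `make_model` factory so any regressor with `fit`/`predict` can be plugged in. The source does not say whether outliers are removed before splitting or only from training folds; this sketch filters the whole dataset first.

```python
import numpy as np
import pandas as pd


def regression_metrics(y_true, y_pred):
    """RMSE, MAE, R² and MAPE (in %) for one held-out season."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        # R² is undefined if the held-out season has a constant TCH
        "R2": float(1.0 - ss_res / ss_tot) if ss_tot > 0 else float("nan"),
        # Division is safe because the outlier filter keeps TCH >= 0.1
        "MAPE": float(np.mean(np.abs(err / y_true)) * 100.0),
    }


def leave_one_season_out(df, feature_cols, make_model,
                         target="TCH", season_col="season",
                         tch_min=0.1, tch_max=60.0):
    """Hypothetical LOSO-CV loop: trains one model per held-out season."""
    # Assumption: outliers are dropped from the full dataset before splitting
    df = df[df[target].between(tch_min, tch_max)]
    results = {}
    for season in sorted(df[season_col].unique()):
        train = df[df[season_col] != season]  # all other seasons
        test = df[df[season_col] == season]   # the held-out season
        model = make_model()
        model.fit(train[feature_cols], train[target])
        preds = model.predict(test[feature_cols])
        results[season] = regression_metrics(test[target], preds)
    # One row per held-out season, one column per metric
    return pd.DataFrame(results).T
```

With LightGBM installed, the configuration described above would correspond to passing `make_model=lambda: LGBMRegressor(num_leaves=31, n_estimators=100)`; `scores.mean()` on the returned frame then gives the average metrics across seasons.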

Output Files

  • tch_model.joblib - Yield prediction model
  • tch_encoders.joblib - Label encoders for categorical variables
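A minimal sketch of how these two files could be produced, assuming each categorical column is encoded with scikit-learn's `LabelEncoder` and the encoders are saved as a dictionary keyed by column name. Both are assumptions; the notebook may store them differently. The `make_model` factory and the column names are hypothetical, and the categorical columns are assumed to have no missing values.

```python
import joblib
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def fit_encoders(df, categorical_cols):
    """Fit one LabelEncoder per categorical column; return encoded copy + encoders."""
    df = df.copy()
    encoders = {}
    for col in categorical_cols:
        enc = LabelEncoder()
        df[col] = enc.fit_transform(df[col])
        encoders[col] = enc
    return df, encoders


def train_and_export(df, feature_cols, categorical_cols, make_model,
                     target="TCH", model_path="tch_model.joblib",
                     encoders_path="tch_encoders.joblib"):
    """Train on the complete (filtered) dataset and save model + encoders."""
    # Assumption: the same 0.1-60 t/ha outlier filter is applied before final training
    df = df[df[target].between(0.1, 60.0)]
    df, encoders = fit_encoders(df, categorical_cols)
    model = make_model()
    model.fit(df[feature_cols], df[target])
    joblib.dump(model, model_path)        # e.g. tch_model.joblib
    joblib.dump(encoders, encoders_path)  # e.g. tch_encoders.joblib
    return model, encoders
```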

Feature Set Used

Weather + Soil + Extra:

  • 5 satellite spectral indices (NDVI, EVI, VARI, NDRE, NDWI) × 42 time steps = 210 features
  • Weather time series: precipitation, temperature, degree days (28 features)
  • Soil properties: e.g. clay, sand, and nitrogen at 4 depths (25 features)
  • Agronomic: variety, age, cut cycle, day of year (4 features)
  • Extra: rootstock type, spacing, coordinates (5 features)
  • Total: 272 features (210 + 28 + 25 + 4 + 5)
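The group sizes above add up exactly; the satellite block alone is 5 indices × 42 time steps. The sketch below tallies them and shows one hypothetical way the 210 satellite columns could be named. The `<INDEX>_tNN` naming scheme is an assumption for illustration only; the actual column names in training_features_anonymized.csv are not documented here.

```python
# Feature-group sizes as listed in the feature set above
SPECTRAL_INDICES = ["NDVI", "EVI", "VARI", "NDRE", "NDWI"]
N_TIME_STEPS = 42

feature_groups = {
    "spectral": len(SPECTRAL_INDICES) * N_TIME_STEPS,  # 5 x 42 = 210
    "weather": 28,
    "soil": 25,
    "agronomic": 4,
    "extra": 5,
}

# Hypothetical naming scheme for the satellite columns: "NDVI_t01" ... "NDWI_t42"
spectral_cols = [f"{idx}_t{t:02d}"
                 for idx in SPECTRAL_INDICES
                 for t in range(1, N_TIME_STEPS + 1)]

total = sum(feature_groups.values())
print(total)  # 272
```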

Requirements

numpy>=1.21.0
pandas>=1.3.0
scikit-learn>=1.0.0
lightgbm>=3.3.0
matplotlib>=3.4.0
seaborn>=0.11.0
jupyter>=1.0.0
joblib>=1.0.0

Usage

Installation

# Install dependencies
pip install numpy pandas scikit-learn lightgbm matplotlib seaborn jupyter joblib

Running the Notebook

# Navigate to the notebook directory
cd open_source_model/

# Launch Jupyter
jupyter notebook

# Open BuildModels_open_source.ipynb and run all cells

Using the Trained Models

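import joblib
import pandas as pd

# Load the trained model and the label encoders saved by the notebook
tch_model = joblib.load('tch_model.joblib')
tch_encoders = joblib.load('tch_encoders.joblib')

# Prepare your data: it must contain the same feature columns, in the same
# order, as the training data ('new_observations.csv' is a placeholder name)
X = pd.read_csv('new_observations.csv')

# Encode categorical columns (e.g. variety, rootstock type) with the saved
# encoders; assumes tch_encoders is a dict of column name -> fitted LabelEncoder
for col, encoder in tch_encoders.items():
    X[col] = encoder.transform(X[col])

# Predict yield (TCH, tons/ha)
tch_predictions = tch_model.predict(X)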
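Because the model only accepts the exact features it was trained on, a small check before calling `predict` can catch missing or misordered columns early. The helper below is a hypothetical sketch, not part of the notebook; `expected_cols` must come from the training run. If `tch_model` is a LightGBM scikit-learn estimator, its `feature_name_` attribute should hold that list.

```python
import pandas as pd


def align_features(X: pd.DataFrame, expected_cols) -> pd.DataFrame:
    """Return X restricted to the training columns, in training order.

    Raises KeyError listing any feature the model expects but X lacks;
    extra columns in X are dropped.
    """
    missing = [c for c in expected_cols if c not in X.columns]
    if missing:
        raise KeyError(f"missing feature columns: {missing}")
    return X.loc[:, list(expected_cols)]
```

Usage, under the same assumption: `tch_predictions = tch_model.predict(align_features(X, tch_model.feature_name_))`.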
Dataset used to train AGRARIAN/vineyards-yield-model