Yield Prediction Model - Open Source Demo
Overview
This project demonstrates machine learning models that predict vine yield at harvest time from remote sensing data, weather information, soil properties, and agronomic attributes.
The models predict:
- TCH (Tons of Grapes per Hectare): Vine yield at harvest
Files
- training_features_anonymized.csv - Dataset for model training (926 rows × 589 columns)
- BuildModels_open_source.ipynb - Self-contained notebook for training prediction models
BuildModels_open_source.ipynb - Quick Start Guide
What It Does
This notebook trains machine learning models to predict vine yield (TCH) at harvest time.
Input
- File: training_features_anonymized.csv (926 harvest observations)
- Features: Satellite data, weather, soil properties, and crop characteristics
Process
Load and prepare data
- Read CSV file
- Encode categorical variables (variety, rootstock type)
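The encoding step could look roughly like the sketch below. It assumes one scikit-learn `LabelEncoder` per categorical column, kept in a dictionary so it can be saved alongside the model. The helper name `encode_categoricals` and the column names `variety` and `rootstock_type` are hypothetical; the real CSV headers may differ.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_categoricals(df, columns):
    """Label-encode the given columns; return the encoded copy and the fitted encoders."""
    encoded = df.copy()
    encoders = {}
    for col in columns:
        enc = LabelEncoder()
        # Cast to str so mixed or missing values are handled uniformly (an assumption).
        encoded[col] = enc.fit_transform(encoded[col].astype(str))
        encoders[col] = enc
    return encoded, encoders


# Toy frame with hypothetical column names.
demo = pd.DataFrame({
    "variety": ["A", "B", "A"],
    "rootstock_type": ["r1", "r1", "r2"],
    "TCH": [12.5, 9.8, 14.1],
})
demo_encoded, demo_encoders = encode_categoricals(demo, ["variety", "rootstock_type"])
```

Keeping the fitted encoders is what allows new data to be mapped to the same integer codes at prediction time.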
Train models using Leave-One-Season-Out Cross-Validation
- For each season: train on all other seasons, test on held-out season
- Algorithm: LightGBM with 31 leaves, 100 trees
- Remove outliers: keep only observations with TCH between 0.1 and 60 tons/ha
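A minimal sketch of the leave-one-season-out loop, using scikit-learn's `LeaveOneGroupOut` with the season as the group label. Several details are assumptions rather than the notebook's confirmed behaviour: the TCH bounds are treated as inclusive, the outlier filter is applied before splitting, and the function name and `make_model` parameter are hypothetical. A linear model stands in for LightGBM so the sketch runs without it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneGroupOut


def leave_one_season_out(make_model, X, y, seasons, tch_min=0.1, tch_max=60.0):
    """Return {season: (y_true, y_pred)}, holding out each season exactly once."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    seasons = np.asarray(seasons)

    # Drop TCH outliers (inclusive bounds are an assumption).
    keep = (y >= tch_min) & (y <= tch_max)
    X, y, seasons = X[keep], y[keep], seasons[keep]

    results = {}
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=seasons):
        model = make_model()  # fresh, unfitted model for every fold
        model.fit(X[train_idx], y[train_idx])
        season = seasons[test_idx][0].item()
        results[season] = (y[test_idx], model.predict(X[test_idx]))
    return results


# Synthetic demo data: three seasons of ten observations, plus one outlier.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 3))
y_demo = 20.0 + 2.0 * X_demo[:, 0]
y_demo[0] = 75.0  # above the 60 tons/ha limit, so it is filtered out
seasons_demo = np.repeat([2019, 2020, 2021], 10)

# To mirror the stated settings, one would instead pass something like
# lambda: lightgbm.LGBMRegressor(num_leaves=31, n_estimators=100)
cv_results = leave_one_season_out(LinearRegression, X_demo, y_demo, seasons_demo)
```

Grouping by season keeps every observation from the held-out season out of training, so each fold measures how well the model transfers to an unseen year.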
Evaluate performance
- Calculate metrics: RMSE, MAE, R², MAPE
- Generate scatter plots and feature importance charts
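For reference, the four metrics can be computed as in the sketch below, written in plain Python so the definitions are explicit. MAPE is expressed here as a percentage, which is an assumption about the notebook's convention. It is only defined when no true value is zero, which the TCH outlier filter would guarantee. The function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, R2 and MAPE (in percent) for paired observations."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1.0 - ss_res / ss_tot,
        "MAPE": 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
    }


metrics = regression_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```

In this toy case the errors are 2, -2 and 3, giving MAE = 7/3 and R² = 1 - 17/200 = 0.915.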
Save final models
- Train on complete dataset
- Export as .joblib files for future use
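The export step might be sketched as follows, using `joblib.dump` and `joblib.load`. The model and encoder objects here are small stand-ins, not the notebook's actual LightGBM model, and the files are written to a temporary directory for illustration only.

```python
import os
import tempfile

import joblib
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder

# Stand-in objects; the notebook would use the model trained on the complete dataset.
final_model = LinearRegression().fit([[0.0], [1.0], [2.0]], [1.0, 3.0, 5.0])
final_encoders = {"variety": LabelEncoder().fit(["A", "B"])}

out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "tch_model.joblib")
encoders_path = os.path.join(out_dir, "tch_encoders.joblib")

joblib.dump(final_model, model_path)
joblib.dump(final_encoders, encoders_path)

# Round trip: the reloaded objects behave like the originals.
reloaded_model = joblib.load(model_path)
reloaded_encoders = joblib.load(encoders_path)
```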
Output Files
- tch_model.joblib - Yield prediction model
- tch_encoders.joblib - Label encoders for categorical variables
Feature Set Used
Weather + Soil + Extra:
- 5 satellite spectral indices (NDVI, EVI, VARI, NDRE, NDWI) × 42 time steps = 210 features
- Weather time series: precipitation, temperature, degree days (28 features)
- Soil properties: clay, sand, nitrogen, at 4 depths (25 features)
- Agronomic: variety, age, cut cycle, day of year (4 features)
- Extra: rootstock type, spacing, coordinates (5 features)
- Total: ~272 features
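As a quick check of the arithmetic, the group sizes listed above can be summed directly. The dictionary keys are just labels chosen for this sketch.

```python
# Feature counts per group, as listed in the feature set description.
feature_groups = {
    "satellite": 5 * 42,  # 5 spectral indices x 42 time steps
    "weather": 28,
    "soil": 25,
    "agronomic": 4,
    "extra": 5,
}
total_features = sum(feature_groups.values())
```

The groups sum to 272, so this feature set uses fewer than half of the 589 columns in the CSV.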
Requirements
numpy>=1.21.0
pandas>=1.3.0
scikit-learn>=1.0.0
lightgbm>=3.3.0
matplotlib>=3.4.0
seaborn>=0.11.0
jupyter>=1.0.0
joblib>=1.0.0
Usage
Installation
# Install dependencies
pip install numpy pandas scikit-learn lightgbm matplotlib seaborn jupyter joblib
Running the Notebook
# Navigate to the notebook directory
cd open_source_model/
# Launch Jupyter
jupyter notebook
# Open BuildModels_open_source.ipynb and run all cells
Using the Trained Models
import joblib
import pandas as pd
# Load models and encoders
tch_model = joblib.load('tch_model.joblib')
tch_encoders = joblib.load('tch_encoders.joblib')
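# Prepare your data: X must contain the same features used in training,
# with categorical columns encoded using tch_encoders
# X = pd.DataFrame(...) # Your feature data
# Make predictions (X must be defined above before this line will run)
tch_predictions = tch_model.predict(X)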
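New data has to be encoded with the saved encoders before calling `predict`. The sketch below assumes `tch_encoders` is a dictionary mapping column names to fitted scikit-learn `LabelEncoder` objects. That structure is not confirmed by the notebook description, and the helper `apply_encoders` and the column names are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def apply_encoders(df, encoders):
    """Return a copy of df with each encoded column mapped to its integer codes."""
    out = df.copy()
    for col, enc in encoders.items():
        # transform() raises ValueError for labels that were not seen during fitting.
        out[col] = enc.transform(out[col].astype(str))
    return out


# Toy encoders standing in for joblib.load('tch_encoders.joblib').
toy_encoders = {"variety": LabelEncoder().fit(["A", "B", "C"])}
new_rows = pd.DataFrame({"variety": ["C", "A"], "age": [4, 7]})
X_new = apply_encoders(new_rows, toy_encoders)
```

A variety or rootstock type that never appeared in training cannot be encoded this way and needs to be handled separately before prediction.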