🗺️ Core-JEPA: LeJEPA for Earth Observation
Core-JEPA is a vision foundation backbone specialized for Earth Observation that bridges the gap between scalable self-supervised learning and theoretical rigor. Built on top of the DINOv3 backbone, this model integrates the LeJEPA (Lean Joint-Embedding Predictive Architecture) methodology.
Unlike standard approaches, LeJEPA provides a mathematical formulation for self-supervised learning (SSL) that is free of the teacher-student paradigm and devoid of complex engineering heuristics.
🚧 Experimental Status: This model is currently under active development. Weights and architectures may change as we refine the training objectives.
🔬 Methodology: Mathematics over Heuristics
The Problem with Standard DINOs & JEPAs
Traditional SSL architectures (such as DINOv3 or I-JEPA) rely on a complex stack of engineering heuristics to prevent representation collapse (where the model outputs the same vector for every image). These brittle mechanisms include:
- Teacher-Student Networks (requiring double the memory).
- Exponential Moving Average (EMA) updates for the teacher weights (see the sketch after this list).
- Stop-Gradient operations.
- Asymmetric View Generation and complex hyperparameter scheduling.
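To make the contrast concrete, here is what a typical teacher-student EMA update looks like in DINO-style methods. This is an illustrative sketch of the heuristic being removed, not DINOv3's actual implementation:

```python
import torch

# Sketch of the teacher-student EMA heuristic that LeJEPA removes:
# the teacher is a second copy of the network whose weights slowly
# trail the student's after every optimization step.
@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, m: float = 0.996):
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(m).add_(p_s, alpha=1.0 - m)
```

Maintaining this second network is exactly what doubles the memory footprint noted above.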
The LeJEPA Solution
Core-JEPA breaks this cycle. It replaces ad-hoc engineering with a rigorous, provable objective function.
- Optimal Distribution: We rely on the theoretical proof that the Isotropic Gaussian is the optimal distribution that embeddings should follow to minimize downstream prediction risk.
- SIGReg (Sketched Isotropic Gaussian Regularization): To achieve this, we employ SIGReg, an objective that constrains embeddings to reach that ideal distribution via random projections and characteristic-function matching (a minimal sketch follows this list).
- Heuristic-Free: By enforcing this distribution mathematically, Core-JEPA eliminates the need for a Teacher network or Stop-Gradients. It is stable across hyperparameters and exhibits linear time and memory complexity.
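The following is a minimal, hypothetical sketch of the SIGReg idea: project the batch embeddings onto random unit directions and match each projection's empirical characteristic function against that of a standard Gaussian, exp(-t²/2). The fixed frequency grid and uniform weighting here are simplifications of the paper's exact statistic:

```python
import torch

def sigreg_loss(z: torch.Tensor, num_projections: int = 64, num_freqs: int = 17) -> torch.Tensor:
    """Simplified SIGReg sketch: push embeddings z of shape (n, d) toward
    an isotropic Gaussian via random 1-D projections and characteristic-
    function matching (see the LeJEPA paper for the exact objective)."""
    n, d = z.shape
    dirs = torch.randn(d, num_projections, device=z.device)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)          # random unit directions
    proj = z @ dirs                                       # (n, num_projections)
    t = torch.linspace(0.1, 4.0, num_freqs, device=z.device)
    tx = proj.unsqueeze(-1) * t                           # (n, P, F)
    ecf_re, ecf_im = torch.cos(tx).mean(0), torch.sin(tx).mean(0)
    target = torch.exp(-0.5 * t**2)                       # CF of N(0, 1); imaginary part is 0
    return ((ecf_re - target) ** 2 + ecf_im**2).mean()
```

Because the loss depends only on the current batch's embeddings, there is no second network to store or synchronize, which is where the linear time and memory behavior comes from.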
Training Details
- Backbone: ViT-L/16
- Dataset: Trained on the Core-Five HighRes dataset (Earth Observation).
- Strategy: Initialized from a pretrained DINOv3 checkpoint, then trained for ~100 epochs with the LeJEPA objective (a schematic training step is sketched below).
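As a rough illustration of how the objective combines multi-view prediction with SIGReg, here is a hypothetical training step; `encoder`, the pre-augmented view pair, and the loss weight `lam` are illustrative assumptions, not the released training code:

```python
import torch.nn.functional as F

def lejepa_step(encoder, optimizer, view1, view2, lam=0.05):
    z1, z2 = encoder(view1), encoder(view2)       # one network: no teacher, no EMA
    pred_loss = F.mse_loss(z1, z2)                # pull embeddings of the two views together
    reg_loss = sigreg_loss(z1) + sigreg_loss(z2)  # keep embeddings isotropic Gaussian
    loss = pred_loss + lam * reg_loss
    optimizer.zero_grad()
    loss.backward()                               # single gradient path: no stop-gradient
    optimizer.step()
    return loss.item()
```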
📊 Benchmark Results
After integrating SIGReg and training on Core-Five, we observed consistent improvements over the standard DINOv3 baseline across multiple global Earth Observation downstream tasks:
| Benchmark Dataset | Task Type | Top-1 Accuracy Improvement vs. DINOv3 |
|---|---|---|
| AID | Scene Classification | +1.8% 📈 |
| UC Merced | Scene Classification | +1.5% 📈 |
| RSSCN-7 | Scene Classification | +0.3% 📈 |
| NWPU-RESISC45 | Scene Classification | +0.3% 📈 |
⚠️ Limitations & Trade-offs
While LeJEPA demonstrates superior performance on global tasks (such as Scene Classification), our current experiments show a trade-off regarding local tasks.
- ✅ Global Tasks: The isotropic Gaussian regularization forces strong separability between classes at a global level, boosting scene classification accuracy.
- ⚠️ Local Tasks: Currently, the model underperforms the standard DINOv3 baseline on dense prediction tasks, such as Pixel-wise Segmentation.
We hypothesize that the strong regularization on the embedding distribution might be suppressing some of the high-frequency local features required for fine-grained segmentation. We are actively investigating this behavior.
💻 Usage
You can load the Core-JEPA weights directly using mapminer and torch.hub. The weights are hosted in our artifacts repository for bandwidth-efficient downloads.
1. Installation
```bash
uv pip install mapminer
```
2. Inference Code
```python
import torch
from mapminer import models

# 1. Initialize the architecture (DINOv3 backbone)
jepa = models.DINOv3(pretrained=False)

# 2. Download and load the weights
# We point to the specific LeJEPA-Large checkpoint in the artifacts repo
ckpt_url = "https://huggingface.co/datasets/gajeshladharai/artifacts/resolve/main/core-jepa/lejepa-l.pt"
ckpt = torch.hub.load_state_dict_from_url(ckpt_url, map_location='cpu')

# 3. Remap keys & load the state dict
# This maps the checkpoint keys to the mapminer model structure
jepa.load_state_dict(
    {k.replace('encoder.model.', 'model.'): v for k, v in ckpt.items()},
    strict=False,
)
print("Core-JEPA loaded successfully!")

# 4. Run inference
# Example: random input image (Batch, Channels, Height, Width)
x = torch.randint(0, 255, (1, 3, 224, 224), dtype=torch.uint8)

# Normalize using the model's specific preprocessing
x = jepa.normalize(x)

# Forward pass
with torch.no_grad():
    emb = jepa(x)

print(f"Embedding shape: {emb.shape}")
```
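The benchmark numbers above come from evaluating frozen embeddings on classification tasks. As a hypothetical example of that setup, here is a minimal linear probe over the loaded backbone; `train_loader`, `num_classes`, and `embed_dim` are placeholders for your own data and the backbone's output width:

```python
import torch

# Minimal linear-probe sketch on frozen Core-JEPA embeddings (illustrative;
# `train_loader`, `num_classes`, and `embed_dim` are assumed placeholders).
probe = torch.nn.Linear(embed_dim, num_classes)
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()

for images, labels in train_loader:           # images: uint8 (B, 3, H, W)
    with torch.no_grad():                     # backbone stays frozen
        feats = jepa(jepa.normalize(images))
    loss = criterion(probe(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```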
📚 Colab Notebooks
Get hands-on with Core-JEPA. Select a tutorial below to open it directly in Google Colab.
📜 Citation & Acknowledgements
This work relies heavily on the research presented in LeJEPA (Lean Joint-Embedding Predictive Architecture). If you find this model useful, please consider citing the original paper:
LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
```bibtex
@article{balestriero2025lejepa,
  title={LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics},
  author={Balestriero, Randall and LeCun, Yann},
  journal={arXiv preprint arXiv:2511.08544},
  year={2025}
}
```
Check out the official LeJEPA repository here: https://github.com/rbalestr-lab/lejepa
📄 License
This project is released under the Creative Commons Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0) license.
✅ Free to use, share, and adapt for non-commercial research.
❌ Commercial use is not permitted without explicit permission.
📌 Please provide appropriate credit when using this model in publications or projects.