GeoSpot Base
A geolocation model built on SigLIP2-so400m (512px) that predicts GPS coordinates from images.
Model Details
- Backbone: google/siglip2-so400m-patch16-512 (frozen)
- Image Resolution: 512x512
- Embedding Dim: 512
- Training Steps: 206k
- Training Data: ~10.6M street-view images
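As a quick sanity check on these numbers, the sketch below collects them in a hypothetical `GeoSpotConfig` dataclass (not part of the geoclip package) and derives how many patches the backbone sees. A 512x512 input cut into 16x16 patches gives a 32x32 grid. The 1152-wide vision features are taken from the projection shape listed under Architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoSpotConfig:
    """Hypothetical container for the values listed under Model Details."""
    backbone: str = "google/siglip2-so400m-patch16-512"
    image_size: int = 512       # square input resolution
    patch_size: int = 16        # from "patch16" in the backbone name
    vision_width: int = 1152    # backbone feature size fed to the MLP projection
    embed_dim: int = 512        # shared image/location embedding size
    freeze_backbone: bool = True

    @property
    def num_patches(self) -> int:
        # 512 // 16 = 32 patches per side, so a 32x32 grid
        side = self.image_size // self.patch_size
        return side * side


cfg = GeoSpotConfig()
print(cfg.num_patches)  # 1024
```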
Architecture
GeoCLIP-style contrastive learning between:
- Image Encoder: SigLIP2 vision tower + MLP projection (1152 → 512)
- Location Encoder: Multi-scale random Fourier feature (RFF) encoding with learnable capsules
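The NumPy sketch below illustrates the two towers and a contrastive objective under stated assumptions. All weights are random stand-ins for learned parameters. The RFF scales (1, 16, 256) and the 256 frequencies per scale are borrowed from the original GeoCLIP defaults and are not confirmed for this checkpoint, the MLP hidden width of 1024 is likewise assumed, and the simple lat/lon scaling replaces GeoCLIP's Equal Earth projection for brevity. The loss is a plain one-directional InfoNCE with a fixed temperature; GeoCLIP itself learns the temperature and adds a queue of extra GPS negatives.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 512
VISION_WIDTH = 1152
SIGMAS = (2.0**0, 2.0**4, 2.0**8)  # assumed RFF scales (GeoCLIP defaults)
RFF_DIM = 256                      # assumed Fourier frequencies per scale
HIDDEN = 1024                      # assumed MLP hidden width


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def mlp(x, w1, w2):
    # Two-layer MLP with ReLU; random weights stand in for learned ones
    return np.maximum(x @ w1, 0.0) @ w2


# Image tower: frozen backbone features -> MLP projection (1152 -> 512)
W_IMG = (rng.normal(0, 0.02, (VISION_WIDTH, HIDDEN)),
         rng.normal(0, 0.02, (HIDDEN, EMBED_DIM)))


def encode_images(features):
    return l2_normalize(mlp(features, *W_IMG))


# Location tower: one Gaussian RFF matrix plus one MLP "capsule" per scale
B = [rng.normal(0, s, (2, RFF_DIM)) for s in SIGMAS]
CAPSULES = [(rng.normal(0, 0.02, (2 * RFF_DIM, HIDDEN)),
             rng.normal(0, 0.02, (HIDDEN, EMBED_DIM))) for _ in SIGMAS]


def encode_locations(latlon_deg):
    # Assumption: scale (lat, lon) to [-1, 1] instead of an Equal Earth projection
    xy = latlon_deg / np.array([90.0, 180.0])
    out = np.zeros((len(xy), EMBED_DIM))
    for b, (w1, w2) in zip(B, CAPSULES):
        proj = 2 * np.pi * xy @ b
        rff = np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)
        out += mlp(rff, w1, w2)  # capsule outputs are summed across scales
    return l2_normalize(out)


def info_nce(img_emb, loc_emb, temperature=0.07):
    # Row i of each batch is a matching (image, GPS) pair; other rows are negatives
    logits = img_emb @ loc_emb.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


# Usage: 4 fake backbone feature vectors paired with 4 city coordinates
feats = rng.normal(size=(4, VISION_WIDTH))
coords = np.array([[48.86, 2.35], [40.71, -74.01],
                   [-33.87, 151.21], [35.68, 139.69]])
loss = info_nce(encode_images(feats), encode_locations(coords))
print(loss)
```

Summing the per-scale capsule outputs lets coarse scales separate continents while fine scales separate nearby locations, which is the motivation for the multi-scale design in GeoCLIP.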
Usage
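```python
from safetensors.torch import load_file

from geoclip.model.GeoCLIP import GeoCLIP

# Build the SigLIP2-based model without GeoCLIP's default pretrained weights
model = GeoCLIP(from_pretrained=False, encoder_name="siglip2")

# model.safetensors is a safetensors file: load it with safetensors, not torch.load
state_dict = load_file("model.safetensors")
model.load_state_dict(state_dict)
model.eval()

# Predict location from image: top-5 (lat, lon) candidates and their probabilities
top_gps, top_probs = model.predict("image.jpg", top_k=5)
```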
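In the original GeoCLIP package, `predict` embeds the image, compares it with a precomputed gallery of candidate GPS coordinates, applies a softmax over the gallery and returns the top-k coordinates with their probabilities; this sketch assumes the SigLIP2 variant keeps that behavior. The NumPy version below mirrors only the ranking step. `predict_topk` is a hypothetical helper, and the three-city gallery with 2-D embeddings is toy data.

```python
import numpy as np


def predict_topk(img_emb, gallery_gps, gallery_emb, top_k=5, logit_scale=1 / 0.07):
    """Rank candidate coordinates by similarity to one image embedding.

    img_emb: (D,) L2-normalized image embedding
    gallery_gps: (N, 2) candidate (lat, lon) pairs
    gallery_emb: (N, D) L2-normalized location embeddings of those candidates
    """
    logits = logit_scale * gallery_emb @ img_emb
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                # softmax over the whole gallery
    idx = np.argsort(-probs)[:top_k]    # indices of the k most probable candidates
    return gallery_gps[idx], probs[idx]


# Toy gallery: Paris, New York, Tokyo with 2-D stand-in embeddings
gallery_gps = np.array([[48.86, 2.35], [40.71, -74.01], [35.68, 139.69]])
gallery_emb = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
img_emb = np.array([0.8, 0.6])  # already unit length
top_gps, top_probs = predict_topk(img_emb, gallery_gps, gallery_emb, top_k=2)
print(top_gps)  # Paris first, then New York
```

Because the output is restricted to gallery entries, prediction resolution depends on how densely the gallery covers the map.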