Model Card for Meow-Omni 1-Base
Meow-Omni 1-Base is the foundational quad-modal Multimodal Large Language Model (MLLM) architecture of the Meow-Omni ecosystem. It is the product of the "model surgery" and latent-space alignment that fuse text, vision, audio, and biological time-series, prior to fine-tuning for feline intention decoding.
Model Description
Meow-Omni 1-Base was engineered to bridge the "modality gap" in foundation models. Where standard MLLMs are restricted to text, vision, and audio, this base model natively integrates high-frequency biological time-series (TS) into the linguistic latent space.
It serves as a scalable template for researchers who wish to apply quad-modal reasoning to other non-human species or human clinical diagnostics.
- Model Type: Quad-modal Omni-MLLM (Text, Video, Audio, Time-Series)
- Base Backbones: MiniCPM-o 4.5 (core reasoning) & Intern-S1 Pro (scientific time-series encoders)
- License: Apache 2.0
Technical Specifications
Architectural "Model Surgery"
The base model is the result of deep architectural integration:
- Backbone: Utilizes MiniCPM-o 4.5 for core reasoning.
- TS Integration: Specialized scientific encoders from Intern-S1 Pro were grafted into the architecture via a custom-designed Linear Projection Layer (see the first sketch after this list).
- Tokenizer Expansion: The tokenizer was expanded to handle biological streams natively by introducing the `<|ts_start|>`, `<|ts_unit|>`, and `<|ts_end|>` control tokens (see the second sketch after this list).
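The projector's internals are not published, but the grafting mechanism described above is a standard one: a single linear layer maps time-series encoder outputs into the backbone's embedding space. A minimal PyTorch sketch; the dimensions `ts_dim` and `llm_dim` are hypothetical, as the actual widths are not stated in this card:

```python
import torch
import torch.nn as nn

class TSProjector(nn.Module):
    """Linear projection from the TS-encoder width to the LLM embedding width.

    Both default dimensions are assumptions; the real widths are not published.
    """

    def __init__(self, ts_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(ts_dim, llm_dim)

    def forward(self, ts_features: torch.Tensor) -> torch.Tensor:
        # ts_features: (batch, num_ts_tokens, ts_dim) from the TS encoder.
        # Output lives in the LLM embedding space: (batch, num_ts_tokens, llm_dim).
        return self.proj(ts_features)

# Example: project a batch of 2 recordings, each encoded to 16 TS tokens.
projector = TSProjector()
embeddings = projector(torch.randn(2, 16, 1024))
print(embeddings.shape)  # torch.Size([2, 16, 4096])
```

The projected embeddings would then be spliced into the input sequence at the positions delimited by the `<|ts_start|>` and `<|ts_end|>` control tokens before the LLM forward pass.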
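The card does not spell out the expansion procedure; the second sketch below shows the standard Hugging Face `transformers` mechanics for registering such control tokens and resizing the embedding matrix. The checkpoint path is a placeholder, not a released artifact:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path; substitute the actual backbone checkpoint.
BASE = "path/to/backbone-checkpoint"

tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(BASE, trust_remote_code=True)

# Register the control tokens as special tokens so the subword
# tokenizer never splits them apart.
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|ts_start|>", "<|ts_unit|>", "<|ts_end|>"]}
)

# Grow the input embedding matrix (and any tied LM head) to cover the new ids.
model.resize_token_embeddings(len(tokenizer))
```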
Latent Space Alignment
This base version has not undergone the initial alignment of the time-series projector; the projector is deliberately left open so the model can be adapted to species other than cats.
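Researchers performing that alignment would typically train only the projector while keeping the backbone frozen, so the projection is pulled into the existing linguistic latent space without disturbing it. A minimal sketch, assuming the projector is exposed as a `ts_projector` attribute (a hypothetical name; adjust to the actual module path):

```python
import torch

def freeze_all_but_projector(model: torch.nn.Module) -> None:
    """Freeze the backbone so training only updates the TS projector.

    `ts_projector` is a hypothetical attribute name for the grafted
    projection layer described in the Technical Specifications.
    """
    for param in model.parameters():
        param.requires_grad = False
    for param in model.ts_projector.parameters():
        param.requires_grad = True

# Usage (model loaded beforehand):
# freeze_all_but_projector(model)
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-4
# )
```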
Uses
Direct Use
- Research Foundation: A starting point for fine-tuning quad-modal models on other animal species (e.g., canines, primates, or endangered wildlife); a minimal loading sketch follows.
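A minimal loading sketch for such fine-tuning work; the repository id below is hypothetical and should be replaced with the actual released checkpoint:

```python
from transformers import AutoModel, AutoTokenizer

# Hypothetical repository id, shown for illustration only.
REPO = "meow-omni/Meow-Omni-1-Base"

tokenizer = AutoTokenizer.from_pretrained(REPO, trust_remote_code=True)
model = AutoModel.from_pretrained(REPO, trust_remote_code=True, torch_dtype="auto")
```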
Out-of-Scope Use
- Direct Intent Inference: This base model has not been intent-aligned via the time-series projector or the Meow-10K dataset and may not decode feline intent accurately without further training.
- Clinical Use: Not certified for immediate veterinary or medical diagnostic use.
The Meow-Omni Ecosystem
To facilitate reproducibility and further research in computational ethology, we have released the following components:
- Main Model: Meow-Omni 1, the full fine-tuned quad-modal MLLM aligned for intent decoding.
- Training Dataset: Meow-10K, the 10,000-sample dataset used for training.
- Evaluation Benchmark: MeowBench, the expert-verified quad-modal benchmark suite.
Citation
Coming Soon.