Model Card for Meow-Omni 1-Base
Meow-Omni 1-Base is the foundational quad-modal Multimodal Large Language Model (MLLM) architecture of the Meow-Omni ecosystem. It is the product of the "model surgery" and latent-space alignment that fuse text, vision, audio, and biological time-series, prior to fine-tuning for feline intention decoding.
Model Description
Meow-Omni 1-Base was engineered to bridge the "modality gap" in foundation models. Where standard MLLMs are restricted to text, vision, and audio, this base model natively integrates high-frequency biological time-series (TS) into the linguistic latent space.
It serves as a scalable template for researchers who wish to apply quad-modal reasoning to other non-human species or human clinical diagnostics.
- Model Type: Quad-modal Omni-MLLM (Text, Video, Audio, Time-Series)
- Base Backbones: MiniCPM-o 4.5 (core reasoning) & Intern-S1 Pro (scientific time-series encoders)
- License: Apache 2.0
Technical Specifications
Architectural "Model Surgery"
The base model is the result of deep architectural integration:
- Backbone: Utilizes MiniCPM-o 4.5 for core reasoning.
- TS Integration: Specialized scientific encoders from Intern-S1 Pro were grafted into the architecture via a custom-designed Linear Projection Layer (see the first sketch after this list).
- Tokenizer Expansion: The tokenizer was expanded to handle biological streams natively by introducing the `<|ts_start|>`, `<|ts_unit|>`, and `<|ts_end|>` control tokens (see the second sketch after this list).
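The projector's internals are not published, but the grafting mechanism described above is a standard one: a single linear layer maps time-series encoder outputs into the backbone's embedding space. A minimal PyTorch sketch; the dimensions `ts_dim` and `llm_dim` are hypothetical, as the actual widths are not stated in this card:

```python
import torch
import torch.nn as nn

class TSProjector(nn.Module):
    """Linear projection from the TS-encoder width to the LLM embedding width.

    Both default dimensions are assumptions; the real widths are not published.
    """

    def __init__(self, ts_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(ts_dim, llm_dim)

    def forward(self, ts_features: torch.Tensor) -> torch.Tensor:
        # ts_features: (batch, num_ts_tokens, ts_dim) from the TS encoder.
        # Output lives in the LLM embedding space: (batch, num_ts_tokens, llm_dim).
        return self.proj(ts_features)

# Example: project a batch of 2 recordings, each encoded to 16 TS tokens.
projector = TSProjector()
embeddings = projector(torch.randn(2, 16, 1024))
print(embeddings.shape)  # torch.Size([2, 16, 4096])
```

The projected embeddings would then be spliced into the input sequence at the positions delimited by the `<|ts_start|>` and `<|ts_end|>` control tokens before the LLM forward pass.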
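The card does not spell out the expansion procedure; the second sketch below shows the standard Hugging Face `transformers` mechanics for registering such control tokens and resizing the embedding matrix. The checkpoint path is a placeholder, not a released artifact:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path; substitute the actual backbone checkpoint.
BASE = "path/to/backbone-checkpoint"

tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(BASE, trust_remote_code=True)

# Register the control tokens as special tokens so the subword
# tokenizer never splits them apart.
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|ts_start|>", "<|ts_unit|>", "<|ts_end|>"]}
)

# Grow the input embedding matrix (and any tied LM head) to cover the new ids.
model.resize_token_embeddings(len(tokenizer))
```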
Latent Space Alignment
This base version has not undergone the initial alignment of the time-series projector; the projector is deliberately left open so the model can be adapted to species other than cats.
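Researchers performing that alignment would typically train only the projector while keeping the backbone frozen, so the projection is pulled into the existing linguistic latent space without disturbing it. A minimal sketch, assuming the projector is exposed as a `ts_projector` attribute (a hypothetical name; adjust to the actual module path):

```python
import torch

def freeze_all_but_projector(model: torch.nn.Module) -> None:
    """Freeze the backbone so training only updates the TS projector.

    `ts_projector` is a hypothetical attribute name for the grafted
    projection layer described in the Technical Specifications.
    """
    for param in model.parameters():
        param.requires_grad = False
    for param in model.ts_projector.parameters():
        param.requires_grad = True

# Usage (model loaded beforehand):
# freeze_all_but_projector(model)
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-4
# )
```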
Uses
Direct Use
- Research Foundation: A starting point for fine-tuning quad-modal models on other animal species (e.g., canines, primates, or endangered wildlife); a minimal loading sketch follows.
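A minimal loading sketch for such fine-tuning work; the repository id below is hypothetical and should be replaced with the actual released checkpoint:

```python
from transformers import AutoModel, AutoTokenizer

# Hypothetical repository id, shown for illustration only.
REPO = "meow-omni/Meow-Omni-1-Base"

tokenizer = AutoTokenizer.from_pretrained(REPO, trust_remote_code=True)
model = AutoModel.from_pretrained(REPO, trust_remote_code=True, torch_dtype="auto")
```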
Out-of-Scope Use
- Direct Intent Inference: This base model has not been intent-aligned via the time-series projector or the Meow-10K dataset and may not decode feline intent accurately without further training.
- Clinical Use: Not certified for immediate veterinary or medical diagnostic use.
The Meow-Omni Ecosystem
To facilitate reproducibility and further research in computational ethology, we have released the following components:
- Main Model: Meow-Omni 1, the full fine-tuned quad-modal MLLM aligned for intent decoding.
- Training Dataset: Meow-10K, the 10,000-sample dataset used for training.
- Evaluation Benchmark: MeowBench, the expert-verified quad-modal benchmark suite.
Citation
Coming Soon.