Model Card for Meow-Omni 1-Base

Meow-Omni 1-Base is the foundational quad-modal Multimodal Large Language Model (MLLM) architecture. It is the product of architectural "model surgery" that brings text, vision, audio, and biological time-series into a shared latent space, prior to any task-specific fine-tuning for feline intention decoding.

Model Description

Meow-Omni 1-Base was engineered to bridge the "modality gap" in foundation models. While standard MLLMs are typically restricted to text, vision, and audio, this base model natively integrates high-frequency biological time-series (TS) into the linguistic latent space.

It serves as a scalable template for researchers who wish to apply quad-modal reasoning to other non-human species or human clinical diagnostics.

  • Model Type: Quad-modal Omni-MLLM (Text, Video, Audio, Time-Series)
  • Base Backbones: MiniCPM-o 4.5 & Intern-S1 Pro (Scientific TS Encoders)
  • License: Apache 2.0

Technical Specifications

Architectural "Model Surgery"

The base model is the result of deep architectural integration:

  1. Backbone: Utilizes MiniCPM-o 4.5 for core reasoning.
  2. TS Integration: Specialized scientific encoders from Intern-S1 Pro were grafted into the architecture via a custom-designed linear projection layer.
  3. Tokenizer Expansion: The tokenizer was expanded to handle biological streams natively through the introduction of <|ts_start|>, <|ts_unit|>, and <|ts_end|> control tokens.
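To make the control-token scheme above concrete, here is a minimal sketch of serializing a biological time-series into a token stream. Only the three control tokens come from this model card; the payload layout between them (comma-separated values, a unit string such as `bpm`) is an illustrative assumption, not the released tokenizer's actual format.

```python
# Hedged sketch: wrap raw time-series samples in the control tokens
# introduced during tokenizer expansion. The payload format between the
# tokens is an assumption for illustration.

def serialize_ts(samples, unit="bpm"):
    """Serialize samples as <|ts_start|><|ts_unit|>unit<|ts_unit|>v1,v2,...<|ts_end|>.

    `unit` is carried via the <|ts_unit|> token pair so the model can
    ground the numeric stream (e.g. heart rate in beats per minute).
    """
    payload = ",".join(f"{s:.2f}" for s in samples)
    return f"<|ts_start|><|ts_unit|>{unit}<|ts_unit|>{payload}<|ts_end|>"

print(serialize_ts([112.0, 115.5, 118.25]))
# <|ts_start|><|ts_unit|>bpm<|ts_unit|>112.00,115.50,118.25<|ts_end|>
```

In practice, such strings would be tokenized after registering the three control tokens as special tokens, so that they map to single token IDs rather than being split apart.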

Latent Space Alignment

This base version has deliberately not undergone the initial alignment of the time-series projector, leaving the projector open for adaptation to other species.
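For researchers performing that alignment themselves, the step typically amounts to training only the projection layer while the backbone and encoders stay frozen. Below is a minimal PyTorch sketch of such a projector; the dimensions (1024 → 4096) and class name are illustrative assumptions, not the released model's actual sizes or API.

```python
import torch
import torch.nn as nn

# Hedged sketch: a linear projection layer mapping frozen TS-encoder
# features into the LLM embedding space. All dimensions here are
# illustrative assumptions.

class TSProjector(nn.Module):
    def __init__(self, ts_dim=1024, llm_dim=4096):
        super().__init__()
        self.proj = nn.Linear(ts_dim, llm_dim)

    def forward(self, ts_features):
        # ts_features: (batch, seq_len, ts_dim) from the frozen TS encoder
        return self.proj(ts_features)

projector = TSProjector()
# During alignment, only the projector's parameters are trainable;
# the backbone LLM and the TS encoders remain frozen.
trainable = [p for p in projector.parameters() if p.requires_grad]
out = projector(torch.randn(2, 16, 1024))
print(out.shape)  # torch.Size([2, 16, 4096])
```

An alignment run would then optimize only `trainable` (e.g. with `torch.optim.AdamW(trainable, lr=1e-4)`) against paired TS/text data for the target species.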

Uses

Direct Use

  • Research Foundation: A starting point for fine-tuning quad-modal models on different animal species (e.g., canines, primates, or endangered wildlife).

Out-of-Scope Use

  • Direct Intent Inference: This base model has not been intent-aligned via the time-series projector or the Meow-10K dataset and may not provide accurate feline intent decoding without further training.
  • Clinical Use: Not certified for immediate veterinary or medical diagnostic use.

🔗 The Meow-Omni Ecosystem

To facilitate reproducibility and further research in computational ethology, we have released the following components:

  • Main Model: Meow-Omni 1 β€” The full fine-tuned quad-modal MLLM aligned for intent.
  • Training Dataset: Meow-10K β€” The 10,000 sample dataset used for training.
  • Evaluation Benchmark: MeowBench β€” The expert-verified quad-modal benchmark suite.

Citation

Coming Soon.
