HY-MT1.5-1.8B_GPTQ_INT4

This version of HY-MT1.5-1.8B_GPTQ_INT4 has been converted to run on the Axera NPU using w4a16 quantization.


Compatible with Pulsar2 version: > 5.1-patch1-dirty.

Please note that the model's context length is 2k tokens and the maximum prefill length is 1k tokens.
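These limits can be checked client-side before sending a request. A minimal sketch (the constants mirror the 2k-context / 1k-prefill figures above; the function name is illustrative, not part of axllm):

```python
# Limits of this build (from the note above) -- assumed to be token counts.
MAX_CONTEXT = 2048   # total tokens: prompt + generated
MAX_PREFILL = 1024   # maximum prompt (prefill) tokens

def fits_limits(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if a request stays within the model's limits."""
    return (prompt_tokens <= MAX_PREFILL
            and prompt_tokens + max_new_tokens <= MAX_CONTEXT)

print(fits_limits(900, 1000))   # True: 900 <= 1024 and 1900 <= 2048
print(fits_limits(1100, 100))   # False: prompt exceeds the 1k prefill limit
```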

Conversion tool links:

If you are interested in model conversion, you can try exporting the axmodel from the original repo:

https://huggingface.co/tencent/HY-MT1.5-1.8B

How to Convert LLM from Huggingface to axmodel

AXera NPU HOST LLM Runtime

AXera NPU AXCL LLM Runtime

Supported platforms

| Chips  | TTFT (w4a16)             | Decode (w4a16)  |
|--------|--------------------------|-----------------|
| AX650  | 1514.3 ms (1k prefill)   | 15.7 tokens/sec |
| AX620E | 11538.6 ms (512 prefill) | 4.05 tokens/sec |
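As a rough back-of-envelope, end-to-end latency is about TTFT plus generated tokens divided by the decode rate. A small sketch using the AX650 figures from the table (the numbers come from the table; the helper function itself is illustrative):

```python
def estimate_latency_s(ttft_ms: float, tokens_per_s: float, new_tokens: int) -> float:
    """Rough end-to-end latency: time-to-first-token plus decode time."""
    return ttft_ms / 1000.0 + new_tokens / tokens_per_s

# AX650, w4a16, 1k prefill: TTFT 1514.3 ms, decode 15.7 tokens/sec
t = estimate_latency_s(1514.3, 15.7, 128)
print(f"~{t:.1f} s for 128 generated tokens")  # ~9.7 s
```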

How to use

Install axllm

Option 1: clone the repository and run the install script:

git clone -b axllm https://github.com/AXERA-TECH/ax-llm.git
cd ax-llm
./install.sh

Option 2: one-line install (default branch: axllm):

curl -fsSL https://raw.githubusercontent.com/AXERA-TECH/ax-llm/axllm/install.sh | bash

Option 3: download the prebuilt executable from GitHub Actions CI (for users without a build environment):

If you do not have a build environment, download the latest CI-exported executable (axllm) from https://github.com/AXERA-TECH/ax-llm/actions?query=branch%3Aaxllm, then:

chmod +x axllm
sudo mv axllm /usr/bin/axllm

Model download (Hugging Face)

First create the model directory, enter it, then download into it:

mkdir -p AXERA-TECH/HY-MT1.5-1.8B
cd AXERA-TECH/HY-MT1.5-1.8B
hf download AXERA-TECH/HY-MT1.5-1.8B --local-dir .

Run (CLI)

axllm run AXERA-TECH/HY-MT1.5-1.8B

Start the server (OpenAI-compatible)

axllm serve AXERA-TECH/HY-MT1.5-1.8B

OpenAI API example

from openai import OpenAI

API_URL = "http://127.0.0.1:8000/v1"
MODEL = "AXERA-TECH/HY-MT1.5-1.8B"

messages = [
    {"role": "system", "content": [{"type": "text", "text": "you are a helpful assistant."}]},
    {"role": "user", "content": "hello"},
]

client = OpenAI(api_key="not-needed", base_url=API_URL)
completion = client.chat.completions.create(
    model=MODEL,
    messages=messages,
)

print(completion.choices[0].message.content)

OpenAI streaming example

from openai import OpenAI

API_URL = "http://127.0.0.1:8000/v1"
MODEL = "AXERA-TECH/HY-MT1.5-1.8B"

messages = [
    {"role": "system", "content": [{"type": "text", "text": "you are a helpful assistant."}]},
    {"role": "user", "content": "hello"},
]

client = OpenAI(api_key="not-needed", base_url=API_URL)
stream = client.chat.completions.create(
    model=MODEL,
    messages=messages,
    stream=True,
)

print("assistant:")
for ev in stream:
    delta = getattr(ev.choices[0], "delta", None)
    if delta and getattr(delta, "content", None):
        print(delta.content, end="", flush=True)
print("
")