# HY-MT1.5-1.8B_GPTQ_INT4
This version of HY-MT1.5-1.8B_GPTQ_INT4 has been converted to run on the Axera NPU using w4a16 quantization.
Compatible with Pulsar2 version: > 5.1-patch1-dirty.
Please note that the context length of the model is 2k and the maximum prefill length is 1k.
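As a rough illustration of these limits, the sketch below checks a prompt against the stated budgets. The 1k/2k values are assumed to mean 1024/2048 tokens; real token counts must come from the model's own tokenizer.

```python
# Illustrative budget check for the stated limits: 2k context, 1k max prefill.
# (Assumes 1k = 1024 tokens; actual counts depend on the model's tokenizer.)
CONTEXT_LEN = 2048
MAX_PREFILL = 1024

def generation_budget(prompt_tokens: int) -> int:
    """Return how many new tokens can still fit, or raise if the prompt exceeds the prefill limit."""
    if prompt_tokens > MAX_PREFILL:
        raise ValueError(f"prompt is {prompt_tokens} tokens; max prefill is {MAX_PREFILL}")
    return CONTEXT_LEN - prompt_tokens

print(generation_budget(600))  # 1448 tokens left for generation
```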
## Convert tools links

For those interested in model conversion, you can try exporting the axmodel from the original repo:
https://huggingface.co/tencent/HY-MT1.5-1.8B
How to Convert LLM from Huggingface to axmodel
## Support Platform

### AX650

- AX650N DEMO Board
- M4N-Dock (爱芯派 Pro)
- M.2 Accelerator card

### AX620E

- AX620E DEMO Board
| Chips | TTFT | Decode speed (w4a16) |
|---|---|---|
| AX650 | 1514.3 ms (1k prefill) | 15.7 tokens/sec |
| AX620E | 11538.6 ms (512 prefill) | 4.05 tokens/sec |
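End-to-end latency for a reply can be estimated from the table as TTFT plus decode time. A small sketch, using the AX650 row above (1514.3 ms TTFT for a 1k prefill, 15.7 tokens/sec decode):

```python
# Estimate end-to-end latency from the benchmark table (AX650 row).
ttft_ms = 1514.3   # time to first token, 1k prefill
decode_tps = 15.7  # decode throughput, tokens/sec

def total_latency_s(new_tokens: int) -> float:
    """Approximate seconds until a reply of `new_tokens` tokens is complete."""
    return ttft_ms / 1000.0 + new_tokens / decode_tps

print(round(total_latency_s(100), 1))  # ~7.9 s for a 100-token reply
```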
## How to use

### Install axllm

Option 1: clone the repository and run the install script:

```shell
git clone -b axllm https://github.com/AXERA-TECH/ax-llm.git
cd ax-llm
./install.sh
```

Option 2: one-line install (default branch: axllm):

```shell
curl -fsSL https://raw.githubusercontent.com/AXERA-TECH/ax-llm/axllm/install.sh | bash
```
Option 3: download the executable exported by the GitHub Actions CI (for users without a build environment):

Download the latest CI-exported executable (axllm) from:

https://github.com/AXERA-TECH/ax-llm/actions?query=branch%3Aaxllm

then:

```shell
chmod +x axllm
sudo mv axllm /usr/bin/axllm
```
### Model download (Hugging Face)

Create the model directory, enter it, and download into it:

```shell
mkdir -p AXERA-TECH/HY-MT1.5-1.8B
cd AXERA-TECH/HY-MT1.5-1.8B
hf download AXERA-TECH/HY-MT1.5-1.8B --local-dir .
```
### Run (CLI)

```shell
axllm run AXERA-TECH/HY-MT1.5-1.8B
```
### Start the server (OpenAI-compatible)

```shell
axllm serve AXERA-TECH/HY-MT1.5-1.8B
```
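Since the server speaks the OpenAI chat-completions protocol, any HTTP client can talk to it. The sketch below only builds the endpoint URL and JSON request body (the base URL matches the default address used in the Python examples in this card); actually sending it requires the server above to be running.

```python
import json

# Default serve address used by the examples in this card (an assumption;
# adjust if your axllm server listens elsewhere).
BASE_URL = "http://127.0.0.1:8000/v1"

# OpenAI-style chat-completions request body.
payload = {
    "model": "AXERA-TECH/HY-MT1.5-1.8B",
    "messages": [
        {"role": "system", "content": "you are a helpful assistant."},
        {"role": "user", "content": "hello"},
    ],
    "stream": False,
}

# POST this JSON to the chat-completions endpoint, e.g. with curl or requests.
endpoint = f"{BASE_URL}/chat/completions"
print(endpoint)
print(json.dumps(payload, ensure_ascii=False, indent=2))
```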
### OpenAI client example

```python
from openai import OpenAI

API_URL = "http://127.0.0.1:8000/v1"
MODEL = "AXERA-TECH/HY-MT1.5-1.8B"

messages = [
    {"role": "system", "content": [{"type": "text", "text": "you are a helpful assistant."}]},
    {"role": "user", "content": "hello"},
]

client = OpenAI(api_key="not-needed", base_url=API_URL)
completion = client.chat.completions.create(
    model=MODEL,
    messages=messages,
)
print(completion.choices[0].message.content)
```
### OpenAI streaming example

```python
from openai import OpenAI

API_URL = "http://127.0.0.1:8000/v1"
MODEL = "AXERA-TECH/HY-MT1.5-1.8B"

messages = [
    {"role": "system", "content": [{"type": "text", "text": "you are a helpful assistant."}]},
    {"role": "user", "content": "hello"},
]

client = OpenAI(api_key="not-needed", base_url=API_URL)
stream = client.chat.completions.create(
    model=MODEL,
    messages=messages,
    stream=True,
)
print("assistant:")
for ev in stream:
    delta = getattr(ev.choices[0], "delta", None)
    if delta and getattr(delta, "content", None):
        print(delta.content, end="", flush=True)
print()
```
## Model tree for AXERA-TECH/HY-MT1.5-1.8B_GPTQ_INT4

Base model: tencent/HY-MT1.5-1.8B