Special Token Disaster: Your Tech Lead Has Zero Design Taste
chat_template.jinja
{#- ----------‑‑‑ special token variables ‑‑‑---------- -#}
{%- set bos_token = '<|hy_begin▁of▁sentence|>' %}
{%- set pad_token = '<|hy_▁pad▁|>' %}
{%- set user_token = '<|hy_User|>' %}
See <|hy_begin▁of▁sentence|> and <|hy_▁pad▁|>: ｜ next to |, and _ next to ▁.
The special token design for Hunyuan is a visual disaster that screams "zero design taste" from leadership. Using a bloated mix of fullwidth pipes (｜) and obscure geometric blocks (▁) doesn't make the model look high-tech; it makes it look like a corrupted encoding error.
A tech leader with an actual eye for polish understands that functional infrastructure should be clean and harmonious. Instead, we got a syntax that creates jagged, uneven visual noise.
Please take back your shit.
What the hell lol, thank you for sharing such a nice model <3 don't listen to this
Hi, thanks for the feedback!
The special token design using fullwidth pipes (｜) and block characters (▁) is an intentional engineering decision rather than an oversight. During pretraining and continual training, the model is trained on massive, diverse corpora where conventional special tokens such as <|im_start|> frequently appear as plain text. These collisions make it ambiguous whether a token is a genuine control signal or just content, which can degrade model behavior. Using visually distinctive Unicode characters significantly reduces collision probability and keeps control tokens cleanly separated from content.
It's also worth noting that these special tokens are handled internally by the tokenizer and chat template, so they should be completely transparent to end users and developers during normal usage — you won't need to type or deal with them directly.
That said, we completely understand the ergonomic concerns. The current token set in Hy3-preview prioritizes robustness, but we're actively working on an optimized version in a future release that better balances collision resistance with readability and developer experience. Stay tuned!
Thanks again for the candid feedback — it's genuinely appreciated.
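To make the collision argument concrete, here is a minimal sketch that counts how often candidate control-token strings show up as plain text in a set of documents. The function name, the sample documents, and the marker list are all hypothetical; a real check would run over the actual pretraining corpus.

```python
def count_marker_collisions(documents, markers):
    """Count literal occurrences of each marker string across the documents."""
    counts = {marker: 0 for marker in markers}
    for doc in documents:
        for marker in markers:
            counts[marker] += doc.count(marker)
    return counts


# Hypothetical sample data: one document discusses ChatML markup as text.
documents = [
    "In ChatML, each turn starts with <|im_start|> followed by the role.",
    "A plain sentence with no markup at all.",
]
# Marker strings copied as they appear in this thread; treat them as illustrative.
markers = ["<|im_start|>", "<|hy_begin▁of▁sentence|>"]

collisions = count_marker_collisions(documents, markers)
print(collisions)
```

A marker with a nonzero count is one the tokenizer could mistake for a control signal when it is only being quoted.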
Totally unconvincing. When you see "<|im_start|>" in your pre-training corpus, you should parse that data and convert it to a conversational format.
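A rough sketch of that preprocessing step, assuming the corpus text uses the usual ChatML layout (`<|im_start|>role`, a newline, the content, then `<|im_end|>`). The function name and sample text are made up for illustration, and this is not how any particular team's pipeline works.

```python
import re

# One complete ChatML turn: <|im_start|>ROLE\nCONTENT<|im_end|>
CHATML_TURN = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)


def chatml_to_messages(text):
    """Return a list of {"role", "content"} dicts, one per complete turn found."""
    return [
        {"role": role, "content": content.strip()}
        for role, content in CHATML_TURN.findall(text)
    ]


sample = (
    "<|im_start|>user\nWho are you?<|im_end|>\n"
    "<|im_start|>assistant\nI am a language model.<|im_end|>\n"
)
messages = chatml_to_messages(sample)
print(messages)
```

Text that only mentions the marker, without a complete turn around it, yields no messages here, so it would still need separate handling as ordinary content.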
@yiqichen01
It is possible to rename special tokens without impacting the model at all, by modifying the tokenizer.json files! It is the token ID that matters, as this is a special token without any merges.
As long as downstream users do not use their own chat template, this is a non-breaking change.
It may be best to make this change in the next preview or the full release of the model, rather than risk breaking the current model with a new revision.
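For illustration, a minimal sketch of such a rename on a tokenizer.json-style dictionary. It assumes the usual layout with an `added_tokens` list and a `model.vocab` mapping; the toy data, the ID 5, and the new name `<|user|>` are hypothetical. A real rename would also need matching updates to the chat template and to any other config file that spells out the token string.

```python
import copy


def rename_special_token(tokenizer_json, old, new):
    """Return a copy of the dict with one token string renamed and its ID unchanged."""
    data = copy.deepcopy(tokenizer_json)
    # Rename the entry in the added-tokens list; the "id" field is left alone.
    for entry in data.get("added_tokens", []):
        if entry["content"] == old:
            entry["content"] = new
    # If the token is also listed in the model vocab, move its ID to the new key.
    vocab = data.get("model", {}).get("vocab", {})
    if old in vocab:
        vocab[new] = vocab.pop(old)
    return data


# Toy stand-in for a loaded tokenizer.json (hypothetical contents).
toy = {
    "added_tokens": [{"id": 5, "content": "<|hy_User|>", "special": True}],
    "model": {"vocab": {"hello": 0, "<|hy_User|>": 5}},
}
renamed = rename_special_token(toy, "<|hy_User|>", "<|user|>")
print(renamed["added_tokens"][0])
```

Only the string changes; the ID the model was trained on stays the same, which is why the weights are unaffected.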