Making ProtGPT2-medium and ProtGPT2-small available?
@nferruz Hi Noelia,
Would you also consider releasing ProtGPT2-medium and ProtGPT2-small versions, please?
This will be of great help to people who want to debug or don't have large-capacity GPU machines.
Currently, with the parameters below running on an AWS p3.16xlarge, the program crashes with a CUDA memory error.
The AWS EC2 p3.16xlarge instance type is powered by 8 NVIDIA Tesla V100 GPUs, each with 16 GB of GPU memory.
In total, the p3.16xlarge instance provides 128 GB of GPU memory.
Do you have any suggestions for which parameters I can use to avoid that?
TRAINING_FILE="data/ha_filtered_108k.train.gpt2_format.txt" # 80K lines
VALIDATION_FILE="data/ha_filtered_108k.validation.gpt2_format.txt" # 20K lines
MODEL_OUTPUT_DIR="gpt2_model/ha_filtered_108k"
python run_clm.py --model_name_or_path nferruz/ProtGPT2 \
    --train_file ${TRAINING_FILE} \
    --validation_file ${VALIDATION_FILE} \
    --tokenizer_name nferruz/ProtGPT2 \
    --do_train \
    --do_eval \
    --output_dir ${MODEL_OUTPUT_DIR} \
    --overwrite_output_dir \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps=16 \
    --fp16 \
    --learning_rate 1e-06
Sincerely,
Littleworth
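A note on the numbers in this post: the 128 GB total is spread over eight separate devices, and without any sharding at least one GPU (every GPU under distributed data parallelism) has to hold the full set of model states, so the limit that matters is 16 GB per GPU. The sketch below is a rough back-of-the-envelope estimate, not a measurement. It assumes about 738 million parameters for ProtGPT2 (a figure not stated in this thread) and roughly 16 bytes per parameter for mixed-precision training with Adam; activations, CUDA context, and framework buffers come on top of that.

```python
# Rough estimate of model-state memory on one GPU when nothing is sharded.
# ASSUMPTION: ~738M parameters for ProtGPT2 (not stated in this thread).
NUM_PARAMS = 738_000_000
# ASSUMPTION: ~16 bytes/param for weights + gradients + Adam states
# in mixed-precision training (a common rule of thumb, not a measurement).
BYTES_PER_PARAM = 16
GPU_MEMORY_GB = 16   # one V100 on a p3.16xlarge, as described in the post
NUM_GPUS = 8


def model_state_gb(num_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Memory taken by weights, gradients and optimizer states, in GB (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


per_gpu_gb = model_state_gb(NUM_PARAMS)
headroom_gb = GPU_MEMORY_GB - per_gpu_gb

print(f"total GPU memory on the instance: {NUM_GPUS * GPU_MEMORY_GB} GB")
print(f"model states on a single GPU:     {per_gpu_gb:.1f} GB")
print(f"left for activations and buffers: {headroom_gb:.1f} GB")
```

Under these assumptions the model states alone take most of a single 16 GB card, which would be consistent with a CUDA memory error even at a per-device batch size of 1.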
Hi Littleworth,
I never trained a small or medium version, I am afraid. I trained another model but it is even bigger.
Sorry I don't have better news for now!
@nferruz Hi Noelia,
Thanks. I finally managed to get it running with the help of DeepSpeed.
Here is the full code:
#!/bin/bash
export LC_ALL=C
TRAINING_FILE="data/ha_filtered_108k.train.gpt2_format.txt"
VALIDATION_FILE="data/ha_filtered_108k.validation.gpt2_format.txt"
MODEL_OUTPUT_DIR="gpt2_model/ha_filtered_108k"
DS_CONFIG_FILE="ds_config.json"
/home/ubuntu/storage1/conda_envs/py38/bin/deepspeed --num_gpus=8 run_clm.py --model_name_or_path nferruz/ProtGPT2 \
    --train_file ${TRAINING_FILE} \
    --validation_file ${VALIDATION_FILE} \
    --tokenizer_name nferruz/ProtGPT2 \
    --do_train \
    --do_eval \
    --output_dir ${MODEL_OUTPUT_DIR} \
    --overwrite_output_dir \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps=16 \
    --fp16 \
    --learning_rate 1e-06 \
    --deepspeed ${DS_CONFIG_FILE}
And the content of the ds_config.json file is:
{
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "contiguous_gradients": true
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": 1e-6,
      "betas": [
        0.9,
        0.999
      ],
      "eps": 1e-8,
      "weight_decay": 0
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": 0,
      "warmup_max_lr": 1e-6,
      "warmup_num_steps": "auto"
    }
  },
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 16,
  "gradient_clipping": 1.0
}
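Because this config hard-codes several values that are also passed on the command line, it can help to check that the two agree before launching; to my understanding, the Transformers DeepSpeed integration fills in "auto" entries itself and complains when explicit values conflict with the trainer arguments. The sketch below is only an illustration: `check_consistency` is a hypothetical helper, not part of DeepSpeed or Transformers, and the config is abbreviated to the overlapping keys. It also works out the effective batch size that "train_batch_size": "auto" would correspond to with 8 GPUs.

```python
import json

# Abbreviated copy of ds_config.json: only the keys that overlap with CLI flags.
DS_CONFIG_TEXT = """
{
  "optimizer": {"type": "AdamW", "params": {"lr": 1e-6}},
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 16
}
"""

# Values taken from the deepspeed command line in the script.
CLI_ARGS = {
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 16,
    "learning_rate": 1e-06,
}
NUM_GPUS = 8  # --num_gpus=8


def check_consistency(ds_config: dict, cli_args: dict) -> list:
    """Hypothetical helper: list config values that disagree with the CLI flags.

    Entries set to "auto" are skipped, since they are meant to be filled in
    from the trainer arguments.
    """
    pairs = [
        ("train_micro_batch_size_per_gpu",
         ds_config.get("train_micro_batch_size_per_gpu"),
         cli_args["per_device_train_batch_size"]),
        ("gradient_accumulation_steps",
         ds_config.get("gradient_accumulation_steps"),
         cli_args["gradient_accumulation_steps"]),
        ("optimizer.params.lr",
         ds_config.get("optimizer", {}).get("params", {}).get("lr"),
         cli_args["learning_rate"]),
    ]
    return [name for name, cfg, cli in pairs
            if cfg is not None and cfg != "auto" and cfg != cli]


ds_config = json.loads(DS_CONFIG_TEXT)
mismatches = check_consistency(ds_config, CLI_ARGS)

# Sequences consumed per optimizer step across all GPUs.
effective_batch_size = (
    CLI_ARGS["per_device_train_batch_size"]
    * CLI_ARGS["gradient_accumulation_steps"]
    * NUM_GPUS
)

print("mismatches:", mismatches)
print("effective batch size:", effective_batch_size)
```

Setting the duplicated entries to "auto" in the JSON would avoid having to keep the two places in sync by hand.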
Everything completes in less than 10 minutes on a p3.16xlarge.
Hope this information will help others.
Regards,
littleworth
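As a rough illustration of why ZeRO stage 2 can make a difference on 16 GB cards: stage 2 partitions gradients and optimizer states across the GPUs, while each GPU still keeps a full copy of the half-precision weights. The sketch below uses the common accounting of 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for the fp32 optimizer states. Both that accounting and the ~738 million parameter count for ProtGPT2 are assumptions not taken from this thread, and activations and communication buckets are ignored.

```python
# ASSUMPTION: ~738M parameters for ProtGPT2 (not stated in this thread).
NUM_PARAMS = 738_000_000


def zero2_model_state_gb(num_params: int, num_gpus: int) -> float:
    """Approximate per-GPU model-state memory under ZeRO stage 2, in GB (1e9 bytes).

    ASSUMPTION: 2 bytes/param of fp16 weights replicated on every GPU;
    2 bytes/param of fp16 gradients plus 12 bytes/param of fp32 optimizer
    states split evenly across the GPUs.
    """
    replicated = 2 * num_params
    partitioned = (2 + 12) * num_params / num_gpus
    return (replicated + partitioned) / 1e9


single_gpu_gb = zero2_model_state_gb(NUM_PARAMS, num_gpus=1)  # nothing to shard across
eight_gpu_gb = zero2_model_state_gb(NUM_PARAMS, num_gpus=8)   # --num_gpus=8 as in the script

print(f"model states per GPU, 1 GPU:  {single_gpu_gb:.1f} GB")
print(f"model states per GPU, 8 GPUs: {eight_gpu_gb:.1f} GB")
```

Under these assumptions the per-GPU share of the model states drops to a few GB with 8 GPUs, leaving far more room for activations than the unsharded case.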
Hi, thank you for sharing all these tricks. May I ask whether you were still using the 8x V100 GPUs with 16 GB each in this case with DeepSpeed? Thanks!