RadiPro Chatbot - Llama 3.2 1B Instruct
A fine-tuned conversational AI model based on Meta's Llama 3.2 1B Instruct, specifically designed to serve as a demonstration chatbot for an AI agency. This model enables clients to experience and interact with an AI chatbot, providing a tangible example of what AI-powered conversational interfaces can offer.
Overview
This model has been fine-tuned from Llama-3.2-1B-Instruct to function as an interactive chatbot demonstration tool. It serves as a showcase for potential clients, allowing them to experience firsthand how AI chatbots can engage in natural conversations and assist with various queries.
Model Details
- Base Model: Llama-3.2-1B-Instruct
- Model Type: Llama 3.2
- Parameters: 1 Billion
- Fine-tuning Purpose: Chatbot demonstration for AI agency clients
Purpose
This model is designed to:
- Demonstrate AI Capabilities: Show clients what AI chatbots can do in a hands-on, interactive way
- Provide User Experience: Allow potential clients to experience natural language interactions with AI
- Showcase Technology: Serve as a tangible example of the agency's AI development capabilities
- Engage Prospects: Create an engaging demonstration tool that helps clients understand the value of AI chatbots
Usage
Loading the Model
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Local checkout of the fine-tuned model
# (can also be loaded from the Hub as "raditotev/radipro-chatbot-Llama-3.2-1B-Instruct")
model_name = "./radipro-chatbot-Llama-3.2-1B-Instruct"

# Auto-detect device & best dtype / attention implementation
if torch.cuda.is_available():
    device = "cuda"
    dtype = torch.bfloat16
    attn_impl = "flash_attention_2"  # requires the flash-attn package; use "sdpa" if it is not installed
    print(f"CUDA → {torch.cuda.get_device_name(0)}")
elif torch.backends.mps.is_available():
    device = "mps"
    dtype = torch.float16
    attn_impl = "eager"
    print("MPS → Apple Silicon")
else:
    device = "cpu"
    dtype = torch.float32
    attn_impl = "eager"
    print("CPU (slow)")

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=dtype,
    attn_implementation=attn_impl if device != "cpu" else "eager",
    low_cpu_mem_usage=True,
)
model = model.to(device)
Basic Chat Example
# Prepare chat messages
messages = [
    {"role": "user", "content": "Hello! Can you tell me about AI chatbots?"}
]

# Apply chat template
input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize
inputs = tokenizer(input_text, return_tensors="pt").to(device)

# Generate response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.6,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens, dropping the prompt and special tokens
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print("\nResponse:")
print(response)
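For an interactive demo it is convenient to wrap the steps above in a small helper that keeps the conversation history between turns. A minimal sketch, reusing the model, tokenizer, and device set up earlier (the chat function name and history handling are illustrative, not part of the released model):

def chat(message, history=None, max_new_tokens=512):
    """Append a user message to the history, generate a reply, and return both."""
    history = list(history or [])
    history.append({"role": "user", "content": message})

    prompt = tokenizer.apply_chat_template(
        history, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.6,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Keep only the newly generated tokens as the assistant's reply
    reply = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
    history.append({"role": "assistant", "content": reply})
    return reply, history

reply, history = chat("Hello! Can you tell me about AI chatbots?")
print(reply)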
Using the Chat Template
The model uses a Jinja2-based chat template that follows the Llama 3.2 format:
messages = [
    {"role": "system", "content": "You are RadiPro Assistant, a helpful AI specialized in RadiPro's custom AI solutions for businesses. Only respond to questions related to RadiPro's services. For unrelated questions, politely redirect to company topics."},
    {"role": "user", "content": "What can you help me with?"},
    {"role": "assistant", "content": "RadiPro's AI services include custom fine-tuning, automations, and consulting tailored to your business needs. We can discuss options for enhancing your digital footprint."},
    {"role": "user", "content": "Tell me more about your capabilities."}
]

# Apply chat template
formatted = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
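To continue the conversation from here, the formatted multi-turn prompt can be tokenized and passed to model.generate() exactly as in the basic example above. A minimal sketch, reusing the model, tokenizer, and device set up earlier:

# Tokenize the formatted multi-turn prompt and generate the next assistant turn
inputs = tokenizer(formatted, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.6,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# Keep only the newly generated tokens as the assistant's reply
reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)

# Append the reply so the history can keep growing turn by turn
messages.append({"role": "assistant", "content": reply})
print(reply)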
Files Included
- config.json - Model configuration and architecture parameters
- generation_config.json - Default generation parameters
- tokenizer_config.json - Tokenizer configuration
- tokenizer.json - Tokenizer vocabulary and merges
- special_tokens_map.json - Special token mappings
- chat_template.jinja - Chat template for conversation formatting
- model.safetensors - Model weights (safetensors format)
- model.safetensors.index.json - Model weights index file
Requirements
- Python 3.8+
- PyTorch 2.0+
- Transformers 4.45.0+
- GPU Options:
- CUDA-capable GPU (recommended for inference): roughly 2-4 GB of VRAM in bfloat16 precision (see the memory check after this list)
- Apple Silicon (M1/M2/M3/M4) with Metal Performance Shaders (MPS) support
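To verify the actual footprint on your hardware, you can query the loaded model directly. A minimal check, assuming the model and device have been set up as in the usage example above:

# Parameter + buffer memory of the loaded model, in GB
footprint_gb = model.get_memory_footprint() / 1024**3
print(f"Model memory footprint: {footprint_gb:.2f} GB")

# On CUDA, also report what the device has actually allocated
if device == "cuda":
    print(f"CUDA memory allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")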
Installation
Standard Installation
pip install torch transformers accelerate
GPU Support
For NVIDIA CUDA:
pip install torch transformers accelerate --index-url https://download.pytorch.org/whl/cu118
For Apple Silicon (M1/M2/M3/M4):
pip install torch transformers accelerate
PyTorch will automatically use Metal Performance Shaders (MPS) on Apple Silicon when available. Ensure you have macOS 12.3+ for MPS support.
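To confirm which backend PyTorch will actually pick up on your machine, a quick standalone check (not specific to this model) is:

import torch

# Report which accelerator backends PyTorch can see
print("PyTorch version:", torch.__version__)
print("CUDA available: ", torch.cuda.is_available())
print("MPS available:  ", torch.backends.mps.is_available())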
Notes
- This model is optimized for conversational interactions and demonstration purposes
- The model uses bfloat16 precision for efficient inference
- The chat template follows Llama 3.2's instruction format with special tokens
- For best results, use the provided generation parameters (temperature=0.6, top_p=0.9), as shown in the sketch after this list
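The defaults shipped in generation_config.json can also be loaded and overridden explicitly instead of passing individual arguments to generate(). A minimal sketch, assuming the model, inputs, and model_name from the usage example above:

from transformers import GenerationConfig

# Load the defaults shipped with the model and adjust them as needed
gen_cfg = GenerationConfig.from_pretrained(model_name)
gen_cfg.temperature = 0.6
gen_cfg.top_p = 0.9
gen_cfg.max_new_tokens = 512
gen_cfg.do_sample = True

outputs = model.generate(**inputs, generation_config=gen_cfg)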
License
Please refer to the base model's license (Llama 3.2 Community License) and ensure compliance with Meta's terms of use.
Disclaimer
This model is intended for demonstration purposes to showcase AI chatbot capabilities to potential clients. It should be used responsibly and in accordance with applicable AI usage guidelines and regulations.