RadiPro Chatbot - Llama 3.2 1B Instruct

A fine-tuned conversational AI model based on Meta's Llama 3.2 1B Instruct, specifically designed to serve as a demonstration chatbot for an AI agency. This model enables clients to experience and interact with an AI chatbot, providing a tangible example of what AI-powered conversational interfaces can offer.

Overview

This model has been fine-tuned from Llama-3.2-1B-Instruct to function as an interactive chatbot demonstration tool. It serves as a showcase for potential clients, allowing them to experience firsthand how AI chatbots can engage in natural conversations and assist with various queries.

Model Details

  • Base Model: Llama-3.2-1B-Instruct
  • Model Type: Llama 3.2
  • Parameters: 1 Billion
  • Fine-tuning Purpose: Chatbot demonstration for AI agency clients

Purpose

This model is designed to:

  1. Demonstrate AI Capabilities: Show clients what AI chatbots can do in a hands-on, interactive way
  2. Provide User Experience: Allow potential clients to experience natural language interactions with AI
  3. Showcase Technology: Serve as a tangible example of the agency's AI development capabilities
  4. Engage Prospects: Create an engaging demonstration tool that helps clients understand the value of AI chatbots

Usage

Loading the Model

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "./radipro-chatbot-Llama-3.2-1B-Instruct"

# Auto-detect device & best dtype / attention
if torch.cuda.is_available():
    device = "cuda"
    dtype = torch.bfloat16
    attn_impl   = "flash_attention_2"
    print(f"CUDA โ†’ {torch.cuda.get_device_name(0)}")
elif torch.backends.mps.is_available():
    device = "mps"
    dtype = torch.float16
    attn_impl   = "eager"
    print("MPS โ†’ Apple Silicon")
else:
    device = "cpu"
    dtype = torch.float32
    attn_impl   = "eager"
    print("CPU (slow)")

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=dtype,
    attn_implementation=attn_impl,
    low_cpu_mem_usage=True,
)

model = model.to(device)

Basic Chat Example

# Prepare chat messages
messages = [
    {"role": "user", "content": "Hello! Can you tell me about AI chatbots?"}
]

# Apply chat template
input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize
inputs = tokenizer(input_text, return_tensors="pt").to(device)

# Generate response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.6,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens, dropping the prompt and special tokens
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print("\nResponse:")
print(response)
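
For a client-facing demo the conversation usually runs over several turns. The sketch below is a hedged extension of the example above, not part of the released files: it appends each reply to the messages list and re-applies the chat template so the model keeps the conversation context. The chat helper name is an assumption made here for illustration.

def chat(messages, max_new_tokens=512):
    """Illustrative helper: one generation step over the running message list."""
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.6,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Keep only the newly generated tokens and drop special tokens
    reply = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
    messages.append({"role": "assistant", "content": reply})
    return reply

# Example multi-turn exchange
messages = [{"role": "user", "content": "Hello! Can you tell me about AI chatbots?"}]
print(chat(messages))
messages.append({"role": "user", "content": "How could a chatbot help my business?"})
print(chat(messages))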

Using the Chat Template

The model uses a Jinja2-based chat template that follows the Llama 3.2 format:

messages = [
    {"role": "system", "content": "You are RadiPro Assistant, a helpful AI specialized in RadiPro's custom AI solutions for businesses. Only respond to questions related to RadiPro's services. For unrelated questions, politely redirect to company topics."},
    {"role": "user", "content": "What can you help me with?"},
    {"role": "assistant", "content": "RadiPro's AI services include custom fine-tuning, automations, and consulting tailored to your business needs. We discuss options for enhancing your digital footprint."},
    {"role": "user", "content": "Tell me more about your capabilities."}
]

# Apply chat template
formatted = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
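
For reference, a message list like the one above is rendered into the Llama 3.2 header format, roughly as follows (illustrative and truncated; the exact output depends on the template shipped in chat_template.jinja, and you can verify it by printing formatted):

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are RadiPro Assistant, ...<|eot_id|><|start_header_id|>user<|end_header_id|>

What can you help me with?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

RadiPro's AI services include ...<|eot_id|><|start_header_id|>user<|end_header_id|>

Tell me more about your capabilities.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

With add_generation_prompt=True the prompt ends with an open assistant header, so the model's next tokens form the reply.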

Files Included

  • config.json - Model configuration and architecture parameters
  • generation_config.json - Default generation parameters
  • tokenizer_config.json - Tokenizer configuration
  • tokenizer.json - Tokenizer vocabulary and merges
  • special_tokens_map.json - Special token mappings
  • chat_template.jinja - Chat template for conversation formatting
  • model.safetensors - Model weights (safetensors format)
  • model.safetensors.index.json - Model weights index file

Requirements

  • Python 3.8+
  • PyTorch 2.0+
  • Transformers 4.45.0+
  • GPU Options:
    • CUDA-capable GPU (recommended for inference) - ~2-4 GB of VRAM in bfloat16 precision (see the sketch after this list)
    • Apple Silicon (M1/M2/M3/M4) with Metal Performance Shaders (MPS) support
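
The VRAM figure above follows from simple arithmetic: roughly 1 billion parameters at 2 bytes each in bfloat16 is about 2 GB of weights, with activations and the KV cache during generation adding on top. As a quick sanity check after loading, the standard Transformers helper reports the weight footprint (a minimal sketch, assuming the model object from the loading example above):

# Weights-only footprint in GB; activations and the KV cache are not included
footprint_gb = model.get_memory_footprint() / 1024**3
print(f"Model weights: {footprint_gb:.2f} GB")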

Installation

Standard Installation

pip install torch transformers accelerate

GPU Support

For NVIDIA CUDA:

pip install torch --index-url https://download.pytorch.org/whl/cu118
pip install transformers accelerate

For Apple Silicon (M1/M2/M3/M4):

pip install torch transformers accelerate

PyTorch will automatically use Metal Performance Shaders (MPS) on Apple Silicon when available. Ensure you have macOS 12.3+ for MPS support.
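
To confirm that MPS is actually available on your machine, a quick check from the command line (assumes a recent PyTorch build):

python -c "import torch; print(torch.backends.mps.is_available(), torch.backends.mps.is_built())"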

Notes

  • This model is optimized for conversational interactions and demonstration purposes
  • The model uses bfloat16 precision for efficient inference
  • The chat template follows Llama 3.2's instruction format with special tokens
  • For best results, use the provided generation parameters (temperature=0.6, top_p=0.9); a sketch for loading and overriding these defaults follows this list
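
The defaults mentioned in the last note are shipped in generation_config.json. A minimal sketch, assuming the model_name, model, and inputs objects from the usage examples above, for inspecting those defaults and overriding individual values at call time:

from transformers import GenerationConfig

# Inspect the defaults stored in generation_config.json
gen_config = GenerationConfig.from_pretrained(model_name)
print(gen_config)

# Per-call overrides take precedence over the loaded defaults
outputs = model.generate(
    **inputs,
    generation_config=gen_config,
    max_new_tokens=256,
    temperature=0.6,
    top_p=0.9,
    do_sample=True,
)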

License

Please refer to the base model's license (Llama 3.2 Community License) and ensure compliance with Meta's terms of use.

Disclaimer

This model is intended for demonstration purposes to showcase AI chatbot capabilities to potential clients. It should be used responsibly and in accordance with applicable AI usage guidelines and regulations.
