RadiPro Chatbot - Llama 3.2 1B Instruct

A fine-tuned conversational AI model based on Meta's Llama 3.2 1B Instruct, specifically designed to serve as a demonstration chatbot for an AI agency. This model enables clients to experience and interact with an AI chatbot, providing a tangible example of what AI-powered conversational interfaces can offer.

Overview

This model has been fine-tuned from Llama-3.2-1B-Instruct to function as an interactive chatbot demonstration tool. It serves as a showcase for potential clients, allowing them to experience firsthand how AI chatbots can engage in natural conversations and assist with various queries.

Model Details

  • Base Model: Llama-3.2-1B-Instruct
  • Model Type: Llama 3.2
  • Parameters: 1 Billion
  • Fine-tuning Purpose: Chatbot demonstration for AI agency clients

Purpose

This model is designed to:

  1. Demonstrate AI Capabilities: Show clients what AI chatbots can do in a hands-on, interactive way
  2. Provide User Experience: Allow potential clients to experience natural language interactions with AI
  3. Showcase Technology: Serve as a tangible example of the agency's AI development capabilities
  4. Engage Prospects: Create an engaging demonstration tool that helps clients understand the value of AI chatbots

Usage

Loading the Model

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "./radipro-chatbot-Llama-3.2-1B-Instruct"

# Auto-detect device & best dtype / attention
if torch.cuda.is_available():
    device = "cuda"
    dtype = torch.bfloat16
    attn_impl   = "flash_attention_2"
    print(f"CUDA โ†’ {torch.cuda.get_device_name(0)}")
elif torch.backends.mps.is_available():
    device = "mps"
    dtype = torch.float16
    attn_impl   = "eager"
    print("MPS โ†’ Apple Silicon")
else:
    device = "cpu"
    dtype = torch.float32
    attn_impl   = "eager"
    print("CPU (slow)")

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=dtype,
    attn_implementation=attn_impl,
    low_cpu_mem_usage=True,
)

model = model.to(device)

Basic Chat Example

# Prepare chat messages
messages = [
    {"role": "user", "content": "Hello! Can you tell me about AI chatbots?"}
]

# Apply chat template
input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize
inputs = tokenizer(input_text, return_tensors="pt").to(device)

# Generate response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.6,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens, dropping the prompt and special tokens
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print("\nResponse:")
print(response)
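
For a client-facing demo the conversation usually runs over several turns. The sketch below is a hedged extension of the example above, not part of the released files: it appends each reply to the messages list and re-applies the chat template so the model keeps the conversation context. The chat helper name is an assumption made here for illustration.

def chat(messages, max_new_tokens=512):
    """Illustrative helper: one generation step over the running message list."""
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.6,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Keep only the newly generated tokens and drop special tokens
    reply = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
    messages.append({"role": "assistant", "content": reply})
    return reply

# Example multi-turn exchange
messages = [{"role": "user", "content": "Hello! Can you tell me about AI chatbots?"}]
print(chat(messages))
messages.append({"role": "user", "content": "How could a chatbot help my business?"})
print(chat(messages))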

Using the Chat Template

The model uses a Jinja2-based chat template that follows the Llama 3.2 format:

messages = [
    {"role": "system", "content": "You are RadiPro Assistant, a helpful AI specialized in RadiPro's custom AI solutions for businesses. Only respond to questions related to RadiPro's services. For unrelated questions, politely redirect to company topics."},
    {"role": "user", "content": "What can you help me with?"},
    {"role": "assistant", "content": "RadiPro's AI services include custom fine-tuning, automations, and consulting tailored to your business needs. We discuss options for enhancing your digital footprint."},
    {"role": "user", "content": "Tell me more about your capabilities."}
]

# Apply chat template
formatted = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
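
For reference, a message list like the one above is rendered into the Llama 3.2 header format, roughly as follows (illustrative and truncated; the exact output depends on the template shipped in chat_template.jinja, and you can verify it by printing formatted):

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are RadiPro Assistant, ...<|eot_id|><|start_header_id|>user<|end_header_id|>

What can you help me with?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

RadiPro's AI services include ...<|eot_id|><|start_header_id|>user<|end_header_id|>

Tell me more about your capabilities.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

With add_generation_prompt=True the prompt ends with an open assistant header, so the model's next tokens form the reply.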

Files Included

  • config.json - Model configuration and architecture parameters
  • generation_config.json - Default generation parameters
  • tokenizer_config.json - Tokenizer configuration
  • tokenizer.json - Tokenizer vocabulary and merges
  • special_tokens_map.json - Special token mappings
  • chat_template.jinja - Chat template for conversation formatting
  • model.safetensors - Model weights (safetensors format)
  • model.safetensors.index.json - Model weights index file

Requirements

  • Python 3.8+
  • PyTorch 2.0+
  • Transformers 4.45.0+
  • GPU Options:
    • CUDA-capable GPU (recommended for inference) - ~2-4 GB of VRAM in bfloat16 precision (see the sketch after this list)
    • Apple Silicon (M1/M2/M3/M4) with Metal Performance Shaders (MPS) support
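
The VRAM figure above follows from simple arithmetic: roughly 1 billion parameters at 2 bytes each in bfloat16 is about 2 GB of weights, with activations and the KV cache during generation adding on top. As a quick sanity check after loading, the standard Transformers helper reports the weight footprint (a minimal sketch, assuming the model object from the loading example above):

# Weights-only footprint in GB; activations and the KV cache are not included
footprint_gb = model.get_memory_footprint() / 1024**3
print(f"Model weights: {footprint_gb:.2f} GB")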

Installation

Standard Installation

pip install torch transformers accelerate

GPU Support

For NVIDIA CUDA:

pip install torch --index-url https://download.pytorch.org/whl/cu118
pip install transformers accelerate

For Apple Silicon (M1/M2/M3/M4):

pip install torch transformers accelerate

PyTorch will automatically use Metal Performance Shaders (MPS) on Apple Silicon when available. Ensure you have macOS 12.3+ for MPS support.
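
To confirm that MPS is actually available on your machine, a quick check from the command line (assumes a recent PyTorch build):

python -c "import torch; print(torch.backends.mps.is_available(), torch.backends.mps.is_built())"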

Notes

  • This model is optimized for conversational interactions and demonstration purposes
  • The model uses bfloat16 precision for efficient inference
  • The chat template follows Llama 3.2's instruction format with special tokens
  • For best results, use the provided generation parameters (temperature=0.6, top_p=0.9); a sketch for loading and overriding these defaults follows this list
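
The defaults mentioned in the last note are shipped in generation_config.json. A minimal sketch, assuming the model_name, model, and inputs objects from the usage examples above, for inspecting those defaults and overriding individual values at call time:

from transformers import GenerationConfig

# Inspect the defaults stored in generation_config.json
gen_config = GenerationConfig.from_pretrained(model_name)
print(gen_config)

# Per-call overrides take precedence over the loaded defaults
outputs = model.generate(
    **inputs,
    generation_config=gen_config,
    max_new_tokens=256,
    temperature=0.6,
    top_p=0.9,
    do_sample=True,
)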

License

Please refer to the base model's license (Llama 3.2 Community License) and ensure compliance with Meta's terms of use.

Disclaimer

This model is intended for demonstration purposes to showcase AI chatbot capabilities to potential clients. It should be used responsibly and in accordance with applicable AI usage guidelines and regulations.
