SmolLM3 Features Implementation
This document describes the SmolLM3 features implemented in the Petite Elle L'Aime 3 chat interface.
🎯 SmolLM3 Features
1. Thinking Mode
SmolLM3 supports extended thinking mode with reasoning traces. The implementation includes:
- Automatic thinking flags: System prompts automatically get /think or /no_think flags
- Manual control: Users can manually add thinking flags to system prompts
- UI toggle: Checkbox to enable/disable thinking mode
- Response cleaning: Thinking tags are cleaned from responses (see the sketch after the usage examples below)
Usage Examples:
# With thinking enabled (default)
system_prompt = "Tu es TonicIA, un assistant francophone rigoureux et bienveillant./think"
# With thinking disabled
system_prompt = "Tu es TonicIA, un assistant francophone rigoureux et bienveillant./no_think"
# Manual control in UI
enable_thinking = True # or False
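The response cleaning mentioned in the feature list can be sketched as follows. This is a minimal illustration, assuming the reasoning trace arrives wrapped in SmolLM3's <think>...</think> tags; the app's actual cleaning logic may differ:
import re

def clean_thinking(response):
    """Strip <think>...</think> reasoning traces from a model response."""
    cleaned = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL)
    # Drop any unmatched tags left behind by truncated generations
    return cleaned.replace("<think>", "").replace("</think>", "").strip()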
2. Tool Calling
SmolLM3 supports both XML and Python tool calling formats:
XML Tools (Default)
[
  {
    "name": "get_weather",
    "description": "Get the weather in a city",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string",
          "description": "The city to get the weather for"
        }
      }
    }
  }
]
Python Tools
# Tools are called as Python functions in <code> tags
# Example: <code>get_weather(city="Paris")</code>
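A defensive way to extract such a call without eval'ing raw model output is sketched below; this is illustrative and not necessarily the app's parser:
import ast
import re

def parse_python_tool_call(response):
    """Extract the function name and keyword arguments from a <code>...</code> call."""
    match = re.search(r"<code>(.*?)</code>", response, re.DOTALL)
    if not match:
        return None
    # ast.parse raises SyntaxError on malformed output; callers should handle it
    call = ast.parse(match.group(1).strip(), mode="eval").body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        return None
    # literal_eval accepts AST nodes, so arguments are parsed without executing code
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return call.func.id, kwargs

# parse_python_tool_call('<code>get_weather(city="Paris")</code>')
# -> ('get_weather', {'city': 'Paris'})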
3. Generation Parameters
Following SmolLM3 recommendations (see the sketch after this list):
- Temperature: 0.6 (recommended default)
- Top-p: 0.95 (recommended default)
- Repetition Penalty: 1.1 (recommended default)
- Max tokens: 2048 (configurable up to 32,768)
- Context length: Up to 65,536 tokens (extensible to 128k/256k with YaRN)
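A minimal generation sketch with these values, assuming the public HuggingFaceTB/SmolLM3-3B checkpoint (swap in the model this Space actually deploys):
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumption: replace with the deployed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Bonjour !", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,     # configurable up to 32,768
    do_sample=True,
    temperature=0.6,         # recommended default
    top_p=0.95,              # recommended default
    repetition_penalty=1.1,  # recommended default
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))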
4. Long Context Processing
The model supports:
- Base context: 65,536 tokens
- Extended context: Up to 256k tokens with YaRN scaling
- YaRN configuration: Available for longer inputs
{
  "rope_scaling": {
    "factor": 2.0,
    "original_max_position_embeddings": 65536,
    "type": "yarn"
  }
}
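One way to apply this at load time is to override rope_scaling when calling from_pretrained; this is a sketch, and the factor (2.0, matching the config above) should be sized to the context you actually need:
from transformers import AutoModelForCausalLM

# Sketch: extend context with YaRN by overriding the rope_scaling config entry
model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM3-3B",  # assumption: replace with the deployed checkpoint
    rope_scaling={
        "factor": 2.0,
        "original_max_position_embeddings": 65536,
        "type": "yarn",
    },
)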
🔧 Implementation Details
Chat Template Integration
The app uses SmolLM3's chat template with proper thinking-flag handling and tool calling support:
def create_prompt(system_message, user_message, enable_thinking=True, tools=None, use_xml_tools=True):
    formatted_messages = []

    # Handle thinking flags
    if system_message and system_message.strip():
        has_think_flag = "/think" in system_message
        has_no_think_flag = "/no_think" in system_message
        if not enable_thinking and not has_no_think_flag:
            system_message += "/no_think"
        elif enable_thinking and not has_think_flag and not has_no_think_flag:
            system_message += "/think"
        formatted_messages.append({"role": "system", "content": system_message})

    formatted_messages.append({"role": "user", "content": user_message})

    # Apply chat template with SmolLM3 features
    template_kwargs = {
        "tokenize": False,
        "add_generation_prompt": True,
        "enable_thinking": enable_thinking
    }

    # Add tool calling
    if tools and len(tools) > 0:
        if use_xml_tools:
            template_kwargs["xml_tools"] = tools
        else:
            template_kwargs["python_tools"] = tools

    return tokenizer.apply_chat_template(formatted_messages, **template_kwargs)
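For example, the helper can be called like this; because enable_thinking is true and the system message carries no flag, /think is appended automatically:
# Build a prompt with thinking enabled (tokenizer as loaded by the app)
prompt = create_prompt(
    system_message="Tu es TonicIA, un assistant francophone rigoureux et bienveillant.",
    user_message="Explique-moi la gravité en termes simples.",
    enable_thinking=True,
)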
Tool Call Detection
The app detects and formats tool calls in responses:
# Handle tool calls if present
if parsed_tools and ("<tool_call>" in assistant_response or "<code>" in assistant_response):
    if "<tool_call>" in assistant_response:
        tool_call_match = re.search(r'<tool_call>(.*?)</tool_call>', assistant_response, re.DOTALL)
        if tool_call_match:
            tool_call = tool_call_match.group(1)
            assistant_response += f"\n\n🔧 Tool Call Detected: {tool_call}\n\nNote: This is a simulated tool call."
    elif "<code>" in assistant_response:
        code_match = re.search(r'<code>(.*?)</code>', assistant_response, re.DOTALL)
        if code_match:
            code_call = code_match.group(1)
            assistant_response += f"\n\n🐍 Python Tool Call: {code_call}\n\nNote: This is a simulated Python tool call."
🎮 UI Features
Advanced Settings Panel
- Temperature slider: 0.01 to 1.0 (default: 0.6)
- Top-p slider: 0.1 to 1.0 (default: 0.95)
- Repetition Penalty slider: 1.0 to 2.0 (default: 1.1)
- Max length slider: 10 to 32,768 tokens (default: 2048)
- Thinking mode checkbox: Enable/disable reasoning traces
- Tool calling checkbox: Enable/disable function calling
- XML vs Python tools: Choose tool calling format
- Tool definition editor: JSON editor for custom tools
Default Tool Set
The app includes two default tools for demonstration (an illustrative schema for calculate follows the list):
- get_weather: Get weather information for a city
- calculate: Perform mathematical calculations
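For illustration, the calculate tool can be described in the same schema format as get_weather above (a sketch; the app's actual JSON may differ):
calculate_tool = {
    "name": "calculate",
    "description": "Perform a mathematical calculation",
    "parameters": {
        "type": "object",
        "properties": {
            "expression": {
                "type": "string",
                "description": "The expression to evaluate, e.g. '15 * 23'"
            }
        }
    }
}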
📝 Usage Examples
Basic Chat with Thinking
system_prompt = "Tu es TonicIA, un assistant francophone rigoureux et bienveillant./think"
user_message = "Explique-moi la gravité en termes simples."
Chat with Tool Calling
tools = [
    {
        "name": "get_weather",
        "description": "Get the weather in a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "The city name"}
            }
        }
    }
]
user_message = "Quel temps fait-il à Paris aujourd'hui?"
Agentic Usage
# The model can call tools automatically based on user requests
# Example: "Calculate 15 * 23" will trigger the calculate tool
# Example: "What's the weather in London?" will trigger the get_weather tool
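A real agentic loop would execute the parsed call and feed the result back to the model. Below is a minimal, hypothetical dispatch sketch; the handlers are placeholders, not the app's implementation:
def run_tool(name, kwargs):
    """Route a parsed tool call to a local handler (placeholders for demo purposes)."""
    if name == "get_weather":
        return f"(simulated) It is sunny in {kwargs.get('city', '?')}."
    if name == "calculate":
        return f"(simulated) Result of {kwargs.get('expression', '?')} goes here."
    return f"Unknown tool: {name}"

# Combined with parse_python_tool_call from the Tool Calling section:
# name, kwargs = parse_python_tool_call(assistant_response)
# assistant_response += f"\n\nTool result: {run_tool(name, kwargs)}"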
📋 Requirements
- Transformers: v4.53.0+ (required for SmolLM3 support)
- PyTorch: Latest version
- Gradio: For the web interface
- Hugging Face Spaces: For deployment
🔄 Migration from Previous Version
The updated app includes:
- SmolLM3-compatible generation parameters
- Thinking mode with proper flag handling
- Tool calling support (XML and Python)
- Extended context support
- Improved response cleaning
🎯 Best Practices
- Use recommended parameters: temperature=0.6, top_p=0.95, repetition_penalty=1.1
- Enable thinking for complex reasoning tasks
- Use tool calling for structured tasks
- Keep context within limits: 65k tokens base, 256k with YaRN
- Test tool definitions before deployment
- Adjust repetition penalty: Use 1.0-1.2 for creative tasks, 1.1-1.3 for factual responses