Model Name Updates in five_step_analysis.ipynb
Changes Made
Updated the notebook to display cleaner, more readable model names in all plots while maintaining the correct cost lookups.
Before → After Transformations
| Original Name | Display Name (with Version Date) |
|---|---|
anthropic/claude-haiku:latest |
Claude Haiku 4.5 (2025-10-01) |
anthropic/claude-opus:latest |
Claude Opus 4.1 (2025-08-05) |
anthropic/claude-sonnet:latest |
Claude Sonnet 4.5 (2025-09-29) |
claude-3-5-haiku-latest |
Claude 3.5 Haiku (2024-10-22) |
google/gemini:latest |
Gemini 2.5 Pro |
google/gemini-flash |
Gemini Flash |
gemini-2.0-flash-lite |
Gemini 2.0 Flash Lite |
openai/o:latest |
O3 (2025-04-16, Azure) |
openai/gpt-5 |
GPT-5 (2025-08-07) |
openai/gpt-5-mini |
GPT-5 Mini |
openai/o3 |
O3 |
openai/o3-mini |
O3 Mini |
openai/o4-mini |
O4 Mini |
xai/grok:latest |
Grok-3 |
xai/grok-mini |
Grok Mini |
xai/grok-code-fast-1 |
Grok Code Fast 1 |
aws/llama-4-maverick |
Llama-4 Maverick |
aws/llama-4-scout |
Llama-4 Scout |
gpt-oss-120b |
GPT-OSS-120B |
gpt-5-codex |
GPT-5 Codex |
deepseek-r1 |
DeepSeek-R1 |
gcp/qwen-3 |
Qwen-3 |
Note: Version dates (e.g., 2025-10-01) reflect the actual underlying model versions discovered through CBORG API testing on October 29, 2025.
Technical Implementation
What Changed
- Added
MODEL_NAME_MAPPINGdictionary based on CBORG API testing results - Added
resolve_model_name()function to convert aliases to display names - Updated
create_pair_label()to use resolved names instead of raw strings
What Stayed the Same
- Cost tables still use original model names (correct behavior)
- Data loading and filtering logic unchanged
- Plot generation code unchanged
- Cost calculations work correctly with original column values
Key Design Decision
The mapping only affects the pair column used for display in plots. The original supervisor and coder columns remain unchanged, ensuring cost lookups continue to work correctly:
# Cost lookup uses original columns (correct)
sup_model = row['supervisor'] # e.g., "anthropic/claude-haiku:latest"
sup_icost = input_cost.get(sup_model, 0) # Finds correct price
# Display uses mapped pair column
pair_name = row['pair'] # e.g., "Claude Haiku 4.5"
Benefits
- Clearer plot titles: "Claude Haiku 4.5" instead of "anthropic/claude-haiku:latest"
- Easier comparison: Names highlight the actual model versions
- Based on real data: Names reflect actual underlying models from CBORG API testing
- Maintains correctness: Cost calculations still work properly with original names
Example Output
Before:
anthropic/claude-sonnet:latestxai/grok:latestopenai/o:latestopenai/gpt-5
After (with version dates):
Claude Sonnet 4.5 (2025-09-29)Grok-3O3 (2025-04-16, Azure)GPT-5 (2025-08-07)
Much more readable in plot titles and legends, with version dates showing exactly which model snapshot was used!