Model Name Updates in five_step_analysis.ipynb

Changes Made

Updated the notebook to display cleaner, more readable model names in all plots while maintaining the correct cost lookups.

Before → After Transformations

Original Name	Display Name (with Version Date)
`anthropic/claude-haiku:latest`	Claude Haiku 4.5 (2025-10-01)
`anthropic/claude-opus:latest`	Claude Opus 4.1 (2025-08-05)
`anthropic/claude-sonnet:latest`	Claude Sonnet 4.5 (2025-09-29)
`claude-3-5-haiku-latest`	Claude 3.5 Haiku (2024-10-22)
`google/gemini:latest`	Gemini 2.5 Pro
`google/gemini-flash`	Gemini Flash
`gemini-2.0-flash-lite`	Gemini 2.0 Flash Lite
`openai/o:latest`	O3 (2025-04-16, Azure)
`openai/gpt-5`	GPT-5 (2025-08-07)
`openai/gpt-5-mini`	GPT-5 Mini
`openai/o3`	O3
`openai/o3-mini`	O3 Mini
`openai/o4-mini`	O4 Mini
`xai/grok:latest`	Grok-3
`xai/grok-mini`	Grok Mini
`xai/grok-code-fast-1`	Grok Code Fast 1
`aws/llama-4-maverick`	Llama-4 Maverick
`aws/llama-4-scout`	Llama-4 Scout
`gpt-oss-120b`	GPT-OSS-120B
`gpt-5-codex`	GPT-5 Codex
`deepseek-r1`	DeepSeek-R1
`gcp/qwen-3`	Qwen-3

Note: Version dates (e.g., 2025-10-01) reflect the actual underlying model versions discovered through CBORG API testing on October 29, 2025.

Technical Implementation

What Changed

Added MODEL_NAME_MAPPING dictionary based on CBORG API testing results
Added resolve_model_name() function to convert aliases to display names
Updated create_pair_label() to use resolved names instead of raw strings

What Stayed the Same

Cost tables still use original model names (correct behavior)
Data loading and filtering logic unchanged
Plot generation code unchanged
Cost calculations work correctly with original column values

Key Design Decision

The mapping only affects the pair column used for display in plots. The original supervisor and coder columns remain unchanged, ensuring cost lookups continue to work correctly:

# Cost lookup uses original columns (correct)
sup_model = row['supervisor']  # e.g., "anthropic/claude-haiku:latest"
sup_icost = input_cost.get(sup_model, 0)  # Finds correct price

# Display uses mapped pair column
pair_name = row['pair']  # e.g., "Claude Haiku 4.5"

Benefits

Clearer plot titles: "Claude Haiku 4.5" instead of "anthropic/claude-haiku:latest"
Easier comparison: Names highlight the actual model versions
Based on real data: Names reflect actual underlying models from CBORG API testing
Maintains correctness: Cost calculations still work properly with original names

Example Output

Before:

anthropic/claude-sonnet:latest
xai/grok:latest
openai/o:latest
openai/gpt-5

After (with version dates):

Claude Sonnet 4.5 (2025-09-29)
Grok-3
O3 (2025-04-16, Azure)
GPT-5 (2025-08-07)

Much more readable in plot titles and legends, with version dates showing exactly which model snapshot was used!