LLM4HEP / MODEL_NAME_UPDATES.md
ho22joshua's picture
initial commit
cfcbbc8

Model Name Updates in five_step_analysis.ipynb

Changes Made

Updated the notebook to display cleaner, more readable model names in all plots while maintaining the correct cost lookups.

Before → After Transformations

Original Name Display Name (with Version Date)
anthropic/claude-haiku:latest Claude Haiku 4.5 (2025-10-01)
anthropic/claude-opus:latest Claude Opus 4.1 (2025-08-05)
anthropic/claude-sonnet:latest Claude Sonnet 4.5 (2025-09-29)
claude-3-5-haiku-latest Claude 3.5 Haiku (2024-10-22)
google/gemini:latest Gemini 2.5 Pro
google/gemini-flash Gemini Flash
gemini-2.0-flash-lite Gemini 2.0 Flash Lite
openai/o:latest O3 (2025-04-16, Azure)
openai/gpt-5 GPT-5 (2025-08-07)
openai/gpt-5-mini GPT-5 Mini
openai/o3 O3
openai/o3-mini O3 Mini
openai/o4-mini O4 Mini
xai/grok:latest Grok-3
xai/grok-mini Grok Mini
xai/grok-code-fast-1 Grok Code Fast 1
aws/llama-4-maverick Llama-4 Maverick
aws/llama-4-scout Llama-4 Scout
gpt-oss-120b GPT-OSS-120B
gpt-5-codex GPT-5 Codex
deepseek-r1 DeepSeek-R1
gcp/qwen-3 Qwen-3

Note: Version dates (e.g., 2025-10-01) reflect the actual underlying model versions discovered through CBORG API testing on October 29, 2025.

Technical Implementation

What Changed

  • Added MODEL_NAME_MAPPING dictionary based on CBORG API testing results
  • Added resolve_model_name() function to convert aliases to display names
  • Updated create_pair_label() to use resolved names instead of raw strings

What Stayed the Same

  • Cost tables still use original model names (correct behavior)
  • Data loading and filtering logic unchanged
  • Plot generation code unchanged
  • Cost calculations work correctly with original column values

Key Design Decision

The mapping only affects the pair column used for display in plots. The original supervisor and coder columns remain unchanged, ensuring cost lookups continue to work correctly:

# Cost lookup uses original columns (correct)
sup_model = row['supervisor']  # e.g., "anthropic/claude-haiku:latest"
sup_icost = input_cost.get(sup_model, 0)  # Finds correct price

# Display uses mapped pair column
pair_name = row['pair']  # e.g., "Claude Haiku 4.5"

Benefits

  1. Clearer plot titles: "Claude Haiku 4.5" instead of "anthropic/claude-haiku:latest"
  2. Easier comparison: Names highlight the actual model versions
  3. Based on real data: Names reflect actual underlying models from CBORG API testing
  4. Maintains correctness: Cost calculations still work properly with original names

Example Output

Before:

  • anthropic/claude-sonnet:latest
  • xai/grok:latest
  • openai/o:latest
  • openai/gpt-5

After (with version dates):

  • Claude Sonnet 4.5 (2025-09-29)
  • Grok-3
  • O3 (2025-04-16, Azure)
  • GPT-5 (2025-08-07)

Much more readable in plot titles and legends, with version dates showing exactly which model snapshot was used!