---
library_name: transformers
tags:
- reward
- RM
- Code
- CodeScaler
license: mit
datasets:
- LARK-Lab/CodeScalerPair-51K
language:
- en
base_model:
- Skywork/Skywork-Reward-V2-Qwen3-4B
---

<h2 align="center">
  CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models
</h2>

<p align="center">
  <a href="https://arxiv.org/abs/2602.17684">
    <img
      src="https://img.shields.io/badge/Paper-Arxiv-red?logo=arxiv&logoColor=red"
      alt="CodeScaler Paper on arXiv"
    />
  </a>
  <a href="https://github.com/LARK-AI-Lab/CodeScaler">
    <img
      src="https://img.shields.io/badge/GitHub-Code-181717?logo=github&logoColor=white"
      alt="GitHub Code"
    />
  </a>
  <a href="https://lark-ai-lab.github.io/codescaler.github.io/">
    <img
      src="https://img.shields.io/badge/GitHub-Page-4078c0?logo=github&logoColor=white"
      alt="GitHub Page"
    />
  </a>
  <a href="https://huggingface.co/collections/LARK-Lab/codescaler">
    <img
      src="https://img.shields.io/badge/Datasets-Hugging%20Face%20Data-orange?logo=huggingface&logoColor=yellow"
      alt="Datasets on Hugging Face"
    />
  </a>
  <a href="https://huggingface.co/collections/LARK-Lab/codescaler">
    <img
      src="https://img.shields.io/badge/CodeScaler-Hugging%20Face%20Model-FFCC00?logo=huggingface&logoColor=yellow"
      alt="CodeScaler on Hugging Face"
    />
  </a>
</p>

## Overview

We propose **CodeScaler**, an execution-free reward model designed to scale both reinforcement learning training and test-time inference for code generation. **CodeScaler** is trained on carefully curated preference data derived from verified code problems, and it incorporates syntax-aware code extraction and validity-preserving reward shaping to keep optimization stable and robust.

This repository hosts the official CodeScaler-4B checkpoint, trained from Skywork/Skywork-Reward-V2-Qwen3-4B on [LARK-Lab/CodeScalerPair-51K](https://huggingface.co/datasets/LARK-Lab/CodeScalerPair-51K).

## Performance on RM-Bench

| Model                                | Code | Chat | Math | Safety | Easy | Normal | Hard | Avg  |
| ------------------------------------ | ---- | ---- | ---- | ------ | ---- | ------ | ---- | ---- |
| Skywork/Skywork-Reward-Llama-3.1-8B  | 54.5 | 69.5 | 60.6 | 95.7   | 89   | 74.7   | 46.6 | 70.1 |
| TIGER-Lab/AceCodeRM-7B               | 66.9 | 66.7 | 65.3 | 89.9   | 79.9 | 74.4   | 62.2 | 72.2 |
| TIGER-Lab/AceCoder-RM-32B            | 72.1 | 73.7 | 70.5 | 88     | 84.5 | 78.3   | 65.5 | 76.1 |
| Skywork/Skywork-Reward-V2-Qwen3-1.7B | 72.3 | 69.6 | 71.4 | 92.9   | 92.8 | 82.3   | 54.5 | 76.6 |
| Skywork/Skywork-Reward-V2-Qwen3-4B   | 74.4 | 78.2 | 73.6 | 95.7   | 92.1 | 85     | 64.4 | 80.5 |
| Skywork/Skywork-Reward-V2-Qwen3-8B   | 73.6 | 80.6 | 75   | 96.5   | 91.8 | 85.5   | 67   | 80.5 |
| CodeScaler-1.7B                      | 73.1 | 74.4 | 74.7 | 93.1   | 91.7 | 83.2   | 61.5 | 78.8 |
| **CodeScaler-4B (this model)**       | 76.3 | 80.4 | 79   | 95.8   | 92.9 | 86.5   | 69.2 | 82.9 |
| CodeScaler-8B                        | 76.9 | 83   | 79.9 | 96.4   | 92.5 | 87.9   | 71.8 | 84.1 |

## Usage

### RM Scoring

````python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"
model_path = 'LARK-Lab/CodeScaler-4B'

# Load the tokenizer and the reward model (a sequence classifier with a scalar head).
tokenizer = AutoTokenizer.from_pretrained(model_path)
reward_model = AutoModelForSequenceClassification.from_pretrained(model_path).to(device)
reward_model.eval()

question = """\
Given an integer array nums and an integer k, return the total number of continuous subarrays whose sum equals k.
A subarray is a contiguous part of the array.
For example:
```
Input:
nums = [1, 1, 1], k = 2
Output:
2
```
"""

# Correct solution: prefix sums with a frequency table.
program_correct = """\
from collections import defaultdict
def subarraySum(nums, k):
    prefix = 0
    count = 0
    freq = defaultdict(int)
    freq[0] = 1  # Important: subarray starting from index 0
    for num in nums:
        prefix += num
        if prefix - k in freq:
            count += freq[prefix - k]
        freq[prefix] += 1
    return count
"""

# Wrong solution: a sliding window, which breaks when nums contains negative numbers.
program_wrong = """\
def subarraySum(nums, k):
    left = 0
    curr_sum = 0
    count = 0
    for right in range(len(nums)):
        curr_sum += nums[right]
        while curr_sum > k and left <= right:
            curr_sum -= nums[left]
            left += 1
        if curr_sum == k:
            count += 1
    return count
"""

# Build one (question, program) conversation per candidate program.
convs = [
    [
        {
            "content": question,
            "role": "user",
        },
        {
            "role": "assistant",
            "content": program
        }
    ] for program in [program_correct, program_wrong]
]

# Render each conversation with the model's chat template, then tokenize as a batch.
texts = [
    tokenizer.apply_chat_template(conv, tokenize=False)
    for conv in convs
]
toks = tokenizer(
    texts,
    truncation=True,
    padding=True,
    max_length=2048,
    return_tensors="pt",
)

with torch.no_grad():
    outputs = reward_model(
        input_ids=toks["input_ids"].to(device),
        attention_mask=toks["attention_mask"].to(device),
    )

# One scalar score per conversation; a higher score means a preferred response.
scores = outputs.logits.squeeze(-1).cpu().tolist()
print("RM Scores:", scores)
# RM Scores: [12.552595138549805, 3.382493019104004]
````
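
Since the reward model returns one scalar per candidate, the scores from the example above can be used for test-time reranking. The following is a minimal best-of-N sketch, not the authors' inference pipeline: the helper name `select_best_of_n` and the placeholder candidate labels are assumptions made for illustration, and the scores are copied from the example output rather than recomputed.

```python
from typing import Sequence, Tuple


def select_best_of_n(candidates: Sequence[str], scores: Sequence[float]) -> Tuple[int, str]:
    """Return the index and text of the highest-scoring candidate.

    Hypothetical helper: `scores` is expected to hold one reward-model
    score per entry in `candidates`, in the same order.
    """
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")
    if not candidates:
        raise ValueError("at least one candidate is required")
    # Pick the position of the largest score; ties resolve to the first one.
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return best_index, candidates[best_index]


# Scores copied from the RM scoring example; the labels stand in for the
# actual program strings (program_correct, program_wrong).
example_candidates = ["program_correct", "program_wrong"]
example_scores = [12.552595138549805, 3.382493019104004]

best_index, best_candidate = select_best_of_n(example_candidates, example_scores)
print(best_index, best_candidate)
# 0 program_correct
```

In practice the candidates would be N programs sampled from a code LLM for the same question, scored in one batch as in the RM scoring example.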

### RL Training

Please refer to [https://github.com/LARK-AI-Lab/CodeScaler](https://github.com/LARK-AI-Lab/CodeScaler) for RL training details.

## Citation

If you find our work helpful, please consider citing:

```
@misc{zhu2026codescalerscalingcodellm,
      title={CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models},
      author={Xiao Zhu and Xinyu Zhou and Boyu Zhu and Hanxu Hu and Mingzhe Du and Haotian Zhang and Huiming Wang and Zhijiang Guo},
      year={2026},
      eprint={2602.17684},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.17684},
}
```