EkmekE committed on Oct 1, 2025

Commit

762d00a

verified

1 Parent(s): 8fd0e3e

Initial LoRA adapter upload

Files changed (17)

.gitattributes +1 -0
.ipynb_checkpoints/README-checkpoint.md +208 -0
README.md +208 -0
adapter_config.json +42 -0
adapter_model.safetensors +3 -0
added_tokens.json +28 -0
chat_template.jinja +4 -0
merges.txt +0 -0
optimizer.pt +3 -0
rng_state.pth +3 -0
scheduler.pt +3 -0
special_tokens_map.json +31 -0
tokenizer.json +3 -0
tokenizer_config.json +240 -0
trainer_state.json +452 -0
training_args.bin +3 -0
vocab.json +0 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+tokenizer.json filter=lfs diff=lfs merge=lfs -text

.ipynb_checkpoints/README-checkpoint.md ADDED

	@@ -0,0 +1,208 @@

+---
+base_model: Qwen/Qwen3-8B
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:Qwen/Qwen3-8B
+- llama-factory
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1

README.md ADDED

	@@ -0,0 +1,208 @@

+---
+base_model: Qwen/Qwen3-8B
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:Qwen/Qwen3-8B
+- llama-factory
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.17.1

adapter_config.json ADDED

	@@ -0,0 +1,42 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "/workspace/Qwen3-8B",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "gate_proj",
+    "q_proj",
+    "v_proj",
+    "down_proj",
+    "o_proj",
+    "k_proj",
+    "up_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
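
With `use_rslora` set to false, the effective LoRA scaling is `lora_alpha / r`. The short sketch below works that out and estimates how many adapter parameters this configuration would add. The projection shapes are an assumption based on the published Qwen/Qwen3-8B architecture (hidden size 4096, MLP size 12288, 32 query heads and 8 key/value heads of dimension 128, 36 layers); none of those numbers appear in this commit, and the helper name `lora_params` is illustrative only.

```python
# LoRA settings copied from adapter_config.json above.
R = 16
LORA_ALPHA = 32

# ASSUMPTION: Qwen/Qwen3-8B layer shapes; these are not part of this commit.
HIDDEN, INTERMEDIATE, Q_OUT, KV_OUT, NUM_LAYERS = 4096, 12288, 4096, 1024, 36

# (in_features, out_features) for each module listed in "target_modules".
SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


scaling = LORA_ALPHA / R
per_layer = sum(lora_params(i, o, R) for i, o in SHAPES.values())
total = per_layer * NUM_LAYERS

print("scaling:", scaling)
print("adapter parameters per layer:", per_layer)
print("adapter parameters in total:", total)
print("bytes at 4 bytes per parameter:", total * 4)
```

Under these assumptions the 4-bytes-per-parameter figure lands within about 70 KB of the 174655536-byte size recorded for adapter_model.safetensors in this commit, which would be consistent with 32-bit storage plus a file header. The storage dtype itself is not stated anywhere in the commit.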

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6826fd1aaf67adf29eb304886d4de5ec9ec895f938e8d75cda47339563c14f69
+size 174655536
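
The three added lines are a Git LFS pointer rather than the weights themselves: `version` names the pointer spec, `oid` carries the hash of the real file, and `size` is its length in bytes. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for this example, not a function from Git LFS or any library.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", split at the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer content copied from the diff above, without the leading "+".
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6826fd1aaf67adf29eb304886d4de5ec9ec895f938e8d75cda47339563c14f69
size 174655536
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size_bytes"])
```

The same layout applies to the other pointer files in this commit (optimizer.pt, rng_state.pth, scheduler.pt, tokenizer.json, training_args.bin).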

added_tokens.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "</think>": 151668,
+  "</tool_call>": 151658,
+  "</tool_response>": 151666,
+  "<think>": 151667,
+  "<tool_call>": 151657,
+  "<tool_response>": 151665,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

chat_template.jinja ADDED

	@@ -0,0 +1,4 @@

+{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% endif %}{% if system_message is defined %}{{ 'System: ' + system_message + '<|im_end|>' + '
+' }}{% endif %}{% for message in loop_messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ 'Human: ' + content + '<|im_end|>' + '
+Assistant:' }}{% elif message['role'] == 'assistant' %}{{ content + '<|im_end|>' + '
+' }}{% endif %}{% endfor %}
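
Read with the `+` diff markers removed, the template renders an optional leading system message as `System: ...<|im_end|>` plus a newline, each user turn as `Human: ...<|im_end|>` followed by a newline and `Assistant:`, and each assistant turn as its content plus `<|im_end|>` and a newline. Messages with any other role produce no output. The plain-Python sketch below mirrors that logic for illustration only: `render_chat` is a hypothetical helper, not a `transformers` API, and the empty-list guard is an addition that the template itself does not have.

```python
IM_END = "<|im_end|>"


def render_chat(messages: list[dict]) -> str:
    """Mirror the string that chat_template.jinja appears to produce."""
    parts = []
    # Guard for an empty list is an addition; the template indexes messages[0] directly.
    if messages and messages[0]["role"] == "system":
        parts.append("System: " + messages[0]["content"] + IM_END + "\n")
        loop_messages = messages[1:]
    else:
        loop_messages = messages

    for message in loop_messages:
        content = message["content"]
        if message["role"] == "user":
            # A user turn ends by opening the assistant's reply.
            parts.append("Human: " + content + IM_END + "\nAssistant:")
        elif message["role"] == "assistant":
            parts.append(content + IM_END + "\n")
        # Any other role is skipped, as in the template.
    return "".join(parts)


prompt = render_chat([
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
])
print(repr(prompt))
```

Note that this layout does not use `<|im_start|>`, so prompts built for this adapter should go through the shipped template rather than a hand-written ChatML string.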

merges.txt ADDED

The diff for this file is too large to render.

optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b55db1ef8708462db55fd08cd80fd1d4f7f481f69ffbc2644f719f8822c428f1
+size 349601867

rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dd2151607431537fecd714359624bb118d0311545a791da0ded9114f8f7fc9fa
+size 14645

scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cf2eaebf8faff85dd2509741fe23a52298d3254a55a70ec40e99e4e0c13d29a2
+size 1465

special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
+size 11422654

tokenizer_config.json ADDED

	@@ -0,0 +1,240 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151665": {
+      "content": "<tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151666": {
+      "content": "</tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151667": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151668": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "padding_side": "right",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

trainer_state.json ADDED

	@@ -0,0 +1,452 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.1569197239375226,
+  "eval_steps": 100,
+  "global_step": 200,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.02905920813657828,
+      "grad_norm": 1.8251163959503174,
+      "learning_rate": 1.2903225806451613e-05,
+      "loss": 2.6703,
+      "num_input_tokens_seen": 686720,
+      "step": 5,
+      "train_runtime": 395.6486,
+      "train_tokens_per_second": 1735.682
+    },
+    {
+      "epoch": 0.05811841627315656,
+      "grad_norm": 1.6785106658935547,
+      "learning_rate": 2.9032258064516133e-05,
+      "loss": 2.6024,
+      "num_input_tokens_seen": 1373472,
+      "step": 10,
+      "train_runtime": 791.2815,
+      "train_tokens_per_second": 1735.757
+    },
+    {
+      "epoch": 0.08717762440973484,
+      "grad_norm": 1.5060399770736694,
+      "learning_rate": 4.516129032258064e-05,
+      "loss": 2.2899,
+      "num_input_tokens_seen": 2061472,
+      "step": 15,
+      "train_runtime": 1188.0329,
+      "train_tokens_per_second": 1735.198
+    },
+    {
+      "epoch": 0.11623683254631312,
+      "grad_norm": 0.8076892495155334,
+      "learning_rate": 6.129032258064517e-05,
+      "loss": 1.7635,
+      "num_input_tokens_seen": 2748800,
+      "step": 20,
+      "train_runtime": 1583.7612,
+      "train_tokens_per_second": 1735.615
+    },
+    {
+      "epoch": 0.1452960406828914,
+      "grad_norm": 0.5285284519195557,
+      "learning_rate": 7.741935483870968e-05,
+      "loss": 1.4089,
+      "num_input_tokens_seen": 3436480,
+      "step": 25,
+      "train_runtime": 1980.3723,
+      "train_tokens_per_second": 1735.27
+    },
+    {
+      "epoch": 0.17435524881946968,
+      "grad_norm": 0.4370571970939636,
+      "learning_rate": 9.35483870967742e-05,
+      "loss": 1.2673,
+      "num_input_tokens_seen": 4122560,
+      "step": 30,
+      "train_runtime": 2375.5923,
+      "train_tokens_per_second": 1735.382
+    },
+    {
+      "epoch": 0.20341445695604796,
+      "grad_norm": 0.3576990067958832,
+      "learning_rate": 9.99906754234138e-05,
+      "loss": 1.1446,
+      "num_input_tokens_seen": 4810240,
+      "step": 35,
+      "train_runtime": 2771.8387,
+      "train_tokens_per_second": 1735.397
+    },
+    {
+      "epoch": 0.23247366509262624,
+      "grad_norm": 0.2466888129711151,
+      "learning_rate": 9.993370449424153e-05,
+      "loss": 1.0947,
+      "num_input_tokens_seen": 5498080,
+      "step": 40,
+      "train_runtime": 3168.1072,
+      "train_tokens_per_second": 1735.446
+    },
+    {
+      "epoch": 0.2615328732292045,
+      "grad_norm": 0.23221422731876373,
+      "learning_rate": 9.982500190692845e-05,
+      "loss": 1.0456,
+      "num_input_tokens_seen": 6187968,
+      "step": 45,
+      "train_runtime": 3566.2467,
+      "train_tokens_per_second": 1735.149
+    },
+    {
+      "epoch": 0.2905920813657828,
+      "grad_norm": 0.26600706577301025,
+      "learning_rate": 9.966468027809582e-05,
+      "loss": 1.0029,
+      "num_input_tokens_seen": 6875040,
+      "step": 50,
+      "train_runtime": 3962.2914,
+      "train_tokens_per_second": 1735.117
+    },
+    {
+      "epoch": 0.31965128950236105,
+      "grad_norm": 0.24320659041404724,
+      "learning_rate": 9.945290570204359e-05,
+      "loss": 0.974,
+      "num_input_tokens_seen": 7561952,
+      "step": 55,
+      "train_runtime": 4357.8907,
+      "train_tokens_per_second": 1735.232
+    },
+    {
+      "epoch": 0.34871049763893935,
+      "grad_norm": 0.22472068667411804,
+      "learning_rate": 9.918989757867583e-05,
+      "loss": 0.944,
+      "num_input_tokens_seen": 8248800,
+      "step": 60,
+      "train_runtime": 4753.5334,
+      "train_tokens_per_second": 1735.299
+    },
+    {
+      "epoch": 0.3777697057755176,
+      "grad_norm": 0.268387109041214,
+      "learning_rate": 9.88759283862006e-05,
+      "loss": 0.9328,
+      "num_input_tokens_seen": 8937280,
+      "step": 65,
+      "train_runtime": 5150.3485,
+      "train_tokens_per_second": 1735.277
+    },
+    {
+      "epoch": 0.4068289139120959,
+      "grad_norm": 0.21440809965133667,
+      "learning_rate": 9.851132339884096e-05,
+      "loss": 0.9074,
+      "num_input_tokens_seen": 9625248,
+      "step": 70,
+      "train_runtime": 5546.5992,
+      "train_tokens_per_second": 1735.342
+    },
+    {
+      "epoch": 0.43588812204867416,
+      "grad_norm": 0.228573739528656,
+      "learning_rate": 9.80964603498485e-05,
+      "loss": 0.8937,
+      "num_input_tokens_seen": 10312960,
+      "step": 75,
+      "train_runtime": 5943.0082,
+      "train_tokens_per_second": 1735.31
+    },
+    {
+      "epoch": 0.46494733018525247,
+      "grad_norm": 0.21550454199314117,
+      "learning_rate": 9.763176904016913e-05,
+      "loss": 0.8789,
+      "num_input_tokens_seen": 11001696,
+      "step": 80,
+      "train_runtime": 6340.0803,
+      "train_tokens_per_second": 1735.261
+    },
+    {
+      "epoch": 0.4940065383218307,
+      "grad_norm": 0.23164565861225128,
+      "learning_rate": 9.711773089316645e-05,
+      "loss": 0.8684,
+      "num_input_tokens_seen": 11688192,
+      "step": 85,
+      "train_runtime": 6735.2127,
+      "train_tokens_per_second": 1735.386
+    },
+    {
+      "epoch": 0.523065746458409,
+      "grad_norm": 0.2465435415506363,
+      "learning_rate": 9.655487845586377e-05,
+      "loss": 0.8422,
+      "num_input_tokens_seen": 12375296,
+      "step": 90,
+      "train_runtime": 7131.144,
+      "train_tokens_per_second": 1735.387
+    },
+    {
+      "epoch": 0.5521249545949873,
+      "grad_norm": 0.24671192467212677,
+      "learning_rate": 9.594379484722184e-05,
+      "loss": 0.8408,
+      "num_input_tokens_seen": 13063552,
+      "step": 95,
+      "train_runtime": 7528.0327,
+      "train_tokens_per_second": 1735.321
+    },
+    {
+      "epoch": 0.5811841627315656,
+      "grad_norm": 0.2649816572666168,
+      "learning_rate": 9.528511315402358e-05,
+      "loss": 0.8422,
+      "num_input_tokens_seen": 13751648,
+      "step": 100,
+      "train_runtime": 7924.8612,
+      "train_tokens_per_second": 1735.254
+    },
+    {
+      "epoch": 0.5811841627315656,
+      "eval_loss": 0.8274134397506714,
+      "eval_runtime": 872.0056,
+      "eval_samples_per_second": 6.314,
+      "eval_steps_per_second": 1.579,
+      "num_input_tokens_seen": 13751648,
+      "step": 100
+    },
+    {
+      "epoch": 0.6102433708681438,
+      "grad_norm": 0.2663179636001587,
+      "learning_rate": 9.457951577499187e-05,
+      "loss": 0.8217,
+      "num_input_tokens_seen": 14438496,
+      "step": 105,
+      "train_runtime": 9194.7951,
+      "train_tokens_per_second": 1570.29
+    },
+    {
+      "epoch": 0.6393025790047221,
+      "grad_norm": 0.2964800000190735,
+      "learning_rate": 9.382773371381985e-05,
+      "loss": 0.8018,
+      "num_input_tokens_seen": 15126496,
+      "step": 110,
+      "train_runtime": 9591.6416,
+      "train_tokens_per_second": 1577.05
+    },
+    {
+      "epoch": 0.6683617871413003,
+      "grad_norm": 0.28969624638557434,
+      "learning_rate": 9.303054582184609e-05,
+      "loss": 0.8072,
+      "num_input_tokens_seen": 15815136,
+      "step": 115,
+      "train_runtime": 9989.0582,
+      "train_tokens_per_second": 1583.246
+    },
+    {
+      "epoch": 0.6974209952778787,
+      "grad_norm": 0.30194368958473206,
+      "learning_rate": 9.218877799115928e-05,
+      "loss": 0.8014,
+      "num_input_tokens_seen": 16503360,
+      "step": 120,
+      "train_runtime": 10386.1584,
+      "train_tokens_per_second": 1588.976
+    },
+    {
+      "epoch": 0.726480203414457,
+      "grad_norm": 0.2715190052986145,
+      "learning_rate": 9.130330229896847e-05,
+      "loss": 0.7902,
+      "num_input_tokens_seen": 17190176,
+      "step": 125,
+      "train_runtime": 10782.1528,
+      "train_tokens_per_second": 1594.318
+    },
+    {
+      "epoch": 0.7555394115510352,
+      "grad_norm": 0.2829165756702423,
+      "learning_rate": 9.037503610412501e-05,
+      "loss": 0.7874,
+      "num_input_tokens_seen": 17877120,
+      "step": 130,
+      "train_runtime": 11178.1048,
+      "train_tokens_per_second": 1599.298
+    },
+    {
+      "epoch": 0.7845986196876135,
+      "grad_norm": 0.3267139196395874,
+      "learning_rate": 8.940494109673265e-05,
+      "loss": 0.7963,
+      "num_input_tokens_seen": 18563488,
+      "step": 135,
+      "train_runtime": 11573.6201,
+      "train_tokens_per_second": 1603.948
+    },
+    {
+      "epoch": 0.8136578278241918,
+      "grad_norm": 0.31520357728004456,
+      "learning_rate": 8.839402230183e-05,
+      "loss": 0.7822,
+      "num_input_tokens_seen": 19253216,
+      "step": 140,
+      "train_runtime": 11971.6869,
+      "train_tokens_per_second": 1608.229
+    },
+    {
+      "epoch": 0.8427170359607701,
+      "grad_norm": 0.30459001660346985,
+      "learning_rate": 8.734332703817771e-05,
+      "loss": 0.7859,
+      "num_input_tokens_seen": 19941568,
+      "step": 145,
+      "train_runtime": 12368.4401,
+      "train_tokens_per_second": 1612.294
+    },
+    {
+      "epoch": 0.8717762440973483,
+      "grad_norm": 0.32623955607414246,
+      "learning_rate": 8.625394383322914e-05,
+      "loss": 0.7687,
+      "num_input_tokens_seen": 20629312,
+      "step": 150,
+      "train_runtime": 12764.6653,
+      "train_tokens_per_second": 1616.126
+    },
+    {
+      "epoch": 0.9008354522339266,
+      "grad_norm": 0.32089152932167053,
+      "learning_rate": 8.512700129540847e-05,
+      "loss": 0.7672,
+      "num_input_tokens_seen": 21315136,
+      "step": 155,
+      "train_runtime": 13160.0163,
+      "train_tokens_per_second": 1619.689
+    },
+    {
+      "epoch": 0.9298946603705049,
+      "grad_norm": 0.3055724799633026,
+      "learning_rate": 8.396366694486466e-05,
+      "loss": 0.7639,
+      "num_input_tokens_seen": 22002976,
+      "step": 160,
+      "train_runtime": 13557.0617,
+      "train_tokens_per_second": 1622.99
+    },
+    {
+      "epoch": 0.9589538685070832,
+      "grad_norm": 0.30428361892700195,
+      "learning_rate": 8.276514600391272e-05,
+      "loss": 0.7617,
+      "num_input_tokens_seen": 22690560,
+      "step": 165,
+      "train_runtime": 13953.7665,
+      "train_tokens_per_second": 1626.124
+    },
+    {
+      "epoch": 0.9880130766436614,
+      "grad_norm": 0.3108614981174469,
+      "learning_rate": 8.153268014841506e-05,
+      "loss": 0.7613,
+      "num_input_tokens_seen": 23378048,
+      "step": 170,
+      "train_runtime": 14350.762,
+      "train_tokens_per_second": 1629.046
+    },
+    {
+      "epoch": 1.0116236832546313,
+      "grad_norm": 0.3532414436340332,
+      "learning_rate": 8.026754622139691e-05,
+      "loss": 0.7645,
+      "num_input_tokens_seen": 23937248,
+      "step": 175,
+      "train_runtime": 14673.5871,
+      "train_tokens_per_second": 1631.315
+    },
+    {
+      "epoch": 1.0406828913912096,
+      "grad_norm": 0.32768887281417847,
+      "learning_rate": 7.897105491022818e-05,
+      "loss": 0.7563,
+      "num_input_tokens_seen": 24623744,
+      "step": 180,
+      "train_runtime": 15069.3557,
+      "train_tokens_per_second": 1634.028
+    },
+    {
+      "epoch": 1.069742099527788,
+      "grad_norm": 0.310390830039978,
+      "learning_rate": 7.764454938874252e-05,
+      "loss": 0.7389,
+      "num_input_tokens_seen": 25312576,
+      "step": 185,
+      "train_runtime": 15466.8535,
+      "train_tokens_per_second": 1636.569
+    },
+    {
+      "epoch": 1.0988013076643661,
+      "grad_norm": 0.3332880139350891,
+      "learning_rate": 7.628940392569994e-05,
+      "loss": 0.7544,
+      "num_input_tokens_seen": 25999584,
+      "step": 190,
+      "train_runtime": 15863.0121,
+      "train_tokens_per_second": 1639.007
+    },
+    {
+      "epoch": 1.1278605158009445,
+      "grad_norm": 0.3539334237575531,
+      "learning_rate": 7.490702246103513e-05,
+      "loss": 0.7455,
+      "num_input_tokens_seen": 26685632,
+      "step": 195,
+      "train_runtime": 16258.3382,
+      "train_tokens_per_second": 1641.351
+    },
+    {
+      "epoch": 1.1569197239375226,
+      "grad_norm": 0.3256574869155884,
+      "learning_rate": 7.3498837151366e-05,
+      "loss": 0.7465,
+      "num_input_tokens_seen": 27371456,
+      "step": 200,
+      "train_runtime": 16653.8315,
+      "train_tokens_per_second": 1643.553
+    },
+    {
+      "epoch": 1.1569197239375226,
+      "eval_loss": 0.7508572340011597,
+      "eval_runtime": 873.0587,
+      "eval_samples_per_second": 6.307,
+      "eval_steps_per_second": 1.577,
+      "num_input_tokens_seen": 27371456,
+      "step": 200
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 519,
+  "num_input_tokens_seen": 27371456,
+  "num_train_epochs": 3,
+  "save_steps": 100,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.2501177571560653e+18,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}
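
The `learning_rate` values in `log_history` look like a linear warmup followed by cosine decay. The sketch below reproduces a few of the logged values under that reading. The peak rate of 1e-4 and the 31 warmup steps are inferred by fitting the log, not read from the commit (the actual arguments sit in the binary training_args.bin), and the one-step offset between the logged step and the scheduler step is likewise an observation about these numbers rather than a documented rule.

```python
import math

PEAK_LR = 1e-4       # ASSUMPTION: inferred from the logged values
WARMUP_STEPS = 31    # ASSUMPTION: inferred from the logged values
MAX_STEPS = 519      # "max_steps" in trainer_state.json


def cosine_with_warmup(scheduler_step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to zero at MAX_STEPS."""
    if scheduler_step < WARMUP_STEPS:
        return PEAK_LR * scheduler_step / WARMUP_STEPS
    progress = (scheduler_step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


def logged_lr(global_step: int) -> float:
    # The logged values line up with the scheduler one step behind global_step.
    return cosine_with_warmup(global_step - 1)


# (step, learning_rate) pairs copied from log_history above.
logged = {
    5: 1.2903225806451613e-05,
    30: 9.35483870967742e-05,
    35: 9.99906754234138e-05,
    200: 7.3498837151366e-05,
}

for step, lr in logged.items():
    match = math.isclose(logged_lr(step), lr, rel_tol=1e-6)
    print(step, lr, logged_lr(step), match)
```

If the fit holds, the checkpoint at step 200 was saved well before the schedule reaches zero at step 519, which agrees with the recorded epoch of about 1.16 out of 3.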

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2ad64a79aed33d8e1fa03ef1359a1fa441dc425c9b8a5f947f938bc287f44915
+size 6161

vocab.json ADDED

The diff for this file is too large to render.