nyxia committed on
Commit 074fbc0 · verified · 1 Parent(s): 4d85305

Upload Chimera 279M at step 250000

Files changed (5)
  1. README.md +1 -1
  2. config.json +1 -1
  3. model.safetensors +1 -1
  4. training.log +296 -0
  5. training_curves.png +2 -2
README.md CHANGED
@@ -36,7 +36,7 @@ thumbnail: auron_banner.png
36
  ![Training Curves](training_curves.png)
37
 
38
  ## Training
39
- - **Step:** 200,000
40
  - **Data:** Mixed (75% FineWeb-Edu, 18% StarCoder, 5% FineMath, 2% UltraChat)
41
  - **Optimizer:** Muon + AdamW (decoupled embedding LR)
42
  - **Schedule:** WSD (Warmup-Stable-Decay)
 
36
  ![Training Curves](training_curves.png)
37
 
38
  ## Training
39
+ - **Step:** 250,000
40
  - **Data:** Mixed (75% FineWeb-Edu, 18% StarCoder, 5% FineMath, 2% UltraChat)
41
  - **Optimizer:** Muon + AdamW (decoupled embedding LR)
42
  - **Schedule:** WSD (Warmup-Stable-Decay)
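Note on the schedule: the `lr` values logged in training.log for this commit are consistent with a WSD schedule that holds 8.00e-04 until step 225,000 and then follows a half-cosine decay to zero at step 250,000, with the embedding LR at half that value throughout. The sketch below reproduces those logged values. The function name `wsd_lr` and the warmup length are assumptions for illustration, since the warmup phase is not visible in this diff, and this is not necessarily how the training script computes it.

```python
import math


def wsd_lr(step, peak_lr=8e-4, total_steps=250_000,
           warmup_steps=2_000, decay_steps=25_000):
    """Warmup-Stable-Decay learning rate (illustrative sketch).

    warmup_steps is a hypothetical value; peak_lr, total_steps and
    decay_steps are inferred from the logged lr column.
    """
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr (assumed shape).
        return peak_lr * step / warmup_steps
    if step < decay_start:
        # Stable phase: constant peak learning rate.
        return peak_lr
    # Decay phase: half-cosine from peak_lr down to 0.
    progress = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, `wsd_lr(230_000)` evaluates to about 7.24e-04 and `wsd_lr(245_000, peak_lr=4e-4)` to about 3.82e-05, which matches the `lr` and `emb` columns logged at those steps.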
config.json CHANGED
@@ -26,7 +26,7 @@
26
  "architecture": "Chimera",
27
  "config_class": "ChimeraConfig",
28
  "topology": "4 bottom + 4x3 top = 16 virtual",
29
- "step": 200000,
30
  "total_params": 278664160,
31
  "size_label": "279M",
32
  "model_type": "zara-ml"
 
26
  "architecture": "Chimera",
27
  "config_class": "ChimeraConfig",
28
  "topology": "4 bottom + 4x3 top = 16 virtual",
29
+ "step": 250000,
30
  "total_params": 278664160,
31
  "size_label": "279M",
32
  "model_type": "zara-ml"
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:4faab2f99d9051212788f8fef862eafd897021d81ab1cb41671c79f99991b809
3
  size 868508712
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e9aeabb54e88c5e896387efab5024159e4668318f6aa947a0f6efb457d35d185
3
  size 868508712
training.log CHANGED
@@ -757,3 +757,299 @@ step 202200/250000 | loss 2.9451 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,0
757
  step 202400/250000 | loss 2.9424 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,043 tok/s | epoch 1
758
  step 202600/250000 | loss 2.9361 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,045 tok/s | epoch 1
759
  step 202800/250000 | loss 2.9862 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,048 tok/s | epoch 1
 
757
  step 202400/250000 | loss 2.9424 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,043 tok/s | epoch 1
758
  step 202600/250000 | loss 2.9361 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,045 tok/s | epoch 1
759
  step 202800/250000 | loss 2.9862 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,048 tok/s | epoch 1
760
+ step 203000/250000 | loss 2.9366 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,051 tok/s | epoch 1
761
+ step 203200/250000 | loss 2.9636 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,054 tok/s | epoch 1
762
+ step 203400/250000 | loss 2.9466 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,056 tok/s | epoch 1
763
+ step 203600/250000 | loss 2.9545 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,059 tok/s | epoch 1
764
+ step 203800/250000 | loss 2.9576 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,062 tok/s | epoch 1
765
+ step 204000/250000 | loss 2.9216 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,065 tok/s | epoch 1
766
+ step 204200/250000 | loss 2.9615 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,067 tok/s | epoch 1
767
+ step 204400/250000 | loss 2.9418 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,070 tok/s | epoch 1
768
+ step 204600/250000 | loss 2.9606 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,073 tok/s | epoch 1
769
+ step 204800/250000 | loss 2.9470 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,075 tok/s | epoch 1
770
+ step 205000/250000 | loss 2.9586 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,078 tok/s | epoch 1
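Note on the log format: each step line above carries the step counter, training loss, the two learning rates, step time, throughput, and epoch, separated by `|`. A minimal sketch for pulling these fields out of a line follows. The regular expression and the name `parse_step_line` are assumptions based only on the lines shown in this diff, not on the training script itself.

```python
import re
from typing import Optional

# Pattern inferred from the step lines in this diff (an assumption).
STEP_RE = re.compile(
    r"step (?P<step>\d+)/(?P<total>\d+) \| loss (?P<loss>[\d.]+) \| "
    r"lr (?P<lr>[\d.eE+-]+) emb (?P<emb_lr>[\d.eE+-]+) \| "
    r"(?P<ms>\d+)ms/step \| (?P<tok_s>[\d,]+) tok/s \| epoch (?P<epoch>\d+)"
)


def parse_step_line(line: str) -> Optional[dict]:
    """Return the fields of one step line, or None for any other line."""
    m = STEP_RE.search(line)
    if m is None:
        return None
    return {
        "step": int(m["step"]),
        "total": int(m["total"]),
        "loss": float(m["loss"]),
        "lr": float(m["lr"]),
        "emb_lr": float(m["emb_lr"]),
        "ms_per_step": int(m["ms"]),
        # Throughput is logged with thousands separators, e.g. "103,078".
        "tok_per_s": int(m["tok_s"].replace(",", "")),
        "epoch": int(m["epoch"]),
    }
```

Using `search` rather than `match` lets the same function accept lines that still carry the leading `+` diff marker. Multiplying `tok_per_s` by `ms_per_step / 1000` gives a rough tokens-per-step estimate, which is only approximate because the step time is logged in whole milliseconds.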
771
+ >>> val_loss: 3.2677 | bpt: 4.7143 | true_bpb: 1.5165 *BEST*
772
+ >>> [The] The fascination is that you can finally learn about ancient times, and your knowledge can help you develop your love for literature and art. The Golden Age of Ancient Art is a great place to start. It is an important place to start when you find a passion for literature, and the book is a fantastic style for those who want to find out more about the history and literature of Ancient Greece. You can also
773
+ >>> [Scientists have discovered] Scientists have discovered the first evidence of how a virus works. They predict that the human virus might shift to a new form later than they possibly did.
774
+ I hope this proves what they think. The “scrum mechanism” was at play in the 1980s and “scrum” was long accepted throughout the 1990s. More is known about how the virus used a biological mechanism
775
+ step 205200/250000 | loss 2.9465 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,048 tok/s | epoch 1
776
+ step 205400/250000 | loss 2.9347 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,051 tok/s | epoch 1
777
+ step 205600/250000 | loss 2.9672 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,053 tok/s | epoch 1
778
+ step 205800/250000 | loss 2.9859 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,056 tok/s | epoch 1
779
+ step 206000/250000 | loss 2.9744 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,059 tok/s | epoch 1
780
+ step 206200/250000 | loss 2.9416 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,061 tok/s | epoch 1
781
+ step 206400/250000 | loss 2.9505 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,064 tok/s | epoch 1
782
+ step 206600/250000 | loss 2.9565 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,066 tok/s | epoch 1
783
+ step 206800/250000 | loss 2.9583 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,069 tok/s | epoch 1
784
+ step 207000/250000 | loss 2.9546 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,071 tok/s | epoch 1
785
+ step 207200/250000 | loss 2.9690 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,074 tok/s | epoch 1
786
+ step 207400/250000 | loss 2.9388 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,076 tok/s | epoch 1
787
+ step 207600/250000 | loss 2.9458 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,079 tok/s | epoch 1
788
+ step 207800/250000 | loss 2.9647 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,082 tok/s | epoch 1
789
+ step 208000/250000 | loss 2.9677 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,084 tok/s | epoch 1
790
+ step 208200/250000 | loss 2.9475 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,087 tok/s | epoch 1
791
+ step 208400/250000 | loss 2.9537 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,089 tok/s | epoch 1
792
+ step 208600/250000 | loss 2.9560 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,092 tok/s | epoch 1
793
+ step 208800/250000 | loss 2.9321 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,094 tok/s | epoch 1
794
+ step 209000/250000 | loss 2.9257 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,097 tok/s | epoch 1
795
+ step 209200/250000 | loss 2.9494 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,099 tok/s | epoch 1
796
+ step 209400/250000 | loss 2.9532 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,101 tok/s | epoch 1
797
+ step 209600/250000 | loss 2.9501 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,099 tok/s | epoch 1
798
+ step 209800/250000 | loss 2.9366 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,102 tok/s | epoch 1
799
+ step 210000/250000 | loss 2.9452 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,104 tok/s | epoch 1
800
+ >>> val_loss: 3.2675 | bpt: 4.7140 | true_bpb: 1.5165 *BEST*
801
+ >>> [The] The propaganda industry has been to struggle with the need for change, in order to achieve the goal of war and peace and freedom. This is arguably the most important aspect of the war effort, and a subject that has been the subject of intense debate in the political and military arenas. Thus, it also plays a critical role in this conflict.
802
+ The conflict of the 19th century is one of the
803
+ >>> [Scientists have discovered] Scientists have discovered a new compound that could enhance the ability of insects to survive and reproduce. It is essential for insects to survive and grow, and it’s too early to say whether the compound will be effective in combatting the insect’s disease. Karl Kuo, at South African University of Agriculture, is the author of the paper. The team announced today their discovery.
804
+ Apparently, this particular compound is a little bit
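Note on the evaluation metrics: the logged pairs are consistent with `bpt` being the validation loss converted from nats to bits per token, that is `val_loss / ln(2)`. The `true_bpb` figure then appears to be `bpt` scaled by a tokens-per-byte ratio of the validation text, which works out to about 0.3217 for the evaluations shown here. The sketch below checks both relations; the function names are illustrative, and the reading of `true_bpb` is an inference from the numbers rather than something the log states.

```python
import math


def nats_to_bits(loss_nats: float) -> float:
    """Convert a cross-entropy loss in nats per token to bits per token."""
    return loss_nats / math.log(2)


def implied_tokens_per_byte(bpt: float, bpb: float) -> float:
    """Tokens-per-byte ratio implied by a logged (bpt, true_bpb) pair."""
    return bpb / bpt


# Values taken from the step 210000 evaluation line.
bpt = nats_to_bits(3.2675)
ratio = implied_tokens_per_byte(4.7140, 1.5165)
print(f"bpt = {bpt:.4f}, tokens/byte = {ratio:.4f}")
```

Run as written, this prints `bpt = 4.7140, tokens/byte = 0.3217`, in agreement with the logged `bpt: 4.7140`.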
805
+ step 210200/250000 | loss 2.9522 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,070 tok/s | epoch 1
806
+ step 210400/250000 | loss 2.9578 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,073 tok/s | epoch 1
807
+ step 210600/250000 | loss 2.9600 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,075 tok/s | epoch 1
808
+ step 210800/250000 | loss 2.9824 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,078 tok/s | epoch 1
809
+ step 211000/250000 | loss 2.9618 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,080 tok/s | epoch 1
810
+ step 211200/250000 | loss 2.9632 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,083 tok/s | epoch 1
811
+ step 211400/250000 | loss 2.9634 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,085 tok/s | epoch 1
812
+ step 211600/250000 | loss 2.9450 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,088 tok/s | epoch 1
813
+ step 211800/250000 | loss 2.9554 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,090 tok/s | epoch 1
814
+ step 212000/250000 | loss 2.9556 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,093 tok/s | epoch 1
815
+ step 212200/250000 | loss 2.9391 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,095 tok/s | epoch 1
816
+ step 212400/250000 | loss 2.9443 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,098 tok/s | epoch 1
817
+ step 212600/250000 | loss 2.9262 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,100 tok/s | epoch 1
818
+ step 212800/250000 | loss 2.9390 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,102 tok/s | epoch 1
819
+ step 213000/250000 | loss 2.9842 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,105 tok/s | epoch 1
820
+ step 213200/250000 | loss 2.9660 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,107 tok/s | epoch 1
821
+ step 213400/250000 | loss 2.9350 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,110 tok/s | epoch 1
822
+ step 213600/250000 | loss 2.9519 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,112 tok/s | epoch 1
823
+ step 213800/250000 | loss 2.9428 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,115 tok/s | epoch 1
824
+ step 214000/250000 | loss 2.9341 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,117 tok/s | epoch 1
825
+ step 214200/250000 | loss 2.9679 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,119 tok/s | epoch 1
826
+ step 214400/250000 | loss 2.9575 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,122 tok/s | epoch 1
827
+ step 214600/250000 | loss 2.9406 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,124 tok/s | epoch 1
828
+ step 214800/250000 | loss 2.9603 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,126 tok/s | epoch 1
829
+ step 215000/250000 | loss 2.9369 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,129 tok/s | epoch 1
830
+ >>> val_loss: 3.2669 | bpt: 4.7132 | true_bpb: 1.5162 *BEST*
831
+ >>> [The] The Australian Centre for Coastal Research and Development proposes and focuses on the history of ocean biodiversity, biological processes, human impacts, and the evolution of ocean life, rather than the wide scope and complexity of the great ocean.
832
+
833
+ As an example, the region's history revolves around the discovery of the first caves on the Australian continent during the Ice Age, and the discovery of the first living organisms on Earth around 1
834
+ >>> [Scientists have discovered] Scientists have discovered a technique to isolate the missing piece of DNA. Apparently, this technique has the potential to allow an accurate mapping and characterization of the DNA sequences, which will help scientists learn more about the human genome. The technique could also help researchers understand more about immune system function and the code used by our immune system. The researchers hope to use the technique as an inhibitor for the virus to prevent the virus from infect
835
+ step 215200/250000 | loss 2.9393 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,101 tok/s | epoch 1
836
+ step 215400/250000 | loss 2.9331 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,103 tok/s | epoch 1
837
+ step 215600/250000 | loss 2.9448 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,106 tok/s | epoch 1
838
+ step 215800/250000 | loss 2.9456 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,108 tok/s | epoch 1
839
+ step 216000/250000 | loss 2.9353 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,110 tok/s | epoch 1
840
+ step 216200/250000 | loss 2.9637 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,113 tok/s | epoch 1
841
+ step 216400/250000 | loss 2.9404 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,115 tok/s | epoch 1
842
+ step 216600/250000 | loss 2.9532 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,117 tok/s | epoch 1
843
+ step 216800/250000 | loss 2.9737 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,120 tok/s | epoch 1
844
+ step 217000/250000 | loss 2.9322 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,122 tok/s | epoch 1
845
+ step 217200/250000 | loss 2.9628 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,125 tok/s | epoch 1
846
+ step 217400/250000 | loss 2.9519 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,127 tok/s | epoch 1
847
+ step 217600/250000 | loss 2.9288 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,129 tok/s | epoch 1
848
+ step 217800/250000 | loss 2.9550 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,132 tok/s | epoch 1
849
+ step 218000/250000 | loss 2.9554 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,134 tok/s | epoch 1
850
+ step 218200/250000 | loss 2.9533 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,136 tok/s | epoch 1
851
+ step 218400/250000 | loss 2.9593 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,138 tok/s | epoch 1
852
+ step 218600/250000 | loss 2.9548 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,141 tok/s | epoch 1
853
+ step 218800/250000 | loss 2.9672 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,143 tok/s | epoch 1
854
+ step 219000/250000 | loss 2.9620 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,145 tok/s | epoch 1
855
+ step 219200/250000 | loss 2.9483 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,147 tok/s | epoch 1
856
+ step 219400/250000 | loss 2.9663 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,150 tok/s | epoch 1
857
+ step 219600/250000 | loss 2.9630 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,152 tok/s | epoch 1
858
+ step 219800/250000 | loss 2.9525 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,154 tok/s | epoch 1
859
+ step 220000/250000 | loss 2.9565 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,156 tok/s | epoch 1
860
+ >>> val_loss: 3.2645 | bpt: 4.7096 | true_bpb: 1.5150 *BEST*
861
+ >>> [The] The orthostatic gyric (AFG) system helps improve the balance of the hands and fingers as the movement becomes more competitive. The movement stimulates the electrodes in the hands reducing the risk of injury. The movement releases energy into the muscles allowing the movement to become more efficient and more effective. This principle is useful in a wide range of situations including sports, recreational activities, home sports activities, and so
862
+ >>> [Scientists have discovered] Scientists have discovered a 50% decrease in the number of genes in the human genome for genetic mutations that interfere with a gene's function.
863
+ The research team focused on two genes that are responsible for the production of a protein called microtubule-affected gene 2. This gene is important to the development of the nervous system. This protein is critical in the production of these genes. The researchers found that
864
+ step 220200/250000 | loss 2.9548 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,125 tok/s | epoch 1
865
+ step 220400/250000 | loss 2.9558 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,128 tok/s | epoch 1
866
+ step 220600/250000 | loss 2.9517 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,130 tok/s | epoch 1
867
+ step 220800/250000 | loss 2.9478 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,132 tok/s | epoch 1
868
+ step 221000/250000 | loss 2.9411 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,135 tok/s | epoch 1
869
+ step 221200/250000 | loss 2.9608 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,137 tok/s | epoch 1
870
+ step 221400/250000 | loss 2.9563 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,139 tok/s | epoch 1
871
+ step 221600/250000 | loss 2.9397 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,141 tok/s | epoch 1
872
+ step 221800/250000 | loss 2.9382 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,143 tok/s | epoch 1
873
+ step 222000/250000 | loss 2.9588 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,146 tok/s | epoch 1
874
+ step 222200/250000 | loss 2.9533 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,148 tok/s | epoch 1
875
+ step 222400/250000 | loss 2.9465 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,146 tok/s | epoch 1
876
+ step 222600/250000 | loss 2.9540 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,148 tok/s | epoch 1
877
+ step 222800/250000 | loss 2.9345 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,150 tok/s | epoch 1
878
+ step 223000/250000 | loss 2.9579 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,152 tok/s | epoch 1
879
+ step 223200/250000 | loss 2.9398 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,154 tok/s | epoch 1
880
+ step 223400/250000 | loss 2.9452 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,156 tok/s | epoch 1
881
+ step 223600/250000 | loss 2.9344 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,158 tok/s | epoch 1
882
+ step 223800/250000 | loss 2.9572 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,160 tok/s | epoch 1
883
+ step 224000/250000 | loss 2.9450 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,162 tok/s | epoch 1
884
+ step 224200/250000 | loss 2.9175 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,164 tok/s | epoch 1
885
+ step 224400/250000 | loss 2.9402 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,165 tok/s | epoch 1
886
+ step 224600/250000 | loss 2.9301 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,167 tok/s | epoch 1
887
+ step 224800/250000 | loss 2.9507 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,169 tok/s | epoch 1
888
+ step 225000/250000 | loss 2.9394 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,171 tok/s | epoch 1
889
+ >>> val_loss: 3.2629 | bpt: 4.7074 | true_bpb: 1.5143 *BEST*
890
+ >>> [The] The Furthest Way In The World. The United States Passport: 1930-1933. Baltimore: National Geographic Society.
891
+ Jewish World Index. The Jewish World Index. The Jewish World Index. The Jewish World Index. The Jewish World Index. The Jewish World Index. The Jewish World Index. The Jewish World Index. The Jewish World Index. The Jewish World Index. The
892
+ >>> [Scientists have discovered] Scientists have discovered a new species of Chondssharyi, or elusive swathe of chondssharyi, which is believed to belong to the Bhagavan gallei, a stream species encountered throughout the world. The avian world is dominated by the Indian avian, the world's most endemic bird.
893
+ The Kalapaland Bird Guide claims that their distribution in India is "comfortably abys
894
+ step 225200/250000 | loss 2.9478 | lr 8.00e-04 emb 4.00e-04 | 318ms/step | 103,144 tok/s | epoch 1
895
+ step 225400/250000 | loss 2.9718 | lr 7.99e-04 emb 4.00e-04 | 318ms/step | 103,146 tok/s | epoch 1
896
+ step 225600/250000 | loss 2.9482 | lr 7.99e-04 emb 3.99e-04 | 318ms/step | 103,148 tok/s | epoch 1
897
+ step 225800/250000 | loss 2.9423 | lr 7.98e-04 emb 3.99e-04 | 318ms/step | 103,150 tok/s | epoch 1
898
+ step 226000/250000 | loss 2.9479 | lr 7.97e-04 emb 3.98e-04 | 318ms/step | 103,152 tok/s | epoch 1
899
+ step 226200/250000 | loss 2.9422 | lr 7.95e-04 emb 3.98e-04 | 318ms/step | 103,154 tok/s | epoch 1
900
+ step 226400/250000 | loss 2.9455 | lr 7.94e-04 emb 3.97e-04 | 318ms/step | 103,156 tok/s | epoch 1
901
+ step 226600/250000 | loss 2.9301 | lr 7.92e-04 emb 3.96e-04 | 318ms/step | 103,158 tok/s | epoch 1
902
+ step 226800/250000 | loss 2.9308 | lr 7.90e-04 emb 3.95e-04 | 318ms/step | 103,160 tok/s | epoch 1
903
+ step 227000/250000 | loss 2.9256 | lr 7.87e-04 emb 3.94e-04 | 318ms/step | 103,163 tok/s | epoch 1
904
+ step 227200/250000 | loss 2.9350 | lr 7.85e-04 emb 3.92e-04 | 318ms/step | 103,165 tok/s | epoch 1
905
+ step 227400/250000 | loss 2.9659 | lr 7.82e-04 emb 3.91e-04 | 318ms/step | 103,167 tok/s | epoch 1
906
+ step 227600/250000 | loss 2.9347 | lr 7.79e-04 emb 3.89e-04 | 318ms/step | 103,169 tok/s | epoch 1
907
+ step 227800/250000 | loss 2.9516 | lr 7.76e-04 emb 3.88e-04 | 318ms/step | 103,171 tok/s | epoch 1
908
+ step 228000/250000 | loss 2.9502 | lr 7.72e-04 emb 3.86e-04 | 318ms/step | 103,173 tok/s | epoch 1
909
+ step 228200/250000 | loss 2.9523 | lr 7.68e-04 emb 3.84e-04 | 318ms/step | 103,175 tok/s | epoch 1
910
+ step 228400/250000 | loss 2.9310 | lr 7.64e-04 emb 3.82e-04 | 318ms/step | 103,177 tok/s | epoch 1
911
+ step 228600/250000 | loss 2.9572 | lr 7.60e-04 emb 3.80e-04 | 318ms/step | 103,179 tok/s | epoch 1
912
+ step 228800/250000 | loss 2.9297 | lr 7.55e-04 emb 3.78e-04 | 318ms/step | 103,181 tok/s | epoch 1
913
+ step 229000/250000 | loss 2.9389 | lr 7.51e-04 emb 3.75e-04 | 318ms/step | 103,183 tok/s | epoch 1
914
+ step 229200/250000 | loss 2.9286 | lr 7.46e-04 emb 3.73e-04 | 318ms/step | 103,185 tok/s | epoch 1
915
+ step 229400/250000 | loss 2.9567 | lr 7.40e-04 emb 3.70e-04 | 318ms/step | 103,187 tok/s | epoch 1
916
+ step 229600/250000 | loss 2.9286 | lr 7.35e-04 emb 3.68e-04 | 318ms/step | 103,189 tok/s | epoch 1
917
+ step 229800/250000 | loss 2.9405 | lr 7.29e-04 emb 3.65e-04 | 318ms/step | 103,191 tok/s | epoch 1
918
+ step 230000/250000 | loss 2.9440 | lr 7.24e-04 emb 3.62e-04 | 318ms/step | 103,193 tok/s | epoch 1
919
+ >>> val_loss: 3.2499 | bpt: 4.6886 | true_bpb: 1.5083 *BEST*
920
+ >>> [The] The pontiff of the Dominican Church was build after January 1st 1543 by the Bishop of Mshiftu (the great church of the Saints of the Dominican city of Mshiftu) and was killed by a torrent from the river Elger. On May 12, 1553, he was brought back to the Cathedral, where the great manor became the
921
+ >>> [Scientists have discovered] Scientists have discovered that the “home order” and “episodes” that are responsible for the timing of events are precisely the same. The discovery of this explanation will allow more and better scientists to use this theory to quickly and quickly create time-lapse video, as well as new datasets that can be used to understand and predict the time cycles of the “mids” of space-time.
922
+ The researchers used the data
923
+ step 230200/250000 | loss 2.9346 | lr 7.18e-04 emb 3.59e-04 | 318ms/step | 103,164 tok/s | epoch 1
924
+ step 230400/250000 | loss 2.9302 | lr 7.11e-04 emb 3.56e-04 | 318ms/step | 103,166 tok/s | epoch 1
925
+ step 230600/250000 | loss 2.9457 | lr 7.05e-04 emb 3.53e-04 | 318ms/step | 103,168 tok/s | epoch 1
926
+ step 230800/250000 | loss 2.9214 | lr 6.98e-04 emb 3.49e-04 | 318ms/step | 103,170 tok/s | epoch 1
927
+ step 231000/250000 | loss 2.9491 | lr 6.92e-04 emb 3.46e-04 | 318ms/step | 103,172 tok/s | epoch 1
928
+ step 231200/250000 | loss 2.9373 | lr 6.85e-04 emb 3.42e-04 | 318ms/step | 103,174 tok/s | epoch 1
929
+ step 231400/250000 | loss 2.9339 | lr 6.77e-04 emb 3.39e-04 | 318ms/step | 103,176 tok/s | epoch 1
930
+ step 231600/250000 | loss 2.9302 | lr 6.70e-04 emb 3.35e-04 | 318ms/step | 103,178 tok/s | epoch 1
931
+ step 231800/250000 | loss 2.9252 | lr 6.63e-04 emb 3.31e-04 | 318ms/step | 103,180 tok/s | epoch 1
932
+ step 232000/250000 | loss 2.9380 | lr 6.55e-04 emb 3.28e-04 | 318ms/step | 103,182 tok/s | epoch 1
933
+ step 232200/250000 | loss 2.9195 | lr 6.47e-04 emb 3.24e-04 | 318ms/step | 103,184 tok/s | epoch 1
934
+ step 232400/250000 | loss 2.9027 | lr 6.39e-04 emb 3.20e-04 | 318ms/step | 103,186 tok/s | epoch 1
935
+ step 232600/250000 | loss 2.9357 | lr 6.31e-04 emb 3.16e-04 | 318ms/step | 103,188 tok/s | epoch 1
936
+ step 232800/250000 | loss 2.9304 | lr 6.23e-04 emb 3.11e-04 | 318ms/step | 103,190 tok/s | epoch 1
937
+ step 233000/250000 | loss 2.9179 | lr 6.14e-04 emb 3.07e-04 | 318ms/step | 103,192 tok/s | epoch 1
938
+ step 233200/250000 | loss 2.9157 | lr 6.06e-04 emb 3.03e-04 | 318ms/step | 103,194 tok/s | epoch 1
939
+ step 233400/250000 | loss 2.9254 | lr 5.97e-04 emb 2.99e-04 | 318ms/step | 103,196 tok/s | epoch 1
940
+ step 233600/250000 | loss 2.9002 | lr 5.88e-04 emb 2.94e-04 | 318ms/step | 103,198 tok/s | epoch 1
941
+ step 233800/250000 | loss 2.9221 | lr 5.79e-04 emb 2.90e-04 | 318ms/step | 103,200 tok/s | epoch 1
942
+ step 234000/250000 | loss 2.8802 | lr 5.70e-04 emb 2.85e-04 | 318ms/step | 103,202 tok/s | epoch 1
943
+ step 234200/250000 | loss 2.9077 | lr 5.61e-04 emb 2.81e-04 | 318ms/step | 103,204 tok/s | epoch 1
944
+ step 234400/250000 | loss 2.9153 | lr 5.52e-04 emb 2.76e-04 | 318ms/step | 103,206 tok/s | epoch 1
945
+ step 234600/250000 | loss 2.8949 | lr 5.43e-04 emb 2.71e-04 | 317ms/step | 103,208 tok/s | epoch 1
946
+ step 234800/250000 | loss 2.9154 | lr 5.33e-04 emb 2.67e-04 | 317ms/step | 103,210 tok/s | epoch 1
947
+ step 235000/250000 | loss 2.9074 | lr 5.24e-04 emb 2.62e-04 | 317ms/step | 103,212 tok/s | epoch 1
948
+ >>> val_loss: 3.2180 | bpt: 4.6426 | true_bpb: 1.4935 *BEST*
949
+ >>> [The] The Swiss Federal Institute of Technology Kelvin (TU) is a subsidiary of Granazame University, a subsidiary of the University of Bonn. The scholars are now working in collaboration with the University of Oxford, the University of London, and Witherspoon.
950
+ The aim of the project is to build an artificial smartphone that can autonomously drive itself like a car, while simultaneously accessing the road.
951
+ The device is
952
+ >>> [Scientists have discovered] Scientists have discovered a record of well managed colony-based activity in the Cape peninsula from 1538 to 1660. Some of the colonies, which had begun to grow and spread, managed to survive until the 17th century. The earliest records of colony-based communities in the Cape peninsula are from about 1500, when the first settlements were established in the area. The
953
+ step 235200/250000 | loss 2.9162 | lr 5.14e-04 emb 2.57e-04 | 318ms/step | 103,184 tok/s | epoch 1
954
+ step 235400/250000 | loss 2.9090 | lr 5.04e-04 emb 2.52e-04 | 318ms/step | 103,186 tok/s | epoch 1
955
+ step 235600/250000 | loss 2.8819 | lr 4.95e-04 emb 2.47e-04 | 318ms/step | 103,188 tok/s | epoch 1
956
+ step 235800/250000 | loss 2.8953 | lr 4.85e-04 emb 2.42e-04 | 318ms/step | 103,190 tok/s | epoch 1
957
+ step 236000/250000 | loss 2.9035 | lr 4.75e-04 emb 2.38e-04 | 318ms/step | 103,192 tok/s | epoch 1
958
+ step 236200/250000 | loss 2.8745 | lr 4.65e-04 emb 2.33e-04 | 318ms/step | 103,193 tok/s | epoch 1
959
+ step 236400/250000 | loss 2.8870 | lr 4.55e-04 emb 2.28e-04 | 318ms/step | 103,195 tok/s | epoch 1
960
+ step 236600/250000 | loss 2.9080 | lr 4.45e-04 emb 2.23e-04 | 318ms/step | 103,197 tok/s | epoch 1
961
+ step 236800/250000 | loss 2.8799 | lr 4.35e-04 emb 2.18e-04 | 318ms/step | 103,199 tok/s | epoch 1
962
+ step 237000/250000 | loss 2.9001 | lr 4.25e-04 emb 2.13e-04 | 318ms/step | 103,201 tok/s | epoch 1
963
+ step 237200/250000 | loss 2.9006 | lr 4.15e-04 emb 2.08e-04 | 318ms/step | 103,203 tok/s | epoch 1
964
+ step 237400/250000 | loss 2.8784 | lr 4.05e-04 emb 2.03e-04 | 318ms/step | 103,205 tok/s | epoch 1
965
+ step 237600/250000 | loss 2.8793 | lr 3.95e-04 emb 1.98e-04 | 317ms/step | 103,206 tok/s | epoch 1
966
+ step 237800/250000 | loss 2.9026 | lr 3.85e-04 emb 1.92e-04 | 317ms/step | 103,208 tok/s | epoch 1
967
+ step 238000/250000 | loss 2.8822 | lr 3.75e-04 emb 1.87e-04 | 317ms/step | 103,210 tok/s | epoch 1
968
+ step 238200/250000 | loss 2.8997 | lr 3.65e-04 emb 1.82e-04 | 317ms/step | 103,212 tok/s | epoch 1
969
+ step 238400/250000 | loss 2.9033 | lr 3.55e-04 emb 1.77e-04 | 317ms/step | 103,214 tok/s | epoch 1
970
+ step 238600/250000 | loss 2.8829 | lr 3.45e-04 emb 1.72e-04 | 317ms/step | 103,216 tok/s | epoch 1
971
+ step 238800/250000 | loss 2.8967 | lr 3.35e-04 emb 1.67e-04 | 317ms/step | 103,218 tok/s | epoch 1
972
+ step 239000/250000 | loss 2.8898 | lr 3.25e-04 emb 1.63e-04 | 317ms/step | 103,219 tok/s | epoch 1
973
+ step 239200/250000 | loss 2.8716 | lr 3.15e-04 emb 1.58e-04 | 317ms/step | 103,217 tok/s | epoch 2
974
+ step 239400/250000 | loss 2.8480 | lr 3.05e-04 emb 1.53e-04 | 317ms/step | 103,219 tok/s | epoch 2
975
+ step 239600/250000 | loss 2.8681 | lr 2.96e-04 emb 1.48e-04 | 317ms/step | 103,221 tok/s | epoch 2
976
+ step 239800/250000 | loss 2.8503 | lr 2.86e-04 emb 1.43e-04 | 317ms/step | 103,223 tok/s | epoch 2
977
+ step 240000/250000 | loss 2.8632 | lr 2.76e-04 emb 1.38e-04 | 317ms/step | 103,225 tok/s | epoch 2
978
+ >>> val_loss: 3.1951 | bpt: 4.6096 | true_bpb: 1.4828 *BEST*
979
+ >>> [The] The core educational objectives are to understand the significance and value of the assets that are managed by the community. This includes the goals for teaching expertise and teaching skills. The various components of the teacher's curriculum include the following:
980
+ - TA's curriculum and teaching materials
981
+ - Assessment tools
982
+ - Assessment tools thematic
983
+ - Assessment tools thematic
984
+ - Assessment tools thematic
985
+ - Assessment tools thematic
986
+ - Assessment tools thematic
987
+
988
+ >>> [Scientists have discovered] Scientists have discovered the presence of an M6 protein is behind the growth of brain cells that are part of a growing body of evidence.
989
+ The answers are not always clear, however. The findings are published in the August 16 issue of Science.
990
+ While we have only been able to produce one version of the M6 protein in our brains, the research team has demonstrated that the protein is usually located at the base
991
+ step 240200/250000 | loss 2.8883 | lr 2.67e-04 emb 1.33e-04 | 318ms/step | 103,197 tok/s | epoch 2
992
+ step 240400/250000 | loss 2.8652 | lr 2.57e-04 emb 1.29e-04 | 318ms/step | 103,199 tok/s | epoch 2
993
+ step 240600/250000 | loss 2.8634 | lr 2.48e-04 emb 1.24e-04 | 318ms/step | 103,201 tok/s | epoch 2
994
+ step 240800/250000 | loss 2.8479 | lr 2.39e-04 emb 1.19e-04 | 318ms/step | 103,203 tok/s | epoch 2
995
+ step 241000/250000 | loss 2.8898 | lr 2.30e-04 emb 1.15e-04 | 318ms/step | 103,205 tok/s | epoch 2
996
+ step 241200/250000 | loss 2.8542 | lr 2.21e-04 emb 1.10e-04 | 317ms/step | 103,207 tok/s | epoch 2
997
+ step 241400/250000 | loss 2.8431 | lr 2.12e-04 emb 1.06e-04 | 317ms/step | 103,209 tok/s | epoch 2
998
+ step 241600/250000 | loss 2.8541 | lr 2.03e-04 emb 1.01e-04 | 317ms/step | 103,211 tok/s | epoch 2
999
+ step 241800/250000 | loss 2.8915 | lr 1.94e-04 emb 9.71e-05 | 317ms/step | 103,212 tok/s | epoch 2
1000
+ step 242000/250000 | loss 2.8476 | lr 1.86e-04 emb 9.29e-05 | 317ms/step | 103,214 tok/s | epoch 2
1001
+ step 242200/250000 | loss 2.8849 | lr 1.77e-04 emb 8.86e-05 | 317ms/step | 103,216 tok/s | epoch 2
1002
+ step 242400/250000 | loss 2.8630 | lr 1.69e-04 emb 8.45e-05 | 317ms/step | 103,218 tok/s | epoch 2
1003
+ step 242600/250000 | loss 2.8633 | lr 1.61e-04 emb 8.04e-05 | 317ms/step | 103,219 tok/s | epoch 2
1004
+ step 242800/250000 | loss 2.8451 | lr 1.53e-04 emb 7.64e-05 | 317ms/step | 103,221 tok/s | epoch 2
1005
+ step 243000/250000 | loss 2.8546 | lr 1.45e-04 emb 7.25e-05 | 317ms/step | 103,223 tok/s | epoch 2
1006
+ step 243200/250000 | loss 2.8721 | lr 1.37e-04 emb 6.87e-05 | 317ms/step | 103,225 tok/s | epoch 2
1007
+ step 243400/250000 | loss 2.8400 | lr 1.30e-04 emb 6.50e-05 | 317ms/step | 103,227 tok/s | epoch 2
+ step 243600/250000 | loss 2.8609 | lr 1.23e-04 emb 6.13e-05 | 317ms/step | 103,229 tok/s | epoch 2
+ step 243800/250000 | loss 2.8659 | lr 1.15e-04 emb 5.77e-05 | 317ms/step | 103,230 tok/s | epoch 2
+ step 244000/250000 | loss 2.8536 | lr 1.08e-04 emb 5.42e-05 | 317ms/step | 103,232 tok/s | epoch 2
+ step 244200/250000 | loss 2.8595 | lr 1.02e-04 emb 5.08e-05 | 317ms/step | 103,234 tok/s | epoch 2
+ step 244400/250000 | loss 2.8564 | lr 9.51e-05 emb 4.75e-05 | 317ms/step | 103,235 tok/s | epoch 2
+ step 244600/250000 | loss 2.8792 | lr 8.86e-05 emb 4.43e-05 | 317ms/step | 103,237 tok/s | epoch 2
+ step 244800/250000 | loss 2.8561 | lr 8.24e-05 emb 4.12e-05 | 317ms/step | 103,239 tok/s | epoch 2
+ step 245000/250000 | loss 2.8584 | lr 7.64e-05 emb 3.82e-05 | 317ms/step | 103,241 tok/s | epoch 2
+ >>> val_loss: 3.1881 | bpt: 4.5994 | true_bpb: 1.4796 *BEST*
+ >>> [The] The incoming incoming incoming packets are processed in two steps: a) processing the incoming packets into the appropriate stack, b) processing the incoming packets into memory, and c) processing the incoming packets into the appropriate memory located on the incoming buffer.
+ When we talk about the processor, the processor is the "audio processor", and the processor is the "server". The processor is the "tuns" of the
+ >>> [Scientists have discovered] Scientists have discovered that in the early stages of growth, young bugs have a “memory”; during the rest, they find it difficult to know where they are sticking to. That’s why the researchers believe that it’s quite probable that it’s these “memory” parts of the brain which allow us to differentiate between different behaviors and get the best result.
+ “Research produced by MIT has shown that young insects have a memory
+ step 245200/250000 | loss 2.8427 | lr 7.06e-05 emb 3.53e-05 | 317ms/step | 103,218 tok/s | epoch 2
+ step 245400/250000 | loss 2.8683 | lr 6.50e-05 emb 3.25e-05 | 317ms/step | 103,220 tok/s | epoch 2
+ step 245600/250000 | loss 2.8556 | lr 5.96e-05 emb 2.98e-05 | 317ms/step | 103,222 tok/s | epoch 2
+ step 245800/250000 | loss 2.8867 | lr 5.45e-05 emb 2.72e-05 | 317ms/step | 103,224 tok/s | epoch 2
+ step 246000/250000 | loss 2.8754 | lr 4.95e-05 emb 2.48e-05 | 317ms/step | 103,226 tok/s | epoch 2
+ step 246200/250000 | loss 2.8679 | lr 4.48e-05 emb 2.24e-05 | 317ms/step | 103,227 tok/s | epoch 2
+ step 246400/250000 | loss 2.8659 | lr 4.03e-05 emb 2.01e-05 | 317ms/step | 103,229 tok/s | epoch 2
+ step 246600/250000 | loss 2.8499 | lr 3.60e-05 emb 1.80e-05 | 317ms/step | 103,231 tok/s | epoch 2
+ step 246800/250000 | loss 2.8637 | lr 3.19e-05 emb 1.60e-05 | 317ms/step | 103,233 tok/s | epoch 2
+ step 247000/250000 | loss 2.8582 | lr 2.81e-05 emb 1.41e-05 | 317ms/step | 103,235 tok/s | epoch 2
+ step 247200/250000 | loss 2.8703 | lr 2.45e-05 emb 1.23e-05 | 317ms/step | 103,237 tok/s | epoch 2
+ step 247400/250000 | loss 2.8691 | lr 2.12e-05 emb 1.06e-05 | 317ms/step | 103,238 tok/s | epoch 2
+ step 247600/250000 | loss 2.8528 | lr 1.81e-05 emb 9.03e-06 | 317ms/step | 103,240 tok/s | epoch 2
+ step 247800/250000 | loss 2.8689 | lr 1.52e-05 emb 7.60e-06 | 317ms/step | 103,242 tok/s | epoch 2
+ step 248000/250000 | loss 2.8516 | lr 1.26e-05 emb 6.29e-06 | 317ms/step | 103,240 tok/s | epoch 2
+ step 248200/250000 | loss 2.8735 | lr 1.02e-05 emb 5.10e-06 | 317ms/step | 103,242 tok/s | epoch 2
+ step 248400/250000 | loss 2.8326 | lr 8.07e-06 emb 4.03e-06 | 317ms/step | 103,244 tok/s | epoch 2
+ step 248600/250000 | loss 2.8612 | lr 6.18e-06 emb 3.09e-06 | 317ms/step | 103,245 tok/s | epoch 2
+ step 248800/250000 | loss 2.8651 | lr 4.55e-06 emb 2.27e-06 | 317ms/step | 103,247 tok/s | epoch 2
+ step 249000/250000 | loss 2.8535 | lr 3.16e-06 emb 1.58e-06 | 317ms/step | 103,249 tok/s | epoch 2
+ step 249200/250000 | loss 2.8788 | lr 2.02e-06 emb 1.01e-06 | 317ms/step | 103,250 tok/s | epoch 2
+ step 249400/250000 | loss 2.8845 | lr 1.14e-06 emb 5.70e-07 | 317ms/step | 103,252 tok/s | epoch 2
+ step 249600/250000 | loss 2.8357 | lr 5.08e-07 emb 2.54e-07 | 317ms/step | 103,254 tok/s | epoch 2
+ step 249800/250000 | loss 2.8607 | lr 1.28e-07 emb 6.38e-08 | 317ms/step | 103,255 tok/s | epoch 2
+ step 250000/250000 | loss 2.8425 | lr 3.16e-12 emb 1.58e-12 | 317ms/step | 103,257 tok/s | epoch 2
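The `lr` and `emb` columns in the decay phase above are consistent with a half-cosine ramp from the stable value of 8.00e-04 down to zero over the final 25,000 steps, with the embedding LR logged at half the main LR. The following is a minimal sketch under those assumptions, not the training code itself: the decay start (step 225,000), the cosine shape, and the one-step offset in the logged value are all inferred by fitting the numbers in this log, and `decay_lr` is a hypothetical helper name.

```python
import math

PEAK_LR = 8.00e-04      # stable-phase value seen earlier in this log
TOTAL_STEPS = 250_000   # from the "step N/250000" field
DECAY_START = 225_000   # assumption: inferred by fitting the logged values
EMB_RATIO = 0.5         # assumption: "emb" is logged at half of "lr"


def decay_lr(step: int) -> tuple[float, float]:
    """Return (lr, emb_lr) for a logged step under the assumed cosine decay."""
    # Assumption: the value logged at step N was computed for index N - 1,
    # which would explain the tiny non-zero LR on the final log line.
    idx = step - 1
    if idx <= DECAY_START:
        return PEAK_LR, PEAK_LR * EMB_RATIO
    progress = (idx - DECAY_START) / (TOTAL_STEPS - DECAY_START)
    lr = PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
    return lr, lr * EMB_RATIO


# Compare against a few of the logged lines.
for step in (240_200, 245_000, 248_000, 249_800, 250_000):
    lr, emb = decay_lr(step)
    print(f"step {step}: lr {lr:.2e} emb {emb:.2e}")
```

Under these assumptions the sketch matches the logged `lr` and `emb` values at the steps shown to the three significant digits printed in the log. A different schedule that happens to agree at these points cannot be ruled out from the log alone.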
+ >>> val_loss: 3.1877 | bpt: 4.5988 | true_bpb: 1.4794 *BEST*
+ >>> [The] The creature was found in 1954 in the Oecostrome and there may be a chance that it may be a juvenile tiger. The tiger is a full 370 ft high and is 9.5 ft long with a wingspan of 11 ft. It has a length of 2.9 ft and a neck length of 2.1 ft. Its
+ >>> [Scientists have discovered] Scientists have discovered another step in the evolution of the interconnected human organism, the ability to communicate with one another. This ability is very important and the mind of humans today has become one of the most complex and complex systems of existence.
+ Perhaps the most striking aspect of human interaction is the ability to communicate with each other. Humans are incredibly intelligent beings who have a very efficient way of communicating with each other. This is an
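The three numbers on each validation line appear to be the same quantity in different units. The sketch below checks that reading; it assumes `val_loss` is cross-entropy in nats per token, `bpt` is bits per token, and `true_bpb` is bits per byte of raw text. None of these definitions are stated in the log, and the bytes-per-token figure it derives is only implied by the logged ratios, not a documented tokenizer property.

```python
import math


def nats_to_bits(loss_nats: float) -> float:
    """Convert a cross-entropy loss in nats per token to bits per token."""
    return loss_nats / math.log(2)


def implied_bytes_per_token(bits_per_token: float, bits_per_byte: float) -> float:
    """Average bytes per token implied by the bpt / bpb ratio."""
    return bits_per_token / bits_per_byte


# (val_loss, bpt, true_bpb) copied from the two evaluation lines in this log.
evals = {
    245_000: (3.1881, 4.5994, 1.4796),
    250_000: (3.1877, 4.5988, 1.4794),
}

for step, (val_loss, bpt, true_bpb) in evals.items():
    print(
        f"step {step}: "
        f"bits/token from loss = {nats_to_bits(val_loss):.4f} (logged {bpt}), "
        f"implied bytes/token = {implied_bytes_per_token(bpt, true_bpb):.3f}"
    )
```

The converted loss agrees with the logged `bpt` to within the rounding of the four-decimal `val_loss`, and both evaluations imply roughly the same bytes-per-token ratio, which is what one would expect if the validation set and tokenizer are fixed.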
+
+ ============================================================
+ Training complete: 250000 steps in 51904s (865.1min)
+ Final val_loss: 3.1877 | bpt: 4.5988 | true_bpb: 1.4794
+ Best val_loss: 3.1877 | bpt: 4.5988 | true_bpb: 1.4794
+ ============================================================
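For anyone who wants to re-plot or sanity-check this log, the step lines follow a regular pipe-separated layout. The sketch below parses one such line into typed fields; the regular expression is written against the lines visible in this diff only, and `StepRecord` and `parse_step_line` are hypothetical names, not part of the training code.

```python
import re
from typing import NamedTuple, Optional


class StepRecord(NamedTuple):
    step: int
    total: int
    loss: float
    lr: float
    emb_lr: float
    ms_per_step: int
    tok_per_s: int
    epoch: int


# Pattern derived from the step lines shown in this diff (an assumption
# about the format; other parts of the log may differ).
STEP_RE = re.compile(
    r"step (\d+)/(\d+) \| loss ([\d.]+) \| lr (\S+) emb (\S+) \| "
    r"(\d+)ms/step \| ([\d,]+) tok/s \| epoch (\d+)"
)


def parse_step_line(line: str) -> Optional[StepRecord]:
    """Parse one step line; return None for eval, sample, or banner lines."""
    m = STEP_RE.search(line)
    if m is None:
        return None
    step, total, loss, lr, emb, ms, tok, epoch = m.groups()
    return StepRecord(
        int(step), int(total), float(loss), float(lr), float(emb),
        int(ms), int(tok.replace(",", "")), int(epoch),
    )


line = ("+ step 250000/250000 | loss 2.8425 | lr 3.16e-12 emb 1.58e-12 "
        "| 317ms/step | 103,257 tok/s | epoch 2")
rec = parse_step_line(line)
print(rec)

# Tokens processed per step implied by the logged throughput and step time.
print(round(rec.tok_per_s * rec.ms_per_step / 1000))

# The summary banner's minutes figure is the seconds figure divided by 60.
print(f"{51904 / 60:.1f}")
```

Because `search` is used rather than `match`, the leading `+ ` diff marker does not need to be stripped before parsing.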
training_curves.png CHANGED

Git LFS Details

  • SHA256: bfd4cf011352f55a88519f0a4f22bfe3da3963d7ee8daab75e6a92b5b2eebcb5
  • Pointer size: 131 Bytes
  • Size of remote file: 229 kB

Git LFS Details

  • SHA256: 767b73fcb2e5808fd3dc2f96db233512579e362828fffd8d52b22a47bd367b4f
  • Pointer size: 131 Bytes
  • Size of remote file: 237 kB