Update README.md
README.md CHANGED
@@ -9,16 +9,17 @@ tags:
 - MoE
 - unicode
 ---
-
 # bvv241-abs: Unified Unicode Tokenizer (SOTA Intersection) with Frozen Embeddings and Extended Vector Dim (4096)
 
-This model is a core component described in the paper [**Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate**](https://huggingface.co/papers/2507.07129).
-
-This work explores a novel constructive approach to model development, built upon the foundation of non-trainable, deterministic input embeddings. It demonstrates that this fixed representational substrate acts as a universal "docking port," enabling seamless modular composition and progressive layer-wise growth of Transformer models.
 
 ## Tokenizer Description
 
-
+This repository contains the tokenizer and associated resources from this paper:
+
+[📚 Paper (Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations)](https://huggingface.co/papers/2507.04886)
+
+[💻 Code](https://github.com/AVBochkov/Embeddings)
+
 
 This tokenizer is based on a hybrid vocabulary:
 
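The non-trainable, deterministic input embeddings mentioned above can be wired in as a frozen input layer. A minimal PyTorch sketch, assuming the shipped tensor has shape `[vocab_size, 4096]` (the file name comes from this repo; the wiring is an illustration, not the paper's exact setup):

```python
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download

# Download the frozen embedding matrix shipped with this repo.
emb_path = hf_hub_download(
    repo_id="Bochkov/bvv241-abs",
    filename="normalized_embeddings_weights.pt"
)
weights = torch.load(emb_path)  # assumed shape: [vocab_size, 4096]

# Wrap it as a non-trainable input embedding: the "frozen substrate".
frozen_embedding = nn.Embedding.from_pretrained(weights, freeze=True)
```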
@@ -40,21 +41,14 @@ No training or adaptation; suitable for plug-and-play use in research on embeddi
 ## How to Get Started with the Tokenizer
 
 ```python
-
 from transformers import AutoTokenizer
-
 from huggingface_hub import hf_hub_download
-
 import torch
-
 tokenizer = AutoTokenizer.from_pretrained('Bochkov/bvv241-abs')
-
-
 emb_path = hf_hub_download(
     repo_id="Bochkov/bvv241-abs",
     filename="normalized_embeddings_weights.pt"
 )
-
 embeddings = torch.load(emb_path)
 ```
 
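A quick usage sketch to go with the snippet above, assuming the embedding tensor is indexed by token ID with shape `[vocab_size, 4096]`; the sample string is illustrative:

```python
import torch
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

tokenizer = AutoTokenizer.from_pretrained('Bochkov/bvv241-abs')
emb_path = hf_hub_download(
    repo_id="Bochkov/bvv241-abs",
    filename="normalized_embeddings_weights.pt"
)
embeddings = torch.load(emb_path)

# Encode a sample string and fetch one frozen vector per token.
ids = tokenizer.encode("Hello, world!")
vectors = embeddings[torch.tensor(ids)]
print(vectors.shape)  # expected: torch.Size([len(ids), 4096])
```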
@@ -72,7 +66,6 @@ If you use this model or the underlying concepts in your research, please cite o
     primaryClass={cs.CL},
     url={https://arxiv.org/abs/2507.04886},
 }
-
 @misc{bochkov2025growingtransformersmodularcomposition,
     title={Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate},
     author={A. Bochkov},
@@ -84,4 +77,4 @@ If you use this model or the underlying concepts in your research, please cite o
 }
 ```
 
-This work demonstrates that transformer blocks, not token embeddings, carry the semantic burden in LLMs — a step toward modular, fusable, multilingual LMs.
+This work demonstrates that transformer blocks, not token embeddings, carry the semantic burden in LLMs — a step toward modular, fusable, multilingual LMs.