Parveshiiii committed on
Commit 5e8cd76 · verified · 1 Parent(s): 8c37467

Update README.md

Files changed (1)
  1. README.md +64 -6
README.md CHANGED
@@ -70,19 +70,77 @@ library_name: transformers

---

## 🧪 Example Usage

```python
- from transformers import AutoTokenizer, AutoModelForCausalLM
-
- model = AutoModelForCausalLM.from_pretrained("your-username/Auto-Completer-0.1")
- tokenizer = AutoTokenizer.from_pretrained("your-username/Auto-Completer-0.1")
-
- prompt = "The integral of x squared from 0 to 1 is"
- inputs = tokenizer(prompt, return_tensors="pt")
- outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---


---

+
+ ### How to use
+
+ ```bash
+ pip install transformers
+ ```
+
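The snippets below also assume PyTorch is installed, plus `accelerate` for the `device_map="auto"` variant further down; if your environment does not have them yet, a typical install (assuming a standard pip setup) looks like this:

```bash
pip install torch accelerate
```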
## 🧪 Example Usage

+ > Don't try to use it as a chat model; it's not meant for that.
+
+ * _Using full precision_
```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ checkpoint = "Parveshiiii/Auto-Completer-0.1"
+ device = "cuda"  # or "cpu"
+
+ tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+ model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
+
+ inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device)
+
+ outputs = model.generate(
+     inputs,
+     repetition_penalty=1.2,              # raise this if the model gets stuck in loops after completing the sentence
+     max_new_tokens=10,                   # keep this low for autocomplete; otherwise generation runs until the cap
+     do_sample=True,                      # sample for diversity
+     eos_token_id=tokenizer.eos_token_id  # optional: stop at end-of-text
+ )
+
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
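The low `max_new_tokens` cap above is a blunt way to stop. If you would rather let the completion end at the first sentence boundary, one option is a custom stopping criterion. The sketch below is an illustration only, reusing the full-precision setup above; the `StopOnSentenceEnd` helper and its `.`/`!`/`?` stop set are this example's own choices, not part of the model card:

```python
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnSentenceEnd(StoppingCriteria):
    """Stop once the newly generated text contains a sentence-ending character."""
    def __init__(self, tokenizer, prompt_len, stop_chars=".!?"):
        self.tokenizer = tokenizer
        self.prompt_len = prompt_len        # number of prompt tokens to skip when decoding
        self.stop_chars = set(stop_chars)

    def __call__(self, input_ids, scores, **kwargs):
        # Decode only the freshly generated suffix and stop as soon as it ends a sentence.
        new_text = self.tokenizer.decode(input_ids[0, self.prompt_len:])
        return any(ch in self.stop_chars for ch in new_text)

inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(
    inputs,
    do_sample=True,
    repetition_penalty=1.2,
    max_new_tokens=30,  # generous cap; the criterion usually stops earlier
    stopping_criteria=StoppingCriteriaList([StopOnSentenceEnd(tokenizer, inputs.shape[1])]),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Decoding just the generated suffix each step keeps the check cheap at this length and avoids hard-coding token IDs for the punctuation marks.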
+
+ * _Using `torch.bfloat16`_
+ ```python
+ # pip install accelerate
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ checkpoint = "Parveshiiii/Auto-Completer-0.1"
+ device = "cuda"
+
+ tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+ model = AutoModelForCausalLM.from_pretrained(
+     checkpoint,
+     device_map="auto",
+     torch_dtype=torch.bfloat16  # or torch.float16 for fp16
+ )
+
+ # Encode prompt
+ inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device)
+
+ # Generate with sampling and token control
+ outputs = model.generate(
+     inputs,
+     max_new_tokens=50,                   # Limit output length
+     do_sample=True,                      # Enable sampling for diversity
+     temperature=0.7,                     # Controls randomness (lower = more deterministic)
+     top_p=0.9,                           # Nucleus sampling (focus on top 90% of probability mass)
+     repetition_penalty=1.2,              # Penalize repeated phrases
+     eos_token_id=tokenizer.eos_token_id  # Optional: stop at end-of-text
+ )
+
+ # Decode and print
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
+ ```python
+ >>> print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
+ Memory footprint: 723.56 MB
+ ```
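For context on that number: the footprint reported by `get_memory_footprint()` is roughly parameter count times bytes per parameter (buffers add a little on top), so bfloat16 or float16 halves it relative to float32. A rough sanity check, assuming the bfloat16 model loaded above:

```python
n_params = sum(p.numel() for p in model.parameters())
bytes_per_param = next(model.parameters()).element_size()  # 2 for bf16/fp16, 4 for fp32
print(f"{n_params / 1e6:.1f}M params x {bytes_per_param} bytes ~= {n_params * bytes_per_param / 1e6:.2f} MB")
```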

---