eliseobao committed · Commit 67d69b2 · verified · 1 Parent(s): 40193f8

Update README.md

Files changed (1)
  1. README.md +36 -1
README.md CHANGED
@@ -8,4 +8,39 @@ pipeline_tag: text-generation
 library_name: transformers
 ---
 
-4-bit quantized version of [irlab-udc/Llama-3.1-8B-Instruct-Galician](https://huggingface.co/irlab-udc/Llama-3.1-8B-Instruct-Galician).
+4-bit quantized version of [irlab-udc/Llama-3.1-8B-Instruct-Galician](https://huggingface.co/irlab-udc/Llama-3.1-8B-Instruct-Galician).
+
+## How to Use
+```python
+import torch
+
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_id = "irlab-udc/Llama-3.1-8B-Instruct-Galician-GPTQ-Int4"
+
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype=torch.float16,
+    low_cpu_mem_usage=True,
+    device_map="auto"
+)
+
+messages = [
+    {"role": "system", "content": "You are a conversational AI that responds in Galician."},
+    {"role": "user", "content": "Cal é a principal vantaxe de Scrum?"},
+]
+
+inputs = tokenizer.apply_chat_template(
+    messages,
+    tokenize=True,
+    add_generation_prompt=True,
+    return_tensors="pt",
+    return_dict=True,
+).to("cuda")
+
+outputs = model.generate(**inputs, do_sample=True, max_new_tokens=512)
+
+print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
+```
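
For context on the snippet added above: a GPTQ Int4 checkpoint like the one this README documents can be produced with the GPTQ integration in `transformers`/`optimum`. Below is a minimal sketch of that export step, not the commit author's actual procedure; the calibration dataset (`"c4"`) and the output directory name are illustrative assumptions.

```python
# Hypothetical sketch of the quantization step behind a GPTQ Int4 export.
# Requires `optimum` and `auto-gptq` to be installed and a GPU available;
# the calibration dataset and save path are assumptions, not from this commit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

base_id = "irlab-udc/Llama-3.1-8B-Instruct-Galician"

tokenizer = AutoTokenizer.from_pretrained(base_id)

# 4-bit GPTQ with a calibration dataset; quantization runs layer by layer
# while the model loads.
quant_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=quant_config,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Save the quantized weights so they can be loaded exactly as in the
# "How to Use" snippet above.
model.save_pretrained("Llama-3.1-8B-Instruct-Galician-GPTQ-Int4")
tokenizer.save_pretrained("Llama-3.1-8B-Instruct-Galician-GPTQ-Int4")
```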