Commit 4bb20ca (verified), committed by openvino-ci
1 Parent(s): cd3e8c0

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,34 +1,33 @@
  ---
  license: apache-2.0
- language:
- - en
+ license_link: https://choosealicense.com/licenses/apache-2.0/
  ---
-
- # Mistral-7b-Instruct-v0.1-int4-ov
-
- * Model creator: [Mistral AI](https://huggingface.co/mistralai)
- * Original model: [Mistral-7b-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)
+ # mistral-7b-instruct-v0.1-int4-ov
+ * Model creator: [Mistralai](https://huggingface.co/mistralai)
+ * Original model: [Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)
 
  ## Description
-
- This is [Mistral-7b-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) model converted to the [OpenVINO™ IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format.html) (Intermediate Representation) format with weights compressed to INT8 by [NNCF](https://github.com/openvinotoolkit/nncf).
+ This is [Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) model converted to the [OpenVINO™ IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format.html) (Intermediate Representation) format with weights compressed to INT4 by [NNCF](https://github.com/openvinotoolkit/nncf).
 
  ## Quantization Parameters
 
  Weight compression was performed using `nncf.compress_weights` with the following parameters:
 
- * mode: **INT8_ASYM**
+ * mode: **int4_asym**
+ * ratio: **0.8**
+ * group_size: **128**
+
+ For more information on quantization, check the [OpenVINO model optimization guide](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/weight-compression.html).
 
- For more information on quantization, check the [OpenVINO model optimization guide](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/weight-compression.html)
 
  ## Compatibility
 
  The provided OpenVINO™ IR model is compatible with:
 
- * OpenVINO version 2024.2.0 and higher
+ * OpenVINO version 2024.1.0 and higher
  * Optimum Intel 1.16.0 and higher
 
- ## Running Model Inference with [Optimum Intel](https://huggingface.co/docs/optimum/intel/index)
+ ## Running Model Inference
 
  1. Install packages required for using [Optimum Intel](https://huggingface.co/docs/optimum/intel/index) integration with the OpenVINO backend:
 
@@ -42,56 +41,26 @@ pip install optimum[openvino]
  from transformers import AutoTokenizer
  from optimum.intel.openvino import OVModelForCausalLM
 
- model_id = "OpenVINO/mistral-7b-instrcut-v0.1-int4-ov"
+ model_id = "OpenVINO/mistral-7b-instruct-v0.1-int4-ov"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = OVModelForCausalLM.from_pretrained(model_id)
 
  inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
 
- outputs = model.generate(inputs, max_new_tokens=20)
- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ outputs = model.generate(**inputs, max_length=200)
+ text = tokenizer.batch_decode(outputs)[0]
+ print(text)
  ```
 
  For more examples and possible optimizations, refer to the [OpenVINO Large Language Model Inference Guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html).
 
- # Running Model Inference with [OpenVINO GenAI](https://github.com/openvinotoolkit/openvino.genai)
-
- 1. Install packages required for using OpenVINO GenAI.
- ```
- pip install openvino-genai huggingface_hub
- ```
-
- 2. Download model from HuggingFace Hub
-
- ```
- import huggingface_hub as hf_hub
-
- model_id = "OpenVINO/mistral-7b-instrcut-v0.1-int4-ov"
- model_path = "mistral-7b-instrcut-v0.1-int4-ov"
-
- hf_hub.snapshot_download(model_id, local_dir=model_path)
-
- ```
-
- 3. Run model inference:
-
- ```
- import openvino_genai as ov_genai
-
- device = "CPU"
- pipe = ov_genai.LLMPipeline(model_path, device)
- print(pipe.generate("What is OpenVINO?"))
- ```
-
- More GenAI usage examples can be found in OpenVINO GenAI library [docs](https://github.com/openvinotoolkit/openvino.genai/blob/master/src/README.md) and [samples](https://github.com/openvinotoolkit/openvino.genai?tab=readme-ov-file#openvino-genai-samples)
-
  ## Limitations
 
- Check the original model card for [limitations](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1#limitations).
+ Check the original model card for [limitations]().
 
  ## Legal information
 
- The original model is distributed under [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) license. More details can be found in [original model card](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1).
+ The original model is distributed under [apache-2.0](https://choosealicense.com/licenses/apache-2.0/) license. More details can be found in [original model card](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1).
 
  ## Disclaimer
 
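Note on the README.md change: the new quantization parameters (mode `int4_asym`, `group_size` 128) describe group-wise asymmetric 4-bit weight compression. The sketch below illustrates that general idea in plain Python and is not NNCF's implementation. The function names are hypothetical, and storing the group minimum as a float offset (rather than an integer zero point) is a simplification made here.

```python
import math

def quantize_group(weights, bits=4):
    """Asymmetric quantization of one group to unsigned `bits`-bit codes."""
    levels = (1 << bits) - 1                      # 15 for 4-bit codes
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Simplification: keep the float minimum as the offset.
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize_group(codes, scale, lo):
    """Map the integer codes back to approximate float weights."""
    return [lo + c * scale for c in codes]

def quantize_row(row, group_size=128):
    """Quantize a weight row group by group; each group has its own scale."""
    return [quantize_group(row[i:i + group_size])
            for i in range(0, len(row), group_size)]

# Toy data standing in for one row of a weight matrix.
row = [math.sin(0.1 * i) for i in range(256)]
groups = quantize_row(row)
restored = [v for g in groups for v in dequantize_group(*g)]
max_error = max(abs(a - b) for a, b in zip(row, restored))
```

Smaller groups give each scale fewer weights to cover, which tends to lower the reconstruction error at the cost of storing more scales.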
 
config.json CHANGED
@@ -19,7 +19,7 @@
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
- "transformers_version": "4.40.1",
+ "transformers_version": "4.41.2",
  "use_cache": true,
  "vocab_size": 32000
- }
+ }
generation_config.json CHANGED
@@ -2,5 +2,5 @@
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
- "transformers_version": "4.40.1"
+ "transformers_version": "4.41.2"
  }
openvino_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f7290c3cea473ee6b893af9a3327329f8d3c9c056a922ff6a57ca083e7bb12d7
- size 7280406284
+ oid sha256:99eedd1fa219d94465dc7874ee240d0f515b23709f90bf92ab3bf589f64d59f6
+ size 4617228256
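Note on the size change: the two `size` values can be compared with a rough estimate derived from the README parameters. The arithmetic below assumes, without confirmation from this diff, that the old file stored weights at 8 bits (the old README states INT8_ASYM) and that `ratio: 0.8` means about 80% of weights are stored in 4 bits with the rest kept at 8 bits. The helper name is hypothetical.

```python
OLD_BYTES = 7280406284  # openvino_model.bin size before this commit
NEW_BYTES = 4617228256  # openvino_model.bin size after this commit

def nominal_bits_per_weight(ratio, low_bits=4, backup_bits=8):
    """Average bits per weight, assuming `ratio` of weights use `low_bits`
    and the remainder use `backup_bits` (an assumption, see the note above)."""
    return ratio * low_bits + (1 - ratio) * backup_bits

observed_ratio = NEW_BYTES / OLD_BYTES            # roughly 0.63
nominal_ratio = nominal_bits_per_weight(0.8) / 8  # 4.8 bits vs 8 bits = 0.6

print(f"observed: {observed_ratio:.3f}, nominal: {nominal_ratio:.3f}")
```

The observed ratio sits slightly above the nominal one. A plausible but unverified explanation is the per-group scales and zero points that 4-bit groups carry, plus any data in the file that is not compressed.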
openvino_model.xml CHANGED
The diff for this file is too large to render. See raw diff
 
tokenizer.json CHANGED
@@ -31,23 +31,13 @@
  "special": true
  }
  ],
- "normalizer": {
- "type": "Sequence",
- "normalizers": [
- {
- "type": "Prepend",
- "prepend": "▁"
- },
- {
- "type": "Replace",
- "pattern": {
- "String": " "
- },
- "content": "▁"
- }
- ]
+ "normalizer": null,
+ "pre_tokenizer": {
+ "type": "Metaspace",
+ "replacement": "▁",
+ "prepend_scheme": "first",
+ "split": false
  },
- "pre_tokenizer": null,
  "post_processor": {
  "type": "TemplateProcessing",
  "single": [
tokenizer_config.json CHANGED
@@ -29,10 +29,10 @@
  },
  "additional_special_tokens": [],
  "bos_token": "<s>",
- "chat_template": "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}",
+ "chat_template": "{%- if messages[0]['role'] == 'system' %}\n {%- set system_message = messages[0]['content'] %}\n {%- set loop_messages = messages[1:] %}\n{%- else %}\n {%- set loop_messages = messages %}\n{%- endif %}\n\n{{- bos_token }}\n{%- for message in loop_messages %}\n {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}\n {{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}\n {%- endif %}\n {%- if message['role'] == 'user' %}\n {%- if loop.first and system_message is defined %}\n {{- ' [INST] ' + system_message + '\\n\\n' + message['content'] + ' [/INST]' }}\n {%- else %}\n {{- ' [INST] ' + message['content'] + ' [/INST]' }}\n {%- endif %}\n {%- elif message['role'] == 'assistant' %}\n {{- ' ' + message['content'] + eos_token}}\n {%- else %}\n {{- raise_exception('Only user and assistant roles are supported, with the exception of an initial optional system message!') }}\n {%- endif %}\n{%- endfor %}\n",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
- "legacy": true,
+ "legacy": false,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": null,
  "sp_model_kwargs": {},