---
language:
- en
tags:
- openvino
---

# ibm-granite/granite-8b-code-instruct

This is the [ibm-granite/granite-8b-code-instruct](https://huggingface.co/ibm-granite/granite-8b-code-instruct) model converted to [OpenVINO](https://openvino.ai) with INT8 weight compression for accelerated inference.

The following example shows how to run inference with this model:

```python
# pip install optimum[openvino]
from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_path = "helenai/ibm-granite-granite-8b-code-instruct-ov"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = OVModelForCausalLM.from_pretrained(model_path)

# change input text as desired
chat = [
    { "role": "user", "content": "Write a code to find the maximum value in a list of numbers." },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt")
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# loop over the batch to print, in this example the batch size is 1
for i in output:
    print(i)
```
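
For a decoder-only model, the ids returned by `generate` normally start with the prompt ids, so the decoded text in the example repeats the chat prompt before the answer. A minimal sketch of one way to keep only the newly generated tokens follows. The helper name `strip_prompt_tokens` is hypothetical and not part of `transformers` or `optimum`, and the ids in the demo are toy values rather than real tokenizer output.

```python
from typing import List, Sequence


def strip_prompt_tokens(
    output_ids: Sequence[Sequence[int]], prompt_length: int
) -> List[List[int]]:
    """Drop the first `prompt_length` ids from every sequence in the batch.

    Hypothetical helper: assumes every sequence starts with a prompt of the
    same length, which holds for batch size 1 or a left-padded batch.
    """
    if prompt_length < 0:
        raise ValueError("prompt_length must be non-negative")
    return [list(sequence[prompt_length:]) for sequence in output_ids]


# Toy ids standing in for real output: 3 prompt tokens, then 2 generated ones.
example_output = [[101, 7, 8, 42, 43]]
new_tokens = strip_prompt_tokens(example_output, prompt_length=3)
print(new_tokens)  # [[42, 43]]
```

In the inference example, this could be applied before the decoding step: take the prompt length from `input_tokens["input_ids"].shape[1]`, pass `output.tolist()` to the helper, and decode the result with `tokenizer.batch_decode(..., skip_special_tokens=True)` to also drop special tokens from the printed text.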