Multi-turn Chat History Management
Hey Team,
I'm finding it hard to properly integrate chat history, as shown on the Molmo Chat web page.
It would be great if you could help me out with this. :>)
chat_history = []

while True:
    print(f"{color_b}************ Enter Your Input Query: *****************{reset}")
    input_query = input()
    if input_query == exit_trigger:
        print(f"{color_r}************************** Exiting Molmo Chat! ***************************{reset}")
        break
    # Prepend the accumulated history to the new query (rendered as raw dict text)
    query_text = f"{input_query}\n{chat_history}" if chat_history else input_query
    print("######################")
    inputs = processor.process(images=ip_img, text=query_text)
    # Move inputs to the correct device and create a batch of size 1
    inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}
    # Generate output; maximum 500 new tokens; stop generation when <|endoftext|> is generated
    output = model.generate_from_batch(
        inputs,
        GenerationConfig(max_new_tokens=500, stop_strings="<|endoftext|>"),
        tokenizer=processor.tokenizer,
    )
    # Keep only the newly generated tokens; decode them to text
    generated_tokens = output[0, inputs["input_ids"].size(1):]
    generated_text = processor.tokenizer.decode(generated_tokens, skip_special_tokens=True)
    # Print the generated text
    print(generated_text)
    format_chat = {
        "Previous Query": input_query,
        "Previous Model Response": generated_text,
    }
    print("----------------------------------------------")
    print(format_chat)
    print("------------------------------------------------")
    chat_history = save_history(chat_history, save_text=format_chat, remove_prev_str=False)
def save_history(chat_history, save_text=None, remove_prev_str: bool = False, remove_len: int = 1):
    # Prepend the newest turn, so the history list is ordered newest-first
    chat_history = [save_text] + chat_history
    if remove_prev_str:
        # Drop the oldest turns; they sit at the end, since the list is newest-first
        for _ in range(remove_len):
            chat_history.pop()
    return chat_history
Can anyone clarify where I'm going wrong, or what's the right approach to implement conversation memory?
The way this is usually done is with a list of dicts, one per turn, e.g.:
conversation = [
    {"role": "user", "content": "My message 1"},
    {"role": "assistant", "content": "answer 1"},
    {"role": "user", "content": "My message 2"},
    {"role": "assistant", "content": "answer 2"},
    ...
]
And then you use a chat template, which you can add to the tokenizer or processor, to take care of structuring it into a single tokenizable prompt string. Like this:
query_text = processor.apply_chat_template(conversation, tokenize=False)
And then you can tokenize the way you did before:
inputs = processor.process(images=ip_img, text=query_text)
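Putting it together, here's a minimal sketch of the whole loop with conversation memory. It reuses your generate/decode code (model, processor, ip_img, and exit_trigger are taken from your script) and assumes the processor's tokenizer has a chat template (see below if it doesn't):

from transformers import GenerationConfig

conversation = []
while True:
    input_query = input("Enter your query: ")
    if input_query == exit_trigger:
        break
    # Record the user turn, then render the whole history into one prompt string;
    # add_generation_prompt appends the assistant header so the model answers next
    conversation.append({"role": "user", "content": input_query})
    query_text = processor.apply_chat_template(
        conversation, tokenize=False, add_generation_prompt=True
    )
    inputs = processor.process(images=ip_img, text=query_text)
    inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}
    output = model.generate_from_batch(
        inputs,
        GenerationConfig(max_new_tokens=500, stop_strings="<|endoftext|>"),
        tokenizer=processor.tokenizer,
    )
    generated_tokens = output[0, inputs["input_ids"].size(1):]
    generated_text = processor.tokenizer.decode(generated_tokens, skip_special_tokens=True)
    print(generated_text)
    # Record the assistant turn so the next iteration sees the full conversation
    conversation.append({"role": "assistant", "content": generated_text})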
Your chat template and conversation format have to match, and ideally they also match what the model was tuned on. Since this one is Qwen-based, I would recommend trying the Qwen chat template in case this processor and tokenizer don't ship with one.
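If it turns out they don't, chat_template is a plain writable attribute on transformers tokenizers, so you can set one yourself. A minimal sketch, assuming you borrow the template from an actual Qwen tokenizer (the checkpoint name below is only an example, not necessarily Molmo's exact backbone):

from transformers import AutoTokenizer

if processor.tokenizer.chat_template is None:
    # Borrow the ChatML-style chat template from a Qwen tokenizer
    # (checkpoint name is an assumption; pick the Qwen model your Molmo variant builds on)
    qwen_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
    processor.tokenizer.chat_template = qwen_tokenizer.chat_template

One caveat: Qwen's template marks the end of each turn with <|im_end|>, so if you go this route it may be worth adding that to stop_strings alongside <|endoftext|>.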