How to use?

#1
by Hrre54353543 - opened

Loaded it all up, but it just outputs gibberish: it gets about one word out, then repeats it over and over.

In LM Studio I get: "Failed to parse Jinja template: Parser Error: Expected closing expression token. Dot !== CloseExpression."
In koboldcpp it produced gibberish. Is the chat template incorrect?

All I did was split the projector and the vision tower from the language model and quantize them. I didn't make the model, and it works fine in the script I linked above. I have no idea how LM Studio works; I don't use it.

https://i.imgur.com/TASZW8c.jpeg

It does kinda work with Kobold but tends to glitch out repeating words randomly.

I tried changing the resolution and the file type of the image. PNG images seem to work best.
Changing the resolution gave mixed results.
I tried sizes from 448x448 up to 1024x1024.

Is there a way to get this working with Ollama? It seems to hang when being prompted.

I also attempted to create a custom modelfile:

FROM huggingface.co/Jobaar/Llama-JoyCaption-Alpha-Two-GGUF:latest
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
SYSTEM You are an image captioning assistant.
TEMPLATE """
{{ if .System }}<|system|>
{{ .System }}<|end|>{{ end }}
{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>{{ end }}
<|assistant|>
{{ .Response }}<|end|>
"""

You can use this script with KoboldCpp: https://github.com/jabberjabberjabber/LLMOCR/blob/main/joy-caption.py

I do not recommend Ollama.

Results were unstable in my tests and depend on the input image; most inputs failed to produce a correct result. These are with default koboldcpp settings:

Good results:

R.jpg
`The image is a photograph of an Asian woman, likely in her mid-20s, posing indoors against a plain white backdrop with subtle textured paneling behind her. She has long dark brown hair styled in loose waves that frame her face. Her complexion is light and smooth, indicating good skin care. Her eyes are strikingly large with long lashes, enhanced by dramatic makeup including black eyeliner and mascara, giving her an intense gaze.

She wears a strapless black dress that reveals her shoulders and part of her chest, suggesting an elegant or formal event. The dress appears to be made of a satin-like material which gives it a shiny texture. Her makeup includes deep red lipstick that adds a bold contrast to her pale skin.

She also sports prominent facial accessories: delicate floral tattoos on her cheeks and intricate pearl stud earrings that dangle from small surgical incision piercings above her earlobes. Her expression is calm and confident, exuding sophistication. The overall aesthetic of the photo is refined and high fashion, typical of editorial or runway style photography.

There are no other objects or people in the frame, ensuring all focus remains on the subject. The lighting is soft but direct, casting minimal shadows and highlighting her features evenly. The background's simplicity ensures there are no distractions from`

horror.png
`This is a digital artwork depicting a haunting and surreal scene set in an eerie forest at night. At the center of the image, there is a tall, ghostly figure emerging from the shadows. The figure has long, wild hair that flows around its face, which is obscured by a dark veil or hood. Its skin appears pale, almost translucent, giving it an otherworldly appearance. The most striking feature is the glowing red eyes, which stand out starkly against the dark backdrop.

The figure is dressed in tattered, flowing garments, suggesting a sense of decay and age. It wears a dress with long sleeves, further emphasizing the Gothic atmosphere. Surrounding the figure, the dense forest is bathed in an ethereal, bluish light that filters through the trees, creating a moody ambiance. The trees themselves have twisted trunks and sparse foliage, adding to the ominous feel.

In the foreground and midground, large, vivid purple flowers with deep pink centers bloom among the undergrowth, providing a stark contrast to the dark greens and blacks of the forest. These flowers add a touch of beauty amidst the terror, symbolizing perhaps the fleeting nature of life. The overall style of the artwork combines elements of fantasy and horror, with meticulous attention to texture and lighting effects`

Bad results:

_7Q7T3GXHG2GMTSYGEG5AW890B0.jpg
purple eyes, original character, dress shirt, long hair, 1girl, solo, sfw_(c), female focus, jewelry, purple flower (flower) 3d, sfw, earrings, blonde_hair, clothing cutout, eyelash, hair ornamentation, eyelash_out, purple_flower_(plant), flowers, artist self, lips, ear piercing, hair_accessories, white flower (artist) 2b, bangs, purple lips, purple dress shirt, purple hair, purple background, purple_blossom, purple_hair, purple_eyes, purple_dress, purple nails, purple_hair, purple shirt, purple_nails, purple_clothing, purple eyewear, purple_ears, purple eyewear, purple_hairspace, purple flower (plant), purple fingernails, purple_hair_accessory, purple bloom, purple topwear, purple_eyebloom, purple flower, purple clothing, purple_hair_accessory, purple eye access, purple accesory, purple eyewear, purple_flower, purple_fingernacles, purple accessories, purple instrument, purple eyeshadow, purple tail, purple_blush, purple_hair_accessory, purple_fingernace, purple eyeshadow

35072958.jpeg
assistant, you'reassistant, the assistant, the formal tone, the image, the scene, the 'assistant, the 'assistant, the image, the scene, the formal tone within 250 words., the scene, the scene, the formal tone within 250 words, the formal tone, the image, the scene, the scene, the formal tone within 250 words, the formal tone within 250 words, the assistant, the formal tone, the formal tone within 250 words, the formal tone, the image, the formal tone, the scene, the formal tone within 250 words, the formal tone, the formal tone within 250 words, the formal tone, the image, the formal tone within 250 words, the formal tone within 250 words, the scene, the formal tone within 250 words, the formal tone within 250 words, the formal tone, the scene, the formal tone, the scene, the formal tone within 250 words, the formal tone within 250 words, the formal tone within 250 words, the formal tone within 250 words, the formal tone, the scene, the formal tone, the image, the formal tone within 250 words, the formal tone within 250 words, the formal tone,

What results did you get with the weights in safetensors using the same settings?

You don't want 'default kobold settings'. You need to set the instruct template, you need to set the samplers, and you need to set the image resolution. This is all done for you in the script I linked.
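For anyone who wants to see roughly what "setting the instruct template and samplers" means without reading the whole script, here is a minimal sketch. It is not the linked script's code. It assumes the model follows the Llama 3 instruct format (suggested by the model name), that KoboldCpp is listening on its default local port, and that its `/api/v1/generate` endpoint accepts a base64 `images` list. The sampler values and the names `build_payload` and `caption` are illustrative only, and image resizing is left out.

```python
import base64
import json
import urllib.request

# Assumption: Llama-3-style instruct markers, inferred from the model name.
LLAMA3_TEMPLATE = (
    "<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)


def build_payload(image_bytes, instruction,
                  system="You are a helpful image captioner."):
    """Build a generate request with an explicit instruct prompt and samplers."""
    prompt = LLAMA3_TEMPLATE.format(system=system, user=instruction)
    return {
        "prompt": prompt,
        # The image is sent as a base64 string alongside the prompt.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Illustrative sampler values, not the linked script's settings.
        "max_length": 300,
        "temperature": 0.5,
        "top_p": 0.9,
        "rep_pen": 1.1,
        # Stop at the end-of-turn marker so the reply does not run on.
        "stop_sequence": ["<|eot_id|>"],
    }


def caption(image_path, instruction,
            url="http://localhost:5001/api/v1/generate"):
    """Send one image to a locally running KoboldCpp and return the text."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), instruction)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["results"][0]["text"]
```

With KoboldCpp already running with the model and projector loaded, something like `caption("R.jpg", "Write a descriptive caption for this image in a formal tone.")` would send one request. If the prompt text leaks back into the output, as in the second bad result above, a template mismatch is a likely cause.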

I haven't tested the safetensors weights because they are too large.
I ran inference with the linked script, joy-caption.py, and joy-caption.bat (modified to match my portable Python install instead of a venv), starting koboldcpp.exe manually.

Model is 'alpha' and 'experimental' as noted on the original model page. There is no reason to assume that the quant or the inference is causing problems that don't exist in the original safetensors.
