---
library_name: transformers
tags:
- trl
- grpo
- rl
- superthoughts
- reasoning
- cot
license: apache-2.0
datasets:
- openai/gsm8k
- Pinkstack/intructions-sft-sharegpt
language:
- en
base_model:
- HuggingFaceTB/SmolLM2-1.7B-Instruct
pipeline_tag: text-generation
widget:
- messages:
  - role: user
    content: "You must act in a conversational matter and always include at the start ... ... tokens.\n\nAre cats cool?"
- messages:
  - role: user
    content: "You must act in a conversational matter and always include at the start ... ... tokens.\n\nHello!"
- messages:
  - role: user
    content: "You must act in a conversational matter and always include at the start ... ... tokens.\n\n2x-2=6, how much is X?"
---

Free online chat with web search capabilities (demo): https://huggingface.co/spaces/Pinkstack/Chat-with-superthoughts-lite

![superthoughts lite](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/K5kYIHYj2aX2kB6MlcM9O.png)

# Information

Advanced, high-quality and **lite** reasoning in a tiny model that you can run on your phone.

At original quality, it runs at ~400 tokens/second on a single Nvidia H100 GPU from Friendli.

The training recipe is similar to DeepSeek R1. We started from SmolLM2 as the base model and SFT fine-tuned it on reasoning with our own private superthoughts instruct dataset, which includes a mix of code, website generation, day-to-day chats, math and counting problems. We then modified the tokenizer slightly and, after the SFT fine-tuning, used GRPO to further amplify its mathematics and problem-solving abilities.

# Format

```
<|im_start|>user
How many R's in strawberry<|im_end|>
<|im_start|>assistant
Alright, the user has asked how many R's in the word strawberry, that's easy! I just need to count each instance of the letter 'R' in the word 's-t-r-a-w-b-e-r-r-y' and then find out how many R's there are, lets count! S - Not an R, T - Not an R, R - First instance of the letter R! (1), A - Not an R, W - Not an R, B - Not an R, E - Not an R, R - Great! Second instance of the letter R. (2), R - Third instance of the letter R. (3), Y - Not an R. So, i've counted all the letters correctly, meaning that I am sure that there are 3 R's in the word Strawberry. I should probably let the user know.
3
<|im_end|>
```

We strongly recommend using a low temperature; higher temperatures may cause the model to skip its thinking step.

# System prompt (important to ensure the model always thinks, then outputs)

```
respond in the following format: ... ...
```

# Examples

All responses below were generated with our system prompt and a temperature of 0.7, inside the Android application ChatterUI, via GGUF Q8, using the model's prompt format.

1)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/5veZJmkjuv_7W7pKhvsu0.png)

2)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/pAwPdVkEZ7rnFf-TZ5tMU.png)

3)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/FDaWAAqgv2kvoZvjl8gjl.png)

# Uploaded model

- **Developed by:** Pinkstack
- **License:** apache-2.0
- **Finetuned from model:** HuggingFaceTB/SmolLM2-1.7B-Instruct
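
# Prompt sketch

The prompt layout shown under Format can also be assembled by hand. Below is a minimal sketch in plain Python, not an official implementation. It assumes the usual ChatML whitespace (a newline after the role name and after each `<|im_end|>`), which this card does not state explicitly, and it assumes the system prompt is sent with a `system` role. The names `build_prompt`, `extract_reply` and `SYSTEM_PROMPT` are illustrative and are not part of the model or of transformers.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"

# Placeholder copied from this card; replace it with the full system prompt text.
SYSTEM_PROMPT = "respond in the following format: ... ..."


def build_prompt(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Join chat messages into one ChatML-style prompt string.

    Assumption: each turn is '<|im_start|>{role}\\n{content}<|im_end|>\\n'.
    """
    parts = []
    for message in messages:
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


def extract_reply(generated: str) -> str:
    """Return the generated text up to the first end-of-turn marker."""
    return generated.split(IM_END, 1)[0].strip()


prompt = build_prompt([
    {"role": "system", "content": SYSTEM_PROMPT},  # assumption: system role
    {"role": "user", "content": "How many R's in strawberry"},
])
print(prompt)
```

When the model is loaded through transformers, the tokenizer's own chat template (`tokenizer.apply_chat_template`) is the safer source of truth for the exact layout; the sketch above is mainly useful for runtimes that take a raw prompt string.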