Additionally, we feed the generated structured-prediction JSON data, together with the source text, into DeepSeek-R1 Llama 70B to generate a chain of thought that explains the extraction process.
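For readers trying to picture this step, here is a minimal sketch of how a text and its extracted JSON might be combined into a single prompt. It only builds the prompt string and does not call any model. The function name `build_cot_prompt`, the prompt wording, and the example record are all assumptions made for illustration, not the authors' actual pipeline.

```python
import json


def build_cot_prompt(text: str, extraction: dict) -> str:
    """Combine a source text and its structured extraction into one prompt.

    Hypothetical helper: the prompt wording below is an assumption,
    not the template used by the original authors.
    """
    # Serialize the extraction so the model sees exactly the JSON to explain.
    extraction_json = json.dumps(extraction, ensure_ascii=False, indent=2)
    return (
        "Source text:\n"
        f"{text}\n\n"
        "Extracted JSON:\n"
        f"{extraction_json}\n\n"
        "Explain step by step how the JSON above can be derived "
        "from the source text."
    )


# Made-up example record, for illustration only.
example_text = "Alice joined Acme in 2021 as an engineer."
example_extraction = {"person": "Alice", "employer": "Acme", "year": 2021}
prompt = build_cot_prompt(example_text, example_extraction)
print(prompt)
```

The resulting string would then be sent to whichever reasoning model is used, and the model's answer kept as the chain of thought paired with that extraction.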
Why don't you use the original R1 (>600B) to get the best results?