
zzhhhh

zihan-aiml
AI & ML interests

(M)LLMs, LLM post training, LLM alignment, LLM Agent, LLM engineering

Organizations

None yet

zihan-aiml's activity

Great work! I have two questions regarding the reward design:

  1. How do you balance the different reward components? I assume it's through trial and error, but I'm particularly interested in:

    • The scale of each reward component
    • How numerical adjustments impact the RL training process
    • The relative weights between different rewards
  2. Regarding the F1 score calculation: Is it computed over the number of entries in your graph? I'm curious about the granularity of the reward design, since the reward components seem to operate at different levels of detail:

    • The format reward appears to be one-dimensional
    • The F1 reward seems to be a composite metric derived from multiple sub-level data points

This granularity difference in reward design could potentially affect the training dynamics. Would love to hear your thoughts on handling these different scales of feedback. Thanks :)
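To make the question concrete, here is a minimal sketch of the kind of reward mixing being asked about: a one-dimensional (binary) format reward combined with an entry-level F1 reward via relative weights. All function names, the parsing check, and the weight values are illustrative assumptions on my part, not taken from the article's actual implementation.

```python
def format_reward(output: str) -> float:
    """Binary, one-dimensional reward: 1.0 if the output looks like the
    expected structured format, else 0.0. (Toy check for illustration.)"""
    return 1.0 if output.startswith("{") and output.endswith("}") else 0.0


def f1_reward(predicted: set, gold: set) -> float:
    """Composite reward: entry-level F1 over graph entries (e.g. triples)."""
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)
    precision = tp / len(predicted)
    recall = tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def total_reward(output, predicted, gold, w_format=0.2, w_f1=0.8):
    # The relative weights (w_format, w_f1) and the scale of each component
    # are exactly the knobs question 1 asks about.
    return w_format * format_reward(output) + w_f1 * f1_reward(predicted, gold)


# Example: well-formed output, one of two predicted entries matches the gold graph,
# so F1 = 0.5 and the combined reward is 0.2 * 1.0 + 0.8 * 0.5.
pred = {("Paris", "capital_of", "France"), ("Rome", "capital_of", "Italy")}
gold = {("Paris", "capital_of", "France"), ("Berlin", "capital_of", "Germany")}
print(total_reward('{"triples": []}', pred, gold))
```

The granularity mismatch is visible here: `format_reward` moves in jumps of 1.0 while `f1_reward` varies continuously with each matched entry, so the weights also implicitly control how coarse or fine the overall feedback signal is.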

upvoted an article 5 days ago

Replicating DeepSeek R1 for Information Extraction

By Ihor