Continued pre-training with replay?

#4
by ostapeno - opened

Dear authors,
thank you for the great work.
Regarding the continued pre-training experiments in your work, did you use replay to prevent forgetting the knowledge the model acquired during stage 1 training?

Thank you in advance!

Thanks for the question. We did not use replay in our experiments. Instead, we mixed domain-specific instruction-augmented corpora with general instructions to help maintain the model's general capabilities while adapting to the target domain.
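
For anyone curious what such a mixture might look like in practice, here is a minimal sketch using the Hugging Face `datasets` library. The file names and the 80/20 mixing ratio are purely illustrative placeholders, not the actual data or proportions used in the paper.

```python
# Minimal sketch of mixing a domain-specific, instruction-augmented corpus
# with general instructions for continued pre-training.
# Dataset files and the mixing ratio below are hypothetical examples.
from datasets import load_dataset, interleave_datasets

# Hypothetical domain-specific, instruction-augmented corpus.
domain_ds = load_dataset("json", data_files="domain_instructions.jsonl", split="train")

# Hypothetical general instruction dataset.
general_ds = load_dataset("json", data_files="general_instructions.jsonl", split="train")

# Sample from both sources so the model keeps seeing general instructions
# while adapting to the target domain (the 80/20 split is illustrative only).
mixed_ds = interleave_datasets(
    [domain_ds, general_ds],
    probabilities=[0.8, 0.2],
    seed=42,
)
```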
