AI & ML interests

None defined yet.

Recent Activity

nthakur updated a dataset 9 days ago
freshstack/leaderboard-results
nthakur updated a dataset 19 days ago
freshstack/leaderboard-results
nthakur published a dataset 6 months ago
freshstack/leaderboard-results

nthakur updated a Space 11 months ago
nthakur published a Space 11 months ago
nthakur posted an update 12 months ago
Post
Last year, I curated and generated a few multilingual SFT and DPO datasets by translating English SFT/DPO datasets into 9-10 languages using the mistralai/Mistral-7B-Instruct-v0.2 model.

I hope these datasets help the community with pretraining and instruction tuning multilingual LLMs! I added a small diagram that briefly describes which datasets are included and their sources.

Happy to collaborate, either on using these datasets for instruction fine-tuning or on extending the collection with translated versions of newer English SFT/DPO datasets!

nthakur/multilingual-sft-and-dpo-datasets-67eaf56fe3feca5a57cf7d74