Русские видео

Сейчас в тренде

Иностранные видео


Скачать с ютуб Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO paper explained в хорошем качестве

Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO paper explained 8 месяцев назад


Если кнопки скачивания не загрузились НАЖМИТЕ ЗДЕСЬ или обновите страницу
Если возникают проблемы со скачиванием, пожалуйста напишите в поддержку по адресу внизу страницы.
Спасибо за использование сервиса savevideohd.ru



Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO paper explained

Direct Preference Optimization (DPO) to finetune LLMs without reinforcement learning. DPO was one of the two Outstanding Main Track Runner-Up papers. ➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring.... 📜 Rafailov, Rafael, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. "Direct preference optimization: Your language model is secretly a reward model." arXiv preprint arXiv:2305.18290 (2023). https://arxiv.org/abs/2305.18290 Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏 Dres. Trost GbR, Siltax, Vignesh Valliappan, ‪@Mutual_Information‬ , Kshitij Outline: 00:00 DPO motivation 00:53 Finetuning with human feedback 01:39 RLHF explained 03:05 DPO explained 04:24 Why Reinforcement Learning in the first place? 05:58 Shortcomings 06:50 Results ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀ 🔥 Optionally, pay us a coffee to help with our Coffee Bean production! ☕ Patreon:   / aicoffeebreak   Ko-fi: https://ko-fi.com/aicoffeebreak ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀ 🔗 Links: AICoffeeBreakQuiz:    / aicoffeebreak   Twitter:   / aicoffeebreak   Reddit:   / aicoffeebreak   YouTube:    / aicoffeebreak   #AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research​ Video editing: Nils Trost Music 🎵 : Ice & Fire - King Canyon

Comments