Model Distillation: Same LLM Power but 3240x Smaller

Foundation model performance at a fraction of the cost: model distillation is a powerful technique that leverages the advanced generation capabilities of foundation models like Llama 3.1 405B, GPT-4, or Claude Opus as teachers, distilling their knowledge on a given task into a student model. The result is a task-specific, lightweight language model that delivers comparable performance, capability, or style to the foundation model without all the extra parameters. In this video we demonstrate the technique by using Llama 3.1 405B to perform sentiment analysis on a dataset of tweets, then using that generated dataset to train RoBERTa, a 125-million-parameter model, to match its accuracy on tweet sentiment classification. Comparable performance from a model 3,240 times smaller!

Resources:
- Code: https://github.com/ALucek/LLM-distill...
- Llama 3.1 405B Tweet Dataset: https://huggingface.co/datasets/AdamL...
- Distilled Model: https://huggingface.co/AdamLucek/robe...
- Moritz Laurer Blog: https://huggingface.co/blog/synthetic...
- AutoTrain: https://huggingface.co/autotrain
- A Survey on Knowledge Distillation of Large Language Models: https://arxiv.org/pdf/2402.13116

Chapters:
00:00 - Intro
01:11 - Model Distillation Trend
04:49 - Use Case: Instruction Following
05:45 - Use Case: Multi-Turn Dialogue
06:17 - Use Case: Retrieval Augmented Generation
06:59 - Use Case: Tool & Function Calling
07:52 - Use Case: Text Annotation
08:16 - Code: Distilling Llama 3.1 405B Overview
09:32 - Code: Initializing Tweet Dataset
10:57 - Code: Setting Up LLM & Annotation Prompt
15:10 - Code: Creating Annotated Dataset
17:25 - Training: RoBERTa & AutoTrain
18:30 - Training: Setting Up AutoTrain Environment
19:02 - Training: Running Training Job on RoBERTa
21:42 - Evaluate: Using Our Fine-Tuned RoBERTa Model
22:23 - Evaluate: Visualizing Accuracy
23:37 - Evaluate: Visualizing Label Distribution
24:14 - Evaluate: Cost & Time Considerations
24:49 - Outro

#machinelearning #ai #coding
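The annotation step described above (chapters 10:57 - 15:10) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it assumes an OpenAI-compatible endpoint serving Llama 3.1 405B, and the endpoint URL, model identifier, and labeling prompt are all placeholder assumptions.

```python
# Sketch: using a large teacher LLM to label tweets with sentiment,
# producing the synthetic dataset the student model is trained on.
# Assumptions: an OpenAI-compatible provider serving Llama 3.1 405B;
# the video's exact prompt and provider may differ.
from openai import OpenAI

client = OpenAI(base_url="https://example-provider/v1",  # hypothetical endpoint
                api_key="YOUR_API_KEY")

LABELS = {"negative", "neutral", "positive"}

def annotate(tweet: str) -> str:
    """Ask the teacher model for a single sentiment label."""
    response = client.chat.completions.create(
        model="llama-3.1-405b-instruct",  # assumed model identifier
        messages=[
            {"role": "system",
             "content": "Classify the tweet's sentiment. Answer with exactly "
                        "one word: negative, neutral, or positive."},
            {"role": "user", "content": tweet},
        ],
        temperature=0.0,  # deterministic labels for a cleaner training set
    )
    label = response.choices[0].message.content.strip().lower()
    return label if label in LABELS else "neutral"  # fall back on malformed output

annotated = [{"text": t, "label": annotate(t)}
             for t in ["Loving the new update!", "This is the worst."]]
print(annotated)
```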

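For the training step (chapters 17:25 - 19:02) the video uses AutoTrain; since the exact AutoTrain configuration isn't reproduced here, below is an equivalent sketch using the plain `transformers` Trainer instead. Hyperparameters, dataset contents, and output paths are illustrative assumptions.

```python
# Sketch: fine-tuning RoBERTa (~125M parameters) on the teacher-labeled
# tweets. A plain-Trainer stand-in for the video's AutoTrain job.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# In practice: the full teacher-labeled tweet dataset from the annotation step.
annotated = [
    {"text": "Loving the new update!", "label": "positive"},
    {"text": "This is the worst.", "label": "negative"},
    {"text": "Release notes are out.", "label": "neutral"},
]

label2id = {"negative": 0, "neutral": 1, "positive": 2}
id2label = {v: k for k, v in label2id.items()}

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=3, label2id=label2id, id2label=id2label)

# Map string labels to ids, then tokenize.
ds = Dataset.from_list(annotated)
ds = ds.map(lambda x: {"label": label2id[x["label"]]})
ds = ds.map(lambda b: tokenizer(b["text"], truncation=True,
                                padding="max_length", max_length=128),
            batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="roberta-tweet-sentiment",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=ds,
)
trainer.train()
trainer.save_model("roberta-tweet-sentiment")
tokenizer.save_pretrained("roberta-tweet-sentiment")  # so pipelines can reload it
```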
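Finally, the distilled student runs like any other classifier (chapter 21:42 onward). The sketch below loads the locally saved model from the training sketch above; the hosted distilled model's URL in the resources list is truncated, so it isn't referenced directly.

```python
# Sketch: inference with the distilled student model. A 125M-parameter
# RoBERTa classifier runs comfortably on CPU, versus the 405B teacher.
from transformers import pipeline

classifier = pipeline("text-classification", model="roberta-tweet-sentiment")
print(classifier("Just got my hands on the new release and it's fantastic!"))
# hypothetical output: [{'label': 'positive', 'score': 0.98}]
```

This is where the cost and time considerations (chapter 24:14) come in: once distilled, every classification call hits the small local model rather than a 405B-parameter API endpoint.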