Large Model Training and Inference with DeepSpeed // Samyam Rajbhandari // LLMs in Prod Conference

//Abstract
In the last few years, DeepSpeed has released numerous technologies for training and inference of large models, transforming the large model training landscape from a systems perspective. Technologies like ZeRO and 3D-Parallelism have become the building blocks for training large models at scale, powering LLMs like Bloom-176B, Megatron-Turing 530B, and many others. Heterogeneous memory training systems like ZeRO-Offload and ZeRO-Infinity have democratized LLMs by making them accessible with limited resources. DeepSpeed-Inference and DeepSpeed-MII have made it easy to apply powerful inference optimizations to accelerate LLMs for deployment. As a result, DeepSpeed has been integrated directly into platforms like Hugging Face, PyTorch Lightning, and MosaicML. Similarly, the ZeRO family of technologies and 3D-Parallelism are offered as part of PyTorch, Colossal-AI, Megatron-LM, and others.

In this talk, Samyam shares the journey of DeepSpeed as the team navigated the large model training landscape and built systems to extend it beyond what was possible. He shares the motivations, insights, aha moments, and stories behind the technologies that are now part of DeepSpeed and have become fundamental building blocks for training and inference of large language models at scale.

//Bio
Samyam Rajbhandari is a co-founder and the system architect of DeepSpeed at Microsoft. He works on developing high-performance infrastructure for accelerating large-scale deep learning training and inference on parallel and distributed systems. He designed systems such as ZeRO and 3D parallelism that have been adopted by many DL frameworks, have become the staple engine for training large language models, and have made it possible to train models like Turing-NLG 17.2B, Megatron-Turing 530B, and Bloom-176B.

On the inference front, he designs fast systems and leads optimization efforts for various transformer- and MoE-based LLM architectures, as well as more esoteric multi-modal architectures like DALL·E. His work on inference optimizations has been released as part of DeepSpeed-Inference, DeepSpeed-MII, and DeepSpeed-Chat, and is also used in multiple Microsoft systems and products such as Bing, Ads, and AzureML to reduce latency, lower cost, and improve capacity. Samyam received his PhD in Computer Science from The Ohio State University.
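For context on the technologies the abstract names: ZeRO and ZeRO-Offload are typically enabled through a DeepSpeed JSON configuration file. Below is a minimal sketch; the specific values (batch size, ZeRO stage, CPU offload) and the filename `ds_config.json` are illustrative assumptions, not details from the talk:

```
{
  "train_batch_size": 32,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu"
    }
  }
}
```

A config like this would usually be passed to a training script via DeepSpeed's launcher, e.g. `deepspeed train.py --deepspeed_config ds_config.json`; the `stage` field selects which ZeRO stage partitions optimizer states, gradients, and parameters, while `offload_optimizer` moves optimizer state to CPU memory in the style of ZeRO-Offload.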
