PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers Using Synthetic Scene Data

Authors: Roei Herzig; Ofir Abramovich; Elad Ben Avraham; Assaf Arbelle; Leonid Karlinsky; Ariel Shamir; Trevor Darrell; Amir Globerson

Description: Action recognition models have achieved impressive results by incorporating scene-level annotations, such as objects, their relations, 3D structure, and more. However, obtaining annotations of scene structure for videos requires a significant amount of effort to gather and annotate, making these methods expensive to train. In contrast, synthetic datasets generated by graphics engines provide powerful alternatives for generating scene-level annotations across multiple tasks. In this work, we propose an approach to leverage synthetic scene data for improving video understanding. We present a multi-task prompt learning approach for video transformers, where a shared video transformer backbone is enhanced by a small set of specialized parameters for each task. Specifically, we add a set of “task prompts”, each corresponding to a different task, and let each prompt predict task-related annotations. This design allows the model to capture information shared among synthetic scene tasks as well as information shared between synthetic scene tasks and a real video downstream task throughout the entire network. We refer to this approach as “Promptonomy”, since the prompts model task-related structure. We propose the PromptonomyViT model (PViT), a video transformer that incorporates various types of scene-level information from synthetic data using the “Promptonomy” approach. PViT shows strong performance improvements on multiple video understanding tasks and datasets.

Project page: https://ofir1080.github.io/Promptonom...
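The task-prompt idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single attention layer, the random weights, the dimensions, and the task names (`depth`, `normals`, `segmentation`) are all assumptions chosen for illustration. It shows only the general pattern of one learnable prompt token per task sharing a backbone with the video patch tokens, with each prompt's output fed to its own prediction head.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over a (tokens, dim) array."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

dim = 16          # token width (assumed)
num_patches = 8   # number of video patch tokens (assumed)

# Hypothetical synthetic-scene tasks and their output sizes.
task_out_dims = {"depth": 4, "normals": 3, "segmentation": 5}

# Shared backbone parameters: one attention layer stands in for the
# full video transformer.
w_q, w_k, w_v = (rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))

# Task-specific parameters: one prompt token and one linear head per task.
prompts = {name: rng.normal(size=(dim,)) for name in task_out_dims}
heads = {name: rng.normal(size=(dim, out)) for name, out in task_out_dims.items()}

def forward(patch_tokens):
    """Run patches plus task prompts through the shared layer and
    return one prediction vector per task."""
    names = list(prompts)
    prompt_tokens = np.stack([prompts[n] for n in names])
    tokens = np.concatenate([patch_tokens, prompt_tokens], axis=0)
    # Prompts and patches attend to each other inside the shared layer.
    tokens = tokens + self_attention(tokens, w_q, w_k, w_v)
    prompt_out = tokens[len(patch_tokens):]
    # Each prompt's output is decoded by its own task head.
    return {n: prompt_out[i] @ heads[n] for i, n in enumerate(names)}

predictions = forward(rng.normal(size=(num_patches, dim)))
for name, value in predictions.items():
    print(name, value.shape)
```

In a trained model the prompts, heads, and backbone weights would be learned jointly, with each head supervised by its task's synthetic annotations. Here they are random, so only the data flow is meaningful.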
