This title is printed to order and may have been self-published; if so, we cannot guarantee the quality of the content. Most books will have gone through an editing process, but some may not, so please be aware of this before ordering. If in doubt, check the author's or publisher's details, as we are unable to accept returns unless the book is faulty. Please contact us if you have any questions.
Video captioning, the task of describing the content of a video in natural language, is popular in both computer vision and natural language processing. Early work generated sentence-level captions for short video clips (Venugopalan et al., 2015). Krishna et al. (2017) propose dense video captioning, in which the system must first detect event segments and then generate a caption for each. Park et al. (2019) propose video paragraph captioning: they use ground-truth event segments and focus on generating coherent paragraphs. Lei et al. (2020) follow the same task setting and propose a recurrent transformer model that generates more coherent and less repetitive paragraphs. Since ground-truth event segments are often unavailable in practice, our goal is to generate paragraph captions without them.
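To make the distinction between the two task settings concrete, here is a minimal illustrative sketch. All names and types are hypothetical, invented for this example; they do not come from any of the cited works. The only point is the interface difference: paragraph captioning receives event segments as input, while dense captioning must propose them itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical types for illustration only.
Segment = Tuple[float, float]          # (start_sec, end_sec) of one event
Captioner = Callable[[Segment], str]   # maps a video segment to one sentence

@dataclass
class ParagraphCaptionTask:
    """Paragraph captioning: ground-truth event segments are given."""
    segments: List[Segment]

    def caption(self, captioner: Captioner) -> str:
        # One sentence per annotated event, joined into a paragraph.
        return " ".join(captioner(seg) for seg in self.segments)

@dataclass
class DenseCaptionTask:
    """Dense captioning: the system must first propose segments itself."""
    duration: float

    def caption(self,
                proposer: Callable[[float], List[Segment]],
                captioner: Captioner) -> str:
        # The proposer replaces the ground-truth annotation above.
        return " ".join(captioner(seg) for seg in proposer(self.duration))
```

The segment-free goal described above corresponds to the second interface: the paragraph must be produced from the raw video duration alone, with no annotated segments supplied at inference time.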