Paperback

$154.99

Image captioning with audio has emerged as a challenging yet promising task in the field of deep learning. This paper proposes a novel approach to this task that integrates convolutional neural networks (CNNs) for image feature extraction and recurrent neural networks (RNNs) for sequential audio analysis. Specifically, we leverage pre-trained CNNs such as VGG to extract visual features from images, and employ spectrogram representations coupled with RNNs such as LSTM or GRU to process audio inputs. Our proposed model generates captions based not only on the visual content of images but also on accompanying audio cues. We evaluate the performance of our model on benchmark datasets and demonstrate its effectiveness in generating coherent and contextually relevant captions for images with corresponding audio inputs. Additionally, we conduct ablation studies to analyze the contribution of each modality to the overall captioning performance. Our results show that fusing the visual and auditory modalities significantly improves captioning quality compared to using either modality in isolation.
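To make the fusion idea in the abstract concrete, here is a minimal schematic sketch, not the authors' code: the feature extractors below are stand-in stubs for the pre-trained VGG image encoder and the spectrogram + LSTM/GRU audio encoder, and the fusion is plain concatenation of the two feature vectors before caption decoding.

```python
# Schematic multimodal fusion pipeline: a visual feature vector and an
# audio feature vector are concatenated into one representation that a
# caption decoder would then consume. All functions here are illustrative
# stubs, not the paper's actual model.

def extract_image_features(image):
    # Stand-in for a pre-trained CNN such as VGG: maps pixel values
    # to a fixed-size feature vector (4 dimensions for illustration).
    mean = sum(image) / len(image)
    return [mean] * 4

def extract_audio_features(spectrogram):
    # Stand-in for an LSTM/GRU over spectrogram frames: one feature
    # per frame (here, simply the frame's peak energy).
    return [max(frame) for frame in spectrogram]

def fuse(visual, audio):
    # Concatenation fusion; attention-based fusion is another common choice.
    return visual + audio

image = [0.2, 0.4, 0.6, 0.8]                # toy "image" as a flat list
spectrogram = [[0.1, 0.3], [0.5, 0.2]]      # toy spectrogram: 2 frames
features = fuse(extract_image_features(image),
                extract_audio_features(spectrogram))
print(len(features))  # 4 visual dims + 2 audio dims = 6
```

The concatenated vector would feed an RNN decoder that emits the caption token by token; the ablation the abstract mentions corresponds to passing only one of the two vectors through and comparing caption quality.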

In Shop
Out of stock
Shipping & Delivery

$9.00 standard shipping within Australia
FREE standard shipping within Australia for orders over $100.00
Express & International shipping calculated at checkout

MORE INFO
Format
Paperback
Publisher
LAP Lambert Academic Publishing
Date
16 May 2024
Pages
64
ISBN
9786207647606
