Image Captioning Model

Novel concept-based image captioning models using LSTM and multi-encoder transformer architecture

Captioning an image involves using a combination of vision and language models to describe the image in an expressive and concise sentence. A successful captioning task requires extracting as much ...

9to5Mac

Apple trained an AI that captions images better than models ten times its size

Apple researchers have developed a new way to train AI models for image captioning that delivers more accurate, detailed descriptions while using far smaller models. Here are the details. In a new ...

Nature

Visual spatial relationship sensitive transformer for image captioning

Image captioning is a cross-modal task that combines computer vision and natural language processing to generate natural language descriptions of visual content. Recent advances have explored the ...

Democrat and Chronicle

Memories.ai Recognized as a Leading Video Understanding Model for Video Caption

Memories.ai, the pioneering AI company founded by former Meta Reality Labs researchers, today announced it has been recognized as a leading video understanding model for video caption by the ...

9to5Mac

You can try Apple’s lightning-fast video captioning model right from your browser

A few months ago, Apple released FastVLM, a Visual Language Model (VLM) that offered near-instant high-resolution image processing. Now, you can take it for a spin, provided you have an Apple ...
