Understanding Visual Language Models

Visual language models show widespread visual deficits on neuropsychological tests

Counterintuitively, the mental abilities that seem simplest to humans are often the hardest to achieve in artificial intelligence (AI)—a fact known as Moravec’s paradox 1. The most well-known example ...

Nature

Visual cognition in multimodal large language models

A chief goal of artificial intelligence is to build machines that think like people. Yet it has been argued that deep neural network architectures fail to accomplish this. Researchers have asserted ...

TechCrunch

‘Visual’ AI models might not see anything at all

The latest round of language models, like GPT-4o and Gemini 1.5 Pro, are touted as “multimodal,” able to understand images and audio as well as text. But a new study makes clear that they don’t really ...

EurekAlert!

Assessing and understanding creativity in large language models

A TTCT-inspired dataset was constructed to evaluate LLMs under varied prompts and role-play settings. GPT-4 served as the evaluator to score model outputs. In recent years, the realm of artificial ...

Tech Times

AI Chart Understanding Breakthrough: MIT-IBM Dataset Lets Small Models Beat GPT-4o

MIT and IBM released ChartNet, a 1.7-million-sample synthetic training dataset that lets compact open-source vision-language ...

Science News

AI’s understanding and reasoning skills can’t be assessed by current tests

“Sparks of artificial general intelligence,” “near-human levels of comprehension,” “top-tier reasoning capacities.” All of these phrases have been used to describe large language models, which drive ...

Forbes

The Next Leap In AI: From Large Language Models To Large World Models?

The realm of artificial intelligence (AI) may be on the cusp of a new transformative leap, transitioning from Large Language Models (LLMs) to an innovative and expansive concept, which we may call ...

Ars Technica

Microsoft unveils AI model that understands image content, solves visual puzzles

On Monday, researchers from Microsoft introduced Kosmos-1, a multimodal model that can reportedly analyze images for content, solve visual puzzles, perform visual text recognition, pass visual IQ ...

VentureBeat

Alibaba releases new AI model Qwen2-VL that can analyze videos more than 20 minutes long

Alibaba Cloud, the cloud services and storage division of the Chinese e-commerce giant, has announced the release of Qwen2-VL, its latest advanced vision-language model designed to enhance visual ...
