GPT-4o vs. Gemini vs. Claude 3.5 Sonnet on Image Recognition

June 21, 2024

I compared three LLMs on image recognition: GPT-4o, Gemini, and Claude 3.5 Sonnet. I gave each model a picture of my bookshelf and asked it to identify the books. Here’s an overview of the results:

📚 Books Identified: How many of the 20 books shown did it correctly identify?

✅ Correct Guesses: How many books did it guess correctly?

❌ Incorrect Guesses: How many books did it guess incorrectly?

🧐 Attention Score: How many books in a row did it identify before it started hallucinating?

It’s useful to track correct and incorrect guesses because some models are bolder than others, and some are more OK with hallucinating. For example, LLaVA gets zeros across the board because it refused to guess; it wanted a higher-quality photo.
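
To make the scoring concrete, here is a minimal sketch of how these metrics could be computed from a model's answer. It is not the script behind this post: the function names, the title normalization, and the choice to count a repeated title as an incorrect guess are all assumptions made for illustration.

```python
def normalize(title: str) -> str:
    """Lowercase, drop punctuation, and collapse spaces (an assumed matching rule)."""
    kept = "".join(ch for ch in title.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def score_guesses(guesses: list[str], shelf: list[str]) -> dict[str, int]:
    """Score an ordered list of guessed titles against the books on the shelf."""
    truth = {normalize(t) for t in shelf}
    seen: set[str] = set()
    correct = incorrect = attention = 0
    streak_open = True  # stays True until the first wrong guess

    for guess in guesses:
        key = normalize(guess)
        if key in truth and key not in seen:
            seen.add(key)
            correct += 1
            if streak_open:
                attention += 1
        else:
            # Assumption: hallucinated titles and repeats both count as incorrect.
            incorrect += 1
            streak_open = False

    return {"correct": correct, "incorrect": incorrect, "attention": attention}


# Hypothetical example data, not the real bookshelf from the test.
shelf = ["Dune", "Neuromancer", "The Hobbit"]
guesses = ["dune", "The Hobbit!", "Made-Up Book", "Neuromancer"]
result = score_guesses(guesses, shelf)
```

A model that refuses to guess returns an empty list, so every metric comes out as zero, which matches the LLaVA case above.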

Detailed Results

ChatGPT:

Gemini:

Claude:

In general, Claude 3.5 seems very powerful. I think Claude 3 Sonnet was already equivalent to GPT-4o, but I don’t think y’all are ready for that conversation. Claude 3.5 seems behind on vision, though.

Gemini did not do well here, but it did perform very well in a separate test where I used video. Still waiting on other models to support video.
