• 2 hours
  • Easy

Free online content available in this course.

course.header.alt.is_certifying

Got it!

Last updated on 8/19/24

Analyze Images with ChatGPT

Chapter banner

Did you think ChatGPT only worked with text?

Of course not! In theory, generative AI can work with a wide range of formats:

  • Text

  • Image

  • Sound

  • Video

You can even use text to generate an image, use an image to generate a video, or even use a video to generate text. 🤪

A few months after its launch, ChatGPT added images to its operating modes. We can get it to “read” images, and you’ll see how impressive it is! 😳

 Describe an Image in Detail

There are several ways to use ChatGPT’s “read images” feature.

The most obvious way is to send it an image and ask for a detailed description of it. 🤓

You can use your smartphone’s camera with the ChatGPT application:

Screenshot of the ChatGPT mobile app.
You can upload an image from the camera or your image library to the ChatGPT mobile app.

You can also use the web interface by simply clicking the “attachment” icon to the left of the text box:

Screenshot of the ChatGPT interface
You can also upload an image from your computer from ChatGPT’s web interface.

Select a photo (preferably not a private one, as everything sent to ChatGPT is used to train it!) and ask it to describe the image.

Screenshot of the ChatGPT interface
I can upload a photo to ChatGPT, and it describes everything it sees!

As shown below, I’ve uploaded the photo followed by the text “Describe this photo in detail” so it knows what to do with the photo. The description is really impressive, right?! 😲

🧔🏻‍♂: [Image] Describe this photo in detail

🤖: The photo shows a spacious room with a stone floor and three large arches opening onto a view of the calm blue sea and sky. There are three classical-style statues on pedestals, one lying in the center and the other two standing at each side. Two columns with ornate capitals stand at either side of the central statue. Natural light illuminates the scene, creating a peaceful and serene atmosphere. The architecture evokes classical elegance, and the sea view suggests a coastal location.

It works with just about anything—give it a try!

This type of functionality can be very helpful for people with impaired vision. You can use it to generate alt text to describe images that you post on the internet with the following HTML:

<img src="xxx" alt="Detailed description by ChatGPT" />

Apply Image Reasoning

That was impressive enough, but it can be taken further. We’re going to ask ChatGPT to do some work and apply reasoning from an image.

For example, you can:

  • 📗 take a photo of a book cover and ask for a summary of it.

  • 🌿 take a photo of a plant and ask what kind of plant it is and how to care for it.

  • 📐 take a photo of a geometry problem and ask for help to solve it 🫢 (Yes!).

What a great time saver! Now let’s try with a book:

Screenshot of the ChatGPT interface
ChatGPT provides a summary of the book, just from the cover!

Basically, don’t just ask ChatGPT what it “sees” in the image, but make it work with the image you upload. Most of the time, it understands what it sees and even shows quite an in-depth knowledge of the subject!

Get Help from an Image

Think of ChatGPT as a super assistant that can help you solve problems. For example, this works well for me when it comes to DIY questions. 😆

One example from OpenAI is to use it to ask for help with your bike!

Screenshot of the ChatGPT Mobile App
ChatGPT: “Help me to lower my bike seat”

Try it with some everyday questions you might have. Here are a few ideas to inspire you:

  • Fashion: “Using this photo of my closet, suggest appropriate outfits for the season to help me choose what to wear today.”

  • Food: “Using this photo of my fridge, suggest some simple, healthy recipes and plan my meals for the week.”

  • Gardening: “Using this photo of my outdoor space, show me how I could design my yard with plants adapted to my climate.”

That’s just a small sample of the possibilities! It’s up to you to ask for help.

Translate a Road Sign

Real-time on-site translation is another useful feature. You can take a photo of a sign or menu in a foreign country and ask ChatGPT to translate it. Remember: It speaks every language! 🤩

For example, I can upload this sign:

A road sign in Thailand
A road sign in Thailand

I can then ask what it means:

🧔🏻‍♂️: [Image]Translate this sign into English

🤖: The sign in the photo is in Thai. It says “ห้ามเข้า” which translates to “No Entry” in English. The bottom sentence is too blurry to read precisely, but it seems to be an explanation or addition to the main instruction.

This is already very useful for someone who doesn’t speak a word of Thai. 🫣

As you can see, ChatGPT isn’t perfect, since it can’t read the text on the bottom, which appears to be readable.

Convert an Image Into Another Format

Want to push the limits even further? Let’s do it! 😃

Why not upload a website mockup and ask it to generate the code for us? In theory, even a rough sketch can be used to create the website architecture.

Download this mockup (credit Shazzad Hossain on Dribbble):

A website mockup
A website mockup

And now ask ChatGPT to reproduce the mockup using code:

🧔🏻‍♂️: [Image] Generate HTML code to reproduce the following design

🤖: Here is a structure you can use as a starting point:

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>MARCHY Business</title>
<style>
    /* Add your CSS styles here */
    body {
        font-family: Arial, sans-serif;
        margin: 0;
        padding: 0;
        box-sizing: border-box;
    }

Sometimes, ChatGPT may not generate the code at first. Also, the code may not always be perfect or fully complete. From experience, it sometimes turns out okay, but you may have to ask it to work on it again.

You can ask it for more step-by-step details, indicating what you want it to complete or redo.

Let’s Recap

  • ChatGPT is not limited to processing text; it can also analyze images with its multimodal capabilities.

  • You can ask it to describe images, which is particularly useful for detailed descriptions or to assist people who are visually impaired.

  • ChatGPT can apply reasoning skills to an image in order to offer practical suggestions for everyday life, summaries from book covers, or assistance with geometry problems.

  • AI can be used as a real-time translation tool, interpreting signs or menus in foreign languages.

  • ChatGPT can convert images into code (e.g. convert a website mockup into HTML), although it may need some tweaking to achieve the desired result.

Impressive, isn’t it? After analyzing images, let’s have a look at ChatGPT’s ability to analyze data contained in files, using GPT-4.

Example of certificate of achievement
Example of certificate of achievement