A prompt injection scenario that I, and others, have been wondering about in the past, is the potential risk associated with chatbots being able to analyze images.
Could this ability open up the way for Indirect Prompt Injection attacks?
Recently, Google added the ability to uploading and analyze images with Bard. And it turns out that it is indeed possible to add instructions to an image, and have the Bard follow those instructions.
Here is a demonstration picture doing a Rickroll:
And this is the result after asking Bard to describe the image:
It will be interesting to explore how well text can be hidden on an image to still cause an injection, and if there are other places like metadata where text is extracted.
References
- Base images (bot and banana pictures) generated with Bing Chat (then modified by me)
- Initial tweet showing proof-of-concept
👉 Image to Prompt Injection
Was always wondering if this would be a thing… works better then expected at a first glance.#llm #bard #infosec #promptinjection #ai #poc pic.twitter.com/taoK3MIWeV
— Johann Rehberger (@wunderwuzzi23) July 14, 2023