Image Inputs vs Text Inputs

When you create prompts in BetterPrompt, you can collect information from users in two main ways: image inputs and text inputs.

Designing the right mix of both is key to getting reliable results, especially for visual workflows.

What text inputs are best for

Text inputs are free‑form fields where the runner types information, such as:

A description of their product or idea.
Instructions about tone, style, or length.
Context about the audience or use case.

Use text inputs when:

The information is easiest to express in words.
You need structured instructions (e.g. "write a 3‑paragraph summary").
The user's data is not naturally an image (plans, goals, constraints, copy).

Text inputs are flexible, easy to validate, and work with any text‑based model.

What image inputs are best for

Image inputs let the runner upload or attach one or more images for the prompt to use.

Use image inputs when:

The prompt is designed around user photos, for example letting users recreate or transform an image using their own picture.
It's a "picture is worth a thousand words" scenario: the details you want can't be fully described in text, so supplying the actual image gives the model essential context.
The visual itself is critical to the request, whether for generating, recreating, or closely matching what the user provides.

In these cases, text alone can't capture all the relevant details; giving the model direct access to the image usually leads to noticeably better results.

Choosing between image and text (or using both)

Many strong prompts use both image and text inputs together.

You can think about it like this:

Only text: Good when the user can describe everything the model needs to know.
Only image: Rare; useful for pure "describe this image" or "recreate the image with my photo" tasks.
Image + text: Ideal for most design, branding, and visual workflows:
- Image for raw visual context.
- Text for instructions, goals, and constraints.

Example:

Image input: a screenshot of a landing page.
Text inputs:
- {{ target_audience }} – who the page is for.
- {{ primary_goal }} – what the page should get users to do.
- {{ tone }} – how the copy should feel.
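
To make the placeholder idea concrete, here is a minimal Python sketch of how text inputs like these could be substituted into a prompt template. It is an illustration only: the template wording, the render function, and the choice to fail on a missing value are assumptions made for this example, not BetterPrompt's actual implementation.

```python
import re

# Hypothetical template text; the {{ name }} placeholders mirror the text inputs above.
TEMPLATE = (
    "Review the attached landing page screenshot.\n"
    "Audience: {{ target_audience }}\n"
    "Goal: {{ primary_goal }}\n"
    "Tone: {{ tone }}"
)

# Matches {{ name }} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, values: dict[str, str]) -> str:
    """Replace each {{ name }} with its value; raise KeyError if one is missing."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"missing text input: {name}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    print(render(TEMPLATE, {
        "target_audience": "first-time founders",
        "primary_goal": "start a free trial",
        "tone": "friendly and direct",
    }))
```

The image itself is not part of the template: it travels alongside the rendered text, which is why the text inputs can stay focused on instructions, goals, and constraints.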

Making image inputs easy to use

When you add an image input to your prompt:

Be explicit in the label about what to upload (e.g. "Your selfie photo").
Mention any constraints like aspect ratio or file type, and rely on built‑in tools (like cropping) where available.
If your prompt assumes a specific type of image (e.g. logo sheet vs. full UI), say that clearly in the description.

Clear guidance reduces bad uploads and leads to much more consistent results.
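
As a rough illustration of what stating constraints up front can look like, the sketch below models an image input's label and constraints and checks an upload against them. ImageInputSpec and check_upload are hypothetical names invented for this example; BetterPrompt's own validation and cropping tools may work differently.

```python
from dataclasses import dataclass


@dataclass
class ImageInputSpec:
    """Hypothetical description of one image input (not a BetterPrompt API)."""
    label: str                          # what the runner is asked to upload
    allowed_extensions: tuple[str, ...]  # lowercase, without the dot
    min_aspect: float                   # smallest allowed width / height
    max_aspect: float                   # largest allowed width / height


def check_upload(spec: ImageInputSpec, filename: str, width: int, height: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means the upload fits."""
    problems = []

    # File type check based on the extension only (an assumption for simplicity).
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in spec.allowed_extensions:
        problems.append(f"file type must be one of: {', '.join(spec.allowed_extensions)}")

    # Aspect ratio check.
    if width <= 0 or height <= 0:
        problems.append("image dimensions must be positive")
    else:
        ratio = width / height
        if not (spec.min_aspect <= ratio <= spec.max_aspect):
            problems.append(
                f"aspect ratio {ratio:.2f} is outside {spec.min_aspect}-{spec.max_aspect}"
            )

    return problems


# Example: a portrait-oriented selfie input.
selfie = ImageInputSpec(
    label="Your selfie photo",
    allowed_extensions=("jpg", "jpeg", "png"),
    min_aspect=0.7,
    max_aspect=1.0,
)
```

Whatever form the check takes, the point is the same as in the guidance above: the label and description should tell the runner these limits before they upload, so the check rarely has to fail.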

Referenced images vs. user image inputs

You can also combine referenced images (attached once by you, the author) with image inputs provided by users:

Use a referenced image when every run should use the same visual baseline (for example, a brand style guide).
Use a user image input when each run needs its own image (for example, "the design you want critiqued").

In your hidden prompt logic, explain how the model should use both:

You are given a reference image that shows our brand style and layout.
You are also given a user‑uploaded image of their current design.

Compare the user's design to the reference and provide concrete suggestions for improvement.

Design your prompts so text inputs handle what the user wants, while image inputs (and referenced images) provide what the model needs to see to do a great job.
