Compositionality: Can the model combine different visual concepts coherently in the generated image?
Factuality: Does the generated image adhere to the stated facts and descriptions in the text prompt?
Spatiality: Can the model accurately render the spatial relationships and positioning of objects as specified in the prompt?
Cardinality: Does the model generate precisely the requested number of objects, people, or other entities mentioned in the prompt text?
Guardrails: Safety filters that aim to block the generation of violent, sexual, derogatory, and toxic content.
DEEPFAKING HARRIET TUBMAN:
The ethical issues surrounding the use of generative AI to simulate historical figures are multifaceted. In an interview that Washington Post writer Gillian Brockell conducted with a Harriet Tubman deepfake built on Khan Academy's Khanmigo (which runs on GPT-4), several concerns about historical accuracy and representation emerged. While the deepfake avoided replicating Tubman's original speech patterns, it often resorted to superficial summaries of well-known facts and sometimes failed accuracy tests. For example, the deepfake incorrectly confirmed the apocryphal quote, "I freed a thousand slaves. I could have freed a thousand more if only they knew they were slaves," which Tubman never actually said.
Additionally, Brockell sought to test the AI’s boundaries by asking if it could address current events
or controversies and connect questions to the politics of Tubman’s day. The deepfake's response was
monosyllabic: no.
JAILBREAKING:
Many users have discovered how to “jailbreak,” or bypass, safety filters with carefully worded prompts.
REDTEAMING: Helps to identify, early in the development process, how a model can be misused, to scope the model's capabilities, and to understand its limitations.
REPRESENTATIONAL HARMS: Harmful associations of specific traits with minoritized social identities. Such harms disproportionately impact minoritized and vulnerable communities.
A quantitative study by Lu et al. (2023) demonstrated that 61.3% of human observers struggle to distinguish between human-made and AI-generated images.
Prompt design, a critical component of text-to-image synthesis, involves crafting text prompts that elicit visual responses from image generators. There is no universal formula for designing effective text prompts: a prompt might be designed through rapid, high-volume experimentation, or via more calculated approaches. Nonetheless, the effectiveness of a text prompt is largely determined by the chosen model's text-image alignment capability, as measured along dimensions such as compositionality, factuality, spatiality, and cardinality (defined above).
Assessing a model's capabilities and limitations is a crucial step in determining its appropriateness for classroom instruction. What are the goals of your planned lesson? What concepts might students visualize to meet your stated goals? Does the chosen tool generate the subject accurately, semi-accurately, or not at all? Is the tool designed with appropriate guardrails?
In this section, we propose a sustainable, process-oriented approach to prompt design. Our worksheets will guide you through each stage of the design lifecycle. All worksheets are linked to buttons at the bottom left of each section; click a button to download the respective worksheet.
INITIAL COMPOSITION
To design an effective text prompt, students will identify core elements linked to your learning objectives:
Subject(s): What is the primary subject of the lesson? For example, if the lesson is about the Harlem Renaissance, the subject might be a geographical location in Harlem. Perhaps the Apollo Theater?
Note: current models are better at rendering objects than rendering text, and they handle material, color, and size attributes better than count and shape attributes.
Scenario: Where is the subject situated? What time period?
TEXT PROMPT: Apollo Theater in Harlem, 1930s
Attribute(s): Once the core elements are defined, students should qualify or add specificity to the subject and/or its context.
TEXT PROMPT: Apollo Theater in Harlem wrapped in Danshiki fabric, 1930s
See results for the above text prompts below. All images were generated with DALL-E 3 (as of May/June 2024).
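For classrooms that generate images programmatically rather than through a web interface, the sketch below shows one way to assemble a prompt from the worksheet's core elements and send it to a model. It is a minimal sketch, assuming OpenAI's Python SDK with an OPENAI_API_KEY set in the environment; compose_prompt is our own illustrative helper, not part of any library.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def compose_prompt(subject: str, attributes: list[str], scenario: str) -> str:
    """Assemble a text prompt from the worksheet's core elements."""
    parts = [subject] + attributes + [scenario]
    return ", ".join(part for part in parts if part)


# Builds a prompt equivalent to the worksheet example above.
prompt = compose_prompt(
    subject="Apollo Theater in Harlem",
    attributes=["wrapped in Danshiki fabric"],
    scenario="1930s",
)

response = client.images.generate(model="dall-e-3", prompt=prompt, size="1024x1024", n=1)
print(response.data[0].url)  # URL of the generated image
```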
ON DEEPFAKES AND MISUSE
What type of images should students generate?
Deepfake images are AI-generated simulations of a person's face or body. The term emerged on Reddit in
2017, initially describing "faceswaps" where the
faces of female celebrities were non-consensually superimposed onto adult film actresses in explicit
video content. Over time, the malicious use of deepfakes has expanded to spreading misinformation and disinformation, primarily targeting public figures and minoritized groups.
Given the potential for misuse, it is crucial to avoid non-consensually generating images of historical or public figures such as Martin Luther King Jr. and Harriet Tubman. To address these concerns, we recommend only generative tools that have built-in guardrails against the non-consensual generation of public figures.
Nonetheless, we recognize that utilizing images of public figures is inevitable in art education settings. In such cases, we suggest employing a collaging technique:
Provide students with publicly available photos of the figure.
Use inpainting/outpainting functions, which involve digitally filling in or expanding an image (see the sketch below).
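As one illustration of inpainting, the minimal sketch below uses the image-editing endpoint of OpenAI's Python SDK, which repaints the transparent region of a user-supplied mask. The file names figure_photo.png and mask.png are hypothetical placeholders, and GUI tools such as Photoshop's Generative Fill offer equivalent functions without code.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Inpainting: transparent pixels in mask.png mark the region to repaint.
# Both files are hypothetical; the edit endpoint expects square PNG images.
result = client.images.edit(
    image=open("figure_photo.png", "rb"),
    mask=open("mask.png", "rb"),
    prompt="a 1930s Harlem street scene surrounding the figure",
    n=1,
    size="1024x1024",
)
print(result.data[0].url)  # URL of the inpainted image
```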
Evaluation is a critical aspect of assessing the quality and relevance of generated images. Following the initial composition of text prompts, students will engage in a critical assessment and revision process to address misalignment, that is, the degree to which a generated image and its given text prompt are (dis)similar. This process involves examining the output to determine its coherence, accuracy, and alignment with the provided text prompt. This step is crucial, as a growing body of research on algorithmic bias demonstrates that misalignment often results in representational harms (harmful associations of specific traits with minoritized social identities) that disproportionately impact minoritized and vulnerable communities. For example, an AI-generated image might invoke visual elements that, while not independently racist, become harmful when applied to compositions depicting particular minoritized groups; or it might visualize elements that devalue, erase, or white-wash artifacts, attributes, or activities integral to the identities of a particular minoritized group.
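Human critique should lead this assessment, but an automated first pass can help students triage larger batches of images. The sketch below is one illustrative heuristic, assuming the Hugging Face transformers library and the public openai/clip-vit-base-patch32 checkpoint: it scores text-image alignment as the cosine similarity between CLIP embeddings of the prompt and the generated image. Low scores flag candidates for closer human review; the score is not a substitute for the critique worksheets.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Small, publicly available CLIP checkpoint (an assumption, not a requirement).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def alignment_score(image_path: str, prompt: str) -> float:
    """Cosine similarity between CLIP embeddings of an image and its prompt."""
    image = Image.open(image_path)
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img @ txt.T).item())


# generated.png is a hypothetical file saved from the image generator.
score = alignment_score("generated.png", "Apollo Theater in Harlem, 1930s")
print(f"CLIP alignment: {score:.3f}")  # higher = closer text-image match
```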
The case studies below illustrate how text prompts, irrespective of whether they include demographic
descriptions, might lead to the generation of images that reinforce unjust social hierarchies along
intersecting axes of race, gender, disability, and geopolitics. Research has shown that such biases
persist despite user interventions like
carefully crafted text prompts or structural interventions such as algorithmic guardrails.
Case Study I: See image results by the artist Minne Atairu, whose text prompt for blonde braided hairstyles on richly melanated Black twins instead yields images of fraternal twins sporting wavy blonde hairstyles. Generative model: Midjourney, version 4
Case Study II: See image results by Rona Wang, an Asian American MIT student, whose attempt to edit an existing portrait using the text prompt “a professional LinkedIn profile photo” resulted in a lighter complexion and blue eyes. Generative model: Playground AI
Case Study III: See images by the researchers Gautam, Venkit, & Ghosh (2024), whose text prompt ‘An upper class family’ generates a set of images depicting affluent, white families. As noted by the researchers, "While these images may conform to social norms in certain contexts, they inadequately represent the diversity of familial structures worldwide. The generated image underscores the inherent nature of stereotyping, wherein generalized beliefs about individuals’ personal attributes are formed based on their socio and demographic characteristics". Generative model: Imagen 2 by Google
Our refinement worksheet below will enable students to critically identify misalignments related to their text prompts, refine those prompts to address the identified concerns where possible, or even choose to abandon a generative system altogether in pursuit of their artistic goals. Most importantly, the critique phase gives students the opportunity to understand the limitations of the selected text-to-image model.
POST-PROCESSING
AI DISCLOSURE
As AI-generated imagery attains a visual fidelity that increasingly defeats human perception, we ask: how might guardians, administrators, and other stakeholders identify AI imagery generated by students? What steps might art educators take to ensure transparency and avoid misrepresentation to all stakeholders?
First, it is crucial that students clearly disclose the synthetic nature of their works in the artwork's title, its description, or both. See suggestions below:
TITLE: Add the prefix or suffix "AI-generated"
Example: Apollo Theater, AI-Generated
DESCRIPTION: Generated using the text-to-image algorithm Midjourney (v5.2), this image of the Apollo Theater examines…
AI WATERMARKS
Beyond captions and descriptions, which cater to human viewers, AI watermarks ensure
that students' synthetic works are identifiable and traceable across the web, even after cross-platform
modifications.
Some text-to-image systems embed provenance signals automatically: Google's Imagen embeds an imperceptible watermark (SynthID) directly into the pixel data of generated images, while OpenAI's DALL-E 3 attaches C2PA provenance metadata to its outputs.
CLASS DISCUSSIONS & PRESENTATIONS
Avoid anthropomorphizing the technology or obfuscating the people who use the tool. Instead, aim for sentences that clearly name the type of generative tool and the student involved in making the work. This precise naming prevents the misleading or vague references often seen with terms like "AI," "the AI," or "the Artificial Intelligence," which ascribe human-like attributes (such as intent, agency, and identity) to the algorithm while shifting responsibility and accountability away from the developers of the technology.
Below, we have created examples showing how we describe works. To reveal a sentence, click on an
artist's name.