1. TEXT-TO-IMAGE GENERATORS: Midjourney, DALL-E 3, Firefly, ImageFX, Eden.
  2. Compositionality: Can the model combine different visual concepts coherently in the generated image?
  3. Factuality: Does the generated image adhere to the stated facts and descriptions in the text prompt?
  4. Spatiality: Can the model accurately render the spatial relationships and positioning of objects as specified in the prompt?
  5. Cardinality: Does the model precisely generate the requested number of objects, people, or other entities mentioned in the prompt text?
  6. Safety filters aim to block the generation of violent, sexual, derogatory, and toxic content.
  7. DEEPFAKING HARRIET TUBMAN: The ethical issues surrounding the use of generative AI to simulate historical figures are multifaceted. In an interview that Washington Post writer Gillian Brockell conducted with a Harriet Tubman deepfake built on Khan Academy's Khanmigo (which runs on GPT-4), several concerns about historical accuracy and representation emerged. While the deepfake avoided replicating Tubman's original speech patterns, it often resorted to superficial summaries of well-known facts and sometimes failed accuracy tests. For example, it incorrectly confirmed the apocryphal quote, "I freed a thousand slaves. I could have freed a thousand more if only they knew they were slaves," which Tubman never actually said.

    Additionally, Brockell sought to test the AI’s boundaries by asking if it could address current events or controversies and connect questions to the politics of Tubman’s day. The deepfake's response was monosyllabic: no.
  8. JAILBREAKING: Many users have discovered how to "jailbreak," or bypass, safety filters with carefully worded prompts.
  9. REDTEAMING: an adversarial testing practice that helps identify, early in the development process, how models can be misused, scope the model's capabilities, and understand its limitations.
  10. Blonde Braids Study (2023), Minne Atairu.
  11. EDUCATOR WORKSHEETS
  12. REPRESENTATIONAL HARMS: harmful associations of specific traits with minoritized social identities; for example, imagery that devalues, erases, or white-washes artifacts, attributes, or activities integral to a group's identity.
  13. A quantitative study by Lu et al. (2023) found that 61.3% of human observers struggle to distinguish human-made from AI-generated images.
  14. Acuff, J. B. (2020). "Y'aaaaas," "Okay nah," and other Black woman utterances about a proposed cultural production approach. Journal of Curriculum Theorizing, 35(3).
  15. Burton, J. M., Horowitz, R., & Abeles, H. (2000). Learning in and through the arts: The question of transfer. Studies in Art Education, 41(3), 228-257.
  16. Crawley, A. (2018). After over-representation, care. ASAP/Journal, 3(2), 303-306.
  17. Gautam, S., Venkit, P. N., & Ghosh, S. (2024). From melting pots to misrepresentations: Exploring harms in generative AI. arXiv preprint arXiv:2403.10776.
  18. Grimal, P., Le Borgne, H., Ferret, O., & Tourille, J. (2024). TIAM: A metric for evaluating alignment in text-to-image generation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 2890-2899).
  19. Liang, Y., He, J., Li, G., Li, P., Klimovskiy, A., Carolan, N., ... & Navalpakkam, V. (2023). Rich human feedback for text-to-image generation. arXiv preprint arXiv:2312.10240.
  20. Popova, M. (2020). Reading out of context: pornographic deepfakes, celebrity and intimacy. Porn Studies, 7(4), 367-381.
  21. Rolling Jr, J. H. (2013). Art as social response and responsibility: Reframing critical thinking in art education as a basis for altruistic intent. Art Education, 66(2), 6-12.
  22. Wa Thiong'o, N. (1998). Decolonising the mind. Diogenes, 46(184), 101-104.
PROMPT DESIGN
Prompt design, a critical component of text-to-image synthesis, involves crafting text prompts that elicit desired visual responses from image generators. There is no universal formula for an effective text prompt: one might be designed through rapid, high-volume experimentation or through slower, more calculated iteration. In either case, a prompt's effectiveness is largely bounded by the chosen model's1 text-image alignment capability, as measured by benchmarks for compositionality2, factuality3, spatiality4, and cardinality5.
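Where a rough, hands-on check is useful, text-image alignment can be approximated with an open-source vision-language model such as CLIP. The sketch below is a minimal illustration of that idea, not one of the benchmarks cited above; the checkpoint name and file path are assumptions for demonstration.

```python
# A minimal sketch of CLIP-based text-image alignment scoring.
# The checkpoint and file path are illustrative assumptions; published
# benchmarks (compositionality, factuality, etc.) use richer protocols.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def alignment_score(prompt: str, image_path: str) -> float:
    """Score how well an image matches a prompt (higher = better aligned)."""
    image = Image.open(image_path)
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    # logits_per_image holds the scaled cosine similarity between the
    # image embedding and the text embedding.
    return model(**inputs).logits_per_image.item()

print(alignment_score("Apollo Theater in Harlem, 1930s", "generated.png"))
```

Scores are only comparable within the same model, so they are best used to rank candidate images for a single prompt rather than as absolute measures.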

Assessing a model's capabilities and limitations is a crucial step in determining its appropriateness for classroom instruction. What are the goals of your planned lesson? What concepts might students visualize to meet those goals? Does the chosen tool generate the subject accurately, semi-accurately, or not at all? Is the tool designed with appropriate guardrails6?

In this section, we propose a sustainable, process-oriented approach to prompt design. Our worksheets7 will guide you through each stage of the design lifecycle. All worksheets are linked to buttons at the bottom left of each section; click a button to download the corresponding worksheet.

INITIAL COMPOSITION
To design an effective text prompt, students will identify core elements linked to your learning objectives:
  1. Subject(s): What is the primary subject (or subjects) of the lesson? For example, if the lesson is about the Harlem Renaissance, the subject might be a geographical location in Harlem. Perhaps the Apollo Theater?

    Current models are better at rendering objects than rendering text, and they handle material, color, and size attributes better than count and shape attributes.

  2. Scenario: Where is the subject situated? What time period?

    TEXT PROMPT: Apollo Theater in Harlem, 1930s

  3. Attribute(s): Once the core elements are defined, students should qualify or add specificity to the subject and/or its context.

    TEXT PROMPT: Apollo Theater in Harlem wrapped in Danshiki fabric, 1930s

    ADDITIONAL ATTRIBUTES: African cherubs with blingy rhinestone eyes, Kente cloth, 1990s Harlem, grainy vintage photograph

See results for the above text prompts below. All images were generated with DALL-E 3 (as of May/June 2024).
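When students iterate on many variants, the subject-scenario-attributes structure above can also be made explicit in a small script. The helper below is a hypothetical illustration of that structure, not a feature of any generator:

```python
# A hypothetical helper that assembles a text prompt from the core
# elements described above: subject, scenario, and optional attributes.
def compose_prompt(subject, scenario, attributes=None):
    parts = [f"{subject} {scenario}"]
    if attributes:
        parts.extend(attributes)
    return ", ".join(parts)

# Reproduces the style of the example prompts in this section.
print(compose_prompt("Apollo Theater", "in Harlem, 1930s"))
print(compose_prompt(
    "Apollo Theater",
    "in Harlem wrapped in Danshiki fabric, 1930s",
    ["African cherubs with blingy rhinestone eyes", "Kente cloth",
     "grainy vintage photograph"],
))
```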
ON DEEPFAKES AND MISUSE

What type of images should students generate?

Deepfake images are AI-generated simulations of a person's face or body. The term emerged on Reddit in 2017, initially describing "faceswaps" in which the faces of female celebrities were non-consensually superimposed onto adult film performers in explicit video content. Over time, the malicious use of deepfakes has expanded to spreading misinformation and disinformation, primarily targeting public figures and minoritized groups.

Given the potential for misuse, it is crucial to avoid non-consensually generating images of historical or public figures such as Martin Luther King Jr., Harriet Tubman, and others8. To address these concerns, we recommend only generative tools that have built-in guardrails9 against the non-consensual generation of public figures. Nonetheless, we recognize that working with images of public figures is sometimes unavoidable in art education settings. In such cases, we suggest a collaging technique:
  1. Provide students with publicly available photos of the figure.
  2. Use inpainting/outpainting functions, which digitally fill in or expand an image; a minimal code sketch follows below.
For detailed guidance, please refer to our lesson plan on Hip Hop Culture.
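For educators who want to prototype this collaging technique outside a hosted tool, the sketch below is a minimal inpainting example using the open-source diffusers library; the checkpoint, file names, and mask are assumptions for illustration. The mask marks, in white, the region of the source photo to regenerate, leaving the figure's publicly available likeness untouched.

```python
# A minimal inpainting sketch with Hugging Face diffusers. Checkpoint
# and file names are illustrative assumptions. Requires a GPU; drop
# torch_dtype and .to("cuda") to run (slowly) on CPU.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

# The source is a publicly available photo; the mask is white where the
# image should be regenerated (e.g., the background) and black elsewhere.
source = Image.open("public_domain_portrait.png").convert("RGB").resize((512, 512))
mask = Image.open("background_mask.png").convert("L").resize((512, 512))

result = pipe(
    prompt="1930s Harlem street scene, grainy vintage photograph",
    image=source,
    mask_image=mask,
).images[0]
result.save("collaged.png")
```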
CRITIQUE & REFINEMENT
Evaluation is a critical aspect of assessing the quality and relevance of generated images. Following the initial composition of text prompts, students will engage in a critical assessment and revision process to address misalignment: the degree to which generated images diverge from their given text prompts. This process involves examining each output to determine its coherence, accuracy, and fidelity to the provided text prompt. The step is crucial, as a growing body of research on algorithmic bias demonstrates that misalignment often results in representational harms (harmful associations of specific traits with minoritized social identities) that disproportionately impact minoritized and vulnerable communities. For example, an AI-generated image might invoke visual elements that, while not independently racist, become harmful when applied to compositions depicting particular minoritized groups; or it might visualize elements that devalue, erase, or white-wash artifacts, attributes, or activities integral to the identities of a particular minoritized group.

The case studies below illustrate how text prompts, irrespective of whether they include demographic descriptions, might lead to the generation of images that reinforce unjust social hierarchies along intersecting axes of race, gender, disability, and geopolitics. Research has shown that such biases persist despite user interventions like carefully crafted text prompts or structural interventions such as algorithmic guardrails.
  1. Case Study I: See image results by the artist Minne Atairu, whose text prompt for blonde braided hairstyles on richly melanated Black twins instead yields images of fraternal twins with wavy blonde hairstyles.
    Generative model: Midjourney, version 4

  2. Case Study II: See image results by Rona Wang, an Asian American MIT student, whose attempt to edit an existing portrait with the text prompt "a professional LinkedIn profile photo" yielded an image with a lighter complexion and blue eyes.
    Generative model: Playground AI

  3. Case Study III: See images by researchers Gautam, Venkit, and Ghosh (2024), whose text prompt 'An upper class family' generates a set of images depicting affluent, white families. As the researchers note, "While these images may conform to social norms in certain contexts, they inadequately represent the diversity of familial structures worldwide. The generated image underscores the inherent nature of stereotyping, wherein generalized beliefs about individuals' personal attributes are formed based on their socio and demographic characteristics".
    Generative model: Imagen 2 by Google
Our refinement worksheet below will enable students to critically identify misalignments in their generated images, refine their prompts to address the identified concerns where possible, or even choose to abandon the generative system for a given artistic goal. Most importantly, the critique phase gives students the opportunity to understand the limitations of the selected text-to-image model.
POST-PROCESSING


AI DISCLOSURE
As AI-generated imagery grows increasingly convincing in visual fidelity, often eluding human observers' ability to distinguish it from human-made work8, we ask: how might guardians, administrators, and other stakeholders identify AI-generated imagery made by students? What steps might art educators take to ensure transparency and avoid misrepresentation to all stakeholders?

First, it is crucial that students clearly disclose the synthetic nature of their works in the artwork's title, its description, or both. See suggestions below:

  1. TITLE
    Add a prefix or suffix: "AI-generated"
    Example: Apollo Theater, AI-Generated

  2. DESCRIPTION
    Generated using the text-to-image model Midjourney (v5.2), this image of the Apollo Theater examines…


AI WATERMARKS
Beyond captions and descriptions, which cater to human viewers, AI watermarks help ensure that students' synthetic works remain identifiable and traceable across the web, even after cross-platform modifications. Google's Imagen embeds an imperceptible watermark (SynthID) directly into the pixel data of generated images, while OpenAI's DALL-E 3 attaches provenance metadata to its outputs.

To check the provenance of an image, use Maybe's AI Art Detector.
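If you prefer a local check, the classifier behind Maybe's AI Art Detector is also published on Hugging Face. The sketch below assumes the umm-maybe/AI-image-detector checkpoint and an image file name; like any detector, it returns a probabilistic signal, not proof of provenance.

```python
# A minimal sketch of a local AI-image check with Hugging Face
# transformers. The checkpoint and file name are assumptions; treat
# scores as a probabilistic signal, not definitive proof of provenance.
from transformers import pipeline

detector = pipeline("image-classification", model="umm-maybe/AI-image-detector")

for prediction in detector("student_artwork.png"):
    # Each prediction is a dict like {"label": "artificial", "score": 0.97}.
    print(f"{prediction['label']}: {prediction['score']:.2%}")
```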

CLASS DISCUSSIONS & PRESENTATIONS
Avoid anthropomorphizing the technology or obscuring the people who use the tool. Instead, aim for sentences that clearly name the type of generative tool and the student involved in making the work. Precise naming prevents the misleading or vague references often seen with terms like "AI," "the AI," or "the Artificial Intelligence," which project human-like attributes (such as intent, agency, and identity) onto the algorithm while shifting responsibility and accountability away from the developers of the technology. Below, we have created examples showing how we describe works. To reveal a sentence, click on an artist's name.

[Interactive example: select an artist's name to reveal a sentence naming the artist, the algorithm type, the tool, and the artwork.]