  2. Inside the Legal Tussle Between Authors and AI: “We’ve Got to Attack This From All Directions”
  3. The field of AI engineering is shaped primarily by privileged, elite, cisgender, able-bodied white men. Consequently, the technologies they produce reflect their perspectives, interests, and concerns. This limited worldview is often presented as "universal," "value-neutral," and "well-meaning," and is therefore imposed on all, including minoritized and vulnerable communities that were overlooked during the development lifecycle.
  4. Blonde Braids Study (2023), Minne Atairu
  5. Such disparities in representation stem, in part, from the fact that particular, non-neutral viewpoints are routinely yet implicitly invoked in the design of tasks and labeling heuristics. For example, a survey of literature on computer vision systems for detecting pornography found that the task is largely framed around detecting the features of thin, nude, femme-presenting bodies, to the exclusion of other kinds of bodies, thereby implicitly assuming a relatively narrow and conservative view of pornography that happens to align with a straight male gaze.
  6. Perrigo, B. (2023). Exclusive: OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic. Time.
  7. The Workers Behind AI Rarely See Its Rewards. This Indian Startup Wants to Fix That
  8. Nelson, A., Friedler, S., & Fields-Meyer, F. (2022). Blueprint for an AI bill of rights: A vision for protecting our civil rights in the algorithmic age. White House Office of Science and Technology Policy, 18.
  9. Crawford, K. (2021). The atlas of AI: Power, politics, and the planetary costs of artificial intelligence. Yale University Press.
  10. Gillotte, J. L. (2019). Copyright infringement in AI-generated artworks. UC Davis L. Rev., 53, 2655.
  11. Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021, March). On the dangers of stochastic parrots: Can language models be too big? 🦜. In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency (pp. 610-623).
  12. Birhane, A. (2021). Algorithmic injustice: a relational ethics approach. Patterns, 2(2).
  13. Noble, S. U. (2018). Algorithms of oppression. New York University Press; Browne, S. (2015). Dark matters: On the surveillance of blackness. Duke University Press.
  14. Open Letter: Artists Using Generative AI Demand Seat at Table from US Congress
  15. Class-action lawsuit against Stability AI, DeviantArt, and Midjourney for their use of Stable Diffusion
WHAT AI IS, AND ISN'T
The term Artificial Intelligence (AI) often conjures up images of the sentient, omnipotent machines portrayed in science fiction films like The Matrix.1 In contrast, real-world AI systems perform specialized, non-fantastical tasks within well-defined domains. This toolkit focuses on tools powered by Generative AI, a subfield of AI that excels at detecting patterns in large-scale datasets and generating new outputs from them. Specifically, we explore its applications in image generation using Text-to-image (T2I) algorithms.
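To make "text in, image out" concrete, the short sketch below shows how a T2I model might be invoked in code. It is a minimal illustration, assuming the open-source Hugging Face diffusers library and one publicly released Stable Diffusion checkpoint (model identifiers can change over time); other T2I systems expose similar interfaces.

```python
# A minimal sketch of text-to-image generation, assuming the open-source
# Hugging Face `diffusers` library. The model identifier below is one
# publicly released checkpoint and is illustrative, not an endorsement.
import torch
from diffusers import StableDiffusionPipeline

# Download the pretrained model weights (several GB on first run).
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # image synthesis is far faster on a GPU

# A text prompt is the only required input; the model returns a synthesized image.
image = pipe("a blue braided hairstyle, studio portrait, 35mm film").images[0]
image.save("output.png")
```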

Throughout this toolkit, we use the term 'AI' in reference to technologies engineered to synthesize images. Occasionally, we refer to the outputs as 'artworks.' In so doing, we align with Rolling's (2013) definition: "A work of art is like a theory. A theory is a set of interrelated constructs represented in a distinguishable manner or form, the major function of which is to describe, explain, and/or interpret the variables and variability of a phenomenon or experience within the world."

WHAT IS IMAGE GENERATION?
Generative AI is fueled by human-generated data. Platforms like Flickr, YouTube, and Instagram have become robust data resources for researchers across industry and academia. The training data may include:

A selfie you may have posted on a social media platform.2
An image on your website.3
A story on a news site.4
A book you may have authored.5

Although large corporations generate substantial profits from this data, the original creators and rightsholders are almost never compensated,5 which has in turn elicited a slew of copyright lawsuits.

The development process begins with the assembly of massive web-scale datasets, each comprising billions of images and their corresponding textual descriptions. AI researchers use automated web-crawling techniques to amass these datasets. The collection phase is followed by a computationally demanding "training" process, which spans several months. During this phase, researchers apply advanced statistical methods to teach the AI model to identify and replicate the semantic relationships between words and their visual representations.
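In practice, much of the image-text pairing comes from HTML itself, where an image often sits next to a caption or carries an alt attribute. The sketch below is a toy illustration of that idea, not any lab's actual pipeline: it fetches a single page and collects (image URL, alt-text) pairs, assuming the third-party requests and beautifulsoup4 libraries. Production crawls operate over billions of pages.

```python
# A toy sketch of how image-text pairs are harvested from the web. Real
# pipelines work at the scale of billions of pages, but the core idea is
# pairing an image URL with nearby text such as its HTML alt attribute.
# Assumes the third-party `requests` and `beautifulsoup4` libraries.
import requests
from bs4 import BeautifulSoup

def collect_image_text_pairs(page_url: str) -> list[tuple[str, str]]:
    """Return (image_url, alt_text) pairs found on a single web page."""
    html = requests.get(page_url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for img in soup.find_all("img"):
        src, alt = img.get("src"), img.get("alt", "").strip()
        if src and alt:  # keep only images that come with usable caption text
            pairs.append((src, alt))
    return pairs

if __name__ == "__main__":
    # Hypothetical example page; any URL with captioned images would do.
    for url, caption in collect_image_text_pairs("https://example.com"):
        print(caption, "->", url)
```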

Post-training, text-to-image models6 may undergo further refinement to better align with specific performance goals or ethical standards. This phase, often referred to as "fine-tuning," typically involves additional training on a narrower dataset. This step is crucial for addressing potential biases in the model and improving its ability to handle sensitive or complex scenarios responsibly. Once fine-tuned, the model acquires the capability to synthesize images that not only mirror characteristics observed in the dataset, but also plausibly assemble disparate visual concepts in novel ways.
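The mechanics of fine-tuning can be shown at toy scale. The runnable sketch below is a simplified analogy, not a text-to-image pipeline: it loads a small pretrained image classifier from torchvision, freezes most of its weights, and briefly continues training on a stand-in "curated dataset" of random tensors. The principle carried over to T2I models is the same: start from pretrained weights and refine them on narrower data at a low learning rate.

```python
# A minimal, runnable sketch of fine-tuning: load pretrained weights, freeze
# the backbone, and train only a new final layer on a small new dataset.
# The random tensors below stand in for a curated dataset; a real T2I
# fine-tune follows the same pattern at vastly larger scale.
import torch
from torch import nn
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT)  # start from pretrained weights
for p in model.parameters():
    p.requires_grad = False                          # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, 2)        # new head for the narrow task

optimizer = torch.optim.AdamW(model.fc.parameters(), lr=1e-4)  # low learning rate
loss_fn = nn.CrossEntropyLoss()

# Stand-in "curated dataset": 32 random images with binary labels.
images = torch.randn(32, 3, 224, 224)
labels = torch.randint(0, 2, (32,))

model.train()
for step in range(10):                               # a few refinement steps
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss={loss.item():.3f}")
```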

COLOSSAL DATASETS


This section contains information that some may find distressing.

The breadth and depth of a dataset are crucial factors in developing equitable and robust generative systems. Access to web-scale data has enabled the training of several field-shaping models: CLIP, trained on 400 million image-text pairs; Imagen, trained on 860 million image-text pairs; Flamingo, trained on 2.3 billion images and short videos paired with text; DALL-E 2, trained on 650 million images; and Stable Diffusion, trained on 600 million captioned images. These models have been shown to learn visual and language representations that outperform previous state-of-the-art models.

There has been growing concern regarding the degree and manner of representation of different sociodemographic groups within datasets. Consider the case of LAION-5B, a dataset that has been instrumental in the development of various photorealistic text-to-image generators. LAION-5B comprises an extensive, uncurated pool of 5 billion text-image pairs. While the sheer size of the dataset has enabled significant advances in the field, research has shown that its uncurated nature is problematic. Studies have identified a glaring under-representation of darker-skinned subjects relative to lighter-skinned subjects overwhelmingly sourced from Western countries; a disproportionate association between words describing queer identities and text labeled as "toxic"; and the presence of underage explicit content (Thiel, 2023), non-consensual intimate imagery (Birhane et al., 2021), and toxic, hateful, aggressive, racist, sexist, homophobic, transphobic, and pornographic content (Birhane et al., 2024), with disproportionate implications for communities minoritized along racial, gender, and socio-economic lines.
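Audits like those cited above often begin with simple counting: measuring how often different groups or terms appear in captions rather than assuming balance. The sketch below is a deliberately simplified, hypothetical illustration of that first step; it scans a local CSV of scraped (url, caption) pairs and tallies matches against a hand-made word list. Published audits of LAION use far more rigorous methods, including trained classifiers and human review.

```python
# A simplified sketch of one step in a dataset audit: tallying how often
# captions match terms from a hand-made word list. The CSV file and word
# list are hypothetical; real audits use validated lexicons, trained
# classifiers, and human review rather than naive keyword matching.
import csv
from collections import Counter

IDENTITY_TERMS = {"woman", "man", "black", "white", "asian", "queer"}

def audit_captions(path: str) -> Counter:
    """Tally identity-term frequencies across captions in a url,caption CSV."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):         # expects a 'caption' column
            words = set(row["caption"].lower().split())
            counts.update(words & IDENTITY_TERMS)
    return counts

if __name__ == "__main__":
    print(audit_captions("pairs.csv"))        # hypothetical file of scraped pairs
```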
HUMAN & ENVIRONMENTAL COSTS


What are the human2 and environmental costs of developing and sustaining generative systems? Because generative systems are prone to bias, misinformation, and toxicity, the current technology relies on millions of invisible, underpaid data workers (based in the majority world) to improve performance. This manual, labor-intensive process, known as reinforcement learning from human feedback (RLHF), requires thousands of hours of work, often involving exposure to and categorization of toxic content. During this process, data workers are assigned tasks that include assessing and selecting text snippets generated by AI to evaluate their resemblance to human-generated content. The choices made by these workers are then used to refine the algorithm, guiding the system towards producing more 'human-like' outputs. This method is not restricted to refining language; it is also employed to filter out "graphic details like child sexual abuse, bestiality, murder, suicide, torture, self harm, and incest" that the system may generate. While this work may seem mundane, it has demonstrable implications for the mental well-being of the workers involved.
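At the heart of this feedback process is a simple statistical idea: when a worker prefers output A over output B, that comparison becomes a training signal. The sketch below shows one common formulation, a pairwise (Bradley-Terry-style) preference loss; the toy linear "reward model" and random features are stand-ins, and real RLHF pipelines score full text or images with large networks and add further stages.

```python
# A minimal sketch of how worker preferences become a training signal: a
# pairwise (Bradley-Terry-style) loss pushes the model to score the
# human-preferred output above the rejected one. The linear "reward model"
# and random feature vectors are toy stand-ins for a real pipeline.
import torch
from torch import nn

reward_model = nn.Linear(128, 1)           # toy stand-in for a reward model
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-3)

# Stand-in features for (preferred, rejected) output pairs labeled by workers.
preferred = torch.randn(64, 128)
rejected = torch.randn(64, 128)

for step in range(100):
    r_pref = reward_model(preferred)       # scores of worker-preferred outputs
    r_rej = reward_model(rejected)         # scores of rejected outputs
    # Maximize the margin r_pref - r_rej: prefer what the workers preferred.
    loss = -torch.nn.functional.logsigmoid(r_pref - r_rej).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```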

As previously noted, AI models require large-scale datasets for training, deployment, and ongoing maintenance. This requirement translates into a significant demand for computational power, with attendant environmental implications. The sourcing of minerals and raw materials essential to the computational infrastructure often disproportionately affects communities in the majority world. Data centers, needed to train and store AI models, are colossal consumers of energy: they account for approximately 2% of global electricity consumption, produce significant carbon dioxide (CO2) emissions, and require millions of liters of freshwater for cooling.
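To make these figures concrete, the back-of-the-envelope calculation below estimates emissions from a hypothetical training run. Every input (GPU count, power draw, duration, overhead factor, grid carbon intensity) is an illustrative assumption, not a measurement of any real system.

```python
# Back-of-the-envelope CO2 estimate for a hypothetical training run.
# All inputs are illustrative assumptions, not measurements of any real model.
NUM_GPUS = 256                 # assumed GPU count
GPU_POWER_KW = 0.4             # assumed average draw per GPU (400 W)
TRAINING_DAYS = 30             # assumed duration
PUE = 1.5                      # assumed data-center overhead (cooling, networking)
GRID_KG_CO2_PER_KWH = 0.4      # assumed grid carbon intensity

energy_kwh = NUM_GPUS * GPU_POWER_KW * TRAINING_DAYS * 24 * PUE
co2_tonnes = energy_kwh * GRID_KG_CO2_PER_KWH / 1000

print(f"Estimated energy: {energy_kwh:,.0f} kWh")       # ~110,592 kWh
print(f"Estimated emissions: {co2_tonnes:,.1f} t CO2")  # ~44.2 tonnes
```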