The Global Labours of AI and Data Intensive Systems

The remarkable capacities of AI-infused and data intensive systems are regularly presented as emblematic of technoscientific innovation and progress. Far less recognised is the human labour required to sift through and sort data, and ultimately train these systems. This is a labour distributed well beyond the presumed centres of technological innovation, and spun into and strewn across geographically distributed regions in the "Global South". This workshop will bring together scholars and practitioners interested in and conducting research on the hidden labours that lie behind AI and data intensive systems. Participants will have the opportunity to share perspectives or results of their research and, through highly exploratory engagements and dialogues, set out a critical mode of inquiry and future directions with and for a nascent community.

Submitting

The submission deadline is Wednesday 15 September 2021 (midnight, anywhere on earth).

Submissions are invited in a wide variety of media forms, ranging from two-page position pieces to pictorials or short (TikTok-like) videos.

Workshop organisers will dedicate a day to collectively review the submissions, with a target of 15 to 20 participants. Content will be judged on its relevance to the call and its capacity to provoke discussion and critical inquiry.

Selection will follow an inclusive model: as organisers, we will especially welcome work that represents a diverse community of scholarship and practice.

Notifications of acceptance will be sent by 24 September 2021.

Organisers

Benedetta Catanzariti is a PhD candidate in Science, Technology and Innovation Studies at the University of Edinburgh. Her research focuses on the social and organizational dynamics shaping the design of emotion recognition AI, as well as the data curation practices involved in the creation of face datasets.

Srrayva Chandhiramowuli is a Research Associate at the Machine Intelligence and Robotics Centre at the International Institute of Information Technology Bangalore, India. Her research, as part of the Humanising Automation project, focuses on the state of AI-based automation in India and how it might shape the future of AI-enabled work.

Suha Mohamed leads partnerships and strategy at Aapti, a public research institution where part of her research focuses on the nexus of artificial intelligence, data rights and labour, specifically the possibilities of collective negotiation and bargaining in the data economy.

Sarayu Natarajan is the Founder of Aapti Institute. She thinks about technology and society at Aapti, particularly state, citizenship, work, and AI—and about politics all the time.

Shantanu Prabhat is an HCI researcher working in the Google AI India and People AI Research team.

Noopur Raval is a postdoctoral researcher at the Tandon Engineering School at New York University. Her research focuses on the global political economy of algorithmic platforms and how they remediate social contexts.

Alex Taylor co-directs the Centre for Human-Computer Interaction at City, University of London. His interests are in how technologies are co-constitutive of forms of knowing and being, and, as a consequence, provide a basis for radical transformations in society.

Ding Wang is a senior HCI researcher in the Google AI India and People AI Research team. Her research focuses on the practices, processes and organisation of work (e.g. collection, annotation and documentation) on the data that is essential to ML and AI systems.

Planning

Duration:

The workshop will be hosted online using an accessible conferencing system, and held over one day. Timings will be designed to limit the time participants must spend online, both to reduce fatigue and to enable a global audience. It's anticipated the workshop will run for 4 to 5 hours.

Activities:

Activities will be designed to engage participants in extended and progressively refined discussions before, during and after the workshop.

Pre-Workshop:— Accepted submissions will be shared with the workshop participants in advance with the expectation that they will be read or viewed ahead of the workshop. We will choose an appropriate platform to host the submissions so that comments and questions can be shared, asynchronously, and preserved after the workshop. As organisers we will use the submissions and subsequent commentaries to produce a range of provocations—short statements, narratives, visual materials, etc. inviting examination or queering of significant themes. These provocations will also be shared in advance of the workshop and allow for a further layering of ideas and reflections from participants.

Dialogue groups:— At the workshop, after brief introductions, much of the time (approx. 2 to 3 hours, including breaks) will be spent in small dialogue groups made up of changing combinations of three to four people. These small groups will work experimentally with the provocations and participants’ commentaries. Groups will iteratively produce various media (notes, sketches, collages, etc.) in a shared workshop notebook (e.g. Notion or Miro) to interject in, reflect and capture views ranging from the specific tasks behind AI and data intensive systems to the wider global structures they operate in.

Collective reflections:— The dialogue groups will be interspersed with two short periods for collective reflection and a longer closing activity where all participants will be able to look across and reflect on the workshop’s outputs. The timings and details of the two synchronous activities (this and above) have been kept loose as we know from experience that exploratory workshops such as this one need to adapt to what is working best for participants.

Post-workshop:— Through the media-rich records produced, discussion and reflection will continue after the workshop. Particular focus here will be on the goals set out below.

Goals:

Aligned with the background and context for the workshop presented in the introduction, the activities described above will progressively work towards the following goals:

Producing a catalogue of perspectives and case studies reflecting current research related to the global labours of AI and data intensive systems.
Defining clearer ideas of and possibly frameworks for such labours and the structures affecting them, e.g. labelling, annotation, content moderation and micro-tasking, skill, expertise, value systems, global and capital flows, etc.
Developing the basis for a critical mode of inquiry. This will establish a language and range of conceptual objects intended to help with future research into the global labours of AI and data intensive systems, and, crucially, frame this research in terms of equity, fairness and justice for those who are too often placed at the margins of technological innovation.

Long Text

Introduction

This workshop will bring together scholars and practitioners interested in and conducting research on the hidden labours behind AI and data intensive systems. Attention will be given to the global character of these labours. This will mean examining how the unacknowledged yet still essential work of AI is distributed well beyond the presumed centres of technological innovation, and spun into and strewn across geographically distributed regions in the euphemistically termed “Global South”.

Brought together by this focus and a corresponding mixture of intersecting interests, workshop participants will have the opportunity to share perspectives or results of their research with a nascent community. Topics of concern will include but not be limited to the labours that revolve around data labelling, content moderation, microtasking, platform economies and, more broadly, globalised gig-work, and the concentration of such labour across cities (and sometimes rural regions) of India, Kenya, Vietnam, Venezuela, the Philippines, and so on.

The sharing of perspectives and materials will:

help to compile a diverse corpus of empirical research,
give further clarity to terms such as labelling, annotation, content moderation and micro-tasking, and
set the stage for a critical mode of inquiry to be used, collectively, by the community.

Stimulated by the shared insights and materials, and this critical mode of inquiry, the workshop will go on to use the format of multiple and intersecting dialogues between participants to discuss and speculate on plausible futures. The dialogues and attendant speculative futures will concentrate on re-imagining the uneven geographies of technoscientific innovation. They will seek to build on rather than obfuscate the global supply chains of data, knowledge and skill that have enabled a thriving tech sector in the Global North (see [7] for one such example). And they will project design and policy interventions for more just and equitable practices for the global labourers enabling AI and data intensive systems.

Global Labours

The remarkable capacities of AI-infused and data intensive systems are regularly lauded not only by the digerati but in the popular press. Browse Wired or TechCrunch and in minutes you’ll encounter an article crediting the power of AI or the deluge of data for supplying the ‘hot sauce’ in the next big social networking startup, surpassing some milestone in driverless transport or meeting the pressing challenges faced in health screening.

Far less recognised, beyond some notable exceptions (e.g. [3, 5, 7, 10]), is the human labour required to sift through and sort data, and ultimately train these systems. Crowdsourcing platforms, worldwide, employ thousands of workers to read texts, view images and video, and label data to produce the models AI systems rely on. A straightforward example here is the labelling of faces in photographs. Although many reports of computer vision systems tout the incredible performance of algorithms that identify faces, what is rarely given attention is the work involved in labelling the data required to train and refine computer vision models. Complicating such labours are the norms that are imposed on labellers. This enforced homogeneity of norms (used, for example, in the normalisation of image classification) masks frictions between multiple layers of meaning and, often, culturally sensitive value systems [8, 11].

Those workers sifting, sorting and labelling data can encounter further complexities and ambiguities when, for example, doing the work of content moderation [4]. Again, accounts of technological innovation in content moderation will point to new algorithmic techniques for filtering language and, in some cases, to important tools for reducing genuine harms in society, such as identifying graphic or violent imagery, or hate speech. Yet overlooked is a global workforce that supplements such automation, or trains and validates the AI, and, crucially, the impacts on such a workforce. Tellingly, reports of the physical demands and, perhaps more important, the psychological damage that accompany such labours have only recently reached a wider public [1].

Altogether, these examples and others prefigure a globalised gig-economy, a structural configuration that concentrates wealth and authority (although not necessarily agency) in the Global North and relies on a little recognised, underpaid, disenfranchised, and largely unregulated workforce in the Global South. This structural agglomeration of actors, software systems and platforms, and flows of data further cements an elitist technoscience—what the anthropologist Anna Tsing refers to as the “globe-crossing capital and commodity chains” of capitalist forms [12, p. 4].

The specifics of the labours that constitute gig-work and the wider political economies that these labours are a part of will form the context for the Global Labours of AI and Data Intensive Systems workshop. Participants will bring accounts of labour or informed perspectives in order to compile a corpus of related and current work in the area.

Re-imagined Futures

Past work in CSCW and beyond has drawn attention to precisely these hidden and often exploitative labours [2, 5, 7, 10]. Building on this, our own work investigates a contemporary moment where the global gig-economy is coming to rely on new organisational actors emerging in the Global South [9]. For example, small platform startups like iMerit are creating new models to compete with the likes of Amazon’s Mechanical Turk. Based on a very different business model, iMerit has moved away from sourcing and providing microtasks for its labellers and towards an overtly ethical model that prioritises worker training and developing local expertise (in the case of iMerit, in India).

This changing landscape reveals an evolving and expanding matrix of dependencies, intensifying Tsing’s crossings, chains and connections. It highlights, for example, tensions between, on the one hand, the systemic deskilling of data work and, on the other, the growing need to recognise the value of "AI dataset expertise" [6]. Invited, here, is a re-centering of global data workers as situated and agile experts. The extensive domain and bias sensitivity training they must undergo and the ways they capitalise on existing forms of expertise must be seen as processes through which they constantly reproduce themselves as fast-adapting, flexible and responsive actors that are crucial to wider global structures. Thus, what is needed is not just a recognition that gig workers are hidden from view or marginalized in innovation and regulatory processes, but directions that acknowledge and reward the attention, discretion and care given to data work.

Altogether, what the fluxes in crossings, chains and connections suggest is the possibility of reconfiguring the status quo and, in particular, of reworking capital flows, agency, authority and values. For example, with startups like iMerit, we see the potential for decentering those actors that have hitherto dictated the structural configurations of labours behind AI and data intensive systems and, in turn, thinking more generatively about recognition and reward of the distributed and varied global labours.

It’s this that we will take as a starting point for a critical mode of inquiry in the proposed workshop. The possibility for change will be used to set the conditions for speculating on alternate futures, as well as for intervening in or even refusing prevailing capitalist forms. Whether models like iMerit’s go far enough to achieve more just and equitable reconfigurations will help to stimulate a line of inquiry. Similarly, the modes of moving forward, for example by intervening in platform design or advising on corporate or national labour policies, will form the basis for articulating and reflecting on what alternate futures we should be seeking.

References

[1] Angel Au-Yeung. 2021. At Risk Of Losing Their Jobs, Facebook Content Moderators In Ireland Speak Out Against Working Conditions. Retrieved June 11, 2021 from URL.

[2] Paško Bilić. 2016. Search algorithms, hidden labour and information control. Big Data & Society 3, 1 (2016), 2053951716652159.

[3] Kate Crawford. 2021. The Atlas of AI. Yale University Press.

[4] Tarleton Gillespie. 2020. Content moderation, AI, and the question of scale. Big Data & Society 7, 2 (2020).

[5] Mary L Gray and Siddharth Suri. 2019. Ghost work: How to stop Silicon Valley from building a new global underclass. Eamon Dolan Books.

[6] Ben Hutchinson, Andrew Smart, Alex Hanna, Emily Denton, Christina Greer, Oddur Kjartansson, Parker Barnes, and Margaret Mitchell. 2021. Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (Virtual Event, Canada) (FAccT ’21). Association for Computing Machinery, New York, NY, USA, 560–575. URL.

[7] Lilly C. Irani and M. Six Silberman. 2013. Turkopticon: Interrupting Worker Invisibility in Amazon Mechanical Turk. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Paris, France) (CHI ’13). Association for Computing Machinery, New York, NY, USA, 611–620. URL.

[8] Milagros Miceli, Tianling Yang, Laurens Naudts, Martin Schuessler, Diana Serbanescu, and Alex Hanna. 2021. Documenting Computer Vision Datasets: An Invitation to Reflexive Data Practices. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (Virtual Event, Canada) (FAccT ’21). Association for Computing Machinery, New York, NY, USA, 161–172. URL.

[9] Sarayu Natarajan, Kushang Mishra, Suha Mohamed, and Alex S. Taylor. 25 Feb 2021. Just and equitable data labelling: Towards a responsible AI supply chain. Technical Report. Aapti Institute, Bangalore, India. URL.

[10] Noopur Raval and Paul Dourish. 2016. Standing Out from the Crowd: Emotional Labor, Body Labor, and Temporal Labor in Ridesharing. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing (San Francisco, California, USA) (CSCW ’16). Association for Computing Machinery, New York, NY, USA, 97–107. URL.

[11] Morgan Klaus Scheuerman, Kandrea Wade, Caitlin Lustig, and Jed R. Brubaker. 2020. How We’ve Taught Algorithms to See Identity: Constructing Race and Gender in Image Databases for Facial Analysis. Proc. ACM Hum.-Comput. Interact. 4, CSCW1, Article 058 (May 2020), 35 pages. URL.

[12] Anna Tsing. 2004. Friction: An Ethnography of Global Connection. Princeton University Press.