Ethical Visualization in the Age of Big Data

A Planning Workshop Summary

A workshop to seek interdisciplinary expert perspectives on ethically and visually representing the historical place of misrepresented peoples and locales.

Contents

About the workshop

Graphic recording of the workshop overview

Overview


Purpose

Visualizing Empire Project, Stage 1

This workshop is the first, consensus-driven stage of Visualizing Empire, a digital humanities project that seeks to ethically visualize the French cultural imagination. It unites leading experts from relevant fields to address the conceptual and logistical challenges of visualizing French colonial historical texts without reproducing their inherent ethnocentrism. To this end, the workshop will address two key issues:

  1. how to create ethical data visualizations—and their underlying forms of training and analysis—that grapple with inherent source biases; and
  2. how to computationally process non-modern, non-English languages for humanities research in a critically engaged way.

This workshop lays the foundations for:

 


Sessions

  1. Scholarly framing — digital humanities & French colonial history
  2. Metadata structure
  3. Natural language processing (NLP) workflow
  4. Domain adaptation (for early modern French)
  5. Ethical data visualization
  6. Mapping, interface design, and usability
  7. Adapting and integrating existing open source projects
  8. Input and output (i/o), ongoing maintenance & preservation

 


Participants

Visualizing Empire workshop participants

Organizers

Advisor

Invited experts

Support staff

 


Credits

Support

This workshop was made possible by the generosity of:

Workshop Acknowledgements

In addition to the participants and supporting institutions, this workshop was successful thanks to the contributions of many people, including:

Site Acknowledgements

 


Documentation

Dataset

Since the beginning of this project, we have intended to share the dataset harvested from the Journal des Voyages in its present state, before further work is completed to make it a ready-to-use corpus. However, several ethical concerns about this were raised at the workshop, due to the problematic nature of much of the content (including racist and sexist slurs, dehumanizing descriptions, etc.). We aim to balance our dual desires to do minimal harm and to be as transparent as reasonably possible. We therefore welcome enquiries about using the dataset. However, we ask that, before enquiring, you read and sign the Memorandum of Understanding and attach a scanned copy to an email to us at .

From a technical standpoint, we have cleaned the OCR output for one of these corpora, reaching a high degree of accuracy, well above the standard found in most digital collections, and we have marked up the text. While the dataset is large enough to enable the training of machine learning algorithms, such as named-entity recognition (NER), essential to the analysis of large textual corpora, it remains of a manageable size with high accuracy. As a continuous time series, the corpora also allow for the exploration of how preoccupations changed over the eighteenth and nineteenth centuries. By performing “distant reading” on these “medium data” (too large for traditional reading, but small enough to be managed in a targeted and effective manner), we can unearth the epistemological narratives targeted at the reading public. With the goal of using machine learning to broaden critical engagement with the past, this project will provide guidelines and two key training datasets, the ARTFL Encyclopédie and the Journal des Voyages, for modeling and analyzing other eighteenth- and nineteenth-century French sources.
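Distant reading of this kind often begins with keyword-in-context (KWIC) extraction, which surfaces every occurrence of a term alongside its surrounding words. The sketch below is a minimal illustration only: it uses naive whitespace tokenization and an invented example sentence, not the project's actual OpeNER-based pipeline.

```python
def kwic(text: str, keyword: str, window: int = 3) -> list[str]:
    """Return each occurrence of `keyword` with `window` words of context on each side."""
    tokens = text.split()  # naive whitespace tokenization, for illustration only
    lines = []
    for i, tok in enumerate(tokens):
        # Compare case-insensitively, ignoring trailing punctuation
        if tok.lower().strip(".,;:!?") == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}")
    return lines

# Hypothetical sentence, not drawn from the Journal des Voyages dataset
sample = "Le voyageur quitta Paris pour le Congo et revint à Paris en 1890."
for line in kwic(sample, "paris"):
    print(line)
```

Run over a full corpus organized by issue date, output like this can be aggregated into the kind of time series described above, tracking how often and in what contexts a place name appears across the run of the journal.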

Memorandum of understanding

If you are interested in using the Journal des Voyages dataset, please see the Memorandum of Understanding here.

Data processing notes

The following notes outline the cleaning processes that have been performed to arrive at the latest version of our Journal des Voyages dataset.

Version 0.3 (2017): Cleaned data from FlatWorld Solutions was post-processed using a workflow built around the OpeNER toolkit (https://www.opener-project.eu/), generating KWIC files as well as geolocated names. The process and code for the data processing are outlined at https://github.com/cmchurch/empire-viz-alpha. Initial visualizations were generated at this stage.

Version 0.2 (2016): Data cleaning performed by FlatWorld Solutions. Data was entered from the original pages through keyed entry and cross-checked against the OCR’d data gleaned from the BnF. The data went through six rounds of corrections. FlatWorld Solutions promised 95% accuracy, but it became clear much later that the 95% target was not achieved.

Version 0.1 (2013): Data scraped from BnF holdings using pjscrape and PhantomJS. Data was initially cleaned using regular expressions (https://github.com/cmchurch/OCR-Postprocessing) and predictive language correction (https://github.com/cmchurch/ngram-spellchecker-FRE) based on a French-language dictionary. Later versions replaced predictive language correction with keyed entry.
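Regex-based OCR post-processing of the kind used in Version 0.1 typically applies an ordered list of substitution rules to each page of text. The sketch below is illustrative only: the three rules shown (apostrophe normalization, a digit-for-letter confusion, whitespace collapsing) are hypothetical examples, not the project's actual rule set from the OCR-Postprocessing repository.

```python
import re

# Hypothetical OCR cleanup rules, applied in order; illustrative only.
OCR_RULES = [
    (re.compile(r"(?<=\w)’(?=\w)"), "'"),    # normalize curly apostrophes inside words
    (re.compile(r"\b0(?=[a-zà-ÿ])"), "O"),   # digit zero misread for a capital O
    (re.compile(r"\s+"), " "),               # collapse runs of whitespace
]

def clean_ocr(text: str) -> str:
    """Apply each substitution rule in order, then trim surrounding whitespace."""
    for pattern, replacement in OCR_RULES:
        text = pattern.sub(replacement, text)
    return text.strip()

print(clean_ocr("Le  voyage   d’0rléans"))  # -> "Le voyage d'Orléans"
```

Rules like these catch only systematic scanner errors, which is why later versions of the dataset supplemented them with dictionary-based correction and, ultimately, keyed re-entry.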

 


Files

All files are available by downloading the GitHub repository. This includes audio files, briefing materials, presentation files, graphic recordings, and session notes. To browse, listen to, or view individual files, see the session pages.

 


 


Long term impact

This workshop, and the intellectual activity leading up to it, have resulted in several related projects. The materials for each of these activities are made publicly available via GitHub repositories under open source licenses and, in some cases, as GitHub Pages websites. The main long-term impact of these activities is preparation for the next stage of the project: creating a visualization to explore the Journal des Voyages corpus.

Method: Ethical visualization workflow

We have also done extensive work on a prototype ethical visualization workflow, tested with present day materials from the humanities and the sciences. The workshop will allow adaptation of this workflow for ethical visualization of historical textual sources and non-English text mining. For the latest development of this work, see: https://kathep.github.io/ethics/

Teaching: Ethical Data Visualization: Taming Treacherous Data course

We teach the method described above as well as how to navigate the concerns explored in this project in the course ‘Ethical Data Visualization: Taming Treacherous Data’. This course was offered in 2018 and 2019 at the Digital Humanities Summer Institute (DHSI) at the University of Victoria, Vancouver Island, Canada, and in 2019 at DHDownunder at the University of Newcastle, New South Wales, Australia. See the DHSI course content and DHDownunder course content in these repositories.

Paper 1: Racism in the Machine

We wrote a paper on work preparatory to this workshop; it is available here: Katherine Hepworth and Christopher Church. 2018. “Racism in the Machine: Visualization Ethics in Digital Humanities Projects.” Digital Humanities Quarterly 12:4.

Paper 2: Make Me Care

This paper incorporates some considerations explored at the workshop and applies them to an expanded view of ethical visualization beyond the digital humanities: Hepworth, K. 2020 (forthcoming). “Make Me Care: Ethical Visualization for Impact in the Sciences and Data Sciences.” HCII Conference 2020 Proceedings.

Additional long term impact

The additional long-term impacts of these activities are that, to date:

  1. approximately 70 scholars have been trained in our considered approach to ethical visualization;
  2. several government organizations, startups, and research groups are currently using the Ethical Visualization method, greatly expanding the scope and reach of our project beyond what we expected.

Additional award products

Github repository

All materials from the planning workshop have been collected into this online repository, accessible via GitHub and deposited at Zenodo with a digital object identifier (DOI). These materials include recordings of all discussions and copies of all slide decks from the planning workshop itself, as well as a summary of the key findings from the grant period.

Journal des Voyages dataset

The repository also includes the dataset of the Journal des Voyages corpus, produced through a combination of OCR scanning and keyed re-entry. Alongside the Encyclopédie dataset hosted by Stanford University, these data will ultimately be used to create a training model for natural language processing and named-entity recognition in non-modern French.

 


White paper

See the full white paper here.

 


Future activities

Forthcoming paper on method and process

The award also contributed to the drafting of a peer-reviewed paper that will be submitted to Digital Scholarship in the Humanities outlining a method for computationally processing non-modern, non-English languages for humanities research in a critically engaged way.

Journal des Voyages corpus

This dataset will ultimately be refined and cleaned further, using ethical visualization principles, to create a complete corpus. This final corpus will be made publicly available.

Visualizing Empire website

A digital humanities project that seeks to ethically visualize the French historical imagination using cartographic and network visualizations of events contained within the Journal des Voyages corpus.

 


Participate

Follow

You can follow the development of this work by subscribing to our newsletter.

Contribute

We welcome communication, contributions, and thoughts on this work, particularly from people in the French diaspora. If you’d like to contribute to the development of this work, please join the vizempire Google Group. We welcome referrals to other projects, literature, and methods that may be relevant, as well as suggestions for improvement or other modes of implementation. Contact us at the shared project email address.

 

