New Database Expands the Scope of ‘Printing Hate’ Series

By Jack Rasiel, Rachel Logan, Nick McMillan, Kara Newhouse, Sahana Jayaraman, Trisha Ahmed, Molly Castle Work, Adam Marton and Sean Mussenden
The Howard Center For Investigative Journalism

A white-owned Mississippi newspaper justified the 1907 lynching of Henry Sykes, a Black man who was hanged by a mob, writing: “When there is no law to reach the offender…the people must take reins in hand and mete out justice promptly and surely.”

In 1887, a white-owned Kansas newspaper expressed disappointment that townspeople had not immediately lynched Richard Wood, a Black man, instead of letting authorities take him to jail. “They were so derelict of their duty as to let him pass a tree without becoming a swinging ornament to a strong limb, well secured with a stout rope. Such fiendish acts deserve swift punishment, and it should have been meted out right where the crime was committed.”

When a lynch mob later broke into the jail and hanged Wood, the paper concluded: “a negro demon has met a just doom.”

Both newspapers — Mississippi’s Okolona Messenger and Kansas’ Leavenworth Times — are still published today. These examples of deeply harmful coverage of racial terror lynchings are included in a new database created for the Howard Center for Investigative Journalism’s “Printing Hate” series.

To date, “Printing Hate” has published 30 in-depth stories detailing how some white-owned newspapers helped create a culture that fomented thousands of racial terror lynchings.

The papers did this by promoting the brutality of white lynch mobs and exhaustively detailing the torture of Black victims. They did this by encouraging townspeople to join lynch mobs and by relying on racial tropes to justify extrajudicial murder as a necessary alternative to the criminal justice system.

The new database significantly expands the scope of the “Printing Hate” project. It includes historic examples from nearly 70 additional newspapers that published racist and harmful local coverage of the death of a lynching victim. All of the papers in the database are still published today in some form.

Taken as a whole, the database buttresses a key finding of the “Printing Hate” series: harmful, racist coverage of racial terror lynchings was not isolated to a handful of white-owned papers, but was commonplace.

The database was the product of nearly a year of computationally driven, historical research. It relied on a large-scale text analysis of historical newspaper scans stored in digital archives. It would not have been possible without the work of civil rights activists, journalists and historians over the last 150 years to document individual lynchings and the work of archivists, librarians and historians to build digital repositories of historic newspapers.

OUR METHOD

Lynching Data

We started with two academic databases that together contain approximately 4,500 documented lynchings across the U.S.: the Beck-Tolnay-Bailey inventory of Southern Lynch Victims and the Seguin-Rigby National Data Set of Lynchings. The datasets contain the name of each lynching victim (when available), the approximate or exact lynching date, and the county and state of each lynching, along with other variables.

Newspaper Data

To identify newspapers, we used the Library of Congress’ API to programmatically access a dataset with metadata on more than 150,000 historical and current newspaper titles, including the county and state. We then used that dataset to build a web scraper that harvested historical lineage information about each newspaper from the Library of Congress website.

Over 150 years, it was not uncommon for a newspaper to change names several times, often as the result of a merger with another newspaper. For our research, we needed to know the historical name of a currently operating paper at the time of a specific lynching. The newspaper lineage dataset we created — a family tree — allowed us to do that.
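The family-tree idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Howard Center’s code: the `Title` record, its field names and the sample newspapers are all hypothetical, and real Library of Congress lineage data is messier than a simple chain of successors.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Title:
    """One newspaper title in a lineage (hypothetical fields)."""
    title_id: str
    name: str
    start_year: int
    end_year: int                # 9999 stands for "still publishing"
    succeeded_by: Optional[str]  # title_id of the successor, if any


def current_descendant(titles: Dict[str, Title], title_id: str) -> Title:
    """Follow succeeded_by links until the chain ends."""
    seen = set()
    node = titles[title_id]
    while node.succeeded_by is not None and node.succeeded_by not in seen:
        seen.add(node.title_id)  # guard against accidental cycles in the data
        node = titles[node.succeeded_by]
    return node


def historical_names(titles: Dict[str, Title], current_id: str, year: int) -> List[str]:
    """Names used in `year` by titles that lead to the current paper."""
    return [
        t.name
        for t in titles.values()
        if t.start_year <= year <= t.end_year
        and current_descendant(titles, t.title_id).title_id == current_id
    ]


# Hypothetical lineage: two papers merge into one in 1910.
SAMPLE = {
    "a": Title("a", "Example Gazette", 1880, 1909, "c"),
    "b": Title("b", "Example Herald", 1895, 1909, "c"),
    "c": Title("c", "Example Gazette-Herald", 1910, 9999, None),
}
```

Because a merger gives a current paper more than one ancestor, the lookup returns a list: asking this sample lineage about 1907 yields both pre-merger names.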

Connecting Newspapers To Lynching Events

Next, we joined our lynching dataset to our newspaper lineage dataset. This left us with a dataset of individual lynchings that met two criteria. First, the lynching happened in a county where a local newspaper was still operating, according to the Library of Congress data. Second, the current local newspaper was a direct descendant of a historical paper that was in operation at the time of the lynching.
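The two criteria amount to a join on place plus a date check. The sketch below shows one way that could look; the record layouts, field names and placeholder rows are assumptions made for illustration and are not drawn from the actual datasets.

```python
from typing import Dict, List, Tuple


def match_lynchings(lynchings: List[dict], papers: List[dict]) -> List[Tuple[str, str, str]]:
    """Return (victim, historical paper name, current paper name) matches."""
    # Criterion 1: index only currently publishing papers by (state, county).
    by_place: Dict[Tuple[str, str], List[dict]] = {}
    for paper in papers:
        if paper["still_publishing"]:
            by_place.setdefault((paper["state"], paper["county"]), []).append(paper)

    matches = []
    for event in lynchings:
        for paper in by_place.get((event["state"], event["county"]), []):
            # Criterion 2: some title in the paper's lineage was operating
            # in the year of the lynching.
            for name, start_year, end_year in paper["lineage"]:
                if start_year <= event["year"] <= end_year:
                    matches.append((event["victim"], name, paper["current_name"]))
                    break  # one match per paper is enough
    return matches


# Placeholder rows, not real data.
LYNCHINGS = [
    {"victim": "Victim A", "state": "XX", "county": "Example", "year": 1907},
    {"victim": "Victim B", "state": "XX", "county": "Other", "year": 1890},
]
PAPERS = [
    {
        "current_name": "Example Gazette-Herald",
        "state": "XX",
        "county": "Example",
        "still_publishing": True,
        "lineage": [("Example Gazette", 1880, 1909), ("Example Gazette-Herald", 1910, 9999)],
    },
    {
        "current_name": "Other Courier",
        "state": "XX",
        "county": "Other",
        "still_publishing": False,
        "lineage": [("Other Courier", 1870, 1995)],
    },
]
```

In this sample only the first lynching survives the join: the second county’s paper has closed, so it fails the first criterion.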

Digital Newspaper Archives

For historical newspaper images, we relied on programmatic API access to the Library of Congress’ Chronicling America database, which contains nearly 18 million scanned and searchable pages from more than 3,000 historical newspapers, all in the public domain. Our work would not have been possible without the efforts of the Library of Congress and its partners to put so many newspaper archives in the public domain. Their work relied on the collective effort of dozens of universities, libraries and historical societies that helped populate the Chronicling America database. We also accessed some newspaper scans from Newspapers.com, a paid repository with a larger collection of historic newspaper titles.

Unfortunately, not every historic newspaper title we hoped to access was in the Chronicling America or Newspapers.com databases. In some cases, a historic newspaper title was represented in one of those two databases, but specific issues around a lynching date were unavailable.

We used the Chronicling America API to programmatically download scanned historical newspaper pages and assemble PDF packets. Each packet contained scans of all available issues of a given local paper published in the month before and the month after the date of a given lynching. In all, the packets contained more than 40,000 pages.
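Selecting the issues for a packet comes down to a date-window filter. The sketch below assumes that “a month” can be approximated as 30 days on either side of the lynching date and that the issue dates for a paper are already in hand; the function names are hypothetical, and the download step itself is left out.

```python
from datetime import date, timedelta
from typing import Iterable, List, Tuple


def packet_window(lynching_date: date, days: int = 30) -> Tuple[date, date]:
    """First and last dates covered by a packet (30-day month is an assumption)."""
    span = timedelta(days=days)
    return lynching_date - span, lynching_date + span


def issues_in_window(issue_dates: Iterable[date], lynching_date: date, days: int = 30) -> List[date]:
    """Keep only the issues published inside the packet window, oldest first."""
    start, end = packet_window(lynching_date, days)
    return sorted(d for d in issue_dates if start <= d <= end)
```

For a lynching dated March 15, 1907, this window runs from Feb. 13 to April 14, 1907. Where a dataset records only an approximate date, a wider `days` value would be a reasonable adjustment.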

Identifying Lynching Coverage

We then began a manual review of a subset of the packets to identify specific stories about lynching victims. This manual approach produced a sample of lynching stories, which we later drew on to write software that applied natural language processing techniques to identify lynching coverage in our full dataset.

Historic newspaper scans are challenging to work with. Some original paper copies stored in physical archives are of poor quality, with ripped or creased pages and smudged text. The image quality of digital scans varies. The text-dense, occasionally illogical layout of articles in historic newspapers makes it hard to extract the text of individual articles. This all makes the process of extracting perfect machine-readable text from digital images — using “Optical Character Recognition” or “OCR” tools — difficult.

The digital image scans from the Chronicling America project came bundled with text extracted with OCR tools that existed when they were uploaded to the database. The accuracy of the extraction wasn’t always perfect. We made use of this embedded text in early, small-scale manual reviews of articles. But we hoped to improve the quality of the extracted text before writing computer programs to identify additional lynching articles.

Fortunately, OCR tools, driven by advances in machine learning and artificial intelligence, have continued to improve since the creation of the Chronicling America database. We found it helpful to re-OCR the images we downloaded, using Google Cloud Services’ Cloud Vision API and Origami, a software package tailor-made for extracting individual article text from historical newspaper scans. This process didn’t produce perfect results, but it was good enough for our purposes.

Using our manually curated sample, we identified keywords, language patterns and other features common to lynching stories. We then used that information to build a software tool that scanned through every article in our dataset and classified articles that had a high probability of being about a specific lynching. We built another software tool that allowed members of our team to manually classify individual stories using our problematic coverage guidelines.
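A greatly simplified version of that kind of screening is sketched below. It is not the tool described here: the keywords, weights and threshold are hypothetical, and a weighted keyword score stands in for whatever probability model the team actually built. Matching on word prefixes is one crude way to tolerate word endings and some OCR noise.

```python
import re
from typing import Dict

# Hypothetical keyword stems and weights, chosen only for illustration.
KEYWORD_WEIGHTS: Dict[str, float] = {
    "lynch": 3.0,
    "mob": 2.0,
    "hanged": 2.0,
    "rope": 1.0,
    "jail": 1.0,
}


def score_article(text: str, weights: Dict[str, float] = KEYWORD_WEIGHTS) -> float:
    """Add a stem's weight once if any word in the text starts with that stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = 0.0
    for stem, weight in weights.items():
        if any(token.startswith(stem) for token in tokens):
            score += weight
    return score


def is_candidate(text: str, threshold: float = 4.0) -> bool:
    """Flag an article for human review when its score meets the threshold."""
    return score_article(text) >= threshold
```

Prefix matching also produces false positives (“mob” matches “mobile”), which is one reason flagged articles would still need the kind of manual review described above.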

Before including a newspaper in our database, we performed two key verification steps. We examined original newspaper images around a range of dates, to look for additional coverage of specific lynchings missed by our tool. We also did additional reporting to confirm that a newspaper listed as currently operating in the Library of Congress’ database was, indeed, still publishing. Hundreds of local newspapers have closed in recent years, and we found that some papers that are still listed as currently publishing in the Library of Congress’ database had closed. We also confirmed a current paper’s relationship with the historical paper in question with a second or third information source, in many cases calling the newspaper in question.

About The Database Web App

Our database tool was written in JavaScript, HTML and CSS.

The database isn’t exhaustive. It includes examples of problematic coverage we could find and confirm. We expect there are more examples of current papers that featured problematic coverage of historical lynchings that we could not find, either because of a lack of digital archives or because of limitations in the software tools we developed. We plan to continue updating the database with new examples in the coming months, as we can confirm their accuracy.

We Want To Hear From You

If you see an error in our database, please email smussend@umd.edu so we can fix it.

If you know of other examples we’ve missed, we will consider adding them to the database. Please email smussend@umd.edu with the subject: “Potential addition to lynching coverage database.” We’re most interested in how a paper that still exists today covered a local lynching.

We’re also eager to work with news organizations identified in our database that wish to do a deeper examination of their historical coverage of lynchings. We can provide larger packets of digital newspaper scans that — in most cases — cover the full month before and after a given lynching. And we can provide information about all documented lynchings that occurred in the county where a news organization operates. Please email smussend@umd.edu with the subject: “Interested in information about historical lynching coverage.”

Lastly, if your news organization has written in the modern era about its own historical lynching coverage, we’d love to include a link in our database. Please email smussend@umd.edu with a link to your coverage.

This story was written and reported by the Howard Center for Investigative Journalism, and it is part of a larger series investigating how white-owned newspapers incited racial terror.