Groskopf, Christopher. “The Quartz Guide to Bad Data.” Quartz.

  • This guide lists 41 common issues that plague data and examines each one, including whether it is a problem you can solve yourself, one your source should solve for you, one that calls for expert help, or one that cannot be solved, in which case the data should not be used.

Lincoln, Matthew D. “Tidy Data for the Humanities.” Matthew Lincoln, PhD (blog), May 26, 2020.

  • You can listen to the Price Lab for Digital Humanities’ podcast episode with Lincoln, read his write-up of the same ideas, or both; each is based on a four-day course that was cancelled because of Covid. Lincoln compares tidying data to the show Tidying Up with Marie Kondo. In both cases, tidiness is not absolute but contextual: what matters most in your home, or in your research? Likewise, once things are tidy, the good stuff you can now easily find will bring you joy. He explains why tidy data matters: tidiness doesn’t mean that the data are representative, accurate, or complete enough for your research, but it does make it vastly easier to iterate through different filters and analyses, which often reveals holes or problems in your data more quickly than a less normalized format would. Finally, he discusses two models of database design, source-oriented and method-oriented, and encourages historians to explore the delicate balance between them.

Rawson, Katie, and Trevor Muñoz. “Against Cleaning.” Curating Menus, July 7, 2016.

  • We talk a lot about cleaning data, but ‘cleaning’ is not a universal, explicitly defined set of actions. Instead, what it involves varies with the professional practices, materials, personal histories, and tools of the researchers doing it. This collective acceptance of a connotative term, “cleaning,” suggests two assumptions: first, that researchers in many domains consider the consequences of whatever is done during this little-discussed portion of the process (often estimated at 80 percent of the work) to be sufficiently limited or bounded that they do not threaten the ultimate value of any findings; and second, relatedly, that there is little to be gained from describing more precisely the parts of the research process that currently fall under the rubric of “cleaning.” In some fields, data cleaning is indeed shorthand for an established protocol, but not in digital history, so we benefit greatly from being transparent about exactly what we are doing when we ‘clean’ data. Rawson and Muñoz propose that we scrutinize, perhaps reimagine, and almost certainly rename this part of our practice, and they use their own project analyzing data from historical menus to illustrate how ‘cleaning’ can hide or create issues of scalability and diversity.