Data Curation as Publishing for the Humanities

When I think of publishing, I think of preparing and distributing physical products, such as books, journals, and newspapers, for the public to read for free or for a fee. However, digitalization has revolutionized the nature of publishing, allowing authors to develop and disseminate digital works on platforms like Joomag or WordPress. It has also altered how digital humanists create, disseminate, and exchange data for publication. Because digital humanities scholarship brings together theory, data, and computational methods, digital humanists have had to add a new aspect to their concept of “publication”: data curation.

Data curation is the active and continuous management of data throughout its lifecycle of relevance and utility to scholarship, science, and education. Put simply, it means collecting, preparing, organizing, and maintaining datasets. A dataset is a logically organized collection of data related to a specific body of work or subject. Datasets can be itemized, and their contents can be quantitative, qualitative, visual, or audio. My transcript on Orbund, the student information system (SIS) used by Dominica State College, is an example of a dataset I engage with on a weekly basis. It contains quantitative data on my course grades and attendance for each semester. Other people encounter data curation when they visit museum exhibits, where the items on display have been selected, organized, and described.

By linking data curation and publishing, digital humanists ensure that curated data is as legible as published work. Curated data can therefore be discovered, retrieved, and reused to create new digital humanities projects that visualise our experiences.

During class, Dr Esprit showed us the dataset curated for the digital humanities project Cariseland, which was built in collaboration with previous interns. This itemized dataset, created and populated in an Excel spreadsheet, contained the data and metadata needed to develop the project. It was also an example of how data curation enables the retrieval of data: had the data not been organized, it could not have been retrieved so readily during class.

However, curating datasets like these, and larger ones, undoubtedly takes a lot of time, effort, and money. Nonetheless, digital humanists and organizations recognize the value of these datasets in ensuring open access to knowledge for the general public and in allowing other humanists to reuse the data in their own initiatives.

Thus, if data curation work is recognised as a publishing activity, libraries can see the business opportunity and collaborate with digital humanists. Organizations would then have both a “back end” (librarian) and a “front end” (publisher) to help preserve and distribute scholarly publications. Publishers add value to finished products through peer review and high-quality production and presentation, whereas libraries standardize these works while expanding open access.

Businesses that include both a library and a publisher can therefore envision this as a cohesive package of services covering the complete data lifecycle. Data can be compiled, cleaned of errors and inconsistencies, arranged and curated to present a story, and mined using computational approaches to build digital humanities projects.
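
To make the lifecycle steps above more concrete, here is a minimal Python sketch of compiling, cleaning, arranging, and mining a tiny dataset. Everything in it is an assumption made for illustration: the records, the field names (title, type, year), and the function names are hypothetical and do not come from Cariseland or any other real project. It is only one simple way the steps might look in practice.

```python
from collections import Counter

# Hypothetical raw records from two imagined sources; none of these
# items come from a real project dataset.
source_a = [
    {"title": " Harbour at dawn ", "type": "image", "year": "1985"},
    {"title": "Market song", "type": "audio", "year": "1992"},
]
source_b = [
    {"title": "Market song", "type": "audio", "year": "1992"},  # duplicate
    {"title": "", "type": "image", "year": "2001"},             # missing title
    {"title": "Village interview", "type": "Audio", "year": "1978"},
]


def compile_records(*sources):
    """Compile: gather every record from every source into one list."""
    return [dict(record) for source in sources for record in source]


def clean_records(records):
    """Clean: normalise text, then drop untitled items and duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        title = record["title"].strip()
        item_type = record["type"].strip().lower()
        year = int(record["year"])
        key = (title, item_type, year)
        if not title or key in seen:
            continue
        seen.add(key)
        cleaned.append({"title": title, "type": item_type, "year": year})
    return cleaned


def arrange_records(records):
    """Arrange: order items chronologically so they can tell a story."""
    return sorted(records, key=lambda r: r["year"])


def count_by_type(records):
    """Mine: a very simple computational summary of the collection."""
    return Counter(r["type"] for r in records)


curated = arrange_records(clean_records(compile_records(source_a, source_b)))
summary = count_by_type(curated)

for item in curated:
    print(item["year"], item["type"], item["title"])
print(dict(summary))
```

In a real project the cleaning rules would be far more careful and would be agreed on by the curators; the point here is only that each stage of the lifecycle can be treated as a separate, repeatable step.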
