Wednesday, October 2 • 11:30am - 12:00pm
Seeing the big picture: A data science approach to making sense of archival collections via Jupyter notebooks


Many organisations have trouble getting an overview of their data. A particular challenge for the archival community is that its data spans a long period, during which the systems and policies for creating metadata have changed, and even the meaning of the metadata itself has shifted as language evolves. Archivists often struggle to answer questions such as: How many hours of television do we have? What percentage of the material is digital? What content do we have relating to migration? This information is essential for managing archival processes such as digitisation, and for selecting relevant content for individual users (recommendations) and for exhibitions. Scholars and journalists accessing the archive also need an overview of the type and content of the material available, and of the completeness of its metadata.

At the Netherlands Institute for Sound and Vision, we have developed a data science approach for creating overviews of archive collections. We use Jupyter Notebooks: 'live' documents that combine code, visualisations and narrative text. With these notebooks, we provide an up-to-date 'shop window' overview to interested parties, while allowing users with programming skills to create their own overviews flexibly. Most importantly, we embed knowledge of the archive's history and processes in the notebooks, so that data fragments are correctly assembled into the big picture. In this paper, we describe how our approach has been successfully applied to create overviews for diverse archive collections, including our own. We will discuss how these overviews support users in their work, and present the results of evaluations with scholars and archivists.
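The abstract does not show the notebooks' code. As a hedged illustration only, the Python sketch below shows how a single notebook cell might answer two of the example questions ("How many hours of television do we have?" and "What percentage of the material is digital?") while encoding one piece of archive history. The record fields (`media_type`, `carrier`, `duration_s`, `duration_min`) and the rule that older records store duration in minutes are hypothetical assumptions, not Sound and Vision's actual schema or method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """One catalogue record. All field names are hypothetical."""
    media_type: str                       # e.g. "television", "radio"
    carrier: str                          # e.g. "digital file", "tape"
    duration_s: Optional[int] = None      # assumed newer practice: seconds
    duration_min: Optional[float] = None  # assumed older practice: minutes


def duration_hours(rec: Record) -> float:
    # Hypothetical archive-history rule: older cataloguing stored duration
    # in minutes, newer cataloguing in seconds; normalise both to hours.
    if rec.duration_s is not None:
        return rec.duration_s / 3600
    if rec.duration_min is not None:
        return rec.duration_min / 60
    return 0.0  # missing duration contributes nothing; counted separately below


def overview(records: list[Record]) -> dict:
    tv = [r for r in records if r.media_type == "television"]
    tv_hours = sum(duration_hours(r) for r in tv)
    # Report metadata completeness alongside the total, so the figure is not
    # silently understated by records that lack a duration.
    missing = sum(1 for r in tv if r.duration_s is None and r.duration_min is None)
    digital = sum(1 for r in records if r.carrier == "digital file")
    pct_digital = 100 * digital / len(records) if records else 0.0
    return {"tv_hours": tv_hours, "tv_missing_duration": missing,
            "pct_digital": pct_digital}


# Small made-up sample mixing both (hypothetical) cataloguing conventions.
records = [
    Record("television", "digital file", duration_s=1800),
    Record("television", "tape", duration_min=90),
    Record("radio", "digital file", duration_s=3600),
    Record("television", "tape"),
]
print(overview(records))
# {'tv_hours': 2.0, 'tv_missing_duration': 1, 'pct_digital': 50.0}
```

Reporting the count of records with missing durations next to the total reflects the abstract's point that users need to know how complete the metadata is, not only the headline figure.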

Mari Wigham

Data engineer, Netherlands Institute for Sound and Vision
Mari Wigham is a data engineer at the Netherlands Institute for Sound and Vision, working on innovative ways of helping researchers to work with the archive. She studied electronic engineering, and has spent her career working in applied research institutes, on projects ranging from...

Liliana Melgar

The Netherlands Institute for Sound and Vision
Liliana Melgar is a postdoctoral researcher at Utrecht University investigating how to support Scholarly Video Annotation in CLARIAH, the Dutch infrastructure for the digital humanities. She has been involved in the CLARIAH project since 2016 as part of the work package on audi...

Wednesday October 2, 2019 11:30am - 12:00pm BST
BenGLab 1