Wikidata as a knowledge base and research tool for open science in archaeology

S2: Archaeology, Open Science & the Digital Humanities | Standard presentation
  • Sophie C. Schmidt
    Freie Universität Berlin
  • Florian Thiery
    Leibniz-Zentrum für Archäologie (LEIZA)

Wikidata is an open knowledge base that a growing number of volunteers and researchers use to add content to the Linked Open Data Cloud. Linked Open Data (LOD) aims to interlink openly available datasets, thereby facilitating the combination and joint processing of existing data from various digital sources. Wikidata functions as the central storage for structured data of Wikimedia projects such as Wikipedia and Wikimedia Commons. It records not only statements but also their sources, and thus acts as a secondary database. Anyone can edit it, and the data is multilingual and released into the public domain (CC0). Archaeological data has already been integrated using, among other things, subject-specific classes, e.g. ‘Bronze Age’ (Q11761) as an “archaeological age” (Q15401699), and properties such as the ‘Art & Architecture Thesaurus ID’ (P1014), which links concepts to the Getty AAT. Examples include data on Greek Vase Painters, Irish Ogham Stones, Samian Ware and African Red Slip Ware (Schmidt et al. 2022).
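As a minimal sketch of how such classes and properties can be queried, the following Python snippet builds a SPARQL query for all items that are an “archaeological age” (Q15401699) together with their ‘Art & Architecture Thesaurus ID’ (P1014), where one exists. It assumes that the class membership is modelled with ‘instance of’ (P31) and that the public Wikidata Query Service endpoint is used. The function names are illustrative only, and the request itself is left to the reader.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(class_qid: str, id_property: str, limit: int = 50) -> str:
    """Return a SPARQL query for items of a class plus an optional external ID.

    Assumption: class membership is expressed via 'instance of' (P31).
    """
    return (
        "SELECT ?item ?itemLabel ?externalId WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} .\n"
        f"  OPTIONAL {{ ?item wdt:{id_property} ?externalId . }}\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(query: str) -> str:
    """Encode the query as a GET request URL that asks for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Archaeological ages (Q15401699) with their Getty AAT ID (P1014), if any.
query = build_query("Q15401699", "P1014")
url = build_request_url(query)
print(query)
```

The resulting URL can be fetched with any HTTP client; the query text can also be pasted directly into the Wikidata Query Service web interface.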

In this talk we would like to showcase a project to create, in Wikidata, a reference collection of Neolithic ceramics for volunteer field walkers. It is part of a small ongoing project that was originally funded by Wikimedia Deutschland e.V. within the Open Science Fellows Program in 2020/21. The project employs a Cradle form so that field walkers can add information easily and consistently, including an image uploaded to Wikimedia Commons. The Wikidata Query Tool makes it possible to search Wikidata without any knowledge of the underlying query language SPARQL, which facilitates use by non-IT specialists. OpenRefine and QuickStatements enable batch uploads and edits to Wikidata, and the authors will use them to contribute to the reference collection.
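To illustrate what a batch upload might look like, the sketch below turns a small list of records into QuickStatements commands, assuming the tool's version 1 tab-separated syntax (CREATE, LAST, quoted strings, and S-prefixed source properties). The record fields, the example values and the class placeholder are hypothetical and do not reflect the project's actual data model; P31 (‘instance of’), P18 (‘image’) and P854 (‘reference URL’) are existing Wikidata properties.

```python
def to_quickstatements(records, class_qid):
    """Build QuickStatements (V1, tab-separated) commands for new items.

    Each record is a dict with the hypothetical keys 'label' (English label),
    'image' (file name on Wikimedia Commons, optional) and 'reference_url'
    (source for the class statement, optional).
    """
    lines = []
    for rec in records:
        lines.append("CREATE")  # start a new item
        lines.append("\t".join(["LAST", "Len", f'"{rec["label"]}"']))

        # 'instance of' statement, with a reference URL as source if given.
        statement = ["LAST", "P31", class_qid]
        if rec.get("reference_url"):
            statement += ["S854", f'"{rec["reference_url"]}"']
        lines.append("\t".join(statement))

        # Optional image statement pointing to a Commons file name.
        if rec.get("image"):
            lines.append("\t".join(["LAST", "P18", f'"{rec["image"]}"']))
    return "\n".join(lines)


# Hypothetical example record; replace the placeholder with a real class QID.
example_records = [
    {
        "label": "Example sherd 1",
        "image": "Example sherd 1.jpg",
        "reference_url": "https://example.org/report",
    }
]
commands = to_quickstatements(example_records, "Q_CLASS_PLACEHOLDER")
print(commands)
```

The printed text would be pasted into the QuickStatements import window and reviewed there before the batch is run.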

Using this example, we will discuss concepts of “Open Data” in the archaeological scientific context. Who decides which data can or should be available, and to whom? How are information flows mediated? The (currently) ‘extremely’ open, community-based approach of Wikidata, in which scientists, like any other member of the community, cannot control the statements that are made, may discourage those who are sceptical of crowdsourced knowledge. Studies have shown that some vandalism occurs (0.004%, cf. Heindorf et al. 2016). To mitigate the spread of misinformation, Wikidata encourages attaching sources to statements as references, similar to Wikipedia. Statements are also connected to a version control system and to discussion pages where different views can be debated.

A main problem that the volunteer field walkers identified for their work was access to scientific publications and reference material. Wikidata can be used to improve data sharing practices between them and the scientific community and thus facilitate their contribution to archaeological knowledge production.

References

  • Heindorf, S.; Potthast, M.; Stein, B.; Engels, G. Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis. 25th International Conference on Information and Knowledge Management (CIKM 2016), Indianapolis (USA), October 2016. https://doi.org/10.1145/2983323.2983740
  • Schmidt, S.C.; Thiery, F.; Trognitz, M. Practices of Linked Open Data in Archaeology and Their Realisation in Wikidata. Digital 2022, 2, 333–364. https://doi.org/10.3390/digital2030019