AI-based discovery of habitat from museum archive documents using Natural Language Processing (NLP)

An interesting paper by Jones et al. (2024) looks at applying Optical Character Recognition (OCR) and NLP to handwritten archives. Some descriptions date to the 18th century, and the Global Biodiversity Information Facility (GBIF) holds over 2 billion records, including habitat information relating to geographical location, land cover, hydrology, soil and bedrock. “Habitat data can provide important evolutionary insight into drivers of biogeographical patterns at a global scale. Exploiting historical habitat data and linking this with the field of museomics (genomic data from museum samples) over time could provide new opportunities to test evolutionary drivers of species change in the Anthropocene.”

Aligning descriptions to controlled vocabularies, securing funding and geo-referencing pose particular challenges.
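To give a feel for what aligning to a controlled vocabulary might involve, here is a minimal Python sketch. It is not the method from the paper: the vocabulary, the keywords and the function name `align_to_vocabulary` are all hypothetical, and simple keyword matching stands in for the NLP models a real pipeline would likely use.

```python
import re

# Hypothetical controlled vocabulary: term -> trigger keywords (illustrative only,
# not taken from the paper or from any published standard).
HABITAT_VOCAB = {
    "woodland": ["forest", "wood", "woodland"],
    "wetland": ["marsh", "bog", "swamp", "fen"],
    "grassland": ["meadow", "pasture", "grassland"],
}


def align_to_vocabulary(description, vocab=HABITAT_VOCAB):
    """Return the sorted controlled terms whose keywords appear in the description."""
    # Lower-case the text and split it into alphabetic word tokens.
    tokens = set(re.findall(r"[a-z]+", description.lower()))
    # Keep a term if at least one of its keywords is present as a whole word.
    return sorted(
        term for term, keywords in vocab.items() if tokens.intersection(keywords)
    )


# Made-up example of a transcribed specimen label.
example = "Collected in damp meadow at edge of oak wood, near marsh"
print(align_to_vocabulary(example))  # ['grassland', 'wetland', 'woodland']
```

Even this toy version hints at why the task is hard: exact word matching misses plurals, historical spellings and OCR errors, which is where more capable language models would come in.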

Abstract: “Museum collection records are a source of historic data for species occurrence, but little attention is paid to the associated descriptions of habitat at the sample locations. We propose that artificial intelligence methods have potential to use these descriptions for reconstructing past habitat, to address ecological and evolutionary questions.”

https://www.sciencedirect.com/science/article/pii/S0169534724000314
