It can be quite addictive to experiment with plotting word vectors of various geological entities and classifications. Here I have chosen lithology (thousands of terms grouped into higher lithological classifications, shown by the colours and symbols) using the word vector 'orogenic' on the x-axis and 'lacustrine' on the y-axis. The lithologies further to the right are... Continue Reading →
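A minimal sketch of how such axes could be computed, assuming each term's position is its cosine similarity to the two axis words. The three-dimensional vectors and the example lithologies below are invented for illustration; a real model would supply vectors with hundreds of dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors, invented for illustration only (not from any trained model).
vectors = {
    "orogenic": [1.0, 0.0, 0.0],
    "lacustrine": [0.0, 1.0, 0.0],
    "schist": [0.9, 0.1, 0.2],
    "mudstone": [0.1, 0.8, 0.3],
}

def axis_coordinates(term, x_word="orogenic", y_word="lacustrine"):
    """Return (x, y): similarity of `term` to the x-axis and y-axis words."""
    return (
        cosine(vectors[term], vectors[x_word]),
        cosine(vectors[term], vectors[y_word]),
    )

# One (x, y) point per lithology, ready to hand to any scatter-plot library.
coords = {term: axis_coordinates(term) for term in ("schist", "mudstone")}
```

With these toy numbers, 'schist' lands further to the right and 'mudstone' further up, which is the kind of separation the plot is meant to reveal.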
Word Vector Clustered Heatmap: Data Discovery.
I’ve used Python to cluster the word vectors of critical minerals and their ores (copper, lithium, nickel, cobalt and REE) against lithostratigraphy in a corpus of geological texts. In the heatmap the greens are above the mean (cosine), reds are below the mean and whites are around the mean. Minerals and Lithostrat with ‘similar’ profiles will cluster... Continue Reading →
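The colour scheme can be sketched as a mean-centring step. This is an illustration only: the similarity values, the unit names, the use of a single overall mean and the `tolerance` band for 'white' are all assumptions, and the clustering step itself is not shown.

```python
from statistics import mean

# Hypothetical cosine similarities: rows are minerals, columns are
# lithostratigraphic units. Values are invented for illustration.
similarity = {
    "copper": {"Unit A": 0.62, "Unit B": 0.31, "Unit C": 0.45},
    "lithium": {"Unit A": 0.28, "Unit B": 0.58, "Unit C": 0.47},
}

def colour_bands(matrix, tolerance=0.05):
    """Label each cell green/white/red relative to the overall mean.

    `tolerance` (an assumed parameter) sets how close to the mean a
    value must be to count as 'white'.
    """
    overall = mean(v for row in matrix.values() for v in row.values())
    bands = {}
    for mineral, row in matrix.items():
        bands[mineral] = {}
        for unit, value in row.items():
            diff = value - overall
            if diff > tolerance:
                bands[mineral][unit] = "green"
            elif diff < -tolerance:
                bands[mineral][unit] = "red"
            else:
                bands[mineral][unit] = "white"
    return bands

bands = colour_bands(similarity)
```

Rows with similar patterns of green and red are the ones a hierarchical clustering would be expected to place next to each other.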
Natural Language Processing: Rules or Machine Learning?
When your only tool is a hammer, everything looks like a nail. One of the most frequent questions I get asked about Natural Language Processing (NLP) is which is best: 'rules' or 'machine learning'? With the popularity, hype even, of Large Language Models (LLMs) it would seem it's obvious: machine learning because of its... Continue Reading →
Graph of Text Embeddings
Showing similar terms to a search query, in this case 'Pegmatite', using text embeddings in a graph network. This can support data exploration; the thickness of the edge is related to cosine similarity (the thicker the green line, the more similar the association). For this visualisation I modelled text embeddings into NetworkX to display in... Continue Reading →
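A rough sketch of how the edges for such a graph might be prepared, assuming edge width is simply proportional to cosine similarity. The embeddings, the `top_n` cut-off and the `max_width` scaling are hypothetical choices for illustration, not the settings used for the visualisation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings, invented for illustration only.
embeddings = {
    "pegmatite": [0.9, 0.3, 0.1],
    "spodumene": [0.8, 0.4, 0.2],
    "granite": [0.7, 0.1, 0.5],
    "limestone": [0.1, 0.2, 0.9],
}

def similarity_edges(query, embeddings, top_n=2, max_width=6.0):
    """Return (query, term, score, width) for the top_n most similar terms.

    Width scales linearly with similarity; negative scores are drawn
    with zero width.
    """
    scored = [
        (term, cosine(embeddings[query], vec))
        for term, vec in embeddings.items()
        if term != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [
        (query, term, score, max_width * max(score, 0.0))
        for term, score in scored[:top_n]
    ]

edges = similarity_edges("pegmatite", embeddings)
```

The `(query, term, score)` part of each tuple could then be loaded into a NetworkX graph with `Graph.add_weighted_edges_from`, with the width values passed to the drawing routine.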
Radar Plots Using Word Vectors
Radar plots using word vectors can be useful to compare what ‘the text’ of many reports might be saying (perhaps too much to read) to quantitative data on aspects such as risk or uncertainty. It may highlight mismatches or contradictions requiring further investigation. The illustrative example is driven from a corpus of millions of words... Continue Reading →
Ternary plots using word vectors
Ternary plots are used for visualisation and modelling in the geosciences for three-component correlation. Typically they represent the compositions of soils, rocks and minerals. The interactive plots above have been generated just from the patterns of millions of words trained from geoscience literature. Each axis represents cosine similarity, the closer to 1 the more... Continue Reading →
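One possible way to place a term on a ternary diagram from three cosine similarities is sketched below. It assumes the three similarities are clipped at zero and normalised to sum to 1, which is the usual requirement for ternary coordinates; whether the plots above were built this way is not stated, so treat the function and its name as hypothetical.

```python
import math

def ternary_point(sim_a, sim_b, sim_c):
    """Convert three cosine similarities into ternary fractions and
    2D coordinates inside an equilateral triangle.

    Corners: A at (0, 0), B at (1, 0), C at (0.5, sqrt(3) / 2).
    """
    # Clip negatives so the fractions stay inside the triangle (assumption).
    sims = [max(s, 0.0) for s in (sim_a, sim_b, sim_c)]
    total = sum(sims)
    if total == 0:
        raise ValueError("at least one similarity must be positive")
    a, b, c = (s / total for s in sims)
    # Standard barycentric-to-Cartesian conversion.
    x = b + 0.5 * c
    y = (math.sqrt(3) / 2) * c
    return (a, b, c), (x, y)

# Example with invented similarities to three axis terms.
fractions, xy = ternary_point(0.6, 0.3, 0.1)
```

A term that is equally similar to all three axis words falls in the centre of the triangle, while a term similar to only one of them sits at that word's corner.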
Data-Driven Discovery in Geosciences: Opportunities and Challenges
Chen et al. (2023) recently published a very interesting special edition editorial for Springer's Mathematical Geosciences. "This special collection explores scientific research related to data-driven discoveries in geosciences and provides a timely presentation of progress in developments and/or applications of AI and big data approaches to multiple aspects of geosciences." I think this next... Continue Reading →
NASA BERT-E Earth Science Large Language Model (LLM)
To understand domain terminology effectively in areas like healthcare and geoscience, domain training has been shown to improve results. https://www.nature.com/articles/s41586-023-06291-2 BERT-E: The NASA IMPACT team published a paper at AGU back in 2021 on BERT-E, an Earth Science trained language model (270k articles), comparing it to Sci-BERT (see screenshot). https://agu2021fallmeeting-agu.ipostersessions.com/default.aspx?s=9D-AC-B5-BA-E8-8D-CE-44-5F-17-8E-3F-B5-16-0E-60 The model may be superseded by... Continue Reading →
Mineral datasets released in US, Canada and Australia
A few days ago the US Geological Survey released a compilation of geological, geophysical and mineral resource datasets from Australia, Canada and the US through a collaboration between the geological surveys. “The data release includes more than 40 earth science data layers, including a new map of variations in the Earth’s natural magnetic field for... Continue Reading →
Digital Geoscience Conference: Geological Society of London
Looking forward to speaking at this exciting conference in November this year. https://www.geolsoc.org.uk/DigitalGeo2023