When your only tool is a hammer, everything looks like a nail. One of the most frequent questions I get asked about Natural Language Processing (NLP) is which is best: 'rules' or 'machine learning'? With the popularity, hype even, of Large Language Models (LLMs), it would seem it's obvious - machine learning because of its... Continue Reading →
Graph of Text Embeddings
Showing terms similar to a search query, in this case 'Pegmatite', using text embeddings in a graph network. This can support data exploration; the thickness of each edge is related to cosine similarity (the thicker the green line, the more similar the association). For this visualisation I modelled text embeddings into NetworkX to display in... Continue Reading →
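As a rough illustration of the idea above, the sketch below ranks terms by cosine similarity to a query and turns each score into an edge width. It is a minimal sketch, not the code behind the original visualisation: the three-dimensional vectors, the `similarity_edges` helper and the `max_width` scaling are all hypothetical, and plain Python stands in for the graph library so the example has no dependencies.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (assumption); real text embeddings
# typically have hundreds of dimensions.
embeddings = {
    "pegmatite": [0.9, 0.1, 0.3],
    "granite": [0.8, 0.2, 0.4],
    "lithium": [0.7, 0.0, 0.6],
    "limestone": [0.1, 0.9, 0.2],
}


def similarity_edges(query, vectors, top_n=2, max_width=6.0):
    """Return (query, term, attributes) edges for the top_n most similar terms.

    The edge width grows linearly with cosine similarity; negative
    similarities are drawn with zero width (an illustrative choice).
    """
    scores = [
        (term, cosine_similarity(vectors[query], vec))
        for term, vec in vectors.items()
        if term != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [
        (query, term, {"weight": score, "width": max_width * max(score, 0.0)})
        for term, score in scores[:top_n]
    ]


edges = similarity_edges("pegmatite", embeddings)
for source, target, attrs in edges:
    print(f"{source} -> {target}: similarity {attrs['weight']:.3f}, width {attrs['width']:.2f}")
```

The `(source, target, attributes)` tuples are in the form that NetworkX's `Graph.add_edges_from` accepts, so the list could be handed to a graph for drawing.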
Radar Plots Using Word Vectors
Radar plots using word vectors can be useful to compare what ‘the text’ of many reports might be saying (perhaps too much to read) to quantitative data on aspects such as risk or uncertainty. It may highlight mismatches or contradictions requiring further investigation. The illustrative example is driven from a corpus of millions of words... Continue Reading →
Ternary plots using word vectors
Ternary plots are used for visualisation and modelling in the geosciences for three-component correlation. Typically they represent the compositions of soils, rocks and minerals. The interactive plots above have been generated just from the patterns of millions of words trained from geoscience literature. Each axis represents cosine similarity, the closer to 1 the more... Continue Reading →
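To make the ternary idea concrete, here is a minimal sketch of how three cosine similarities for one term could be placed on a ternary diagram. The normalisation step (clipping negatives and scaling the three values to sum to 1) is an assumption for illustration and may differ from how the original plots were built; the function names are hypothetical.

```python
import math


def ternary_fractions(sim_a, sim_b, sim_c):
    """Scale three similarity scores so that they sum to 1.

    Negative similarities are clipped to zero first (an assumption),
    because ternary components cannot be negative.
    """
    clipped = [max(s, 0.0) for s in (sim_a, sim_b, sim_c)]
    total = sum(clipped)
    if total == 0:
        raise ValueError("at least one similarity must be positive")
    return tuple(s / total for s in clipped)


def ternary_to_xy(a, b, c):
    """Map fractions (a + b + c = 1) to 2D coordinates.

    Corner A sits at the top, B at the bottom left and C at the
    bottom right of an equilateral triangle with unit sides.
    """
    x = c + a / 2.0
    y = a * math.sqrt(3) / 2.0
    return x, y


# Hypothetical similarities of one term to three axis concepts.
a, b, c = ternary_fractions(0.6, 0.3, 0.1)
x, y = ternary_to_xy(a, b, c)
print(f"fractions: {a:.2f}, {b:.2f}, {c:.2f} -> x={x:.3f}, y={y:.3f}")
```

A term that is equally similar to all three axis concepts lands in the centre of the triangle, while a term similar to only one concept sits at that corner.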
Data-Driven Discovery in Geosciences: Opportunities and Challenges
Chen et al. (2023) recently published a very interesting special edition editorial for Springer's Mathematical Geosciences. "This special collection explores scientific research related to data-driven discoveries in geosciences and provides a timely presentation of progress in developments and/or applications of AI and big data approaches to multiple aspects of geosciences." I think this next... Continue Reading →
NASA BERT-E Earth Science Large Language Model (LLM)
To understand domain terminology effectively in areas like healthcare and geoscience, domain training has been shown to improve results. https://www.nature.com/articles/s41586-023-06291-2 BERT-E: The NASA IMPACT team published a paper at AGU back in 2021 on BERT-E, an Earth Science trained language model (270k articles), comparing it to Sci-BERT (see screenshot). https://agu2021fallmeeting-agu.ipostersessions.com/default.aspx?s=9D-AC-B5-BA-E8-8D-CE-44-5F-17-8E-3F-B5-16-0E-60 The model may be superseded by... Continue Reading →
Mineral datasets released in US, Canada and Australia
A few days ago the US Geological Survey released a compilation of geological, geophysical and mineral resource datasets from Australia, Canada and the US through a collaboration between the geological surveys. “The data release includes more than 40 earth science data layers, including a new map of variations in the Earth’s natural magnetic field for... Continue Reading →
Digital Geoscience Conference: Geological Society of London
Looking forward to speaking at this exciting conference in November this year. https://www.geolsoc.org.uk/DigitalGeo2023
Unsupervised clustering of word vectors using Principal Component Analysis (PCA)
My previous posts showed how a geoscientist can choose concepts and entities to cross-plot against different contexts. This example (7,000 mineral names) uses Principal Component Analysis (PCA) to cluster based on word vectors. This technique, simply put, reduces high-dimensional data to a 2D plane whilst retaining as much information as possible. The minerals/native... Continue Reading →
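The reduction step can be sketched in a few lines. This is a minimal, dependency-free illustration, not the method used for the 7,000 mineral names: the four-dimensional vectors are hypothetical, and power iteration with deflation stands in for the PCA routine of a numerical library.

```python
import math
import random


def top_component(matrix, iterations=200, seed=0):
    """Dominant eigenvector and eigenvalue of a symmetric matrix (power iteration)."""
    dims = len(matrix)
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(dims)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:  # no variance left in any direction
            break
        vec = [x / norm for x in nxt]
    value = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(dims)) for i in range(dims)
    )
    return vec, value


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    centered = [[r[j] - means[j] for j in range(dims)] for r in rows]
    # Sample covariance matrix of the centred data.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]
    components = []
    for _ in range(2):
        vec, value = top_component(cov)
        components.append(vec)
        # Deflation: remove the component just found before the next pass.
        cov = [
            [cov[i][j] - value * vec[i] * vec[j] for j in range(dims)]
            for i in range(dims)
        ]
    return [
        [sum(r[j] * comp[j] for j in range(dims)) for comp in components]
        for r in centered
    ]


# Hypothetical toy word vectors (assumption), two silicates and two native elements.
vectors = {
    "quartz": [0.9, 0.8, 0.1, 0.0],
    "feldspar": [0.8, 0.9, 0.2, 0.1],
    "gold": [0.1, 0.0, 0.9, 0.8],
    "silver": [0.2, 0.1, 0.8, 0.9],
}
coords = dict(zip(vectors, pca_2d(list(vectors.values()))))
for name, (pc1, pc2) in coords.items():
    print(f"{name}: PC1={pc1:.3f}, PC2={pc2:.3f}")
```

In this toy data the first component separates the two silicates from the two native elements, which is the kind of grouping a 2D PCA plot makes visible. The sign of each component is arbitrary, so only relative positions matter.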
3D Word Vector Plots
3D Word Vector Visualisations: The provision of free web tools for all geoscientists to easily explore hidden semantic relations in textual content may increase the chances of abductive discovery in our discipline. Following on from previous posts on word vectors in space and time, in this example the 'first' axis is Lacustrine, 'second' is Evaporite... Continue Reading →