Geological knowledge from patterns of words in text

It can be quite addictive to experiment with plotting word vectors of various geological entities and classifications. Here I have chosen lithology (thousands of terms grouped into higher lithological classifications, shown by the colours and symbols), using the word vector 'orogenic' on the x-axis and 'lacustrine' on the y-axis. The lithologies further to the right are... Continue Reading →
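As a rough illustration of the idea, the sketch below places terms on two axes by their cosine similarity to 'orogenic' and 'lacustrine'. It assumes the axes are cosine similarities; the three-dimensional vectors and the `axis_coordinates` helper are invented for illustration and are not taken from the trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors, invented for illustration only (real word vectors
# typically have hundreds of dimensions).
vectors = {
    "orogenic":   [1.0, 0.0, 0.0],
    "lacustrine": [0.0, 1.0, 0.0],
    "schist":     [0.9, 0.1, 0.2],
    "mudstone":   [0.1, 0.8, 0.3],
}

def axis_coordinates(term, x_axis="orogenic", y_axis="lacustrine"):
    """Return (x, y): similarity of `term` to each axis term (hypothetical helper)."""
    return (
        cosine(vectors[term], vectors[x_axis]),
        cosine(vectors[term], vectors[y_axis]),
    )

# One (x, y) point per lithology term, ready for a scatter plot.
points = {term: axis_coordinates(term) for term in ("schist", "mudstone")}
```

With these toy vectors, 'schist' lands further to the right (closer to 'orogenic') and 'mudstone' lands higher up (closer to 'lacustrine').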

Word Vector Clustered Heatmap: Data Discovery.

I’ve used Python to cluster the word vectors of critical minerals and their ores (copper, lithium, nickel, cobalt and REE) against Lithostratigraphy in a corpus of geological texts. In the heatmap, greens are above the mean cosine similarity, reds are below it and whites are around it. Minerals and Lithostrat with ‘similar’ profiles will cluster... Continue Reading →
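The colour coding can be sketched as mean-centring a table of cosine similarities. This is a minimal sketch under assumptions: the two-dimensional vectors, the unit names 'Unit A' and 'Unit B', the `colour` helper and the 0.05 tolerance are all hypothetical, and the clustering step itself is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy vectors, invented for illustration only.
minerals = {"copper": [1.0, 0.0], "lithium": [0.0, 1.0]}
units = {"Unit A": [0.9, 0.1], "Unit B": [0.1, 0.9]}  # hypothetical lithostratigraphic units

# One cosine similarity per (mineral, unit) cell of the heatmap.
scores = {
    (m, u): cosine(m_vec, u_vec)
    for m, m_vec in minerals.items()
    for u, u_vec in units.items()
}
mean_score = sum(scores.values()) / len(scores)

def colour(score, mean, tolerance=0.05):
    """Green above the mean, red below it, white within `tolerance` of it."""
    if score > mean + tolerance:
        return "green"
    if score < mean - tolerance:
        return "red"
    return "white"

colours = {pair: colour(score, mean_score) for pair, score in scores.items()}
```

Rows with similar colour profiles are the ones a hierarchical clustering step would then place next to each other.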

Graph of Text Embeddings

This graph network shows terms similar to a search query, in this case 'Pegmatite', using text embeddings. This can support data exploration. The thickness of each edge is related to cosine similarity (the thicker the green line, the more similar the association). For this visualisation I modelled text embeddings into NetworkX to display in... Continue Reading →
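One way to prepare such a graph is to turn the nearest neighbours of the query into weighted edges. The sketch below is an assumption about the general approach, not the code behind the visualisation; the vectors and the `similar_term_edges` helper are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy embeddings, invented for illustration only.
embeddings = {
    "pegmatite": [1.0, 1.0, 0.0],
    "spodumene": [0.9, 1.0, 0.1],
    "granite":   [1.0, 0.8, 0.2],
    "limestone": [0.0, 0.1, 1.0],
}

def similar_term_edges(query, vectors, top_n=2):
    """Return (query, term, similarity) triples for the top_n most similar terms."""
    query_vec = vectors[query]
    scored = [
        (term, cosine(query_vec, vec))
        for term, vec in vectors.items()
        if term != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return [(query, term, score) for term, score in scored[:top_n]]

edges = similar_term_edges("pegmatite", embeddings)
```

The triples have the shape accepted by NetworkX's `Graph.add_weighted_edges_from`, so the similarity can be stored as the edge weight and scaled into a line width when drawing.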

Radar Plots Using Word Vectors

Radar plots using word vectors can be useful to compare what ‘the text’ of many reports might be saying (perhaps too much to read) with quantitative data on aspects such as risk or uncertainty. This may highlight mismatches or contradictions requiring further investigation. The illustrative example is derived from a corpus of millions of words... Continue Reading →

Ternary plots using word vectors

Ternary plots are used for visualisation and modelling in the geosciences for three-component correlation. Typically they represent the compositions of soils, rocks and minerals. The interactive plots above have been generated just from the patterns of millions of words trained from geoscience literature. Each axis represents cosine similarity; the closer to 1 the more... Continue Reading →
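A ternary plot needs three components that sum to a constant, so three cosine similarities would usually be rescaled before plotting. The sketch below shows one possible rescaling; clipping negative similarities to zero and the `ternary_fractions` helper are assumptions, not a description of how the plots above were built.

```python
def ternary_fractions(sim_a, sim_b, sim_c):
    """Rescale three cosine similarities into fractions that sum to 1.

    Negative similarities are clipped to zero (an assumption made here so
    that every fraction is a valid ternary coordinate).
    """
    clipped = [max(value, 0.0) for value in (sim_a, sim_b, sim_c)]
    total = sum(clipped)
    if total == 0:
        raise ValueError("at least one similarity must be positive")
    return tuple(value / total for value in clipped)

# Hypothetical similarities of one term to the three axis terms.
fractions = ternary_fractions(0.5, 0.25, 0.25)
```

A term that is equally similar to all three axis terms falls in the centre of the triangle, while a term similar to only one of them sits near that corner.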

Data-Driven Discovery in Geosciences: Opportunities and Challenges

Chen et al. (2023) recently published a very interesting special edition editorial for Springer's Mathematical Geosciences. "This special collection explores scientific research related to data-driven discoveries in geosciences and provides a timely presentation of progress in developments and/or applications of AI and big data approaches to multiple aspects of geosciences." I think this next... Continue Reading →

NASA BERT-E Earth Science Large Language Model (LLM)

To understand domain terminology effectively in areas like healthcare and geoscience, domain training has been shown to improve results (https://www.nature.com/articles/s41586-023-06291-2). BERT-E: The NASA IMPACT team published a paper at AGU back in 2021 on BERT-E, an Earth Science trained language model (270k articles), comparing it to Sci-BERT (see screenshot): https://agu2021fallmeeting-agu.ipostersessions.com/default.aspx?s=9D-AC-B5-BA-E8-8D-CE-44-5F-17-8E-3F-B5-16-0E-60 The model may be superseded by... Continue Reading →