A few days ago the US Geological Survey released a compilation of geological, geophysical and mineral resource datasets from Australia, Canada and the US, produced through a collaboration between the three countries' geological surveys. “The data release includes more than 40 earth science data layers, including a new map of variations in the Earth’s natural magnetic field for...
Digital Geoscience Conference: Geological Society of London
Looking forward to speaking at this exciting conference in November this year. https://www.geolsoc.org.uk/DigitalGeo2023
Unsupervised clustering of word vectors using Principal Component Analysis (PCA)
My previous posts showed how a geoscientist can choose concepts and entities to cross-plot against different contexts. This example (7,000 mineral names) applies Principal Component Analysis (PCA) to word vectors to reveal clusters. Simply put, the technique reduces high-dimensional data to a 2D plane whilst retaining as much of the variance (information) as possible. The minerals/native...
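To make the PCA step concrete, here is a minimal sketch in plain Python, assuming a tiny hypothetical set of mineral word vectors (`WORD_VECTORS` is invented for illustration; real vectors would have hundreds of dimensions and come from a model trained on a geoscience corpus). It centres the vectors, builds their covariance matrix, finds the two directions of greatest variance by power iteration, and projects each word onto them. A library such as scikit-learn would normally do this in one call; the hand-rolled version just makes the mechanics visible.

```python
import math
import random

# Hypothetical toy "word vectors" (4 dimensions) for six mineral names.
WORD_VECTORS = {
    "quartz":   [0.9, 0.1, 0.0, 0.2],
    "feldspar": [0.8, 0.2, 0.1, 0.3],
    "gold":     [0.1, 0.9, 0.8, 0.0],
    "silver":   [0.2, 0.8, 0.9, 0.1],
    "halite":   [0.0, 0.1, 0.2, 0.9],
    "gypsum":   [0.1, 0.0, 0.3, 0.8],
}

def mean_center(rows):
    """Subtract the per-dimension mean so PCA measures spread around the centre."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of already-centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)] for i in range(d)]

def top_eigenvector(matrix, iterations=500, seed=0):
    """Power iteration: multiply by the matrix and renormalise until the direction settles."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the matching eigenvalue (variance along v).
    value = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d))
    return value, v

def pca_2d(vectors):
    """Project vectors onto their first two principal components."""
    centred = mean_center(vectors)
    cov = covariance(centred)
    d = len(cov)
    components = []
    for _ in range(2):
        value, vec = top_eigenvector(cov)
        components.append((value, vec))
        # Deflate: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    coords = [[sum(r[k] * vec[k] for k in range(d)) for _, vec in components] for r in centred]
    return coords, [value for value, _ in components]

coords, variances = pca_2d(list(WORD_VECTORS.values()))
for name, (x, y) in zip(WORD_VECTORS, coords):
    print(f"{name:9s} PC1={x:+.3f} PC2={y:+.3f}")
```

Words that share contexts end up close together in the 2D output, which is where the visible clusters come from; PCA itself assigns no cluster labels.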
3D Word Vector Plots
3D Word Vector Visualisations: Providing free web tools that let any geoscientist easily explore hidden semantic relations in text may increase the chances of abductive discovery in our discipline. Following on from previous posts on word vectors in space and time, in this example the 'first' axis is Lacustrine, the 'second' is Evaporite...
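One way to build such a plot, sketched here under assumptions: treat each axis as the cosine similarity between a word's vector and the vector of that axis's anchor term. The vectors below are hypothetical toy values, and `third_anchor` is a placeholder because the post's third axis term is not given in this excerpt.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical toy word vectors; "third_anchor" is a placeholder term, not from the post.
VECTORS = {
    "lacustrine":   [0.9, 0.1, 0.2, 0.1],
    "evaporite":    [0.1, 0.9, 0.1, 0.2],
    "third_anchor": [0.1, 0.2, 0.9, 0.1],
    "playa":        [0.7, 0.7, 0.1, 0.2],
    "sabkha":       [0.2, 0.8, 0.3, 0.1],
}

def axis_coordinates(vectors, anchors):
    """Give every non-anchor word an (x, y, z) point: its similarity to each anchor term."""
    return {
        word: tuple(cosine(vec, vectors[anchor]) for anchor in anchors)
        for word, vec in vectors.items()
        if word not in anchors
    }

points = axis_coordinates(VECTORS, ["lacustrine", "evaporite", "third_anchor"])
for word, (x, y, z) in points.items():
    print(f"{word:8s} x={x:.2f} y={y:.2f} z={z:.2f}")
```

The resulting points can be fed to any 3D scatter plot; words scoring high on two axes at once sit between those two contexts.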
Text embeddings by geological age: query=paleosol
Space and time are the two key contexts for the geoscientist. My previous post showed a query result on a map (space) using word vector similarity. This box plot shows time: the x-axis is geological age, from oldest on the left (Precambrian) to youngest on the right (Quaternary). The y-axis is cosine similarity, to the...
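The box plot boils down to grouping per-document similarity scores by geological age and summarising each group. A minimal sketch, assuming hypothetical (age, similarity) pairs for the query 'paleosol' (`SCORES` is invented) and period-level ages; the whiskers here are simply min and max, whereas many plotting libraries default to 1.5 × IQR whiskers.

```python
import statistics

# Periods ordered oldest to youngest, matching the x-axis (Precambrian left, Quaternary right).
AGE_ORDER = ["Precambrian", "Cambrian", "Ordovician", "Silurian", "Devonian",
             "Carboniferous", "Permian", "Triassic", "Jurassic", "Cretaceous",
             "Paleogene", "Neogene", "Quaternary"]

# Hypothetical (age, cosine similarity to the query 'paleosol') pairs, one per document.
SCORES = [
    ("Quaternary", 0.62), ("Quaternary", 0.71), ("Quaternary", 0.55),
    ("Carboniferous", 0.48), ("Carboniferous", 0.52), ("Carboniferous", 0.44),
    ("Precambrian", 0.21), ("Precambrian", 0.30), ("Precambrian", 0.25),
]

def box_stats_by_age(scores, order):
    """Return (age, min, Q1, median, Q3, max) for each age present, oldest first."""
    groups = {}
    for age, sim in scores:
        groups.setdefault(age, []).append(sim)
    rows = []
    for age in order:
        values = groups.get(age)
        if not values or len(values) < 2:  # quantiles() needs at least two values
            continue
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        rows.append((age, min(values), q1, median, q3, max(values)))
    return rows

for row in box_stats_by_age(SCORES, AGE_ORDER):
    print("{:<13} min={:.2f} q1={:.2f} med={:.2f} q3={:.2f} max={:.2f}".format(*row))
```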
Using text embeddings to display search results spatially
Building a statistical vector-space model from your corpus of text documents has several advantages. Take a search query such as 'carbonatite'. Using text embeddings (vectors), we can display results not just on a map, but also according to how similar each location is to the query. See the associated screenshot. This allows us to discover locations...
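A minimal sketch of the idea, assuming each location already has a vector (for example, the average vector of the documents that mention it) and that the query term has one too. All names, coordinates and vectors below are hypothetical. The output is a GeoJSON FeatureCollection (RFC 7946, which orders positions as longitude, latitude) carrying a `similarity` property that a web map could use to colour or size each point.

```python
import json
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical query vector (e.g. for 'carbonatite') and per-location vectors.
QUERY_VECTOR = [0.8, 0.1, 0.5]
LOCATIONS = [
    {"name": "Site A", "lat": -23.5, "lon": 133.9, "vector": [0.7, 0.2, 0.6]},
    {"name": "Site B", "lat": 51.5, "lon": -0.1, "vector": [0.1, 0.9, 0.2]},
    {"name": "Site C", "lat": 45.0, "lon": -75.0, "vector": [0.9, 0.0, 0.3]},
]

def to_geojson(query_vector, locations):
    """Score every location against the query and return GeoJSON, most similar first."""
    scored = sorted(
        ((cosine(query_vector, loc["vector"]), loc) for loc in locations),
        key=lambda pair: pair[0],
        reverse=True,
    )
    features = [
        {
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [loc["lon"], loc["lat"]]},
            "properties": {"name": loc["name"], "similarity": round(sim, 3)},
        }
        for sim, loc in scored
    ]
    return {"type": "FeatureCollection", "features": features}

print(json.dumps(to_geojson(QUERY_VECTOR, LOCATIONS), indent=2))
```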
Video online: Unstructured Data Management and Artificial Intelligence, Society of Professional Data Managers (SPDM)
The video of my talk in June was posted on YouTube by the Society of Professional Data Managers (SPDM) last week. https://m.youtube.com/watch?v=VD4Z7FpzGuk #datamanagement #artificialintelligence
World’s First Geoscience Large Language Model
I have to thank Richard Scott from BHP, who pointed me to this recently published (June 2023) paper: "Learning a Foundation Language Model for Geoscience Knowledge Understanding and Utilization" by Deng et al. (2023), https://arxiv.org/abs/2306.05064. I thought the community at large would be interested. The researchers, mainly from Shanghai Jiao Tong University, claim to have created...
Geological sub-discipline query popularity in Google
Global Google search queries for geological sub-disciplines over the past 12 months. The graph shows relative popularity over time, and the map shows which sub-discipline dominates in each country. The spike for engineering geology (yellow) on 6th February 2023 coincides with the magnitude 7.8 earthquake in Turkey. #geology #engineeringgeology #hydrogeology #volcanology #mininggeology #petroleumgeology
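Google Trends reports search interest on a relative 0 to 100 scale, where 100 marks the peak for the chosen region and period, rather than as raw counts. The sketch below mimics that kind of scaling and the per-country "which sub-discipline dominates" choice, using invented weekly counts and invented country totals (Trends does not expose raw counts), so it illustrates the idea only and is not how Google computes its figures.

```python
# Hypothetical raw weekly query counts for three sub-disciplines (invented numbers).
RAW_WEEKLY = {
    "engineering geology": [40, 42, 180, 60],
    "hydrogeology":        [90, 88, 95, 92],
    "volcanology":         [50, 55, 52, 48],
}

# Hypothetical per-country totals over the same period.
COUNTRY_TOTALS = {
    "Country X": {"engineering geology": 300, "hydrogeology": 120, "volcanology": 80},
    "Country Y": {"engineering geology": 90, "hydrogeology": 60, "volcanology": 400},
}

def relative_popularity(series_by_term):
    """Scale every series against the single highest value across all terms, so the peak is 100."""
    peak = max(max(series) for series in series_by_term.values())
    return {term: [round(100 * v / peak) for v in series]
            for term, series in series_by_term.items()}

def dominant_by_country(totals):
    """Pick the sub-discipline with the largest total in each country."""
    return {country: max(terms, key=terms.get) for country, terms in totals.items()}

print(relative_popularity(RAW_WEEKLY))
print(dominant_by_country(COUNTRY_TOTALS))
```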
The rise of the vector database
I’ve been writing about the use of word vectors in geoscience since 2015, but some exciting developments have emerged recently. A vector is an array of numbers that can represent a word, based on complex patterns of word co-occurrence. Taking the cosine similarity between vectors enables us to find...
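To illustrate the core operation a vector database performs, here is a brute-force sketch: store vectors, then return the k entries with the highest cosine similarity to a query. `TinyVectorIndex` and the toy rock vectors are hypothetical; production vector databases typically replace the full scan with an approximate nearest-neighbour index (HNSW is a common choice) so that search stays fast over millions of vectors.

```python
import heapq
import math

def normalise(vector):
    """Scale a vector to unit length so cosine similarity becomes a plain dot product."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

class TinyVectorIndex:
    """Hypothetical brute-force, in-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # (label, unit-length vector) pairs

    def add(self, label, vector):
        self.items.append((label, normalise(vector)))

    def search(self, query, k=3):
        """Return the k (similarity, label) pairs most similar to the query, best first."""
        q = normalise(query)
        scored = ((sum(a * b for a, b in zip(q, v)), label) for label, v in self.items)
        return heapq.nlargest(k, scored)

index = TinyVectorIndex()
index.add("basalt", [0.9, 0.1, 0.0])
index.add("gabbro", [0.8, 0.2, 0.1])
index.add("limestone", [0.0, 0.2, 0.9])
index.add("dolomite", [0.1, 0.1, 0.8])

results = index.search([1.0, 0.1, 0.0], k=2)
for score, label in results:
    print(f"{label:10s} {score:.3f}")
```

Storing unit-length vectors up front means each query costs one dot product per stored item, which is the same trade-off many vector stores make before adding an index on top.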