My previous posts showed how a Geoscientist can choose concepts and entities to cross-plot against different contexts. This example (7,000 mineral names) applies Principal Component Analysis (PCA) to word vectors to reveal clusters. Simply put, PCA reduces high-dimensional data to a 2D plane whilst retaining as much of the variance (information) as possible. The minerals/native...
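A minimal sketch of the projection step, assuming the mineral word vectors are already available as rows of a NumPy array. The post does not say which PCA implementation was used; this version centres the data and takes the SVD, and the random vectors below are a hypothetical stand-in for real embeddings.

```python
import numpy as np

def pca_2d(vectors: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Project row vectors onto their first two principal components.

    Returns the 2D coordinates and the fraction of total variance
    explained by each of the two components.
    """
    # PCA is defined on mean-centred data.
    centred = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    coords = centred @ vt[:2].T
    explained = (s**2)[:2] / np.sum(s**2)
    return coords, explained

# Hypothetical stand-in for real word vectors: 100 "mineral names",
# each with a random 50-dimensional embedding.
rng = np.random.default_rng(0)
mineral_vectors = rng.normal(size=(100, 50))
coords, explained = pca_2d(mineral_vectors)
print(coords.shape, explained)
```

Each row of `coords` is one mineral's position on the 2D plot; minerals whose vectors are similar should land near each other, which is what makes clusters visible.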
3D Word Vector Plots
3D Word Vector Visualisations: Providing free web tools that let all geoscientists easily explore hidden semantic relations in text may increase the chances of abductive discovery in our discipline. Following on from previous posts on word vectors in space and time, in this example the 'first' axis is Lacustrine, 'second' is Evaporite...
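One way to build such axes, assuming each term's coordinate on an axis is its cosine similarity to that axis's anchor word (e.g. Lacustrine, Evaporite). This construction is an assumption, not necessarily the method behind the plots. The third axis term is cut off in the excerpt, so it appears below as a hypothetical placeholder, and all vectors are made-up 3-dimensional examples.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def axis_coordinates(vectors: dict[str, np.ndarray],
                     axis_terms: list[str]) -> dict[str, tuple[float, ...]]:
    """Place every non-axis term in a space whose axes are anchor terms.

    Each coordinate is the cosine similarity between the term's vector
    and one axis term's vector (an assumed construction).
    """
    anchors = [vectors[t] for t in axis_terms]
    return {term: tuple(cosine(vec, a) for a in anchors)
            for term, vec in vectors.items()
            if term not in axis_terms}

# Made-up vectors purely for illustration.
vectors = {
    "lacustrine": np.array([1.0, 0.0, 0.0]),
    "evaporite": np.array([0.0, 1.0, 0.0]),
    "third_axis_term": np.array([0.0, 0.0, 1.0]),  # hypothetical placeholder
    "playa": np.array([1.0, 1.0, 0.0]),
    "halite": np.array([0.0, 2.0, 0.0]),
}
coords = axis_coordinates(vectors, ["lacustrine", "evaporite", "third_axis_term"])
print(coords["halite"])  # (0.0, 1.0, 0.0)
```

Choosing the axes by name keeps the plot interpretable: a point's position reads directly as "how lacustrine" and "how evaporitic" its context is.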
Text embeddings by geological age: query=paleosol
Space and Time are the two key contexts for the Geoscientist. My previous post showed a query result on a map (spatial) using word vector similarity. This box plot shows time: the x-axis is geological age, running from oldest on the left (Precambrian) to youngest on the right (Quaternary). The y-axis is cosine similarity to the...
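A minimal sketch of the data preparation behind such a plot, assuming each document already carries a geological age label and a cosine similarity to the query ('paleosol'). The age list and scores are invented. The sketch computes the five-number summary that a box plot draws for each age, ordered oldest to youngest; matplotlib's `Axes.boxplot` could draw the boxes directly from the grouped lists.

```python
import numpy as np

# Hypothetical ordering, oldest to youngest (left to right on the x-axis).
AGE_ORDER = ["Precambrian", "Carboniferous", "Jurassic", "Neogene", "Quaternary"]

def box_stats_by_age(records, age_order):
    """Return (age, min, q1, median, q3, max) per age, oldest first.

    `records` is an iterable of (age, cosine_similarity) pairs; every age
    must appear in `age_order`. Ages with no records are skipped.
    """
    grouped = {age: [] for age in age_order}
    for age, sim in records:
        grouped[age].append(sim)
    stats = []
    for age in age_order:
        sims = grouped[age]
        if not sims:
            continue
        q1, med, q3 = np.percentile(sims, [25, 50, 75])
        stats.append((age, min(sims), float(q1), float(med), float(q3), max(sims)))
    return stats

# Invented similarity scores between documents and the query.
records = [("Quaternary", 0.42), ("Precambrian", 0.10), ("Quaternary", 0.55),
           ("Jurassic", 0.31), ("Precambrian", 0.14), ("Jurassic", 0.27)]
stats = box_stats_by_age(records, AGE_ORDER)
for row in stats:
    print(row)
```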
Using text embeddings to display search results spatially.
Building a statistical vector-space model from your corpus of text documents has many advantages. Take a search query such as 'carbonatite'. Using text embeddings (vectors), we can display results not just on a map, but also by how 'similar' those locations are to the query. See the associated screenshot. This allows us to discover locations...
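A minimal sketch of the ranking step, assuming each location (e.g. a well or outcrop) already has an embedding, such as an average of the vectors of documents tied to it, and that the query is embedded the same way. The site names, coordinates and vectors are hypothetical. The output keeps latitude and longitude so results can be mapped and coloured by similarity.

```python
import numpy as np

def rank_locations(query_vec, locations):
    """Sort locations by cosine similarity to the query, highest first.

    `locations` is a list of (name, lat, lon, vector) tuples; the result
    is a list of (name, lat, lon, similarity) tuples.
    """
    q = query_vec / np.linalg.norm(query_vec)
    scored = []
    for name, lat, lon, vec in locations:
        sim = float(np.dot(q, vec / np.linalg.norm(vec)))
        scored.append((name, lat, lon, sim))
    return sorted(scored, key=lambda row: row[3], reverse=True)

# Hypothetical data: a query vector for 'carbonatite' and three sites.
query = np.array([0.9, 0.1, 0.0])
sites = [
    ("site_a", 51.5, -0.1, np.array([0.8, 0.2, 0.1])),
    ("site_b", 57.1, -2.1, np.array([0.0, 0.3, 0.9])),
    ("site_c", 52.6, 1.3, np.array([0.5, 0.5, 0.0])),
]
ranked = rank_locations(query, sites)
for name, lat, lon, sim in ranked:
    print(f"{name} ({lat}, {lon}): {sim:.2f}")
```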
Video online: Unstructured Data Management and Artificial Intelligence, Society of Professional Data Managers (SPDM)
The video of my talk in June was posted on YouTube by the Society of Professional Data Managers (SPDM) last week. https://m.youtube.com/watch?v=VD4Z7FpzGuk #datamanagement #artificialintelligence
World’s First Geoscience Large Language Model
I have to thank Richard Scott from BHP, who pointed me to this recently published (June 2023) paper: "Learning a Foundation Language Model for Geoscience Knowledge Understanding and Utilization" by Deng et al. (2023), https://arxiv.org/abs/2306.05064. I thought the community at large would be interested. The researchers, mainly from Shanghai Jiao Tong University, claim to have created...
Geological sub-discipline query popularity in Google
Global Google search queries on geological sub-disciplines over the past 12 months. The graph shows relative popularity over time, and the map shows which sub-discipline dominates in each country. The spike for engineering geology (yellow) on 6th February 2023 coincides with the magnitude 7.8 earthquake in Turkey. #geology #engineeringgeology #hydrogeology #volcanology #mininggeology #petroleumgeology
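Google Trends reports interest on a 0 to 100 scale, where 100 marks the peak across the terms being compared. A rough sketch of that rescaling, plus picking the dominant sub-discipline per country, on invented counts. This is not Google's actual pipeline, which also normalises by total search volume for each place and time; the country names and numbers are hypothetical.

```python
import numpy as np

def scale_to_peak(series: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    """Rescale raw counts so the single highest value across all terms is 100."""
    peak = max(values.max() for values in series.values())
    return {term: np.round(100 * values / peak) for term, values in series.items()}

def dominant_term(country_totals: dict[str, dict[str, float]]) -> dict[str, str]:
    """For each country, return the term with the most interest."""
    return {country: max(totals, key=totals.get)
            for country, totals in country_totals.items()}

# Invented weekly counts; the third week mimics a news-driven spike.
weekly = {
    "engineering geology": np.array([20, 22, 90, 25]),
    "hydrogeology": np.array([30, 31, 29, 33]),
}
scaled = scale_to_peak(weekly)

# Invented per-country totals.
countries = {
    "Country A": {"engineering geology": 12.0, "hydrogeology": 30.0},
    "Country B": {"engineering geology": 40.0, "hydrogeology": 15.0},
}
leaders = dominant_term(countries)
print(scaled)
print(leaders)
```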
The rise of the vector database
I’ve been writing about the use of word vectors in geoscience since 2015, but recently some exciting developments have emerged. A vector is an array of numbers that can represent a word, derived from complex patterns of word co-occurrence. Taking the cosine similarity between vectors enables us to find...
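At its core, a vector database answers "which stored vectors are closest to this one?". Below is a brute-force sketch of that query using cosine similarity, assuming a small in-memory index. Production vector databases typically use approximate nearest-neighbour indexes such as HNSW rather than scanning every vector. The class name, terms and vectors are hypothetical.

```python
import numpy as np

class TinyVectorIndex:
    """Exact (brute-force) cosine-similarity search over stored vectors."""

    def __init__(self, dim: int):
        self.keys: list[str] = []
        self.matrix = np.empty((0, dim))

    def add(self, key: str, vector: np.ndarray) -> None:
        # Store unit-length vectors so a dot product equals cosine similarity.
        unit = vector / np.linalg.norm(vector)
        self.keys.append(key)
        self.matrix = np.vstack([self.matrix, unit])

    def search(self, query: np.ndarray, k: int = 3) -> list[tuple[str, float]]:
        # Score every stored vector, then return the k most similar.
        q = query / np.linalg.norm(query)
        sims = self.matrix @ q
        top = np.argsort(-sims)[:k]
        return [(self.keys[i], float(sims[i])) for i in top]

# Invented 3-dimensional "word vectors".
index = TinyVectorIndex(dim=3)
index.add("sandstone", np.array([0.9, 0.1, 0.0]))
index.add("mudstone", np.array([0.7, 0.3, 0.1]))
index.add("basalt", np.array([0.0, 0.2, 0.9]))
results = index.search(np.array([1.0, 0.0, 0.0]), k=2)
print(results)
```

Normalising at insert time means each search is a single matrix-vector product, which is the same trick that makes cosine search cheap at scale.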
Mission and vision statements of geological societies and surveys
I took the mission and vision statements of geological societies and surveys at international, European and national levels, then clustered them semantically in a word cloud. The colours show words grouped by similarity (i.e. words that share similar neighbouring words). A few themes emerge (my interpretation): 1. Red: using knowledge to address humanity's challenges for a sustainable planet 2....
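The post does not say which clustering method produced the colours. As one common option, here is a plain k-means sketch in NumPy that groups word vectors. The words and 2D vectors are invented so the two groups are easy to see. Normalising vectors to unit length first makes Euclidean distance track cosine similarity.

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, iters: int = 50, seed: int = 0) -> np.ndarray:
    """Cluster rows of X into k groups; returns one label per row."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct rows chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels

# Invented words and vectors; real ones would come from a trained model.
words = ["sustainable", "planet", "climate", "data", "information", "knowledge"]
vecs = np.array([[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
                 [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]])
vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
labels = kmeans(vecs, k=2)
for word, label in zip(words, labels):
    print(label, word)
```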
The grand challenges of geoscience
I created this blog exactly eight years ago, in mid-2015. The aim was to share ideas, research, technologies and methods for text analytics, search and data management applied to geoscience, in the hope of stimulating and accelerating the use of geoscience information by practitioners for the benefit of industry and society. It has gone from...