I've created text embeddings based on all of Geoscience Australia's Stratigraphic Unit Descriptions (18,500+ data points). The example below is a boxplot of the similarity of 'granite' (as a word vector), on the x-axis, to vectors for geological time on the y-axis, oldest at the base and youngest at the top. The further to the right (closer to... Continue Reading →
Text embeddings – baseflow, interflow and runoff
Word vectors from 1,500 hydrogeology papers in the NERC Open Research Archive (NORA) are used to automatically compare minerals against the three-component system baseflow-interflow-runoff in a ternary diagram. #hydrology #hydrogeology #geology #naturallanguageprocessing #groundwater #geotechnical #geoenvironmental #mineralogy #artificialintelligence
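One way a mineral could be placed on a ternary diagram is to turn its three similarity scores into proportions that sum to one. The sketch below is only an illustration under stated assumptions: the function name `ternary_fractions`, the clipping of negative similarities to zero, and the example scores are all hypothetical and are not taken from the post.

```python
def ternary_fractions(sim_baseflow, sim_interflow, sim_runoff):
    """Convert three similarity scores into ternary proportions.

    Assumption: negative similarities are clipped to zero so that
    every proportion is non-negative. Other mappings are possible.
    """
    clipped = [max(s, 0.0) for s in (sim_baseflow, sim_interflow, sim_runoff)]
    total = sum(clipped)
    if total == 0:
        raise ValueError("at least one similarity must be positive")
    # Each proportion is the share of the total similarity.
    return tuple(s / total for s in clipped)


# Made-up similarity scores for a single mineral (illustrative only).
baseflow, interflow, runoff = ternary_fractions(0.5, 0.25, 0.25)
print(baseflow, interflow, runoff)
```

The three proportions can then be passed to any ternary plotting routine as the coordinates of that mineral.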
Detecting objects on images in documents
It can be useful to detect objects in images within documents. I labelled borehole/well objects on 30 public domain images to illustrate what results can be achieved in less than an hour on unseen data. There are many other use cases in the subsurface such as objects on borehole logs, satellite imagery, remote sensing, thin... Continue Reading →
Academic Publishing on Geoscience Natural Language Processing (NLP)
It's been 10 years since my first academic research into Natural Language Processing (NLP) applied to Geoscience. I've done a quick analysis in Google Scholar for one measure of how the discipline has evolved. It is exciting to see the probable exponential growth in the number of papers about (or referencing) this topic. I've also... Continue Reading →
Towards General Geoscience Artificial Intelligence Systems
Interesting article from Zhang and Xu (2023) postulating what geoscience language models may become: multi-disciplinary, with multi-modal inputs and outputs. They state that language models' capability for scenario planning and qualifying uncertainty means they could be a critical tool to address important issues such as climate change, natural hazards and sustainable development of natural resources. They describe... Continue Reading →
Generating questions
I've been experimenting with ChatGPT to generate candidate questions from document text input. The example is on Ground Source Heat Pumps (GSHP) from a British Geological Survey Report in the NORA collection. It might be useful for organisations to store a 'question bank' of such Generative AI outputs (questions) for a corpus, sliced in numerous... Continue Reading →
Text Embeddings – no single truth!
I’ve been experimenting with text embeddings to identify relative topic emphasis in text corpora, as an example of similarity-based unsupervised machine learning. The examples below show the relative similarity of the word vectors for ‘aquifer’ (top) and ‘groundwater’ (bottom) to word vectors of various forms of contamination, comparing the US Geological Survey public collection... Continue Reading →
Text Embeddings App
Using text embeddings for lookbacks. I’m making this app freely available to the Norwegian Petroleum Directorate and UK North Sea Transition Authority along with various NLP outputs for the benefit of the geoscience community. This is input to the hackathon organised by FORCE, led by Peter Bormann. This particular example uses 800 license relinquishment... Continue Reading →
Discovering topics in text
Discovering topics in text. This is an interactive noun-noun-phrase network of body text within 1,500 UK NERC Open Research Archive (NORA) groundwater hydrology reports related to aquifers. These inductive, statistical techniques can be a useful first pass to assess key topics and trends in a large number of documents. Reference: van Eck, N.J. and... Continue Reading →
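A network like this is typically built from co-occurrence counts: two phrases are linked if they appear in the same document, and the link weight is the number of documents in which they co-occur. The sketch below assumes the noun phrases have already been extracted per document; the function name `cooccurrence_edges`, the `min_weight` parameter and the toy phrases are hypothetical, and this is not necessarily how the network in the post was produced.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(docs_phrases, min_weight=1):
    """Count how many documents each pair of phrases shares.

    docs_phrases: list of lists, one list of noun phrases per document.
    Returns a dict mapping (phrase_a, phrase_b) to a document count,
    keeping only pairs seen in at least min_weight documents.
    """
    counts = Counter()
    for phrases in docs_phrases:
        # Deduplicate and sort so each pair is counted once per
        # document and always stored in the same order.
        for a, b in combinations(sorted(set(phrases)), 2):
            counts[(a, b)] += 1
    return {pair: w for pair, w in counts.items() if w >= min_weight}


# Toy input: phrases assumed to come from three documents.
docs = [
    ["aquifer", "groundwater", "recharge"],
    ["aquifer", "groundwater"],
    ["groundwater", "nitrate"],
]
edges = cooccurrence_edges(docs)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

Raising `min_weight` prunes rare pairs, which is one simple way to keep a large network readable.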
Text Embeddings – Analogies
Text embeddings can capture some interesting semantic relationships. Given the analogy “Quartz is to Sandstone as …….. is to Limestone”, vector additions and subtractions along latent trajectories in embedding space produce “calcite” as the answer. Given enough text, this technique may be capable of producing results that spark new lines of thought in science... Continue Reading →
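The analogy arithmetic can be sketched with a handful of toy vectors. The three-dimensional vectors below are invented purely for illustration (real embeddings are learned from text and have hundreds of dimensions), and the function names `cosine` and `solve_analogy` are hypothetical. The idea is to compute quartz - sandstone + limestone and return the nearest remaining word by cosine similarity.

```python
import math

# Made-up toy vectors: [mineral-like, siliciclastic, carbonate].
VECTORS = {
    "quartz":    [1.0, 1.0, 0.0],
    "sandstone": [0.0, 1.0, 0.0],
    "limestone": [0.0, 0.0, 1.0],
    "calcite":   [1.0, 0.0, 1.0],
    "feldspar":  [1.0, 0.8, 0.1],
    "shale":     [0.0, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def solve_analogy(a, b, c, vectors=VECTORS):
    """Answer 'a is to b as ? is to c' using a - b + c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three query words, as is usual for analogy tasks.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(solve_analogy("quartz", "sandstone", "limestone"))
```

With these toy vectors the nearest candidate is "calcite"; with real embeddings the answer depends on the corpus the vectors were trained on.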