Research in Natural Language Processing (NLP) has exploded in recent times. Building Large Language Models (LLMs) is a big part of the NLP landscape, but not the only game in town. I would like to point you towards an excellent paper by Schopf et al. (2023), who classified and analysed NLP research papers... Continue Reading →
Using Natural Language Processing (Transformers) for Subsurface Carbon Capture and Storage Site Selection.
Mathur et al. (2023) recently published an interesting paper: "Transformers for Site Assessment for Carbon Capture and Sequestration using Legacy Well Data", Y Mathur, J Chen, I Folmar, Z Dong, Q Su, L Lu, M Sidahmed, Third EAGE Digitalization Conference and Exhibition 2023 (1), 1-5, 2023. Carbon Capture and Sequestration (CCS) is one of the... Continue Reading →
Geoscience Sentiment (Using Text Embeddings)
I've been experimenting with using text embeddings to estimate the sentiment of a corpus of documents. In this approach, sentiment is broken down by geological age (but other contexts could be used). Take any input query, e.g. "aquifer", combine it (by adding vectors) with geological age vectors, then compare the result, using cosine similarity, to the vectors of various sentiment themes,... Continue Reading →
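As a rough illustration of the idea in this excerpt (add a query vector to a geological age vector, then compare the result to sentiment theme vectors by cosine similarity), here is a minimal sketch in plain Python. The function names, the theme names and the 3-dimensional toy vectors are all made up for illustration; real embeddings would come from a trained model and have hundreds of dimensions, and the original post may well do this differently.

```python
import math


def add_vectors(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def theme_scores(query_vec, age_vec, theme_vecs):
    """Score each sentiment theme against the combined query + age vector."""
    combined = add_vectors(query_vec, age_vec)
    return {name: cosine_similarity(combined, vec)
            for name, vec in theme_vecs.items()}


# Hypothetical toy vectors, NOT real embeddings.
query = [1.0, 0.0, 0.0]   # stands in for "aquifer"
age = [0.0, 1.0, 0.0]     # stands in for one geological age
themes = {
    "positive": [1.0, 1.0, 0.0],
    "negative": [0.0, 0.0, 1.0],
}

scores = theme_scores(query, age, themes)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Repeating the scoring for each geological age vector would give one row of theme scores per age, which is the kind of table a sentiment-by-age chart could be drawn from.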
Word Vectors (Embeddings) through (Geological) Time.
I've created text embeddings based on all of Geoscience Australia's Stratigraphic Unit Descriptions (18,500+ data points). The example below is a boxplot of the similarity of 'granite' (as a word vector) on the x-axis through geological time (vectors on the y-axis), with old at the base and young at the top. The further to the right (closer to... Continue Reading →
Text embeddings – baseflow, interflow and runoff
Word vectors from 1,500 hydrogeology papers in the NERC Open Research Archive (NORA) are used to automatically compare minerals to the three-component system baseflow-interflow-runoff in a ternary diagram. #hydrology #hydrogeology #geology #naturallanguageprocessing #groundwater #geotechnical #geoenvironmental #mineralogy #artificialintelligence
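One way to place a mineral on a ternary diagram from three similarity scores is to rescale them so they sum to 1. The sketch below assumes that approach; the function name and the example similarity values are hypothetical, and clipping negative similarities to zero is my own assumption rather than anything stated in the post.

```python
def ternary_fractions(sim_baseflow, sim_interflow, sim_runoff):
    """Rescale three similarity scores into fractions that sum to 1.

    Negative similarities are clipped to zero (an assumption), because a
    ternary diagram needs non-negative components.
    """
    sims = [max(s, 0.0) for s in (sim_baseflow, sim_interflow, sim_runoff)]
    total = sum(sims)
    if total == 0:
        # No positive similarity to any component: put the point at the centre.
        return (1 / 3, 1 / 3, 1 / 3)
    return tuple(s / total for s in sims)


# Hypothetical similarities of one mineral's word vector to each component.
fractions = ternary_fractions(0.6, 0.3, 0.1)
print(fractions)
```

The three fractions can then be passed to whatever ternary plotting tool is in use, with one point per mineral.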
Detecting objects on images in documents
It can be useful to detect objects in images within documents. I labelled borehole/well objects on 30 public domain images to illustrate what results can be achieved on unseen data in less than an hour. There are many other use cases in the subsurface such as objects on borehole logs, satellite imagery, remote sensing, thin... Continue Reading →
Academic Publishing on Geoscience Natural Language Processing (NLP)
It's been 10 years since my first academic research into Natural Language Processing (NLP) applied to Geoscience. I've done a quick analysis in Google Scholar as one measure of how the discipline has evolved. It is exciting to see the probable exponential growth in the number of papers about (or referencing) this topic. I've also... Continue Reading →
Towards General Geoscience Artificial Intelligence Systems
Interesting article from Zhang and Xu (2023) postulating what geoscience language models may become: multi-disciplinary, with multi-modal inputs and outputs. They state that language models' capability for scenario planning and qualifying uncertainty means they could be a critical tool to address important issues such as climate change, natural hazards and sustainable development of natural resources. They describe... Continue Reading →
Generating questions
I've been experimenting with using ChatGPT to generate candidate questions from document text. The example is on Ground Source Heat Pumps (GSHP) from a British Geological Survey report in the NORA collection. It might be useful for organisations to store a 'question bank' of such Generative AI outputs (questions) for a corpus, sliced in numerous... Continue Reading →
Text Embeddings – no single truth!
I’ve been experimenting with using text embeddings to identify relative topic emphasis in text corpora, as an example of similarity-based unsupervised machine learning. The examples below show the relative similarity of the word vectors for ‘aquifer’ (top) and ‘groundwater’ (bottom) to word vectors of various forms of contamination, comparing the US Geological Survey public collection... Continue Reading →