
I’ve been assessing whether patterns of words in large volumes of text can be used to map geology. The hypothesis is that reports contain subtle word association patterns that could be useful for geoscience, perhaps by informing the uncertainty in our existing models or by highlighting differences that may warrant further investigation.
The map on the left shows the distribution of the mineral plagioclase in soils from the eastern US (source: United States Geological Survey (USGS)). The presence of plagioclase likely indicates underlying mafic and intermediate igneous rocks, as well as sedimentary rocks and recent alluvial deposits sourced from these rock types.
The map on the right shows the cosine similarity between the dense vectors (text embeddings) of geographical place names and the vector for ‘plagioclase’. The closer the cosine is to 1 (reds), the ‘more similar’ the place name’s vector is to the ‘plagioclase’ vector. The embeddings come from a model trained with unsupervised machine learning on 4,000 public USGS reports. Research is ongoing as I perform more tests and collect more data.
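For anyone unfamiliar with cosine similarity, here is a minimal sketch of the comparison behind the map on the right. It is not the code or model I used: the three-dimensional vectors, the names `place_a` and `place_b`, and the helper `rank_places` are all made up for illustration, and real embeddings would have far more dimensions.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 means same direction."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy embeddings, invented for illustration only (not from the USGS model).
embeddings = {
    "plagioclase": [0.9, 0.1, 0.3],
    "place_a": [0.8, 0.2, 0.4],  # hypothetical place name
    "place_b": [0.1, 0.9, 0.2],  # hypothetical place name
}


def rank_places(term, vectors):
    """Score every other entry against `term`, most similar first."""
    target = vectors[term]
    scores = {
        name: cosine_similarity(target, vec)
        for name, vec in vectors.items()
        if name != term
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_places("plagioclase", embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy case `place_a` points in nearly the same direction as ‘plagioclase’ and so ranks first. On the map, each place name’s score would then be plotted at its location.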
#DigitalGeoscience #geosciences #digital #geotechnical #geohazards #miningexploration #oilandgas #carboncapture #hydrogeology #gismapping #artificialintelligence #naturallanguageprocessing #subsurface #gis