Text Embeddings for Minerals and Lithologies to Support Data Discovery

I’ve been experimenting with taking a large volume of text, building embeddings, and using PCA for dimensionality reduction. The reduced vectors can then be used as input for clustering, e.g. k-means. In this example I’ve used thousands of minerals and lithologies, and I’ve highlighted some of the associations to illustrate. Associations (complex word co-occurrence patterns) that are not commonly known or obvious may point to an area for further research or investigation.
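
To make the pipeline concrete, here is a minimal sketch of the three steps (embed, reduce with PCA, cluster with k-means). It is not the method used for the results in this post: the six-sentence corpus, the term list and the function names are all hypothetical, and simple co-occurrence counts stand in for a learned embedding model. It assumes only NumPy is installed.

```python
import numpy as np

# Hypothetical toy corpus; the real input is a large volume of geological text.
SENTENCES = [
    "gold occurs with quartz in veins cutting greenstone",
    "quartz veins in greenstone host gold",
    "greenstone belts contain gold bearing quartz veins",
    "halite and gypsum precipitate in evaporite basins",
    "evaporite sequences contain halite and gypsum beds",
    "gypsum beds overlie halite in evaporite basins",
]
# Hypothetical mix of minerals and lithologies to embed.
TERMS = ["gold", "quartz", "greenstone", "halite", "gypsum", "evaporite"]


def cooccurrence_embeddings(sentences, terms):
    """Embed each term as counts of the words sharing a sentence with it.

    This is a stand-in (assumption) for a learned embedding such as word2vec.
    """
    vocab = sorted({w for s in sentences for w in s.split()})
    column = {w: j for j, w in enumerate(vocab)}
    X = np.zeros((len(terms), len(vocab)))
    for s in sentences:
        words = s.split()
        for i, term in enumerate(terms):
            if term in words:
                for w in words:
                    if w != term:
                        X[i, column[w]] += 1
    return X


def pca(X, n_components):
    """Project rows of X onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)  # centre each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centre, then nearest centre.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


embeddings = cooccurrence_embeddings(SENTENCES, TERMS)
coords = pca(embeddings, n_components=2)
labels = kmeans(coords, k=2)

for term, (x, y), label in zip(TERMS, coords, labels):
    print(f"{term:12s} {x:7.2f} {y:7.2f}  cluster {label}")
```

With a real corpus, the count matrix would be replaced by vectors from whichever embedding model is used, and the PCA coordinates can be plotted directly to look for unexpected mineral and lithology neighbours.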

Whilst there are published examples using these types of techniques for minerals (Lawley et al. 2022), I had not seen lithologies included alongside minerals before. There are many examples where latent, implicit associative knowledge hidden in large amounts of text has yielded new scientific discoveries with societal and industrial impact.
