
My previous posts showed how a geoscientist can choose concepts and entities to cross-plot against different contexts.
This example (7,000 mineral names) uses Principal Component Analysis (PCA) to project the word vectors of the mineral names so that names with similar vectors plot close together. Simply put, this technique reduces high-dimensional data to a 2D plane whilst retaining as much information (variance) as possible.
The minerals/native metals circled in red are copper, silver and gold. They share associations through their word vectors, which is why they appear close together.
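For readers who want to see the mechanics, here is a minimal sketch of the idea in Python. It is not the pipeline used for the plot above: the six 5-dimensional "word vectors" are made up purely for illustration, and the helper name `pca_2d` is hypothetical. Real word vectors would come from a trained embedding model and typically have hundreds of dimensions.

```python
import numpy as np

def pca_2d(vectors):
    """Project row vectors onto their first two principal components."""
    X = np.asarray(vectors, dtype=float)
    centered = X - X.mean(axis=0)            # PCA works on mean-centred data
    # SVD of the centred data: the rows of vt are the principal axes
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:2].T             # 2D coordinates for plotting
    explained = (s[:2] ** 2).sum() / (s ** 2).sum()  # share of variance kept
    return coords, explained

# Toy, invented vectors (assumption: NOT real embeddings)
names = ["copper", "silver", "gold", "quartz", "feldspar", "olivine"]
vectors = np.array([
    [0.9, 0.8, 0.1, 0.0, 0.2],   # copper
    [0.8, 0.9, 0.2, 0.1, 0.1],   # silver
    [0.9, 0.9, 0.1, 0.1, 0.2],   # gold
    [0.1, 0.0, 0.9, 0.8, 0.3],   # quartz
    [0.2, 0.1, 0.8, 0.9, 0.2],   # feldspar
    [0.0, 0.2, 0.9, 0.7, 0.4],   # olivine
])

coords, explained = pca_2d(vectors)

def distance(a, b):
    """Euclidean distance between two names on the 2D plane."""
    i, j = names.index(a), names.index(b)
    return float(np.linalg.norm(coords[i] - coords[j]))

for name, (x, y) in zip(names, coords):
    print(f"{name:10s} {x:+.3f} {y:+.3f}")
print(f"variance retained in 2D: {explained:.1%}")
print("copper-gold:", distance("copper", "gold"))
print("copper-quartz:", distance("copper", "quartz"))
```

In this toy data the three native metals were given similar vectors, so they land near each other on the 2D plane and far from the silicates. With real word vectors the groupings come from the text itself rather than from hand-picked numbers.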
Using this technique (and others) we can find other associations, driven by the text, that are not so easily explainable by our current knowledge. This can lead to potential abductive discovery of new ideas, opportunities, and areas for further research.
If you wish to read more on PCA, there is a good review paper by Jolliffe and Cadima (2016), "Principal component analysis: a review and recent developments":
https://royalsocietypublishing.org/doi/10.1098/rsta.2015.0202