
I’ve created text embeddings based on all of Geoscience Australia’s Stratigraphic Unit Descriptions (18,500+ data points). The example below is a boxplot of similarity to ‘granite’ (as a word vector) on the x-axis, with the vectors arranged through geological time on the y-axis, oldest at the base and youngest at the top. The further to the right a value sits (closer to 1), the ‘more similar’ the vectors are. I’ve been researching why the Permian stands out compared to other ages, given that there are quite a few Permian-aged granites. It could be due to the large volume of Permian-age coal, which nudges the overall vector similarity toward lower cosine values.
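To make the comparison concrete, here is a minimal sketch of how the per-age similarity values behind a boxplot like this could be computed. It is not the code used for the plot: the function names (`cosine_similarity`, `similarity_by_age`), the 3-dimensional vectors and the two age labels are all invented for illustration, and it assumes each unit description has already been reduced to a single embedding vector.

```python
import math
from collections import defaultdict
from statistics import median


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)


def similarity_by_age(query_vec, units):
    """Group cosine similarities to query_vec by age label.

    units is an iterable of (age_label, embedding) pairs.
    """
    grouped = defaultdict(list)
    for age, vec in units:
        grouped[age].append(cosine_similarity(query_vec, vec))
    return dict(grouped)


# Toy vectors, invented purely for illustration (real embeddings
# would have hundreds of dimensions).
granite = [1.0, 0.0, 0.0]
units = [
    ("Permian", [1.0, 0.0, 0.0]),   # granite-like description
    ("Permian", [0.0, 1.0, 0.0]),   # coal-like description
    ("Permian", [0.0, 1.0, 0.0]),   # coal-like description
    ("Devonian", [1.0, 0.0, 0.0]),  # granite-like description
    ("Devonian", [3.0, 4.0, 0.0]),  # mixed description
]

sims = similarity_by_age(granite, units)
medians = {age: median(values) for age, values in sims.items()}
```

In this toy set the Permian group still contains a description identical to ‘granite’, but the two coal-like vectors pull its median well below the Devonian one. That is the kind of effect suspected above, shown here only on made-up numbers.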