It was Firth who first said that a word’s meaning can be defined, at least in part, by the company it keeps: in other words, by its word associations. This idea underpins high-dimensional vector space models and many disambiguation techniques that determine the ‘sense’ of a word or phrase when it can have several meanings.
What is often not addressed is that the same word or phrase, with the same meaning, can be viewed differently depending on the intent of the discipline or persona. The example image above shows word associations to the term ‘permeability’ in descending order of frequency: on the left from petroleum engineering literature, on the right from petroleum geoscience literature. Colour coding is from a variant of the NASA SWEET Ontology.
Some interesting observations: the term ‘fault’, which is the 4th most frequent association to permeability in geoscience text, is not even in the Top 50 associations to permeability in petroleum engineering text. The same is true of other terms such as ‘facies’. Conversely, associations to ‘permeability’ such as ‘damage’ and ‘production’ in petroleum engineering text don’t get a mention in the Top 50 for geoscience literature.
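As a rough illustration of how such association lists can be produced, the sketch below counts the words that co-occur with a target term within a fixed window, separately for each discipline, and then compares the two lists. It is a minimal sketch and not the method used to build the image: the function name `top_associations`, the window size and the toy sentences are all assumptions made up for the example, and a real pipeline would also remove stopwords and use a far larger corpus.

```python
from collections import Counter


def top_associations(sentences, target, window=5, k=50):
    """Return up to k words that most often co-occur with `target`
    within `window` tokens on either side (hypothetical helper)."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Count every neighbour in the window except the target itself.
            counts.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return [word for word, _ in counts.most_common(k)]


# Toy, made-up sentences standing in for each discipline's literature.
engineering = [
    "formation damage reduced permeability near the wellbore",
    "production declined as permeability damage increased",
]
geoscience = [
    "fault zones control permeability in this facies",
    "permeability varies across the fault and between facies",
]

eng_terms = top_associations(engineering, "permeability")
geo_terms = top_associations(geoscience, "permeability")

# Associations that appear for one discipline but not the other.
only_geo = [t for t in geo_terms if t not in eng_terms]
only_eng = [t for t in eng_terms if t not in geo_terms]

print("Geoscience only:", only_geo)
print("Engineering only:", only_eng)
```

Keeping the two corpora separate is the point of the exercise: counting over the combined text would yield a single list in which discipline-specific terms are diluted.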
As well as highlighting differences between the two disciplines for interest, these statistical patterns can be used for relevancy ranking in persona- or discipline-based intelligent search. The critical aspect here is building text embedding models which are tuned to different roles and disciplines. Normally all text tends to be ‘lumped’ together, and these subtle patterns and differences are smoothed out.
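To make the ranking idea concrete, here is a deliberately simple sketch in which each persona carries its own weighted association list and search results are re-ranked by how many of those associated terms they contain. This swaps a plain weighted term lookup in for a trained embedding model; the persona names, the weights, and the functions `persona_score` and `rank` are hypothetical and chosen only for illustration.

```python
def persona_score(doc_text, query, associations):
    """Score a document for one persona: 0 if the query term is absent,
    otherwise 1 plus the weights of persona-associated terms it contains."""
    tokens = set(doc_text.lower().split())
    if query not in tokens:
        return 0.0
    return 1.0 + sum(w for term, w in associations.items() if term in tokens)


def rank(docs, query, associations):
    """Order documents from most to least relevant for the given persona."""
    return sorted(
        docs,
        key=lambda d: persona_score(d, query, associations),
        reverse=True,
    )


# Hypothetical association weights per persona (not measured values).
PERSONA_ASSOCIATIONS = {
    "engineer": {"damage": 0.9, "production": 0.8},
    "geoscientist": {"fault": 0.9, "facies": 0.8},
}

docs = [
    "permeability damage limits production from the well",
    "fault seal and facies changes affect permeability",
]

eng_ranked = rank(docs, "permeability", PERSONA_ASSOCIATIONS["engineer"])
geo_ranked = rank(docs, "permeability", PERSONA_ASSOCIATIONS["geoscientist"])

print("Engineer sees first:", eng_ranked[0])
print("Geoscientist sees first:", geo_ranked[0])
```

The same query returns the same documents for both personas; only the order changes. With a single set of weights learned from the combined text, that ordering would be identical for everyone.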
Big data is after all about small patterns.