Using Search Term Word Co-Occurrence for Browsing Search Results: What is most popular is not necessarily the most interesting.

I conducted some exploratory search research with geoscientists and engineers at a number of oil and gas companies in 2014-2015, and have recently revisited it. In lookup (known-item) search, a user is seeking something specific that they know exists, and there is a ‘right answer’. Exploratory search tasks are more loosely defined: the question may not be fully formed in the searcher's mind, learning and serendipitous discovery are welcome, and there is no ‘right answer’. Absolute laws are therefore unlikely, but certain algorithms may tend to lead to valuable information discoveries more often than others.

Although studies show that these search tasks are fewer in number than ‘lookup/known item’ search tasks in organizations, they are potentially of higher value because they can lead to new knowledge discovery.

One area of the research investigated search term word co-occurrence: the words that occur ‘around’ a user's query terms in the body text of articles (Fig 1). Data courtesy of the Society of Economic Geology (SEG) via GeoscienceWorld.

Fig 1 – The words that occur around a search query of ‘precambrian’ in the search results from thousands of geoscience articles. The words are clustered by their similarity to one another (in cloud and list form). For example, clicking on the word ‘iron’ shows the user all the paragraphs where ‘precambrian’ and ‘iron’ occur together.
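The idea behind Fig 1 can be sketched in a few lines of Python. This is only a minimal illustration, not the method used in the study: the naive tokenizer, the tiny stopword list, the window of five words and the three sample paragraphs are all assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "of", "and", "in", "a", "is", "are", "with", "to"}

def tokenize(text):
    """Lowercase the text and keep runs of letters (a naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def cooccurrence_counts(paragraphs, query, window=5):
    """Count words that appear within `window` tokens of `query`."""
    query = query.lower()
    counts = Counter()
    for paragraph in paragraphs:
        tokens = tokenize(paragraph)
        for i, token in enumerate(tokens):
            if token != query:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                neighbour = tokens[j]
                if j != i and neighbour != query and neighbour not in STOPWORDS:
                    counts[neighbour] += 1
    return counts

def paragraphs_with(paragraphs, query, term):
    """Return the paragraphs in which both `query` and `term` occur."""
    wanted = {query.lower(), term.lower()}
    return [p for p in paragraphs if wanted <= set(tokenize(p))]

# Made-up sample paragraphs, for illustration only.
paragraphs = [
    "Precambrian iron formations are widespread in the shield.",
    "Banded iron deposits of Precambrian age host gold.",
    "The Precambrian basement is overlain by younger sediments.",
]
counts = cooccurrence_counts(paragraphs, "precambrian")
iron_paragraphs = paragraphs_with(paragraphs, "precambrian", "iron")
```

Here `counts` plays the role of the word cloud, and `paragraphs_with` mimics what happens when a user clicks on a word such as ‘iron’.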

In one experiment, these co-occurring words were presented back to geoscientists and engineers as ‘filters’, and data were captured on which filters attracted the most interest. One interesting finding was that users clicked on as many terms outside the top 10 most frequently co-occurring terms (those around their search terms in the body text of search results) as within the top 10 (Fig 2). There appeared to be a latent need to ‘show me something I don’t already know’.

Fig 2 – Exactly the same content as Fig 1, but with the top 30 terms that most frequently co-occur with the search term removed. Because word frequencies follow a power law, the most popular terms can hide more interesting and unusual associations for subject matter experts. Many people thought this yielded more interesting terms to filter on.
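Removing the most popular terms, as in Fig 2, amounts to skipping the head of the frequency ranking. The following is a minimal sketch under stated assumptions: the function name `filter_terms`, its parameters and the counts are invented for illustration and are not data from the study.

```python
from collections import Counter

def filter_terms(counts, skip_top=0, limit=10):
    """Rank terms by frequency, skip the `skip_top` most frequent, return up to `limit`."""
    ranked = [term for term, _ in counts.most_common()]
    return ranked[skip_top:skip_top + limit]

# Made-up co-occurrence counts with a long-tailed shape (illustrative only).
counts = Counter({
    "iron": 40, "rocks": 35, "age": 30,
    "gold": 6, "komatiite": 3, "stromatolites": 2,
})

popular = filter_terms(counts, skip_top=0, limit=3)  # head of the distribution
unusual = filter_terms(counts, skip_top=3, limit=3)  # what the head was hiding
```

With real data the head is much longer, which is why Fig 2 removes 30 terms rather than 3.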

Another finding suggested that the specificity of the search terms a user enters may predict which algorithm is best for presenting co-occurrence filters that match intent. For a broad term (such as ‘Geology’), showing the very common words that occur around it in text may match intent better. For a very specific search term (such as ‘injectites’), filtering out the most common words around the term may tend to be more beneficial. This is an area for further research.
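One way such a rule might look in practice is sketched below. It is purely hypothetical: using the share of documents that contain the query as a proxy for specificity, the 5% threshold and the function name `choose_skip_top` are all assumptions for illustration, not findings from the research.

```python
def choose_skip_top(query_doc_freq, total_docs,
                    broad_threshold=0.05, skip_for_specific=30):
    """Decide how many of the most common co-occurring terms to hide.

    Assumption: a query found in a large share of documents is 'broad',
    so the common terms are kept; a rare query is 'specific', so the
    most common terms are removed.
    """
    if total_docs <= 0:
        raise ValueError("total_docs must be positive")
    share = query_doc_freq / total_docs
    return 0 if share >= broad_threshold else skip_for_specific

# Hypothetical document frequencies in a 10,000-article collection.
skip_for_geology = choose_skip_top(5000, 10000)    # broad term
skip_for_injectites = choose_skip_top(12, 10000)   # very specific term
```

Whether document frequency is a good enough proxy for specificity, and where any threshold should sit, would need to be tested with real users.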

Published Research Articles

Journal of Information Science here

Journal of Information and Knowledge Management here

Journal of Knowledge Organization here

Semantic Word Cloud here
