Review of Enterprise Search: Journal of Information Science Paper

Martin White (Visiting Professor at the University of Sheffield and Managing Director of IntranetFocus) has written a review of a recent academic paper on enterprise search that I authored with Professor Simon Burnett:

“Dr Paul Cleverley and Professor Simon Burnett (Robert Gordon University) have published in the Journal of Information Science what is without doubt a landmark research paper on the factors that influence user satisfaction with enterprise search applications”

“No matter how small or large your organization, if you have responsibility for search management you should be taking this remarkable paper, marking it up para by para, and then using it to benchmark your approach to achieving the levels of search satisfaction that your employees expect”

“This research will change the way that the enterprise search community (and that includes software vendors) consider the opportunities and challenges of effective enterprise search management”


First large scale empirical study of enterprise search


The first large scale empirical study of enterprise search & discovery capability was published in the Journal of Information Science (JIS) this week.

Many organizations have deployed ‘Google-like’ enterprise search engines in order to improve access to their own information, a key part of the digital workplace. Despite significant investments, it has been reported that dissatisfaction with search in the enterprise is widespread and enduring. A study was undertaken in order to develop a deeper understanding of what may be occurring. 

Using a large oil & gas company on its fourth generation of enterprise search technology as a case study, over 1,000 feedback comments from the user interface over a 2 year period were triangulated with interviews conducted with a search service team and management. This was combined with an extensive literature review.

Well known structural and formal factors for user satisfaction such as ‘information quality’, ‘technology quality’ and ‘service quality’ were identified. The study finding that 62% of user dissatisfaction events were likely due to non-technological factors may provide the first empirical support for what some enterprise search practitioners have been saying for some time: effective search capability in the enterprise requires more than technology. For some search queries, improving knowledge organization practices for structuring content may be more useful than tuning the search technology. In addition, the criticality of informal behaviours and agency (information literacy) was clearly identified, which is often downplayed or ignored altogether in the practitioner literature.

The ‘Google Habitus’ was identified as a generative mechanism influencing expectations and behaviours at all levels in the organization for search, often leading to sub-optimal outcomes. Search in the enterprise differs considerably from consumer search on the Internet in several respects, and these differences have been well documented. Cognitive biases were postulated as another generative mechanism, such as simplicity bias (technology solutionism), where a preference for simple explanations (‘we can fix search with better technology’) often wins out over more complex explanations.

Whilst general purpose search capability is undoubtedly useful as a utility, approaches which also focus on very specific work tasks may be more likely to gain executive support. Advancing enterprise search capability is therefore likely to lend itself to multi-modal approaches: a system of agency and structure rather than any single component, whether that is a single technology or interface, a single media type (text documents/web pages) or a single set of behaviours. It is probable that organizations adopting holistic approaches towards search capability will, in the long run, out-perform those that have more reductionist approaches.

STEPS Distinguished Lecture on Big Data

Invited to give the Distinguished Lecture on Big Data next month for the Science and Technology Exploration and Production (STEPS) program run by Halliburton. The program aims to foster geoscience excellence through the facilitation of thematic research and offers the opportunity for academics to engage with Landmark (Halliburton) and the wider exploration and production community.

The lecture title is Big Data – Small Patterns: Applying Geoscience Sentiment Analysis to Unstructured Text.

Will be sharing recent results and findings of the Geoscience aware sentiment AnalyZER (GAZER) algorithm I developed in Python which has been applied to Geological elements in public domain texts. It is designed to surface interesting associative patterns relating to concepts such as ‘source rock’, ‘reservoir’, ‘trap’ and ‘seal’ that might be unknown to exploration geoscientists as they are buried in volumes of documents too large to ever be read and too subtle to be detected by traditional search engines.

The hypothesis is that if a geoscientist can be surprised by these patterns, and there is legitimate evidence for that ‘surprise’, it is likely to lead to a learning event and potentially a new play/model; changing what people know – or think they know.


Sentiment Analysis of Oil Company Annual Reports


A research paper I co-authored with Laura Muir, Associate Professor at the School of Computing, Edinburgh Napier University, has been published this week in the Journal of Knowledge Organization.

It is increasingly recognized that sentiment analysis is a key part of enterprise search & discovery capability.

We applied sentiment analysis to public oil company annual reports. One company stands out for its over-positive rhetoric, the “Pollyanna Effect” towards the future, relative to its peers.

A lexicon was developed to detect edge member strong and hesitant forward looking language. Biologically inspired diversity algorithms were used to identify word patterns over time in companies, compared to subsequent revenue changes. One oil company showed a statistically significant association: their diversity of strong/hesitant language increased prior to a subsequent decrease in relative business performance.
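As a rough illustration of the idea of measuring the diversity of lexicon terms per report, the sketch below uses the Shannon index, a diversity measure borrowed from ecology. This is an assumption for illustration only: the post does not say which diversity algorithms the paper used, and the tiny STRONG and HESITANT word sets, as well as the function names, are hypothetical stand-ins for the real lexicon.

```python
import math
import re
from collections import Counter

# Hypothetical mini-lexicons; the lexicon developed for the paper is not reproduced here.
STRONG = {"will", "confident", "certain", "committed"}
HESITANT = {"may", "might", "could", "possibly"}


def shannon_diversity(tokens, lexicon):
    """Shannon index H = -sum(p * ln p) over the lexicon terms found in tokens."""
    counts = Counter(t for t in tokens if t in lexicon)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no lexicon terms present, so no diversity to measure
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def diversity_by_year(reports, lexicon):
    """Map each year to the diversity of lexicon terms in that year's report text."""
    return {
        year: shannon_diversity(re.findall(r"[a-z]+", text.lower()), lexicon)
        for year, text in reports.items()
    }
```

A series produced by diversity_by_year could then be lined up against subsequent revenue changes for each company; how that comparison was actually tested for significance is described in the paper rather than here.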

A major industrial accident was also detected in another company’s reports without a need to read them. It was manifested through spikes in the relative frequency of the topic ‘lessons’, followed by a spike in topics relating to the ‘future’. The effects of the catastrophe were still evident in word patterns several years after its occurrence. This lends support to the probable existence of Discourse of Renewal Theory (DRT) in practice.
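The notion of a spike in the relative frequency of a topic can be sketched as follows. This is a minimal sketch under stated assumptions, not the method used in the paper: the year-on-year comparison, the factor of 2.0 and the function names are all hypothetical choices made for illustration.

```python
def relative_frequency(tokens, topic_terms):
    """Share of tokens in a report that belong to a topic's set of terms."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in topic_terms) / len(tokens)


def spike_years(freq_by_year, factor=2.0):
    """Return years whose frequency is at least `factor` times the previous year's.

    A year that follows a zero-frequency year counts as a spike if the topic
    appears at all. The factor is an assumed threshold, not one from the paper.
    """
    years = sorted(freq_by_year)
    spikes = []
    for prev, cur in zip(years, years[1:]):
        cur_freq = freq_by_year[cur]
        if cur_freq > 0 and cur_freq >= factor * freq_by_year[prev]:
            spikes.append(cur)
    return spikes
```

Running spike_years separately on a ‘lessons’ series and a ‘future’ series would show whether a spike in one is followed by a spike in the other.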

The findings support the assertion that various social phenomena can be found in company reports by analysing word patterns over time – and some may have predictive properties. There may be benefits of applying sentiment algorithms (as standard) in enterprise search and discovery deployments.

Links here: Issue 2 KO and Institutional Repository


Enterprise Search: New Methods for Inferring User Satisfaction?

Measuring user satisfaction with an enterprise search tool can be difficult. Feedback mechanisms on the user interface tend to only capture a small self-selected sample that may be skewed towards negative views. Whilst surveys can capture more data, they are also self-selecting and tend to be small scale compared to actual enterprise usage. Clickthrough data is useful as a surrogate for search quality and session behaviour but does not necessarily translate into user satisfaction.

A small experiment was undertaken with a domain search tool in a large oil & gas company. Using the search log data, a random sample (n=47) of users who had used the search tool in the past 2 weeks were invited to participate in a questionnaire. They were asked to provide their level of satisfaction with the search tool based on the previous 2 week period using a 5 point Likert item. This was subsequently correlated with the existing search log data that they did not see (the number of days they had used the search tool during that 2 week period). Figure 1 shows the results.

Figure 1 – Search satisfaction against usage (number of days during a 2 week period)

There were six users who were very satisfied but used the search tool only once or twice during the 2 week period. Conversely, there were six users who were dissatisfied and who also used the search tool only once or twice during that period. Gender and age were not statistically significant.

The data points outlined in the red circle are interesting. In the small sample tested, all the users who had used the search tool on over 50% of the working days (more than 5 of the 10 working days in the prior 2 week period) were satisfied/very satisfied. These could be considered ‘happy repeat customers’.

This could be a marker for inferring, from large volumes of search log data, one subset of users who are satisfied with a search tool. It is only a subset, as this group represented just 30.7% of all users who were satisfied (recall), but every user in the group was satisfied (100% precision).
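The precision and recall of such a usage marker can be computed as in the sketch below. The function name, the threshold parameter and the sample data in the usage note are hypothetical; they are not the experiment's data.

```python
def marker_precision_recall(users, threshold_days=5):
    """Evaluate 'used the tool on more than threshold_days days' as a satisfaction marker.

    users: list of (days_used, satisfied) pairs, where satisfied is a bool
    taken from the questionnaire and days_used comes from the search logs.
    Returns (precision, recall).
    """
    # Satisfaction labels of the users the marker flags as heavy users.
    flagged = [satisfied for days, satisfied in users if days > threshold_days]
    satisfied_total = sum(1 for _, satisfied in users if satisfied)
    true_positives = sum(flagged)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / satisfied_total if satisfied_total else 0.0
    return precision, recall
```

For example, with the made-up sample [(8, True), (6, True), (2, True), (1, True), (1, False), (3, False)], two users are flagged and both are satisfied, so precision is 1.0, while only two of the four satisfied users are flagged, so recall is 0.5.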

As a causal mechanism, it is postulated that a user would be unlikely to use a search tool in an enterprise ‘every other day’ if they were not getting some value out of it. An alternative explanation is that the user has no choice: they have to use the tool as there is no other way to locate their information, i.e. high usage does not necessarily translate into satisfaction. However, there is plenty of evidence for poor take-up of enterprise search tools (people find other ways to locate what they need), so the best explanation is likely to be that they have some positive experience with the tool to explain the recurring behaviour.

That is not to say, of course, that users who use a search tool less are not satisfied (as these data show). This could be one marker for companies to assess user satisfaction by exploiting large usage volumes rather than self-selecting surveys. At present, there is no statistical significance to this finding and the data set is small, presenting an area for further research.