STEPS Distinguished Lecture on Big Data: Geoscience Sentiment Analysis applied to Petroleum Systems


Thoroughly enjoyed giving the STEPS Distinguished Lecture on Big Data at ENI in Milan last week. A big thank you to Halliburton and ENI for inviting me, arranging the event and making me feel so welcome. Some fascinating questions!


First large scale empirical study of enterprise search


First large scale empirical study of enterprise search & discovery capability published in the Journal of Information Science (JIS) this week. Here

Many organizations have deployed ‘Google-like’ enterprise search engines in order to improve access to their own information, a key part of the digital workplace. Despite significant investments, it has been reported that dissatisfaction with search in the enterprise is widespread and enduring. A study was undertaken in order to develop a deeper understanding of what may be occurring. 

Using a large oil & gas company as a case study on their fourth generation of enterprise search technology, over 1,000 feedback comments from the user interface over a 2 year period were triangulated with interviews conducted with a search service team and management. This was combined with an extensive literature review.

Well known structural and formal factors for user satisfaction such as ‘information quality’, ‘technology quality’ and ‘service quality’ were identified. The study finding that 62% of user dissatisfaction events were likely due to non-technological factors may provide the first empirical support for what some enterprise search practitioners have been saying for some time: effective search capability in the enterprise requires more than technology. For some search queries, improving knowledge organization practices for structuring content may be more useful than tuning the search technology. In addition, the criticality of informal behaviours and agency (information literacy) was clearly identified, which is often downplayed or ignored altogether in the practitioner literature.

The ‘Google Habitus’ was identified as a generative mechanism influencing expectations and behaviours for search at all levels in the organization, often leading to sub-optimal outcomes. It is well documented that aspects of search in the enterprise differ considerably from consumer Internet search. Cognitive biases were postulated as another generative mechanism, such as simplicity bias (technology solutionism), where a preference for simple explanations (‘we can fix search with better technology’) often wins out over more complex explanations.

Whilst general purpose search capability is undoubtedly useful as a utility, approaches which also focus on very specific work tasks may be more likely to gain executive support. Advancing enterprise search capability is therefore likely to lend itself to multi-modal approaches: a system of agency and structure rather than any single component, technology, interface, media type (text documents/web pages) or set of behaviours. It is probable that organizations adopting holistic approaches towards search capability will, in the long run, outperform those that have more reductionist approaches.

STEPS Distinguished Lecture on Big Data

Invited to give the Distinguished Lecture on Big Data next month for the Science and Technology Exploration and Production (STEPS) program run by Halliburton. The program aims to foster geoscience excellence through the facilitation of thematic research and offers the opportunity for academics to engage with Landmark (Halliburton) and the wider exploration and production community.

The lecture title is Big Data – Small Patterns: Applying Geoscience Sentiment Analysis to Unstructured Text.

Will be sharing recent results and findings of the Geoscience aware sentiment AnalyZER (GAZER) algorithm I developed in Python which has been applied to Geological elements in public domain texts. It is designed to surface interesting associative patterns relating to concepts such as ‘source rock’, ‘reservoir’, ‘trap’ and ‘seal’ that might be unknown to exploration geoscientists as they are buried in volumes of documents too large to ever be read and too subtle to be detected by traditional search engines.

The hypothesis is that if a geoscientist can be surprised by these patterns, and there is legitimate evidence for that ‘surprise’, it is likely to lead to a learning event and potentially a new play/model; changing what people know – or think they know.

More here:–-Small-Patterns-

Sentiment Analysis of Oil Company Annual Reports


A research paper I co-authored with Laura Muir, Associate Professor at the School of Computing, Edinburgh Napier University, has been published this week in the Journal of Knowledge Organization.

It is being increasingly recognized that sentiment analysis is a key part of enterprise search & discovery capability.

We applied sentiment analysis to public oil company annual reports. One company stands out for its over-positive rhetoric, the “Pollyanna Effect” towards the future, relative to its peers.

A lexicon was developed to detect edge member strong and hesitant forward-looking language. Biologically inspired diversity algorithms were used to identify word patterns over time within companies, which were then compared to subsequent revenue changes. One oil company showed a statistically significant association: its diversity of strong/hesitant language increased prior to a subsequent decrease in relative business performance.
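As a rough illustration of the idea, the sketch below counts lexicon terms in a report and scores their diversity with the Shannon index, a measure borrowed from ecology. This is an assumption for illustration only: the post does not say which diversity algorithms were used, and the word lists `STRONG` and `HESITANT` are hypothetical stand-ins, not the published lexicon.

```python
import math
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration; not the lexicon from the paper.
STRONG = {"will", "confident", "certain", "committed"}
HESITANT = {"may", "might", "could", "uncertain"}


def lexicon_counts(text):
    """Count how often each strong/hesitant lexicon term occurs in a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in STRONG or t in HESITANT)


def shannon_diversity(counts):
    """Shannon index H = sum(p * ln(1/p)) over the observed lexicon terms."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((n / total) * math.log(total / n) for n in counts.values())


# Usage: score each year's report, then compare the series with later revenue.
reports = {
    2010: "We will grow and we will deliver.",
    2011: "We may grow, we could deliver, and we are confident we will invest.",
}
for year, text in sorted(reports.items()):
    print(year, round(shannon_diversity(lexicon_counts(text)), 3))
```

A year that relies on one term scores zero, while a year that spreads its language across many strong and hesitant terms scores higher, which is the kind of change over time described above.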

A major industrial accident was also detected in another company’s reports without a need to read them. It manifested as a spike in the relative frequency of the topic ‘lessons’, followed by a spike in topics relating to the ‘future’. The effects of the catastrophe were still evident in word patterns several years after its occurrence. This supports the probable existence of Discourse of Renewal Theory (DRT) in practice.
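A minimal sketch of how such a spike might be flagged is shown below. It assumes a topic is represented by a small set of terms and that a "spike" means a year whose relative frequency exceeds a multiple of the mean across all years. The term set, the `factor` threshold and the toy reports are all hypothetical; the paper's actual topic detection method is not described in this post.

```python
import re
from statistics import mean


def relative_frequency(text, topic_terms):
    """Share of tokens in the text that belong to the topic's term set."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in topic_terms) / len(tokens)


def spike_years(reports, topic_terms, factor=3.0):
    """Return years whose topic frequency exceeds factor * mean over all years.

    The threshold rule is an illustrative assumption, not the paper's method.
    """
    freqs = {year: relative_frequency(text, topic_terms)
             for year, text in reports.items()}
    baseline = mean(freqs.values())
    return [year for year, f in sorted(freqs.items()) if f > factor * baseline]


# Toy reports (invented text, for illustration only).
toy_reports = {
    2008: "growth growth production",
    2009: "production growth",
    2010: "lessons lessons learned production",
    2011: "production growth growth",
}
print(spike_years(toy_reports, {"lessons", "learned"}))
```

Running the same function with a ‘future’ term set on the following years would show whether a second spike follows the first.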

The findings support the assertion that various social phenomena can be found in company reports by analysing word patterns over time – and some may have predictive properties. There may be benefits of applying sentiment algorithms (as standard) in enterprise search and discovery deployments.

Links here: Issue 2 KO and Institutional Repository


Enterprise Search: New Methods for Inferring User Satisfaction?

Measuring user satisfaction with an enterprise search tool can be difficult. Feedback mechanisms on the user interface tend to only capture a small self-selected sample that may be skewed towards negative views. Whilst surveys can capture more data, they are also self-selecting and tend to be small scale compared to actual enterprise usage. Clickthrough data is useful as a surrogate for search quality and session behaviour but does not necessarily translate into user satisfaction.

A small experiment was undertaken with a domain search tool in a large oil & gas company. Using the search log data, a random sample (n=47) of users who had used the search tool in the past 2 weeks were invited to participate in a questionnaire. They were asked to provide their level of satisfaction with the search tool based on the previous 2 week period using a 5 point Likert item. This was subsequently correlated with the existing search log data that they did not see (the number of days they had used the search tool during that 2 week period). Figure 1 shows the results.

Figure 1 – Search satisfaction against usage (number of days during a 2 week period)

There were six users who were very satisfied but used the search tool only once or twice during the 2 week period. Conversely, there were six users who were dissatisfied and also used the search tool only once or twice during that period. Gender and age were not statistically significant.

The data points outlined in the red circle are interesting. In the small sample tested, all the users who had used the search tool on over 50% of the working days (5 working days) over the prior 2 week period (10 working days), were satisfied/very satisfied. These could be considered ‘happy repeat customers’.

This could be a marker for inferring, from large volumes of search log data, one subset of users who are satisfied with a search tool. It is only a subset, as this group represented only 30.7% of all users who were satisfied (recall), but the marker was 100% accurate (precision).
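To make the precision and recall figures concrete, here is a minimal sketch of how a usage-based marker could be scored against questionnaire answers. The threshold `min_days=5` is one reading of the ‘over 50% of working days’ rule, and the sample data are invented for illustration; they are not the study's data.

```python
def marker_precision_recall(users, min_days=5):
    """Score the 'frequent user' marker against stated satisfaction.

    users: list of (days_used, satisfied) pairs, where satisfied is True for
    a 'satisfied' or 'very satisfied' questionnaire answer.
    min_days: assumed threshold for flagging a user as a frequent user.
    """
    flagged = [satisfied for days, satisfied in users if days >= min_days]
    true_positives = sum(flagged)
    total_satisfied = sum(satisfied for _, satisfied in users)
    # Precision: of the users the marker flags, how many are satisfied?
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: of all satisfied users, how many does the marker flag?
    recall = true_positives / total_satisfied if total_satisfied else 0.0
    return precision, recall


# Invented sample: (days used in the 2 week period, satisfied?)
sample = [(1, True), (2, False), (6, True), (8, True),
          (1, True), (3, True), (2, False)]
precision, recall = marker_precision_recall(sample)
print(f"precision={precision:.0%} recall={recall:.0%}")
```

A marker like this trades recall for precision: it misses satisfied occasional users, but the users it does flag can be counted with some confidence.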

As a causal mechanism, it is postulated that it would be unlikely that a user would use a search tool in an enterprise ‘every other day’ if they were not getting some value out of it. An alternative explanation is that the user has no choice, they have to use the tool as there is no other way to locate their information, i.e. high usage does not necessarily translate into satisfaction. However, there is plenty of evidence for poor take-up of enterprise search tools (people find other ways to locate what they need), so the best explanation is likely to be that they have some positive experience with the tool to explain the recurring behaviour.

That is not to say, of course, that users who use a search tool less are not satisfied (as these data show). This could be one marker for companies to assess user satisfaction by exploiting large usage volumes rather than self-selecting surveys. At present, there is no statistical significance to this finding and the data set is small, presenting an area for further research.


Transforming Digital Worlds

Along with 450 academics and practitioners, I attended the iSchools Transforming Digital Worlds conference this week at the University of Sheffield. Some fascinating presentations on information behaviour, information seeking and information retrieval.

I was particularly interested in the keynote from Dr Lynn Connaway. Many of the messages, although not new and perhaps well known to some, were put in a tone and context that really resonated with me, in a business world where we are often too quick to jump to the solution or answer:

“To identify why and how people get information we must first watch and listen”

“We need to understand motivations and expectations for using technologies”

In an interview study of 164 people from high schools and universities, some insightful gems were uncovered regarding digital literacy. This is in a landscape where critical thinking skills (the ability to examine the credibility and trustworthiness of information) are increasingly significant. Take this quote from a 17 year old high school student gathered during interviews:

“I always stick with the first thing that comes up on Google because I think that’s the most popular site which means that’s the most correct.”

Connaway then makes the point “Critical thinking skills are a primary concern of university administrators and are crucial for developing an informed citizenry.” This was supported by the quote from a University Provost during the interviews:

“We should be helping people learn how to think, learn how to be skeptical, learn how to use critical thinking skills, learn how to be self-reflective. I think because those things are so much harder to assess and to demonstrate we have not done as good a job telling that story.”

Although no mention was made of the business workplace, I have seen equivalent issues with digital literacy amongst seasoned professionals, especially around ‘search’: not only in their use of their own corporate search engines but also when using Internet search engines for work.

This is by no means universal. For example, I was asked recently by a Geoscientist to recommend Internet search engines other than Google (e.g. duckduckgo) because they recognized, and were concerned, that Google was personalizing the results too much and blinding them to potential information discoveries. There are many cases, however, where I have observed critical geoscientific information being missed in work tasks simply because of limited search literacy.

Continuing to develop digital literacy capabilities in the ‘Digitalization’ workplace (not just how people use technology, but how they interact with information through technology) may be highly significant for organizations in gaining a competitive advantage.