Automatically Summarizing Petroleum Exploration Texts by Events and Dates.

One form of text summarization is by a timeline of some sort. In academic literature, this can help follow a discourse through time using bibliographic reference dates in the body of text.

In business literature, a timeline is more likely to relate to the events and dates of some activity. In petroleum exploration, for example, it may refer to the opening up of acreage, license rounds, seismic surveys, well drilling, dry holes or hydrocarbon discoveries, farm-ins, field development, relinquishments and so forth.

It is relatively easy, using Named Entity Recognition (NER) techniques, to detect many patterns in text, including People, Places and Locations (for example with the Stanford or GATE toolkits). Language understanding is hard, so nothing is perfect (but then humans make mistakes as well).

Dates are also straightforward, although the range of possible ways to express times and dates can be vast in certain contexts. Python has several libraries for this, and there is also research from Facebook (Duckling).
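As a minimal sketch of the simplest case, the Python below uses only the standard library to pick years, decades and year ranges out of a sentence. It is not how TimeLineCurator or Duckling work; the names `DateMention` and `find_dates` are hypothetical, the pattern only covers four-digit years from 1900 to 2099, and the example sentence is made up for illustration.

```python
import re
from typing import List, NamedTuple


class DateMention(NamedTuple):
    start: int  # first year covered by the mention
    end: int    # last year covered (same as start for a single year)
    text: str   # the matched text as it appears in the sentence


# Alternatives are tried left to right, so ranges and decades
# are matched before a bare year.
_DATE_PATTERN = re.compile(
    r"(?P<range>\b(?P<y1>(?:19|20)\d{2})\s*(?:-|–|to)\s*(?P<y2>(?:19|20)\d{2})\b)"
    r"|(?P<decade>\b(?P<d>(?:19|20)\d0)['’]?s\b)"
    r"|(?P<year>\b(?:19|20)\d{2}\b)"
)


def find_dates(sentence: str) -> List[DateMention]:
    """Return year, decade and year-range mentions in reading order."""
    mentions = []
    for m in _DATE_PATTERN.finditer(sentence):
        if m.group("range"):
            mentions.append(
                DateMention(int(m.group("y1")), int(m.group("y2")), m.group(0))
            )
        elif m.group("decade"):
            first = int(m.group("d"))
            mentions.append(DateMention(first, first + 9, m.group(0)))
        else:
            year = int(m.group("year"))
            mentions.append(DateMention(year, year, m.group(0)))
    return mentions


# Illustrative sentence only; the 1997-2001 survey is invented.
example = (
    "Exploration began in the 1980’s, the first discovery came in 1994, "
    "and a hypothetical survey ran 1997-2001."
)
for mention in find_dates(example):
    kind = "point" if mention.start == mention.end else "range"
    print(kind, mention.start, mention.end, repr(mention.text))
```

A mention whose start and end differ can be drawn as a line on a timeline and a single year as a circle. Real text needs far more than this (months, quarters, relative expressions such as "last year"), which is where a dedicated date-parsing library earns its keep.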

In my opinion, a particularly useful web tool that illustrates what can be done by applying these techniques is TimeLineCurator, from the University of British Columbia InfoVis Group. A nice overview diagram of Visual Analytics is here.

For example, the image below (Figure 1) shows events automatically detected in text discussing the exploration history of the Norwegian Sea.


Figure 1 – Automatic Summary of Exploration History in a Basin

On the far left in the top half of the screen, exploration begins (1980s), moving towards the present day on the right. The highlighted black circle allows the user to interrogate key events (in this case the first Permian discovery by Statoil in 1994). Some dates are points (circles); others are ranges (lines). Here the different colours represent different information collections (e.g. NPD vs Oil & Gas Journal). The panels in the bottom half of the screen show the text fragments or sentences for the event being interrogated.

These interactive visuals may be particularly useful to interrogate a body of text that is simply too large (in this age of big data) for a human to read, given some time constraint.

Our cognitive processing limitations.

We know that 95% of the time we never look beyond page 1 in Google. In these cases, to paraphrase Nicholas Carr, instead of being scuba divers in a sea of words, we zip along the surface on a Jet Ski.

So these techniques may be of some use in surfacing events of interest that we may otherwise have missed (or simply don’t know we missed): an area for a deeper dive.


TimeLineCurator: Interactive Authoring of Visual Timelines from Unstructured Text.
IEEE Transactions on Visualization and Computer Graphics (TVCG).
Proceedings of the IEEE Conference on Visual Analytics Science and Technology (VAST), Chicago, USA, 2015.

Machine Learning in the Subsurface

Excited to be presenting at the Machine Learning in the Subsurface seminar on 20th September in Stavanger, Norway. The seminar is organised by the Norwegian Petroleum Directorate (NPD), ConocoPhillips, Repsol, Equinor and AkerBP.

Extracting Knowledge from Text using AI


Thoroughly enjoyed two days of workshops with the Oil and Gas Technology Centre (OGTC) this week in Aberdeen. The OGTC’s goal is to maximise economic recovery from the UK Continental Shelf, supported by Government.

As well as participating in workshops, I also shared some results of my research on predictive geoscience sentiment analysis and its role in stimulating new insights. Thanks to all the staff for coordinating the event, and to the Operators, Service Companies and Academia for some great participation.

An exciting time to be involved in Geoscience and Data Science!

Artificially Intelligent Sub-Surface

Delighted to be invited as a keynote speaker for the Oil and Gas Technology Centre (OGTC) workshop on artificially intelligent sub-surface this month, 19-20 June in Aberdeen, representing Robert Gordon University.


Artificially intelligent sub-surface is one of the six themes the OGTC are working on for Digital Transformation in the oil and gas industry.


Review of Enterprise Search: Journal of Information Science Paper

Martin White (Visiting Professor at the University of Sheffield and Managing Director of IntranetFocus) has written a review of a recent academic paper on enterprise search that I authored with Professor Simon Burnett:

“Dr Paul Cleverley and Professor Simon Burnett (Robert Gordon University) have published in the Journal of Information Science what is without doubt a landmark research paper on the factors that influence user satisfaction with enterprise search applications”

“No matter how small or large your organization, if you have responsibility for search management you should be taking this remarkable paper, marking it up para by para, and then using it to benchmark your approach to achieving the levels of search satisfaction that your employees expect”

“This research will change the way that the enterprise search community (and that includes software vendors) consider the opportunities and challenges of effective enterprise search management”

First large scale empirical study of enterprise search


The first large-scale empirical study of enterprise search & discovery capability was published in the Journal of Information Science (JIS) this week.

Many organizations have deployed ‘Google-like’ enterprise search engines in order to improve access to their own information, a key part of the digital workplace. Despite significant investments, it has been reported that dissatisfaction with search in the enterprise is widespread and enduring. A study was undertaken in order to develop a deeper understanding of what may be occurring. 

Using a large oil & gas company on its fourth generation of enterprise search technology as a case study, over 1,000 feedback comments from the user interface over a two-year period were triangulated with interviews conducted with a search service team and management. This was combined with an extensive literature review.

Well known structural and formal factors for user satisfaction such as ‘information quality’, ‘technology quality’ and ‘service quality’ were identified. The study finding that 62% of user dissatisfaction events were likely due to non-technological factors may provide the first empirical support for what some enterprise search practitioners have been saying for some time: effective search capability in the enterprise requires more than technology. For some search queries, improving knowledge organization practices for structuring content may be more useful than tuning the search technology. In addition, the criticality of informal behaviours and agency (information literacy) was clearly identified, which is often downplayed or ignored altogether in the practitioner literature.

The ‘Google Habitus’ was identified as a generative mechanism influencing expectations and behaviours for search at all levels in the organization, often leading to sub-optimal outcomes. As has been well documented, there are aspects of search in the enterprise that differ considerably from Internet consumer-based search. Cognitive biases were postulated as another generative mechanism, such as simplicity bias (technology solutionism), where a preference for simple explanations (‘we can fix search with better technology’) often wins out over more complex explanations.

Whilst general-purpose search capability is undoubtedly useful as a utility, approaches which also focus on very specific work tasks may be more likely to gain executive support. Advancing enterprise search capability is therefore likely to lend itself to multi-modal approaches: a system of agency and structure rather than any single component, technology, interface, media type (text documents/web pages) or set of behaviours. It is probable that organizations adopting holistic approaches towards search capability will, in the long run, outperform those that take more reductionist approaches.