Detecting surprise in text : Expert Centric Digital Technology


I presented at the Finding Petroleum Expert Centric Digital Technology event at the Geological Society of London today. A big thanks to Karl Jeffery for organizing. The theme was putting the domain expert and models at the centre of technology design. There were many insightful presentations, including those from David Bamford (Director PetroMall Ltd), Dimitris Lyras (Director Lyras Shipping), Murray Callandar (CEO Eigen Ltd), Julian Zec (Chief Engineer National Oilwell Varco) and Paul Helm (Director Geologix).

I was particularly interested in the demonstrations by Paul Helm (Geologix), pictured above. Using Unreal Engine (the gaming engine behind Fortnite), they have created collaborative, immersive experiences where people ‘forget they are in a simulation’. Examples included training remote firefighting teams and replacing expensive physical Real Time Operating Centres (RTOC). The geoscience applications stood out for me: using avatars to ‘dip into’ subsurface data and models to make decisions and gain greater insights.

I asked about heuristics for how long teams typically stay immersed in headsets. As a rule of thumb, sessions under 10 minutes feel more like ‘play’; typical sessions last 10-15 minutes, and in-depth analysis or decision making runs to 20-30 minutes before the headsets come off. There have been a few false dawns for VR/AR, but my instinct is that the technology has now caught up with aspirations that have always existed in business.

Surprise – the ultimate disruptor

My presentation was about using algorithms to surface the ‘surprising’ in text using linguistic syntax, lexicons and machine learning. From a human-centric perspective, surprise occurs when we receive information that differs from our existing prior expectations. This differs from Shannon Surprise (unlikely events are surprising events), Bayesian Surprise (the bigger the impact a new data point has on model parameters, the bigger the surprise) and the view that information equals surprise (some argue that if someone tells you an answer you already know, you are not receiving any information). Critically, these definitions do not cater for a person's prior expectations.
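To make the contrast concrete, the two formal definitions mentioned above are commonly written as Shannon surprise, −log p(x), and Bayesian surprise, the KL divergence of the posterior over hypotheses from the prior. The sketch below is a minimal illustration of both over a discrete set of hypotheses; the function names and the "tight"/"porous" reservoir hypotheses in the usage note are hypothetical and are not part of the author's method.

```python
import math


def shannon_surprise(p: float) -> float:
    """Self-information in bits: the rarer the event (smaller p), the bigger the surprise."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)


def bayesian_surprise(prior: dict[str, float], likelihood: dict[str, float]) -> float:
    """KL divergence (bits) of the posterior from the prior over discrete hypotheses.

    prior:      P(h) for each hypothesis h
    likelihood: P(observation | h) for each hypothesis h
    """
    evidence = sum(prior[h] * likelihood[h] for h in prior)
    posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}
    # A large shift from prior to posterior means a large surprise.
    return sum(q * math.log2(q / prior[h]) for h, q in posterior.items() if q > 0)


# Hypothetical example: a strong prior that a reservoir is tight, then an
# observation of high porosity that is far likelier if the reservoir is porous.
print(shannon_surprise(0.05))
print(bayesian_surprise({"tight": 0.9, "porous": 0.1}, {"tight": 0.05, "porous": 0.8}))
```

Neither function knows anything about what a particular geoscientist already expects: the "prior" is the model's, not the reader's, which is the gap the human-centric definition is aiming at.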

Surprise can force us to confront our own ignorance and potentially update our mental models (schemas) through a learning event. So for people involved in generating ideas (e.g. oil and gas exploration geoscientists), being surprised by such algorithms can benefit creativity and learning, potentially leading to unexpected, insightful and valuable encounters (serendipity). This is especially relevant when there is far more potentially relevant information than a geoscientist could ever read, leaving new knowledge hidden from us.

As with any complex system, there are unlikely to be absolute laws; what surprises one geoscientist may not surprise another. However, some algorithms may be more likely than others to suggest a ‘surprising piece of information’. Research supports this assertion.

In addition to presenting existing published research, I shared some ongoing research: over the past couple of months I have painstakingly collated several thousand labelled examples of ‘surprise’ clues found in public domain petroleum geoscience texts. For example:

  • “the grainstone thickness in reality turned out to be…”
  • “this level of porosity had not been encountered before”
  • “it turned out to be true!”
  • “the company entering Ghana was totally unexpected”
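The simplest way to use clues like these is as a lexicon of cue phrases matched against each sentence. The sketch below assumes a handful of regular expressions loosely modelled on the examples above; the pattern list (`SURPRISE_CUES`) and `find_surprise_cues` are hypothetical names, and the list is an illustration, not the labelled set itself.

```python
import re

# Hypothetical cue lexicon, loosely modelled on the example clues above.
SURPRISE_CUES = [
    r"\bturned out to be\b",
    r"\bhad not been (?:encountered|seen|observed) before\b",
    r"\b(?:totally|completely|highly) unexpected\b",
    r"\bin reality\b",
]
CUE_PATTERNS = [re.compile(p, re.IGNORECASE) for p in SURPRISE_CUES]


def find_surprise_cues(sentence: str) -> list[str]:
    """Return every cue phrase (as written in the text) that occurs in the sentence."""
    return [m.group(0) for pat in CUE_PATTERNS for m in pat.finditer(sentence)]


for s in ["This level of porosity had not been encountered before.",
          "The well reached total depth."]:
    print(s, "->", find_surprise_cues(s))
```

A lexicon like this is cheap and transparent, but it only catches surprise that the author signals explicitly; facts that would surprise a reader without any such wording need other techniques.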

In addition, syntax rules based on Part of Speech (POS) tagging, along with statistical techniques, can highlight facts and assertions that may be surprising. For example, the sentence “The Blue Formation contained black oil staining” may or may not be surprising, depending on what the reader already expects. Statistical techniques based on concepts such as Topic Diversity and Temporal Emergence may also show ‘tendencies’ to suggest the surprising.
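Two of these ideas can be sketched in a few lines. The first function is a hypothetical POS-pattern rule that pulls a (subject, verb, object) assertion out of a sentence already tagged with Penn Treebank tags (in practice a tagger such as NLTK or spaCy would supply the tags). The second is one possible, assumed definition of a Temporal Emergence score: how much more often a term appears in recent documents than in earlier ones. Topic Diversity is not covered here, and neither function should be read as the author's actual implementation.

```python
from __future__ import annotations


# Rule 1 (hypothetical): match 'DT? NNP+ VBD (JJ|NN|NNS)+' in a POS-tagged sentence.
def extract_assertion(tagged: list[tuple[str, str]]) -> tuple[str, str, str] | None:
    """Return (subject, verb, object) for a simple factual assertion, or None."""
    i = 1 if tagged and tagged[0][1] == "DT" else 0
    start = i
    while i < len(tagged) and tagged[i][1] in ("NNP", "NNPS"):
        i += 1  # consume the proper-noun subject, e.g. "Blue Formation"
    if i == start or i >= len(tagged) or tagged[i][1] != "VBD":
        return None
    j = i + 1
    while j < len(tagged) and tagged[j][1] in ("JJ", "NN", "NNS"):
        j += 1  # consume the object phrase, e.g. "black oil staining"
    if j == i + 1:
        return None  # no object after the verb
    subject = " ".join(word for word, _ in tagged[start:i])
    obj = " ".join(word for word, _ in tagged[i + 1:j])
    return subject, tagged[i][0], obj


# Rule 2 (assumed definition): ratio of a term's smoothed document frequency
# from split_year onwards to its frequency before split_year.
def emergence_score(term: str, docs_by_year: dict[int, list[str]], split_year: int) -> float:
    early = [d for year, docs in docs_by_year.items() if year < split_year for d in docs]
    recent = [d for year, docs in docs_by_year.items() if year >= split_year for d in docs]

    def rate(docs: list[str]) -> float:
        hits = sum(term.lower() in d.lower() for d in docs)  # naive substring match
        return (hits + 1) / (len(docs) + 2)  # Laplace smoothing avoids division by zero

    return rate(recent) / rate(early)


tagged = [("The", "DT"), ("Blue", "NNP"), ("Formation", "NNP"), ("contained", "VBD"),
          ("black", "JJ"), ("oil", "NN"), ("staining", "NN")]
print(extract_assertion(tagged))
```

The extracted triple says nothing about surprise on its own; it gives a unit (e.g. "Blue Formation" / "black oil staining") whose novelty can then be scored, for example by how rarely that association appears in the wider corpus.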

Although surprise has historically been labelled an emotion, others see it as a neutral epistemic concept, with emotion (pleasant, unpleasant, denial, wonder) as a by-product.

I have applied these techniques in a ‘fuzzy’ way to public domain texts to create a list of associations and sentences that the algorithms deem ‘surprising’. Retired geologists are currently labelling these for the extent to which they find them surprising, alongside a control set that the algorithms did not flag as ‘surprising’.
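One way such a ‘fuzzy’ selection with a control set might look is sketched below, under loudly stated assumptions: the weights and saturation points in `fuzzy_score` are hypothetical, chosen only for illustration, and `select_for_labelling` simply draws an equal-sized random control set from the sentences that scored below the threshold, then shuffles everything so labellers cannot tell the groups apart.

```python
import random


def fuzzy_score(cue_hits: int, has_assertion: bool, emergence: float) -> float:
    """Blend several weak signals into a 0-1 'surprise' score (hypothetical weights)."""
    cue_part = min(cue_hits, 2) / 2                             # saturates at 2 cue phrases
    emergence_part = min(max(emergence - 1.0, 0.0), 2.0) / 2.0  # 1x -> 0, 3x or more -> 1
    return 0.5 * cue_part + 0.2 * float(has_assertion) + 0.3 * emergence_part


def select_for_labelling(scores: dict[str, float], threshold: float,
                         seed: int = 0) -> list[tuple[str, str]]:
    """Return flagged sentences plus an equal-sized random control set, shuffled."""
    flagged = [s for s, score in scores.items() if score >= threshold]
    others = [s for s, score in scores.items() if score < threshold]
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    control = rng.sample(others, k=min(len(flagged), len(others)))
    batch = [(s, "flagged") for s in flagged] + [(s, "control") for s in control]
    rng.shuffle(batch)  # labellers should not be able to infer the group from order
    return batch
```

Keeping the group tag alongside each sentence (but hidden from the labellers) is what later allows the flagged and control sets to be compared.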

As well as allowing the accuracy of the various algorithms to be assessed, the resulting labels can be used to train a machine learning model to refine the approach. More to follow.
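Assessing an algorithm against the geologists' judgements could start with standard precision, recall and accuracy. This sketch assumes the graded ‘extent of surprise’ labels have been reduced to yes/no (for example, by a cut-off on the rating), which is an assumption rather than the stated labelling scheme; it covers the evaluation step only, not the model training.

```python
def evaluate(flags: list[bool], expert_labels: list[bool]) -> dict[str, float]:
    """Compare algorithm flags with binarised expert 'surprising' judgements."""
    if len(flags) != len(expert_labels):
        raise ValueError("flags and labels must be the same length")
    pairs = list(zip(flags, expert_labels))
    tp = sum(f and e for f, e in pairs)          # flagged and judged surprising
    fp = sum(f and not e for f, e in pairs)      # flagged but not surprising
    fn = sum((not f) and e for f, e in pairs)    # missed a surprising item
    tn = len(pairs) - tp - fp - fn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }
```

Because the control set was drawn from unflagged sentences, items in it that the geologists rate as surprising count as false negatives, which gives a rough view of what the algorithms miss.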





2 thoughts on “Detecting surprise in text : Expert Centric Digital Technology”


  1. Hi Paul. Sorry I missed your talk. I was planning to attend but a client meeting came up last minute. The surprise concept is an important one for oil explorers. Will your talk be published? Regards Mike Naylor

