Big Data in the Geosciences: Geoscience-Aware Sentiment Analyzer

Geoscience-aware text sentiment algorithm improves on out-of-the-box sentiment tools from IBM Watson, Google, Microsoft and Amazon by over 30 percentage points in accuracy for geoscience sentiment in text.

I presented early research findings today at the Janet Watson ‘Big Data in the Geosciences’ conference at the Geological Society of London.

Google opened proceedings with a talk on satellite imagery and Earth Engine. Subsequent talks ranged from using Twitter for early earthquake warnings, through virtual reality and digital analogues, to applying deep learning to detect volcano deformation. There were some fascinating insights.

My latest research addressed the sentiment, or tone, of the context around mentions of petroleum system elements (such as source rock, migration, reservoir and trapping) in literature, company reports and presentations. The hypothesis is that stacking many somewhat independent opinions expressed in text, and examining the averages, the outliers and the contradictions, may show geoscientists what they don’t know and challenge what they think they do know.
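To make the “stacking” idea concrete, here is a minimal Python sketch. It assumes each sentence already carries a tone label (+1 positive, 0 neutral, -1 negative), groups those scores by the petroleum system element the sentence mentions, and reports the average tone and whether both positive and negative opinions are present. The keyword patterns, the scoring scale and the `stack_tone` function are hypothetical illustrations, not the research pipeline.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical keyword patterns: the post names these elements but not the
# exact terms or matching rules used in the research.
ELEMENTS = {
    "source rock": r"\bsource rocks?\b",
    "migration": r"\bmigrat(?:ion|ed|ing)\b",
    "reservoir": r"\breservoirs?\b",
    "trap": r"\btrap(?:s|ping|ped)?\b",
}


def stack_tone(labelled_sentences):
    """Group sentence tone scores (+1, 0, -1) by the element(s) each sentence mentions."""
    by_element = defaultdict(list)
    for sentence, score in labelled_sentences:
        for element, pattern in ELEMENTS.items():
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                by_element[element].append(score)

    summary = {}
    for element, scores in by_element.items():
        summary[element] = {
            "n": len(scores),                # how many opinions were stacked
            "mean_tone": mean(scores),       # the "average" view
            # Both positive and negative opinions present: a possible contradiction
            "contradiction": (1 in scores) and (-1 in scores),
        }
    return summary
```

For example, stacking “The reservoir quality is excellent.” (+1) with “Reservoir presence is a key risk.” (-1) gives a mean tone of 0 for reservoir but flags a contradiction, which an average on its own would hide.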

The research question was whether a geoscience-aware algorithm could improve on the existing APIs and algorithms used for sentiment analysis, and how useful the resulting visualizations might be.

Tested on a held-back set of 750 labelled examples, the Geoscience Aware text sentiment analyZER (GAZER) algorithm achieved 90.4% accuracy for two classes (positive and negative) and 84.26% accuracy for three classes (positive, negative and neutral). This compared favourably with generic, out-of-the-box Paragraph Vector and Naïve Bayes approaches. It also compared favourably with the out-of-the-box cloud sentiment APIs from IBM Watson, Microsoft, Amazon and Google, which averaged approximately 50% accuracy for the three classes.
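The accuracy figures above are simply the fraction of held-back examples labelled correctly. A minimal sketch of that calculation, plus the percentage-point gap between GAZER’s 84.26% and the roughly 50% cloud-API average, might look like this; the function names are hypothetical.

```python
def accuracy(gold, predicted):
    """Fraction of held-back examples whose predicted label matches the gold label."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("need equal-length, non-empty label lists")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def percentage_point_gap(acc_a, acc_b):
    """Difference between two accuracies (given as fractions) in percentage points."""
    return (acc_a - acc_b) * 100


# Figures quoted in the post: GAZER 3-class accuracy vs the approximate cloud-API average
print(round(percentage_point_gap(0.8426, 0.50), 2))  # 34.26
```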

This supports findings in other domains that sentiment analysis needs customizing to the domain, and that training data specific to the task in hand is critical. The findings also support existing literature suggesting that generative probabilistic machine learning algorithms may perform better than discriminative ones when classifying short snippets of text, such as sentences and the bullet points in PowerPoint presentations.
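As a reminder of what “generative” means here: a generative classifier models how each class produces the words, scoring log P(class) + Σ log P(word | class), and picks the most probable class, whereas a discriminative one models P(class | words) directly. Multinomial Naive Bayes is the textbook generative example. The pure-Python sketch below, with add-one smoothing, a toy tokenizer and made-up training sentences, only illustrates that family; it is not GAZER, whose internals the post does not describe.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Simple lowercase word tokenizer (an assumption, not the research pipeline)
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNaiveBayes:
    """Generative classifier: scores log P(class) + sum of log P(word | class)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        self.n_docs = len(labels)
        return self

    def log_joint(self, text, label):
        # Log prior plus log likelihood with add-one (Laplace) smoothing
        score = math.log(self.class_counts[label] / self.n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:  # ignore words never seen in training
                score += math.log((self.word_counts[label][word] + 1) / denom)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda c: self.log_joint(text, c))


train_texts = [
    "excellent reservoir quality",
    "proven mature source rock",
    "poor reservoir quality",
    "seal failure risk",
    "reservoir depth is two kilometres",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral"]
model = MultinomialNaiveBayes().fit(train_texts, train_labels)
print(model.predict("excellent source rock"))  # positive
```

Because the model only counts words per class, it can still produce a usable score from a very short snippet, which is one commonly cited reason generative models do well on sentence-length inputs.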

Early evidence suggested that the resulting visualizations, such as streamgraphs of the sentiment data, could be used to challenge individual biases and organizational dogma, potentially generating new knowledge. This is an area for further research.
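A streamgraph is essentially a stacked area chart of counts along an ordered axis. Assuming the sentiment data are (year, label) pairs, an illustrative choice since the post does not say how its streamgraphs were built, a minimal sketch for preparing the series might be:

```python
from collections import Counter


def streamgraph_series(records, classes=("positive", "neutral", "negative")):
    """Turn (year, sentiment_label) records into one count series per class,
    aligned on the sorted years, ready for a stacked-area or streamgraph plot."""
    records = list(records)  # allow generators to be read twice
    counts = Counter(records)  # (year, label) -> number of sentences
    years = sorted({year for year, _ in records})
    series = {label: [counts[(year, label)] for year in years] for label in classes}
    return years, series
```

The resulting series can be passed to matplotlib, for example `plt.stackplot(years, *series.values(), labels=list(series), baseline="wiggle")`, where the `"wiggle"` (or `"weighted_wiggle"`) baseline gives the centred, flowing layout typical of streamgraphs.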

The presentation is available on SlideShare.

The 750 labelled sentences (the test set) and a simple Python extraction script are available on GitHub.

2 thoughts on “Big Data in the Geosciences: Geoscience-Aware Sentiment Analyzer”

  1. Happy to have contributed to the “Retired Geologist’s label”!

    More seriously, though, we seem to be converging, even to the point of using the same language (“stacking opinion”). But does the experiment I am compiling, on identifying valid petroleum systems using observational scores compiled from petroleum system elements, get us to a similar place, or is there no connection?

    Hope it went well for you!

    Guy
    ___

    From: “Enterprise Search & Discovery: Systems Thinking”
    Reply-To: Systems Thinking
    Date: Tuesday, 27 February 2018 at 20:16
    To: Guy WF Loftus
    Subject: [New post] Big Data in the Geosciences : Geoscience Aware Sentiment Analyzer

    phcleverley posted: ” Geoscience aware text sentiment algorithm improves on out-of-the-box specific sentiment tools like IBM Watson, Google, Microsoft and Amazon by over 30% for geoscience sentiment in text. Presented early research findings today at the Janet Watson ‘Big D”

    1. Thanks, Guy; your help labelling the sentences was very much appreciated. I hope to publish a research paper on this later this year. Your groundbreaking work using the minds of people ‘today’ does, in my opinion, fit nicely with the stacking of tone and opinion already present in the explicit written literature. It would be fascinating to see where the outputs converge... and contradict.
