OpportunityFinder™ is a first of its kind: a transformative, data-driven method that revolutionises ideation in Geoscience New Ventures Exploration and Innovation. The algorithm detects potential hydrocarbon plays that are not explicitly stated in unstructured text, pointing the geoscientist to new lines of thought and knowledge that may not previously have been considered. This could... Continue Reading →
Machine Learning Models for Disambiguating Key Petroleum Geoscience Concepts in Text
I've created more ML models for disambiguating key petroleum systems concepts in text. These are needed when creating algorithms that parse text to detect patterns and extract entities, whether to improve search (Information Retrieval), support Visual Analytics or populate a KnowledgeGraph. For example, 'migration' (migrates, migrated, migrating) to go with the 'source' (as in source rock) model... Continue Reading →
Congratulations Dean Pereira de Melo!
Congratulations to Dean Pereira de Melo, Geological Data Manager at Petrobras in Brazil, on his MSc with Distinction award! His dissertation on "Information Culture in Oil & Gas Companies" is an insightful work of value to academics and practitioners. He undertook his Masters in Petroleum Data Management at the University of Aberdeen where I acted... Continue Reading →
Erratics in Central Park, New York
Even amongst the hustle and bustle of New York City, geological marvels can be found. I could not resist taking a photo of these 12-foot-high boulders while visiting this week. Central Park is peppered with huge boulders that look precariously perched on top of the ancient glistening bedrock. These are rounded glacial 'erratics'... Continue Reading →
Transforming Text Extraction in Petroleum Geoscience through Machine Learning: 94.52% Accuracy
One of the key tasks in Natural Language Processing (NLP) for the Petroleum Geoscientist is detecting entities in text, such as 'source rock'. The challenge is that using just the term 'source rock' and its plural form 'source rocks' would miss 22% (recall) of all occurrences (false negatives) for 'source' in its word sense of... Continue Reading →
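To make the recall point concrete, here is a minimal sketch of how a literal keyword pattern produces false negatives. The sentences and their labels are invented toy data, and `keyword_recall` is a hypothetical helper; none of it comes from the post's dataset, and it does not reproduce the 22% figure.

```python
import re

# Toy, invented examples: the flag is True when 'source' is used
# in the source-rock sense. Not taken from the post's data.
SENTENCES = [
    ("The shale is the main source rock in the basin.", True),
    ("Several source rocks were sampled.", True),
    ("The shales are a mature source of oil.", True),
    ("Organic-rich source intervals occur at depth.", True),
    ("The source of the data is a public report.", False),
]

# Literal keyword match: 'source rock' or 'source rocks' only.
PATTERN = re.compile(r"\bsource rocks?\b", re.IGNORECASE)


def keyword_recall(labelled):
    """Fraction of true source-rock sentences that the pattern finds."""
    positives = [text for text, is_positive in labelled if is_positive]
    found = [text for text in positives if PATTERN.search(text)]
    return len(found) / len(positives)


recall = keyword_recall(SENTENCES)
print(f"recall = {recall:.2f}")
```

In this toy set the pattern misses the two sentences that use 'source' in the right sense without the word 'rock', which is the kind of gap a word-sense model is meant to close.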
Machine Learning in Oil & Gas Exploration: Clustering Annotations
I've clustered the labels I annotated recently for 22,528 sentences (extracted from randomly sampled public domain petroleum exploration reports). There are 73 labels; I've shown a subset in the poster above. The labels represent 96,197 label relations (arc edges). The hierarchical cluster heatmap (Metsalu and Vilo 2015) in the poster uses Pearson Correlation (rather than... Continue Reading →
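As a rough illustration of correlation-based label similarity, the sketch below computes Pearson correlation between per-sentence label indicators and converts it to a distance (1 - r) of the kind a hierarchical clustering step could consume. The label names and 0/1 vectors are invented, and this is not the pipeline used for the poster.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical toy data: 1 means the label was applied to that sentence.
LABELS = {
    "source": [1, 1, 0, 0, 1, 0],
    "maturity": [1, 1, 0, 0, 1, 0],
    "reservoir": [0, 0, 1, 1, 0, 1],
}

# Correlation distance: 0 for labels that always co-occur,
# 2 for labels that never co-occur.
dist = {
    a: {b: 1.0 - pearson(LABELS[a], LABELS[b]) for b in LABELS}
    for a in LABELS
}

for name, row in dist.items():
    print(name, {other: round(d, 2) for other, d in row.items()})
```

Labels with a small distance would be merged early by an agglomerative clustering step, which is what places them next to each other on a clustered heatmap.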
Bypassed Information Pay
I have been thinking about conceptual models relating to the vast (and continually growing) unstructured text collections within enterprises, regardless of whether these are in hardcopy form in libraries/archives or in digital form on file systems or document management systems. In the oil & gas industry, the concept of 'missed pay' is given to a reservoir zone... Continue Reading →
Ammonite Pavement
It was a privilege, as always, to see the 'ammonite pavements' in Lyme Regis on the Jurassic Coast in Dorset, UK, this week. Above are some of my photographs of hundreds of large ammonites exposed at low tide. It is thought-provoking to imagine, as you walk over the fossilised sea floor of 200 million years ago, when the UK... Continue Reading →
Finished labelling 25,000 petroleum geoscience sentences for machine learning
It’s taken me 6 months elapsed time, but I have finally finished manually labelling 25,000 (yes - twenty-five thousand!) petroleum geoscience sentences from global public domain sources. I’m using these to experiment with training a machine learning classifier which, using deep context, can predict the topics of any passage of geoscience text hitherto unseen by the... Continue Reading →
Introducing the DMA Model for Text Analytics
When presented with large volumes of text, there are a number of text analytics techniques that can be applied. I developed the DMA Model as a simple conceptual way to categorize the main types. Rules-based or machine learning techniques can be used individually or together for each of these 3 areas: Document Centric: This scenario occurs... Continue Reading →