Machine Learning in Oil & Gas Exploration: Clustering Annotations

Matrix

I’ve clustered the labels I recently annotated on 22,528 sentences (extracted from randomly sampled public domain petroleum exploration reports). There are 73 labels, a subset of which is shown in the poster above. Together the labels represent 96,197 label relations (arc edges).

The hierarchical cluster heatmap (Metsalu and Vilo 2015) in the poster uses Pearson correlation rather than Euclidean distance, as it is better suited to text extractions and to clustering the ‘DNA’ profiles of geoscience elements. Red/orange colours indicate above-average association between labels; dark blue indicates the opposite. Label categories (commercial, geoscience, petroleum system, potential negativity and potential play/opportunity) are colour coded along the edges of the heatmap in pastel colours. Principal Component Analysis (PCA) and knowledge graph plots are also included in the poster to hint at the richness of these annotations.
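As a rough illustration of why the choice of measure matters, here is a minimal Python sketch. It is not the tool used for the poster, and the numbers are made up: each label gets a hypothetical profile of co-occurrence counts against four other (unnamed) labels. The function names (`pearson`, `euclidean`, `average_linkage`) are my own for this example. The idea is that Pearson correlation compares the shape of two profiles, while Euclidean distance is dominated by how frequent a label is.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / (sx * sy)

def pearson_distance(x, y):
    """Turn correlation into a distance: 0 = same shape, 2 = opposite."""
    return 1.0 - pearson(x, y)

def euclidean(x, y):
    """Straight-line distance, sensitive to overall magnitude."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def average_linkage(profiles, distance):
    """Naive agglomerative clustering with average linkage.

    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    clusters = [[name] for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(distance(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical co-occurrence counts (invented for illustration only).
profiles = {
    "tectonics":      [40, 20, 2, 4],
    "magmatism":      [8, 4, 1, 1],    # same shape as tectonics, but rarer
    "salt tectonics": [2, 4, 30, 20],
    "trap":           [1, 1, 8, 5],    # same shape as salt tectonics, but rarer
}

pearson_merges = average_linkage(profiles, pearson_distance)
euclid_merges = average_linkage(profiles, euclidean)

print("Pearson first merge:  ", pearson_merges[0][0], pearson_merges[0][1])
print("Euclidean first merge:", euclid_merges[0][0], euclid_merges[0][1])
```

On this toy data the Pearson version pairs ‘tectonics’ with ‘magmatism’ first, because their profiles have the same shape, whereas the Euclidean version pairs ‘magmatism’ with ‘trap’ first, simply because both are rare.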

I’ve highlighted a few areas where annotations preferentially associate and form groups. For example, ‘tectonics’ associates with ‘magmatism’ and, at a finer scale within the ‘petroleum system’, ‘salt tectonics’ associates with ‘trap’.

This is all a bit of fun really, as this clustering uses only the annotations and so is pretty coarse. It is unlikely a geoscientist will discover something they don’t already know. In the coming weeks and months I will start building supervised machine learning models: much finer-grained statistical models that use the words in the labelled sentences (467,314 words) as well as the annotations. The word order combinations in such a set will run into many millions.
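To give a feel for how word order combinations multiply, here is a small sketch that counts word n-grams (runs of one, two or three consecutive words) in a single sentence. This is only one possible way of representing word order, not necessarily the features the models will use; the example sentence and the helper names `ngrams` and `ngram_features` are made up for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """All runs of n consecutive tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_features(sentence, max_n=3):
    """Count every 1- to max_n-gram in a whitespace-tokenised sentence."""
    tokens = sentence.lower().split()
    feats = Counter()
    for n in range(1, max_n + 1):
        feats.update(ngrams(tokens, n))
    return feats

# Hypothetical sentence, not taken from the annotated reports.
example = "salt tectonics created a structural trap"
features = ngram_features(example)

print(len(features), "distinct n-grams from", len(example.split()), "words")
```

A sentence of k words yields k − n + 1 n-grams for each n, so even short sentences produce more word order features than words, and the number of distinct combinations across a whole corpus grows quickly.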

I have created labels such as ‘opportunity’ where I identified potentially favourable situations for oil and gas plays and opportunities. Using these (combined with other labels), I will test to what extent (if at all) ‘unseen’ favourable patterns can be detected, highlighting potentially new oil and gas plays and opportunities just from text.
