It’s taken me six months of elapsed time, but I have finally finished manually labelling 25,000 (yes, twenty-five thousand!) petroleum geoscience sentences from global public-domain sources.
I’m using these to experiment with training a machine learning classifier which, using deep context, can predict the topics of any passage of geoscience text that the algorithm has not seen before.
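As a rough illustration only of what "train on labelled sentences, then predict the topic of unseen text" means in practice, the sketch below fits a tiny bag-of-words naive Bayes classifier. This is a deliberately simple baseline, not the deep-context approach described above. The example sentences, the topic labels, and the names `tokenize` and `NaiveBayesTopicClassifier` are all invented for illustration and are not taken from the labelled dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTopicClassifier:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> number of sentences
        self.vocab = set()

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            tokens = tokenize(sentence)
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, sentence):
        # Words never seen in training carry no information, so skip them.
        tokens = [t for t in tokenize(sentence) if t in self.vocab]
        total_sentences = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_sentences in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_sentences / total_sentences)
            for token in tokens:
                score += math.log((counts[token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labelled sentences (invented for this example).
TRAIN = [
    ("The shale is organic rich with high total organic carbon", "source rock"),
    ("Mature kerogen in the mudstone generated oil", "source rock"),
    ("Porous sandstone shows good permeability", "reservoir"),
    ("The carbonate reservoir has high porosity", "reservoir"),
    ("Thick evaporite provides an effective top seal", "seal"),
    ("Salt and anhydrite seal the trap", "seal"),
]

sentences, labels = zip(*TRAIN)
model = NaiveBayesTopicClassifier().fit(sentences, labels)

# Predict the topic of a sentence the model has not seen.
print(model.predict("Sandstone with excellent porosity and permeability"))  # reservoir
```

A word-count model like this ignores word order and wider context, which is exactly the limitation a deep-context classifier is meant to address.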
I followed a very specific methodology when labelling, which should allow a variety of novel use cases. One use case I’ll be testing is whether the classifier can predict contexts that could lead to new plays and opportunities.