I recently ran 1 million sentences from public-domain geoscience articles and reports through the OpportunityFinder® algorithm.
The aim was to detect hydrocarbon exploration play elements, and interesting combinations of them, using Natural Language Processing (NLP) and machine learning.
This involved analysing over 2 trillion possible permutations hidden within the text. Through iterative design, I arrived at an optimised method that lets the Python-based algorithm complete this process in under one hour on a modest high-street i5 laptop.
The key ingredients were hash tables combined with hierarchical logic. I’m currently experimenting with additional techniques in this area.
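
To give a flavour of how hash tables and a hierarchy can work together, here is a minimal sketch. It is not the OpportunityFinder® code: the lexicon, the function names `index_sentences` and `co_occurrences`, and the sample sentences are all hypothetical and chosen purely for illustration. The idea shown is that cue terms roll up to a parent play element through one dictionary lookup, and combinations are then counted by intersecting sets of sentence ids instead of enumerating every permutation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical two-level hierarchy: play element -> cue terms (illustrative only).
LEXICON = {
    "source": {"shale", "kerogen"},
    "reservoir": {"sandstone", "porosity"},
    "seal": {"evaporite", "mudstone"},
}

# Hash table from each cue term to its parent play element (constant-time lookup).
TERM_TO_ELEMENT = {
    term: element for element, terms in LEXICON.items() for term in terms
}


def index_sentences(sentences):
    """Map each play element to the set of sentence ids that mention it."""
    index = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        for token in sentence.lower().split():
            # Strip simple trailing/leading punctuation before the lookup.
            element = TERM_TO_ELEMENT.get(token.strip(".,;:()"))
            if element is not None:
                index[element].add(sid)
    return index


def co_occurrences(index):
    """Count sentences shared by each pair of play elements via set intersection."""
    return {
        (a, b): len(index[a] & index[b])
        for a, b in combinations(sorted(index), 2)
    }


# Made-up sentences, assumed for the example only.
sentences = [
    "Organic-rich shale underlies porous sandstone.",
    "The sandstone is capped by a thick evaporite.",
    "Kerogen maturity increases with depth.",
]
index = index_sentences(sentences)
pairs = co_occurrences(index)
```

Because each sentence is scanned once and every lookup is a dictionary hit, the cost grows with the amount of text and not with the number of possible combinations. That is the general reason this family of techniques scales well, although the real algorithm may differ in its details.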