Here is some research I conducted recently. It compares counts of potential oil and gas source rock “mentions” by geological age in unstructured text with some (rather old) data published in the literature on the geological age of the source rocks that generated the hydrocarbons in producing oil and gas fields. More than 48,000 terms from a lexicon were applied to 16 million geoscience sentences covering most countries.
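A minimal sketch of the counting step, assuming a simple case-insensitive whole-word match of lexicon terms against sentences. The three lexicon entries and the example sentences are hypothetical illustrations, not the 48,000-term lexicon or the corpus used in this research, and real pipelines would likely handle synonyms, negation and context far more carefully.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon (illustrative only): lowercase term -> geological age.
LEXICON = {
    "kimmeridge clay": "Jurassic",
    "tanezzuft": "Silurian",
    "green river": "Eocene",
}

def build_pattern(lexicon):
    # Longest terms first so multi-word terms win over any shorter overlapping term.
    terms = sorted(lexicon, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)

def count_mentions_by_age(sentences, lexicon):
    # Counts every term match (not just matching sentences), keyed by age.
    pattern = build_pattern(lexicon)
    counts = Counter()
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            counts[lexicon[match.group(1).lower()]] += 1
    return counts

# Hypothetical example sentences.
sentences = [
    "The Kimmeridge Clay is the main source rock in the North Sea.",
    "Silurian hot shales of the Tanezzuft Formation charged the reservoirs.",
    "Kimmeridge Clay maturity increases with burial depth.",
]
counts = count_mentions_by_age(sentences, LEXICON)
print(counts)  # Counter({'Jurassic': 2, 'Silurian': 1})
```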
There is a statistically significant correlation, with two obvious anomalies. Potential Silurian source rocks appear under-represented in the text relative to their contribution to fields, whilst potential Eocene source rocks appear over-represented in the text relative to their contribution to fields. A variety of causal factors could explain these, and no conclusions are drawn here. The point is that data-driven approaches using text can stack these data and surface anomalies that may, in themselves, lead to interesting research questions or business ideas.
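One way to sketch the comparison, assuming both sources are first normalised to shares per age bin. The Pearson coefficient and the log2 ratio threshold used to flag over- and under-representation are my own illustrative choices, not necessarily the method behind the result above, and no significance test is reproduced here. The numbers are toy values, not the published field data or the text counts from this research.

```python
import math

def shares(values):
    # Normalise raw counts (or percentages) so each set sums to 1.
    total = sum(values.values())
    return {k: v / total for k, v in values.items()}

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def compare(text_counts, field_values, threshold=1.0):
    # Only compare age bins present in both sources.
    ages = sorted(set(text_counts) & set(field_values))
    text = shares({a: text_counts[a] for a in ages})
    field = shares({a: field_values[a] for a in ages})
    r = pearson([text[a] for a in ages], [field[a] for a in ages])
    # log2(text share / field share): +1 means twice as frequent in text as in fields.
    log_ratio = {a: math.log2(text[a] / field[a]) for a in ages}
    over = [a for a in ages if log_ratio[a] >= threshold]
    under = [a for a in ages if log_ratio[a] <= -threshold]
    return r, over, under

# Toy values for illustration only.
text_counts = {"Silurian": 50, "Jurassic": 400, "Cretaceous": 450, "Eocene": 300}
field_values = {"Silurian": 20, "Jurassic": 30, "Cretaceous": 40, "Eocene": 10}
r, over, under = compare(text_counts, field_values)
print(over, under)  # ['Eocene'] ['Silurian']
```

With real data there would be many more age bins, and a rank-based correlation with a proper significance test would be a reasonable alternative to the plain Pearson coefficient shown here.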
Reference: GEO ExPro – Rich Petroleum Source Rocks
#oilandgas #petroleum #geosciences #naturallanguageprocessing