Figure 1 – Association (Text) Frequency between chemical elements (by group) and hydrocarbon source rock (x-axis) and hydrocarbon occurrence (y-axis).
Deriving Hydrocarbon to Metal / Mineral associations found in unstructured text for use as potential ore analogues and exploratory data analysis: Applying Machine Learning and Natural Language Processing (NLP) to 16 Million Geoscience Sentences.
Hydrocarbon accumulations are well-known geochemical barriers for saline, metal-bearing waters, where metals are reduced and precipitated (for example as the Lead and Zinc sulphide minerals mined today).
In addition, it has been discovered relatively recently that migrating liquid hydrocarbons can both source and carry significant concentrations of metals (such as Zinc, Nickel and Vanadium), transporting them considerable distances to form ore deposits. Hydrocarbon-bearing basinal brines may leach metalliferous source rocks to mobilise and deposit metals and Rare Earth Elements (REE).
Furthermore, some enriched (hydrothermal, other) organic black shales contain anomalously high concentrations of elements such as Copper, Selenium, Uranium, Vanadium, Chromium, Cobalt, Arsenic, Silver, Gold, Platinum and REE.
These are materials needed for a clean energy future, in technologies such as solar power, wind power and electric vehicles.
The 48,000 lexicons in the OpportunityFinder algorithm for petroleum system analysis were combined with a list of all known elements, metals and minerals and applied to the unstructured text of public domain geoscience reports (16 Million sentences).
It is believed an exercise of this nature using such a large lexicon of subtle clues for hydrocarbons combined with all the names of elements, metals and minerals has never been undertaken before.
This produced a vast network (graph) of associations between Locations, Geological Age, Lithostratigraphy, Petroleum System Elements, Chemical Elements, Metals, Minerals and PPM (parts per million) values.
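The lexicon matching and association graph described above could be sketched roughly as follows. This is a minimal illustration and not the OpportunityFinder implementation: the three tiny lexicons, the category names and the function names are all hypothetical, and the rule used here (two terms are associated when they appear in the same sentence) is an assumption.

```python
import re
from collections import Counter

# Hypothetical miniature lexicons for illustration only; the real
# lexicons are far larger and are not reproduced here.
LEXICONS = {
    "source_rock": ["black shale", "coal", "organic-rich"],
    "hydrocarbon_occurrence": ["oil show", "oil seep", "bitumen", "gas show"],
    "element": ["vanadium", "germanium", "helium", "zinc"],
}


def compile_lexicon(terms):
    """Build one case-insensitive, whole-word regex for a list of terms."""
    # Longest terms first so multi-word phrases win over shorter overlaps.
    ordered = sorted(terms, key=len, reverse=True)
    body = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(" + body + r")\b", re.IGNORECASE)


PATTERNS = {category: compile_lexicon(terms) for category, terms in LEXICONS.items()}


def tag_sentence(sentence):
    """Return the set of (category, term) pairs found in one sentence."""
    found = set()
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(sentence):
            found.add((category, match.group(1).lower()))
    return found


def build_association_graph(sentences):
    """Count sentence-level co-occurrences between terms of different categories.

    Assumption: sharing a sentence counts as one association (one edge weight).
    """
    edges = Counter()
    for sentence in sentences:
        tags = sorted(tag_sentence(sentence))  # sorted so each edge key has one orientation
        for i, first in enumerate(tags):
            for second in tags[i + 1:]:
                if first[0] != second[0]:  # skip pairs from the same lexicon
                    edges[(first, second)] += 1
    return edges


# Made-up sentences, used only to exercise the functions.
toy_sentences = [
    "Vanadium is enriched in the black shale.",
    "Bitumen in the fractures contains vanadium and zinc.",
    "Germanium occurs in coal seams.",
]
toy_edges = build_association_graph(toy_sentences)
```

Each key in `toy_edges` is a pair of (category, term) nodes and each value is the number of sentences in which the pair co-occurs. The same structure would extend to locations, geological ages and other categories by adding further lexicons.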
The cross plot in Figure 1 shows one small, simple slice of these data. It compares how frequently each element is associated in the text with subtle evidence for potential organic source rocks versus subtle evidence for hydrocarbon migration.
For example, from the text, the rare metal Germanium has a relatively high (text) frequency of association to potential organic source rocks (mainly coals) but very low association to hydrocarbon occurrence (moveable hydrocarbons). Vanadium on the other hand has relatively high (text) frequency for both. Helium has a relatively high (text) frequency for hydrocarbon occurrence, but low for potential organic source rocks. This is probably because Helium in hydrocarbon accumulations is derived from radioactive decay of Uranium and Thorium in the source rock (not present there).
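One way to derive the two coordinates of such a cross plot is sketched below, assuming each sentence has already been tagged with the elements it mentions and with flags for source rock and hydrocarbon occurrence evidence. The record layout, the function name and the toy records are hypothetical. The sketch uses raw sentence counts; any normalisation into a frequency is left out.

```python
from collections import defaultdict


def crossplot_counts(tagged_sentences):
    """Return {element: (source_rock_count, occurrence_count)}.

    Each record is a dict with hypothetical keys:
      'elements'    - set of element names mentioned in the sentence
      'source_rock' - True if the sentence has source rock evidence
      'occurrence'  - True if the sentence has hydrocarbon occurrence evidence
    """
    counts = defaultdict(lambda: [0, 0])
    for record in tagged_sentences:
        for element in record["elements"]:
            if record["source_rock"]:
                counts[element][0] += 1  # x-axis: source rock association
            if record["occurrence"]:
                counts[element][1] += 1  # y-axis: hydrocarbon occurrence association
    return {element: tuple(xy) for element, xy in counts.items()}


# Toy records only, not real results from the 16 Million sentences.
toy_records = [
    {"elements": {"germanium"}, "source_rock": True, "occurrence": False},
    {"elements": {"vanadium"}, "source_rock": True, "occurrence": True},
    {"elements": {"helium"}, "source_rock": False, "occurrence": True},
    {"elements": {"vanadium", "helium"}, "source_rock": False, "occurrence": True},
]
toy_points = crossplot_counts(toy_records)
```

Plotting each element at its (x, y) pair would give a chart of the same shape as Figure 1. With these toy records, germanium sits on the source rock axis only, helium on the occurrence axis only, and vanadium has counts on both.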
These are simply text associations from public domain geoscience literature, so there are likely to be oddities, as there are in any automated process. At first glance though, some well-known mechanisms are visible in the patterns.
By stacking vast quantities of mining and petroleum literature and applying smart Machine Learning and NLP, we may surface interesting patterns hidden in plain sight, leading to new knowledge. These could lead the Geoscientist to a line of thinking and a business opportunity they may not otherwise have had.
#naturallanguageprocessing #geosciences #machinelearning #mining #petroleum #solar #wind #energytransition #digitaltransformation #bigdata #energy