I have been thinking about conceptual models relating to the vast (and continually growing) unstructured text collections within enterprises, regardless of whether they are held in hardcopy form in libraries and archives or in digital form on file systems or document management systems.
In the oil & gas industry, the term ‘missed pay’ is given to a reservoir zone containing economically recoverable hydrocarbons that was missed during initial interpretation.
This could have been due to human cognitive limitations at the time, or to subsequent changes in the body of knowledge that have not been reapplied to that situation.
The concept is typically a prediction (based on a model) until proven by targeting this zone for drilling/testing, often years after identification. The information used to build the model directly relates to imaging the subsurface. Modern methods use machine learning with training examples of ‘proven pay’ in order to surface overlooked areas with similar patterns.
Drawing on this, I see potential value in explicitly defining the broader concept of ‘Bypassed Information Pay’.
- Bypassed Information Pay is defined in this context as overlooked information, contained within a document or document collection, which has the potential to lead to economic, health or social value.
As I have mentioned previously on this blog, back in 1986 Swanson, using Literature-Based Discovery (LBD) techniques and the ‘ABC’ method, postulated a link between dietary fish oil and Raynaud’s disease. This link was never stated explicitly in any medical paper, but was inferred by Swanson through what was, in effect, word association: if A is linked to B in one set of papers and B is linked to C in another, a link between A and C can be hypothesised even though no paper mentions both. It was proven 3 years later in a clinical trial. This is an example of Bypassed Information Pay.
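To make the ‘ABC’ idea concrete, here is a minimal sketch in Python. It is my own toy illustration, not Swanson’s actual procedure or data: the function name `abc_candidates`, the representation of each document as a set of terms, and the five example documents are all assumptions invented for this post. It simply looks for terms (C) that never appear alongside the starting term (A), but do share a document with something (B) that does.

```python
from collections import defaultdict


def abc_candidates(docs, a_term):
    """Return {c_term: set of bridging b_terms} for a starting a_term.

    docs is a list of sets of terms, one set per document (an assumed,
    simplified representation; real LBD systems work on far richer text).
    """
    # B terms: everything that co-occurs with A in at least one document.
    b_terms = set()
    for terms in docs:
        if a_term in terms:
            b_terms |= terms - {a_term}

    # C terms: terms that share a document with some B term but are not
    # themselves B terms, i.e. they never co-occur directly with A.
    candidates = defaultdict(set)
    for terms in docs:
        if a_term in terms:
            continue  # skip documents that already mention A
        for b in terms & b_terms:
            for c in terms - b_terms:
                candidates[c].add(b)
    return dict(candidates)


# Hypothetical toy corpus, invented purely for illustration.
docs = [
    {"raynaud", "blood viscosity"},
    {"raynaud", "platelet aggregation"},
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"aspirin", "headache"},
]

links = abc_candidates(docs, "raynaud")
print(links)
```

On this toy corpus the only candidate surfaced is "fish oil", bridged by the two intermediate terms. A real collection would return a very long list of candidates, so some form of ranking or filtering (for example by the number of bridging terms) would be needed before anything could be called a hypothesis.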
The GeoDeepDive initiative run by the University of Wisconsin has produced automated Geoscience equivalents of Swanson’s medical example: new knowledge created from associative patterns across bodies of literature too vast ever to be read by one person.
Philosophically, we could take the stance that in any collection of texts there is the potential for Bypassed Information Pay. That potential changes through time: as more information is created, something that was not visible a priori can emerge. We don’t know this to be true, but we could behave as if it is. Certain strategies for how we manage and exploit our textual resources would then follow from this stance.
Most data scientists applying Natural Language Processing (NLP) and Machine Learning to organisational texts are arguably in the pursuit of Bypassed Information Pay.
If you have any thoughts, please get in touch at email@example.com.