
Transformers built on pre-trained Large Language Models (LLMs) can be applied to facts expressed as natural language ‘sentences’ to answer certain queries, performing the selections, joins and projections required. An advantage of this approach is that ‘the database’ has no predefined schema, and queries can be phrased however people prefer.
Take these 3 sentences (which could be generated from structured data or exist in a document):
The Kimmeridge Clay Fm is Jurassic.
The Kimmeridge Clay is a Source Rock.
The Kimmeridge comprises marine claystones enriched with Uranium.
To answer the query “Can Jurassic Source Rocks contain Uranium?”, one has to join data and infer across all 3 sentences, treating ‘Kimmeridge Clay Fm’, ‘Kimmeridge Clay’ and ‘Kimmeridge’ as the same formation. Kimmeridge=Jurassic, Kimmeridge=Source Rock, Kimmeridge contains Uranium, therefore:
A Jurassic Source Rock can contain Uranium.
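The join above can be made explicit with a small symbolic sketch. This is not a neural database: a neural model would read the sentences directly, whereas here the three sentences are hand-converted into triples. The relation names (age, type, contains), the helper functions and the normalised name "Kimmeridge Clay" are assumptions made purely for illustration.

```python
# Hypothetical hand-written triples standing in for the three sentences.
# The three name variants are assumed to refer to the same formation.
facts = [
    ("Kimmeridge Clay", "age", "Jurassic"),
    ("Kimmeridge Clay", "type", "Source Rock"),
    ("Kimmeridge Clay", "contains", "Uranium"),
]


def select(facts, relation, value):
    """Selection: subjects that have a fact with this relation and value."""
    return {s for s, r, v in facts if r == relation and v == value}


def can_contain(facts, age, rock_type, element):
    """Return the formations that satisfy all three conditions."""
    # Join: intersect the subjects returned by the three selections.
    matches = (
        select(facts, "age", age)
        & select(facts, "type", rock_type)
        & select(facts, "contains", element)
    )
    # Projection: keep only the formation names, sorted for stable output.
    return sorted(matches)


answer = can_contain(facts, "Jurassic", "Source Rock", "Uranium")
print(answer)  # ['Kimmeridge Clay']
```

A non-empty result means the answer to the query is yes. The point of the neural approach is that the triple extraction and name matching done by hand here would not need to be defined in advance.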
Because of the input limits of LLMs, aggregation queries such as “How many Geological Formations are there?” are beyond simple solutions at this time, but may become possible in the future with neural databases.
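One way to see why the input limit matters: a count needs every relevant sentence, so the facts would have to be read in batches and the partial results merged outside the model. The sketch below assumes a made-up limit of 2 sentences per call and uses simple string matching as a stand-in for the model. The sample sentences and all function names are hypothetical.

```python
def batches(sentences, limit):
    """Split the sentences into groups no larger than the assumed input limit."""
    for start in range(0, len(sentences), limit):
        yield sentences[start:start + limit]


SUFFIX = " is a Geological Formation."


def extract_formations(batch):
    """Hypothetical stand-in for one model call over one batch of sentences."""
    names = set()
    for sentence in batch:
        if sentence.startswith("The ") and sentence.endswith(SUFFIX):
            names.add(sentence[len("The "):-len(SUFFIX)])
    return names


def count_formations(sentences, limit):
    """Merge the per-batch results, then count the distinct formations."""
    seen = set()
    for batch in batches(sentences, limit):
        seen |= extract_formations(batch)
    return len(seen)


# Made-up sample sentences; the repeated one shows why results must be merged.
sentences = [
    "The Kimmeridge Clay is a Geological Formation.",
    "The Kimmeridge Clay is a Source Rock.",
    "The Ekofisk is a Geological Formation.",
    "The Kimmeridge Clay is a Geological Formation.",
]

total = count_formations(sentences, limit=2)
print(total)  # 2
```

The same formation appears in two different batches, so no single call sees enough to give the right count. The deduplication and the final count happen outside the model.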
Reference: Thorne et al., From Natural Language Processing to Neural Databases.