The Three Laws of Data Management

Inspired by Isaac Asimov’s Three Laws of Robotics from the 1940s, I’ve applied the same idea to Data Management in a world of Frontier AI. The aim is to provoke debate and discussion, in a light-hearted way, in what is a fast-moving, emerging field.

The first law is about the governance of data (including documents and other forms). The second is about the quality and provenance of data, and the third is about the discoverability of these data and their derivatives.

My observations as a practitioner in this area focus here on the third law, specifically as it applies to Large Language Models (LLMs) and unstructured data.

To apply Artificial Intelligence (AI) to the vast amounts of unstructured organisational information often held in Document Management Systems, these repositories, and file formats such as PDF, Word and PowerPoint, need to be cracked open to create an AI-ready ‘text lake’. This in itself raises questions about governance.
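As a rough illustration of how a text lake could keep governance in view, here is a minimal Python sketch in which every extracted document carries its source and ownership details with it. The record fields, the `make_record` helper and the `dms://` address are my own assumptions for illustration, not the schema of any particular product.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TextLakeRecord:
    """One extracted document in a hypothetical text lake."""
    source_uri: str     # where the original file lives (assumed identifier)
    source_sha256: str  # fingerprint of the original file's bytes
    extracted_at: str   # ISO 8601 timestamp (UTC) of the extraction
    owner: str          # accountable data owner (illustrative field)
    text: str           # the extracted plain text


def make_record(source_uri: str, raw_bytes: bytes, text: str, owner: str) -> TextLakeRecord:
    """Bundle extracted text with the metadata needed to trace it back."""
    return TextLakeRecord(
        source_uri=source_uri,
        source_sha256=hashlib.sha256(raw_bytes).hexdigest(),
        extracted_at=datetime.now(timezone.utc).isoformat(),
        owner=owner,
        text=text,
    )


# Hypothetical example: the bytes and text stand in for a real extraction step.
record = make_record(
    source_uri="dms://reports/annual-2023.pdf",
    raw_bytes=b"%PDF-1.7 ...",
    text="Annual report text",
    owner="finance-team",
)
print(record.source_uri, record.source_sha256[:12])
```

The checksum lets anyone confirm later that the text was extracted from a specific version of the original file, which is one way of keeping the second law (quality and provenance) intact as content moves into the lake.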

Furthermore, when this text lake is consumed by LLMs, either through Retrieval Augmented Generation (RAG) or fine-tuning, the resulting derivatives, such as answers and summaries, can lose their link to the provenance and quality of their sources. On top of that come ‘hallucinations’ and the potential for falsehoods to be shrouded in plausible language.
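One way to keep that link in a RAG pipeline is to make every retrieved passage carry its source reference through to the final output. The sketch below is illustrative only: it uses naive word overlap as a stand-in for real vector search, and the `Chunk` fields, function names and `dms://` addresses are assumptions of mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    source_uri: str  # hypothetical identifier of the source document
    page: int


def retrieve(query: str, chunks: list[Chunk], top_k: int = 2) -> list[Chunk]:
    """Rank chunks by naive word overlap with the query (stand-in for vector search)."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(c.text.lower().split())), c) for c in chunks]
    scored = [(score, c) for score, c in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


def answer_package(query: str, chunks: list[Chunk]) -> dict:
    """Return the context for the LLM together with the sources it came from."""
    retrieved = retrieve(query, chunks)
    return {
        "query": query,
        "context": [c.text for c in retrieved],
        "sources": [f"{c.source_uri}#page={c.page}" for c in retrieved],
    }


chunks = [
    Chunk("Retention policy for contracts is seven years", "dms://policies/retention.docx", 3),
    Chunk("Travel expenses must be approved by a line manager", "dms://policies/expenses.pdf", 1),
    Chunk("Contracts are stored in the legal repository", "dms://legal/handbook.pdf", 12),
]

package = answer_package("What is the retention policy for contracts", chunks)
for source in package["sources"]:
    print(source)
```

Whatever the model then generates from `context` can be shown alongside `sources`, so the consumer can see where an answer came from and check it against the original.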

If you go along with these laws, ChatGPT on its own breaks the third. This is because the consumer of its outputs has no way of knowing the provenance or quality of the sources behind its answers, summaries and recommendations.

With care and thought, organisations can build (and are building) systems that comply with the third law in this frontier AI space. The takeaway may be that Data Managers have a key role to play in upholding the laws.

#artificialintelligence #chatgpt #largelanguagemodels #datamanagement #data #informationmanagement #digital
