Democratisation of Generative Artificial Intelligence (AI)

Applying Generative AI is perhaps easier than many people think. The hard work has been done by the engineers and data scientists who have created Large Language Models (LLMs).

Some smaller models on Hugging Face can be downloaded and run locally on your laptop. Others, like OpenAI's GPT models, can be used via an API key in just half a dozen lines of Python code!
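As an illustration, here is a minimal sketch of such an API call. It assumes the openai Python package (v1 or later) is installed and an OPENAI_API_KEY environment variable is set; the model name and question are just examples.

```python
# Minimal sketch: ask an OpenAI chat model a question via the API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4",  # example model name
    messages=[{"role": "user", "content": "Summarise what a vector database is."}],
)
print(response.choices[0].message.content)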

Any individual can effectively have access to the same AI technology as the world’s multinational IT consultancies.

Any individual or IT company can build AI solutions without actually doing any of the underlying AI work.

Many AI deployments are (or will be) more about traditional IT system integration: connecting an organisation's document repositories to these model APIs via vector databases.
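A toy sketch of that integration step, embedding document chunks so they can later be searched by similarity. It assumes the sentence-transformers package; the model name and example chunks are placeholders, and a saved NumPy array stands in for a real vector database.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical text chunks pulled from an organisation's document repositories.
chunks = [
    "Q3 drilling report: reservoir porosity averaged 18 percent.",
    "HR policy update: remote working guidelines were revised in May.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # example embedding model
vectors = embedder.encode(chunks, normalize_embeddings=True)  # shape: (n_chunks, dim)

# A real deployment would load these vectors into a vector database;
# a saved NumPy array stands in for that store here.
np.save("chunk_vectors.npy", vectors)
```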

Transfer learning

There are two main ways to use LLMs. The first is fine-tuning an existing model with your own content, which means organisational content becomes part of the model. This requires significant compute to train and can be time-consuming and expensive.
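As one illustrative route (not the only one), here is a sketch using OpenAI's hosted fine-tuning API. It assumes the openai package (v1 or later), an API key in the environment, and a prepared JSONL file of example prompts and responses; the file name and base model are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the prepared examples, then start a fine-tuning job on a base model.
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),  # hypothetical file of prompt/response pairs
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",  # example base model; fine-tuning availability varies
)
print(job.id)  # the job trains on OpenAI's side and is billed accordingly
```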

Before embarking on this, companies should in my opinion first test a state-of-the-art model such as GPT-4 as a baseline, since fine-tuning may not be needed for the use case in mind. Models also go out of date quickly. For example, if you ask GPT-4 what the base of the Barremian (a geological age) is, it gives the wrong answer, because the boundary was changed in 2022, after GPT-4 was built. Hallucinations are also well documented.

One reason for fine-tuning (or building from scratch) would be to create domain-based text embeddings (word vectors). These allow the discovery of new associations hidden in a large corpus of text (hundreds of thousands to millions of documents/articles), associations that can have both scientific and economic significance.
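A small sketch of that idea, training word vectors on your own corpus. It assumes the gensim package; the tokenised corpus and query term are made-up placeholders standing in for a much larger domain collection.

```python
from gensim.models import Word2Vec

# Made-up tokenised sentences; in practice these would come from
# hundreds of thousands of domain documents.
corpus = [
    ["barremian", "source", "rock", "organic", "carbon"],
    ["reservoir", "porosity", "sandstone", "permeability"],
    ["porosity", "permeability", "organic", "carbon"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=5, min_count=1, workers=2)

# Terms that end up close together in the learned vector space can point to
# associations that no single document states explicitly.
print(model.wv.most_similar("porosity", topn=5))
```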

Prompt Engineering

The second way is prompt engineering, where you provide a subset of your content along with a question/prompt and the LLM uses its semantic structure to answer from just the content you supplied. Organisational content does not become part of the model and can be security-trimmed so people only see what they are entitled to see. If you keep the vector database built from your text up to date, people will be able to find the latest facts and information.

In prompt engineering, the process of deciding which subset of your information should be provided to the LLM (from your vector database) in order to answer your question may well turn out to be the most critical part of the process. There is a lot more to Natural Language Processing than just LLMs.
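A sketch of that retrieve-then-prompt step, reusing the same kind of embeddings as the earlier sketch: the question is embedded, the closest chunks are selected, and only those chunks are sent to the LLM alongside the question. The packages, model names and example chunks are assumptions, not a prescribed stack.

```python
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

# The same hypothetical chunks and embedding model as the earlier sketch.
chunks = [
    "Q3 drilling report: reservoir porosity averaged 18 percent.",
    "HR policy update: remote working guidelines were revised in May.",
]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
vectors = embedder.encode(chunks, normalize_embeddings=True)

question = "What was the average reservoir porosity in Q3?"
q_vec = embedder.encode([question], normalize_embeddings=True)[0]

# Cosine similarity reduces to a dot product because the vectors are normalised;
# keep only the closest chunks as context for the prompt.
top_idx = np.argsort(vectors @ q_vec)[::-1][:3]
context = "\n\n".join(chunks[i] for i in top_idx)

prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
answer = client.chat.completions.create(
    model="gpt-4",  # example model name
    messages=[{"role": "user", "content": prompt}],
)
print(answer.choices[0].message.content)
```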

Types of questions

How critical text selection is in prompt engineering is likely to depend on the nature of the question. Simple factual questions are probably less sensitive. If you are asking for a summary of a situation, it depends on how high-level the summary is. If a summary should include contradictions and gaps, significant processing of the text before it is sent to the LLM API is most likely required.

Question levels from Bloom’s taxonomy (from low to high level).

1. Recall knowledge: Who, when

2. Understand: summarise, classify

3. Apply: interpret, analogies, solve, sketch

4. Analyse: compare, contrast, differentiate

5. Evaluate: judge, critique, contradictions, gaps

6. Create: investigate, design, author

Note that Gen AI can be applied at all levels (1-6) of Bloom's taxonomy.

Summary

Applying Gen AI to your content is perhaps easier than some people think. It is still just probabilistic statistics; however, receiving an opinion from an algorithm that has read almost every document in your organisation might be useful.
