Are we building ethical AI Chatbots in our Scientific and Industry Sectors?

Are we building ethical AI chatbots in our scientific disciplines and industry sectors? There are many aspects to ethical AI for domain chatbot apps powered by Large Language Models (LLMs). From a human-centric perspective these include (1) privacy and data protection, (2) equality and non-discrimination and (3) transparency and explainability.

From a content and literature perspective, these include respecting copyright. Some content is not open access, so it cannot be used to build LLMs without permission, especially if the result is publicly distributed. Other content is open access but covered by Creative Commons licenses that may carry restrictions, such as no commercial use, mandatory attribution with a link to the article, or no redistribution of derivatives. Just because content is accessible on the Internet does not mean you can do anything with it.
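
As a rough sketch of what licence-aware ingestion might look like, the Python below filters a corpus on a hypothetical `license` metadata field before anything is indexed or trained on. The field name and values are illustrative, not a real metadata standard:

```python
# Hypothetical corpus filter: keep only documents whose licence permits
# the intended use. The "license" field and its values are illustrative.

ALLOWED_FOR_COMMERCIAL_USE = {"CC-BY", "CC-BY-SA", "CC0", "public-domain"}

def usable_documents(corpus, commercial=True):
    """Yield documents whose licence metadata permits the planned use."""
    for doc in corpus:
        licence = doc.get("license")
        if licence is None:
            continue  # unknown licence: safest to treat as all rights reserved
        if commercial and licence not in ALLOWED_FOR_COMMERCIAL_USE:
            continue  # e.g. CC-BY-NC forbids commercial use
        yield doc

corpus = [
    {"id": 1, "license": "CC-BY", "url": "https://example.org/a"},
    {"id": 2, "license": "CC-BY-NC", "url": "https://example.org/b"},
    {"id": 3, "license": None, "url": "https://example.org/c"},
]
print([d["id"] for d in usable_documents(corpus)])  # -> [1]
```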

The attribution requirement has a significant impact on chatbot architecture. In practice it points to Retrieval Augmented Generation (RAG), where every assertion, answer or response generated by the chatbot is accompanied by links to the original source articles used to generate the text.
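
A minimal sketch of that pattern, assuming a toy word-overlap retriever and a stubbed-out LLM call (a real system would use embeddings, a vector store and a model API), might look like this:

```python
# Minimal RAG sketch with mandatory source links. The retriever and the
# "LLM" are toy stand-ins; the structure is the point.

def score(query: str, text: str) -> int:
    """Toy relevance score: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve(query: str, index: list[dict], k: int = 3) -> list[dict]:
    """Return the k passages with the highest overlap score."""
    return sorted(index, key=lambda p: score(query, p["text"]), reverse=True)[:k]

def answer_with_citations(query: str, index: list[dict], llm) -> dict:
    passages = retrieve(query, index)
    context = "\n\n".join(p["text"] for p in passages)
    prompt = (
        "Answer using ONLY the context below; say so if it is insufficient.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    # Every response carries links back to the articles it was built from.
    return {"answer": llm(prompt), "sources": [p["url"] for p in passages]}

index = [
    {"text": "Basalt is a fine-grained volcanic rock.", "url": "https://example.org/basalt"},
    {"text": "Granite is a coarse-grained intrusive rock.", "url": "https://example.org/granite"},
]
fake_llm = lambda prompt: "Basalt is a fine-grained volcanic rock."
print(answer_with_citations("What is basalt?", index, fake_llm))
```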

If someone were writing a scientific article and used your work, you would expect them to cite it. An AI chatbot must operate in the same way; otherwise it is plagiarism.

This also helps address the human-centric need for transparency and explainability. It helps people understand and verify the sources of text generated by the chatbot, which is essential for spotting misinformation. This also aligns with UNESCO’s core values for ethical AI.

The data used needs careful thought. For example, should only peer-reviewed content be used for training or as the underlying retrieval data? Are there biases favouring certain cultures in the data that a chatbot may amplify?

There is too much potentially relevant information for any of us to read. Domain AI chatbots offer a way to synthesize and interact with information in new ways, complementing traditional search engines. Chatbot-generated text can appear fluent and informative, but it frequently contains unsupported statements, and algorithmic biases may be present that are hidden from the user. Highlighting the original articles and source passages side by side with the generated text may aid verification, along with user controls over article selection parameters.
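
One possible shape for such a verification view, again using a toy overlap score in place of real sentence matching (a production system might use sentence embeddings or entailment models), with `min_score` standing in for an article selection control:

```python
# Sketch of a verification view: pair each generated sentence with its
# best-matching retrieved passage so readers can check claims against
# sources. The overlap score and min_score threshold are illustrative.

def overlap(a: str, b: str) -> int:
    return len(set(a.lower().split()) & set(b.lower().split()))

def verification_view(answer: str, passages: list[dict], min_score: int = 2):
    for sentence in filter(None, (s.strip() for s in answer.split("."))):
        best = max(passages, key=lambda p: overlap(sentence, p["text"]))
        if overlap(sentence, best["text"]) >= min_score:
            print(f"CLAIM : {sentence}\nSOURCE: {best['text']} ({best['url']})\n")
        else:
            print(f"CLAIM : {sentence}\nSOURCE: none above threshold -- verify manually\n")

passages = [
    {"text": "Basalt is a fine-grained volcanic rock.", "url": "https://example.org/basalt"},
]
verification_view("Basalt is a fine-grained volcanic rock. It tastes sweet.", passages)
```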

Engaging with a chatbot (a probability distribution over words) is never going to be as deterministic as a keyword search engine, so healthy scepticism is required. Just as you should not automatically believe the content behind the first link Google returns, you should not automatically believe anything a chatbot says.
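
A toy illustration of the point, with invented next-word probabilities: greedy decoding (temperature 0) is repeatable, like a lookup, while sampling at temperature 1 can differ on every run:

```python
# Why chatbot output is not deterministic: the model defines a probability
# distribution over the next word, and sampling with temperature > 0 can
# give different output each run. Probabilities below are invented.

import math, random

next_word_probs = {"basalt": 0.6, "granite": 0.3, "schist": 0.1}

def sample(probs: dict, temperature: float) -> str:
    if temperature == 0:  # greedy decoding: always the most likely word
        return max(probs, key=probs.get)
    # Temperature scaling: weight each word by p ** (1 / temperature).
    weights = [math.exp(math.log(p) / temperature) for p in probs.values()]
    return random.choices(list(probs), weights=weights)[0]

print([sample(next_word_probs, 0.0) for _ in range(3)])  # always 'basalt'
print([sample(next_word_probs, 1.0) for _ in range(3)])  # varies run to run
```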

Whatever the scientific discipline or industry sector, the ethical governance of AI involves more than technology decisions. Many things could technically be done; that does not mean we should do them. A questioning mind is probably the most useful tool.

Some useful references:

https://scholarlykitchen.sspnet.org/2024/02/21/guest-post-there-is-more-to-reliable-chatbots-than-providing-scientific-references-the-case-of-scopusai/

https://library.hkust.edu.hk/sc/retrieval-augmented-generation-based-academic-search-engines/

https://www.ox.ac.uk/news/2023-11-20-large-language-models-pose-risk-science-false-answers-says-oxford-study-0

https://www.unesco.org/en/articles/recommendation-ethics-artificial-intelligence

https://www.chathamhouse.org/sites/default/files/2023-01/2023-01-10-AI-governance-human-rights-jones.pdf

#artificialintelligence #ethicalai #geoscience
