
Issues can arise when chatbots intended for an international audience are based on current Chinese Large Language Models (LLMs) that are state censored. Testing has identified traits in these models such as refusing to answer certain questions, deflecting by broadening the question, omission, and propaganda. Such models, potentially including Alibaba’s “Qwen”, are available on Hugging Face for anyone to test themselves.
Companies operating in China have to respect well-documented censorship laws, including recent laws for Artificial Intelligence (AI). However, when chatbots, APIs and other applications based on emerging “open source” Chinese models are intended for an international audience, this can impose a level of nation-state censorship on other countries, reaching beyond national borders.
Most LLMs carry some form of censorship applied by private sector organisations to mitigate bias, malicious use, and misleading, offensive or harmful information, as determined by UNESCO recommendations on ethical AI. There are arguments for and against this type of censorship, but in my opinion it would be a huge deflection to try to conflate it with censorship imposed by the government of a nation state.
It is important to note that almost all LLMs, regardless of who develops them, are not genuinely open source. Some are just open access or free to use; some make their model weights available. To my knowledge no LLMs are open science, i.e. they do not disclose all the training data used, or the details of the reinforcement learning from human feedback used to align outputs to preferences.
In international use cases where state-censored models might be used, the end user may not even be aware that state censorship is being applied to their answers. That lack of transparency contravenes UNESCO recommendations on ethical AI.
Most international scientific societies, unions and initiatives are non-political and non-governmental. Use of such state-censored models within their endorsed technologies would therefore almost certainly be unethical and against the values of the international community.
You can test “Qwen” here: https://huggingface.co/spaces/Qwen/Qwen1.5-110B-Chat-demo
There is also an excellent independent write-up of the censorship issues with “Qwen”, published in June 2024 by Leonard Lin, here: https://huggingface.co/blog/leonardlin/chinese-llm-censorship-analysis
It is interesting to note that using ‘abliteration’ to attempt to create uncensored versions of “Qwen” only has the effect of making the language model “refusal proof”. It does not make it uncensored, because the content originally used to train the language model will contain carefully selected, censored propaganda that complies with Chinese censorship laws.
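To illustrate why removing refusals is not the same as removing censorship, here is a minimal toy sketch of the idea usually described as abliteration: estimate a "refusal direction" from the difference between internal activations on refused and answered prompts, then project that direction out. This is my own simplified illustration in plain Python, not Qwen's code or any library's API. The function names and the tiny three-number "activations" are hypothetical stand-ins for a real model's hidden states.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def refusal_direction(refused_acts, answered_acts):
    """Direction separating refused prompts from answered prompts."""
    refused_mean = mean_vector(refused_acts)
    answered_mean = mean_vector(answered_acts)
    return unit([r - a for r, a in zip(refused_mean, answered_mean)])


def ablate(activation, direction):
    """Remove the component of an activation that lies along the direction."""
    coeff = dot(activation, direction)
    return [x - coeff * d for x, d in zip(activation, direction)]


# Hypothetical toy activations (assumption: real ones have thousands of dimensions).
refused_acts = [[2.0, 0.1, 0.3], [1.8, -0.1, 0.2]]
answered_acts = [[0.0, 0.1, 0.3], [0.2, -0.1, 0.2]]

direction = refusal_direction(refused_acts, answered_acts)
cleaned = ablate([1.5, 0.4, -0.7], direction)

# Only the refusal component is removed; the other components are untouched.
print(direction)
print(cleaned)
```

The projection only strips out the single direction associated with refusing. Everything else the model learned from its training data, including what was omitted or slanted, stays exactly as it was.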
#artificialintelligence #censorship #largelanguagemodels #digital #ethics #AI #GenAI #geoscience