China’s AI Problem

May 25, 2023

Chinese censorship inserts a big “if” into its AI-enabled future. Its aggressive editing of information is a major pitfall for a ChatGPT with Chinese characteristics. By wiping the historical slate clean of important events and the human experiences associated with them, China’s censorship regime has narrowed and distorted the body of information that will be used to train large language models. It follows that China’s ability to benefit from an AI intellectual revolution will suffer.

Of course, it is impossible to quantify the impact of censorship with any precision. Freedom House’s annual Freedom on the Net survey comes closest with a qualitative assessment. For 2022, it awards China the lowest overall “Internet Freedom Score” from a 70-country sample.

This metric is derived from answers to 21 questions (and nearly 100 sub-questions) that are organized into three broad categories: obstacles to access, violations of user rights, and limits on content. The content sub-category (reflecting filtering and blocking of websites, legal restrictions on content, the vibrancy and diversity of the online information domain, and the use of digital tools for civic mobilization) is the closest approximation to measuring the impact of censorship on the scale of searchable information. China’s score on this count was two out of 35 points, compared to an average score of 20.

Looking ahead, we can expect more of the same. Already, the Chinese government has been quick to issue new draft rules on chatbots. On April 11, the Cyberspace Administration of China (CAC) decreed that generative AI content must “embody core socialist values and must not contain any content that subverts state power, advocates the overthrow of the socialist system, incites splitting the country or undermines national unity.”

This underscores a vital distinction between the pre-existing censorship regime and new efforts at AI oversight. Whereas the former uses keyword filtering to block unacceptable information, the latter (as pointed out in a recent DigiChina forum) relies on a Whac-a-Mole approach to containing the rapidly changing generative processing of such information. This implies that the harder the CAC tries to control ChatGPT content, the smaller the resulting output of chatbot-generated Chinese intelligence will be — yet another constraint on the AI intellectual revolution in China.

Unsurprisingly, the early returns on China’s generative-AI efforts have been disappointing. Baidu’s Wenxin Yiyan, or “Ernie Bot” (China’s best-known first-mover large language model), was recently criticized in Wired for attempting to operate in “a firewalled Internet ruled by government censorship.” Similarly disappointing results have been reported for other AI language processing models in China, including Robot, Lily, and Alibaba’s Tongyi Qianwen (roughly translated as “truth from a thousand questions”).

In the age of AI, all this raises profound questions for China. Information is the raw fuel of large-language AI models. But state censorship rations that fuel and encumbers China with small-language models. This distinction could well bear critically on the battle for information control and global power.

The above dispatch draws on a longer piece just published by Project Syndicate.

You can follow me on Twitter @SRoach_econ
