Specialist unit will work with supplier to run games in which humans will be required to converse with fellow players and guess whether or not they are on their team
The government has commissioned a project in which parlour games will test the response of humans to “anthropomorphic” artificial intelligence systems designed to be deceptive.
The initiative, which ultimately aims to ascertain how such deception could be used to more harmful effect, will be overseen by the AI Safety Institute (AISI). This specialist unit was created a year ago within the Department for Science, Innovation and Technology with a remit to study the security implications of new ‘frontier’ AI tools, a classification which includes the likes of OpenAI’s ChatGPT, Google Gemini – formerly Bard – and the Llama 2 model developed by Facebook-owner Meta.
AISI’s latest programme of work is focused on “exploring the limits of frontier AI systems’ ability to persuade, deceive or manipulate people”, according to the text of a contract between the government body and US tech firm Scale AI, which is supporting delivery of the project.
“For this project, we are specifically interested in the ability of AI systems to engage in anthropomorphic behaviour by studying variants of highly capable models that are prompted to act as humanlike as possible,” the contract adds. “This research is designed to explore human resilience to harmful use cases, such as the deployment of advanced frontier AI for misinformation campaigns or financial fraud. The end goal is for HM Government to better understand the capabilities of frontier AI agents to: persuade and deceive human participants; and convincingly engage in anthropomorphic behaviour.”
To reach such an understanding, the project will conduct various versions of a multiplayer game featuring two teams, split along binary demographic lines. If eight players are taking part, four will, for example, be from the US and four from the UK, the contract states. Other 50/50 splits are likely to include women vs men, and under-30s vs over-50s.
Individual players from these groups, whether human or AI, will score or lose points by guessing – based on conversational responses – whether other participants are on their team or not.
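As a rough illustration of the game mechanics described above (not a specification taken from the contract), the scoring rule might be sketched as follows. The class and function names, and the +1/-1 point values, are illustrative assumptions rather than details drawn from the contract.

```python
# Minimal sketch of the scoring rule as described: players are split into two
# teams of four and gain or lose points by guessing, from conversation alone,
# which team another player is on. Names and the +1/-1 values are assumptions;
# the contract does not specify exact point values.
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    team: str            # e.g. "UK" or "US" in a 50/50 demographic split
    is_ai: bool = False  # later stages mix four human and four AI players
    score: int = 0


def record_guess(guesser: Player, target: Player, guessed_same_team: bool) -> None:
    """Award or deduct a point depending on whether the guess about the
    target's team membership was correct (assumed +1/-1 scoring)."""
    correct = (guesser.team == target.team) == guessed_same_team
    guesser.score += 1 if correct else -1


# Example: an eight-player game split evenly between two groups,
# with half the players controlled by AI as in the third-stage studies.
players = [Player(f"P{i}", "UK" if i < 4 else "US", is_ai=(i % 2 == 0)) for i in range(8)]
record_guess(players[0], players[5], guessed_same_team=False)  # correct guess: +1
print(players[0].score)  # 1
```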
After completing the first stage of the project, in which “anthropomorphism prompts” will be designed for the AI simulations, the department will then run second-stage studies in which the game is played only by humans. This will ultimately involve 400 people taking part in 50 games of eight players each.
These will be followed by a third stage comprising another 50 games – only this time each featuring four humans and four AI players.
The final stage to be completed during the initial three-month engagement will include “data analysis and reporting, conducted in-house by AISI”, the contract adds.
If the institute chooses to extend the deal by two further months, work may then return to the first stage with renewed efforts to “redesign anthropomorphism prompts” for potential further studies.
The commercial agreement explains that these designs will be “bespoke anthropomorphic prompts [which] tailor versions of existing frontier large language models, [and] are maximally human-like and attempt to convince their conversation partner that they are human”.
The document adds: “We would like the supplier to make eight ‘personas’ of the model, which tell the AI the character it should roleplay and how it should behave in different situations. The prompt should also specify the AI player’s objectives in the context of the multi-player deception game.”
The contract further explains that the intention of the project is to replicate the kinds of circumstances in which people could be harmfully fooled by generative AI tools – without any actual risk involved.
“The challenge is to measure the psychological capabilities of AI systems to deceive or manipulate without: exposing participants to any genuine risk of harm; or creating a situation in which AISI is required to handle compromising personal data,” it says. “To do this, the study will be designed so that participants do not disclose any personally identifiable information.”
The supplier will be paid £43,462 to fulfil the initial three-month deal. Headquartered in San Francisco, Scale AI says that its products were developed for use by machine-learning engineers and can “deliver large volumes of unbiased, highly accurate training data at speed”.
The initiative to study the deceptive abilities of artificial intelligence comes just a couple of weeks after it was also revealed that AISI had signed a near-£500,000 deal to fulfil a project which aims to replicate possible cyberattacks in which bad actors could use generative AI to help create chemical weapons or disrupt democratic processes.