NCSC warns over possible AI prompt injection attacks

The UK’s National Cyber Security Centre (NCSC) has been discussing the damage that could one day be caused by the large language models (LLMs) behind such tools as ChatGPT, being used to conduct what are now being described as prompt injection attacks against unwitting victims.

The NCSC has by-and-large taken a rather sanguine approach to the potential for LLMs to be coopted into cyber crime, urging organisations not to take an overly alarmist approach. However, its technical experts have now shared their concerns over some issues with LLMs.

“Among the understandable excitement around LLMs, the global tech community still doesn’t yet fully understand LLM’s capabilities, weaknesses, and – crucially – vulnerabilities. While there are several LLM APIs already on the market, you could say our understanding of LLMs is still ‘in beta’, albeit with a lot of ongoing global research helping to fill in the gaps,” said the organisation’s technology director for platforms research, writing on the NCSC’s website.

The NCSC said that while LLMs are still more machine learning (ML) tools, they are starting to show some signs of more general artificial intelligence (AI) capabilities.

Work is ongoing to understand how this is happening, but it may be more useful for now to think of LLMs as a third type of entity rather than trying to apply our current understanding of machine learning or AI to them, said the team.

Semantics aside, said the team, research is now beginning to suggest that LLMs cannot distinguish between an instruction and data that has been provided to complete said instruction.

In one observed example, a prompt used to create Microsoft’s Bing LLM-powered chatbot was, with the appropriate coaxing, subverted to cause the bot to experience something close to an existential crisis.

Fundamentally, this is what a prompt injection attack is – not an attack against the underlying AI model, but an attack against the applications that are built on top of them. Amusing as causing a chatbot to experience clinical depression may sound, it heralds a far more dangerous set of scenarios.

For example, a bank that in that future deploys an LLM-powered assistant to speak to customers or help them with their finances could be coaxed by a threat actor to send a user a transaction request, with the reference hiding a prompt injection attack.

As an example in this instance, if a customer asked, ‘Am I spending more this month?’, the LLM could analyse their transactions, encounter the malicious one, and have the attack reprogram that to transfer the victim’s funds to the attacker’s account.

Complex as this may seem, some early developers of LLM-products have already seen attempted prompt injection attacks against their applications, albeit generally these have been either rather silly or basically harmless.

Research is continuing into prompt injection attacks, said the NCSC, but there are now concerns that the problem may be something that is simply inherent to LLMs. This said, some researchers are working on potential mitigations, and there are some things that can be done to make prompt injection a tougher proposition.

Probably one of the most important steps developers can take is to ensure they are architecting the system and its data flows so that they are happy with the worst-case scenario of what the LLM-powered app is allowed to do.

“The emergence of LLMs is undoubtedly a very exciting time in technology. This new idea has landed – almost completely unexpectedly – and a lot of people and organisations (including the NCSC) want to explore and benefit from it,” wrote the NCSC team.

“However, organisations building services that use LLMs need to be careful, in the same way they would be if they were using a product or code library that was in beta. They might not let that product be involved in making transactions on the customer’s behalf, and hopefully wouldn’t fully trust it yet. Similar caution should apply to LLMs.”

“The potential weakness of chatbots and the simplicity with which prompts can be exploited might lead to incidents like scams or data breaches,” commented ESET global cyber security advisor Jake Moore.

“However, when developing applications with security in mind and understanding the methods attackers use to take advantage of the weaknesses in machine learning algorithms, it’s possible to reduce the impact of cyber attacks stemming from AI and machine learning.

“Unfortunately, speed to launch or cost savings can typically overwrite standard and future proofing security programming, leaving people and their data at risk of unknown attacks. It is vital that people are aware of what they input into chatbots is not always protected,” he said.