As AI Grows, so Does Our Knowledge of Its Weaknesses

September 20 2024 Blog

AI—it’s arguably the single largest paradigm shift in technology since the mobile boom almost 20 years ago. In fact, with the capability and capacity to redefine how organizations, entities, and individuals engage, collaborate, and solve problems, AI could be the largest technological advancement in human history.  

As the world jumps on the AI train, however, what else are we opening ourselves up to as we depart the station? Since the initial public release of ChatGPT 3.5 just 18 months ago, attackers and defenders alike have been trying to find AI’s weaknesses and vulnerabilities; and since then, a lot has been learned. And while there are many different nuanced layers to AI security, here are a few particular areas of note that should be on your radar as you create, interact with, or rely more on, AI solutions. 

Data Poisoning 

Where better to start than … the start. With AI, the output is only as good as the data put in to train the model.  

All AI solutions are “trained” over time on sets of data that teach the model, providing the foundation for how it responds to questions or input; however, if this training is poisoned, that is, contaminated with data that wasn’t intended for the model to be trained on, it can lead to unintended responses. Those unintended responses can then have a variety of consequences—from  poor performance when prompting the AI to downright inaccurate information when receiving responses.  

This poisonous effect can cause reputational harm for both the creator of the solution—as  it would no longer be trustworthy—as well as the user, if they fail to validate the info and then erroneously use it themselves. Of equal concern is that a malicious actor could also cause a model to respond in such a way that it pushes a particular message or prevents an intended response due to unexpected or conflicting information in the training data. Ensuring AI models are trained on properly curated data has therefore  quickly become a critical top priority for all creators.  

Prompt Injections 

When thinking about security risks, there is often the image of a figure, cloaked in darkness, using high level technical skills to compromise users, data, or services. The reality is, while that can happen, there is just as much need to protect AI models from everyday interactions that occur from standard users, who may just be trying to get a quick answer to a question they shouldn’t be asking. The essential threat of  prompt injections is they are meant to fool an AI model, causing it to act in a way that goes beyond its intended expectations. These attacks take a multitude of shapes and forms, from getting the model to return information beyond its “guardrails” to seeing if an attacker can reach restricted training or client data via a backdoor.  

According to the National Institute of Standards and Technology (NIST), there are two main forms of prompt injections: direct and indirect. Direct prompt injections come from a user prompting the AI, hoping to directly influence it. This can be done by cascading prompts about a subject to bypass a content filter or asking the model to give you instructions on something nefarious. Indirect prompts go a step further, often by using some piece of malicious content to cause the model to perform a task it wouldn’t otherwise. The most common example is sending an email to someone that has hidden instructions to perform a task—the recipient then puts this email into a summarization AI, and the AI completes the hidden task as part of its summary, unbeknownst to the recipient. 

Model Inversions 

Whenever a human element is involved, there will always be a need to safeguard data and intellectual property (IP). This becomes especially urgent with AI, given the wealth of knowledge that can be supplied in just seconds thanks to powerful new GPT models. This necessitates implementing plans to protect both users and the model they are interacting with. Model inversions attack these very needs and occur when a malicious actor, internal or external, gains access to the source code, or training data, of the model, allowing them to replicate it, poison it, or extract information from it.  

From the standpoint of an IP, model inversions are incredibly concerning. Model inversions can potentially wipe out the effort and money poured into research and development of an IP and in a matter of moments, render anything that made that model unique useless. Whether the cause is an internal actor with too many privileges, or an external one using a vulnerability or backdoor, they would now be able to create their own model in a fraction of the time and cost. As if that wasn’t concerning enough, this attack can also lead to … 

Data Exfiltration 

As with any web-based application, a lack of proper input validation and access control can lead to the threat of data exfiltration. Data exfiltration can be the end goal for several different attack types, and the consequences can be extremely harmful. In an age where data is everywhere, safeguarding it is the difference between being trusted and falling headfirst into a litany of compliance issues that can mean the loss of said trust, revenue, or even accreditation.  

The rise of AI usage brings those risks even more to the fore, as data that was previously locked away in web servers or applications, is now able to be queried against by anyone with access to the model, internal or external. As such, any information associated with a model needs to be securely protected from both threat avenues. This means having a very strong idea of the data that is being accessed, how securely it needs to be protected (such as PII/PHI), and then implementing according to measures to prevent actors from accessing and moving the data.  

What Does the Future Hold? 

What are we to do to protect against the risks that increased AI prevalence and usage will lead to? Well, even though goal posts may be shifting, to a degree, we already have the frameworks in place to mitigate these attacks. While there is a change in how, and where, these interactions are occurring, GPT models, at their core, remain web applications. Thus, the same security measures that protect standard web apps should apply to all deployed models. This includes measures such as performing input validation to the AI’s code. Doing so could erase many of the risk factors, such as external data poisoning, by ensuring that threat actors cannot create the backdoors needed to complete a poisoning attack or exfiltration of data. Additionally, running penetration tests and vulnerability scans will help to find potential holes before attackers do, hardening your model(s) even more.  

More pointedly, proper identity and access management will prevent inside attacks like model inversions and internal data exfiltration. There will also be a continued need to define your audience and know what data the model will need access to for RAG purposes. This would ensure that, in the case of a direct prompt injection or data exfiltration attack, sensitive or private data is cordoned off in areas that threat actors cannot possibly reach, thereby dampening the potential damage of the attack. For those peskier indirect prompt injections, Azure has released Prompt Shields, to strengthen prompt engineering, and mitigate the attack surface through the process of “Spotlighting” which determines if external inputs are untrustworthy.  

If you are interested in increasing the security of your AI model or solution or want to further secure your environment as AI prevalence increases, we invite you to reach out to us at Cloudforce to discover solutions that will help keep you protected.  

Sam Young
Author

Recommended for you.