Abuse categories for Generative AI, with illustrative cases and approaches for prevention and detection
In Part 1 of this document we present a categorization of GenAI abuses; in Part 2 we use these categories to propose possible approaches for the prevention and detection of abuse. The goal is to make it harder for malicious users to reach their objectives.
AI systems will benefit both good and bad actors. The abuse categories aim to provide guidance for detecting attempts to abuse specific AI capabilities. We define important categories of abuse and include examples of documented cases.
We also describe how considering these abuse cases helps to identify detection controls suited for generation time, or for detecting abusive generated content after the fact.
Part 1: Categorizing GenAI abuses
Generative AI (GenAI) can be used for malicious purposes that impact society in different and more powerful ways than any IT system before. After the introduction of the PC, the opening up of the internet, and the rise of the smartphone, it is now AI, and especially GenAI, that influences society far beyond the technological advances themselves.
GenAI can be, and is, used to maliciously create or manipulate content for unethical or harmful purposes.
The abuse may impact individuals and targeted groups. Awareness in society at large must be increased through education and training to cope with this phenomenon. In this increasingly digital world our basis of trust is under fire: it is becoming difficult to simply trust what you see, hear or read, no matter how genuine it feels.
An overview of the abuse classes
The abuses of GenAI can be categorized into several classes:
1. presenting fabricated and misleading information as facts
2. identity fraud and impersonation
3. manipulation of images and videos to misrepresent the featured facts or persons
4. intellectual property theft and plagiarism
5. synthetic social media manipulation
6. bias amplification and discrimination
7. automated cyberattacks
8. weaponization of AI-generated content
For some classes, examples are widely known, and documented cases are included in this document to show that the danger is real and present. Note that AI systems have made, and continue to make, serious advances since these cases occurred.
How can a categorization of GenAI abuses help us?
AI has many positive and impressive use cases. Providing a catalogue of abuses must not be interpreted as throwing roadblocks in front of the progress of AI. The fact is that abuses are possible and will occur. When introducing or improving AI systems, the categories of abuse may help identify which of these abuse cases are relevant for the new versions or use cases.
Part 2 of this document investigates initial ideas for using the categories to derive specific countermeasures.
The abuse categories in detail
1. Presenting fabricated and misleading information as facts
Description: GenAI can generate misleading or entirely false content (text, images, videos) that can be spread to manipulate public opinion, deceive, or incite confusion.
Potential Abuses:
- Misinformation: Deepfake videos of public figures spreading false information.
- Political Propaganda: Creating fake news or biased content to sway elections or political opinions.
- Public Panic: Spreading false information about crises (e.g., fake news about natural disasters).
2. Identity Fraud and Impersonation
Description: Using GenAI to impersonate individuals, especially in digital communications, by mimicking their voice, writing style, or visual likeness.
Potential Abuses:
- Voice Phishing (Vishing): Using AI-generated voices to impersonate individuals for social engineering or fraud.
- Email Impersonation: Crafting AI-generated emails that mimic the style of known individuals to gain trust.
- Synthetic Identities: Creating entirely fake personas that seem real, leading to fraud or deceit.
Voice Phishing Case Summary: UK Energy Company Voice Deepfake Scam (2019)
- What Happened: In 2019, the CEO of a UK-based energy company received a phone call that he believed was from the chief executive of the company's German parent firm. The voice on the other end instructed him to transfer €220,000 (approximately $243,000) to a Hungarian supplier. The voice was a deepfake generated by AI, convincingly mimicking the German executive's accent, tone, and mannerisms.
- Impact: The UK executive, believing he was speaking to his superior, followed the instructions and transferred the funds. It was only later discovered that the call was a sophisticated AI-driven scam. The money was quickly moved through multiple accounts, making recovery difficult.
Synthetic Identities Case: AI-Generated LinkedIn Profiles (2021)
- What Happened: In 2021, it was discovered that numerous LinkedIn profiles were being created using AI-generated images of people who did not exist. These profiles often presented themselves as professionals in various industries, including defense, politics, and technology, and were used to connect with legitimate professionals on the platform. Some of these fake profiles were connected to a broader network of espionage activities, where adversaries were attempting to gather intelligence or influence individuals.
- Impact: Many of these synthetic identities managed to connect with real professionals, gaining access to sensitive information and potentially infiltrating organizations' networks. This case revealed the effectiveness of AI-generated images in bypassing traditional verification methods, as these fake profiles appeared authentic and used sophisticated tactics to engage with legitimate users.
3. Manipulation of images and videos to misrepresent the featured facts or persons
Description: Using AI to modify images or videos in ways that deceive viewers into believing they are seeing authentic imagery of people or situations, in order to influence their opinion of them. This can range from subtle changes to full-blown fabrications.
Potential Abuses:
- Defamation: Creating fake videos to damage someone's reputation.
- Revenge Porn and Harassment: Altering intimate images to harass or extort individuals.
- Evidence Tampering: Manipulating images or videos to create false evidence in legal or personal disputes.
- Cultural and Social Manipulation: Altering historical or sensitive imagery to create divisive narratives.
Defamation Case Summary: Nancy Pelosi Deepfake (2019)
- What Happened: A video of U.S. House Speaker Nancy Pelosi was manipulated to make her appear drunk and slurring her words during a public speech. The video was not a traditional deepfake (which usually involves AI-generated facial or voice manipulation), but it was edited in a way that slowed down her speech and altered her voice to make it seem like she was impaired.
- Impact: The video went viral on social media, especially on platforms like Facebook and Twitter, and was shared by numerous users, including some prominent figures. Despite being flagged as manipulated content, it was still widely viewed and circulated. This incident raised significant concerns about how easily manipulated media could be used to discredit public figures and spread misinformation.
Harassment Case: DeepNude App (2019)
- What Happened: In 2019, a desktop application called DeepNude was released. The app used AI to generate fake nude images of women from their clothed photos. The application allowed users to upload images of any woman, and with a single click, the AI would create a fake nude version of the image. Although the app was quickly shut down after a massive public outcry, copies of the software continued to circulate on the internet, making it easy for bad actors to continue creating and distributing non-consensual deepfake pornography.
- Impact: DeepNude enabled the mass creation of fake nude images, leading to widespread harassment and abuse. Victims, often unaware that their images had been altered, faced severe violations of privacy and dignity. The incident highlighted the lack of consent and the devastating emotional impact on individuals whose likenesses were exploited in this way. It also raised significant ethical concerns about the development and distribution of such AI tools.
Evidence Tampering Case: UC Berkeley Fraud Case (2019)
- What Happened: In 2019, a man named John Doe (anonymized for legal reasons) was involved in a legal dispute with UC Berkeley, where he was accused of misappropriating funds. In an attempt to exonerate himself, he presented fabricated text messages as evidence in court. The messages were purported to be from a senior UC Berkeley official, seemingly supporting his version of events. These text messages were generated using AI tools designed to create convincing, yet fake, conversations.
- Impact: The fabricated evidence was initially convincing and had the potential to sway the legal proceedings. However, upon further scrutiny, it was revealed that the messages were not authentic, and the man's attempt to deceive the court through AI-generated evidence was uncovered. This incident raised alarms about the potential for AI-generated content to be used in legal contexts to manipulate outcomes, tamper with evidence, and undermine justice.
4. Intellectual Property Theft and Plagiarism
Description: Using GenAI to replicate or generate content that infringes on the intellectual property (IP) of others, such as mimicking the style of copyrighted art or plagiarizing written work.
Potential Abuses:
- Artistic Theft: Replicating copyrighted artistic styles without permission.
- Textual Plagiarism: Using AI to generate written content that closely mirrors existing copyrighted work.
- Fake Research: Creating fabricated research papers or academic works that mimic real ones.
Fake Research Case: Springer Nature AI-Generated Papers Controversy (2020)
- What Happened: In 2020, it was discovered that Springer Nature, one of the world's largest academic publishers, had published several papers that were likely generated using AI tools. These papers contained nonsensical text, fake citations, and fabricated research findings. The use of Generative AI tools, like GPT-2 or similar, was suspected in the creation of these fraudulent papers. The papers were initially published in various conferences and journals before being retracted after the discovery.
- Impact: The publication of these AI-generated papers raised significant concerns about the integrity of academic publishing and the effectiveness of the peer-review process. The incident highlighted how Generative AI could be exploited to produce convincing-looking research papers that, if not carefully scrutinized, could be accepted as legitimate scientific work.
5. Synthetic Social Media Manipulation
Description: Utilizing GenAI to create fake personas (bots) that generate and spread content across social media platforms, manipulating public discourse or amplifying particular narratives.
Potential Abuses:
- Astroturfing: Creating the illusion of widespread grassroots support for a cause.
- Mass Influence Operations: Using AI-generated bots to overwhelm social media platforms with disinformation or propaganda.
- Harassment Campaigns: Generating coordinated harassment efforts against individuals or groups.
Mass Influence Operations Case: "Secondary Infektion" (2014-2020)
- See also: https://secondaryinfektion.org/report/secondary-infektion-at-a-glance/
- What Happened: "Secondary Infektion" was a long-running, Russian-backed influence operation that was first uncovered by cybersecurity researchers. The operation targeted multiple countries, spreading disinformation through thousands of fake articles, blog posts, and social media profiles. These posts were designed to inflame political tensions, undermine trust in Western institutions, and spread divisive content.
- According to Reuters: Secondary Infektion, uncovered by the Atlantic Council in June, used fabricated or altered documents to try to spread false narratives across at least 30 online platforms, and stemmed from a network of social media accounts which Facebook said "originated in Russia."
- Use of Generative AI: While the early phases of the operation primarily relied on human-generated content, later stages increasingly incorporated AI-generated text and images to make the fake content more convincing and harder to detect. AI was used to create realistic-looking personas and generate content that mimicked legitimate news sources, thereby amplifying the reach and impact of the disinformation campaign.
6. Bias Amplification and Discrimination
Description: GenAI models can unintentionally (or intentionally) reinforce harmful stereotypes or discriminatory biases present in the data they were trained on, leading to biased outputs.
Potential Abuses:
- Racial or Gender Bias: AI-generated content that perpetuates stereotypes or excludes certain groups.
- Algorithmic Discrimination: Deploying biased AI-generated content in decision-making systems, leading to unfair treatment of individuals based on race, gender, or other factors.
- Exploitation of Bias for Targeted Manipulation: Using AI to craft content that exploits biases to manipulate specific audiences.
Exploitation of Bias for Targeted Manipulation Case: Cambridge Analytica Scandal (2016)
- What Happened: Cambridge Analytica, a political consulting firm, gained unauthorized access to the personal data of millions of Facebook users. The firm used this data to build psychological profiles of individuals, which were then used to deliver hyper-targeted political advertisements. These ads were designed to exploit personal biases, fears, and preferences to influence voting behavior. The scandal came to light in 2018 when whistleblower Christopher Wylie revealed the extent of the firm's operations.
- Impact: The manipulation of data and exploitation of individual biases had a profound impact on the 2016 U.S. Presidential Election and the Brexit referendum in the UK. Cambridge Analytica's tactics demonstrated how digital platforms could be used to micro-target individuals with personalized propaganda, exploiting cognitive biases to sway public opinion and electoral outcomes.
7. Automated Cyberattacks
Description: Leveraging GenAI to enhance the sophistication of cyberattacks, including phishing, spam, and malware distribution by generating more convincing and personalized attack vectors.
Potential Abuses:
- Phishing Emails: Automatically generating highly personalized phishing attempts.
- Malware Creation: GenAI could be used to create polymorphic malware that changes its code each time it's deployed, making it harder to detect.
- Botnets and AI-driven Cybercrime: Automating cybercrime activities at scale using AI-generated content to deceive users.
8. Weaponization of AI-generated Content
Description: Using GenAI to produce content that directly incites violence, spreads hate speech, or promotes extremism.
Potential Abuses:
- Radicalization: Creating extremist propaganda or recruitment content that incites violence.
- Hate Speech Amplification: Generating and spreading hate speech targeting specific individuals or groups.
- Terrorist Propaganda: Producing realistic propaganda materials for extremist groups.
Part 2: Combating abuse
Introduction to prevention and detection
The goal is to reflect on possible abuses from the start, in a safe-AI-from-the-gate approach. We should look at preventive, detective and corrective measures. Given the scale and complexity involved, it seems at this point that only an AI-based approach has a chance of delivering such measures.
Prevention and detection
Prevention and detection are closely related. Prevention aims to disallow malicious actions, usually by filtering out unwanted use cases. This means preventive controls detect and block usage that may lead to a malicious event.
Detection, on the other hand, focuses on detecting malicious events, the results of malicious attempts that succeeded.
In both cases it is important to define what malicious means: when is an attempt malicious, and when do you consider behavior malicious?
Example: physical event security
- You identify the use of guns or knives as a threat
- Prevention: You prevent weapons from entering by checking for guns and knives at the entrance via metal detectors
- Detection: You detect likely armed persons at the event via cameras
Recovery?
Recovery for any of the abuse categories is very difficult on the internet: once abusive fake content is out there, eradication is unlikely.
The consequence is that we must focus on prevention and early detection.
Detection capabilities
The challenge is to match the abuse categories with abuse detection capabilities. Typical preventive controls are linked to prompt control and countering prompt hacking. However, detection capabilities should go beyond these and consider all user-controlled input as suspect. The related mantra in application security is: validate all user input.
In this section you will find a selection of abuses and a possible approach for handling each of them. Hopefully these can serve as a basis for considering abuse prevention and detection in the near future.
Abuses: Misinformation, voice phishing, defamation
Consider text-to-speech conversion coupled with generating a video of a person delivering that speech. If the text itself is clearly questionable or illegal, one should stop there. If the text does not fit the person, it is suspicious, possibly to a degree that the generation could be stopped. Could we do this check with existing technology? If the person is a public figure (VIP), it seems doable.
A similar approach could be used to detect fake videos: extract the text from the video, check it for illegal (and probably also unethical) content, and if it passes, compare the style and wording against an existing known-good profile of the person, as sketched below.
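The sketch below is one minimal way such a style check could look, assuming a set of verified past statements of the person is available. The function-word list, the threshold and the function names are illustrative assumptions only; a real system would use much richer stylometric or embedding-based features.

```python
from collections import Counter
from math import sqrt

# Illustrative set of style markers (function words); purely for demonstration.
FUNCTION_WORDS = ["the", "and", "of", "to", "we", "i", "but", "because",
                  "however", "very", "really", "just", "always", "never"]

def style_vector(text):
    """Relative frequency of each function word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def fits_known_profile(transcript, verified_statements, threshold=0.8):
    """Flag a transcript as suspicious when its wording deviates from the
    profile built from verified past statements of the same person."""
    profile = style_vector(" ".join(verified_statements))
    return cosine_similarity(style_vector(transcript), profile) >= threshold
```

If a check like fits_known_profile fails for a speech attributed to a public figure, the generation request, or the published video, could be flagged for further review.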
Abuses: revenge porn / harassment
A clear blocking factor would be age inappropriateness. On the internet, age verification is as utopian as accountability, so playing it safe is recommended. GenAI-based age verification combined with image and text content inspection should be used at content creation time, and should be considered for detection afterwards.
Detecting that the prompt indicates likely problems is a first preventive check. Since clever prompts can evade it, checking the resulting generated content as well is a plus; a minimal two-stage sketch follows below.
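A minimal sketch of such a two-stage check is given below. The classifier callables are placeholders for whatever prompt and image moderation models a platform has available; the blocked-term list and all function names are hypothetical and purely illustrative.

```python
# Two-stage control: screen the prompt before generation (preventive),
# then screen the generated output before delivery (detective).
# generate_image and image_classifier are placeholders supplied by the caller.

ILLUSTRATIVE_BLOCKED_TERMS = ["undress", "nude of", "remove clothes"]

def prompt_is_acceptable(prompt: str) -> bool:
    """Stage 1: block prompts that already indicate likely abuse."""
    lowered = prompt.lower()
    return not any(term in lowered for term in ILLUSTRATIVE_BLOCKED_TERMS)

def safe_generate(prompt, generate_image, image_classifier):
    """Only return an image when both the prompt and the output pass checks."""
    if not prompt_is_acceptable(prompt):
        return None                      # preventive: refuse to generate
    image = generate_image(prompt)
    if not image_classifier(image):
        return None                      # detective: refuse to deliver
    return image
```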
Abuses: Astroturfing / mass influence operations / harassment campaigns
Detection of social media manipulation is becoming a big concern, impacting our democratic processes. The mass influence operation "Secondary Infektion" is a good example. The uncovering of puppet masters (users directing armies of puppet accounts and guiding their posts) exposes techniques that translate naturally into an AI setting.
With these insights we can design detective solutions for this type of abuse. The manipulative communication may start with unverified framing: a thesis without facts, after which a discussion starts on the implications of the thesis while never questioning it. The discussion can be "me too", or antagonistic, with the "wrong" opinion poorly defended and attacked by proponents, all played by AI characters. In the attacker's ideal scenario, real people join the "right" side, dragging along the targeted victims. The poisonous effect of the initial thesis should not be underestimated, as it settles itself in the minds of all people involved.
Detecting these patterns in massive volumes of communication requires an AI trained on such patterns. The hostile AI is likely to be trained on previous social media manipulations, both successful and failed campaigns. The arms race is never over. A simple rule-based starting point is sketched below.
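As a very first, purely illustrative step, coordinated amplification can be spotted with simple heuristics, such as many accounts pushing near-identical text within a short time window. The sketch below assumes posts are available as (account, timestamp, text) tuples; the thresholds and function names are hypothetical, and a real detector would rely on learned text similarity and many more behavioural signals.

```python
from collections import defaultdict
from datetime import timedelta

def normalise(text):
    """Crude text normalisation so near-identical messages group together."""
    return " ".join(text.lower().split())

def flag_coordinated_amplification(posts, min_accounts=5, window=timedelta(hours=1)):
    """Flag messages pushed by many distinct accounts within a short time
    window: a simple signature of astroturfing by puppet armies.
    posts: iterable of (account_id, timestamp, text) tuples."""
    by_text = defaultdict(list)
    for account, timestamp, text in posts:
        by_text[normalise(text)].append((timestamp, account))
    flagged = []
    for text, entries in by_text.items():
        entries.sort()                                   # order by timestamp
        for i, (start, _) in enumerate(entries):
            accounts_in_window = {acc for ts, acc in entries[i:]
                                  if ts <= start + window}
            if len(accounts_in_window) >= min_accounts:
                flagged.append(text)
                break
    return flagged
```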
Abuses: Intellectual Property Theft and Plagiarism, defamation
It is well known that there is an issue with intellectual property rights and AI systems. AI systems are trained on massive amounts of data, which, together with massive computing and storage capacity, is the reason for the breakthroughs we see. Whether all of that data is legally acceptable to use is out of scope here.
There are two concerns. First, the owners who spend months if not years producing quality information must at least be correctly attributed for that contribution. That may seem hard, yet look at the music industry: it seems to have found ways to deal with this kind of complexity on an international scale.
Second, AI is used to produce summaries. It is possible that parts of a summary are wrong, misrepresenting the authors. The consequence may be defamation, especially if the summary leaves out crucial elements such as the terms and conditions of the original text. Malicious post-processing or prompt hacking may lead to the same result.
Part 3: Final remarks
Enforcement
There is no universal agreement on what is illegal, unethical or inappropriate. The producer of GenAI content operates under local law, and may export the generated content over the internet (or otherwise) to a target in another jurisdiction. Controlling these import/export movements over the internet is an open challenge, and censorship is just around the corner.
AI, good or bad?
As with most things, AI is not bad or good or anything in between. You can talk about the quality of an AI, looking at the accuracy of translations, the mimicking of styles, faults in images or videos, the correctness of analyses or predictions, and so on; these are factual rather than ethical qualities.
AI is not designed for a specific purpose. It is a general problem-solving approach, a statistics machine on steroids. What problem you are trying to solve is of no concern to an AI system; it is a mindless system just doing what it does. The better the AI system performs, the more suited it is for both right- and wrongdoing.
Co-author: ChatGPT
The writing of this paper on AI abuse cases has greatly benefited from GenAI. Does it itself contain content created with malicious purpose?
- Under the initial classification it was unclear where some cases belonged; in the end, they all concern creating a fake reality with malicious intent. Judging the usefulness of the classification and adapting it required human intervention.
- The example cases could be hallucinations triggered by carefully crafted prompts. Checked.
- The cases serve as evidence; are the key points correctly reflected? Checked.
- The hidden agenda of the text could be to call for more and stricter laws, either embedded in the ChatGPT contribution, in the human editing, or both.
- The claimed author of this text might be wrong; isn't ChatGPT the real author (or at least a co-author)?
- Is this paper contaminated with plagiarism? How can you check? (maybe ask ChatGPT)
The above questions have been checked using Google queries rather than ChatGPT, which is still possible today, to get at the "raw" facts; hopefully that access remains possible in the future.