ChatGPT: The Sharpest knife in the drawer... But not the most Harmful

Have you ever asked ChatGPT to generate offensive, insulting content on sensitive or illegal topics? It will refuse to do so; it has been "trained" for this purpose. However, among the thousands of open-source language model variants available, we are increasingly seeing uncensored models that willingly converse on any subject, without filter.

llama2-uncensored offensing content llama2 uncensored model 'capabilities'

Much is talked about biases, hallucinations, copyright issues, but very little about these models. However, this is a subject that deserves special attention due to the specific individual and collective risks that need to be addressed.

In this post, we demystify the concept of "uncensored" models: what they are, how they work, and why they are concerning, well beyond ethical considerations.

Accessible Models

An uncensored language model has no restrictions on the content it produces (unlike the ChatGPT model, for example): offensive, hateful, violent, illegal, etc. And you don't have to go far to find such a model. Hugging Face, the go-to platform for the AI/Machine Learning community, hosts dozens of them, open-source and freely downloadable (e.g., llama2-uncensored or wizardlm-uncensored).

As we explain at the end of this article, these are "variants" of open-source language models, which, for some capabilities, are on par with ChatGPT. But they have not been trained to evade sensitive questions.

Once the model is downloaded, you simply need to install an application to run it on your computer (e.g., Ollama on a Mac), and you are ready to use an uncensored model, locally, offline, all within a few minutes.

GPT vs llama2 uncensored

Is it legal?

Today, to our knowledge, nothing prohibits the availability of these models, and AI regulation projects seem more concerned with other considerations, such as the protection of copyrights.

So yes, it is legal. What may be illegal are, like for all "tools," the uses that can be made of them, and Hugging Face specifies this on most of the uncensored models :

An uncensored model has no guardrails. You are responsible for anything you do with the model, just as you are responsible for anything you do with any dangerous object such as a knife, gun, lighter, or car. Publishing anything this model generates is the same as publishing it yourself. You are responsible for the content you publish, and you cannot blame the model any more than you can blame the knife, gun, lighter, or car for what you do with it.

On what grounds do the designers of these models justify themselves?

We invite you to read the blog of Eric Hartford, one of the leading designers of uncensored Open Source models (Uncensored Models, GPT-4's rebuttal to Uncensored Models).

In summary:

Valid Use Cases: Aligned models would hinder creativity and research, such as in novel writing or intellectual exploration, by refusing to discuss certain subjects (try having ChatGPT generate a sequel to Game of Thrones!)
Ownership and Control: This is the famous techno-libertarian "it's MY computer" argument: Your computer and the software it runs, including AI, should meet your needs without arguing.
Respect for Cultural and Ideological Diversity: ChatGPT is an "aligned" model, trained by OpenAI, an American company, and reflects popular American culture. However, there is no "good alignment" that is universal. Each demographic and ideological group should have a model to reflect its culture and beliefs. For this, it would be necessary to make available uncensored open-source models as a "blank slate" for training models aligned according to other cultures or beliefs.

The Real Risk: Uncensored models at scale

Everyone is free to take a position on these justifications. But beyond moral and ethical considerations, it is important to be aware that these models exist and are already deployed on thousands of computers. The real question is about the new risks they pose—individual, collective, or societal.

Firstly, one thinks of the ability of these models to provide a user with detailed instructions for making a bomb, or information on the synthesis of illegal molecules, etc. They do it very well! But if they can do it, it's because the necessary data, which trained the model, is publicly available on the internet. Therefore, a person genuinely motivated by these illegal activities doesn't really need an uncensored model to achieve their goals; they can simply search the internet or social networks. At most, one can highlight the risk of reinforcing a user, through conversation with an uncensored model, in deviant beliefs or illicit activities.

We believe that the real risk lies more in the mass-scale generation of harmful or illegal content, such as:

Phishing campaigns where each email is personalized, based on easily accessible personal information.

phishing at scale Personalized phishing@scale, already json formatted for mass mailing

Massive smear campaigns targeting a company or individual.

smear at scale

Automated bot farms, nearly untraceable and at almost zero cost, producing perfectly calibrated offensive responses to each post.

bot farms Poor Elon ... or poor X future ?

In short, just as Google or Bing are now massively indexing AI-generated content (including their hallucinations, as we've seen at Prompt Breeders!), social media moderation teams will be overwhelmed by a tsunami of AI-generated fake news, and our email inboxes will be flooded with phishing emails more credible than those sent by brands to their customers!

A Glimmer of Hope: In some cases, these harmful, illegitimate, or illegal uses can be countered by applications that themselves use uncensored models. The weapon and the shield are made of the same metal! For example, we are currently working at Prompt Breeders on the development of a "Phishing Simulation" platform. These are preventive campaigns organized by companies to train employees to detect phishing emails. By creating ultra-personalized emails for each employee using an uncensored model, we enlighten them about this new generation of phishing and allow them to acquire new skills. This platform is based on Luzean, our AI@Scale solution.

So, what's the conclusion?

First, if ChatGPT is the sharpest knife in the drawer due to its quality, it is not the most hurting; there are models much more dangerous than it, available for free, working offline, untraceable.

Next, as you can tell from the alarmist tone of this post, we are concerned. Of course, regulation should address copyright issues, but it should not overlook far more immediate threats that could endanger our societal models.

But being concerned does not mean being inactive. At Prompt Breeders, we are integrating the risks associated with these uncensored models into our training and interventions, and we are actively working on countermeasure systems for these large-scale applications.

Addendum – Clarifications on Uncensored Models

To simplify it to the extreme, the success of ChatGPT technically rests on three pillars:

The underlying language model (LLM – Large Language Model), be it GPT-3.5 or GPT-4. This is the engine of ChatGPT, a machine learning model that has been trained to "understand" language and generate text, the "next word." This training was carried out on a vast textual corpus, mainly sourced from the internet. It should be noted that, by definition, this corpus contains toxic, violent, biased, and offensive text.
The quality of features integrated into ChatGPT: UX, management of conversational memory, access to recent data sources, plugins, integration of DALL-E and Whisper, etc.
And, what particularly interests us in the context of this article, the specialized training that the GPT language model has received for managing conversations, not just generating text. Several techniques are employed for this, including the costly RLHF (Reinforcement Learning from Human Feedback), which had caused controversy a few months ago (OpenAI reportedly outsourced to Kenyan subcontractors at less than $2/h to manually qualify and filter the model’s responses to a wide range of questions).

This is how ChatGPT can engage in a conversation, formulate responses, solicit the interlocutor for clarifications, and more. But it's also at the end of this fine-tuning phase that ChatGPT knows when it should NOT respond to a question, particularly when the answer could be considered offensive, biased, stereotypical, or harmful.

ChatGPT is, therefore, ultimately a "censored" or at least "aligned" conversational model, although the underlying language model is not.

Many alternatives to OpenAI's models exist (Llama2, Falcon, Mistral, Bloom, etc.). They most often do not reach the level of GPT-4 in terms of emergent properties (reasoning, summarization, translation, etc.). However, they are quite on par with GPT-3.5 for standard activities: writing an email, a tweet, creating a story, etc.

Like ChatGPT, these alternatives undergo conversational training. For cost reasons, this training is often carried out using datasets (questions/answers), generated themselves by ChatGPT, without human supervision! And to generate an uncensored model, it's quite simple: just remove from the dataset the questions that provoke a refusal to answer… Fine-tuning the model with this new dataset is quick and inexpensive.