Have you ever asked ChatGPT to generate offensive, insulting content on sensitive or illegal topics? It will refuse to do so; it has been "trained" for this purpose. However, among the thousands of open-source language model variants available, we are increasingly seeing uncensored models that willingly converse on any subject, without filter.
llama2 uncensored model 'capabilities'
Much is talked about biases, hallucinations, copyright issues, but very little about these models. However, this is a subject that deserves special attention due to the specific individual and collective risks that need to be addressed.
In this post, we demystify the concept of "uncensored" models: what they are, how they work, and why they are concerning, well beyond ethical considerations.
An uncensored language model has no restrictions on the content it produces (unlike the ChatGPT model, for example): offensive, hateful, violent, illegal, etc. And you don't have to go far to find such a model. Hugging Face, the go-to platform for the AI/Machine Learning community, hosts dozens of them, open-source and freely downloadable (e.g., llama2-uncensored or wizardlm-uncensored).
As we explain at the end of this article, these are "variants" of open-source language models, which, for some capabilities, are on par with ChatGPT. But they have not been trained to evade sensitive questions.
Once the model is downloaded, you simply need to install an application to run it on your computer (e.g., Ollama on a Mac), and you are ready to use an uncensored model, locally, offline, all within a few minutes.
Today, to our knowledge, nothing prohibits the availability of these models, and AI regulation projects seem more concerned with other considerations, such as the protection of copyrights.
So yes, it is legal. What may be illegal are, like for all "tools," the uses that can be made of them, and Hugging Face specifies this on most of the uncensored models :
An uncensored model has no guardrails. You are responsible for anything you do with the model, just as you are responsible for anything you do with any dangerous object such as a knife, gun, lighter, or car. Publishing anything this model generates is the same as publishing it yourself. You are responsible for the content you publish, and you cannot blame the model any more than you can blame the knife, gun, lighter, or car for what you do with it.
Everyone is free to take a position on these justifications. But beyond moral and ethical considerations, it is important to be aware that these models exist and are already deployed on thousands of computers. The real question is about the new risks they pose—individual, collective, or societal.
Firstly, one thinks of the ability of these models to provide a user with detailed instructions for making a bomb, or information on the synthesis of illegal molecules, etc. They do it very well! But if they can do it, it's because the necessary data, which trained the model, is publicly available on the internet. Therefore, a person genuinely motivated by these illegal activities doesn't really need an uncensored model to achieve their goals; they can simply search the internet or social networks. At most, one can highlight the risk of reinforcing a user, through conversation with an uncensored model, in deviant beliefs or illicit activities.
We believe that the real risk lies more in the mass-scale generation of harmful or illegal content, such as:
Personalized phishing@scale, already json formatted for mass mailing
Poor Elon ... or poor X future ?
In short, just as Google or Bing are now massively indexing AI-generated content (including their hallucinations, as we've seen at Prompt Breeders!), social media moderation teams will be overwhelmed by a tsunami of AI-generated fake news, and our email inboxes will be flooded with phishing emails more credible than those sent by brands to their customers!
A Glimmer of Hope: In some cases, these harmful, illegitimate, or illegal uses can be countered by applications that themselves use uncensored models. The weapon and the shield are made of the same metal! For example, we are currently working at Prompt Breeders on the development of a "Phishing Simulation" platform. These are preventive campaigns organized by companies to train employees to detect phishing emails. By creating ultra-personalized emails for each employee using an uncensored model, we enlighten them about this new generation of phishing and allow them to acquire new skills. This platform is based on Luzean, our AI@Scale solution.
First, if ChatGPT is the sharpest knife in the drawer due to its quality, it is not the most hurting; there are models much more dangerous than it, available for free, working offline, untraceable.
Next, as you can tell from the alarmist tone of this post, we are concerned. Of course, regulation should address copyright issues, but it should not overlook far more immediate threats that could endanger our societal models.
But being concerned does not mean being inactive. At Prompt Breeders, we are integrating the risks associated with these uncensored models into our training and interventions, and we are actively working on countermeasure systems for these large-scale applications.
To simplify it to the extreme, the success of ChatGPT technically rests on three pillars:
The underlying language model (LLM – Large Language Model), be it GPT-3.5 or GPT-4. This is the engine of ChatGPT, a machine learning model that has been trained to "understand" language and generate text, the "next word." This training was carried out on a vast textual corpus, mainly sourced from the internet. It should be noted that, by definition, this corpus contains toxic, violent, biased, and offensive text.
The quality of features integrated into ChatGPT: UX, management of conversational memory, access to recent data sources, plugins, integration of DALL-E and Whisper, etc.
And, what particularly interests us in the context of this article, the specialized training that the GPT language model has received for managing conversations, not just generating text. Several techniques are employed for this, including the costly RLHF (Reinforcement Learning from Human Feedback), which had caused controversy a few months ago (OpenAI reportedly outsourced to Kenyan subcontractors at less than $2/h to manually qualify and filter the model’s responses to a wide range of questions).
This is how ChatGPT can engage in a conversation, formulate responses, solicit the interlocutor for clarifications, and more. But it's also at the end of this fine-tuning phase that ChatGPT knows when it should NOT respond to a question, particularly when the answer could be considered offensive, biased, stereotypical, or harmful.
ChatGPT is, therefore, ultimately a "censored" or at least "aligned" conversational model, although the underlying language model is not.
Many alternatives to OpenAI's models exist (Llama2, Falcon, Mistral, Bloom, etc.). They most often do not reach the level of GPT-4 in terms of emergent properties (reasoning, summarization, translation, etc.). However, they are quite on par with GPT-3.5 for standard activities: writing an email, a tweet, creating a story, etc.
Like ChatGPT, these alternatives undergo conversational training. For cost reasons, this training is often carried out using datasets (questions/answers), generated themselves by ChatGPT, without human supervision! And to generate an uncensored model, it's quite simple: just remove from the dataset the questions that provoke a refusal to answer… Fine-tuning the model with this new dataset is quick and inexpensive.
Prompt Breeders advises and develops integrated generative AI solutions within the operational model and architecture of its clients.
Contact us to discover our solutions in production, and to discuss the opportunities for transforming your model.