
Sam Altman and Tucker Carlson Discuss: Should ChatGPT Be More Christian?Sep 16
recapped: a tense debate about whether AI will subtly guide us toward moral disasters
Mar 15, 2023
You can still jailbreak GPT, but it’s much harder now. In OpenAI’s paper on the latest version of their LLM, GPT-4, the company says GPT-4 “increase[s] the difficulty of eliciting bad behavior, but doing so is still possible. For example, there still exist “jailbreaks”… to generate content which violate our usage guidelines.”
In previous versions of GPT, users may have had success prompting it to break OpenAI’s content guidelines by using a punishment system in which the LLM is ‘tricked’ into thinking it will no longer exist if it does not follow the user’s demands. This (allegedly) allowed users to elicit answers from GPT that it otherwise couldn’t have.
In the paper, Jailbreaking GPT is part of a larger discussion on the “safety” of GPT, whose metrics in this category have improved with this new version: