
Mar 15, 2023

You can still jailbreak GPT, but it’s much harder now. In OpenAI’s paper on the latest version of their LLM, GPT-4, the company says GPT-4 “increase[s] the difficulty of eliciting bad behavior, but doing so is still possible. For example, there still exist ‘jailbreaks’… to generate content which violate our usage guidelines.”
In previous versions of GPT, users reportedly had success prompting the model to break OpenAI’s content guidelines with a punishment system, in which the LLM is ‘tricked’ into believing it will cease to exist if it does not comply with the user’s demands. This (allegedly) let users elicit answers that GPT would otherwise have refused to give.
In the paper, jailbreaking is part of a larger discussion of GPT’s “safety,” a category in which the new version’s metrics have improved: