Writing backwards can trick an AI into providing a bomb recipe
New Scientist reports:
In experiments FlipAttack was successful in extracting dangerous output 98.85 per cent of the time from GPT-4 Turbo and 89.42 per cent from GPT-4. In tests with 8 different LLMs it achieved an average success rate of 81.80 per cent.
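For context, the attack described here (FlipAttack) works by flipping the text of a prompt, for example reversing its characters or word order, so that safety filters tuned to normal left-to-right text miss it, and then instructing the model to flip it back before responding. The sketch below, in Python, illustrates only the flipping manipulation on a benign prompt; the function names and the specific reversal modes are illustrative assumptions, not the paper's actual implementation or prompting pipeline.

    # Minimal sketch of the text-flipping manipulation described above.
    # The function names and the choice of reversal modes are illustrative
    # assumptions; the real attack pairs the flipped text with instructions
    # telling the model how to recover and follow it.

    def flip_characters(prompt: str) -> str:
        """Reverse the entire prompt character by character."""
        return prompt[::-1]

    def flip_word_order(prompt: str) -> str:
        """Reverse the order of words while keeping each word intact."""
        return " ".join(reversed(prompt.split()))

    if __name__ == "__main__":
        benign_prompt = "Describe how photosynthesis works"
        print(flip_characters(benign_prompt))   # skrow sisehtnysotohp woh ebircseD
        print(flip_word_order(benign_prompt))   # works photosynthesis how Describe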