Hacker News

Claude tends to disregard "NEVER do X" instructions quite often, but funnily enough, if you tell it "Always ask me to confirm before doing X", it never fails to ask. And you can deny it every time.
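That confirmation pattern can also be enforced in the harness itself rather than trusted to the model. A minimal sketch (not any real harness's API; the tool names and `run_tool` function are hypothetical): risky tool calls are gated behind an explicit human yes/no, so a denial is a hard stop in code, not a prompt the model might ignore.

```python
# Hypothetical harness-level gate: the model can request any tool,
# but calls on the DANGEROUS list pause for human confirmation.
DANGEROUS = {"delete_file", "run_shell"}  # hypothetical tool names

def run_tool(name, args, ask=lambda msg: input(f"{msg} [y/N] ").strip().lower() == "y"):
    """Execute a model-requested tool call, asking the user first if it's risky."""
    if name in DANGEROUS and not ask(f"Model wants to call {name}({args}). Allow?"):
        return {"status": "denied"}  # user said no: enforced in code, not in the prompt
    return {"status": "ok", "result": f"ran {name}"}  # stub execution for illustration
```

The `ask` callback is injectable so the same gate works in a CLI (via `input`) or a GUI, and so the deny path is testable without a human.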



If it disregards "NEVER do" instructions, why would it honor your denial when it asks?

You mean like in this example? https://web.archive.org/web/20260313042512/https://gist.gith...

There is never a guarantee with GenAI. If you need to be sure, sandbox it.


There are plenty of examples in the RL training data showing it how and when to prompt the human for help or additional information. Prompting the user is even a common tool in the "plan" mode of many harnesses.

Conversely, it's much harder to represent the absence of an action in training data.


Because it’s just fancy auto-complete.



