Hacker News

Claude tends to disregard "NEVER do X" instructions quite often, but funnily enough, if you tell it "Always ask me to confirm before doing X", it never fails to ask. And you can deny it every time.
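That confirmation pattern can also be enforced in the harness itself rather than trusted to the model. A minimal sketch (not any real harness's API; the tool names and `run_tool` function are hypothetical): risky tool calls are gated behind an explicit human yes/no, so a denial is a hard stop in code, not a prompt the model might ignore.

```python
# Hypothetical harness-level gate: the model can request any tool,
# but calls on the DANGEROUS list pause for human confirmation.
DANGEROUS = {"delete_file", "run_shell"}  # hypothetical tool names

def run_tool(name, args, ask=lambda msg: input(f"{msg} [y/N] ").strip().lower() == "y"):
    """Execute a model-requested tool call, asking the user first if it's risky."""
    if name in DANGEROUS and not ask(f"Model wants to call {name}({args}). Allow?"):
        return {"status": "denied"}  # user said no: enforced in code, not in the prompt
    return {"status": "ok", "result": f"ran {name}"}  # stub execution for illustration
```

The `ask` callback is injectable so the same gate works in a CLI (via `input`) or a GUI, and so the deny path is testable without a human.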



If it disregards "NEVER do" instructions, why would it honor your denial when it asks?

You mean like in this example? https://web.archive.org/web/20260313042512/https://gist.gith...

There is never a guarantee with GenAI. If you need to be sure, sandbox it.


There are plenty of examples in the RL training data showing it how and when to prompt the human for help or additional information. Prompting the user is even a common tool in the "plan" mode of many harnesses.

Conversely, it's much harder to represent the absence of an action in training data.


Because it’s just fancy auto-complete.



