Humans may be prone to err, but they don't confabulate the way LLMs do. Also, unit tests are written by people who know the expected behavior of the code intimately and who, surprisingly, are frequently the same programmer who wrote it.
This can be abused because the programmer is both judge and jury, but people tend to handle this conflict of interest much better than LLMs do.