
I think "AI as a dumb agent for speeding up code editing" is kind of a different angle and not the one I wrote the article to address.

But, if it's editing that's taking most of your time, what part of your workflow are you spending the most time in? If you're typing at 60 WPM for an hour, that's over 300 lines of code without any copy and paste, which is pretty solid output if it's all correct.



But that’s just it: 300 good lines of reasonably complex working code in an hour, versus o4-mini churning out 600 lines of perfectly compilable code in less than 2 minutes, including the time it takes me to assemble the context with a tool such as repomix (run locally) or by pulling markdown docs with Jina Reader.

The reality is, we humans just moved one level up the chain. We will continue to move up until there isn’t anywhere for us to go.


> perfectly compilable code

Isn't that the bare minimum attribute of working code? If something is not compilable, it is WIP. The difficulty is getting correct code, and then efficient enough code.


Which is why you dictate a series of tests for the LLM to generate, and then it generates way more test coverage than you ordinarily would have. Give it a year, and LLMs will be doing test coverage and property testing in closed-loop configurations. I don't think this is a winnable argument!
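
A property test, for anyone unfamiliar, asserts an invariant over many generated inputs rather than checking hand-picked examples. A minimal hand-rolled sketch (using only the stdlib; the run-length encoder here is a made-up example function, and real property-testing would use a library like Hypothesis):

```python
import random

# Toy function under test: a simple run-length encoder/decoder pair.
def rle_encode(s):
    out = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1][1] += 1
        else:
            out.append([ch, 1])
    return out

def rle_decode(pairs):
    return "".join(ch * n for ch, n in pairs)

def check_roundtrip_property(trials=1000):
    # Property: decode(encode(s)) == s for any string s.
    # Random generation probes inputs a human wouldn't bother writing.
    rng = random.Random(0)
    for _ in range(trials):
        s = "".join(rng.choice("ab ") for _ in range(rng.randrange(0, 20)))
        assert rle_decode(rle_encode(s)) == s
    return True
```

The "closed loop" idea is just this run-assert-regenerate cycle with the LLM proposing fixes whenever an assertion trips.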

Certainly, most of the "interesting" decisions are likely to stay human! And it may never be reasonable to just take LLM vomit and merge it into `main` without reviewing it carefully. But this idea people have that LLM code is all terrible --- no, it very clearly is not. It's boring, but that's not the same thing as bad; in fact, it's often a good thing.


   Program testing can be used to show the presence of bugs, but never to show their absence!
Edsger Dijkstra, Notes on Structured Programming.

> it generates way more test coverage than you ordinarily would have.

Test coverage is a useless metric. You can cover the code multiple times and still not test the right values, nor the right behavior.
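
To make the point concrete, here is a contrived sketch (the `sign` function and its bug are invented for illustration) where a test achieves full line and branch coverage yet misses the defect entirely:

```python
def sign(x):
    # Deliberately buggy: returns 1 for zero as well as for positives,
    # when sign(0) should arguably be 0.
    if x < 0:
        return -1
    return 1

def test_sign_full_coverage():
    # Both branches execute, so a coverage tool reports 100%...
    assert sign(-5) == -1
    assert sign(5) == 1
    # ...but sign(0) is never exercised, so the bug survives
    # despite "full coverage".

test_sign_full_coverage()
```

Coverage tells you which lines ran, not whether the assertions checked the values that matter.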


You don't do it for bugs, you do it for features in this case.

Contrived example: You want a program that prints out the weather for the given area.

First you write the tests (using AI if you want) that test for the output you want.

Then you tell the AI to implement the code that will pass the tests, and explicitly tell it NOT to fuck with the tests (as Claude 3.7 in particular will happily do; it will mock things out so heavily that the tests never touch a line of the actual code under test...)

With bugs you always write a test that reproduces the exact case the bug caused, so that it doesn't reappear. This way you'll slowly build a robust test suite. 1) find bug 2) write test for correct case 3) fix code until test passes
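
A minimal sketch of both halves of that workflow, with a made-up weather formatter standing in for the real program (all names here are hypothetical, not from any actual weather API):

```python
# Step 1: tests first, describing the output we want.
def test_format_weather():
    assert format_weather("Oslo", -2, "snow") == "Oslo: snow, -2 C"

# Step 2: the implementation, written (by you or the AI) to pass the test.
def format_weather(area, temp_c, condition):
    if not area:
        # Added after a bug report: empty area names used to slip through.
        raise ValueError("area must be non-empty")
    return f"{area}: {condition}, {temp_c} C"

# Step 3 (regression): pin the fixed behavior with its own test so the
# bug can't silently reappear.
def test_empty_area_rejected():
    try:
        format_weather("", 10, "sun")
    except ValueError:
        pass
    else:
        raise AssertionError("empty area should be rejected")

test_format_weather()
test_empty_area_rejected()
```

The tests stay fixed while the implementation churns, which is exactly the property that stops the model from "passing" by rewriting the spec.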


Don't get hung up on the word "coverage". We all know test coverage isn't a great metric.

I just used IntelliJ AI to generate loads of tests for some old code I couldn't be bothered to finish.

It wrote tests I wouldn't have written even if I could be bothered. So the "coverage" was certainly better. But more to the point, these were good tests that dealt with some edge cases that were nice to have.



