I routinely call out people for writing in an LLM-assisted fashion that clearly shows they have just been "vibe commenting". You know: paste the thread in, copy the output back without even thinking. The people who, for some insane reason, think they are having a genuine conversation with their copy-pasting skills and a $20/mo subscription, as if they were the archive.whatever of the AI era. Those comments are objectively terrible and contribute little: all the consultant sycophant-speak and distracting prose that falls out of the default prompt and RLHF.
But that's really what you're now enforcing: writing in an easily detectable LLM prose style and voice. LLM detection is very difficult, especially on texts as short as comments. There is never proof, only telltale phrases. How will this be enforced? What the heck even is "AI"?
The thing that really frustrates me is that I can't put tokens through a transformer in any way while editing my post? I can't have an LLM turn a bare link after a sentence into a [1]? I can't have an LLM do literally nothing more than spell-check, even though a rule-based model doing the same job would be fine? And what about other LLMs, or SLMs, or classic NLP chained together? Or is it just the transformer that's forbidden?
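To make that concrete: the bare-link-to-[1] edit needs no transformer at all. A minimal rule-based sketch in Python (the function name and regex are mine, purely illustrative):

    import re

    def footnote_bare_links(text: str) -> str:
        # Replace each bare URL with a [n] marker, collect the URLs,
        # and append them as a numbered footnote list. No model involved.
        urls = []

        def replace(match):
            urls.append(match.group(0))
            return "[%d]" % len(urls)

        body = re.sub(r"https?://\S+", replace, text)
        notes = "\n".join("[%d] %s" % (i, u) for i, u in enumerate(urls, 1))
        return body + "\n\n" + notes if urls else body

    print(footnote_bare_links("Benchmarks here: https://example.com/results"))
    # Benchmarks here: [1]
    #
    # [1] https://example.com/results

An LLM doing exactly this edit produces the same output as the regex; the rule as written bans one and not the other.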
And it is officially sanctioned that people ought to be keeping in the back of their mind "does this feel LLM-ish?" instead of "is this a good comment that contributes to the discussion?" Maybe LLM prose is so annoying and insufferably sycophantic that even if all the content and logic were sound, it should still be moderated out entirely. But is the whole technological form profane and unclean?
I am 100% not interested in participating in a community that seeks to profile and police the technological infrastructure that its members use. I want my comments judged by the contributions they make and do not make to the discussion. If the LLM makes the comment better, it is good. If it makes it worse, it is bad.
Definitely agree. Look at the comments posted in places like Slashdot: it is basically ruined forever (and at one time it was quite excellent for real comments, from real experts and experienced people).
>But that's really what you're now enforcing: writing in an easily detectable LLM prose style and voice.
That's a good start already. Don't let the impossibility of the perfect prevent implementing the good.
>I want my comments judged by the contributions they make and do not make to the discussion. If the LLM makes the comment better, it is good. If it makes it worse, it is bad.
Nope, it's all bad. If I wanted the comments of an LLM, I'd ask an LLM.
>I am 100% not interested in participating in a community that seeks to profile and police the technological infrastructure that its members use.
>I want my comments judged by the contributions they make and do not make to the discussion
There used to be a sort of gentleman's agreement that I could spare the time to read and judge your comment because you went through the effort of writing it.
I think a more generous interpretation of dang's comment is that it's fine to use LLMs / tools to fix grammar and spelling, but a heavier pass that alters the prose, wording and tone (even mildly) can create a 'slop ambience' over time: death by a thousand paper cuts.
There's a gradient here for sure, but it's becoming clear that people using LLMs "only" for grammar and spelling fixes are underestimating how much else the LLMs are doing.
"Slop ambience" sure sounds to me like HN banning a prose style. I just think that if this is how the rule will be enforced, that is how it should be written.
HN already does a decent amount of content policing, which helps keep the discussion quality higher. I don't see a huge departure here from the usual moderation.
How can one be sure the LLM is modifying just the prose style? Moreover, prose style is one of the signals that conveys information about what you are trying to transmit (unlike code, where it's debatable whether the style should carry meaning on its own).
> ollama benchmark ... for now, it's purely CPU, with DeepSeek R1 models tested based on the RAM available.
Then the results aren't comparable across boards with different RAM sizes. It'd be better to test the same set of model sizes on all of them and report "did not fit" where a model doesn't fit. And could you report the full ollama model name and version/size slug for each?
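A sketch of what I mean, in Python (the model tags are illustrative, and I'm assuming ollama prints its --verbose timing stats on stderr; adjust to taste):

    import subprocess

    # Same fixed model set on every board, so results stay comparable.
    MODELS = ["deepseek-r1:1.5b", "deepseek-r1:8b", "deepseek-r1:14b"]
    PROMPT = "Why is the sky blue?"

    for model in MODELS:
        result = subprocess.run(
            ["ollama", "run", model, PROMPT, "--verbose"],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            # Out of memory (or any other failure) is still a data point:
            # report "did not fit" instead of silently skipping the model.
            print(model, "DID NOT FIT / FAILED")
            continue
        for line in result.stderr.splitlines():
            if "eval rate" in line:
                # e.g. "eval rate: 12.34 tokens/s"
                print(model, line.strip())

That way every board reports the same rows, with "did not fit" wherever the RAM ran out.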
> I pull Jeff's fork of the ollama-benchmark software
Hmm, I'm not sure if I'm missing something, but that first point is what I'm doing. I have three different sized DeepSeek R1 models (1.5, 8, 16); they run on each board that can handle them, and then the data is reported.
For the second, the file I initially grabbed was https://github.com/geerlingguy/ai-benchmarks/blob/main/obenc... - which I now notice wasn't modified in his repository, so I can check that out. Either way, the same version has been tested across everything so far.
> Throughout this series, “we” refers to maderix (human) and Claude Opus 4.6 (by Anthropic) working as a pair. The reverse engineering, benchmarking, and training code were developed collaboratively
Sure, "collaboratively." Why would I ever trust a vibe coded analysis? How do I, a non expert in this niche, know that Opus isn't pulling a fast one on both of us? LLMs write convincing bullshit that even fools experts. Have you manually verified each fact in this piece? I doubt it. Thanks for the disclaimer, it saved me from having to read it.
Actually… no. Now that you mention it, and thanks for the interesting thought, the failure modes seem pretty similar to me.
Shoddy research / hallucination, tendency to lose the thread, lack of historical / background context… the failure modes are at least qualitatively similar.
Show me an LLM failure and I’ll show you a high profile journalist busted for the same thing. And those are humans who focus on these things!
Humans as a class are error prone, but some humans in their respective fields are very, very good. It's often not terribly hard to figure out who those folks are from their resumes and credentials, and as a shortcut we can look for markers like terminology, specifics, and confidence when the stakes are low - deciding what to read, say, as opposed to cancer care for your mom.
AI can hit all the right notes to fool these shortcuts while sometimes being entirely full of shit, and it has no resume or credentials to verify should we want to check.
If you have such credentials and vouch for the output, I can weigh your trustworthiness rather than its. If you admit you yourself are reliant on it, that no longer holds.
Humans also write endless amounts of convincing bullshit, and have done since time immemorial. False papers and faked results were a growing scourge in academia before LLMs were a thing, and that's just counting the intentional fraud; the reproducibility crisis in science, especially medical and psychological science, affects even the best designed and most well-intentioned studies.
Humans also make mistakes and assumptions while reverse engineering, so the results will always need more engineers to go through them and test things.
Benchmarks are all in part 2. Training progress is in part 3 (upcoming).
Also, I think AI-human collaboration is important for goal management.
Sure, LLMs bullshit all the time, but it's the human's role to set good goals and gating criteria for what counts as good.
I am saddened by your gullibility. Your first instinct is to trust this administration? One that has repeatedly shown utter contempt for the very idea of truth, the constitution, the rule of law, and science, merely because half of American voters are brainwashed?
This administration's arguments do not deserve to be steelmanned.
Because HNers are not so gullible as to swallow and regurgitate this pretext. The Trump administration doesn't care about the people of Iran any more than Bush cared about the Iraqi Kurds or Afghan women. It's just a pretext for geopolitics.