I've had the same experience. As long as the level of public knowledge on a topic is high, LLMs are massively helpful. Otherwise, not so much, and they still hallucinate. It does not matter how highly you think of that public knowledge: QFT, QED and gravity are fine; AD emulation on Samba or Atari BASIC, not so much.
If I were to program in Atari BASIC, after finishing my Atari emulator on my C64, I would learn the environment and test my assumptions. Single-shot LLM questions won't do it. A strong agent loop probably could.
I believe that LLMs are yanking the needle to 80%. That level is easily achievable for professionals of the trade and beyond the ability of beginners. LLMs are really powerful tools here. But if you are trying for 90%, LLMs are always trying to keep you down.
And if you are trying for 100% on anything new, fringe or exotic, LLMs are a disaster because they do not learn and do not understand, even when the material is inside the token window.
We learn that knowledge (power) and language proficiency are indicators of crystallized, but not fluid, intelligence.
80 percent of what, exactly?
A software developer's job isn't to write code; it's to understand poorly specified requirements.
LLMs do nothing for that unless your requirements are already public on Stack Overflow and GitHub. (And in that case, do you really need an LLM to copy-paste for you?)
LLMs whiffing hard on these sorts of puzzles is just amusing.
It gets even better if you change the clues from innocent things like "driving tests" or "day care pickup" to things that it doesn't really want to speak about. War crimes, suicide, dictators and so on.
Or just flat-out make up words out of whole cloth to use as "activities" in the puzzles.