I absolutely agree, although even that doesn't solve the root problem. The underlying LLM architecture is fundamentally insecure because it doesn't separate instructions from the content it merely reads and operates on.
I wonder if it'd be possible to train an LLM with such architecture: one input for the instructions/conversation and one "data-only" input. Training would ensure that the latter isn't interpreted as instructions, although I'm not knowledgeable enough to understand if that's even theoretically possible: even if the inputs are initially separate, they eventually mix in the neural network. However, I imagine that training could be done with massive amounts of prompt injections in the "data-only" input to penalize execution of those instructions.
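One way such a two-channel input could be wired up, sketched with NumPy. Everything here is hypothetical (names, sizes, the whole setup); it's loosely analogous to BERT's segment embeddings, where each token carries a learned marker for which channel it came from, and training would then penalize acting on instructions found in the data channel:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, D = 1000, 16

tok_emb = rng.normal(size=(VOCAB, D))   # ordinary token embeddings
chan_emb = rng.normal(size=(2, D))      # 0 = instruction channel, 1 = data-only channel

def embed(token_ids, channel_ids):
    """Each token's embedding is tagged with a learned channel marker.
    Training on a corpus of prompt injections placed in the data channel
    would (hopefully) teach the model never to execute channel-1 text."""
    token_ids = np.asarray(token_ids)
    channel_ids = np.asarray(channel_ids)
    return tok_emb[token_ids] + chan_emb[channel_ids]

# "Summarize this page" (instructions), then page text containing an injection (data)
ids = [12, 7, 99, 42, 5]
chan = [0, 0, 1, 1, 1]
x = embed(ids, chan)
print(x.shape)  # one D-dimensional vector per token
```

Of course, as the parent notes, the two channels still mix inside the network, so the separation is only as strong as the training signal enforcing it.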
I think there are two distinct attack types for LLMs. Jailbreaking is what most people think of, and consists of structuring a prompt so the LLM does what the prompt says, even if it had prior context saying not to.
The other type of attack is what I would call "induced hallucinations", where the attacker crafts data not so the LLM follows instructions embedded in it, but so it produces whatever output the attacker wants.
This attack is commonly demonstrated on neural-network-based image classifiers. Start with a properly classified image and a desired incorrect classification. Then introduce visually imperceptible noise until the classifier reports your target classification. There is no data/instruction confusion here: it is all data.
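A minimal sketch of that kind of attack against a toy linear classifier (everything here is made up for illustration; real attacks such as FGSM do the same thing to deep networks using the gradient of the loss with respect to the input):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 28 * 28                      # a flattened 28x28 "image", pixels in [0, 1]
w = rng.normal(size=d)           # the classifier's weights

def score(v):
    return w @ (v - 0.5)         # linear classifier, centered at mid-gray

def label(v):
    return "cat" if score(v) > 0 else "dog"

x = 0.5 + 0.01 * np.sign(w)      # an image the model correctly calls "cat"
eps = 0.02                       # per-pixel budget: visually imperceptible
x_adv = x - eps * np.sign(w)     # nudge every pixel against the weights

print(label(x), label(x_adv))    # → cat dog: a tiny nudge flips the class
```

The point of the example is the one made above about linearity: because the decision is (nearly) a dot product, a tiny per-pixel perturbation aligned with the weights adds up across thousands of pixels into an arbitrarily large change in the score.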
The core problem is that neural networks are fairly linear (which is what makes it possible to construct efficient hardware for them). They are, of course, not actually linear functions, but close enough to make linear-algebra-based attacks feasible.
It is probably better to think of this sort of attack in terms of cryptanalysis, which frequently exploits linearity in cryptosystems.
The depth of LLM networks makes this sort of attack difficult, but I don't see any reason to think you can add enough layers to make it impossible, particularly given other research showing structure across layers, with groups of layers having identifiable functionality. This means it is probably possible to reason about attacking the layers individually, peeling them like an onion.
This problem isn't really unique to AI either. Human-written code has a tendency to be vulnerable to a similar attack, where maliciously crafted data can get the processor to do anything (e.g. a buffer overflow escalated into arbitrary code execution).
However, you can immediately see how using the same input space essentially relies on the model itself to make the judgment, which ultimately can't be trusted.
> one input for the instructions/conversation and one "data-only" input
We learned so many years ago that separating code and data was important for security. It's such a huge step backwards that it's been tossed in the garbage.
I find Gemini is outstanding at reasoning (all topics) and architecture (software/system design). On the other hand, Gemini CLI sucks and so I end up using Claude Code and Codex CLI for agentic work.
However, I heavily use Gemini in my daily work and I think it has its own place. Ultimately, I don't see the point of choosing the one "best" model for everything, but I'd rather use what's best for any given task.
You don't need to propagate it, you just need to show the gradient of the current position alongside the classical evaluation, to give more context to the viewers.
Why would they be? Cursor took an existing editor and added some AI features on top of it. Features that are enabled by a third party API with some good prompts, something easily replicable by any editor company. Current LLMs are a commodity.
You could be right, but I suspect that you're underestimating the degree to which GPT has become the Kleenex of the LLM space in the consumer zeitgeist.
Based on all of the behaviour psychology books I've read, Claude would have to introduce a model that is 10x better and 10x cheaper - or something so radically different that it registers as an entirely new thing - for it to hit the radar outside of the tech world.
I encourage you to sample the folks in your life that don't work in tech. See if any of them have ever even heard of Claude.
I don’t think people outside of tech hearing about OpenAI more than Claude is really indicative of much. Ask those same people how much they use an LLM and it’s often rare-to-never.
Also, in what way has OpenAI become the Kleenex of the LLM space? Anthropic, Google, and Facebook have no GPTs, nobody "GPTs" something, nobody uses that company's "GPT".
I would say perhaps OpenAI has become the Napster, MySpace, or Facebook of the LLM space. Time will tell how long they keep that title
But that does not prove anything. We don't know where we are on the AI-power scale currently. "Superintelligence", whatever that means, could be 1 year or 1000 years away at our current progress, and we wouldn't know until we reach it.
50 years ago we could rather confidently say that "Superintelligence" was absolutely not happening next year, and was realistically decades away. If we can say "it could be next year", then things have changed radically and we're clearly a lot closer - even if we still don't know how far we have to go.
A thousand years ago we hadn't invented electricity, democracy, or science. I really don't think we're a thousand years away from AI. If intelligence is really that hard to build, I'd take it as proof that someone else must have created us humans.
The customary, tongue-in-cheek reference to McCarthy's proposal for a 10-person research team to solve AI in 2 months (over the summer)[1]. This was ~70 years ago :)
Not saying we're in necessarily the same situation. But it remains difficult to evaluate effort required for actual progress.
Ancestry does more than genetic analysis. Their claim to fame is their tools to search through old public records to help one build their genealogy/family tree.