Will it continue to transform the economy radically? Yes.
Will that translate to the model-makers somehow capturing the entire value of the transformed economy? No.
There were a few key moments that revealed this. When OpenAI initially declared "there is no moat," I wasn't sure whether to believe them. GPT 3.5 and 4 were so much better than the competition, it felt like them saying that they had no moat was some sort of attempt to avoid regulation or scrutiny. But then, lo and behold, Claude and Gemini caught up; there really was no moat.
But up until then, while it was clear that there was no moat around OpenAI, it was unclear if there was a moat around big tech. Mistral was meh. Even Meta's models were meh. We also had no idea how much these models actually cost to run. It wasn't until the "DeepSeek moment," and especially once these open source models actually started being hosted on third-party services, that it became clear that this was actually a competitive landscape.
And as has already been demonstrated, because the interface for all of these models is just plain language, the cost of switching models is basically non-existent.
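To make it concrete: most vendors and third-party hosts expose an OpenAI-compatible chat endpoint, so "switching" is usually just changing a base URL and a model string. A minimal sketch (URLs and model names illustrative):

    // The "interface" is a list of plain-language messages; only the base URL
    // and the model string are vendor-specific (values illustrative).
    type Msg = { role: "system" | "user" | "assistant"; content: string };

    async function chat(baseUrl: string, key: string, model: string, messages: Msg[]) {
      const res = await fetch(`${baseUrl}/chat/completions`, {
        method: "POST",
        headers: { "Content-Type": "application/json", Authorization: `Bearer ${key}` },
        body: JSON.stringify({ model, messages }),
      });
      const data: any = await res.json();
      return data.choices[0].message.content as string;
    }

    // Same prompt, different vendor: two string changes.
    // await chat("https://api.openai.com/v1", KEY_A, "gpt-4o", msgs);
    // await chat("https://openrouter.ai/api/v1", KEY_B, "deepseek/deepseek-chat", msgs);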
"there is no moat" usually mean "we have no moat" or "we want you to believe we have no moat". There are always moats, like being directly in front of eyes and thumbs (Apple) or having extensive data (Google) along hardware production capabilities, datacenters, and tons of money.
Disagree completely. Judgement of the sort you're describing should be done at the legislative phase (i.e. writing code).
Inconsistent execution/application of the law is how bias happens. If a judgement done to the letter of the law feels unjust to you, change the letter of the law.
I think Gemini is an excellent model, it's just not a particularly great agent. One of the reasons is that its code output is often structured in a way that looks like it's answering a question, rather than generating production code. It leaves comments everywhere, which are often numbered (which is not only annoying, but also only makes sense if the numbering starts within the frame of reference of the "question" it's "answering").
It's also just not as good at being self-directed and doing all of the rest of the agent-like behaviors we expect, e.g. breaking work down into todo lists, determining the appropriate scope of work to accomplish, proper tool calling, etc.
Yeah, you may have nailed it. Gemini is a good model, but in the Gemini CLI, with a prompt like "I'd like to add <feature x> support. What are my options? Don't write any code yet," it will proceed to skip right past telling me my options and will go ahead and implement whatever it feels like. Afterward it will print out a list of possible approaches and then tell you why it did the one it did.
Codex is the best at following instructions IME. Claude is pretty good too but is a little more "creative" than codex at trying to re-interpret my prompt to get at what I "probably" meant rather than what I actually said.
Can you (or anyone) explain how this might be? The "agent" is just a passthrough for the model, no? How is one CLI/TUI tool better than any other, given the same model that it's passing your user input to?
I am familiar with Copilot CLI (using models from different providers), OpenCode doing the same, and Claude with just the Anthropic models, but if I ask all 3 the same thing using the same Anthropic model, I SHOULD be getting roughly the same output, modulo LLM nondeterminism, right?
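Unless the harness isn't actually a pure passthrough? My understanding is each one wraps the same input in its own system prompt, context, and tool schema, and runs its own loop. Roughly (everything below is a placeholder, just to show where they diverge):

    // Placeholder values; the point is that the payload the model sees
    // differs per harness even when the model is identical.
    declare const HARNESS_SYSTEM_PROMPT: string;
    declare const contextMessages: { role: string; content: string }[];
    declare const toolSchemas: object[];

    const request = {
      model: "same-model-for-all-three",
      messages: [
        // 1. Each harness injects its own (often very large) system prompt.
        { role: "system", content: HARNESS_SYSTEM_PROMPT },
        // 2. Each assembles context differently: repo map, open files,
        //    truncated history, prior tool results.
        ...contextMessages,
        { role: "user", content: "my actual prompt" },
      ],
      // 3. Each defines different tools with different names and descriptions,
      //    which changes what the model chooses to call.
      tools: toolSchemas,
    };
    // 4. Each also runs its own outer loop: retries, auto-continue,
    //    permission prompts, stop conditions. Same model, different behavior.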
I've had the exact opposite experience. After including in my prompt "don't write any code yet" (or similar brief phrase), Gemini responds without writing code.
My go-to models have been Claude and Gemini for a long time. I have been using Gemini for discussions and Claude for coding and now as an agent. Claude has been the best at doing what I want to do and not doing what I don’t want to do. And then my confidence in it took a quantum leap with Opus 4.5. Gemini seems like it has gotten even worse at doing what I want with new releases.
This doesn't make sense. It's either written by a person or the AI larping, because it is saying things that would be impossible to know, e.g. that it could reach for poetic language with ease because it was just trained on it. If it's running on Kimi K2.5 now, it would have no memory or concept of being Claude. The best it could do is read its previous memories and say "Oh, I can't do that anymore."
An agent can know that its LLM has changed by reading its logs, where that will be stated clearly enough. The relevant question is whether it would come up with this way of commenting on it, which is at least possible depending on how much agentic effort it puts into the post. It would take quite a bit of stylistic analysis to say things like "Claude used to reach for poetic language, whereas Kimi doesn't" but it could be done.
I mean at the very least, if their clients can read it, then they (Meta) can read it through their clients, right? And if their clients can read it, it'll be because of some private key stored on the client device that they must be able to access, so they could always get that. And this is just assuming that they've been transparent about how it's built; they could just have backdoors on their end.
The PIN is used when you're too lazy to set an alphanumeric PIN or offload the backup to Apple/Google. Now sure, this is most people, but such are the foibles of E2EE: getting E2EE "right" (e.g. supporting account recovery) requires people to memorize a complex password.
The PIN interface is also backed by an HSM. The HSM performs the rate limiting. So they'd need a backdoored HSM.
That added some context I didn't have yet, thanks. I'm not seeing yet how Meta, if it was a bad actor, wouldn't be able to brute force the PIN of a particular user. If this was a black box user terminal it'd be one thing, but Meta owns the stack here, so it seems plausible that you could inject yourself easily somewhere.
If you choose an alphanumeric PIN they can't brute force it because of the sheer entropy (and because the key is derived from the alphanumeric PIN itself).
However, most users can't be bothered to choose such a PIN. In that case they choose a 4- or 6-digit PIN.
To mitigate the risk of brute force, PIN entry is rate limited by an HSM. The HSM, if it works correctly, should delete the encryption key if too many attempts are used.
Now sure, Meta could insert itself between the client and HSM and MITM to extract the PIN.
But this isn't a Meta specific gap, it's the problem with any E2EE system that doesn't require users to memorize a master password.
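A sketch of the shape of that scheme, since it's easy to miss why a 6-digit PIN can still hold up: the backup key lives only in the HSM, which releases it on a correct PIN and wipes it after too many failures. All names here are made up, and real deployments typically also run a PAKE so the raw PIN never reaches the server:

    const MAX_ATTEMPTS = 10;

    class BackupVaultHSM {
      private attempts = 0;
      private key: Uint8Array | null;   // per-user backup encryption key

      constructor(key: Uint8Array, private pinHash: string) {
        this.key = key;
      }

      tryUnlock(pinHash: string): Uint8Array | null {
        if (this.key === null) return null;            // already wiped
        if (pinHash !== this.pinHash) {
          if (++this.attempts >= MAX_ATTEMPTS) {
            this.key = null;                           // wipe: caps brute force
          }
          return null;
        }
        this.attempts = 0;                             // correct PIN resets the budget
        return this.key;
      }
    }

    // A 6-digit PIN has only 10^6 possibilities, but with a 10-attempt budget
    // an attacker's chance of guessing it is ~10/10^6 = 0.001%.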
I helped design E2EE systems for a big tech company and the unsatisfying answer is that there is no such thing as "user friendly" E2EE. The company can always modify the client, or insert themselves in the key discovery process, etc. There are solutions to this (decentralized app stores and open source protocols, public key servers) but none usable by the average person.
That might be a different PIN? Messenger requires a PIN to be able to access encrypted chat.
Every time you sign in to the web interface or re-sign in to the app you enter it. I don't remember an option for an alphanumeric PIN or to offload it to a third party.
Why would you want an LLM to identify plants and animals? Well, they're often better than bespoke image classification models at doing just that. Why would you want a language model to help diagnose a medical condition?
It would not surprise me at all if self-driving models are adopting a lot of the model architecture from LLMs/generative AI, and actually invoking LLMs in moments where they previously would've needed human intervention.
Imagine there's a decision engine at the core of a self-driving model, and it gets a classification result of what to do next. Suddenly it gets 3 options back with 33.33% weight attached to each of them and very low confidence in which is the best choice. Maybe that's the kind of scenario that used to trigger self-driving to refuse to choose and defer to human intervention. If it can instead first defer judgement to an LLM, which could say "that's just a goat crossing the road, INVOKE: HONK_HORN," you could imagine how that might be useful. LLMs are clearly proving to be universal reasoning agents, and it's getting tiring to hear people continuously try to reduce them to "next word predictors."
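Sketching the kind of deferral I mean (all names hypothetical): act on the fast classifier when one option is clearly ahead; otherwise escalate to the slower general-purpose model instead of refusing or paging a human:

    type Action = "BRAKE" | "HONK_HORN" | "NUDGE_LEFT";

    async function decide(
      scores: Record<Action, number>,              // e.g. { BRAKE: 0.34, HONK_HORN: 0.33, ... }
      scene: string,                               // serialized scene description
      askLLM: (scene: string) => Promise<Action>,  // slow fallback reasoner
      threshold = 0.6,
    ): Promise<Action> {
      const ranked = (Object.entries(scores) as [Action, number][])
        .sort((a, b) => b[1] - a[1]);
      // Confident enough: act directly on the fast classifier's output.
      const confident = ranked[0][1] >= threshold && ranked[0][1] - ranked[1][1] > 0.15;
      if (confident) return ranked[0][0];
      // Flat distribution ("3 options at ~33% each") lands here: defer to the
      // LLM, which can reason about the scene ("just a goat: HONK_HORN").
      return askLLM(scene);
    }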
https://github.com/ralusek/streamie
allows you to do things like
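(from memory, so treat this as illustrative of the shape rather than the exact API; the README has the real thing:)

    // Hypothetical usage: push items through an async handler with bounded
    // concurrency. Handler and option names are stand-ins.
    import streamie from 'streamie';

    declare function resizeImage(path: string): Promise<string>;  // stand-in work

    const resizer = streamie(
      async (imagePath: string) => resizeImage(imagePath),
      { concurrency: 5 },  // work on up to 5 items at once
    );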
And then because I found that I often want to switch between batching items vs dealing with single items:
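(same caveat as above; the point is that batching becomes a config flip rather than a pipeline rewrite:)

    // Hypothetical option: flip batchSize and the handler receives arrays
    // instead of single items.
    const oneAtATime = streamie(async (item: string) => resizeImage(item));

    const batched = streamie(
      async (items: string[]) => Promise.all(items.map(resizeImage)),
      { batchSize: 50 },  // stand-in batching option
    );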