I don't think you can store the cache on client given the thinking is server side and you only get summaries in your client (even those are disabled by default).
If they really need to guard the thinking output, they could encrypt it and store it client side. Later it'd be sent back and decrypted on their server.
But they used to return thinking output directly in the API, and that was _the_ reason I liked Claude over OpenAI's reasoning models.
For 4.7 it is no longer possible to disable adaptive thinking. Which is weird given the comment from Boris followed with silence (and closed github issue). So much for the transparency.
> Claude Opus 4.7 (claude-opus-4-7), adaptive thinking is the only supported thinking mode. Thinking is off unless you explicitly set thinking: {type: "adaptive"} in your request; manual thinking: {type: "enabled"} is rejected with a 400 error.
It's early days for Opus 4.7, but I will say this: Today, I had a conversation go well into the 200K token range (I think I got up to 275K before ending the session), and the model seemed surprisingly capable, all things beings considered.
Particularly when compared to Opus 4.6, which seems to veer into the dumb zone heavily around the 200k mark.
It could have just been a one-off, but I was overall pleased with the result.
I’m super envious. I can’t seem to do anything without a half a million tokens. I had to create a slash command that I run at the start of every session so the darn thing actually reads its own memory- whatever default is just doesn’t seem to do it. It’ll do things like start to spin up scripts it’s already written and stored in the code base unless I start every conversation with instructions to go read persistence and memory files. I also seem to have to actively remind it to go update those things at various parts of the conversation even though it has instructions to self update. All these things add up to a ton of work every session.
Something sounds very wrong with your setup or how you use it.
Is your CLAUDE.md barren?
Try moving memory files into the project:
(In your project's .claude/settings.local.json)
{ ...
"plansDirectory": "./plans/wip",
"autoMemoryDirectory": "/Users/foo/project/.claude/memory"
}
(Memory path has to be absolute)
I did this because memory (and plans) should show up in git status so that they are more visible, but then I noticed the agent started reading/setting them more.
This does kind of smell like the wrong way to use it. Not trying to self-promote here, but the experiences you shared really made me think I headed the right direction with my prompting framework ("projex" - I once made a post about it).
I straight up skip all the memory thing provided by harnesses or plugins. Most of my thread is just plan, execute, close - Each naturally produce a file - either a plan to execute, a execution log, a post-work walkthrough, and is also useful as memory and future reference.
Something seems wrong. A half-million tokens is almost five times larger than I allow even long-running conversations to get too. I've manually disabled the 1M context, so my limit is 200K, and I don't like it to get above 50%.
Is it... not aware of its current directory? Is its current directory not the root of your repo? Have you maybe disabled all tool use? I don't even know how I could get it to do what you're describing.
Maybe spend more time in /plan mode, so it uses tools and the Explore sub-agent to see what the current state of things is?
- Use the Plan mode, create a thorough plan, then hand it off to the next agent for execution.
- Start encapsulating these common actions into Skills (they can live globally, or in the project, per skill, as needed). Skills are basically like scripts for LLMs - package repeatable behavior into single commands.
If i had to guess i think you have probably overstuffed the context in hopes of moulding it and gotten worse outcomes because of that. I keep the default context _extremely_ small (as small as possible) and rely on invoked slash commands for a lot of what might have been in a CLAUDE.md before
There is a miniature of Prague from around 1830 by Antonín Langweil. He dedicated his all free time to finish it in a hope of making money for his daughters. Langweil never found a benefactor for his work and he died poor. Pretty tragic story.
I would recommend The Children of Noisy Village, my 3 year old loves it (she didn't like Pippi Longstocking) and we've read all chapters from all the books several times already.
That's what we do here (Czech republic), we don't take meds until the fever goes over 39°C (above 40 you are looking for trouble). You lay in bed and drink enough to compensate for sweating. My grandma would make you onion tea.
Interesting, my user experience with them is top notch (Prague). MacBooks, iPads, musical instruments, mountain bikes, really expensive stuff generally. The delivery slot is kinda long ("in the afternoon"), but the tracking info is spot on, they always call and so far I have never lost anything with them.
> The delivery slot is kinda long ("in the afternoon")
At least they tell you the day, not like in Germany. Gets put on the post today, then usually it either gets sorted overnight and delivered the next day, or there's another working day in between. The tracker used to say which it's gonna be after that overnight sorting (so if you check at 2am), but in the last year or so they've switched to telling you it's e.g. after the weekend some day and then surprise show up on Friday for example when you hadn't planned for anyone to be home, wasting the deliverer's time if you didn't decide to work from home that day spontaneously
reply