IBL News | New York
Anthropic, the creator of the Claude models, introduced prompt caching for its API this month.
The feature lets the API remember long prompt context between calls, so developers can avoid resending the same instructions or documents with every request.
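In the Messages API, caching is requested by marking a block of prompt content with a cache_control field; during the public beta this also requires an anthropic-beta header. The snippet below is a minimal sketch of that pattern in the Python SDK; the file name and document contents are placeholders.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

long_reference_text = open("style_guide.txt").read()  # placeholder long document

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    # During the public beta, prompt caching is enabled per request via a beta header.
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    system=[
        {
            "type": "text",
            "text": "You answer questions about the attached document.",
        },
        {
            "type": "text",
            "text": long_reference_text,
            # Marks this block (and the prompt prefix before it) as cacheable.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Summarize section 2."}],
)
print(response.content[0].text)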
Prompt caching is available in public beta on Claude 3.5 Sonnet and Claude 3 Haiku, with support for Claude 3 Opus coming soon.
According to the company, this feature reduces costs by up to 90% and latency by up to 85%.
For Claude 3.5 Sonnet, writing a prompt to be cached will cost $3.75 per 1 million tokens (MTok), but using a cached prompt will cost $0.30 per MTok.
The base price of an input to Claude 3.5 Sonnet is $3/MTok, so paying a little more upfront to write the cache pays off quickly: every subsequent cached read costs one-tenth of the base input price.
Claude 3 Haiku users will pay $0.30/MTok to cache and $0.03/MTok when using stored prompts.
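As a back-of-the-envelope check of those numbers, consider a hypothetical 100,000-token prompt prefix on Claude 3.5 Sonnet reused across 20 calls; the figures below simply plug in the published per-MTok prices.

```python
# Hypothetical cost comparison using the published Claude 3.5 Sonnet prices.
PROMPT_TOKENS = 100_000      # assumed size of the reusable prompt prefix
CALLS = 20                   # assumed number of calls while the cache is warm

BASE_INPUT = 3.00 / 1_000_000    # $/token, normal input
CACHE_WRITE = 3.75 / 1_000_000   # $/token, first (caching) call
CACHE_READ = 0.30 / 1_000_000    # $/token, subsequent cached reads

without_cache = CALLS * PROMPT_TOKENS * BASE_INPUT
with_cache = PROMPT_TOKENS * CACHE_WRITE + (CALLS - 1) * PROMPT_TOKENS * CACHE_READ

print(f"without caching: ${without_cache:.2f}")  # roughly $6.00
print(f"with caching:    ${with_cache:.2f}")     # roughly $0.95
```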
However, as AI influencer Simon Willison noted on X, Anthropic’s cache only has a 5-minute lifetime and is refreshed upon each use.
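In practice, the beta responses include cache accounting in the usage block, which is one way to confirm whether a repeated call inside that 5-minute window was served from the cache. A short sketch, reusing the response object from the earlier example; the field names shown are those documented for the beta and may not exist on older SDK versions, hence the defensive getattr.

```python
# Re-issuing the same request within the 5-minute window should be served from
# the cache; the usage block distinguishes cache writes from cache reads.
usage = response.usage
print("tokens written to cache:", getattr(usage, "cache_creation_input_tokens", None))
print("tokens read from cache: ", getattr(usage, "cache_read_input_tokens", None))
```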
Anthropic suggests using prompt caching on:
- “Conversational agents: Reduce cost and latency for extended conversations, especially those with long instructions or uploaded documents.
- Coding assistants: Improve autocomplete and codebase Q&A by keeping a summarized version of the codebase in the prompt.
- Large document processing: Incorporate complete long-form material, including images, in your prompt without increasing response latency.
- Detailed instruction sets: Share extensive lists of instructions, procedures, and examples to fine-tune Claude’s responses. Developers often include a few examples in their prompts, but with prompt caching, you can perform even better by including dozens of diverse examples of high-quality outputs.
- Agentic search and tool use: Enhance performance for scenarios involving multiple rounds of tool calls and iterative changes, where each step typically requires a new API call.
- Talk to books, papers, documentation, podcast transcripts, and other long-form content: Bring any knowledge base alive by embedding the entire document(s) into the prompt and letting users ask it questions.”
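The last item on that list maps naturally onto caching a long document once and then asking several questions against the same prefix. A rough sketch of that pattern, again with placeholder content and the same beta header as above:

```python
import anthropic

client = anthropic.Anthropic()
transcript = open("podcast_transcript.txt").read()  # placeholder long-form content

def ask(question: str) -> str:
    """Ask one question against the cached transcript prefix."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=512,
        extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
        system=[
            {
                "type": "text",
                "text": "Answer questions using only the transcript below.\n\n" + transcript,
                # The first call pays the cache-write rate; later calls within the
                # 5-minute window reuse the cached transcript at the cheaper read rate.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text

for q in ["Who are the guests?", "What are the three main takeaways?"]:
    print(ask(q))
```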
Simon Last, Co-founder at Notion, said his company has added prompt caching to its AI assistant to reduce costs, increase speed, and optimize internal operations.