LLMs: Context Matters. A LOT.
April 4, 2024
Context matters a lot in LLMs. I would say it matters much more than anyone thinks. Anthropic released a new version of Claude with three models, in increasing order of capability: Haiku, Sonnet, and Opus. Opus is very expensive at $15/1M input tokens and $75/1M output tokens. In contrast, Haiku is 60x cheaper ($0.25/1M input tokens and $1.25/1M output tokens), and benchmarks suggest its performance is comparable to GPT-4.
I recently made a CLI tool called howdoi where I make extensive use of Haiku. The API is simple:
howdoi "write a server in Go" -c <optional_context, can be URLs or local files>
Even without context, Haiku is useful, but with context, it really shines.
To see this, let's focus on Opus for a second:
λ ~/code/howdoi: howdoi "how do i write a web server with bun" -m "opus" > out
2024/04/04 12:52:13 Usage: Input Tokens: 17, Output Tokens: 576, Total Cost: $0.043455
This cost ~$0.04, and the output wasn't quite right.
That's not surprising: Bun is new, so the base model is unlikely to know much about it.
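The cost reported in the log above can be reproduced from the per-token prices. Here is a minimal sketch, assuming the Opus prices quoted earlier and Haiku prices derived from the 60x ratio; `query_cost` and `PRICES_PER_MTOK` are illustrative names, not part of howdoi.

```python
# Prices in USD per million tokens. The Opus figures come from the post;
# the Haiku figures assume the stated 60x ratio ($15/60 and $75/60).
PRICES_PER_MTOK = {
    "opus": {"input": 15.00, "output": 75.00},
    "haiku": {"input": 0.25, "output": 1.25},
}


def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a single request for the given model."""
    price = PRICES_PER_MTOK[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Token counts taken from the Opus log line above.
opus_cost = query_cost("opus", 17, 576)
haiku_cost = query_cost("haiku", 17, 576)

print(f"opus:  ${opus_cost:.6f}")  # opus:  $0.043455
print(f"haiku: ${haiku_cost:.6f}")
```

Almost all of the Opus cost comes from the 576 output tokens; the 17 input tokens barely register.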
Let's add some context and use Haiku:
λ ~/code/domluna.com/content: howdoi "how do i write a web server with bun" -c "https://bun.sh/guides/http/simple"
2024/04/04 12:56:22 Scraping the web page: https://bun.sh/guides/http/simple
To write a web server with Bun, you can follow the example provided in the document. Here's a step-by-step guide:

1. **Import the Bun module**: In your JavaScript file, import the Bun module at the beginning of your code.

   import Bun from "bun";

2. **Create the server**: Use the `Bun.serve()` method to create the server. This method takes an object as an argument, which defines the server's configuration.

   const server = Bun.serve({
     port: 3000, // The port number the server will listen on
     fetch(request) { // The function that handles incoming requests
       return new Response("Welcome to Bun!");
     },
   });

In the example, the server listens on port 3000 and responds to all requests with the message "Welcome to Bun!".
We get a working solution from a model that is ~60x cheaper per token.
Context is incredibly important. The most powerful aspect of LLMs is their ability to do in-context learning. Of course, the base model has to be able to make use of that context; otherwise it's useless.
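In its simplest form, supplying context just means placing the reference text in the prompt ahead of the question. The sketch below illustrates that idea only; `build_prompt` is a hypothetical helper, and the prompt layout is an assumption, not how howdoi actually assembles its requests.

```python
def build_prompt(question: str, context_docs: list[str]) -> str:
    """Place reference documents ahead of the question (hypothetical layout)."""
    if not context_docs:
        # No context supplied: the model has to rely on what it already knows.
        return question
    # Wrap each document so the model can tell where one ends and the next begins.
    context = "\n\n".join(
        f"<document>\n{doc.strip()}\n</document>" for doc in context_docs
    )
    return (
        "Use the following documents to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Stand-in for text scraped from a URL or read from a local file.
scraped = 'Bun.serve({ fetch(req) { return new Response("Welcome to Bun!"); } });'
prompt = build_prompt("how do i write a web server with bun", [scraped])
print(prompt)
```

Everything the model needs to answer correctly is now in the prompt itself, which is why a smaller model can get away with knowing less.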
Let's think about this a bit more. To match the cost of 1k input tokens on Opus ($0.015), we would need a context of 60k tokens on Haiku. At that size, the input cost outweighs the output cost: even 4k output tokens add only $0.005, for a total of $0.02, and this would still be about 2x cheaper than the original Opus query!
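The arithmetic can be checked in a few lines. This is a rough sketch, assuming Haiku is priced at exactly 1/60 of Opus ($0.25 and $1.25 per million tokens); the `usd` helper is just an illustrative name.

```python
OPUS_INPUT = 15.00          # USD per million input tokens (from the post)
HAIKU_INPUT = 0.25          # assumed: Opus input price / 60
HAIKU_OUTPUT = 1.25         # assumed: Opus output price / 60
OPUS_QUERY_COST = 0.043455  # the context-free Opus run shown earlier


def usd(tokens: int, price_per_mtok: float) -> float:
    """Convert a token count to dollars at a per-million-token price."""
    return tokens * price_per_mtok / 1_000_000


opus_1k_input = usd(1_000, OPUS_INPUT)    # 1k input tokens on Opus
haiku_context = usd(60_000, HAIKU_INPUT)  # 60k tokens of context on Haiku
haiku_output = usd(4_000, HAIKU_OUTPUT)   # a generous 4k-token answer
haiku_total = haiku_context + haiku_output

print(f"1k Opus input:     ${opus_1k_input:.3f}")
print(f"60k Haiku context: ${haiku_context:.3f}")
print(f"4k Haiku output:   ${haiku_output:.3f}")
print(f"Haiku total:       ${haiku_total:.3f}")
print(f"Opus query / Haiku total: {OPUS_QUERY_COST / haiku_total:.2f}x")
```

Under these assumptions, a Haiku request carrying 60k tokens of context and a long answer still costs less than half of the short, context-free Opus query.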
In-context learning will only get better, but it seems to me it has already reached an inflection point where adding context is more worthwhile than using a larger model.