AI Agents in Production — Day 2: Caching Strategies

Observability told us what the agent is doing. Now let’s stop paying for the same work twice.

In a typical agent session, the LLM asks the same questions repeatedly:

“What issues are open?” — asked 3 times with slightly different wording
“Show me issue #42” — fetched 5 times because context windows reset
list_issues(owner="foo", repo="bar", state="open") — called 4 times identically

Without caching, every call burns tokens and API quota. With caching, identical and semantically similar requests get served from cache in milliseconds.

What We’re Building

```text
┌──────────────┐     ┌─────────────────────┐     ┌─────────┐
│ Agent        │────▶│  Cache Middleware   │────▶│   LLM   │
│ Runtime      │     │                     │     │   API   │
│              │     │  ┌───────────────┐  │     └─────────┘
│ Tool Call    │     │  │ Semantic      │  │     ┌─────────┐
│ LLM Request  │     │  │ Cache (Embed) │  │────▶│  Redis  │
│ Resource     │     │  └───────────────┘  │     └─────────┘
│ Read         │     │  ┌───────────────┐  │
└──────────────┘     │  │ Exact         │  │
                     │  │ Cache (TTL)   │  │
                     │  └───────────────┘  │
                     │  ┌───────────────┐  │
                     │  │ Tool Result   │  │
                     │  │ Cache         │  │
                     │  └───────────────┘  │
                     └─────────────────────┘
```

Three layers:

| Layer | Strategy | Cache key | TTL | Hit rate |
|---|---|---|---|---|
| Exact match | Key = serialized input | `tool:list_issues:{"owner":"foo"...}` | 30-60s | Low (10-15%) |
| Semantic | Key = embedding similarity | `semantic:<query text>`, matched by nearest neighbor | 5-30min | High (30-50%) |
| Tool result | Key = tool + sorted params | `tool:<tool>:<sorted JSON params>` | 10-60s | Medium (20-30%) |
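If you prefer fixed-length keys (hashing the parameters with SHA-256 instead of embedding raw JSON in the key), the parameters still need a canonical serialization first, or `{owner, repo}` and `{repo, owner}` hash differently. A minimal sketch using Node's built-in `node:crypto`; `stableStringify` and `toolResultKey` are hypothetical helpers, the `result:<tool>:<hash>` shape is an assumption, and params are assumed to be plain JSON values:

```typescript
import { createHash } from "node:crypto";

// Canonical JSON: object keys sorted recursively, array order preserved.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .filter((k) => obj[k] !== undefined) // JSON.stringify drops undefined properties too
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${parts.join(",")}}`;
  }
  // Primitives; undefined (e.g. inside arrays) becomes null, as in JSON.stringify
  return JSON.stringify(value) ?? "null";
}

// Hypothetical key builder producing `result:<tool>:<64 hex chars>`.
function toolResultKey(tool: string, params: Record<string, unknown>): string {
  const digest = createHash("sha256").update(stableStringify(params)).digest("hex");
  return `result:${tool}:${digest}`;
}
```

Unlike a top-level key sort, the recursive sort also normalizes nested objects; arrays keep their order because element order carries meaning.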

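Step 1: Install Dependencies

```bash
cd github-issue-mcp
npm install ioredis
```

ioredis v5 ships its own TypeScript definitions, so the separate `@types/ioredis` package (now a deprecated stub) is not needed.

You’ll need Redis running locally:

```bash
docker run -d --name redis-cache -p 6379:6379 redis:7-alpine
```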

Step 2: The Core Cache Interface

All cache layers implement the same interface, so they’re swappable and composable.

`src/cache/interface.ts`

```typescript
// src/cache/interface.ts: generic cache interface

export interface CacheEntry<T> {
  value: T;
  cachedAt: number; // epoch ms when the entry was stored
  expiresAt: number; // epoch ms after which the entry is invalid
  hitCount: number;
}

export interface CacheStats {
  hits: number;
  misses: number;
  size: number; // number of stored entries
  avgTtlRemaining: number; // ms; implementations may report 0 if not computed
}

export interface CacheLayer {
  get<T>(key: string): Promise<CacheEntry<T> | null>;
  set<T>(key: string, value: T, ttlMs: number): Promise<void>;
  del(key: string): Promise<void>;
  clear(pattern?: string): Promise<number>; // glob pattern; returns keys removed
  stats(): Promise<CacheStats>;
}
```
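Because every layer depends only on this interface, an in-memory implementation can stand in for Redis in unit tests. A minimal sketch; `MemoryCache` and `globToRegExp` are hypothetical names, the interfaces are restated so the block stands alone, and the glob support covers only `*` and `?`, not Redis's full pattern syntax:

```typescript
interface CacheEntry<T> { value: T; cachedAt: number; expiresAt: number; hitCount: number; }
interface CacheStats { hits: number; misses: number; size: number; avgTtlRemaining: number; }
interface CacheLayer {
  get<T>(key: string): Promise<CacheEntry<T> | null>;
  set<T>(key: string, value: T, ttlMs: number): Promise<void>;
  del(key: string): Promise<void>;
  clear(pattern?: string): Promise<number>;
  stats(): Promise<CacheStats>;
}

// Translate a Redis-style glob ("tool:list_issues:*") into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*").replace(/\?/g, ".")}$`);
}

class MemoryCache implements CacheLayer {
  private store = new Map<string, CacheEntry<unknown>>();
  private hits = 0;
  private misses = 0;

  // Injectable clock so tests can advance time without sleeping.
  constructor(private now: () => number = Date.now) {}

  async get<T>(key: string): Promise<CacheEntry<T> | null> {
    const entry = this.store.get(key);
    if (!entry || this.now() > entry.expiresAt) {
      this.store.delete(key); // lazily evict expired entries
      this.misses++;
      return null;
    }
    this.hits++;
    entry.hitCount++;
    return entry as CacheEntry<T>;
  }

  async set<T>(key: string, value: T, ttlMs: number): Promise<void> {
    const cachedAt = this.now();
    this.store.set(key, { value, cachedAt, expiresAt: cachedAt + ttlMs, hitCount: 0 });
  }

  async del(key: string): Promise<void> {
    this.store.delete(key);
  }

  async clear(pattern = "*"): Promise<number> {
    const re = globToRegExp(pattern);
    let removed = 0;
    for (const key of [...this.store.keys()]) {
      if (re.test(key)) {
        this.store.delete(key);
        removed++;
      }
    }
    return removed;
  }

  async stats(): Promise<CacheStats> {
    const now = this.now();
    const live = [...this.store.values()].filter((e) => e.expiresAt >= now);
    const avg = live.length ? live.reduce((s, e) => s + (e.expiresAt - now), 0) / live.length : 0;
    return { hits: this.hits, misses: this.misses, size: live.length, avgTtlRemaining: avg };
  }
}
```

Any class written against `CacheLayer` can then be tested with `new MemoryCache(() => fakeTime)` and switched to the Redis-backed layers in production.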

Step 3: Exact Match Cache (Redis-Backed)

`src/cache/exact-cache.ts`

```typescript
// src/cache/exact-cache.ts: TTL-based exact key match via Redis

import Redis from "ioredis";
import { CacheEntry, CacheLayer, CacheStats } from "./interface.js";

export class ExactCache implements CacheLayer {
  private redis: Redis;
  private prefix: string;
  private internalHits = 0;
  private internalMisses = 0;

  constructor(redis?: Redis, prefix = "exact:") {
    this.redis = redis ?? new Redis(); // defaults to localhost:6379
    this.prefix = prefix;
  }

  async get<T>(key: string): Promise<CacheEntry<T> | null> {
    const fullKey = `${this.prefix}${key}`;
    const data = await this.redis.get(fullKey);
    if (!data) {
      this.internalMisses++;
      return null;
    }

    const entry: CacheEntry<T> = JSON.parse(data);
    const remainingMs = entry.expiresAt - Date.now();
    if (remainingMs <= 0) {
      // Redis PX expiry normally removes the key first; this guards the edge case
      await this.redis.del(fullKey);
      this.internalMisses++;
      return null;
    }

    this.internalHits++;
    entry.hitCount++;
    // Persist the hit count in the background, keeping the remaining TTL.
    // Errors are swallowed: a lost hit-count update must not fail the read.
    this.redis
      .set(fullKey, JSON.stringify(entry), "PX", remainingMs)
      .catch(() => {});
    return entry;
  }

  async set<T>(key: string, value: T, ttlMs: number): Promise<void> {
    if (ttlMs <= 0) return; // Redis rejects non-positive expirations
    const now = Date.now();
    const entry: CacheEntry<T> = {
      value,
      cachedAt: now,
      expiresAt: now + ttlMs,
      hitCount: 0,
    };
    await this.redis.set(`${this.prefix}${key}`, JSON.stringify(entry), "PX", ttlMs);
  }

  async del(key: string): Promise<void> {
    await this.redis.del(`${this.prefix}${key}`);
  }

  async clear(pattern = "*"): Promise<number> {
    const keys = await this.scanKeys(`${this.prefix}${pattern}`);
    if (keys.length === 0) return 0;
    return await this.redis.del(...keys);
  }

  async stats(): Promise<CacheStats> {
    const keys = await this.scanKeys(`${this.prefix}*`);
    return {
      hits: this.internalHits,
      misses: this.internalMisses,
      size: keys.length,
      avgTtlRemaining: 0, // Would need a PTTL call per key
    };
  }

  // SCAN walks the keyspace incrementally instead of blocking Redis like KEYS does
  private async scanKeys(match: string): Promise<string[]> {
    const keys = new Set<string>(); // SCAN may return the same key more than once
    for await (const batch of this.redis.scanStream({ match, count: 100 })) {
      for (const key of batch as string[]) keys.add(key);
    }
    return [...keys];
  }
}
```

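Usage:

```typescript
import { ExactCache } from "./cache/exact-cache.js"; // path assumes this file lives in src/

const exactCache = new ExactCache();

// Cache-aside: check the cache first, otherwise fetch and store for 30 seconds.
// `fetchIssues` stands in for your own GitHub API call.
async function listIssuesCached(fetchIssues: () => Promise<unknown[]>): Promise<unknown[]> {
  const key = `tool:list_issues:${JSON.stringify({ owner: "foo", repo: "bar" })}`;

  const cached = await exactCache.get<unknown[]>(key);
  if (cached) {
    console.log(`Cache HIT (${cached.hitCount} hits so far)`);
    return cached.value;
  }

  const issues = await fetchIssues();
  await exactCache.set(key, issues, 30_000);
  return issues;
}
```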

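Step 4: Semantic Cache (Embedding-Based)

Semantic cache uses embeddings to find similar queries. When the LLM asks “show open bugs” and later asks “list all open issues”, a good embedding model places the two close together, so the semantic cache can treat them as the same intent, provided their similarity clears a tuned threshold. Use it only for natural-language text: serialized tool arguments such as `{"issue_number":42}` and `{"issue_number":43}` differ by a single character, look almost identical to an embedding, and would return the wrong issue.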

`src/cache/semantic-cache.ts`

```typescript
// src/cache/semantic-cache.ts: embedding similarity cache

import Redis from "ioredis";
import { CacheEntry, CacheLayer, CacheStats } from "./interface.js";

interface SemanticEntry {
  key: string;
  embedding: number[];
  value: string; // Serialized JSON
  cachedAt: number;
  expiresAt: number;
  hitCount: number;
}

export class SemanticCache implements CacheLayer {
  // protected (not private) so subclasses such as RealEmbeddingCache can reuse them
  protected redis: Redis;
  protected prefix: string;
  protected similarityThreshold: number;
  private internalHits = 0;
  private internalMisses = 0;

  // Simple hash-based embedding for demonstration.
  // In production, use OpenAI text-embedding-3-small or voyage-large-2.
  private static readonly EMBEDDING_DIM = 64;

  constructor(redis?: Redis, prefix = "semantic:", similarityThreshold = 0.85) {
    this.redis = redis ?? new Redis();
    this.prefix = prefix;
    this.similarityThreshold = similarityThreshold;
  }

  // Sorted set of cached query texts, scored by expiry timestamp
  private get indexKey(): string {
    return `${this.prefix}index`;
  }

  /**
   * Simple character-trigram hash embedding.
   * Not as good as real embeddings, but demonstrates the concept.
   */
  static textToEmbedding(text: string): number[] {
    const dim = SemanticCache.EMBEDDING_DIM;
    const vector = new Array<number>(dim).fill(0);

    // Normalize: lowercase + remove punctuation
    const cleaned = text.toLowerCase().replace(/[^a-z0-9\s]/g, "");

    // Character trigrams as features, hashed into `dim` buckets
    for (let i = 0; i < cleaned.length - 2; i++) {
      const trigram = cleaned.slice(i, i + 3);
      let hash = 0;
      for (let j = 0; j < trigram.length; j++) {
        hash = ((hash << 5) - hash + trigram.charCodeAt(j)) | 0; // 32-bit rolling hash
      }
      vector[Math.abs(hash) % dim] += 1;
    }

    // L2 normalize
    const magnitude = Math.sqrt(vector.reduce((s, v) => s + v * v, 0));
    return magnitude === 0 ? vector : vector.map((v) => v / magnitude);
  }

  /** Cosine similarity between two vectors (0 if their lengths differ). */
  static cosineSimilarity(a: number[], b: number[]): number {
    if (a.length !== b.length) return 0;
    let dotProduct = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
      dotProduct += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    const magnitude = Math.sqrt(normA) * Math.sqrt(normB);
    return magnitude === 0 ? 0 : dotProduct / magnitude;
  }

  /**
   * Embedding hook used by both set() and get(). Override it to call a real
   * embedding API: stored and query vectors must come from the same model.
   */
  protected async embed(text: string): Promise<number[]> {
    return SemanticCache.textToEmbedding(text);
  }

  /** Embed the query text and store it alongside the response. */
  async set<T>(text: string, value: T, ttlMs: number): Promise<void> {
    if (ttlMs <= 0) return; // Redis rejects non-positive expirations
    const now = Date.now();
    const entry: SemanticEntry = {
      key: text,
      embedding: await this.embed(text),
      value: JSON.stringify(value),
      cachedAt: now,
      expiresAt: now + ttlMs,
      hitCount: 0,
    };

    // Store by exact key for direct lookup; Redis expires it after ttlMs
    await this.redis.set(`${this.prefix}${text}`, JSON.stringify(entry), "PX", ttlMs);
    // Index by expiry time so get() can skip and prune expired members
    await this.redis.zadd(this.indexKey, entry.expiresAt, text);
  }

  /** Return the closest cached entry at or above the similarity threshold. */
  async get<T>(query: string): Promise<CacheEntry<T> | null> {
    const now = Date.now();
    // Prune index members whose values Redis has already expired
    await this.redis.zremrangebyscore(this.indexKey, "-inf", now);
    const texts = await this.redis.zrangebyscore(this.indexKey, now, "+inf");
    if (texts.length === 0) {
      this.internalMisses++;
      return null;
    }

    const queryEmbedding = await this.embed(query);
    // Fetch all candidates in one round trip (brute-force scan: O(N) per lookup)
    const rows = await this.redis.mget(...texts.map((t) => `${this.prefix}${t}`));

    let bestMatch: { similarity: number; entry: SemanticEntry } | null = null;
    for (const data of rows) {
      if (!data) continue; // expired between ZRANGEBYSCORE and MGET
      const entry: SemanticEntry = JSON.parse(data);
      const similarity = SemanticCache.cosineSimilarity(queryEmbedding, entry.embedding);
      if (!bestMatch || similarity > bestMatch.similarity) {
        bestMatch = { similarity, entry };
      }
    }

    if (bestMatch && bestMatch.similarity >= this.similarityThreshold) {
      this.internalHits++;
      return {
        value: JSON.parse(bestMatch.entry.value) as T,
        cachedAt: bestMatch.entry.cachedAt,
        expiresAt: bestMatch.entry.expiresAt,
        hitCount: bestMatch.entry.hitCount + 1, // not written back to Redis
      };
    }

    this.internalMisses++;
    return null;
  }

  // --- Interface methods ---
  async del(key: string): Promise<void> {
    await this.redis.del(`${this.prefix}${key}`);
    await this.redis.zrem(this.indexKey, key);
  }

  async clear(pattern = "*"): Promise<number> {
    const keys = await this.scanKeys(`${this.prefix}${pattern}`);
    const entryKeys = keys.filter((k) => k !== this.indexKey); // never delete the index as an entry
    if (entryKeys.length === 0) return 0;

    // Remove from both the value store and the similarity index
    const texts = entryKeys.map((k) => k.slice(this.prefix.length));
    await this.redis.zrem(this.indexKey, ...texts);
    return await this.redis.del(...entryKeys);
  }

  async stats(): Promise<CacheStats> {
    // Count only index members that have not expired yet
    const size = await this.redis.zcount(this.indexKey, Date.now(), "+inf");
    return { hits: this.internalHits, misses: this.internalMisses, size, avgTtlRemaining: 0 };
  }

  // SCAN walks the keyspace incrementally instead of blocking Redis like KEYS does
  private async scanKeys(match: string): Promise<string[]> {
    const keys = new Set<string>(); // SCAN may return the same key more than once
    for await (const batch of this.redis.scanStream({ match, count: 100 })) {
      for (const key of batch as string[]) keys.add(key);
    }
    return [...keys];
  }
}
```

Real embedding API alternative:

```typescript
// In production, replace the toy embedding with a real embedding API.
// Overriding embed() changes both set() and get(), so stored and query vectors
// always come from the same model (vectors of different lengths score 0).
// Import path assumes this file lives next to semantic-cache.ts.
import Redis from "ioredis";
import { SemanticCache } from "./semantic-cache.js";

export class RealEmbeddingCache extends SemanticCache {
  private embedApiKey: string;

  // Real embeddings have a different similarity distribution than the toy
  // hash, so retune the threshold rather than assuming 0.85 still fits.
  constructor(apiKey: string, redis?: Redis, similarityThreshold = 0.85) {
    super(redis, "semantic:", similarityThreshold);
    this.embedApiKey = apiKey;
  }

  protected override async embed(text: string): Promise<number[]> {
    const response = await fetch("https://api.openai.com/v1/embeddings", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${this.embedApiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: "text-embedding-3-small", input: text }),
    });
    if (!response.ok) {
      throw new Error(`Embedding request failed with HTTP ${response.status}`);
    }
    const data = (await response.json()) as { data: { embedding: number[] }[] };
    return data.data[0].embedding;
  }
}
```
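With a real embedding API, every semantic lookup embeds the incoming query, so repeated queries keep paying for embedding calls. A minimal sketch of memoizing any embedding function in process memory, with in-flight de-duplication so concurrent identical queries share one request. `memoizeEmbeddings` and `EmbedFn` are hypothetical names, not part of any library, and the map has no size bound, so a production version would add LRU eviction:

```typescript
type EmbedFn = (text: string) => Promise<number[]>;

function memoizeEmbeddings(embed: EmbedFn, ttlMs: number, now: () => number = Date.now): EmbedFn {
  const cache = new Map<string, { vector: number[]; expiresAt: number }>();
  const inFlight = new Map<string, Promise<number[]>>();

  return async (text: string) => {
    const cached = cache.get(text);
    if (cached && now() <= cached.expiresAt) return cached.vector;

    const pending = inFlight.get(text);
    if (pending) return pending; // share one request between concurrent callers

    const request = embed(text)
      .then((vector) => {
        cache.set(text, { vector, expiresAt: now() + ttlMs });
        return vector;
      })
      .finally(() => inFlight.delete(text)); // failures are not cached
    inFlight.set(text, request);
    return request;
  };
}
```

The wrapper could sit around whichever method calls the embeddings endpoint; the 30-minute TTL suggested for embedding vectors in the decision matrix below would be a reasonable starting `ttlMs`.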

Semantic cache in action (illustrative vectors and score, assuming a real embedding model):

```text
Query 1: "list all open issues in owner/repo"
  ↓ embed → [0.12, -0.34, 0.87, ...]
  ↓ store

Query 2: "show me the open bugs in owner/repo"
  ↓ embed → [0.11, -0.33, 0.85, ...]
  ↓ cosine similarity → 0.94 (above 0.85 threshold)
  ↓ CACHE HIT: return cached response
```
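The illustration assumes a real embedding model. The toy trigram embedding in `SemanticCache.textToEmbedding` scores strings by surface overlap, which becomes a correctness problem when the cached text is a serialized tool call. This standalone copy of that embedding (re-implemented here so the block runs on its own) compares two `get_issue` calls for different issues: 34 of their 35 trigrams are shared, so the cosine similarity is at least 34/35 (about 0.97), far above the default 0.85 threshold.

```typescript
const DIM = 64;

// Standalone copy of the character-trigram hash embedding.
function trigramEmbedding(text: string): number[] {
  const vector = new Array<number>(DIM).fill(0);
  const cleaned = text.toLowerCase().replace(/[^a-z0-9\s]/g, "");
  for (let i = 0; i < cleaned.length - 2; i++) {
    const trigram = cleaned.slice(i, i + 3);
    let hash = 0;
    for (let j = 0; j < trigram.length; j++) {
      hash = ((hash << 5) - hash + trigram.charCodeAt(j)) | 0; // 32-bit rolling hash
    }
    vector[Math.abs(hash) % DIM] += 1;
  }
  const magnitude = Math.sqrt(vector.reduce((s, v) => s + v * v, 0));
  return magnitude === 0 ? vector : vector.map((v) => v / magnitude);
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Same text shape as `${name}: ${JSON.stringify(args)}` in a tool wrapper
const issue42 = `get_issue: ${JSON.stringify({ owner: "foo", repo: "bar", issue_number: 42 })}`;
const issue43 = `get_issue: ${JSON.stringify({ owner: "foo", repo: "bar", issue_number: 43 })}`;

const similarity = cosine(trigramEmbedding(issue42), trigramEmbedding(issue43));
console.log(similarity >= 0.85); // true: a semantic lookup for #43 would return #42
```

This is why structured tool calls belong in the exact-key tool result cache, and only free-form prompts should reach the semantic layer.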

Step 5: Tool Result Cache

Tool results have different caching rules than LLM responses. A `list_issues` call returns data that changes (new issues get created). A `get_issue` call returns comparatively stable data (issue details don’t change often).

`src/cache/tool-cache.ts`

```typescript
// src/cache/tool-cache.ts: tool-specific caching rules

import { ExactCache } from "./exact-cache.js";

/**
 * Caching rules per tool.
 * Short TTL for mutable data, longer for more stable data.
 */
const TOOL_CACHE_RULES: Record<string, { ttlMs: number; staleWhileRevalidate: boolean }> = {
  // Read-only tools: safe to cache
  "get_issue":      { ttlMs: 60_000, staleWhileRevalidate: true },   // 1 min
  "list_issues":    { ttlMs: 30_000, staleWhileRevalidate: true },   // 30 sec
  "search_issues":  { ttlMs: 30_000, staleWhileRevalidate: true },   // 30 sec
  "list_issues_paginated": { ttlMs: 30_000, staleWhileRevalidate: true },

  // Mutating tools: never cache
  "create_issue":   { ttlMs: 0, staleWhileRevalidate: false },
  "update_issue":   { ttlMs: 0, staleWhileRevalidate: false },
  "batch_label_issues": { ttlMs: 0, staleWhileRevalidate: false },
};

export class ToolResultCache {
  private cache: ExactCache;

  constructor(cache?: ExactCache) {
    this.cache = cache ?? new ExactCache();
  }

  /**
   * Generate a deterministic cache key from tool name and params.
   * Sorting top-level keys makes {a:1,b:2} and {b:2,a:1} produce the same key
   * (nested objects are not re-sorted).
   */
  private makeKey(tool: string, params: Record<string, unknown>): string {
    const sorted = Object.keys(params)
      .sort()
      .reduce((acc: Record<string, unknown>, key: string) => {
        acc[key] = params[key];
        return acc;
      }, {});
    return `tool:${tool}:${JSON.stringify(sorted)}`;
  }

  /**
   * Return a cached result, or null if not cached.
   * Entries past their TTL come back with stale: true (SWR tools only);
   * the caller decides whether to refresh in the background.
   */
  async get<T>(
    tool: string,
    params: Record<string, unknown>
  ): Promise<{ value: T; stale: boolean } | null> {
    const rules = TOOL_CACHE_RULES[tool];
    if (!rules || rules.ttlMs === 0) return null; // Don't cache

    const entry = await this.cache.get<T>(this.makeKey(tool, params));
    if (!entry) return null;

    const age = Date.now() - entry.cachedAt;
    if (age > rules.ttlMs) {
      return rules.staleWhileRevalidate ? { value: entry.value, stale: true } : null;
    }
    return { value: entry.value, stale: false };
  }

  /**
   * Store a tool result. Mutating tools are never stored; instead they
   * invalidate the read caches they make stale.
   */
  async set<T>(tool: string, params: Record<string, unknown>, value: T): Promise<void> {
    // Invalidate before the TTL check: mutating tools have ttlMs 0, so this
    // block would never run if it came after the early return below.
    if (tool === "create_issue" || tool === "update_issue" || tool === "batch_label_issues") {
      await this.cache.clear("tool:list_issues:*");
      await this.cache.clear("tool:list_issues_paginated:*");
      await this.cache.clear("tool:search_issues:*");
      if (tool !== "create_issue") {
        await this.cache.clear("tool:get_issue:*"); // existing issues changed
      }
    }

    const rules = TOOL_CACHE_RULES[tool];
    if (!rules || rules.ttlMs === 0) return;

    // ExactCache expires keys at the TTL it is given, so SWR entries are kept
    // for twice their freshness TTL (assumption: stale window = TTL).
    // Otherwise get() could never see an entry older than ttlMs.
    const storeMs = rules.staleWhileRevalidate ? rules.ttlMs * 2 : rules.ttlMs;
    await this.cache.set(this.makeKey(tool, params), value, storeMs);
  }
}
```
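The mutation-to-read relationships can also live in one declarative table instead of `if` chains, which makes it harder to add a new mutating tool and forget its invalidation. A sketch with a hypothetical `INVALIDATES` map and `patternsToClear` helper; the tool names match `TOOL_CACHE_RULES`, but which reads each mutation affects is an assumption to check against your own tools:

```typescript
// Which cached read tools each mutation makes stale (hypothetical mapping).
const INVALIDATES: Record<string, readonly string[]> = {
  create_issue: ["list_issues", "list_issues_paginated", "search_issues"],
  update_issue: ["get_issue", "list_issues", "list_issues_paginated", "search_issues"],
  batch_label_issues: ["get_issue", "list_issues", "list_issues_paginated", "search_issues"],
};

// Key patterns, in the `tool:<name>:<params>` shape, to clear after `tool` runs.
function patternsToClear(tool: string): string[] {
  return (INVALIDATES[tool] ?? []).map((readTool) => `tool:${readTool}:*`);
}
```

Each pattern keeps the `:` before `*`, so `tool:list_issues:*` does not also match `tool:list_issues_paginated:...`; that is why both tools are listed explicitly.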

Stale-while-revalidate pattern:

```typescript
import { ToolResultCache } from "./cache/tool-cache.js"; // path assumes this file lives in src/

const cache = new ToolResultCache();

async function cachedToolCall<T>(
  tool: string,
  params: Record<string, unknown>,
  freshFn: () => Promise<T>
): Promise<T> {
  const cached = await cache.get<T>(tool, params);

  if (cached) {
    if (!cached.stale) {
      return cached.value; // Fresh enough, return immediately
    }
    // Stale: refresh in the background, return stale data now.
    // The catch keeps a failed refresh from becoming an unhandled rejection.
    freshFn()
      .then((fresh) => cache.set(tool, params, fresh))
      .catch((err) => console.warn(`Background refresh of ${tool} failed:`, err));
    return cached.value;
  }

  // Cache miss: call fresh, store result
  const fresh = await freshFn();
  await cache.set(tool, params, fresh);
  return fresh;
}
```
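Stale-while-revalidate only works if the stored entry outlives its freshness TTL: if the store expires the key at exactly `ttlMs`, no entry is ever old enough to be served stale. A minimal sketch of the age check, assuming a separate stale window; `classify` and `storageTtl` are hypothetical helpers:

```typescript
type Freshness = "fresh" | "stale" | "expired";

// Younger than ttlMs: serve directly. Inside the stale window: serve and
// refresh in the background. Older: must be refetched before serving.
function classify(cachedAt: number, now: number, ttlMs: number, staleWindowMs: number): Freshness {
  const age = now - cachedAt;
  if (age <= ttlMs) return "fresh";
  if (age <= ttlMs + staleWindowMs) return "stale";
  return "expired";
}

// The storage TTL handed to Redis must cover both windows.
function storageTtl(ttlMs: number, staleWindowMs: number): number {
  return ttlMs + staleWindowMs;
}
```

For `list_issues` (30 s TTL) with an assumed 30 s stale window, an entry is fresh for 30 s, served stale until 60 s, and gone after that.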

Step 6: Multi-Layer Cache Orchestrator

Combines all three layers into one interface with fallthrough.

`src/cache/cache-orchestrator.ts`

```typescript
// src/cache/cache-orchestrator.ts: multi-layer cache with fallthrough

import Redis from "ioredis";
import { ExactCache } from "./exact-cache.js";
import { SemanticCache } from "./semantic-cache.js";
import { ToolResultCache } from "./tool-cache.js";

// "fresh" marks a miss that was served by calling fetchFn
export type CacheLayerType = "exact" | "semantic" | "tool" | "fresh";

export interface OrchestratorStats {
  exact: { hits: number; misses: number };
  semantic: { hits: number; misses: number };
  tool: { hits: number; misses: number };
}

// Mutating tools are never cached and trigger invalidation (mirrors TOOL_CACHE_RULES)
const MUTATING_TOOLS = new Set(["create_issue", "update_issue", "batch_label_issues"]);

export class CacheOrchestrator {
  public exact: ExactCache;
  public semantic: SemanticCache;
  public tool: ToolResultCache;

  constructor(redis: Redis = new Redis()) {
    // One shared connection instead of one per layer
    this.exact = new ExactCache(redis);
    this.semantic = new SemanticCache(redis);
    this.tool = new ToolResultCache(new ExactCache(redis));
  }

  /**
   * Try exact cache first, then tool results, then semantic, then fetch fresh.
   * Pass `text` only for natural-language prompts; pass `tool` + `params` for tool calls.
   */
  async getCachedOrFetch<T>(
    input: { text?: string; tool?: string; params?: Record<string, unknown> },
    fetchFn: () => Promise<T>
  ): Promise<{ value: T; from: CacheLayerType }> {
    const { text, tool, params } = input;

    // Mutations must always execute: skip every layer, then invalidate
    if (tool && MUTATING_TOOLS.has(tool)) {
      const result = await fetchFn();
      await this.invalidateOnMutation(tool, params ?? {});
      return { value: result, from: "fresh" };
    }

    // 1. Try exact match (fastest)
    if (text) {
      const exact = await this.exact.get<T>(text);
      if (exact) return { value: exact.value, from: "exact" };
    }

    // 2. Try tool result cache (stale entries fall through to a fresh fetch here)
    if (tool && params) {
      const cached = await this.tool.get<T>(tool, params);
      if (cached && !cached.stale) return { value: cached.value, from: "tool" };
    }

    // 3. Try semantic cache
    if (text) {
      const semantic = await this.semantic.get<T>(text);
      if (semantic) return { value: semantic.value, from: "semantic" };
    }

    // 4. Cache miss: fetch fresh, then store in all applicable caches
    const fresh = await fetchFn();
    if (text) {
      await this.exact.set(text, fresh, 30_000);
      await this.semantic.set(text, fresh, 300_000);
    }
    if (tool && params) {
      await this.tool.set(tool, params, fresh);
    }
    return { value: fresh, from: "fresh" };
  }

  /** Invalidate caches after a mutation. */
  async invalidateOnMutation(tool: string, params: Record<string, unknown>) {
    if (!MUTATING_TOOLS.has(tool)) return;

    // Tool results are stored under the same "exact:" prefix, so clearing
    // through this.exact also reaches the ToolResultCache keys.
    await this.exact.clear("tool:list_issues:*");
    await this.exact.clear("tool:list_issues_paginated:*");
    await this.exact.clear("tool:search_issues:*");
    await this.exact.clear("tool:get_issue:*");

    // Prompt-keyed exact and semantic entries: only those mentioning the affected repo
    const repoHint = typeof params.repo === "string" ? `*${params.repo}*` : "*";
    await this.exact.clear(repoHint);
    await this.semantic.clear(repoHint);
  }

  async stats(): Promise<OrchestratorStats> {
    const [exactStats, semanticStats, toolStats] = await Promise.all([
      this.exact.stats(),
      this.semantic.stats(),
      this.tool["cache"].stats(), // bracket access reaches ToolResultCache's private ExactCache
    ]);
    return {
      exact: { hits: exactStats.hits, misses: exactStats.misses },
      semantic: { hits: semanticStats.hits, misses: semanticStats.misses },
      tool: { hits: toolStats.hits, misses: toolStats.misses },
    };
  }
}
```
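Stripped of Redis, the orchestrator's control flow is an ordered search over layers that stops at the first hit. A minimal in-memory sketch; `Layer` and `fallthrough` are hypothetical names, and a miss is reported as `"fresh"` so logs can tell fetched results apart from cache hits:

```typescript
interface Layer<T> {
  name: string;
  get(key: string): Promise<T | null>;
}

// Walk layers cheapest-first; on a full miss call fetchFn exactly once.
async function fallthrough<T>(
  layers: Layer<T>[],
  key: string,
  fetchFn: () => Promise<T>
): Promise<{ value: T; from: string }> {
  for (const layer of layers) {
    const hit = await layer.get(key);
    if (hit !== null) return { value: hit, from: layer.name };
  }
  return { value: await fetchFn(), from: "fresh" };
}
```

The order of the `layers` array is the policy: putting cheap, precise layers (exact key lookups) before expensive, fuzzy ones (embedding search) means the semantic layer only runs when nothing more precise answered.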

Step 7: Integration with the Agent

Wire the cache orchestrator into the instrumented tool tracer from Day 1.

`src/server-with-cache.ts`

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { CacheOrchestrator } from "./cache/cache-orchestrator.js";
import { ToolTracer } from "./telemetry/tool-tracer.js";
import { AgentLogger } from "./telemetry/logger.js";

const server = new McpServer({ name: "github-issue-manager", version: "1.0.3" });
const cache = new CacheOrchestrator();

// Tools that change GitHub state: never served from cache, and they invalidate reads
const MUTATING_TOOLS = new Set(["create_issue", "update_issue", "batch_label_issues"]);

function instrumentedAndCachedTool(
  name: string,
  description: string,
  schema: any,
  handler: (args: any) => Promise<any>
) {
  server.tool(name, description, schema, async (args: any) => {
    const logger = new AgentLogger();
    const tracer = new ToolTracer(logger);
    const serializedQuery = JSON.stringify(args);

    // Fresh fetch function: runs the real handler inside the Day 1 tracer
    const runTraced = async () => {
      const { result } = await tracer.traceToolCall(name, args, () => handler(args));
      return result;
    };

    try {
      if (MUTATING_TOOLS.has(name)) {
        // Mutations always execute, then clear the read caches they make stale
        const result = await runTraced();
        await cache.invalidateOnMutation(name, args);
        logger.info("tool_call_result", { tool: name, cacheLayer: "none", inputSize: serializedQuery.length });
        return result;
      }

      // Reads use the exact-key tool cache only. No `text` is passed, so serialized
      // args never reach the semantic layer, where issue #42 and #43 look alike.
      const { value, from } = await cache.getCachedOrFetch({ tool: name, params: args }, runTraced);

      // Log cache layer info
      logger.info("tool_call_result", {
        tool: name,
        cacheLayer: from,
        inputSize: serializedQuery.length,
      });

      return value;
    } catch (error) {
      return {
        content: [{ type: "text", text: `Error: ${error}` }],
        isError: true,
      };
    }
  });
}

// Register tools with caching
instrumentedAndCachedTool("list_issues", "List issues", { /* ... */ }, async (args) => { /* ... */ });
instrumentedAndCachedTool("get_issue", "Get issue", { /* ... */ }, async (args) => { /* ... */ });
instrumentedAndCachedTool("search_issues", "Search issues", { /* ... */ }, async (args) => { /* ... */ });
// Mutations are traced but never cached; they invalidate the read caches instead
instrumentedAndCachedTool("create_issue", "Create issue", { /* ... */ }, async (args) => { /* ... */ });
```

Step 8: Cache Monitoring Dashboard

Expose cache statistics alongside metrics from Day 1.

```typescript
// Assumes `app` is the Express app serving the Day 1 metrics endpoint and
// `cache` is the CacheOrchestrator instance from Step 7.
app.get("/cache/stats", async (_req, res) => {
  const stats = await cache.stats();
  const totalHits = stats.exact.hits + stats.semantic.hits + stats.tool.hits;
  const totalMisses = stats.exact.misses + stats.semantic.misses + stats.tool.misses;
  const lookups = totalHits + totalMisses;
  const hitRate = lookups > 0 ? `${((totalHits / lookups) * 100).toFixed(1)}%` : "N/A";

  // Placeholder per-hit savings in USD; replace with figures from your own pricing
  const SAVINGS_PER_HIT = { exact: 0.001, semantic: 0.005, tool: 0.003 };

  res.json({
    hitRate,
    layers: stats,
    estimatedSavings: {
      exact: `$${(stats.exact.hits * SAVINGS_PER_HIT.exact).toFixed(2)}`,
      semantic: `$${(stats.semantic.hits * SAVINGS_PER_HIT.semantic).toFixed(2)}`,
      tool: `$${(stats.tool.hits * SAVINGS_PER_HIT.tool).toFixed(2)}`,
    },
  });
});
```

Example output (illustrative counts; the derived fields follow from the handler above):

```json
{
  "hitRate": "34.1%",
  "layers": {
    "exact": { "hits": 234, "misses": 512 },
    "semantic": { "hits": 89, "misses": 178 },
    "tool": { "hits": 156, "misses": 234 }
  },
  "estimatedSavings": {
    "exact": "$0.23",
    "semantic": "$0.45",
    "tool": "$0.47"
  }
}
```
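The arithmetic behind the stats endpoint, pulled into a pure function so it can be unit-tested. The per-hit dollar amounts are the same placeholder assumptions as in the handler, and `summarize` is a hypothetical name. Note that the aggregate rate is per lookup, not per request: a request that misses the exact layer and then hits the semantic layer counts one miss and one hit, so the per-request hit rate is higher than this number.

```typescript
interface LayerCounts {
  hits: number;
  misses: number;
}

// Placeholder USD saved per hit, per layer (assumptions, not measured prices).
const SAVINGS_PER_HIT: Record<string, number> = { exact: 0.001, semantic: 0.005, tool: 0.003 };

function summarize(layers: Record<string, LayerCounts>) {
  let hits = 0;
  let lookups = 0;
  const savings: Record<string, string> = {};
  for (const [name, { hits: h, misses: m }] of Object.entries(layers)) {
    hits += h;
    lookups += h + m;
    savings[name] = `$${(h * (SAVINGS_PER_HIT[name] ?? 0)).toFixed(2)}`;
  }
  const hitRate = lookups > 0 ? `${((hits / lookups) * 100).toFixed(1)}%` : "N/A";
  return { hitRate, savings };
}
```

For the example counts (479 hits out of 1,403 lookups) it returns a `34.1%` hit rate.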

Cache Strategy Decision Matrix

| Type of data | Cache layer | TTL | Example | Cost saved |
|---|---|---|---|---|
| LLM response “what issues are open?” | Semantic | 5 min | Same question, different words | Token cost (3-5x) |
| Tool result `get_issue(#42)` | Exact | 1 min | Same params, repeated call | API rate limit |
| Tool result `list_issues(open)` | Exact | 30s | Browsing different issues | API calls |
| `create_issue` result | None (0 TTL) | n/a | Mutation | n/a |
| System prompt | Manual prefetch | Session | Agent instructions | Token cost daily |
| Embedding vector | Semantic | 30 min | Text similarity search | Embedding API cost |

Production Considerations

Cache invalidation is hard

```typescript
// Problem: user creates an issue, then lists issues; stale data is shown
// Solution: invalidate the list_issues cache on the create_issue mutation
await cache.invalidateOnMutation("create_issue", { owner: "foo", repo: "bar" });
```

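Tune thresholds based on data

```typescript
import Redis from "ioredis";
import { SemanticCache } from "./cache/semantic-cache.js";

const redis = new Redis();

// For issue titles (short, distinct): a lower threshold is fine
const semantic = new SemanticCache(redis, "semantic:titles:", 0.80);

// For bug report bodies (long, similar): a higher threshold avoids wrong matches.
// Separate prefixes keep each cache's entries and index apart.
const semanticStrict = new SemanticCache(redis, "semantic:bodies:", 0.92);
```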
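One way to pick those numbers from data, assuming you log the similarity score of each semantic hit and label whether the served answer was actually right: sweep candidate thresholds and keep the lowest one that meets a precision target. `LabeledPair` and `pickThreshold` are hypothetical names; precision is not guaranteed to be monotonic in the threshold, so treat the result as a starting point rather than a final setting.

```typescript
// One logged semantic hit: its similarity score, and whether a human judged
// the cached answer to be correct for the new query.
interface LabeledPair {
  similarity: number;
  sameIntent: boolean;
}

// Lowest threshold in [0, 1] (step 0.01) whose precision reaches minPrecision.
// Lower thresholds serve more hits; the precision target guards against wrong answers.
function pickThreshold(pairs: LabeledPair[], minPrecision: number): number | null {
  for (let step = 0; step <= 100; step++) {
    const threshold = step / 100;
    const served = pairs.filter((p) => p.similarity >= threshold);
    if (served.length === 0) continue;
    const correct = served.filter((p) => p.sameIntent).length;
    if (correct / served.length >= minPrecision) return threshold;
  }
  return null; // no threshold meets the target on this data
}
```

Run it separately for each kind of text (titles, bug bodies), since their similarity distributions differ.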

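Cache warming for common queries

```typescript
import { SemanticCache } from "./cache/semantic-cache.js";

// Pre-cache common queries at startup. Store real answers, never placeholders:
// a warmed entry is served to users exactly like any other cache hit.
// `answer` is whatever produces the real response (e.g. your LLM call).
async function warmCache(semantic: SemanticCache, answer: (query: string) => Promise<unknown>) {
  const commonQueries = [
    "show open issues in my repository",
    "list all bugs",
    "what needs attention",
  ];
  for (const query of commonQueries) {
    await semantic.set(query, await answer(query), 300_000);
  }
}
```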

Summary

| Concept | Implementation | Benefit |
|---|---|---|
| Exact cache | `<tool>:<params>` → Redis TTL | Fastest, deterministic |
| Semantic cache | Embedding → cosine similarity → nearest neighbor | Catches rephrased queries |
| Tool result cache | `ToolResultCache` with per-tool rules | Predictable TTLs |
| Stale-while-revalidate | Return stale + refresh in background | Stale reads return without waiting on the API |
| Cache orchestrator | Fallthrough: exact → tool → semantic → fresh | Cheapest, most precise layer answers first |
| Invalidation | Mutation-aware cascade clear | Data freshness |

Checklist:

Redis running and accessible
Exact cache configured with per-tool TTLs
Semantic cache threshold tuned (start at 0.85)
Mutation tools invalidate related caches
Stale-while-revalidate enabled for read tools
Cache stats endpoint exposes hit rate
Cost savings tracked in dashboard

| Day | Topic |
|---|---|
| 1 | Observability & Telemetry ✅ |
| 2 | Caching Strategies ✅ |
| 3 | Error Handling & Resilience |
| 4 | A/B Testing Prompts & Configs |
| 5 | Multi-Region & High Availability |
| 6 | Building an Internal Agent Platform |

Series: AI Agents in Production. Day 2: Three-layer caching (exact, semantic, tool-result) with Redis, embeddings, TTL-based invalidation, and stale-while-revalidate. Full TypeScript source code included.