AI Agents in Production, Day 2: Caching Strategies

Observability tells us what the agent is doing. Now let's stop paying for the same work twice.

Agents often ask the same question in different words:

“Are there any open issues?”: asked 3 times with different wording
“Show issue #42”: fetched 5 times because the context window was reset
list_issues(open): called 4 times with identical params

Without a cache, every repeat costs tokens and API quota. With a cache, the response comes back in milliseconds.

3 Layers

| Layer | Strategy | Cache key | TTL | Hit rate |
| --- | --- | --- | --- | --- |
| Exact | Key = serialized input | `tool:list_issues:{...}` | 30-60 s | 10-15% |
| Semantic | Key = embedding similarity | `embed:text` → nearest neighbor | 5-30 min | 30-50% |
| Tool result | Key = tool + params hash | `result:issues:sha256(params)` | 10-60 s | 20-30% |
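
The tool-result row keys on a hash of the params. As a rough sketch of what that could look like, the snippet below derives a key from a SHA-256 digest of canonicalized params, so that argument order does not produce different keys. The helper names `canonicalize` and `toolResultKey` are hypothetical and not part of the series code; only `createHash` from Node's built-in `node:crypto` module is a real API here.

```typescript
import { createHash } from "node:crypto";

// Recursively sort object keys so {a:1,b:2} and {b:2,a:1} serialize identically.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const k of Object.keys(obj).sort()) out[k] = canonicalize(obj[k]);
    return out;
  }
  return value;
}

// Hypothetical helper: builds a key shaped like `result:<tool>:<sha256 hex of params>`.
function toolResultKey(tool: string, params: Record<string, unknown>): string {
  const json = JSON.stringify(canonicalize(params));
  const digest = createHash("sha256").update(json).digest("hex");
  return `result:${tool}:${digest}`;
}

// The same params in a different order map to the same key.
const keyA = toolResultKey("list_issues", { state: "open", page: 1 });
const keyB = toolResultKey("list_issues", { page: 1, state: "open" });
console.log(keyA === keyB);
```

Hashing keeps keys at a fixed length no matter how large the params object is, at the cost of keys that are no longer human-readable in `redis-cli`.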

Step 1: Installation

npm install ioredis
docker run -d --name redis-cache -p 6379:6379 redis:7-alpine

Step 2: Shared Cache Interface

`src/cache/interface.ts`

// A cached value plus the metadata needed for expiry checks and stats.
export interface CacheEntry<T> {
  value: T;
  cachedAt: number;   // epoch ms when the value was stored
  expiresAt: number;  // epoch ms after which the entry is invalid
  hitCount: number;
}

export interface CacheStats {
  hits: number;
  misses: number;
  size: number;
}

// Contract that every cache layer implements.
export interface CacheLayer {
  get<T>(key: string): Promise<CacheEntry<T> | null>;
  set<T>(key: string, value: T, ttlMs: number): Promise<void>;
  del(key: string): Promise<void>;
  clear(pattern?: string): Promise<number>; // resolves to the number of keys removed
  stats(): Promise<CacheStats>;
}
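
One benefit of a shared interface is that a layer can be swapped for an in-memory stand-in in unit tests, with no Redis running. The sketch below is one possible implementation, not part of the series code: the `MemoryCache` class, its injectable `now` clock, and its simplified `clear` (only `*` and `prefix*` patterns) are all assumptions made for illustration. The interfaces are repeated so the block runs on its own.

```typescript
interface CacheEntry<T> { value: T; cachedAt: number; expiresAt: number; hitCount: number; }
interface CacheStats { hits: number; misses: number; size: number; }
interface CacheLayer {
  get<T>(key: string): Promise<CacheEntry<T> | null>;
  set<T>(key: string, value: T, ttlMs: number): Promise<void>;
  del(key: string): Promise<void>;
  clear(pattern?: string): Promise<number>;
  stats(): Promise<CacheStats>;
}

class MemoryCache implements CacheLayer {
  private store = new Map<string, CacheEntry<unknown>>();
  private hits = 0;
  private misses = 0;
  private now: () => number;

  // `now` is injectable so tests can control time (an assumption of this sketch).
  constructor(now: () => number = () => Date.now()) { this.now = now; }

  async get<T>(key: string): Promise<CacheEntry<T> | null> {
    const entry = this.store.get(key);
    if (!entry || this.now() > entry.expiresAt) {
      if (entry) this.store.delete(key); // lazily evict expired entries
      this.misses++;
      return null;
    }
    this.hits++;
    entry.hitCount++;
    return entry as CacheEntry<T>;
  }

  async set<T>(key: string, value: T, ttlMs: number): Promise<void> {
    const t = this.now();
    this.store.set(key, { value, cachedAt: t, expiresAt: t + ttlMs, hitCount: 0 });
  }

  async del(key: string): Promise<void> { this.store.delete(key); }

  // Simplified matching: "prefix*" removes keys with that prefix, anything else is an exact key.
  async clear(pattern = "*"): Promise<number> {
    const isPrefix = pattern.endsWith("*");
    const prefix = isPrefix ? pattern.slice(0, -1) : pattern;
    let removed = 0;
    for (const key of Array.from(this.store.keys())) {
      if (isPrefix ? key.startsWith(prefix) : key === pattern) {
        this.store.delete(key);
        removed++;
      }
    }
    return removed;
  }

  async stats(): Promise<CacheStats> {
    return { hits: this.hits, misses: this.misses, size: this.store.size };
  }
}
```

Because it satisfies `CacheLayer`, code written against the interface can take a `MemoryCache` in tests and a Redis-backed layer in production.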

Step 3: Exact Cache

`src/cache/exact-cache.ts`

import Redis from "ioredis";
import type { CacheEntry, CacheLayer, CacheStats } from "./interface.js";

export class ExactCache implements CacheLayer {
  private redis: Redis;
  private prefix = "exact:";
  private hits = 0;
  private misses = 0;

  // Pass a shared client to reuse one connection; otherwise connect with ioredis defaults.
  constructor(redis?: Redis) { this.redis = redis ?? new Redis(); }

  async get<T>(key: string): Promise<CacheEntry<T> | null> {
    const data = await this.redis.get(`${this.prefix}${key}`);
    if (!data) { this.misses++; return null; }
    const entry: CacheEntry<T> = JSON.parse(data);
    // Redis already expires the key via PX; this is a defensive second check.
    if (Date.now() > entry.expiresAt) {
      await this.redis.del(`${this.prefix}${key}`);
      this.misses++;
      return null;
    }
    this.hits++;
    entry.hitCount++; // counted on the returned copy only; not written back to Redis
    return entry;
  }

  async set<T>(key: string, value: T, ttlMs: number): Promise<void> {
    const now = Date.now();
    const entry: CacheEntry<T> = { value, cachedAt: now, expiresAt: now + ttlMs, hitCount: 0 };
    await this.redis.set(`${this.prefix}${key}`, JSON.stringify(entry), "PX", ttlMs);
  }

  async del(key: string): Promise<void> { await this.redis.del(`${this.prefix}${key}`); }

  // KEYS is O(N) and blocks Redis while it runs: fine for a demo, prefer SCAN on large keyspaces.
  async clear(pattern = "*"): Promise<number> {
    const keys = await this.redis.keys(`${this.prefix}${pattern}`);
    return keys.length ? await this.redis.del(...keys) : 0;
  }

  async stats(): Promise<CacheStats> {
    return {
      hits: this.hits, misses: this.misses,
      size: (await this.redis.keys(`${this.prefix}*`)).length,
    };
  }
}
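
`clear` and `stats` above rely on `KEYS`, which walks the whole keyspace in one blocking call. A possible alternative, sketched below, iterates with `SCAN` in small batches. `clearByScan` and the `ScanClient` type are hypothetical names for this example; `ScanClient` is a minimal structural type modeled on how ioredis's `scan(cursor, "MATCH", pattern, "COUNT", n)` resolves to `[nextCursor, keys]`, and you may need to adapt it to type-check against a real client.

```typescript
// Minimal shape of the two client calls used below (an assumption modeled on ioredis).
interface ScanClient {
  scan(
    cursor: string, matchToken: "MATCH", pattern: string,
    countToken: "COUNT", count: number,
  ): Promise<[string, string[]]>;
  del(...keys: string[]): Promise<number>;
}

// Delete every key matching `pattern`, one SCAN page at a time.
async function clearByScan(client: ScanClient, pattern: string, batch = 100): Promise<number> {
  let cursor = "0";
  let removed = 0;
  do {
    const [next, keys] = await client.scan(cursor, "MATCH", pattern, "COUNT", batch);
    if (keys.length > 0) removed += await client.del(...keys);
    cursor = next; // Redis returns "0" once the iteration is complete
  } while (cursor !== "0");
  return removed;
}
```

`COUNT` is only a hint to Redis about how much work to do per call, so a page may contain more or fewer keys than `batch`, including none.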

Step 4: Semantic Cache

Uses embeddings to find similar queries.

`src/cache/semantic-cache.ts`

import Redis from "ioredis";
import type { CacheEntry, CacheLayer, CacheStats } from "./interface.js";

interface SemanticEntry {
  key: string; embedding: number[]; value: string;
  cachedAt: number; expiresAt: number; hitCount: number;
}

export class SemanticCache implements CacheLayer {
  private redis: Redis;
  private prefix = "semantic:";
  private threshold: number;
  private hits = 0;
  private misses = 0;
  private static DIM = 64;

  constructor(redis?: Redis, threshold = 0.85) {
    this.redis = redis ?? new Redis();
    this.threshold = threshold;
  }

  // Hash-based trigram embedding (demo only): it measures character overlap, not meaning.
  // In production, use a real embedding model such as OpenAI text-embedding-3-small.
  static textToEmbedding(text: string): number[] {
    const vec: number[] = new Array(SemanticCache.DIM).fill(0);
    const cleaned = text.toLowerCase().replace(/[^a-z0-9\s]/g, "");
    for (let i = 0; i < cleaned.length - 2; i++) {
      const trigram = cleaned.slice(i, i + 3);
      let hash = 0;
      for (let j = 0; j < trigram.length; j++)
        hash = ((hash << 5) - hash) + trigram.charCodeAt(j);
      vec[Math.abs(hash) % SemanticCache.DIM] += 1;
    }
    const mag = Math.sqrt(vec.reduce((s, v) => s + v * v, 0));
    return mag === 0 ? vec : vec.map(v => v / mag); // L2-normalize
  }

  static cosineSimilarity(a: number[], b: number[]): number {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
    const m = Math.sqrt(na) * Math.sqrt(nb);
    return m === 0 ? 0 : dot / m;
  }

  async get<T>(query: string): Promise<CacheEntry<T> | null> {
    const qEmb = SemanticCache.textToEmbedding(query);
    // The index is a sorted set scored by insertion time; only the last hour is considered.
    const keys = await this.redis.zrangebyscore(`${this.prefix}index`, Date.now() - 3600_000, Date.now());

    // Linear scan over all candidates: fine for a demo, use a vector index at scale.
    let best: { sim: number; entry: SemanticEntry } | null = null;
    for (const key of keys) {
      const data = await this.redis.get(`${this.prefix}${key}`);
      // Redis already expired the entry: drop it from the index so the index does not grow forever.
      if (!data) { await this.redis.zrem(`${this.prefix}index`, key); continue; }
      const entry: SemanticEntry = JSON.parse(data);
      if (Date.now() > entry.expiresAt) { await this.del(key); continue; }
      const sim = SemanticCache.cosineSimilarity(qEmb, entry.embedding);
      if (sim > (best?.sim ?? 0)) best = { sim, entry };
    }

    if (best && best.sim >= this.threshold) {
      this.hits++;
      const { value, cachedAt, expiresAt, hitCount } = best.entry;
      return { value: JSON.parse(value), cachedAt, expiresAt, hitCount: hitCount + 1 };
    }
    this.misses++;
    return null;
  }

  async set<T>(text: string, value: T, ttlMs: number): Promise<void> {
    const now = Date.now();
    const entry: SemanticEntry = {
      key: text, embedding: SemanticCache.textToEmbedding(text),
      value: JSON.stringify(value), cachedAt: now,
      expiresAt: now + ttlMs, hitCount: 0,
    };
    await this.redis.set(`${this.prefix}${text}`, JSON.stringify(entry), "PX", ttlMs);
    await this.redis.zadd(`${this.prefix}index`, now, text);
  }

  // Remove both the entry and its index member.
  async del(key: string): Promise<void> {
    await this.redis.del(`${this.prefix}${key}`);
    await this.redis.zrem(`${this.prefix}index`, key);
  }

  async clear(pattern = "*"): Promise<number> {
    const indexKey = `${this.prefix}index`;
    // Skip the index key itself; match it exactly rather than by substring.
    const keys = (await this.redis.keys(`${this.prefix}${pattern}`)).filter(k => k !== indexKey);
    if (!keys.length) return 0;
    await this.redis.zrem(indexKey, ...keys.map(k => k.slice(this.prefix.length)));
    return await this.redis.del(...keys);
  }

  async stats(): Promise<CacheStats> {
    return { hits: this.hits, misses: this.misses, size: await this.redis.zcard(`${this.prefix}index`) };
  }
}

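How it works (the vectors and similarity score are illustrative and assume a real embedding model; the demo trigram hash only measures character overlap, so it would score a paraphrase like this one much lower):

Query 1: "list all open issues"
  ↓ embedding → [0.12, -0.34, ...]
  ↓ store

Query 2: "show open bugs"
  ↓ embedding → [0.11, -0.33, ...]
  ↓ cosine similarity = 0.94 > 0.85
  ↓ CACHE HIT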
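
To get a feel for how the demo embedding behaves on these two queries, the snippet below is a standalone copy of the `textToEmbedding` and `cosineSimilarity` logic from `SemanticCache`, lifted out of the class so it runs without Redis. It is a sketch for experimentation: try your own query pairs before settling on a threshold.

```typescript
const DIM = 64;

// Same trigram-hash embedding as SemanticCache.textToEmbedding (demo quality only).
function textToEmbedding(text: string): number[] {
  const vec: number[] = new Array<number>(DIM).fill(0);
  const cleaned = text.toLowerCase().replace(/[^a-z0-9\s]/g, "");
  for (let i = 0; i < cleaned.length - 2; i++) {
    const trigram = cleaned.slice(i, i + 3);
    let hash = 0;
    for (let j = 0; j < trigram.length; j++)
      hash = ((hash << 5) - hash) + trigram.charCodeAt(j);
    vec[Math.abs(hash) % DIM] += 1;
  }
  const mag = Math.sqrt(vec.reduce((s, v) => s + v * v, 0));
  return mag === 0 ? vec : vec.map((v) => v / mag);
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  const m = Math.sqrt(na) * Math.sqrt(nb);
  return m === 0 ? 0 : dot / m;
}

const q1 = textToEmbedding("list all open issues");
const identical = cosineSimilarity(q1, textToEmbedding("list all open issues"));
const paraphrase = cosineSimilarity(q1, textToEmbedding("show open bugs"));
console.log({ identical, paraphrase });
```

With the trigram hash, the identical query scores 1 (up to floating-point error), while "show open bugs" lands well below the 0.85 threshold because the two strings share only a few trigrams around "open". Catching paraphrases like this one is the job of a real embedding model.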

Step 5: Tool Result Cache

Different tools need different caching rules.

`src/cache/tool-cache.ts`

import { ExactCache } from "./exact-cache.js";

const TOOL_RULES: Record<string, { ttlMs: number; staleWhileRevalidate: boolean }> = {
  "get_issue":             { ttlMs: 60_000, staleWhileRevalidate: true },   // 1 minute
  "list_issues":           { ttlMs: 30_000, staleWhileRevalidate: true },   // 30 seconds
  "search_issues":         { ttlMs: 30_000, staleWhileRevalidate: true },
  "list_issues_paginated": { ttlMs: 30_000, staleWhileRevalidate: true },
  "create_issue":          { ttlMs: 0, staleWhileRevalidate: false },       // never cached
  "update_issue":          { ttlMs: 0, staleWhileRevalidate: false },
  "batch_label_issues":    { ttlMs: 0, staleWhileRevalidate: false },
};

// Mutations are never cached; they invalidate cached reads instead.
const MUTATING_TOOLS = ["create_issue", "update_issue"];

export class ToolResultCache {
  private cache: ExactCache;

  // The original left `cache` uninitialized; default to a fresh ExactCache.
  constructor(cache: ExactCache = new ExactCache()) { this.cache = cache; }

  // Sort top-level param keys so the same params in a different order map to the same key.
  private makeKey(tool: string, params: Record<string, unknown>): string {
    const sorted: Record<string, unknown> = {};
    for (const k of Object.keys(params).sort()) sorted[k] = params[k];
    return `tool:${tool}:${JSON.stringify(sorted)}`;
  }

  async get<T>(tool: string, params: Record<string, unknown>): Promise<{ value: T; stale: boolean } | null> {
    const rules = TOOL_RULES[tool];
    if (!rules || rules.ttlMs === 0) return null;
    const entry = await this.cache.get<T>(this.makeKey(tool, params));
    if (!entry) return null;
    // Past the TTL but still stored: usable as a stale value while the caller revalidates.
    const age = Date.now() - entry.cachedAt;
    return { value: entry.value, stale: age > rules.ttlMs && rules.staleWhileRevalidate };
  }

  async set<T>(tool: string, params: Record<string, unknown>, value: T): Promise<void> {
    // Invalidate before the ttlMs check: mutations have ttlMs 0 and would otherwise return early.
    if (MUTATING_TOOLS.includes(tool)) {
      await this.cache.clear("tool:list_issues:*");
      await this.cache.clear("tool:search_issues:*");
      return;
    }
    const rules = TOOL_RULES[tool];
    if (!rules || rules.ttlMs === 0) return;
    // Assumption: keep stale-while-revalidate entries for 2x the TTL, so a stale
    // value still exists after the TTL passes. Storing for exactly ttlMs would
    // make the `stale` branch in get() unreachable.
    const storeMs = rules.staleWhileRevalidate ? rules.ttlMs * 2 : rules.ttlMs;
    await this.cache.set(this.makeKey(tool, params), value, storeMs);
  }
}
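
Stale-while-revalidate only works if an entry outlives its TTL by some grace period: during that window the value is served as stale while a refresh runs. The three states can be sketched as a pure function. `classify` and the `staleWindowMs` parameter are hypothetical names for this example, not part of the series code.

```typescript
type Freshness = "fresh" | "stale" | "expired";

// ageMs:         time since the value was cached
// ttlMs:         how long the value counts as fresh
// staleWindowMs: hypothetical grace period after the TTL during which
//                the value may still be served while a refresh runs
function classify(ageMs: number, ttlMs: number, staleWindowMs: number): Freshness {
  if (ageMs <= ttlMs) return "fresh";
  if (ageMs <= ttlMs + staleWindowMs) return "stale";
  return "expired";
}

// A list_issues result (TTL 30 s) with a 30 s grace period:
console.log(classify(10_000, 30_000, 30_000)); // "fresh"
console.log(classify(45_000, 30_000, 30_000)); // "stale"
console.log(classify(90_000, 30_000, 30_000)); // "expired"
```

For this to work with Redis, the key's expiry has to be set to the TTL plus the grace period; if the key expires at the TTL itself, there is never a stale value left to serve.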

Step 6: Cache Orchestrator

Combines the 3 layers with fallthrough: exact → tool → semantic → fresh.

`src/cache/cache-orchestrator.ts`

import { ExactCache } from "./exact-cache.js";
import { SemanticCache } from "./semantic-cache.js";
import { ToolResultCache } from "./tool-cache.js";

export class CacheOrchestrator {
  public exact = new ExactCache();
  public semantic = new SemanticCache();
  public tool = new ToolResultCache();

  async getCachedOrFetch<T>(
    input: { text?: string; tool?: string; params?: Record<string, unknown> },
    fetchFn: () => Promise<T>
  ): Promise<{ value: T; from: string }> {
    const { text, tool, params } = input;
    // 1. Exact
    if (text) {
      const e = await this.exact.get<T>(text);
      if (e) return { value: e.value, from: "exact" };
    }
    // 2. Tool result
    if (tool && params) {
      const c = await this.tool.get<T>(tool, params);
      if (c) {
        // Stale-while-revalidate: serve the stale value now and refresh it in the background.
        // Refresh errors are ignored here; the next request simply tries again.
        if (c.stale) void fetchFn().then(v => this.tool.set(tool, params, v)).catch(() => {});
        return { value: c.value, from: c.stale ? "tool-stale" : "tool" };
      }
    }
    // 3. Semantic
    if (text) {
      const s = await this.semantic.get<T>(text);
      if (s) return { value: s.value, from: "semantic" };
    }
    // 4. Fetch fresh and populate every applicable layer
    const fresh = await fetchFn();
    if (text) {
      await this.exact.set(text, fresh, 30_000);      // 30 seconds
      await this.semantic.set(text, fresh, 300_000);  // 5 minutes
    }
    if (tool && params) await this.tool.set(tool, params, fresh);
    return { value: fresh, from: "fresh" };
  }
}
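
The fallthrough pattern itself is independent of Redis. The miniature below shows the same idea with in-memory layers: check each layer in order, return the first hit, otherwise fetch once and populate every layer. It is a simplified sketch, not the orchestrator above: `Layer`, `readThrough`, and `mapLayer` are hypothetical names, all layers share one key, and there are no TTLs.

```typescript
interface Layer<T> {
  name: string;
  get(key: string): Promise<T | undefined>;
  set(key: string, value: T): Promise<void>;
}

// Try each layer in order; on a full miss, fetch once and fill every layer.
async function readThrough<T>(
  layers: Layer<T>[],
  key: string,
  fetchFn: () => Promise<T>,
): Promise<{ value: T; from: string }> {
  for (const layer of layers) {
    const value = await layer.get(key);
    if (value !== undefined) return { value, from: layer.name };
  }
  const fresh = await fetchFn();
  await Promise.all(layers.map((layer) => layer.set(key, fresh)));
  return { value: fresh, from: "fresh" };
}

// In-memory stand-in for a real cache layer, for illustration only.
function mapLayer<T>(name: string): Layer<T> {
  const store = new Map<string, T>();
  return {
    name,
    get: async (key) => store.get(key),
    set: async (key, value) => { store.set(key, value); },
  };
}
```

Calling `readThrough` twice with the same key reports `from: "fresh"` the first time and the name of the first layer the second time, with `fetchFn` invoked only once.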

Step 7: Verify

npm run build
export GITHUB_TOKEN="ghp_..."
node build/server-with-cache.js

curl http://localhost:3001/cache/stats

{
  "hitRate": "42.3%",
  "layers": {
    "exact": { "hits": 234, "misses": 512 },
    "semantic": { "hits": 89, "misses": 178 },
    "tool": { "hits": 156, "misses": 234 }
  },
  "estimatedSavings": {
    "exact": "$0.47",
    "semantic": "$0.18",
    "tool": "$0.31"
  }
}
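
If you want to compute a hit rate yourself from per-layer counters, one possible definition is total hits divided by total lookups across all layers. The sketch below implements that definition; `hitRate` and `LayerCounts` are hypothetical names, and the server's own `hitRate` field may be computed differently (for example per request rather than per lookup), so the two numbers need not match.

```typescript
interface LayerCounts { hits: number; misses: number; }

// Aggregate hit rate over all layers, formatted as a percentage with one decimal.
function hitRate(layers: Record<string, LayerCounts>): string {
  let hits = 0;
  let total = 0;
  for (const counts of Object.values(layers)) {
    hits += counts.hits;
    total += counts.hits + counts.misses;
  }
  return total === 0 ? "0.0%" : `${((hits / total) * 100).toFixed(1)}%`;
}

console.log(hitRate({ exact: { hits: 1, misses: 3 } })); // "25.0%"
```

A per-lookup rate like this counts a request that misses the exact layer and then hits the semantic layer as one miss and one hit, which is worth keeping in mind when comparing layers.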

Cache Decision Matrix

| Data type | Layer | TTL | Cost saved |
| --- | --- | --- | --- |
| LLM response “what issues are open?” | Semantic | 5 min | Token cost × 3-5x |
| Tool result (read) | Exact | 1 min | API rate limit |
| Tool result (list) | Exact | 30 s | API calls |
| Mutation | None (0 TTL) | n/a | n/a |
| Embedding | Semantic | 30 min | Embedding API cost |
| System prompt | Manual prefetch | Session | Daily token cost |

Checklist

Redis running
Exact cache: per-tool TTLs
Semantic cache: threshold tuned (start at 0.85)
Invalidation: mutations clear related caches
Stale-while-revalidate for read tools
Cache stats endpoint

| Day | Topic |
| --- | --- |
| 1 | Observability & Telemetry ✅ |
| 2 | Caching Strategies ✅ |
| 3 | Error Handling & Resilience |
| 4 | A/B Testing Prompts & Configs |
| 5 | Multi-Region & High Availability |
| 6 | Building an Internal Agent Platform |

Series: AI Agents in Production. Day 2: three cache layers (exact, semantic, tool-result) with Redis, embeddings, and stale-while-revalidate.