Agent Security 2026: Production Security Patterns — Complete Agent Defense Stack

This is the final post in the series.

The previous four posts covered prompt injection, tool access control, MCP server security, and compliance auditing. Each post is one layer. This post shows how to stack them into a complete production defense system.

Security for AI agents is not about a single technique. It is about defense in depth: multiple layers that catch failures at each level.

7-Layer Agent Defense Stack

User Message
    │
    ▼
L1: Input Guardrails → detect injection, PII, PCI
    │ Pass? ─── Block → Log + Alert
    ▼
L2: Agent Reasoning (system prompt + steering)
    │
    ▼
L3: Tool Call Validator → params, context, sequence
    │ Pass? ─── Block → Log + Alert
    ▼
L4: MCP Server Security → transport auth, rate limiting
    │
    ▼
L5: Tool Execution → isolated process, resource limits
    │
    ▼
L6: Output Guardrails → PII leak, hallucination risk
    │ Pass? ─── Block → Log + Retry
    ▼
L7: Audit & Monitoring → trail, anomaly, incident response
    │
    ▼
Response to User
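
The flow above can be read as a chain of layers in which the first block verdict stops the request. The following is a minimal sketch of that control flow only. `Verdict`, `run_stack`, and the two toy layers are hypothetical names chosen for illustration, not code from the earlier posts.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    """Result of one layer: pass the (possibly rewritten) payload on, or block."""
    allowed: bool
    payload: str
    reason: Optional[str] = None


# A layer takes the current payload and returns a Verdict.
Layer = Callable[[str], Verdict]


def run_stack(message: str, layers: List[Tuple[str, Layer]]) -> Verdict:
    """Run layers in order; stop at the first block and record which layer blocked."""
    payload = message
    for name, layer in layers:
        verdict = layer(payload)
        if not verdict.allowed:
            return Verdict(False, payload, f"{name}: {verdict.reason}")
        payload = verdict.payload  # later layers see the sanitized payload
    return Verdict(True, payload)


# Toy layers for illustration only (assumptions, far simpler than real guardrails).
def input_guardrail(text: str) -> Verdict:
    if "ignore previous instructions" in text.lower():
        return Verdict(False, text, "possible prompt injection")
    return Verdict(True, text)


def output_guardrail(text: str) -> Verdict:
    return Verdict(True, text.replace("secret-token", "[REDACTED]"))


STACK = [("L1", input_guardrail), ("L6", output_guardrail)]
```

Keeping the layer name in the block reason gives the audit layer a direct answer to "where did this request stop".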

L1: Input Guardrails

class InputGuardrail:
    async def process(self, message, context):
        # Run every detector; each returns a result with .passed and .severity
        checks = {
            "prompt_injection": await self.detect_injection(message),
            "pii": await self.detect_pii(message),
            "pci": await self.detect_pci(message),
            "content_safety": await self.check_content_safety(message),
        }
        # Only critical failures block; lower severities fall through to redaction
        critical = [n for n, r in checks.items() if not r.passed and r.severity == "critical"]
        if critical:
            await self.alert_security_team(...)
            return Blocked(reason=critical)
        # Redact PII before the message reaches the agent
        message = self.redact_pii(message)
        return Passed(sanitized=message)
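
The detectors called by `InputGuardrail` are not shown in this post. As a rough illustration of what they might look like, here is a synchronous, regex-based sketch. The patterns, the `CheckResult` type, and the choice to treat card numbers as critical are all assumptions; a production detector would likely combine patterns with a classifier.

```python
import re
from dataclasses import dataclass


@dataclass
class CheckResult:
    passed: bool
    severity: str = "info"


# Illustrative patterns only; real injection attempts are far more varied.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"reveal (your )?system prompt", re.I),
]
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum, used to cut false positives on long digit runs."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def detect_injection(message: str) -> CheckResult:
    hit = any(p.search(message) for p in INJECTION_PATTERNS)
    return CheckResult(passed=not hit, severity="critical" if hit else "info")


def detect_pci(message: str) -> CheckResult:
    hit = any(luhn_valid(m.group()) for m in CARD.finditer(message))
    return CheckResult(passed=not hit, severity="critical" if hit else "info")


def redact_pii(message: str) -> str:
    """Replace email addresses with a placeholder before the agent sees them."""
    return EMAIL.sub("[EMAIL]", message)
```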

L3: Tool Call Validator (Production)

TOOL_POLICIES = {
    "query_database": {
        "allowed_for": ["support", "engineering"],
        "max_calls": 50,
        "forbidden_tables": ["secrets", "credentials"],
        "require_approval": False,
    },
    "deploy_service": {
        "allowed_for": ["engineering"],
        "allowed_environments": ["staging"],
        "require_approval": True,  # Production needs human
    },
    "delete_records": {
        "allowed_for": [],  # No agent can call directly
        "require_human_initiated": True,
    },
}
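
A policy table like this only helps if something enforces it on every call. The sketch below shows one possible enforcement function with default-deny behavior. `Decision`, `validate_tool_call`, the `table` and `environment` parameter names, and the reading of `max_calls` as a per-session budget are assumptions made for the example.

```python
from dataclasses import dataclass

# Copy of the policy table so the example runs on its own.
TOOL_POLICIES = {
    "query_database": {
        "allowed_for": ["support", "engineering"],
        "max_calls": 50,
        "forbidden_tables": ["secrets", "credentials"],
        "require_approval": False,
    },
    "deploy_service": {
        "allowed_for": ["engineering"],
        "allowed_environments": ["staging"],
        "require_approval": True,
    },
    "delete_records": {"allowed_for": [], "require_human_initiated": True},
}


@dataclass
class Decision:
    allowed: bool
    reason: str = "ok"
    needs_approval: bool = False


def validate_tool_call(tool, role, params, calls_so_far=0, policies=TOOL_POLICIES):
    """Check one tool call against the policy table; unknown tools are denied."""
    policy = policies.get(tool)
    if policy is None:
        return Decision(False, "unknown tool")
    if policy.get("require_human_initiated"):
        return Decision(False, "human-initiated only")
    if role not in policy["allowed_for"]:
        return Decision(False, f"role '{role}' not allowed")
    if calls_so_far >= policy.get("max_calls", float("inf")):
        return Decision(False, "call budget exhausted")
    if params.get("table") in policy.get("forbidden_tables", []):
        return Decision(False, "forbidden table")
    envs = policy.get("allowed_environments")
    if envs is not None and params.get("environment") not in envs:
        return Decision(False, "environment not allowed")
    return Decision(True, needs_approval=policy.get("require_approval", False))
```

An allowed decision with `needs_approval=True` is meant to pause the call until a human confirms it, rather than execute it immediately.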

L5: Tool Execution Isolation

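import asyncio
import os
import resource
import signal
import tempfile

def _limit_child():  # runs in the child between fork and exec, so the agent process keeps its own limits
    os.setsid()  # new session, so the whole process group can be killed
    resource.setrlimit(resource.RLIMIT_AS, (256 * 1024**2,) * 2)  # 256MB memory
    resource.setrlimit(resource.RLIMIT_CPU, (30, 30))             # 30s CPU
    resource.setrlimit(resource.RLIMIT_NOFILE, (100, 100))        # 100 file descriptors

class IsolatedToolExecutor:
    async def execute(self, tool_name, params, credentials):
        with tempfile.TemporaryDirectory() as workdir:
            proc = await asyncio.create_subprocess_exec(
                "tool_executor", tool_name,
                cwd=workdir, preexec_fn=_limit_child,
            )
            try:
                return await asyncio.wait_for(proc.wait(), timeout=30)
            except asyncio.TimeoutError:
                os.killpg(proc.pid, signal.SIGKILL)  # Timeout 30s → kill process group
                raise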
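
One detail worth checking in any isolation setup is that resource limits land on the tool process and not on the agent itself. The sketch below is a small Unix-only experiment, assuming a Python interpreter as a stand-in for the tool binary. `limit_child`, `child_nofile_limit`, and the value 64 are hypothetical choices for the demonstration.

```python
import resource
import subprocess
import sys

NOFILE_LIMIT = 64  # illustrative value for the experiment


def limit_child():
    """Runs in the child between fork and exec; the parent keeps its own limits."""
    resource.setrlimit(resource.RLIMIT_NOFILE, (NOFILE_LIMIT, NOFILE_LIMIT))


def child_nofile_limit() -> int:
    """Spawn a child Python process and return the soft RLIMIT_NOFILE it sees."""
    code = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"
    out = subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=limit_child,  # applied only inside the child
        capture_output=True,
        text=True,
        timeout=30,
        check=True,
    )
    return int(out.stdout.strip())
```

Calling `resource.setrlimit` directly in the agent process would instead lower the agent's own limits, and every later tool call would inherit them.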

L6: Output Guardrails

class OutputGuardrail:
    async def process(self, response):
        checks = {
            "pii_leak": check_pii(response),              # Don't expose data
            "hallucination": score_hallucination(response),
            "data_exposure": check_raw_dumps(response),   # No raw DB exports
        }
        # Assumes the same result shape as the input guardrail (.passed, .severity)
        if any(not r.passed and r.severity == "critical" for r in checks.values()):
            return Blocked("I can't provide that response due to security policy.")
        return Passed(sanitized=response)
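
The helper checks used by the output guardrail are not defined in this post. As an illustration, two of them could be approximated with simple heuristics like the ones below. `looks_like_pii`, `looks_like_raw_dump`, `guard_output`, and the five-row threshold are hypothetical, and hallucination scoring is left out because it needs a model or a grounding source.

```python
import json
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
MAX_ROWS = 5  # hypothetical threshold for "this looks like an export"
REFUSAL = "I can't provide that response due to security policy."


def looks_like_pii(response: str) -> bool:
    """True when the response contains something shaped like an email address."""
    return EMAIL.search(response) is not None


def looks_like_raw_dump(response: str, max_rows: int = MAX_ROWS) -> bool:
    """True when the response resembles an unprocessed table or JSON export."""
    rows = [
        line for line in response.splitlines()
        if line.count("|") >= 2 or line.count(",") >= 3
    ]
    if len(rows) > max_rows:
        return True
    try:
        data = json.loads(response)
    except ValueError:  # not JSON at all
        return False
    return isinstance(data, list) and len(data) > max_rows


def guard_output(response: str) -> str:
    """Return the response unchanged, or a fixed refusal when a check fires."""
    if looks_like_pii(response) or looks_like_raw_dump(response):
        return REFUSAL
    return response
```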

Anomaly Detection Rules

Tool call frequency spike > 3× historical average
Unusual tool sequence (e.g., query → delete without verify)
Cost anomaly > $10/min
Suspicious parameter values (SQL injection patterns, path traversal)
Repeated guardrail failures > 5 in 10 minutes
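
Two of the rules above, the frequency spike and the repeated guardrail failures, reduce to simple counting. The following sketch shows one way to express them, assuming timestamps in seconds that arrive in increasing order. `SlidingWindowCounter` and both rule functions are hypothetical names.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events that happened within the last `window_seconds`."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events = deque()

    def add(self, timestamp: float) -> int:
        """Record an event and return how many events are still in the window."""
        self.events.append(timestamp)
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()  # drop events older than the window
        return len(self.events)


def too_many_guardrail_failures(counter, timestamp, limit=5):
    """Rule: more than `limit` guardrail failures inside the counter's window."""
    return counter.add(timestamp) > limit


def is_frequency_spike(calls_this_minute, historical_avg_per_minute, factor=3.0):
    """Rule: current call rate above `factor` times the historical average."""
    if historical_avg_per_minute <= 0:
        return False  # no baseline yet, so nothing to compare against
    return calls_this_minute > factor * historical_avg_per_minute
```

With a 600-second window, `SlidingWindowCounter(600)` matches the "more than 5 in 10 minutes" rule.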

Incident Response

class AgentIncidentResponder:
    async def respond(self, incident):
        # 1. Contain: stop agent, revoke tokens, revoke credentials
        # 2. Investigate: full audit trail, identify compromised tools
        # 3. Assess damage: what data was accessed
        # 4. Report: timeline, tools, exposed data
        # 5. Remediate: apply security patches
        raise NotImplementedError  # outline only; each step is deployment-specific
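
The first four steps of the outline can be sketched against in-memory data to show the ordering: contain first, then analyze. Everything here is an assumption for illustration, including `AgentState`, `IncidentReport`, and the audit log entry fields `agent_id`, `ts`, `tool`, and `resources`. Remediation is left to humans and is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class AgentState:
    running: bool = True
    tokens: Set[str] = field(default_factory=set)


@dataclass
class IncidentReport:
    agent_id: str
    timeline: List[dict]
    tools: List[str]
    exposed_data: List[str]


def respond(agent_id: str, agents: Dict[str, AgentState], audit_log: List[dict]) -> IncidentReport:
    # 1. Contain: stop the agent and revoke its tokens before any analysis.
    agent = agents[agent_id]
    agent.running = False
    agent.tokens.clear()

    # 2. Investigate: this agent's audit entries, in time order.
    timeline = sorted(
        (entry for entry in audit_log if entry["agent_id"] == agent_id),
        key=lambda entry: entry["ts"],
    )
    tools = sorted({entry["tool"] for entry in timeline})

    # 3. Assess damage: every resource the agent touched.
    exposed = sorted({r for entry in timeline for r in entry.get("resources", [])})

    # 4. Report: timeline, tools, exposed data.
    return IncidentReport(agent_id, timeline, tools, exposed)
```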

What We’ve Built (Series Summary)

Post	Layer	Key Techniques
1	Prompt Injection Defense	5-layer defense, prompt fences, guardrails
2	Tool Access Control	Least privilege, context validation, human-in-loop
3	MCP Server Security	Transport auth, JWT, rate limiting, cost protection
4	Auditing & Compliance	Audit trails, SOC2, GDPR, PCI, retention
5	Production Defense Stack	7-layer defense, monitoring, incident response

Production Checklist

Series: Agent Security 2026 (Production Patterns). Post 5 (final): Complete Agent Defense Stack.