AI Agent Architecture Design: From ReAct to Multi-Agent

What Is an AI Agent

Agent = LLM + Tools + Memory + Planning

How it differs from a plain LLM call:

LLM: one input → one output
Agent: plans and acts repeatedly until the goal is reached

User: "Tell me today's weather in Seoul"
LLM: "Sorry, I don't have real-time information"
Agent:
  1. Think: a weather API call is needed
  2. Act: weather_tool("Seoul")
  3. Observe: {"temp": 18, "condition": "clear"}
  4. Answer: "Seoul is 18°C and clear today"

The ReAct Pattern

The model alternates between Reasoning and Acting until it can answer.

from anthropic import Anthropic

client = Anthropic()

tools = [
    {
        "name": "search_web",
        "description": "Look up current information via web search",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"]
        }
    },
    {
        "name": "run_code",
        "description": "Execute Python code",
        "input_schema": {
            "type": "object",
            "properties": {
                "code": {"type": "string"}
            },
            "required": ["code"]
        }
    }
]

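def run_agent(user_message: str, max_iterations: int = 10) -> str:
    # execute_tool(name, input) is assumed to be defined elsewhere;
    # it maps a tool name to its implementation and returns the result.
    messages = [{"role": "user", "content": user_message}]

    for _ in range(max_iterations):
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )

        if response.stop_reason == "end_turn":
            # The first content block is not guaranteed to be text, so join all text blocks
            return "".join(b.text for b in response.content if b.type == "text")

        if response.stop_reason == "tool_use":
            # Run every requested tool and pair each result with its tool_use_id
            tool_calls = [b for b in response.content if b.type == "tool_use"]
            tool_results = []

            for call in tool_calls:
                result = execute_tool(call.name, call.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": call.id,
                    "content": str(result)
                })

            # Feed the assistant turn and the tool results back for the next iteration
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
            continue

        # Any other stop reason (e.g. max_tokens) would otherwise resend the same messages
        raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")

    raise RuntimeError("Max iterations reached")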
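
The loop above calls execute_tool, which the listing does not define. Here is a minimal sketch of one way to write it, assuming a plain dict registry. TOOL_REGISTRY and both tool bodies are hypothetical stubs, not real search or sandboxed execution.

```python
from typing import Any, Callable


def search_web(query: str) -> str:
    # Hypothetical stub: a real version would call a search API
    return f"(stub) results for: {query}"


def run_code(code: str) -> str:
    # Hypothetical stub: a real version should run code in a sandbox, never bare exec()
    return "(stub) code execution is disabled in this sketch"


# Maps tool names from the `tools` schema to Python callables
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {
    "search_web": search_web,
    "run_code": run_code,
}


def execute_tool(name: str, tool_input: dict) -> Any:
    """Dispatch a tool call; return errors as text so the model can react to them."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return f"error: unknown tool '{name}'"
    try:
        return handler(**tool_input)
    except Exception as exc:  # surface the failure to the model instead of crashing the loop
        return f"error: {type(exc).__name__}: {exc}"
```

Returning the error as a string, rather than raising, lets the model see the failure in the next tool_result and try a different call.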

Memory Design

An agent needs three kinds of memory.

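import uuid

import chromadb


class AgentMemory:
    def __init__(self):
        # 1. Working memory: the current conversation context
        self.messages: list[dict] = []

        # In-memory Chroma client; use chromadb.PersistentClient(path=...) to persist
        chroma = chromadb.Client()

        # 2. Episodic memory: summaries of past conversations (vector DB)
        self.vector_store = chroma.get_or_create_collection("agent_episodes")

        # 3. Semantic memory: domain knowledge (searchable)
        self.knowledge_base = chroma.get_or_create_collection("knowledge")

    def add_message(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        # Message count is a rough proxy for context length; summarize and compress past 20
        if len(self.messages) > 20:
            self._compress()

    def retrieve_relevant(self, query: str, k: int = 3) -> list[str]:
        """Search past episodes for similar experiences."""
        n = min(k, self.vector_store.count())
        if n == 0:
            return []
        hits = self.vector_store.query(query_texts=[query], n_results=n)
        return hits["documents"][0]

    def _compress(self):
        """Summarize the oldest messages and store the summary in the vector DB."""
        old_messages = self.messages[:10]
        summary = summarize(old_messages)  # LLM-backed helper, assumed to be defined elsewhere
        self.vector_store.add(documents=[summary], ids=[str(uuid.uuid4())])
        self.messages = self.messages[10:]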
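
The listing triggers compression on message count, while the stated goal is to stay under a context-length limit. Below is a rough sketch of a token-based split. It assumes about 4 characters per token, which is a crude heuristic rather than a real tokenizer. The 8,000-token budget and both function names are hypothetical.

```python
def approx_tokens(text: str) -> int:
    # Assumption: ~4 characters per token; swap in a real tokenizer for accuracy
    return max(1, len(text) // 4)


def split_for_compression(
    messages: list[dict], budget: int = 8000
) -> tuple[list[dict], list[dict]]:
    """Return (to_summarize, to_keep), keeping the newest messages that fit the budget."""
    kept: list[dict] = []
    used = 0
    # Walk from newest to oldest so recent context is kept first
    for msg in reversed(messages):
        cost = approx_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    return messages[: len(messages) - len(kept)], kept
```

In this scheme _compress would summarize the first list and keep the second as working memory, so the cut point follows message size instead of a fixed index.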

Multi-Agent Architecture

For complex tasks, several specialized agents cooperate.

Orchestrator Agent
├── Research Agent    → gathers information
├── Analysis Agent    → analyzes data
├── Code Agent        → writes and runs code
└── Writer Agent      → writes up the results

import asyncio


class OrchestratorAgent:
    # The four agent classes, plan() and synthesize() are assumed to be defined elsewhere
    def __init__(self):
        self.agents = {
            "research": ResearchAgent(),
            "analysis": AnalysisAgent(),
            "code": CodeAgent(),
            "writer": WriterAgent(),
        }

    async def run(self, task: str) -> str:
        # 1. Decompose the task into subtasks
        subtasks = await self.plan(task)

        # 2. Run in parallel (valid only when the subtasks are independent)
        results = await asyncio.gather(*[
            self.agents[st.agent].run(st.description)
            for st in subtasks
        ])

        # 3. Merge the results
        return await self.synthesize(task, results)
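
The orchestrator above depends on plan, synthesize, and four agent classes that are not shown. Here is a self-contained sketch of the same fan-out and fan-in flow, with stub agents and a caller-supplied plan standing in for the LLM calls. Subtask, StubAgent, and orchestrate are hypothetical names. Passing return_exceptions=True is an added choice, so that one failing sub-agent does not discard the others' results.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Subtask:
    agent: str
    description: str


class StubAgent:
    """Stands in for an LLM-backed agent."""

    def __init__(self, name: str):
        self.name = name

    async def run(self, description: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real network call would
        if "fail" in description:
            raise ValueError(f"{self.name} could not handle: {description}")
        return f"[{self.name}] {description}"


async def orchestrate(subtasks: list[Subtask]) -> str:
    agents = {name: StubAgent(name) for name in ("research", "analysis")}
    # Fan out: independent subtasks run concurrently
    results = await asyncio.gather(
        *(agents[st.agent].run(st.description) for st in subtasks),
        return_exceptions=True,
    )
    # Fan in: keep successes, report failures instead of raising
    lines = [
        f"{st.agent} failed: {res}" if isinstance(res, Exception) else res
        for st, res in zip(subtasks, results)
    ]
    return "\n".join(lines)
```

asyncio.gather returns results in the order the awaitables were passed, which is why zip(subtasks, results) lines each result up with its subtask.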

Safety Design

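import asyncio


class SafeAgent:
    # Risky tools run only after human approval
    REQUIRES_APPROVAL = {"delete_file", "send_email", "deploy_production"}

    # self.tools (name -> async callable), request_human_approval() and
    # validate_inputs() are assumed to be defined elsewhere
    async def execute_tool(self, tool_name: str, params: dict):
        if tool_name not in self.tools:
            return {"error": f"Unknown tool: {tool_name}"}

        if tool_name in self.REQUIRES_APPROVAL:
            if not await self.request_human_approval(tool_name, params):
                return {"error": "Rejected by the user"}

        # Validate inputs before execution
        validated = self.validate_inputs(tool_name, params)

        # Enforce a timeout
        try:
            return await asyncio.wait_for(
                self.tools[tool_name](**validated),
                timeout=30.0
            )
        except asyncio.TimeoutError:
            return {"error": "Tool execution timeout"}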
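
validate_inputs is not defined in the listing. Below is one possible sketch, assuming each tool carries a JSON-Schema-style input_schema like the tools list in the ReAct section. It takes the schema directly rather than a tool name. It checks only required keys, unknown keys, and primitive types, so it is not a full JSON Schema validator.

```python
# Maps JSON Schema primitive type names to Python types (subset only)
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_inputs(schema: dict, params: dict) -> dict:
    """Return params unchanged if they match the schema; raise ValueError otherwise."""
    props = schema.get("properties", {})

    missing = [key for key in schema.get("required", []) if key not in params]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")

    unknown = [key for key in params if key not in props]
    if unknown:
        raise ValueError(f"unexpected parameters: {unknown}")

    for key, value in params.items():
        expected = _JSON_TYPES.get(props[key].get("type"))
        if expected is None:
            continue  # no type declared, or a type this sketch does not know
        # bool is a subclass of int in Python, so reject it for non-boolean fields
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if not isinstance(value, expected) or is_stray_bool:
            raise ValueError(f"parameter '{key}' should be of type {props[key]['type']}")

    return params
```

Rejecting unknown keys matters here because the validated dict is later unpacked with ** into the tool function, where an extra key would raise a TypeError.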

Production Checklist

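Set a maximum iteration count (prevents infinite loops)
Set tool timeouts
Set a cost ceiling (monitor token usage)
Human-in-the-loop: request approval before risky actions
Store execution logs (debugging + auditing)
Failure recovery: provide a fallback path when a tool fails
Input validation: defend against prompt injection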
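
The cost-ceiling item can be enforced in code rather than only monitored. Here is a minimal sketch, assuming the caller reports token counts after every model call. The Anthropic SDK exposes these as response.usage.input_tokens and response.usage.output_tokens. CostBudget and BudgetExceeded are hypothetical names, and the default prices are illustrative.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a run spends more than its configured ceiling."""


class CostBudget:
    """Tracks spend across model calls and stops the run once a ceiling is crossed."""

    def __init__(
        self,
        max_usd: float,
        input_usd_per_mtok: float = 15.0,   # illustrative default
        output_usd_per_mtok: float = 75.0,  # illustrative default
    ):
        self.max_usd = max_usd
        self.input_rate = input_usd_per_mtok / 1_000_000
        self.output_rate = output_usd_per_mtok / 1_000_000
        self.spent_usd = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one call's usage; raise if the running total exceeds the ceiling."""
        self.spent_usd += input_tokens * self.input_rate + output_tokens * self.output_rate
        if self.spent_usd > self.max_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.4f}, limit ${self.max_usd:.4f}"
            )
        return self.spent_usd
```

Inside an agent loop, budget.record(response.usage.input_tokens, response.usage.output_tokens) would be called right after each client.messages.create call, so the run stops at the first call that crosses the limit.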

Cost Estimation

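# Based on claude-opus-4-6
# Input: $15/1M tokens, Output: $75/1M tokens

def estimate_cost(iterations: int, avg_tokens_per_turn: int = 1000) -> str:
    # Simplification: flat input per turn plus a fixed 500 output tokens per turn;
    # conversation history that is re-sent on later turns is not counted
    input_tokens = iterations * avg_tokens_per_turn
    output_tokens = iterations * 500

    cost = (input_tokens * 15 + output_tokens * 75) / 1_000_000
    return f"Estimated cost: ${cost:.4f} / task"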
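
The flat estimate above charges each turn only for its own new tokens. In a tool loop the whole message history is re-sent on every call, so billed input grows roughly quadratically with the number of iterations. Below is a sketch that models this. It assumes every turn adds avg_tokens_per_turn of new input plus 500 output tokens to the history, and that no prompt caching is used. The function name and the price parameters are hypothetical.

```python
def estimate_cost_cumulative(
    iterations: int,
    avg_tokens_per_turn: int = 1000,
    output_per_turn: int = 500,
    input_price: float = 15.0,   # USD per 1M input tokens
    output_price: float = 75.0,  # USD per 1M output tokens
) -> float:
    """Estimated USD per task when each call re-sends the full history."""
    history = 0
    input_tokens = 0
    for _ in range(iterations):
        history += avg_tokens_per_turn  # new user or tool-result tokens this turn
        input_tokens += history         # the whole history is billed as input
        history += output_per_turn      # the model's reply joins the history
    output_tokens = iterations * output_per_turn
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

With the defaults, 10 iterations come to about $1.54 under this model versus $0.525 under the flat estimate, roughly three times as much.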

Agent architectures are powerful, but their cost grows quickly.
Control it with caching and model selection (Haiku vs. Opus).
