A conversational AI agent that manages a personal store directory. Users save stores and retrieve them through natural language, with streaming responses, CSV bulk import, a full admin dashboard, and end-to-end observability.
- 4 agent tools
- 7 database tables
- 8 intent classes
- 1 turn per save, on average
Built to handle the full lifecycle, from natural language input to structured storage, retrieval, and observability.
Natural language works out of the box: "Save ACME Hardware, 415-555-0198" or "What's the number for Costco?"
Responses stream token-by-token over SSE. There is no polling and no loading spinner: output appears as the model generates it.
Google's libphonenumber normalizes every input to E.164 (+14155550198) regardless of how the user typed it.
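libphonenumber covers every region's numbering plan, and that logic is not reproduced here. The sketch below is a deliberately simplified stand-in that uses only the standard library and understands only North American (+1) numbers. It shows what E.164 normalization produces. It does not show how the project performs it. The function name to_e164_nanp is hypothetical.

```python
from __future__ import annotations

import re

_NON_DIGITS = re.compile(r"\D")


def to_e164_nanp(raw: str) -> str | None:
    """Normalize a North American (+1) number to E.164, or return None.

    Simplified stand-in for libphonenumber: it checks only NANP shape rules
    (10 digits, area code and exchange not starting with 0 or 1), not
    whether the number is actually assigned.
    """
    digits = _NON_DIGITS.sub("", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return None
    area, exchange = digits[:3], digits[3:6]
    if area[0] in "01" or exchange[0] in "01":
        return None
    return "+1" + digits


if __name__ == "__main__":
    for raw in ["415-555-0198", "(415) 555 0198", "+1 415.555.0198", "555-0198"]:
        print(raw, "->", to_e164_nanp(raw))
```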
Upload a .csv file and every store is extracted, phone-validated, and saved automatically, up to 500 rows per file.
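Here is a minimal sketch of the extraction step. It assumes the file has header columns named store_name and phone; these names are hypothetical, since the accepted format is not specified here. It also assumes that a file over the limit is rejected outright rather than truncated, which is another assumption. Phone validation and saving would then run on the returned rows.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass, field

MAX_ROWS = 500  # per-file cap from the feature description


@dataclass
class ImportResult:
    rows: list[tuple[str, str]] = field(default_factory=list)  # (store_name, phone)
    errors: list[str] = field(default_factory=list)


def parse_store_csv(text: str, max_rows: int = MAX_ROWS) -> ImportResult:
    """Extract (store_name, phone) pairs from CSV text.

    Assumes hypothetical header columns 'store_name' and 'phone'. Files with
    more than max_rows data rows are rejected as a whole (an assumption).
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = {"store_name", "phone"} - set(reader.fieldnames or [])
    if missing:
        return ImportResult(errors=[f"missing columns: {sorted(missing)}"])

    result = ImportResult()
    # Record numbers count the header as 1, so data records start at 2.
    for record_no, row in enumerate(reader, start=2):
        if record_no - 1 > max_rows:
            return ImportResult(errors=[f"file exceeds {max_rows} rows"])
        name = (row["store_name"] or "").strip()
        phone = (row["phone"] or "").strip()
        if not name or not phone:
            result.errors.append(f"record {record_no}: missing store_name or phone")
            continue
        result.rows.append((name, phone))
    return result
```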
pg_trgm provides trigram-based similarity search so "costco" finds "Costco Wholesale" without a vector database.
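The sketch below approximates pg_trgm's trigram extraction and its similarity() function in pure Python, to show what trigram similarity measures. In the database, the same comparison would run through pg_trgm's % operator, whose default threshold is 0.3, backed by the GIN index. This in-memory version is for illustration only and handles only ASCII letters and digits.

```python
import re

SIMILARITY_THRESHOLD = 0.3  # pg_trgm's default threshold for the % operator


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm's trigram extraction (ASCII letters and digits only).

    The text is lowercased and split into words on non-alphanumerics; each
    word is padded with two leading spaces and one trailing space, and every
    3-character window is collected.
    """
    grams: set[str] = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

"costco" yields 7 trigrams and "Costco Wholesale" yields 17 distinct ones, all 7 of them shared, so the similarity is 7/17 ≈ 0.41. That is above the threshold, which is why the short query still matches the longer name.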
If the primary LLM provider is unavailable, PydanticAI's FallbackModel automatically retries with a secondary provider.
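The fallback idea can be shown without PydanticAI. The sketch below is hand-rolled and is not FallbackModel's implementation. It tries a list of async provider callables in order and returns the first successful response. Both provider functions are simulated stand-ins.

```python
import asyncio
from collections.abc import Awaitable, Callable, Sequence

Provider = Callable[[str], Awaitable[str]]


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain fails; args[0] holds the errors."""


async def run_with_fallback(prompt: str, providers: Sequence[Provider]) -> str:
    """Try each provider in order and return the first successful response.

    Hand-rolled illustration only; not PydanticAI's FallbackModel.
    """
    errors: list[Exception] = []
    for provider in providers:
        try:
            return await provider(prompt)
        except Exception as exc:  # real code would catch provider/network errors only
            errors.append(exc)
    raise AllProvidersFailed(errors)


async def primary(prompt: str) -> str:
    raise ConnectionError("primary provider unavailable")  # simulated outage


async def secondary(prompt: str) -> str:
    return f"secondary answered: {prompt}"


if __name__ == "__main__":
    print(asyncio.run(run_with_fallback("What's the number for Costco?", [primary, secondary])))
```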
Metrics, user management, a live prompt editor, a conversation inspector, and system config, all browser-based.
Langfuse traces every LLM call. Sentry captures errors with PII redacted. Structured JSON logs keyed by request ID.
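Here is a minimal sketch of JSON logs keyed by request ID, using only the standard library. It assumes request middleware (not shown) places the ID in a contextvars.ContextVar and that each request runs in its own asyncio task. The logger name and the JSON field names are hypothetical.

```python
import contextvars
import io
import json
import logging

# Set per request by middleware (not shown). asyncio tasks copy the context,
# so requests handled in separate tasks do not see each other's IDs.
request_id_var = contextvars.ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object that includes the request ID."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": request_id_var.get(),
        })


def make_logger(stream) -> logging.Logger:
    logger = logging.getLogger("storebot")  # hypothetical logger name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def demo() -> dict:
    buf = io.StringIO()
    log = make_logger(buf)
    token = request_id_var.set("req-123")  # what middleware would do
    try:
        log.info("saved store %s", "ACME Hardware")
    finally:
        request_id_var.reset(token)
    return json.loads(buf.getvalue())
```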
APScheduler auto-purges old messages, orphaned conversations, and stale logs on a configurable schedule.
Three Docker services communicating over REST and SSE, backed by PostgreSQL.
- Browser / Client: Next.js 14 (TypeScript), providing the Chat UI · Store Directory · Settings · Admin Dashboard
- Client to backend: REST + SSE (streaming)
- FastAPI (Python 3.11): Auth / JWT, Chat / SSE, Store CRUD, and Admin
  - Chat / SSE hands each turn to the Orchestrator, a deterministic state machine
  - The Orchestrator calls the LLM through PydanticAI: Anthropic Claude is the primary LLM, and OpenAI GPT is the fallback via FallbackModel
- The backend connects to:
  - PostgreSQL 15 (stores, conversations, budget, audit log)
  - Langfuse (LLM call tracing)
  - Sentry (error tracking, PII-scrubbed)
  - Telegram Bot (budget cap & system alerts)

The LLM is never trusted to make decisions. The agent extracts intent and generates natural language, but every consequential gate is enforced in Python: passphrase correctness, phone number validity, state transitions, budget enforcement, and duplicate detection. These guarantees therefore hold even if the LLM hallucinates or misbehaves.
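The sketch below illustrates one such gate: a passphrase check in Python that runs before any lookup result is released. The hashing scheme is an assumption. It uses the standard library's PBKDF2-HMAC-SHA256, but the project may use a different algorithm for passphrase_hash. gated_lookup and its return shape are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import os
from collections.abc import Callable

ITERATIONS = 100_000  # illustrative work factor, not a project setting


def hash_passphrase(passphrase: str, salt: bytes | None = None) -> str:
    """Return 'salt_hex$digest_hex' using PBKDF2-HMAC-SHA256 (assumed scheme)."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def verify_passphrase(passphrase: str, stored: str) -> bool:
    salt_hex, digest_hex = stored.split("$", 1)
    candidate = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode(), bytes.fromhex(salt_hex), ITERATIONS
    )
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate.hex(), digest_hex)


def gated_lookup(passphrase: str, stored_hash: str, do_lookup: Callable[[], str]) -> dict:
    """Release a lookup result only if the passphrase check passes in Python.

    Hypothetical helper: nothing the LLM generates can bypass this branch.
    """
    if not verify_passphrase(passphrase, stored_hash):
        return {"ok": False, "error": "passphrase_incorrect"}
    return {"ok": True, "result": do_lookup()}
```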
The agent is a deterministic state machine. The LLM never chooses the next state; Python does.

- IDLE moves to one of three states, depending on the classified intent:
  - COLLECTING_SAVE_INFO
  - OFF_SCOPE_WARNED
  - AWAITING_PASSPHRASE_FOR_RETRIEVE or AWAITING_PASSPHRASE_FOR_REVERSE
- AWAITING_PASSPHRASE_FOR_RETRIEVE / AWAITING_PASSPHRASE_FOR_REVERSE returns to IDLE.
- OFF_SCOPE_WARNED leads to TERMINATED after three warnings (×3).
- COLLECTING_SAVE_INFO also has an edge to TERMINATED.
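The sketch below shows the idea that Python, not the LLM, picks the next state: the LLM's only input is an intent label. The transition rules are a hypothetical reading of the state list, not the project's actual table. For example, it assumes the third off-scope message ends the chat and that a completed step returns to IDLE.

```python
from enum import Enum


class State(str, Enum):
    IDLE = "IDLE"
    COLLECTING_SAVE_INFO = "COLLECTING_SAVE_INFO"
    OFF_SCOPE_WARNED = "OFF_SCOPE_WARNED"
    AWAITING_PASSPHRASE_FOR_RETRIEVE = "AWAITING_PASSPHRASE_FOR_RETRIEVE"
    AWAITING_PASSPHRASE_FOR_REVERSE = "AWAITING_PASSPHRASE_FOR_REVERSE"
    TERMINATED = "TERMINATED"


MAX_OFF_SCOPE = 3  # assumed meaning of "x3": the third off-scope message terminates

# Hypothetical routing for states that can accept a new request.
ROUTES = {
    "save": State.COLLECTING_SAVE_INFO,
    "retrieve_by_name": State.AWAITING_PASSPHRASE_FOR_RETRIEVE,
    "retrieve_by_phone": State.AWAITING_PASSPHRASE_FOR_REVERSE,
}


def next_state(current: State, intent: str, off_scope_count: int = 0,
               step_done: bool = False) -> State:
    """Choose the next state in Python; the LLM only supplies the intent label.

    off_scope_count is the number of off-scope messages seen before this one.
    """
    if current is State.TERMINATED or intent == "terminate":
        return State.TERMINATED
    if step_done:
        return State.IDLE  # e.g. save completed or passphrase check resolved
    if intent == "off_scope":
        if off_scope_count + 1 >= MAX_OFF_SCOPE:
            return State.TERMINATED
        return State.OFF_SCOPE_WARNED
    if current in (State.IDLE, State.OFF_SCOPE_WARNED):
        # Labels without a route, including unknown ones, keep the chat in IDLE.
        return ROUTES.get(intent, State.IDLE)
    return current  # mid-flow: wait until the pending step is resolved
```

A hallucinated or malformed label cannot push the conversation anywhere unexpected, because every label is looked up in a Python table rather than obeyed.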
| Tool | Gate | Description |
|---|---|---|
| save_store(store_name, phone) | No gate | Saves a store after validating and normalizing the phone to E.164 and checking for duplicates. |
| lookup_by_name(query) | Passphrase | Fuzzy name search using pg_trgm trigram similarity. |
| lookup_by_phone(phone) | Passphrase | Reverse lookup: finds a store by its phone number, including partial digits. |
| terminate_conversation(reason) | No gate | Ends the session and triggers automatic summary generation. |
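One way partial-digit reverse lookup could work is substring matching on digits only. This in-memory sketch assumes stores map names to E.164 strings, and it adds a hypothetical minimum-length guard. In PostgreSQL, the equivalent might be a LIKE '%digits%' filter on phone_e164. The document does not say how the project actually implements it.

```python
import re

_NON_DIGITS = re.compile(r"\D")


def match_by_partial_phone(query: str, stores: dict[str, str],
                           min_digits: int = 4) -> list[str]:
    """Return store names whose E.164 number contains the query's digits.

    `stores` maps store name -> E.164 phone. `min_digits` is a hypothetical
    guard against overly broad matches such as a single digit.
    """
    digits = _NON_DIGITS.sub("", query)  # "(415) 555-0198" -> "4155550198"
    if len(digits) < min_digits:
        return []
    return sorted(name for name, e164 in stores.items() if digits in e164)
```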
Every user message is first classified by Claude Haiku (fast, cheap) before the main agent acts on it, which separates routing from response generation.

| Intent | Meaning |
|---|---|
| save | User wants to save one store |
| retrieve_by_name | Look up a store by name |
| retrieve_by_phone | Find a store by its phone number |
| multi_save | Multiple stores in one message |
| multi_operation | Mix of save + lookup in one message |
| off_scope | Unrelated to store management |
| terminate | User is done ("bye", "thanks") |
| clarification | Ambiguous; needs follow-up |

Every dependency was chosen for a specific purpose: no bloat, no experiments in production.
PostgreSQL 15 with pg_trgm, async SQLAlchemy, JSONB columns, and explicit indexes on every hot path.
- users: id · email · hashed_password · passphrase_hash · created_at
  - 1:N stores: store_name · normalized_name · phone_e164 · phone_original · deleted_at
  - 1:N conversations: state (FSM) · pending_action JSONB
    - 1:N messages: role · content · tool_calls JSONB · latency_ms
    - 1:1 conversation_summaries: summary TEXT · operations_count
  - 1:N token_usage: tokens_in/out · cost_usd · date
- audit_log: event_type · severity · source · details JSONB
| Table | Purpose |
|---|---|
| users | Accounts. The passphrase is stored separately from the login password, giving a second, independent secret that gates lookups. |
| stores | Core data. Each record holds the display name, normalized_name (search-ready), the E.164 phone, and the phone as originally entered. Rows are soft-deleted, never physically erased. |
| conversations | One row per chat session. Holds the state-machine state and pending_action (JSONB): a serialized intent paused while it waits for the passphrase. |
| messages | Every turn in every conversation. tool_calls (JSONB) records which tools fired and whether they succeeded. Latency and token counts are stored on every assistant turn. |
| conversation_summaries | One-to-one with conversations. Written by the LLM summarizer when a conversation ends, and stored in its own table so summaries can be queried cheaply. |
| token_usage | One row per user per day. Accumulates input tokens, output tokens, and USD cost. The budget enforcer reads this table through a 60-second in-process cache. |
| audit_log | Append-only log of security-relevant events, used for fraud detection and compliance. |
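Here is a minimal sketch of the budget check's 60-second in-process cache. The load_spend callable stands in for the token_usage query. The daily cap is a hypothetical parameter, since no cap value is given here. Because the cache lives in process memory, each backend process would hold its own copy.

```python
import time
from collections.abc import Callable


class BudgetCache:
    """Per-user daily spend, cached in process memory with a TTL.

    `load_spend` is a hypothetical stand-in for the token_usage query.
    """

    def __init__(self, load_spend: Callable[[int], float], ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_spend
        self._ttl = ttl_s
        self._clock = clock
        self._cache: dict[int, tuple[float, float]] = {}  # user_id -> (fetched_at, spend)

    def spend(self, user_id: int) -> float:
        now = self._clock()
        hit = self._cache.get(user_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh enough: skip the database
        value = self._load(user_id)
        self._cache[user_id] = (now, value)
        return value

    def allowed(self, user_id: int, daily_cap_usd: float) -> bool:
        return self.spend(user_id) < daily_cap_usd
```

Injecting the clock keeps the TTL logic testable without sleeping. The trade-off is the one the document accepts: a user can overshoot a soft cap by up to one TTL window of spend.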
Every foreign key and hot query path has an explicit index. A GIN trigram index on normalized_name turns fuzzy search from a full table scan into a fast index lookup.
Every database call uses SQLAlchemy's async engine (asyncpg driver). A single worker handles many concurrent requests: each await yields control to the event loop while Postgres responds.
The admin metrics endpoint runs 4 independent query groups concurrently with asyncio.gather. The conversation list uses a single CTE instead of N+1 queries.
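The sketch below shows the asyncio.gather pattern, with asyncio.sleep standing in for four query groups whose names are hypothetical. With SQLAlchemy, each concurrently running group would need its own AsyncSession or connection, because a single AsyncSession is not safe to share across concurrent tasks.

```python
import asyncio
import time


async def query_group(name: str, delay: float) -> tuple[str, int]:
    """Stand-in for one metrics query group; sleep simulates waiting on Postgres."""
    await asyncio.sleep(delay)
    return name, 42  # placeholder value


async def load_metrics() -> dict[str, int]:
    # The groups are independent, so they wait on I/O concurrently:
    # total latency is roughly the slowest group, not the sum.
    results = await asyncio.gather(
        query_group("users", 0.1),
        query_group("conversations", 0.1),
        query_group("token_usage", 0.1),
        query_group("audit_log", 0.1),
    )
    return dict(results)


async def timed() -> tuple[dict[str, int], float]:
    start = time.perf_counter()
    metrics = await load_metrics()
    return metrics, time.perf_counter() - start
```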
conversations.pending_action and messages.tool_calls are JSONB. The schema never needs a migration when the agent gains new tools or new state types.
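The sketch below shows why JSONB avoids migrations for pending_action: any intent name and argument shape round-trips through JSON unchanged. The PendingAction shape is hypothetical. With a real JSONB column the driver would accept the dict directly; the explicit json.dumps here just makes the stored value visible.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class PendingAction:
    """An intent paused until the passphrase arrives (hypothetical shape)."""
    intent: str
    args: dict = field(default_factory=dict)


def to_column(action: PendingAction) -> str:
    # The value that would be stored in conversations.pending_action.
    return json.dumps(asdict(action))


def from_column(raw: str | None) -> PendingAction | None:
    if raw is None:
        return None  # nothing is waiting for a passphrase
    data = json.loads(raw)
    return PendingAction(intent=data["intent"], args=data.get("args", {}))
```

A new tool only needs a new intent string and argument dict; the column definition stays the same.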
SQLAlchemy maintains a pool of persistent connections. New requests reuse existing connections rather than paying connection setup (TCP handshake and Postgres authentication) on every request.
Stores are soft-deleted (deleted_at IS NOT NULL) rather than physically removed. No cascading deletes; all queries simply add WHERE deleted_at IS NULL.
Every non-trivial problem encountered during development and how it was resolved.
Architecture decisions with explicit rationale, not defaults or convention.
| Decision | Rationale |
|---|---|
| LLM not trusted for gates | Passphrase, validation, and state transitions are all enforced in Python; the LLM can only generate text |
| PydanticAI for agent | Native Pydantic v2 types; structured tool I/O; clean async streaming API |
| pg_trgm fuzzy search | Handles "costco" → "Costco Wholesale" without a vector DB; no extra infrastructure |
| SSE over WebSocket | Stateless; no connection broker needed; native browser support; simpler auth |
| Claude Haiku for classifiers | 10× cheaper than Sonnet; fast; intent classification is a simple enough task |
| FallbackModel for circuit breaker | PydanticAI's built-in retry wrapper; no custom state machine for provider failover |
| In-process budget cache | Avoids a DB query on every request; 60s TTL is acceptable staleness for a soft cap |
| Soft delete for stores | Preserves audit trail; admin can recover accidentally deleted entries |
| Single backend replica | APScheduler runs in-process, so every extra replica would run the scheduled jobs again; called out explicitly so operators don't accidentally scale horizontally |
| Telegram over Slack for alerts | Simpler setup (no workspace required); bot token + chat ID is sufficient; free |
Every LLM call is traced, every error captured, every request ID tracked end-to-end.