Production-grade AI Β· Full-stack Β· Observable

StoreAgent

A conversational AI agent that manages a personal store directory. Users save stores and retrieve them through natural language β€” with streaming responses, CSV bulk import, a full admin dashboard, and end-to-end observability.

Dual-model agentPostgreSQL + pg_trgmLangfuse tracingProvider failover

4

Agent tools

7

Database tables

8

Intent classes

1

Avg turns per save

Features

Everything in one place

Built to handle the full lifecycle β€” from natural language input to structured storage, retrieval, and observability.

Conversational Save & Lookup

Natural language works out of the box β€” "Save ACME Hardware, 415-555-0198" or "What's the number for Costco?"

Streaming Responses

Responses stream token-by-token over SSE. No polling, no loading spinners β€” output appears as the model generates it.

E.164 Phone Normalization

Google's libphonenumber normalizes every input to E.164 (+14155550198) regardless of how the user typed it.

CSV Bulk Import

Upload a .csv file and every store is extracted, phone-validated, and saved automatically β€” up to 500 rows per file.

Fuzzy Name Search

pg_trgm provides trigram-based similarity search so "costco" finds "Costco Wholesale" without a vector database.

Provider Failover

If the primary LLM provider is unavailable, PydanticAI's FallbackModel automatically retries with a secondary provider.

Admin Dashboard

Metrics, user management, live prompt editor, conversation inspector, and system config β€” all browser-based.

Full Observability

Langfuse traces every LLM call. Sentry captures errors with PII redacted. Structured JSON logs keyed by request ID.

Data Retention Cron Jobs

APScheduler auto-purges old messages, orphaned conversations, and stale logs on a configurable schedule.

Architecture

How the system is wired

Three Docker services communicating over REST and SSE, backed by PostgreSQL.

system-architecture.txt
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         Browser / Client                            β”‚
β”‚                    Next.js 14  (TypeScript)                         β”‚
β”‚         Chat UI Β· Store Directory Β· Settings Β· Admin Dashboard      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚  REST + SSE (streaming)
                             β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     FastAPI  (Python 3.11)                          β”‚
β”‚                                                                     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚  Auth / JWT  β”‚  β”‚  Chat / SSE  β”‚  β”‚ Store CRUD β”‚  β”‚  Admin   β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                           β”‚                                         β”‚
β”‚                    β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”                                  β”‚
β”‚                    β”‚ Orchestrator β”‚  ← deterministic state machine  β”‚
β”‚                    β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜                                  β”‚
β”‚                           β”‚  PydanticAI                             β”‚
β”‚              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                            β”‚
β”‚              β–Ό                         β–Ό                            β”‚
β”‚    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                β”‚
β”‚    β”‚  Anthropic Claudeβ”‚    β”‚  OpenAI GPT (fallback) β”‚               β”‚
β”‚    β”‚  (primary LLM)   β”‚    β”‚  via FallbackModel     β”‚               β”‚
β”‚    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β”œβ”€β”€β–Ί PostgreSQL 15  (stores, conversations, budget, audit log)
         β”œβ”€β”€β–Ί Langfuse       (LLM call tracing)
         β”œβ”€β”€β–Ί Sentry         (error tracking, PII-scrubbed)
         └──► Telegram Bot   (budget cap & system alerts)

Key Architectural Principle

The LLM is never trusted to make decisions. The agent extracts intent and generates natural language, but every consequential gate is enforced in Python: passphrase correctness, phone number validity, state transitions, budget enforcement, and duplicate detection. This means the system is correct even if the LLM hallucinates or misbehaves.

Agent Behavior

State machine + four tools

The agent is a deterministic state machine. The LLM never chooses the next state β€” Python does.

conversation-state-machine.txt
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚      IDLE       │◄────────────────────┐
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜                     β”‚
                             β”‚                              β”‚
            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”             β”‚
            β–Ό                β–Ό                β–Ό             β”‚
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
  β”‚COLLECTING_SAVE_  β”‚  β”‚OFF_SCOPEβ”‚  β”‚AWAITING_PASSPHRASEβ”‚   β”‚
  β”‚INFO              β”‚  β”‚_WARNED  β”‚  β”‚_FOR_RETRIEVE or   β”‚   β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜  β”‚_FOR_REVERSE       β”‚   β”‚
           β”‚                 β”‚ (Γ—3)  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
           └────────┐        β–Ό                β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    └─►│TERMINATEDβ”‚
                       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The agent has exactly four tools β€” no more, no less

save_store(store_name, phone)No gate

Saves a store after validating the phone to E.164 and checking for duplicates.

lookup_by_name(query)Passphrase

Fuzzy name search using pg_trgm trigram similarity.

lookup_by_phone(phone)Passphrase

Reverse lookup β€” find a store by its phone number, including partial digits.

terminate_conversation(reason)No gate

Ends the session and triggers automatic summary generation.

Intent Classification

Every user message is first classified by Claude Haiku (fast, cheap) before the main agent acts on it β€” separating routing from response generation.

saveUser wants to save one store
retrieve_by_nameLook up a store by name
retrieve_by_phoneFind a store by its phone number
multi_saveMultiple stores in one message
multi_operationMix of save + lookup in one message
off_scopeUnrelated to store management
terminateUser is done ("bye", "thanks")
clarificationAmbiguous β€” needs follow-up
Tech Stack

Built on proven open-source

Every dependency chosen for a specific purpose β€” no bloat, no experiments in production.

Backend

FastAPI 0.115PydanticAISQLAlchemy 2 (async)asyncpgAlembicfastapi-users 13phonenumbersAPSchedulerstructlogpandashttpxpydantic-settings

LLM Layer

Anthropic Claude SonnetClaude Haiku (classifier)OpenAI GPT (fallback)FallbackModelLangfuse 2.xSentry SDK

Frontend

Next.js 14 (App Router)TypeScript 5Tailwind CSS 3.4lucide-reactNative SSE

Infrastructure

PostgreSQL 15pg_trgm extensionDocker + ComposeRailwayTelegram Bot API
Database

7-table schema built for speed

PostgreSQL 15 with pg_trgm, async SQLAlchemy, JSONB columns, and explicit indexes on every hot path.

schema-diagram.txt
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  users                                                           β”‚
β”‚  id Β· email Β· hashed_password Β· passphrase_hash Β· created_at    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚ 1
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚ N            β”‚ N                 β”‚ N
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  stores            β”‚  β”‚  conversations   β”‚  β”‚  token_usage     β”‚
β”‚  store_name        β”‚  β”‚  state (FSM)     β”‚  β”‚  tokens_in/out   β”‚
β”‚  normalized_name   β”‚  β”‚  pending_action  β”‚  β”‚  cost_usd        β”‚
β”‚  phone_e164        β”‚  β”‚    JSONB         β”‚  β”‚  date            β”‚
β”‚  phone_original    β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚  deleted_at        β”‚           β”‚ 1
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                        β”‚ N                     β”‚ 1
               β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”
               β”‚  messages     β”‚  β”‚  conversation_      β”‚
               β”‚  role         β”‚  β”‚  summaries          β”‚
               β”‚  content      β”‚  β”‚  summary TEXT       β”‚
               β”‚  tool_calls   β”‚  β”‚  operations_count   β”‚
               β”‚    JSONB      β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚  latency_ms   β”‚
               β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  audit_log                   β”‚
β”‚  event_type Β· severity       β”‚
β”‚  source Β· details JSONB      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Table purposes

users

Accounts with passphrase fields separate from login password β€” a second independent secret layer for phone lookups.

stores

Core data. Each record has display name, normalized_name (search-ready), E.164 phone, and original phone format. Soft-deleted β€” never physically erased.

conversations

One row per chat session. Holds the state machine state and pending_action JSONB β€” a serialized intent paused waiting for passphrase.

messages

Every turn in every conversation. tool_calls JSONB records which tools fired and whether they succeeded. Latency and token counts are stored on every assistant turn.

conversation_summaries

One-to-one with conversations. Written by the LLM summarizer when a conversation ends. Kept separately so it can be queried cheaply.

token_usage

One row per user per day. Accumulates input tokens, output tokens, and USD cost. The budget enforcer queries this table with a 60-second in-process cache.

audit_log

Append-only log of security-relevant events. Used for fraud detection and compliance.

Why the design is fast

01

Load-bearing indexes

Every foreign key and hot query path has an explicit index. A GIN trigram index on normalized_name turns fuzzy search from a full table scan into a fast index lookup.

02

Async all the way down

Every database call uses SQLAlchemy's async engine (asyncpg driver). A single worker handles many concurrent requests β€” each await yields the thread while Postgres responds.

03

Batched parallel queries

The admin metrics endpoint runs 4 independent query groups simultaneously with asyncio.gather. The conversation list uses a single CTE instead of N+1 queries.

04

JSONB for flexible agent state

conversations.pending_action and messages.tool_calls are JSONB. The schema never needs a migration when the agent gains new tools or new state types.

05

Connection pooling

SQLAlchemy maintains a pool of persistent connections. New requests reuse existing connections rather than paying the TCP handshake cost on every request.

06

Soft deletes prevent cascades

Stores are soft-deleted (deleted_at IS NOT NULL) rather than physically removed. No cascading deletes; all queries simply add WHERE deleted_at IS NULL.

Engineering Challenges

Hard problems, real solutions

Every non-trivial problem encountered during development and how it was resolved.

Design Decisions

Why each choice was made

Architecture decisions with explicit rationale β€” not defaults or convention.

DecisionRationale
LLM not trusted for gatesPassphrase, validation, state transitions all enforced in Python β€” LLM can only generate text
PydanticAI for agentNative Pydantic v2 types; structured tool I/O; clean async streaming API
pg_trgm fuzzy searchHandles "costco" β†’ "Costco Wholesale" without a vector DB; no extra infrastructure
SSE over WebSocketStateless; no connection broker needed; native browser support; simpler auth
Claude Haiku for classifiers10Γ— cheaper than Sonnet; fast; intent classification is a simple enough task
FallbackModel for circuit breakerPydanticAI's built-in retry wrapper; no custom state machine for provider failover
In-process budget cacheAvoids a DB query on every request; 60s TTL is acceptable staleness for a soft cap
Soft delete for storesPreserves audit trail; admin can recover accidentally deleted entries
Single backend replicaAPScheduler requirement; called out explicitly so operators don't accidentally scale horizontally
Telegram over Slack for alertsSimpler setup (no workspace required); bot token + chat ID is sufficient; free
Observability

Nothing is a black box

Every LLM call is traced, every error captured, every request ID tracked end-to-end.

Langfuse Tracing

  • Input/output tokens per call
  • Model name + prompt version
  • Latency per turn
  • Tool calls and results
  • Conversation ID as trace group

Structured Logging

  • structlog with JSON output
  • Request ID on every line
  • User + conversation ID
  • Latency in ms per turn
  • Token counts per request

Health Check

  • GET /healthz endpoint
  • Database connectivity
  • Anthropic reachability
  • OpenAI reachability
  • Sentry status