Production-grade AI · Full-stack · Observable

StoreAgent

A conversational AI agent that manages a personal store directory. Users save stores and retrieve them through natural language — with streaming responses, CSV bulk import, a full admin dashboard, and end-to-end observability.

Dual-model agent · PostgreSQL + pg_trgm · Langfuse tracing · Provider failover

Agent tools: 4

Database tables: 7

Intent classes: 8

Avg turns per save

Features

Everything in one place

Built to handle the full lifecycle — from natural language input to structured storage, retrieval, and observability.

Conversational Save & Lookup

Natural language works out of the box — "Save ACME Hardware, 415-555-0198" or "What's the number for Costco?"

Streaming Responses

Responses stream token-by-token over SSE. No polling, no loading spinners — output appears as the model generates it.

E.164 Phone Normalization

Google's libphonenumber normalizes every input to E.164 (+14155550198) regardless of how the user typed it.

CSV Bulk Import

Upload a .csv file and every store is extracted, phone-validated, and saved automatically — up to 500 rows per file.

Fuzzy Name Search

pg_trgm provides trigram-based similarity search so "costco" finds "Costco Wholesale" without a vector database.
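To show why the match works, this pure-Python function approximates pg_trgm's `similarity()`: lowercase the text, split it into alphanumeric words, pad each word with two leading spaces and one trailing space, collect the three-character substrings, and divide shared trigrams by the number of distinct trigrams across both strings. It is a sketch for intuition only; inside Postgres the same comparison is `similarity(normalized_name, :query)` or the `%` operator, whose default threshold is 0.3.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm's trigram extraction for a string."""
    grams: set[str] = set()
    for word in re.findall(r"[^\W_]+", text.lower()):  # alphanumeric words only
        padded = f"  {word} "  # two spaces before, one after, as pg_trgm does
        grams.update(padded[i : i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by distinct trigrams across both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

Here "costco" yields 7 trigrams, all of which appear among the 17 distinct trigrams of "Costco Wholesale", so the score is 7/17 ≈ 0.41, comfortably above the 0.3 default.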

Provider Failover

If the primary LLM provider is unavailable, PydanticAI's FallbackModel automatically retries with a secondary provider.
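The failover idea itself is small. In PydanticAI it is configured by passing the primary and secondary models to `FallbackModel`; the stdlib sketch below does not reproduce that API and only illustrates the underlying pattern: try each provider in order and move on when one fails. The provider callables and the `ProviderError` type are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

# A provider is any async callable mapping a prompt to a completion (hypothetical shape).
Provider = Callable[[str], Awaitable[str]]


class ProviderError(Exception):
    """Hypothetical error a provider raises when it is unavailable."""


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""

    def __init__(self, errors: list[Exception]) -> None:
        super().__init__(f"all {len(errors)} provider(s) failed")
        self.errors = errors


async def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> str:
    """Try providers in order and return the first successful completion."""
    errors: list[Exception] = []
    for call in providers:
        try:
            return await call(prompt)
        except ProviderError as exc:  # only provider outages trigger failover
            errors.append(exc)
    raise AllProvidersFailed(errors)


if __name__ == "__main__":
    async def primary(prompt: str) -> str:
        raise ProviderError("503 from primary")

    async def secondary(prompt: str) -> str:
        return f"secondary answered: {prompt}"

    print(asyncio.run(complete_with_fallback("hi", [primary, secondary])))
```

Catching only the provider-level error type matters: a bug in the prompt or tool code should surface immediately rather than being retried against a second vendor.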

Admin Dashboard

Metrics, user management, live prompt editor, conversation inspector, and system config — all browser-based.

Full Observability

Langfuse traces every LLM call. Sentry captures errors with PII redacted. Structured JSON logs keyed by request ID.

Data Retention Cron Jobs

APScheduler auto-purges old messages, orphaned conversations, and stale logs on a configurable schedule.
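Each purge job boils down to a time-bounded delete. This sketch uses an in-memory SQLite database as a stand-in for PostgreSQL; the 90-day window and the `created_at` / `conversation_id` columns are hypothetical. In the service, a function like this would be registered as an APScheduler job with a cron or interval trigger.

```python
import sqlite3
from datetime import datetime, timedelta

RETENTION_DAYS = 90  # hypothetical; the real window is configurable


def purge_old_data(conn: sqlite3.Connection, now: datetime) -> tuple[int, int]:
    """Delete expired messages, then old conversations left with no messages."""
    cutoff = (now - timedelta(days=RETENTION_DAYS)).isoformat()
    with conn:  # both deletes commit together or not at all
        msgs = conn.execute(
            "DELETE FROM messages WHERE created_at < ?", (cutoff,)
        ).rowcount
        # Only old conversations count as orphans, so brand-new empty ones survive.
        convs = conn.execute(
            "DELETE FROM conversations WHERE created_at < ? AND NOT EXISTS "
            "(SELECT 1 FROM messages m WHERE m.conversation_id = conversations.id)",
            (cutoff,),
        ).rowcount
    return msgs, convs
```

Deleting messages first means a conversation whose last message just expired becomes an orphan and is removed in the same run.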

Architecture

How the system is wired

Three Docker services communicating over REST and SSE, backed by PostgreSQL.

system-architecture.txt

┌─────────────────────────────────────────────────────────────────────┐
│                         Browser / Client                            │
│                    Next.js 14  (TypeScript)                         │
│         Chat UI · Store Directory · Settings · Admin Dashboard      │
└────────────────────────────┬────────────────────────────────────────┘
                             │  REST + SSE (streaming)
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     FastAPI  (Python 3.11)                          │
│                                                                     │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────┐  ┌──────────┐   │
│  │  Auth / JWT  │  │  Chat / SSE  │  │ Store CRUD │  │  Admin   │   │
│  └──────────────┘  └──────┬───────┘  └────────────┘  └──────────┘   │
│                           │                                         │
│                    ┌──────▼───────┐                                 │
│                    │ Orchestrator │  ← deterministic state machine  │
│                    └──────┬───────┘                                 │
│                           │  PydanticAI                             │
│              ┌────────────┴────────────┐                            │
│              ▼                         ▼                            │
│    ┌──────────────────┐    ┌───────────────────────┐                │
│    │ Anthropic Claude │    │ OpenAI GPT (fallback) │                │
│    │ (primary LLM)    │    │ via FallbackModel     │                │
│    └──────────────────┘    └───────────────────────┘                │
└────────┬───────────────────────────────────────────────────────────┘
         │
         ├──► PostgreSQL 15  (stores, conversations, budget, audit log)
         ├──► Langfuse       (LLM call tracing)
         ├──► Sentry         (error tracking, PII-scrubbed)
         └──► Telegram Bot   (budget cap & system alerts)

Key Architectural Principle

The LLM is never trusted with consequential decisions. The agent extracts intent and generates natural language, but every gate is enforced in Python: passphrase correctness, phone number validity, state transitions, budget enforcement, and duplicate detection. Those guarantees therefore hold even if the LLM hallucinates or misbehaves.

Agent Behavior

State machine + four tools

The agent is a deterministic state machine. The LLM never chooses the next state — Python does.

conversation-state-machine.txt

                    ┌─────────────────┐
                    │      IDLE       │◄────────────────────┐
                    └────────┬────────┘                     │
                             │                              │
            ┌────────────────┼────────────────┐             │
            ▼                ▼                ▼             │
  ┌──────────────────┐  ┌─────────┐  ┌───────────────────┐  │
  │COLLECTING_SAVE_  │  │OFF_SCOPE│  │AWAITING_PASSPHRASE│  │
  │INFO              │  │_WARNED  │  │_FOR_RETRIEVE or   │  │
  └────────┬─────────┘  └────┬────┘  │_FOR_REVERSE       │  │
           │                 │ (×3)  └────────┬──────────┘  │
           └────────┐        ▼                └─────────────┘
                    │  ┌──────────┐
                    └─►│TERMINATED│
                       └──────────┘
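The transition rules can be written as plain Python so that model output never selects a state directly. This is a sketch under assumptions: the state names come from the diagram, while the intent-to-state mapping, the off-scope counter, the `next_state` signature, and reading "(×3)" as "terminate on the third off-scope message" are all hypothetical.

```python
from enum import Enum


class State(str, Enum):
    IDLE = "IDLE"
    COLLECTING_SAVE_INFO = "COLLECTING_SAVE_INFO"
    AWAITING_PASSPHRASE_FOR_RETRIEVE = "AWAITING_PASSPHRASE_FOR_RETRIEVE"
    AWAITING_PASSPHRASE_FOR_REVERSE = "AWAITING_PASSPHRASE_FOR_REVERSE"
    OFF_SCOPE_WARNED = "OFF_SCOPE_WARNED"
    TERMINATED = "TERMINATED"


MAX_OFF_SCOPE = 3  # assumed reading of the "(×3)" edge in the diagram

# Hypothetical mapping from classified intent to the next state.
_ROUTES = {
    "save": State.COLLECTING_SAVE_INFO,
    "retrieve_by_name": State.AWAITING_PASSPHRASE_FOR_RETRIEVE,
    "retrieve_by_phone": State.AWAITING_PASSPHRASE_FOR_REVERSE,
}


def next_state(state: State, intent: str, off_scope_count: int) -> tuple[State, int]:
    """Choose the next state in Python from the classified intent."""
    if state is State.TERMINATED or intent == "terminate":
        return State.TERMINATED, off_scope_count
    if intent == "off_scope":
        count = off_scope_count + 1
        if count >= MAX_OFF_SCOPE:
            return State.TERMINATED, count
        return State.OFF_SCOPE_WARNED, count
    # Intents not modeled in this sketch fall back to IDLE.
    return _ROUTES.get(intent, State.IDLE), off_scope_count
```

Because the table is ordinary code, every transition can be unit-tested without calling a model.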

The agent has exactly four tools — no more, no less

save_store(store_name, phone) · No gate

Saves a store after validating the phone to E.164 and checking for duplicates.

lookup_by_name(query) · Passphrase required

Fuzzy name search using pg_trgm trigram similarity.

lookup_by_phone(phone) · Passphrase required

Reverse lookup — find a store by its phone number, including partial digits.

terminate_conversation(reason) · No gate

Ends the session and triggers automatic summary generation.

Intent Classification

Every user message is first classified by Claude Haiku (fast, cheap) before the main agent acts on it — separating routing from response generation.

save · User wants to save one store

retrieve_by_name · Look up a store by name

retrieve_by_phone · Find a store by its phone number

multi_save · Multiple stores in one message

multi_operation · Mix of save + lookup in one message

off_scope · Unrelated to store management

terminate · User is done ("bye", "thanks")

clarification · Ambiguous; needs follow-up
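Because the classifier is itself an LLM, its output still has to be validated in Python before anything is routed on it. This sketch assumes the classifier returns a bare label string (structured output from PydanticAI may already constrain this); any label outside the eight classes above is coerced to `clarification` rather than trusted. The `parse_intent` name is hypothetical.

```python
from enum import Enum


class Intent(str, Enum):
    SAVE = "save"
    RETRIEVE_BY_NAME = "retrieve_by_name"
    RETRIEVE_BY_PHONE = "retrieve_by_phone"
    MULTI_SAVE = "multi_save"
    MULTI_OPERATION = "multi_operation"
    OFF_SCOPE = "off_scope"
    TERMINATE = "terminate"
    CLARIFICATION = "clarification"


def parse_intent(raw_label: str) -> Intent:
    """Map raw classifier output to a known Intent, defaulting to CLARIFICATION."""
    label = raw_label.strip().strip('"').lower()  # tolerate quotes and casing
    try:
        return Intent(label)
    except ValueError:
        return Intent.CLARIFICATION  # never route on an unrecognized label
```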

Tech Stack

Built on proven open-source

Every dependency chosen for a specific purpose — no bloat, no experiments in production.

Backend

FastAPI 0.115 · PydanticAI · SQLAlchemy 2 (async) · asyncpg · Alembic · fastapi-users 13 · phonenumbers · APScheduler · structlog · pandas · httpx · pydantic-settings

LLM Layer

Anthropic Claude Sonnet · Claude Haiku (classifier) · OpenAI GPT (fallback) · FallbackModel · Langfuse 2.x · Sentry SDK

Frontend

Next.js 14 (App Router) · TypeScript 5 · Tailwind CSS 3.4 · lucide-react · Native SSE

Infrastructure

PostgreSQL 15 · pg_trgm extension · Docker + Compose · Railway · Telegram Bot API

Database

7-table schema built for speed

PostgreSQL 15 with pg_trgm, async SQLAlchemy, JSONB columns, and explicit indexes on every hot path.

schema-diagram.txt

┌──────────────────────────────────────────────────────────────────┐
│  users                                                           │
│  id · email · hashed_password · passphrase_hash · created_at    │
└────────────────────────────┬─────────────────────────────────────┘
                             │ 1
              ┌──────────────┼───────────────────┐
              │ N            │ N                 │ N
┌─────────────▼──────┐  ┌───▼──────────────┐  ┌─▼────────────────┐
│  stores            │  │  conversations   │  │  token_usage     │
│  store_name        │  │  state (FSM)     │  │  tokens_in/out   │
│  normalized_name   │  │  pending_action  │  │  cost_usd        │
│  phone_e164        │  │    JSONB         │  │  date            │
│  phone_original    │  └────────┬─────────┘  └──────────────────┘
│  deleted_at        │           │ 1
└────────────────────┘  ┌────────┴──────────────┐
                        │ N                     │ 1
               ┌────────▼──────┐  ┌─────────────▼──────┐
               │  messages     │  │  conversation_     │
               │  role         │  │  summaries         │
               │  content      │  │  summary TEXT      │
               │  tool_calls   │  │  operations_count  │
               │    JSONB      │  └────────────────────┘
               │  latency_ms   │
               └───────────────┘
┌──────────────────────────────┐
│  audit_log                   │
│  event_type · severity       │
│  source · details JSONB      │
└──────────────────────────────┘

Table purposes

users

Accounts, with a passphrase hash kept separate from the login password: a second, independent secret that gates store lookups by name or by phone.

stores

Core data. Each record holds a display name, a search-ready normalized_name, the E.164 phone, and the phone as originally typed. Rows are soft-deleted, never physically erased.

conversations

One row per chat session. Holds the current state-machine state and pending_action, a JSONB-serialized intent paused until the user supplies the passphrase.

messages

Every turn in every conversation. tool_calls JSONB records which tools fired and whether they succeeded. Latency and token counts are stored on every assistant turn.

conversation_summaries

One-to-one with conversations. Written by the LLM summarizer when a conversation ends. Kept separately so it can be queried cheaply.

token_usage

One row per user per day. Accumulates input tokens, output tokens, and USD cost. The budget enforcer queries this table with a 60-second in-process cache.
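The 60-second in-process cache can be as small as a dict of (expiry, value) pairs. In this sketch the `fetch_spend` callable stands in for the `token_usage` query, the cap is passed in by the caller, and an injectable clock keeps it testable; all of these names are hypothetical. Because the cache lives in process memory, each backend process keeps its own copy.

```python
import time
from typing import Callable

TTL_SECONDS = 60.0  # staleness window described above


class BudgetCache:
    """Caches each user's spend for today for TTL_SECONDS."""

    def __init__(self, fetch_spend: Callable[[str], float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_spend  # stand-in for the token_usage query
        self._clock = clock
        self._entries: dict[str, tuple[float, float]] = {}  # user -> (expires_at, usd)

    def spend_today(self, user_id: str) -> float:
        now = self._clock()
        hit = self._entries.get(user_id)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh cache entry: no database round trip
        usd = self._fetch(user_id)  # miss or expired: query the database
        self._entries[user_id] = (now + TTL_SECONDS, usd)
        return usd

    def over_budget(self, user_id: str, cap_usd: float) -> bool:
        return self.spend_today(user_id) >= cap_usd
```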

audit_log

Append-only log of security-relevant events. Used for fraud detection and compliance.

Why the design is fast

Load-bearing indexes

Every foreign key and hot query path has an explicit index. A GIN trigram index on normalized_name turns fuzzy search from a full table scan into a fast index lookup.

Async all the way down

Every database call uses SQLAlchemy's async engine (asyncpg driver). A single worker handles many concurrent requests: each await hands control back to the event loop while Postgres responds.

Batched parallel queries

The admin metrics endpoint runs 4 independent query groups concurrently with asyncio.gather. The conversation list uses a single CTE instead of N+1 queries.
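The concurrency pattern, sketched with `asyncio.sleep` standing in for four independent query groups; the group names and timings are hypothetical. With `asyncio.gather` the wall time is roughly that of the slowest group rather than the sum. With SQLAlchemy, each concurrent query group needs its own session, because a single `AsyncSession` must not be shared across concurrent tasks.

```python
import asyncio
import time


async def query_group(name: str, seconds: float) -> tuple[str, float]:
    """Stand-in for one metrics query group; returns (name, simulated duration)."""
    await asyncio.sleep(seconds)  # simulates waiting on Postgres
    return name, seconds


async def admin_metrics() -> dict[str, float]:
    # Hypothetical group names; all four awaitables run concurrently.
    results = await asyncio.gather(
        query_group("users", 0.2),
        query_group("conversations", 0.2),
        query_group("token_usage", 0.2),
        query_group("audit_log", 0.2),
    )
    return dict(results)


if __name__ == "__main__":
    start = time.perf_counter()
    metrics = asyncio.run(admin_metrics())
    print(sorted(metrics), f"{time.perf_counter() - start:.2f}s")  # roughly 0.2s, not 0.8s
```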

JSONB for flexible agent state

conversations.pending_action and messages.tool_calls are JSONB. The schema never needs a migration when the agent gains new tools or new state types.

Connection pooling

SQLAlchemy maintains a pool of persistent connections. New requests reuse existing connections rather than paying connection setup (TCP handshake plus Postgres authentication) on every request.

Soft deletes prevent cascades

Stores are soft-deleted (deleted_at IS NOT NULL) rather than physically removed. No cascading deletes; all queries simply add WHERE deleted_at IS NULL.

Engineering Challenges

Hard problems, real solutions

Every non-trivial problem encountered during development and how it was resolved.

Design Decisions

Why each choice was made

Architecture decisions with explicit rationale — not defaults or convention.

Decision	Rationale
LLM not trusted for gates	Passphrase, validation, state transitions all enforced in Python — LLM can only generate text
PydanticAI for agent	Native Pydantic v2 types; structured tool I/O; clean async streaming API
pg_trgm fuzzy search	Handles "costco" → "Costco Wholesale" without a vector DB; no extra infrastructure
SSE over WebSocket	Stateless; no connection broker needed; native browser support; simpler auth
Claude Haiku for classifiers	10× cheaper than Sonnet; fast; intent classification is a simple enough task
FallbackModel for provider failover	PydanticAI's built-in fallback wrapper tries the next model when the current one fails; no custom state machine for provider failover
In-process budget cache	Avoids a DB query on every request; 60s TTL is acceptable staleness for a soft cap
Soft delete for stores	Preserves audit trail; admin can recover accidentally deleted entries
Single backend replica	APScheduler runs jobs in-process, so every extra replica would run each scheduled job again; called out explicitly so operators don't accidentally scale horizontally
Telegram over Slack for alerts	Simpler setup (no workspace required); bot token + chat ID is sufficient; free

Observability

Nothing is a black box

Every LLM call is traced, every error captured, every request ID tracked end-to-end.

Langfuse Tracing

Input/output tokens per call
Model name + prompt version
Latency per turn
Tool calls and results
Conversation ID as trace group

Structured Logging

structlog with JSON output
Request ID on every line
User + conversation ID
Latency in ms per turn
Token counts per request
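The project uses structlog; the stdlib-only sketch below shows the same idea without it. A `contextvars.ContextVar` holds the request ID for the current async task, and a JSON formatter stamps it on every line. The field names, the `fields` extra key, and the `request_id_var` / `storeagent` names are assumptions.

```python
import contextvars
import json
import logging

# Per-task request ID; each request handler would set this on entry (assumed design).
request_id_var: contextvars.ContextVar[str] = contextvars.ContextVar(
    "request_id", default="-"
)


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object including the current request ID."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname.lower(),
            "event": record.getMessage(),
            "request_id": request_id_var.get(),
        }
        # Merge structured fields passed as logger.info(..., extra={"fields": {...}}).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def make_logger() -> logging.Logger:
    logger = logging.getLogger("storeagent")
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers[:] = [handler]  # replace handlers so repeated calls don't duplicate
    logger.setLevel(logging.INFO)
    return logger
```

Usage: after `request_id_var.set("req-123")`, a call like `make_logger().info("turn_complete", extra={"fields": {"latency_ms": 842}})` emits one JSON line carrying both the request ID and the latency.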

Health Check

GET /healthz endpoint
Database connectivity
Anthropic reachability
OpenAI reachability
Sentry status
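A health endpoint like this usually runs its probes concurrently, each with a timeout, so one slow dependency cannot stall the response. The sketch below is framework-agnostic; the probe callables, the 2-second default timeout, and the `ok` / `degraded` response shape are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

Check = Callable[[], Awaitable[None]]  # a probe raises on failure


async def _run(name: str, check: Check, timeout: float) -> tuple[str, str]:
    try:
        await asyncio.wait_for(check(), timeout)
        return name, "ok"
    except asyncio.TimeoutError:  # must precede the generic handler below
        return name, "timeout"
    except Exception as exc:  # report any failure instead of crashing /healthz
        return name, f"error: {type(exc).__name__}"


async def healthz(checks: dict[str, Check], timeout: float = 2.0) -> dict:
    """Run all probes concurrently and summarize their results."""
    results = dict(await asyncio.gather(
        *(_run(name, check, timeout) for name, check in checks.items())
    ))
    status = "ok" if all(v == "ok" for v in results.values()) else "degraded"
    return {"status": status, "checks": results}
```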