AI: An Alian Software company

Labs

What we're trying between engagements.

Internal experiments. Some land in client work. Some get shelved. We share them so you can see how we think about the problems we're paid to solve — and how we keep getting better at them.

  • Cross-tenant evaluation suite

    In progress

    A single eval harness that scores AI agents across anonymized client deployments — same test cases, different prompts and contexts. Lets us catch drift across the whole portfolio in one place.

    Why: Drift is the boring failure mode that kills agents. Centralized eval lets a 6-person team watch 20+ agents responsibly.

  • Synthetic conversation generator

    In progress

    Programmatically generate adversarial conversations for support agents — confused users, multi-turn manipulation, off-policy requests. Used to grow eval suites without waiting for production failures.

    Why: 20 hand-picked test cases cover the obvious. Synthetic generation extends coverage into the long tail.

  • Open-source on-prem reference stack

    In progress

    End-to-end on-prem AI stack — vLLM for inference, pgvector for retrieval, n8n for orchestration, Langfuse for observability. We deploy variants of this for clients who can't send data to a cloud LLM.

    Why: On-prem AI is harder than people admit. Having a reference deployment shortens client engagements meaningfully.

  • Agent cost telemetry SDK

    Shipped

    Lightweight TS/Python library that wraps Anthropic and OpenAI API calls and emits token cost per conversation, per user, and per workflow to your observability tool. We use it in every engagement now.

    Why: Clients deserve to know what every conversation costs. Most observability tools don't make that easy.

  • Multi-modal QC for SMB manufacturing

    In progress

    YOLO + vision-LLM hybrid for low-volume QC — uses YOLO for the common defects (fast, cheap) and a vision-LLM for the uncertain cases (slower, smarter). Pilot running in two plants.

    Why: Pure YOLO needs too much labeled data for SMB volumes. Pure VLM is too slow for production lines. Hybrid is the right answer.

  • Local-first agent demos

    Shelved

    Idea: tiny agents that run fully in-browser via WebGPU for prospects who want to test without sending data anywhere. Shelved because models small enough to run in-browser on WebGPU aren't good enough yet.

    Why: We'll revisit when the local model quality catches up — probably late 2026.

  • Agent contract testing

    In progress

    Treating agents like services with contracts. Schema-validated tool calls, OpenAPI-style interface docs, contract tests in CI that fail builds if the agent breaks its interface.

    Why: Agents-as-services scales better than agents-as-magic. The boring infrastructure pattern is the right one.
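
To make the agent contract testing idea above concrete, here is a minimal sketch of a contract check that could run in CI. Everything named in it is an assumption for illustration: the `issue_refund` tool, its fields, the `ToolContract` class, and the `contract_violations` helper are hypothetical, and the hand-rolled type check stands in for whatever schema validator a real setup would use (most likely a JSON Schema library).

```python
from dataclasses import dataclass, field


@dataclass
class ToolContract:
    """Hypothetical contract for one tool: its name plus field -> expected type."""
    name: str
    required: dict
    optional: dict = field(default_factory=dict)


# Illustrative contract only; the tool and its fields are made up for this sketch.
REFUND_CONTRACT = ToolContract(
    name="issue_refund",
    required={"order_id": str, "amount_cents": int},
    optional={"reason": str},
)


def _type_ok(value, expected):
    # bool is a subclass of int in Python, so reject it unless bool is expected.
    if isinstance(value, bool) and expected is not bool:
        return False
    return isinstance(value, expected)


def contract_violations(call, contract):
    """Return human-readable violations; an empty list means the call conforms."""
    errors = []
    if call.get("tool") != contract.name:
        errors.append(f"unexpected tool: {call.get('tool')!r}")
    args = call.get("arguments", {})
    allowed = {**contract.required, **contract.optional}
    for name in contract.required:
        if name not in args:
            errors.append(f"missing required field: {name}")
    for name, value in args.items():
        if name not in allowed:
            errors.append(f"unknown field: {name}")
        elif not _type_ok(value, allowed[name]):
            errors.append(
                f"wrong type for {name}: expected {allowed[name].__name__}"
            )
    return errors


def test_recorded_calls_honor_contract():
    # In CI these would be tool calls captured from the agent under test;
    # here a single hard-coded call stands in for that recording.
    recorded = [
        {"tool": "issue_refund",
         "arguments": {"order_id": "A-1", "amount_cents": 500}},
    ]
    for call in recorded:
        assert contract_violations(call, REFUND_CONTRACT) == []
```

A test runner such as pytest would pick up `test_recorded_calls_honor_contract` and fail the build whenever a recorded call drifts from the contract, for example by sending `amount_cents` as a string or adding a field the contract does not list.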

Have a problem worth experimenting on?

We pilot interesting projects at a discount when the IP we earn is valuable to the rest of our client base. Tell us what you've got.