Ask your AWS bill questions. In English.
A natural-language agent that infers resource ownership from CloudTrail, IAM, and cost behaviour using a Graph Neural Network. It answers questions like "why did our bill spike Tuesday?" with specific resources, teams, and dollar amounts.
Try it live
Chat with the agent over a synthetic 68-resource AWS account spanning 4 teams. The agent has 10 tools — it picks which to chain based on your question, hits the GraphSAGE-attributed scan, and answers in plain English.
Live demo runs on GPT-4o · ~$0.01 per question · rate-limited to 5/IP/hour
The problem
Every FinOps team's most painful metric: the percentage of AWS spend that can't be attributed to any team. Industry estimates put it at 40–60%. Tagging is the standard answer, but tags drift. Resources are created in a hurry. Engineers leave.
Existing FinOps dashboards (CloudHealth, Vantage, Apptio) are only as good as the tags you have — and on most accounts, the tags you have aren't enough. CostDNA is the input layer: a tool that infers the missing tags from behaviour, then lets you ask English questions about the result.
The audit story
I had a 97% accuracy result on Microsoft's published 2.6M-VM Azure dataset. I audited it. It was a tautology.
Across all 33,205 deployments in the Azure trace, 100% mapped 1:1 to a single subscription. So deployment_id, used as a graph edge, was a perfect lookup of subscription_id. LabelProp's "97%" was a graph database join, not learning.
Most engineers stop when they see a high accuracy number and ship it. I caught the leak by asking "are you sure the data is accurate?" The same audit on Microsoft's Philly DL trace surfaced another partial leak: 85% of users belong to a single virtual cluster. Three datasets, three different shortcuts, one consistent finding.
| Dataset | Resources | First-cut | Audited shortcut | Honest behavioural |
|---|---|---|---|---|
| Microsoft Azure | 2.6M VMs | 97% | deployment_id ≡ sub (100% deterministic) | 6.9% (12× random) |
| Microsoft Philly | 117K jobs | 89% | user_id → vc (85% deterministic) | 14% (2× random) |
The methodological finding: production cloud attribution is mostly a metadata-lookup problem. Behavioural fingerprinting matters specifically when metadata is missing or unreliable — exactly the gap CostDNA's synthetic env reproduces.
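The tautology is mechanical to catch once you know to look. A minimal sketch (toy data; `leakage_report` is a hypothetical helper, not the project's code) of the check that flags a feature like deployment_id as a deterministic label lookup:

```python
from collections import defaultdict

def leakage_report(rows, feature, label):
    """Fraction of `feature` values that map to exactly one `label` value.
    A value of 1.0 means the feature is a deterministic lookup of the
    label — any model using it as an edge is doing a join, not learning."""
    labels_per_value = defaultdict(set)
    for row in rows:
        labels_per_value[row[feature]].add(row[label])
    deterministic = sum(1 for s in labels_per_value.values() if len(s) == 1)
    return deterministic / len(labels_per_value)

# Toy trace shaped like the Azure case: every deployment_id
# belongs to exactly one subscription.
rows = [
    {"deployment_id": "d1", "subscription": "s1"},
    {"deployment_id": "d1", "subscription": "s1"},
    {"deployment_id": "d2", "subscription": "s2"},
]
print(leakage_report(rows, "deployment_id", "subscription"))  # → 1.0, total leak
```

Run before training, a report like this would have flagged both the Azure (100%) and Philly (85%) shortcuts immediately.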
How it works
Three layers, all cloud-agnostic. Only the collector layer (left) changes per cloud — the GNN and agent (right) are identical for AWS, Azure, and GCP.
Hardened boto3 / azure-mgmt / google-cloud collectors pull CloudTrail (or equivalent), Cost Explorer, IAM roles, VPC flow logs. Throttle-aware retry. AWS production-tested.
Behavioural features (peak_hour, write_ratio, event_diversity, …) + LLM-derived semantic features (sentence-transformer embeddings of IAM names) → a 2- to 4-layer GraphSAGE GNN that auto-shrinks for small label sets.
10-tool LLM agent (GPT-4o, function-calling) answers natural-language questions. Tools are pure data lookups against the trained scan output — fast, deterministic, auditable.
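As a rough illustration of the middle layer, here is a stdlib-only sketch of the three behavioural fingerprint features named above (the event shape and function name are assumptions for the example, not the production code):

```python
import math
from collections import Counter

def behavioural_features(events):
    """events: list of (hour_of_day, event_name, is_write) tuples
    drawn from CloudTrail for a single resource."""
    hours = Counter(h for h, _, _ in events)
    names = Counter(n for _, n, _ in events)
    # Hour with the most activity — separates batch jobs from daytime teams.
    peak_hour = hours.most_common(1)[0][0]
    # Share of mutating calls — separates producers from consumers.
    write_ratio = sum(1 for _, _, w in events if w) / len(events)
    # Shannon entropy of the event-name distribution — workload variety.
    total = sum(names.values())
    event_diversity = -sum((c / total) * math.log2(c / total)
                           for c in names.values())
    return {"peak_hour": peak_hour, "write_ratio": write_ratio,
            "event_diversity": event_diversity}

events = [(9, "PutObject", True), (9, "GetObject", False),
          (10, "GetObject", False), (9, "PutObject", True)]
feats = behavioural_features(events)
print(feats)  # peak_hour 9, write_ratio 0.5, event_diversity 1.0
```

Vectors like this, concatenated with the semantic embeddings, are what the GraphSAGE layer actually consumes.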
10 tools the agent chains
Each tool wraps a piece of the underlying CostDNA pipeline. The LLM decides which to call (or chain) based on the visitor's question.
- `summarize_account` · High-level rollup: total resources, by-team spend and confidence.
- `attribute_resource` · Look up which team owns a specific resource and the why-explanation.
- `top_spenders` · Top resources by total cost, optionally filtered by team.
- `find_cost_spikes` · Largest spikes + Granger-causality attribution to deploys.
- `find_anomalies` · Resources that don't fit any team — investigate manually.
- `search_resources` · Substring match across resource IDs.
- `signal_history` · Recent CloudTrail events + cost samples for one resource.
- `find_idle` · Low-activity resources to consider for cleanup.
- `compare_teams` · Side-by-side comparison: counts, spend, top resources, by type.
- `find_abandoned` · Resources whose activity has collapsed in the recent half of the window — likely abandoned. Sorted by spend.
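A minimal sketch of why this layer stays fast, deterministic, and auditable (teams, IDs, and numbers below are hypothetical toy data, not the real scan output): the LLM only picks a tool name and arguments; dispatch is a pure dictionary lookup.

```python
# Toy scan output, shaped like trained-model predictions per resource.
SCAN = {
    "i-0c4f3230": {"team": "payments", "cost": 412.50, "confidence": 0.91},
    "i-0aa11b22": {"team": None, "cost": 87.10, "confidence": 0.41},
}

def attribute_resource(resource_id):
    """Who owns this resource, and how sure is the model?"""
    r = SCAN.get(resource_id)
    if r is None:
        return None
    return {"resource": resource_id, "team": r["team"],
            "confidence": r["confidence"]}

def find_anomalies(threshold=0.7):
    """Resources the model isn't confident about — human-review queue."""
    return [rid for rid, r in SCAN.items() if r["confidence"] < threshold]

# The agent's function-calling loop resolves a tool name to a local function;
# no tool ever re-runs the model or touches the network.
TOOLS = {"attribute_resource": attribute_resource,
         "find_anomalies": find_anomalies}

def dispatch(name, **kwargs):
    return TOOLS[name](**kwargs)

print(dispatch("find_anomalies"))  # → ['i-0aa11b22']
```

Because every tool is a lookup over a frozen scan, identical questions give identical answers — which is what makes the agent's reasoning auditable after the fact.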
Real-AWS deployment
Provisioned a labelled AWS environment, ran per-team workload simulators on an always-on EC2 instance to generate authentic CloudTrail signal, then scanned the live account. Same code that powers the live demo.
Both wrong predictions came back with confidence below 0.7 and were correctly surfaced by find_anomalies for human review — exactly the active-learning workflow the system is designed for. Verification artifacts in docs/real-aws-evidence/.
Visual proof — embedding space
GraphSAGE learns a 2D-projected representation where same-team resources cluster together and unowned ones sit visibly separate.

Synthetic AWS env. The tan unowned cluster (vendor / legacy / orphan / shadow resources) sits visibly apart from the team clusters. The anomaly detector catches them automatically.
Multi-cloud architecture
The model + features + agent are cloud-agnostic — only the collector layer is provider-specific. AWS calls cloudtrail:LookupEvents; Azure calls monitor.activity_logs.list; GCP calls cloud_logging.list_entries. Same downstream pipeline.
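One way the collector boundary can be sketched (the interface and field names are assumptions for illustration, not CostDNA's actual schema): each provider normalises its raw events into one shared shape, and everything downstream sees only that shape.

```python
from typing import Iterable, Protocol

class EventCollector(Protocol):
    """The only provider-specific surface in the pipeline."""
    def events(self) -> Iterable[dict]: ...

class AwsCollector:
    """In production this would wrap a throttle-aware CloudTrail
    lookup; here the raw events are injected so the sketch runs offline."""
    def __init__(self, raw_events):
        self.raw = raw_events

    def events(self):
        # Normalise to the shared schema: who did what to which resource, when.
        for e in self.raw:
            yield {"actor": e["userIdentity"], "action": e["eventName"],
                   "resource": e["resourceId"], "time": e["eventTime"]}

raw = [{"userIdentity": "role/ci", "eventName": "RunInstances",
        "resourceId": "i-0c4f3230", "eventTime": "2024-06-01T09:00:00Z"}]
normalised = list(AwsCollector(raw).events())
print(normalised[0]["action"])  # → RunInstances
```

An Azure or GCP collector would implement the same `events()` contract over activity logs or Cloud Logging entries, which is why the GNN and agent layers never change per cloud.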
- AWS: 13/15 = 87% correct on the real account · live demo runs against this stack
- Azure: follows azure-mgmt-resource v25 + activity-logs + cost-management v4 patterns; mocked-shape tests pass
- GCP: follows google-cloud-asset v4 + cloud-logging protobuf payloads; mocked-shape tests pass
Run it yourself
Three usage patterns — CLI, REPL, web UI. All run against the same agent code that powers the live demo above.
A. One-shot CLI

```shell
$ costdna ask "why did our bill spike Tuesday?" \
    --from-dir runs/today
```

B. Multi-turn REPL

```shell
$ costdna chat --from-dir runs/today
[0] ❯ summarize this account
[1] ❯ which 5 resources are spending the most?
[2] ❯ tell me about i-0c4f3230 specifically
```

C. Web chat UI (Streamlit)

```shell
$ pip install 'costdna[ui,agent]'
$ costdna serve
# open http://localhost:8501 → "Chat with the agent" tab
```

D. Docker (no install)

```shell
$ docker run --rm pauti04/costdna scan --synthetic
```

Setup for the agent commands: `pip install 'costdna[agent]'` + `export OPENAI_API_KEY=...`
Stack
- ▸ Python 3.11 — pandas, numpy, scikit-learn, statsmodels, networkx
- ▸ PyTorch 2.x + PyTorch Geometric — GraphSAGE classifier, 2 to 4 layers, residual
- ▸ sentence-transformers — MiniLM embeddings of IAM role names + resource IDs + tags
- ▸ OpenAI SDK — function-calling agent loop, pluggable LLM backend
- ▸ boto3 (hardened) — adaptive retry, throttle-aware CloudTrail lookup_events
- ▸ azure-mgmt-* + google-cloud-* — multi-cloud collectors (lazy-loaded extras)
- ▸ Streamlit + Click + Rich — CLI commands + interactive chat UI
- ▸ Terraform — labelled AWS env with CloudTrail data events + VPC Flow Logs
- ▸ pytest + GitHub Actions — CI on every commit; Docker auto-publish on tag
- ▸ Next.js + Vercel — this landing page + serverless agent endpoint
Open source. Forkable. Ready.
MIT licensed, hardened collectors, multi-cloud architecture, real-AWS numbers in the README. If you're hiring for cloud-cost / FinOps / ML-infra roles — I'd like to do this kind of work full-time.