Consulting pillar · Architecture-first · Multi-cloud-native · Iceberg by default
Data & cloud architecture, done in the order agents will need it.
Most enterprises start AI work before their data architecture can support it. We start at the architecture, every time. Senior data and cloud engineers who design pipelines that survive contact with production, build the substrate enterprises need before anything autonomous can run on top, and have opinions about every tool they recommend.
Architecture is the prerequisite agents skip at their peril.
The single most common reason agentic AI projects get canceled is that the data architecture underneath can't support them. (Gartner predicts that more than 40% of agentic AI projects will be canceled by the end of 2027.) The same is true of dashboards that need to be trustworthy, of mobile apps that need real-time enterprise data, of any system that has to read consistent state and write idempotent actions.
We exist to fix the data layer first. The pattern is consistent: a two-week architecture review reveals the gaps; six to twelve weeks of focused data engineering work closes them; everything downstream — mobile, agents, AI — gets cheaper and more likely to ship.
Eight decisions. Not thirty logos.
Every consulting page on the internet lists the same logos. We list our actual decision criteria. Each row is the call we make in greenfield engagements; each is replaceable for the right reason.
| # | Layer | Default | When to swap |
|---|---|---|---|
| 1 | Warehouse / lakehouse engine | Snowflake for SQL analytics; Databricks for ML & agents; BigQuery if GCP-native | The full framework lives in our Snowflake vs. Databricks 2026 teardown. |
| 2 | Table format | Apache Iceberg | ~78% of new lakehouse deployments. Delta only if Databricks-tied and UniForm interop isn't sufficient. Hudi only in narrow legacy contexts. |
| 3 | Transformation | dbt | Move to SQLMesh once you exceed ~200 dbt models or the incrementality patterns start to bite. |
| 4 | Orchestration | Airflow for established teams; Dagster greenfield | Temporal when the orchestrator needs to survive crashes mid-workflow. Prefect if the team specifically asks for it. |
| 5 | Ingestion | Fivetran (managed), dlt (Python-native), Airbyte (open) | Pick by data-team profile, not by religion. The right answer depends on whether you have engineers to maintain pipelines. |
| 6 | Streaming | Confluent or MSK Kafka by default; Kinesis if AWS-tied; Redpanda if latency is the constraint | Streaming decisions are almost always cloud-anchored. Multi-cloud streaming is harder than it looks; pick once and commit. |
| 7 | Quality / observability | Monte Carlo at production scale; Great Expectations for early-stage | Bigeye and Soda are credible alternatives. Datafold for data diffs in CI. |
| 8 | Vector layer | pgvector first (when Postgres is already in the stack) | Promote to Pinecone or Weaviate when scale or query patterns demand. Qdrant if self-hosted is a hard requirement. |
None of these are arbitrary. We've made each call enough times to know when to break it. Most consulting pitches list every tool they've ever touched; we list the call we'd make for you, with the reasons — and we'll change it on the right argument.
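To make a few of those rows concrete, here is a minimal sketch of rows 2 to 4 written as plain Python rules. It is an illustration, not a tool we ship: the `StackProfile` fields, the function names, and the hard cut-off at 200 models are all assumptions made for the example. The real calls involve judgment the code leaves out.

```python
from dataclasses import dataclass


@dataclass
class StackProfile:
    """Hypothetical description of a client stack; every field name is illustrative."""
    dbt_model_count: int = 0
    incrementality_pain: bool = False          # incremental patterns have started to bite
    databricks_tied: bool = False
    uniform_interop_sufficient: bool = True    # UniForm covers the interop need
    greenfield: bool = True
    needs_crash_durable_workflows: bool = False


def pick_table_format(p: StackProfile) -> str:
    # Row 2: Iceberg by default; Delta only when Databricks-tied and UniForm falls short.
    if p.databricks_tied and not p.uniform_interop_sufficient:
        return "Delta"
    return "Apache Iceberg"


def pick_transformation(p: StackProfile, model_threshold: int = 200) -> str:
    # Row 3: dbt by default; SQLMesh past roughly 200 models or once incrementality hurts.
    # The table says "~200"; the exact threshold here is an assumption.
    if p.dbt_model_count > model_threshold or p.incrementality_pain:
        return "SQLMesh"
    return "dbt"


def pick_orchestrator(p: StackProfile) -> str:
    # Row 4: Temporal when workflows must survive crashes mid-run;
    # otherwise Dagster for greenfield, Airflow for established teams.
    if p.needs_crash_durable_workflows:
        return "Temporal"
    return "Dagster" if p.greenfield else "Airflow"


if __name__ == "__main__":
    profile = StackProfile(dbt_model_count=250, greenfield=False)
    print(pick_table_format(profile), pick_transformation(profile), pick_orchestrator(profile))
```

The point of writing it this way is that each default has an explicit, arguable condition attached. Change the condition and the answer changes with it.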
Architecture Readiness Sprint.
Two weeks. Fixed scope. No prerequisites.
An engagement of 10 working days that opens with the right question: what's in your data architecture today, and what does it need to be before anything ambitious on top of it has a chance?
About a third of these conclude that the agentic AI project (or mobile rebuild, or analytics initiative) you were planning doesn't need a different vendor — it needs six to twelve weeks of data architecture work first.
Deliverable: an audit of the current data architecture · a written 12-month roadmap (what to build, in what order, with what tradeoffs) · an honest answer on whether the project on the other side of the sprint is actually ready.
Request a scoped proposal →
The work that follows.
Once the architecture is clear, the engagement shapes around the work:
- Data platform builds and migrations — production-grade Snowflake or Databricks (or both — see the teardown), with Iceberg, dbt, and observability from day one.
- Lakehouse adoption — Iceberg migrations from legacy formats; UniForm interop for hybrid stacks.
- Governance layer implementation — Atlan or Collibra alongside Horizon or Unity Catalog; data contracts at the source.
- AI-ready data preparation — the substrate work that makes agentic AI possible (semantic layer, vector strategy, lineage, freshness SLAs).
- dbt-to-SQLMesh transitions — for teams past ~200 dbt models hitting the incrementality wall.
- Architecture rescues — projects where the data layer is the bottleneck and needs honest help. We tell you up front whether it can be saved inside this engagement, or whether it's a longer conversation.
What we don't do.
A short list, because saying no is part of the work:
- We don't build vanity dashboards.
- We don't take agentic AI work without a data architecture review first. Most of the 40% would not have been in the 40% if their consulting firm had said this.
- We don't engage with fewer than 8 weeks of runway, because nothing that matters can be built in 4.
- We don't run "modernization" engagements that produce slides instead of pipelines.
- We don't recommend a tool because we have a relationship with the vendor.
Two engineers. One honest answer.
A 30-minute briefing with the people who'd actually do the work. If the architecture is fine and you don't need us, we'll tell you. If it isn't, we'll tell you that too — and give you a written 12-month roadmap in 10 working days.
Request a scoped proposal →