dbt for Analysts: How to Transform and Document Your Data Without a Data Engineer
Learn how dbt empowers data analysts to build reliable, documented data models in SQL without relying on a data engineering team.
Stop Waiting on Data Engineering: How dbt Puts Analysts in Control
If you’ve ever raised a Jira ticket to get a transformation done, waited two weeks for it to be prioritised, and then watched the business make decisions on stale data in the meantime — this post is for you.
dbt (data build tool) has quietly become one of the most important tools in the modern data stack. And the best part? You don’t need to be a data engineer to use it. If you can write SQL, you can use dbt to build clean, reliable, documented data models that your whole organisation can trust.
This guide is aimed at analysts — business analysts, data analysts, BI developers, and QA professionals who work with data every day but have traditionally handed off transformation work to engineers. Let’s change that.
What Is dbt, and Why Should Analysts Care?
dbt is an open-source transformation tool that sits on top of your data warehouse (Snowflake, BigQuery, Redshift, DuckDB, and others). It takes SQL SELECT statements and turns them into tables or views in your warehouse — with version control, testing, and documentation built in.
Here’s what makes it genuinely useful for analysts:
- You write SQL. No Python required (though it supports Jinja templating, which we’ll get to).
- It runs in your warehouse. There’s no separate compute layer to manage.
- It documents itself — or rather, it makes documentation easy enough that you’ll actually do it.
- dbt Core is free. dbt Cloud offers a managed UI with extras, but the core tool costs nothing.
The mental model shift is this: instead of writing a one-off query and saving it in a shared folder nobody can find, you write a dbt model — a .sql file tracked in version control — that becomes a permanent, tested, documented asset.
Setting Up Your First dbt Project
Installing dbt Core
You’ll need Python installed. From there, it’s a pip install:
pip install dbt-bigquery
(Replace bigquery with your warehouse adapter — dbt-snowflake, dbt-duckdb, etc.)
Initialise a new project:
dbt init my_analytics_project
This creates a folder structure with a models/ directory — which is where all your SQL files will live.
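On recent dbt versions, the generated layout looks roughly like this (folder names can vary slightly between versions):

```
my_analytics_project/
├── dbt_project.yml    # project configuration
├── models/            # your SQL models live here
├── seeds/             # small CSV reference data
├── snapshots/         # point-in-time captures of changing tables
├── macros/            # reusable Jinja snippets
├── analyses/          # ad-hoc SQL that isn't materialised
└── tests/             # custom data tests
```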
Connecting to Your Warehouse
dbt uses a profiles.yml file to store your connection credentials. You configure it once, and then every model you write will automatically run against your warehouse. Most cloud warehouses support OAuth or service account authentication, so you’re not storing passwords in plain text.
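As a rough illustration, a BigQuery profile might look like the sketch below — the project and dataset names are placeholders, not real values:

```yaml
# ~/.dbt/profiles.yml
my_analytics_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth              # browser-based auth; no password stored
      project: my-gcp-project    # placeholder GCP project ID
      dataset: analytics_dev    # placeholder target dataset
      threads: 4
```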
Building Your First Data Model
Say you’re working with raw sales data that looks something like this in your warehouse:
raw.sales_orders (order_id, customer_id, order_date, amount_usd, status)
You want to create a clean, analysis-ready table. In dbt, you’d create a file at models/sales/stg_sales_orders.sql:
SELECT
    order_id,
    customer_id,
    CAST(order_date AS DATE) AS order_date,
    amount_usd / 100.0 AS amount_usd,  -- convert cents to whole dollars
    LOWER(TRIM(status)) AS status
FROM {{ source('raw', 'sales_orders') }}
WHERE order_id IS NOT NULL
Run it with:
dbt run --select stg_sales_orders
dbt will execute this SQL in your warehouse and materialise it as a view (or table, depending on your config). That’s it. You’ve just built a data model.
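If you want a table rather than the default view, you can set the materialisation with a config block at the top of the model file (or project-wide in dbt_project.yml). A minimal sketch:

```sql
-- at the top of models/sales/stg_sales_orders.sql
{{ config(materialized='table') }}

SELECT ...
```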
The {{ source(...) }} syntax is Jinja templating — it tells dbt where the raw data is coming from and enables lineage tracking automatically.
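For that source() call to resolve, the raw table needs to be declared once in a sources YAML file. For example (the filename is just a convention):

```yaml
# models/sales/sources.yml
version: 2

sources:
  - name: raw
    tables:
      - name: sales_orders
```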
Testing Your Data (Yes, Analysts Can Do This Too)
This is where dbt really separates itself from a folder of SQL scripts. You can define tests directly in a YAML file alongside your model.
Create models/sales/stg_sales_orders.yml:
version: 2

models:
  - name: stg_sales_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['completed', 'pending', 'cancelled']
Run your tests with:
dbt test --select stg_sales_orders
dbt will query your warehouse to check that order_id is unique and non-null, and that status only contains expected values. If a test fails, you know before the business sees the data.
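Beyond the built-in generic tests, you can also write a singular test: a SQL file in the tests/ directory that selects rows violating an expectation — the test fails if any rows come back. A sketch (the filename and rule here are illustrative):

```sql
-- tests/assert_no_future_order_dates.sql
-- fails if any order is dated in the future
SELECT *
FROM {{ ref('stg_sales_orders') }}
WHERE order_date > CURRENT_DATE
```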
This is especially valuable for QA analysts and BAs who already think in terms of acceptance criteria — dbt just gives you a structured way to encode those criteria into your pipeline.
Documentation That Actually Gets Written
Most teams have documentation in theory. In practice, there’s a Confluence page from 2021 that nobody’s updated.
dbt solves this by making documentation part of the workflow. Add descriptions to your YAML file:
models:
  - name: stg_sales_orders
    description: "Cleaned and standardised sales orders from the raw ingestion layer."
    columns:
      - name: order_id
        description: "Unique identifier for each order."
      - name: amount_usd
        description: "Order value in USD, converted from cents."
Then generate and serve the docs:
dbt docs generate
dbt docs serve
This opens a browser-based portal with an interactive data lineage graph, column-level descriptions, and test results. Share the URL with stakeholders and suddenly the question “where does this number come from?” has a proper answer.
Structuring Your Models: A Simple Layering Approach
As your project grows, structure matters. A widely used convention is the staging → intermediate → mart pattern:
- Staging (stg_): One-to-one with raw sources. Clean, rename, cast types.
- Intermediate (int_): Join and aggregate staging models. Business logic lives here.
- Marts (dim_, fct_): Final, business-facing tables for reporting tools.
This means your BI tool (Power BI, Tableau, Looker) is always pointing at clean mart models, not messy raw tables. And when something changes upstream, you trace it through the lineage graph — not through five nested Excel tabs.
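As a sketch of what a mart model might look like under this convention (the table and column choices are illustrative), here's a daily sales fact built on the staging model from earlier — note that {{ ref(...) }} is how dbt wires models together and builds the lineage graph:

```sql
-- models/marts/fct_daily_sales.sql
SELECT
    order_date,
    COUNT(order_id)             AS order_count,
    COUNT(DISTINCT customer_id) AS customer_count
FROM {{ ref('stg_sales_orders') }}
WHERE status = 'completed'
GROUP BY order_date
```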
Practical Tips for Analysts Getting Started
- Start small. Pick one messy query your team runs repeatedly. Turn it into a dbt model this week.
- Use dbt Cloud’s free tier if you’d rather not set up a local environment immediately. It has a browser-based IDE.
- Commit to version control from day one. Use GitHub or GitLab. Your future self will thank you.
- Pair with a BI colleague. Get a dashboard pointed at your dbt mart model early — seeing it in production is motivating.
- Join the dbt Community Slack. It’s one of the most genuinely helpful communities in data. Questions get answered fast.
Conclusion: Ownership Is the New Analyst Superpower
The data landscape has shifted. Warehouses are faster and cheaper than ever, SQL skills are widespread, and tools like dbt have removed the barriers that used to separate analysts from production-grade data pipelines.
You don’t need to wait for a data engineer to transform your data, test it, or document it. You can own that work yourself — and do it in a way that’s repeatable, auditable, and trustworthy.
At Softcraft Studio, our mission is to help analysts at every stage of their career gain the skills and confidence to do more with data. Whether you’re a BA looking to formalise your SQL work, a BI developer tired of brittle spreadsheet pipelines, or a QA analyst who wants to bring testing discipline to data — dbt is one of the most practical tools you can add to your toolkit right now.
Start with one model. Build from there.