How to Unit Test dbt Models Without Connecting to Database?

Written by QASource Engineering Team | Jan 7, 2025 5:51:00 AM

To test dbt models without connecting to a live database, you can use mock data and isolate logic to validate transformations locally or within a CI/CD pipeline.

Setup and Configuration Requirements

Testing Framework: You can use pytest with a tool like dbt-test-utils or custom Python scripts to mock data and validate model logic. While pytest will handle assertions, you’ll simulate data without a database connection.
Mocking Data: For dbt, instead of using a live database, you can mock the data with in-memory data structures like Pandas DataFrames (in Python) to simulate the data transformations. Alternatively, dbt seeds can provide controlled test datasets.
CI/CD Integration: Bitbucket Pipelines can run these tests as part of the CI/CD process. You can configure a pipeline to run unit tests in one step (without a database) and full integration tests (with a live database) in a separate step.

Steps to Unit Test dbt Models

Mock Data Using Python

You can mock your dbt models by representing tables with in-memory DataFrames (e.g., using Pandas). Here’s an example setup with pytest.
- Install dependencies
  - Install pytest, dbt-core, and Pandas in your project.
```
pip install pytest dbt-core pandas
```

Example of Mocked Unit Test

Here’s how you can mock the data and write a test for a dbt model that transforms raw order data into aggregated customer order data.

Python Code for Testing

import pytest
import pandas as pd
from my_dbt_project.models import customer_orders  # Replace it with the actual model path

# Mocked test data
@pytest.fixture
def mock_data():
    raw_orders = pd.DataFrame({
        'customer_id': [1, 1, 2],
        'order_id': [101, 102, 103],
        'order_amount': [100, 200, 300]
    })
    return raw_orders
# Unit test for customer_orders transformation
def test_customer_orders(mock_data):
    result = customer_orders(mock_data)  # This would be your transformation logic
    expected_result = pd.DataFrame({
        'customer_id': [1, 2],
        'total_orders': [2, 1],
        'total_order_value': [300, 300]
    })
pd.testing.assert_frame_equal(result, expected_result)

In this example, customer_orders is your dbt model logic represented as a function and the test mocks the raw data as a Pandas DataFrame. The assertions ensure that the transformation logic works as expected.

Using dbt Seeds for Testing

Alternatively, you can use debt's seeds to load test data. In your debt project, create a CSV file with mock data that simulates what your raw data would look like. This will be used in place of a real database table.
```
-- models/customer_orders.sql
WITH raw_data AS (
    SELECT *
    FROM {{ ref('seed_raw_orders') }}
)
SELECT
    customer_id,
    COUNT(order_id) AS total_orders,
    SUM(order_amount) AS total_order_value
FROM raw_data
GROUP BY customer_id
```

Key Takeaways

Framework: Use pytest to mock dbt model data and validate transformations.
Mocking Data: Use Pandas DataFrames or dbt seeds to represent and test your models without a live database.
Trade-offs: Using dbt seeds keeps everything within the dbt framework, while pytest offers more flexibility for detailed unit tests and assertions.

This setup allows granular testing of dbt models locally or within a CI/CD pipeline without requiring a database connection.

View full post