To test dbt models without connecting to a live database, you can use mock data and isolate logic to validate transformations locally or within a CI/CD pipeline.
Mock Data Using Python
You can mock your dbt models by representing tables with in-memory DataFrames (e.g., using Pandas). Here’s an example setup with pytest.
pip install pytest dbt-core pandas
Example of Mocked Unit Test
Here’s how you can mock the data and write a test for a dbt model that transforms raw order data into aggregated customer order data.
import pytest import pandas as pd from my_dbt_project.models import customer_orders # Replace it with the actual model path # Mocked test data @pytest.fixture def mock_data(): raw_orders = pd.DataFrame({ 'customer_id': [1, 1, 2], 'order_id': [101, 102, 103], 'order_amount': [100, 200, 300] }) return raw_orders # Unit test for customer_orders transformation def test_customer_orders(mock_data): result = customer_orders(mock_data) # This would be your transformation logic expected_result = pd.DataFrame({ 'customer_id': [1, 2], 'total_orders': [2, 1], 'total_order_value': [300, 300] }) pd.testing.assert_frame_equal(result, expected_result)
In this example, customer_orders is your dbt model logic represented as a function and the test mocks the raw data as a Pandas DataFrame. The assertions ensure that the transformation logic works as expected.
Using dbt Seeds for Testing
Alternatively, you can use debt's seeds to load test data. In your debt project, create a CSV file with mock data that simulates what your raw data would look like. This will be used in place of a real database table.
-- models/customer_orders.sql WITH raw_data AS ( SELECT * FROM {{ ref('seed_raw_orders') }} ) SELECT customer_id, COUNT(order_id) AS total_orders, SUM(order_amount) AS total_order_value FROM raw_data GROUP BY customer_id
This setup allows granular testing of dbt models locally or within a CI/CD pipeline without requiring a database connection.