Unit Test dbt Models Without Connecting to DB and Integrate Tests Into a CI/CD

QASource Engineering Team | January 7, 2025

How to Unit Test dbt Models Without Connecting to Database

To test dbt models without connecting to a live database, you can use mock data and isolate logic to validate transformations locally or within a CI/CD pipeline.

Setup and Configuration Requirements

  1. Testing Framework: You can use pytest with a tool like dbt-test-utils or custom Python scripts to mock data and validate model logic. While pytest will handle assertions, you’ll simulate data without a database connection.
  2. Mocking Data: For dbt, instead of using a live database, you can mock the data with in-memory data structures like Pandas DataFrames (in Python) to simulate the data transformations. Alternatively, dbt seeds can provide controlled test datasets.
  3. CI/CD Integration: Bitbucket Pipelines can run these tests as part of the CI/CD process. You can configure a pipeline to run unit tests in one step (without a database) and full integration tests (with a live database) in a separate step.

Steps to Unit Test dbt Models

  1. Mock Data Using Python

    You can mock your dbt models by representing tables with in-memory DataFrames (e.g., using Pandas). Here’s an example setup with pytest.

    • Install dependencies
      • Install pytest, dbt-core, and Pandas in your project.
        pip install pytest dbt-core pandas
  2. Example of Mocked Unit Test

    Here’s how you can mock the data and write a test for a dbt model that transforms raw order data into aggregated customer order data.

    • Python Code for Testing
      import pytest
      import pandas as pd
      from my_dbt_project.models import customer_orders  # Replace it with the actual model path
      
      # Mocked test data
      @pytest.fixture
      def mock_data():
          raw_orders = pd.DataFrame({
              'customer_id': [1, 1, 2],
              'order_id': [101, 102, 103],
              'order_amount': [100, 200, 300]
          })
          return raw_orders
      # Unit test for customer_orders transformation
      def test_customer_orders(mock_data):
          result = customer_orders(mock_data)  # This would be your transformation logic
          expected_result = pd.DataFrame({
              'customer_id': [1, 2],
              'total_orders': [2, 1],
              'total_order_value': [300, 300]
          })
      pd.testing.assert_frame_equal(result, expected_result)
      

    In this example, customer_orders is your dbt model logic represented as a function and the test mocks the raw data as a Pandas DataFrame. The assertions ensure that the transformation logic works as expected.

  3. Using dbt Seeds for Testing

    Alternatively, you can use debt's seeds to load test data. In your debt project, create a CSV file with mock data that simulates what your raw data would look like. This will be used in place of a real database table.

    -- models/customer_orders.sql
    WITH raw_data AS (
        SELECT *
        FROM {{ ref('seed_raw_orders') }}
    )
    SELECT
        customer_id,
        COUNT(order_id) AS total_orders,
        SUM(order_amount) AS total_order_value
    FROM raw_data
    GROUP BY customer_id

Key Takeaways

  • Framework: Use pytest to mock dbt model data and validate transformations.
  • Mocking Data: Use Pandas DataFrames or dbt seeds to represent and test your models without a live database.
  • Trade-offs: Using dbt seeds keeps everything within the dbt framework, while pytest offers more flexibility for detailed unit tests and assertions.

This setup allows granular testing of dbt models locally or within a CI/CD pipeline without requiring a database connection.

Disclaimer

This publication is for informational purposes only, and nothing contained in it should be considered legal advice. We expressly disclaim any warranty or responsibility for damages arising out of this information and encourage you to consult with legal counsel regarding your specific needs. We do not undertake any duty to update previously posted materials.

Post a Comment

Categories