1. Overview

This section gives an overview of what etlTest is and how it works, starting with a few fundamental concepts.

2. Why etlTest?

Data integration tools do not produce code in any standard form. To make matters more interesting, many of them do not integrate with external version control systems (like Subversion or Git), let alone offer a universal way to test code. etlTest aims to change that last part by providing a universal way to work with data integration tests. Regardless of the data source or data integration tool, your tests can be carried over with minimal conversion effort when the stack you're working on changes.

3. How Does It Work?

Developing tests in etlTest is designed to be as simple as possible. All that is required (other than installing etlTest ;) ) is to generate a sample data file...

//etltest/samples/data/etlUnitTest/users.yml
 1:
   user_id: 1
   first_name:  Bob
   last_name:  Richards
   birthday:  2000-01-04
   zipcode:  55555
   is_active: 0
 2:
   user_id: 2
   first_name:  Sarah
   last_name: Jenkins
   birthday:  2000-02-02
   zipcode:  12345
   is_active: 1
 ...

and a test file...

//etltest/samples/test/dataMart/users_dim.yml
DataMart\UsersDim:
   suites:
     - suite: dataMart
   processes:
     - tool:  PDI
       processes:
         - name:  data_mart/user_dim_jb.kjb
           type:  job
   dataset:
     - source:  etlUnitTest
       table:  users
       records:  [1, 2]
   tests:
     - name: testFirstNameLower
       desc:  Test for process that lower cases the first name field of a users table record.
       type: NotEqual
       query:
         select: first_name
         from: user_dim
         where: user_id = 2
         source:  etlUnitTest
         result: {'first_name': 'sarah'}

See sample data file standards and test file standards for full template details.
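To make the relationship between the two files more concrete, the following is a minimal sketch and not etlTest's actual implementation. It assumes the sample data file has already been parsed into a dictionary keyed by record ID, and that a test's `type` value corresponds to the `unittest` assertion of the same name (`NotEqual` to `assertNotEqual`), as the generated code further down suggests. The names `SAMPLE_USERS`, `select_records`, and `resolve_assertion` are hypothetical and exist only for illustration.

```python
import unittest

# Hypothetical in-memory stand-in for the parsed users.yml sample data file:
# each top-level key is a record ID, as in the sample above.
SAMPLE_USERS = {
    1: {"user_id": 1, "first_name": "Bob", "last_name": "Richards",
        "birthday": "2000-01-04", "zipcode": 55555, "is_active": 0},
    2: {"user_id": 2, "first_name": "Sarah", "last_name": "Jenkins",
        "birthday": "2000-02-02", "zipcode": 12345, "is_active": 1},
}


def select_records(data, record_ids):
    """Return the rows whose keys appear in record_ids, in that order.

    This mirrors the `records: [1, 2]` entry of a test file's dataset.
    """
    return [data[record_id] for record_id in record_ids]


def resolve_assertion(test_case, test_type):
    """Look up the unittest assertion named by a test's `type` field.

    Assumption: `type: NotEqual` maps to `assertNotEqual`, and so on.
    """
    method_name = "assert" + test_type
    if not hasattr(test_case, method_name):
        raise ValueError("Unsupported test type: " + test_type)
    return getattr(test_case, method_name)


if __name__ == "__main__":
    rows = select_records(SAMPLE_USERS, [1, 2])
    print([row["first_name"] for row in rows])

    # Compare a pretend query result against the expected result of the test.
    check = resolve_assertion(unittest.TestCase(), "NotEqual")
    check([{"first_name": "Sarah"}], [{"first_name": "sarah"}])
```

In this sketch, an unknown `type` value fails early with a `ValueError` instead of surfacing later as an `AttributeError` inside a test run.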

Once your tests are written, etlTest can generate and execute the test code for you.

$ etlTest.py -f <path_to_your_test.yml> -o <path_to_your_output_dir> -g -e

This generates and runs something similar to:

//etltest/samples/output/DataMart/UsersDim.py
#!/usr/bin/python
#
# This file was created by etlTest.
#

# These tests are also run as part of the following suites:
#
#    dataMart
#
# The following processes are executed for these tests:
#
#    PDI:
#      data_mart/user_dim_jb.kjb

import unittest
import datetime
from os import path

from etltest.data_connector import DataConnector
from etltest.process_executor import ProcessExecutor
from etltest.utilities.settings_manager import SettingsManager


class DataMartUsersDimTest(unittest.TestCase):

    def setUp(self):
        # Queries for loading test data.
        DataConnector("etlUnitTest").insert_data("users", [1, 2])

        PDI_settings = SettingsManager().get_tool("PDI")
        PDI_code_path = SettingsManager().system_variable_replace(PDI_settings["code_path"])
        ProcessExecutor("PDI").execute_process("job",
            path.join(PDI_code_path, "data_mart/user_dim_jb.kjb"))

    def tearDown(self):
        # Clean up testing environment.

        DataConnector("etlUnitTest").truncate_data("users")

    def testFirstNameLower(self):
        # Test for process that lower cases the first name field of a users table record.

        given_result = DataConnector("etlUnitTest").select_data("first_name",
                        "user_dim", "user_id = 2")

        expected_result = [{'first_name': 'sarah'}]

        self.assertNotEqual(given_result, expected_result)

if __name__ == "__main__":
    unittest.main()

Notice that etlTest generates actual Python code, so you can leverage a full-blown testing framework without writing a single line of code yourself! The various components of the test suites are covered in Test Components.
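Because the generated file is an ordinary `unittest` module, it can also be loaded and run programmatically with the standard library. The sketch below is a rough illustration under stated assumptions: the test class is a simplified stand-in that replaces the database query with a hard-coded result, and `run_generated_tests` is a hypothetical helper, not part of etlTest.

```python
import io
import unittest


class DataMartUsersDimTest(unittest.TestCase):
    """Simplified stand-in for a generated test class.

    The real generated class loads data and queries a database; here the
    query result is hard-coded so the example runs on its own.
    """

    def testFirstNameLower(self):
        given_result = [{"first_name": "Sarah"}]  # pretend query result
        expected_result = [{"first_name": "sarah"}]
        self.assertNotEqual(given_result, expected_result)


def run_generated_tests(test_class):
    """Load every test method on test_class, run it, and return the result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_class)
    stream = io.StringIO()  # capture the runner's report instead of printing it
    runner = unittest.TextTestRunner(stream=stream, verbosity=2)
    return runner.run(suite)


if __name__ == "__main__":
    result = run_generated_tests(DataMartUsersDimTest)
    print("tests run:", result.testsRun)
    print("successful:", result.wasSuccessful())
```

The returned `unittest.TestResult` exposes `testsRun`, `failures`, and `errors`, which is enough to wire generated tests into a larger suite or a continuous integration job.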