Creating real test data - Fast and Efficiently

When I build RANDOM_NINJA I knew already one of things I wanted to use that library for. Building good valid and life-like test data has and is always a really big problem. The base for good test data is good and valid looking random data. Without that, most tests are not really valid, as data can be clustered wrongly, indexes act strange because data does not mimic real life and relations and constraints are hard to define or validate on data that is created using most available randomizing packages. That is why it was important to me that RANDOM_NINJA would be able to create random data from as many different domains as possible. As of today it can create more than 185 different data points from as many as 14 different data domains.

But having good random data is only half of it. You still need something that can define and create those tables. You also need something that can still maintain relations between those test tables, and make sure that foreign key distributions are real as well. So I created TESTDATA_NINJA. This package has some generic generators to create simple tables of people, populations which are true according to UN demographics data, CDR records and credit card transactions. The real function in this package is the custom generation procedure. This procedure can parse a string representation of what your data looks like and from that create pipelined functiones that can create thousands of test rows extremely fast and efficiently.

The testdata_ninja.generator_create procedure takes 2 arguments. The name of the new generator and the format of the test data. Below is a short description on how the format looks like.