Monday, June 2, 2008

Know your test data

A lot of attention has been given lately to the processes we go through to test our applications in preparation for a new release. Of course, an application must have data, and that data must reflect the realities of the business. This leads us to an interesting problem: how do we ensure that our test data covers the full range of situations our applications will find in production? Easy, we'll just copy our production data to a test database or file.

Wait, not so fast. First, we have to be careful that our test data does not reveal information to people who don't already have a business need to know. ('Testing' is not a business need to know!) So, unless our programmers and QA staff have an existing need to know John Q Public's account balance, letting them see that balance in the test data violates our privacy principles. Copying production data to test should therefore be, at best, only a first step. The next thing we need to do is obfuscate the data, altering identifying information such as ID numbers, names, and addresses. But wait, you say!

If I start obfuscating ID numbers, then I'll break the referential integrity between my files. I won't be able to correlate the data in one file with the related data in another. Yep, that's right, unless you are careful and change the ID number the same way in all files. This is non-trivial and will likely take another program, which will itself need to be tested with test data. So the copy-from-production approach takes some work. And even after that work, it is fundamentally flawed, and you have to work to minimize the impact of that flaw.
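One way to change an ID "the same way in all files" is to derive the replacement from the real ID with a keyed hash, so the same input always yields the same stand-in. The following is only a minimal sketch under stated assumptions: the key, the field name customer_id, the "T" prefix, and the two sample "files" are all hypothetical, and a real obfuscation program would also need to handle names, addresses, and key management.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be kept out of source control
# and away from anyone who can see the test data.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize_id(real_id: str, key: bytes = SECRET_KEY) -> str:
    """Map a real ID to a stable stand-in: same ID and key, same result."""
    digest = hmac.new(key, real_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "T" + digest[:12]


def obfuscate(rows, id_field="customer_id"):
    """Return copies of the rows with the ID field replaced by its stand-in."""
    return [{**row, id_field: pseudonymize_id(row[id_field])} for row in rows]


# Two hypothetical "files" that share customer_id as their key.
customers = [
    {"customer_id": "1001", "name": "John Q Public"},
    {"customer_id": "1002", "name": "Jane Doe"},
]
accounts = [
    {"customer_id": "1001", "balance": 250.00},
    {"customer_id": "1002", "balance": 75.50},
    {"customer_id": "1001", "balance": 10.00},
]

# Replace IDs in both files the same way, and blank out the names as well.
test_customers = [
    {**row, "name": f"Customer {n}"}
    for n, row in enumerate(obfuscate(customers), start=1)
]
test_accounts = obfuscate(accounts)

# Referential integrity check: every account still points at a customer.
known_ids = {row["customer_id"] for row in test_customers}
orphans = [row for row in test_accounts if row["customer_id"] not in known_ids]
print(f"orphaned account rows: {len(orphans)}")
```

Because the mapping depends only on the real ID and the key, each file can be processed separately and the joins still line up. Truncating the digest keeps the IDs short but makes collisions possible in principle, so a real program would want to check for them.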

When copying data from production, one has to be careful not to assume that all reasonable data possibilities are necessarily present. Just because a certain data configuration can exist doesn't mean that it exists in every extract of production data. Just because a situation existed last month doesn't mean it will exist next month. But it might the month after that. You must therefore validate, with every production download, that all the data permutations you need are present.
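That validation step can itself be a small program. Here is a rough sketch, assuming the permutations you care about can be described as combinations of a few field values; the account types, statuses, and sample rows are invented for illustration and are not drawn from any real system.

```python
from itertools import product

# Hypothetical business dimensions; a real list would come from the
# application's rules, not from whatever happens to be in production.
ACCOUNT_TYPES = ["checking", "savings"]
STATUSES = ["open", "closed", "frozen"]


def missing_permutations(rows):
    """Return the required (account_type, status) pairs absent from rows."""
    required = set(product(ACCOUNT_TYPES, STATUSES))
    present = {(row["account_type"], row["status"]) for row in rows}
    return sorted(required - present)


# A made-up production download that happens to lack two situations.
download = [
    {"account_type": "checking", "status": "open"},
    {"account_type": "checking", "status": "closed"},
    {"account_type": "savings", "status": "open"},
    {"account_type": "savings", "status": "frozen"},
]

gaps = missing_permutations(download)
for account_type, status in gaps:
    print(f"missing: {account_type} / {status}")
```

Running a check like this against every download tells you which situations you would otherwise be assuming are covered.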

The better solution is to create the data you need to fully validate every function and process in your application. This will take some time at first, but eventually you'll have a known, predictable test data set and can validate your application with the mathematical certainty usually reserved for GPS systems. Occasionally a situation will arise in production that is not represented in your test data set. You'll have to add that condition. You may even need to write code to 'refresh' your test data to update dates or other permutation-sensitive elements.
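As an illustration of building the data yourself, the sketch below generates one record per combination of two made-up dimensions and stores each date as an offset from an "as of" date, so that refreshing the set is just a matter of rebuilding it with a new date. The field names, the TEST- ID format, and the offsets are assumptions for the example only.

```python
from datetime import date, timedelta
from itertools import product

# Hypothetical dimensions. Each due-date case is stored as a number of days
# relative to the run date rather than as a fixed calendar date.
ACCOUNT_TYPES = ["checking", "savings"]
DUE_OFFSETS = {"overdue": -30, "due_today": 0, "not_yet_due": 30}


def build_test_accounts(as_of: date):
    """Build one known record per (account type, due-date case) combination."""
    rows = []
    combos = product(ACCOUNT_TYPES, DUE_OFFSETS.items())
    for n, (account_type, (case, offset)) in enumerate(combos, start=1):
        rows.append({
            "account_id": f"TEST-{n:04d}",
            "account_type": account_type,
            "case": case,
            "due_date": as_of + timedelta(days=offset),
        })
    return rows


# The original data set, then the same set 'refreshed' a month later.
fixture = build_test_accounts(date(2008, 6, 2))
refreshed = build_test_accounts(date(2008, 7, 2))

for row in refreshed:
    print(row["account_id"], row["account_type"], row["case"], row["due_date"])
```

Every record keeps the same ID and the same meaning after a refresh; only the calendar dates move, so an "overdue" record is still overdue next month.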

The fundamental principle is this: you must know your test data, or you cannot claim to have exercised the application. Copying from production opens the organization up to privacy issues and does not guarantee you've tested every condition. Furthermore, if you have an uncaught error in your existing version of the code and you rely on the "well, the new code matches production" argument, you perpetuate the error, with confidence!

The best practice is to create your own test data, making sure that every business permutation is fully exercised. Then, and only then, can you state with full confidence that you have tested the application. Now, let's talk about code coverage!
