DQS Validation Rules on Composite Domains

DQS Composite DomainsIn Data Quality Services, composite domains can be created to associate together two or more natural domains within a knowledge base. Like natural domains, composite domains can also contain one or more validation rules to govern which domain values are valid. In my last post, I discussed the use of validation rules against natural domains. In this post, I’ll continue the thread by covering the essentials of composite domain rules and demonstrating how these can be used to create relationships between data domains.

What is a composite domain?

Before we break off into discussing the relationships between member domains of a composite domain, we’ll first touch on the essentials of the latter.

Simply, a composite domain is a wrapper for two or more organic domains in a knowledge base. Think of a composite domain as a virtual collection of dissimilar yet related properties. As best I can tell, the composite domain is not materialized in the DQS knowledge base, but is simply a meta wrapper pointing to the underlying values.

To demonstrate, I’ve created a knowledge base using a list of automobile makes and models, along with a few other properties (car type and seating capacity). I should be able to derive a loose association between automobile type and seating capacity, so I’ll create a composite domain with those two domains as shown below.

image

As shown above, creating a composite domain requires nothing more than selecting two or more domains from an existing knowledge base. After the composite domain has been created, your configuration options are generally limited to attaching the composite domain to a reference data provider (which I’ll cover in a future post) and adding composite domain rules.

Value association via composite domain rules

The most straightforward way to associate the values of a composite domain is to create one or more rules against that composite domain. When created against a composite domain, you can use rules to declare if/then scenarios to describe allowable combinations therein.

Back in the day, before marriage, kids, and a mortgage, I used to drive sports cars. Even though that was a long time ago, I do remember a few things about that type of automobile: they are fast, expensive to insure, and don’t have a lot of passenger capacity. It’s on that last point that we’ll focus our data quality efforts for now. I want to make sure that some sneaky manufacturer doesn’t falsely identify as a sports car some big and roomy 4-door sedan. Therefore, I’m going to create a rule that will restrict the valid domain values for seating capacity for sports cars.

I’ll start with some business assumptions. What’s the minimum number of seats a sports car should have? I think it’s probably 2, but I suppose if some enterprising gearhead decided to convert an Indy Car into a street-legal machine, it would likely be classified as a sports car too. Therefore, it would be reasonable to assume that, in an edge case, a sports car could have just a single seat, so our minimum seating capacity for a sports car would be 1. On the high side, design of sports cars should dictate that there aren’t many seats. For example, the Chevrolet Camaro I had in high school could seat 4 people, assuming that 2 of the people were small children who had no claustrophobic tendencies. However, we can give a little on this issue and assume that they somehow manage to squeeze a third rows of seats into a Dodge Magnum, so we’ll say that a sports car can have a maximum seating capacity of 6 people.

Now, with that information in hand, I’m going to use the Domain Management component of the DQS client to set up the new rule against the “Type and Capacity” composite domain from above. As shown below, I can set value-specific constraints on the seating capacity based on the automobile type of Sports Car.

image

As shown, any valid record with a car type of Sports Car must have a seating capacity of between 1 and 6 persons.

Of course, sports cars aren’t the only types of automobiles (gasp!), so this approach would likely involve multiple rules. Fortunately, composite domains allow for many such rules, which would permit the creation of additional restrictions for other automobile types. You could also expand the Sports Car rule and add more values on the left side of the operator (the if side of the equation). For example, you might call this instead a “Small Car rule” and include both sports cars and compact cars in this seating capacity restriction.

Other uses

Although we’ve limited our exploration to simply interrogating the value of the natural domains within a composite domain, this is by no means our only option for validation. For example, when dealing with string data you can inspect the length of the string, search for patterns, use regular expressions, and test for an empty string in addition to checking against the actual value. Shown below are some of the options you can use to query against a string value in a domain rule.

image

When dealing with date or numerical data, you have the expected comparison operators including less than, greater than, less than or equal to, etc.

Conclusion

This post has briefly explored composite domains and shown how to add validation rules to a composite domain in an existing knowledge base. In my next DQS post, I’ll continue with composite domains to illustrate a potential misunderstanding in the way composite domains treat value combinations in cleansing operations.

About the Author

Tim Mitchell
Tim Mitchell is a data architect and consultant who specializes in getting rid of data pain points. Need help with data warehousing, ETL, reporting, or training? If so, contact Tim for a no-obligation 30-minute chat.

1 Comment on "DQS Validation Rules on Composite Domains"

  1. Thanks Tim for your brief insight. My question is can we have sql query in domain rules by which validity is checked. For example, if my set of columns in same row repeats or set of columns might have some other underlying business functionality which is failing that can be identified by query. Hope you got my question..

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.