Populate your dimensions with Unknown Members

Often dimensions need a “Unknown” and/or “Not Applicable” members to be inserted for data where the full set of attributes for a fact record do not come from a source system (effectively failed dimension surrogate key lookups). These often have negative integer surrogate keys (while I have seen up to 10 different types of Unknown members in a dimension, this usually suggests some data quality issues, so one or two would normally suffice).

If you have many dimensions or they are quite wide with many data types, this can be a time consuming task. The below script will help you automate this task by creating a set of INSERT statements based on your dimension table meta data.

I contemplated about adding paramaters to determine how many members to generate (ie Unknown, Not Applicable, Early Arriving fact etc) , but thought it best to let you rerun the script and change those values in the @dataTypes  table variable.


SET NOCOUNT ON

DECLARE @crlf CHAR(2)

SET @crlf = char(13) + char(10)

DECLARE @UnknownSurrogateKeyValue AS VARCHAR(255)
DECLARE @SurrogateKeyName AS VARCHAR(255)
DECLARE @DimensionTableNamePrefix AS VARCHAR(255)

–Unknown member surrogate key value:
SET @UnknownSurrogateKeyValue = ‘-1’
–String within dimension surrogate key column name
SET @SurrogateKeyName = ‘key’
–Prefix for Dimension table names
SET @DimensionTableNamePrefix = ‘Dim’

–Create Table for default values
DECLARE @dataTypes AS TABLE
  (
     DATA_TYPE     VARCHAR(50)
     ,MIN_DESC     VARCHAR(255)
     ,MAX_DESC     VARCHAR(255)
     ,DEFAULT_DESC VARCHAR(255)
  )

INSERT INTO @dataTypes
            (DATA_TYPE
             ,MIN_DESC
             ,MAX_DESC
             ,DEFAULT_DESC)
VALUES
–Exact Numerics
( ‘bigint’
  ,‘-9223372036854775808’
  ,‘9223372036854775807’
  ,‘0’ ),
            ( ‘numeric’
              ,‘0’
              ,‘0’
              ,‘0’ ),
            ( ‘bit’
              ,‘0’
              ,‘1’
              ,‘0’ ),
            ( ‘smallint’
              ,‘-32768’
              ,‘32767’
              ,‘0’ ),
            ( ‘decimal’
              ,‘0’
              ,‘0’
              ,‘0’ ),
            ( ‘smallmoney’
              ,‘-214748.3648’
              ,‘214748.3647’
              ,‘0’ ),
            ( ‘int’
              ,‘-2147483648’
              ,‘2147483647’
              ,‘0’ ),
            ( ‘tinyint’
              ,‘0’
              ,‘255’
              ,‘0’ ),
            ( ‘money ‘
              ,‘-922337203685477.5808’
              ,‘922337203685477.5807’
              ,‘0’ )
–Approximate Numerics
,
            ( ‘float’
              ,‘0’
              ,‘0’
              ,‘0’ ),
            ( ‘real’
              ,‘0’
              ,‘0’
              ,‘0’ )
–Binary Strings
,
            ( ‘binary’
              ,‘0’
              ,‘1’
              ,‘0’ ),
            ( ‘varbinary’
              ,‘0’
              ,‘1’
              ,‘0’ )
–Other Data Types
,
            ( ‘sql_variant’
              ,‘0’
              ,‘1’
              ,‘0’ )
–Date and Time
,
            ( ‘date’
              ,‘0001-01-01’
              ,‘9999-12-31’
              ,‘1900-01-01’ ),
            ( ‘datetimeoffset’
              ,‘0001-01-01 00:00:00.0000000 +00:00’
              ,‘9999-12-31 23:59:59.9999999 +00:00’
              ,‘1900-01-01 00:00:00.0000000 +00:00’ ),
            ( ‘datetime2’
              ,‘0001-01-01 00:00:00.0000000’
              ,‘9999-12-31 23:59:59.9999999’
              ,‘1900-01-01 00:00:00.0000000’ ),
            ( ‘smalldatetime’
              ,‘1900-01-01 00:00:00’
              ,‘2079-06-06 23:59:59’
              ,‘1900-01-01 00:00:00’ ),
            ( ‘datetime’
              ,‘1753-01-01 00:00:00.000’
              ,‘9999-12-31 23:59:59.997’
              ,‘1900-01-01 00:00:00.000’ ),
            ( ‘time’
              ,’00:00:00.0000000′
              ,’23:59:59.9999999′
              ,’00:00:00′ )
–Character Strings
,
            ( ‘char’
              ,‘U’
              ,‘Unknown’
              ,‘UNK’ ),
            ( ‘varchar’
              ,‘U’
              ,‘Unknown’
              ,‘UNK’ ),
            ( ‘text’
              ,‘U’
              ,‘Unknown’
              ,‘UNK’ )
–Unicode Character Strings
,
            ( ‘nchar’
              ,‘U’
              ,‘Unknown’
              ,‘UNK’ ),
            ( ‘nvarchar’
              ,‘U’
              ,‘Unknown’
              ,‘UNK’ ),
            ( ‘ntext’
              ,‘U’
              ,‘Unknown’
              ,‘UNK’ )
–Binary Strings
,
            ( ‘image’
              ,
              ,
              , )
–Other Data Types
,
            ( ‘hierarchyid’
              ,‘/’
              ,‘/’
              ,‘/’ ),
            ( ‘uniqueidentifier’
              ,‘null’
              ,‘null’
              ,‘null’ ),
            ( ‘xml’
              ,
              ,
              , )

SELECT ‘INSERT INTO ‘ + t.TABLE_NAME + ‘ (‘ + @crlf
       –{ column_list }
       + replace(stuff(( SELECT ‘,’ + c.COLUMN_NAME FROM
       INFORMATION_SCHEMA.columns c
       WHERE c.COLUMN_NAME = COLUMN_NAME AND c.TABLE_NAME = t.TABLE_NAME ORDER
       BY
       c.ORDINAL_POSITION FOR XML PATH() ), 1, 1, ), ‘,’, ‘,’) + ‘) ‘ +
       @crlf +
       ‘VALUES ‘
       + @crlf + ‘(‘
       –{ VALUES }
       + replace(stuff(( SELECT ‘,”’ + CASE WHEN c.COLUMN_NAME LIKE ‘%’ +
       @SurrogateKeyName + ‘%’ THEN @UnknownSurrogateKeyValue WHEN
       CHARACTER_MAXIMUM_LENGTH > len(dt.DEFAULT_DESC) THEN dt.MAX_DESC WHEN
       CHARACTER_MAXIMUM_LENGTH < len(dt.DEFAULT_DESC) THEN dt.MIN_DESC ELSE
       DT.DEFAULT_DESC END + ”” FROM INFORMATION_SCHEMA.columns c INNER JOIN
       @dataTypes dt ON
       c.DATA_TYPE = dt.DATA_TYPE WHERE c.COLUMN_NAME = COLUMN_NAME AND
       c.TABLE_NAME =
       t.TABLE_NAME ORDER BY c.ORDINAL_POSITION FOR XML PATH() ), 1, 1, ),
       ‘,’, ‘,’
       ) + ‘ ); ‘
FROM   INFORMATION_SCHEMA.tables t
WHERE  t.TABLE_TYPE = ‘BASE TABLE’
       AND t.TABLE_NAME LIKE @DimensionTableNamePrefix + ‘%’ 


Advertisements

About Karl Beran

I take interest and satisfaction from enabling people within an organisation to have an accurate, concise and flexible representation of the right information they need, when (or before) they need it. I believe this allows the business to make both operational and strategic decisions quickly and with more confidence, thus improving the agility and efficiency of the company. Having worked with BI stacks from the major vendors, the focus of my blog is on BI principals and strategy with examples from the Microsoft stack as I believe it has the most dynamic and engaged user community with the knowledge base and components fairly readily available to new developers.

Posted on August 18, 2012, in SQL. Bookmark the permalink. 2 Comments.

  1. Hi Karl,
    nice solution for these kind of data quality issues, facing similar problems and produced a similar function, just not that dynamically.
    Great Article Thanks.

    Kind Regards

    Like

  1. Pingback: Populate your dimensions with Unknown Members | Zaim Raza on MS BI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: