SSIS – table variables in the Data flow source

Let’s say you were doing a fact table load and wanted to do a dimension lookup.
   -Hah, easy you say, in SQL Server 2008 we have cached lookups, so I can join Staging.BusinessKey to Dimension.BusinessKey to get Dimension.Dimensionkey (surrogate key).
What happens when you want to apply something like a banding range, often used for Aged Debt (‘Not Due’, ‘0-90days’, ’90-120 days’…etc). In this situation you would need a “Staging.Value BETWEEN Dimension.BandMin and Dimension.BandMax” type join.
   -Hah, you say, the Lookup task allows for advanced joins so I can do this.
Great, untill you have to load 50million rows and you realise the ETL is going to take longer than building Sagrada Familia.
So one fairly reasonable approach may be to write some t-sql using a table variable looking something like:
DECLARE @DimAgeBand TABLE
    (DimAgeBandKey int
    ,BandMin int
    ,BandMax int
    ,BandDescription varchar(50)
    )
INSERT INTO @DimAgeBand
    (DimKey ,BandMin ,BandMax ,BandDescription )
    VALUES
     (1  ,-99999999 ,0          ,'Not Due')
    ,(2  ,0         ,90         ,'0-90 Days')
    ,(3  ,91        ,120        ,'90-120 Days')
    ,(4  ,121       ,180        ,'120-180 Days')
    ,(5  ,181       ,99999999   ,'120-180 Days')

 

SELECT
    DimAgeBandKey   = dab.DimAgeBandKey
    ,OverdueDays    = datediff(DD, staging.DueDate, GETDATE())
    ,Amount         = staging.Amount
FROM
    staging.Debtors stg
    INNER JOIN @DimAgeBand dab ON
        datediff(DD, staging.DueDate, GETDATE()) BETWEEN dab.BandMin AND dab.BandMax
You could even create the VALUES section of the INSERT statement dynamically by querying the Dimension table and constructing the Sourcesql as an expression.
All very well and good until you run it in SSIS and it returns 0 rows. Strange as all the rows came back in SSMS.
The reason for this is the query is returning two data streams and SSIS can only address the first one.
This problem can be resolved by issuing a SET NOCOUNT ON at the beginning of the query which “Stops the message that shows the count of the number of rows affected by a Transact-SQL statement or stored procedure from being returned as part of the result set.”
Advertisements

About Karl Beran

I take interest and satisfaction from enabling people within an organisation to have an accurate, concise and flexible representation of the right information they need, when (or before) they need it. I believe this allows the business to make both operational and strategic decisions quickly and with more confidence, thus improving the agility and efficiency of the company. Having worked with BI stacks from the major vendors, the focus of my blog is on BI principals and strategy with examples from the Microsoft stack as I believe it has the most dynamic and engaged user community with the knowledge base and components fairly readily available to new developers.

Posted on June 30, 2010, in SSIS. Bookmark the permalink. 1 Comment.

  1. Very nice article.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: