Category Archives: SSIS

Currency Conversion – Part 1 – Principles (ETL & DW)

Currency conversion seems to needlessly cause issues for accountants, developers, managers and analysts. So I thought I would put together a short series of posts to breakdown and simplify the concepts.

First up are some basic principles I have learnt to adhere to, so here goes:

1. Capture and use daily rates:

Period (weekly, monthly, quarterly, annual etc) closing rates or average rates will lead to incorrect results. Some more astute accountants will argue period closing rates for  Profit and Loss items and period average rates for Balance Sheet items are perfectly acceptable. I’m sorry but they aren’t! With major currencies fluctuating by as much as 14% during a month (Sep 2011 CHFUSD which caught out this guy to the tune of $2.3 billion) it isn’t going to do much for faith in the accuracy of your DW numbers if you build in this level of inaccuracy, particularly for P&L items with cost or revenue not evenly distributed by value throughout the period as is often the case for products like insurance premiums, power/gas/phone bills, salary payments, pension transactions, credit card payments… you get the idea, as this will magnify any fluctuations in the rate.

2. Always multiply:

Source currency –> Destination currency should be a multiplication operation. The reason for this one is quite simple (or maybe that’s because I’m quite simple!) -it’s easier to do multiplication in your head than division. So the rate for Source = ‘GBP’ to Destination = ‘USD’ should be apprx 1.6 (not 0.625).

3. Don’t use a money data type:

Yes it can save storage space, but the rounding errors resulting from implicit conversion will cause you headaches. Decimal is your friend

4. Store one-many rates in your relational fact table:

I’ve tested this on fairly large volumes and you’re usually better off to get many-many rates by doing the triangulation on the fly in a view (similar to that below), providing your currency fact table is indexed properly. Last check there were 182 current official currencies, so you could either store a rate for each currency into your base currency (say ‘USD’ or maybe ‘CNY’ in the years to come…) which would mean 66,430 rows per year in your currency fact table (182 currencies x 365 days). If you were to fully triangulate that you’d end up with 12,090,260 rows (182 currencies x 182 currencies x 365 days). Pretty obvious what will perform better. This of course assumes that you have a need for many-many currency conversion which isn’t often the case, most organisations may measure across many original source currencies and report in only three or four reporting currencies. If this is the case you can restrict your views to a limited set of “To currency” keys.

5. Use sufficient scale data type to retain accuracy:

Some currencies aren’t particularly strong, so if you’re base currency is relatively strong (USD, GBP, EUR, AUD, NZD, CAD, KWD etc) and you are adhering to rule 2 above, you must have sufficient scale in the decimal data type to store all the decimal places required as indicated in the table below.

Currency codes/names

US Dollar

Sao Tome/Principe Dobra

Somali Shilling

Turkmenistan Old Manat

Vietnamese Dong

USD

1

0.00005051

0.0006341

0.00007

0.00004798

STD

20000

1

12.6823

1.403508

0.9597

SOS

1667

0.08419

1

0.1169824

0.07999

TMM

14250

0.7195

9.035

1

0.684

VND

20850

1.053

13.2213

1.463158

1

 

In the next post I’ll cover principles within Analysis Services Multidimensional


SELECT fcr.CurrencyKey                       AS FromCurrencyKey
       ,fcr2.CurrencyKey                     AS ToCurrencyKey
       ,fcr.DateKey
       ,fcr.AverageRate / fcr2.AverageRate   AS AverageRate
       ,fcr.EndOfDayRate / fcr2.EndOfDayRate AS EndOfDayRate
       ,fcr.Date
FROM   factcurrencyrate fcr
       INNER JOIN dimcurrency dcFrom
               ON fcr.CurrencyKey = dcFrom.CurrencyKey
       INNER JOIN factcurrencyrate fcr2
               ON fcr.DateKey = fcr2.DateKey
       INNER JOIN dimcurrency dcTo
               ON fcr2.CurrencyKey = dcTo.CurrencyKey 

SSIS – table variables in the Data flow source

Let’s say you were doing a fact table load and wanted to do a dimension lookup.
   -Hah, easy you say, in SQL Server 2008 we have cached lookups, so I can join Staging.BusinessKey to Dimension.BusinessKey to get Dimension.Dimensionkey (surrogate key).
What happens when you want to apply something like a banding range, often used for Aged Debt (‘Not Due’, ‘0-90days’, ’90-120 days’…etc). In this situation you would need a “Staging.Value BETWEEN Dimension.BandMin and Dimension.BandMax” type join.
   -Hah, you say, the Lookup task allows for advanced joins so I can do this.
Great, untill you have to load 50million rows and you realise the ETL is going to take longer than building Sagrada Familia.
So one fairly reasonable approach may be to write some t-sql using a table variable looking something like:
DECLARE @DimAgeBand TABLE
    (DimAgeBandKey int
    ,BandMin int
    ,BandMax int
    ,BandDescription varchar(50)
    )
INSERT INTO @DimAgeBand
    (DimKey ,BandMin ,BandMax ,BandDescription )
    VALUES
     (1  ,-99999999 ,0          ,'Not Due')
    ,(2  ,0         ,90         ,'0-90 Days')
    ,(3  ,91        ,120        ,'90-120 Days')
    ,(4  ,121       ,180        ,'120-180 Days')
    ,(5  ,181       ,99999999   ,'120-180 Days')

 

SELECT
    DimAgeBandKey   = dab.DimAgeBandKey
    ,OverdueDays    = datediff(DD, staging.DueDate, GETDATE())
    ,Amount         = staging.Amount
FROM
    staging.Debtors stg
    INNER JOIN @DimAgeBand dab ON
        datediff(DD, staging.DueDate, GETDATE()) BETWEEN dab.BandMin AND dab.BandMax
You could even create the VALUES section of the INSERT statement dynamically by querying the Dimension table and constructing the Sourcesql as an expression.
All very well and good until you run it in SSIS and it returns 0 rows. Strange as all the rows came back in SSMS.
The reason for this is the query is returning two data streams and SSIS can only address the first one.
This problem can be resolved by issuing a SET NOCOUNT ON at the beginning of the query which “Stops the message that shows the count of the number of rows affected by a Transact-SQL statement or stored procedure from being returned as part of the result set.”

SSIS vs t-sql datetime datatypes

Why there are such glaring differences and conflicts between t-sql data types and SSIS data types I just don’t understand. Here is one that may trip you up when casting a DateTime as int (yes I know casting a date as an int is silly, but there are legacy issues here).

SSIS cast as int works differently to t-sql cast as int due to different base dates as demonstrated:

t-sql has a base date of 1900-01-01 00:00:00 while SSIS has a base date of 1899-12-30

Create four variables as below:

image

Set varIntFromDate to EvaluateAsEExpression = True, then enter the following expression

image

Note it evaluates to 40268

Set varIntFromDateTime to EvaluateAsEExpression = True, then enter the following expression

image

Note it evaluates to 40269

Problem: There appears to be some sort of rounding going on despite both variables having the same date (but different times).

Now do the same thing in t-sql:

DECLARE @varDate as datetime
DECLARE @varDateTime as datetime

SET @varDate     = '20100331'
SET @varDateTime = '20100331 18:50'

SELECT    
  @varDate as varDate    
 ,@varDateTime  as varDateTime     
 ,cast(@varDate as int)  as varIntFromDate    
 ,cast(@varDateTime as int)   as varIntFromDateTime

 

to get the following results:
image

This time getting 40266 and 40267. I don’t know about you but last time I checked nothing from the list 40268, 40269, 40266, 40267 are the same.

%d bloggers like this: