DWU: UNDERSTANDING MEMORY ALLOCATION IN AZURE SQL DATA WAREHOUSE

At the start of a recent investigation into utilising Microsoft's cloud data warehouse service, Azure SQL Data Warehouse (general page HERE), I jumped right into provisioning a service, creating tables and loading data.  After this initial jump start I took a step back to understand the DWU concept.  The following is based on the Concurrency and Workload Management documentation HERE.

DWUs, or Data Warehouse Units, are the concept behind how Azure SQL Data Warehouse allocates compute resources.  When provisioning an Azure SQL Data Warehouse (ASDW) there are currently 12 DWU tiers to choose from.  These tiers range in price; for comparison, the following table shows DWU compute prices per tier in GBP, USD and EUR for 3 regions, as at 27th September 2016.  Prices for all available regions are HERE.  Please note that storage is a separate cost and is outlined in the Storage section of the pricing page.

Price Per Hour (Per Month)

Tier       £ UK South           $ West US            € West Europe
DWU100     1.02 (755.99)        1.51 (1,125.00)      1.28 (948.71)
DWU200     2.03 (1,511.97)      3.02 (2,250.00)      2.55 (1,897.43)
DWU300     3.05 (2,267.96)      4.54 (3,375.01)      3.83 (2,846.14)
DWU400     4.06 (3,023.94)      6.05 (4,500.01)      5.10 (3,794.86)
DWU500     5.08 (3,779.93)      7.56 (5,625.01)      6.38 (4,743.57)
DWU600     6.10 (4,535.91)      9.07 (6,750.01)      7.65 (5,692.29)
DWU1000    10.16 (7,559.86)     15.12 (11,250.02)    12.75 (9,487.15)
DWU1200    12.19 (9,071.83)     18.15 (13,500.03)    15.30 (11,384.57)
DWU1500    15.24 (11,339.79)    22.68 (16,875.04)    19.13 (14,230.72)
DWU2000    20.32 (15,119.72)    30.24 (22,500.05)    25.50 (18,974.29)
DWU3000    30.48 (22,679.57)    45.36 (33,750.07)    38.25 (28,461.44)
DWU6000    60.97 (45,359.15)    90.73 (67,500.14)    76.51 (56,922.87)

A DWU100 running for 1 year in the UK South region would cost £9,071.88, whilst a DWU6000 in the same region would cost £544,309.80.  That is a significant cost difference, but as the DWU tier can be reconfigured at any point in the life of the service, and re-provisioning takes no longer than a few minutes, different tiers can be used throughout the day depending on the workload.  The workload would typically split into loading data and querying.
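
The tier can be changed from the Azure portal, PowerShell or T-SQL.  As a minimal sketch (the data warehouse name MyDataWarehouse is an assumption for this example), connect to the master database on the logical server and run:

-- Scale the data warehouse to DWU400
-- (MyDataWarehouse is a placeholder name)
ALTER DATABASE MyDataWarehouse
MODIFY (SERVICE_OBJECTIVE = 'DW400');

-- Confirm the current service objective
SELECT db.[name], ds.service_objective
FROM sys.database_service_objectives ds
JOIN sys.databases db ON ds.database_id = db.database_id;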

Concurrency Limits Per Tier

At each DWU tier there are limits to the concurrent queries and concurrent slots that are allocated.  There is a one-to-many relationship between concurrent queries and concurrent slots in that a single query can consume many slots, depending on the DWU tier.  The concept is taken further with the use of resource classes, of which there are 4:

  • smallrc (default)
  • mediumrc
  • largerc
  • xlargerc

A resource class is assigned to a user login and, depending on the DWU tier, consumes one or many slots.  There is a maximum of 32 concurrent queries and 240 concurrent slots available.  Please note that ASDW accepts 1,024 concurrent connections; any queries not running will be queued.
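
Resource classes are implemented as database roles, so moving a user into a different resource class is a role membership change.  A minimal sketch, assuming an existing user called LoadUser (a placeholder name):

-- Promote LoadUser from the default smallrc to largerc
EXEC sp_addrolemember 'largerc', 'LoadUser';

-- Drop back to the default smallrc by removing the role membership
EXEC sp_droprolemember 'largerc', 'LoadUser';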

The following table shows:

  • The total memory in GB allocated to each tier
  • The total memory in GB allocated to each resource class in each tier
  • Maximum concurrent queries and slots
       Memory Allocation Per System (GB)     Concurrency      Slots Used Per Resource Class         Total Memory
DWU    smallrc  mediumrc  largerc  xlargerc  Queries  Slots   smallrc  mediumrc  largerc  xlargerc  GB (Est)
100    6        6         12       23        4        4       1        1         2        4         24
200    6        12        23       47        8        8       1        2         4        8         48
300    6        12        23       47        12       12      1        2         4        8         72
400    6        23        47       94        16       16      1        4         8        16        96
500    6        23        47       94        20       20      1        4         8        16        120
600    6        23        47       94        24       24      1        4         8        16        144
1000   6        47        94       188       32       40      1        8         16       32        235
1200   6        47        94       188       32       48      1        8         16       32        282
1500   6        47        94       188       32       60      1        8         16       32        355
2000   6        94        188      375       32       80      1        16        32       64        469
3000   6        94        188      375       32       120     1        16        32       64        705
6000   6        188       375      750       32       240     1        32        64       128       1,409

Total Memory GB (Est) was calculated by multiplying the slots available by the memory consumption per slot.  E.g. DWU100 has 4 slots available; a smallrc resource class has 6GB allocated and uses 1 slot, therefore 4 x 6 = 24GB.  DWU1000 has 40 slots available, which supports 5 concurrent mediumrc queries (8 slots each) at 47GB each = 235GB.  It's a simplistic calculation but gives an estimate of the total system memory available at each tier.

ASDW achieves an MPP (massively parallel processing) architecture by spreading the workload across 60 distributions.  To calculate the memory allocated to each distribution for a given tier and resource class, divide the allocated memory by 60.  For example, at DWU100 a smallrc resource class user will have 100MB allocated to each distribution (6GB / 60), whereas at DWU6000 an xlargerc resource class user would have 12.5GB allocated to each distribution (750GB / 60).

Scenarios

Now that we can see the memory resources available at each tier and within each resource class, we can calculate how many concurrent queries can be run: take the total number of slots available at a tier and work out how many slots each resource class uses at that tier.  Note that we cannot exceed 32 concurrent queries, whatever the slot count; a monitoring query follows the scenarios below.

At DWU100 – 4 slots available:

  • 4 concurrent queries run by resource class smallrc
  • or 4 concurrent queries run by resource class mediumrc
  • or 2 concurrent queries run by resource class largerc
  • or 1 concurrent query run by resource class xlargerc

At DWU400 – 16 slots available:

  • 16 concurrent queries run by resource class smallrc
  • or 4 concurrent queries run by resource class mediumrc
  • or 2 concurrent queries run by resource class largerc
  • or 1 concurrent query run by resource class xlargerc

At DWU2000 – 80 slots available:

  • 32 concurrent queries run by resource class smallrc (the 32-query limit is reached before the 80 slots are used)
  • or 18 concurrent queries: 16 smallrc plus 2 largerc
  • or 20 concurrent queries: 16 smallrc plus 4 mediumrc
  • or 2 concurrent queries: 1 mediumrc plus 1 xlargerc
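
To see slot consumption in practice, the sys.dm_pdw_exec_requests DMV shows which requests are running and which are queued.  A hedged sketch (column availability may vary by service version):

-- Requests with a status of 'Running' are consuming slots;
-- 'Suspended' requests are queued waiting for slots
SELECT request_id,
       [status],
       submit_time,
       start_time,
       resource_class  -- assumes the resource_class column is available
FROM sys.dm_pdw_exec_requests
WHERE [status] IN ('Running', 'Suspended')
ORDER BY submit_time;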

We can mix and match resource classes to fit our total concurrent slot allocation and our workload requirements.  If we need a different DWU tier to suit a particular workload then we can scale up for that workload and switch back afterwards.  We also have the advantage of being able to pause compute and incur no compute charge, although the ASDW will not be accessible while paused.  For example:

[Image: dwutiers]

Query Importance

There are 2 importance allocations – medium and high.  Any query which uses 16 concurrent slots or more is considered high importance and is given 3 times more CPU cycles than a medium importance query.  It therefore isn't until DWU400 that importance allocation comes into play, as this is the first tier at which a single query (xlargerc) uses 16 slots.

  • smallrc – never given high importance
  • mediumrc – given high importance at DWU tiers 2000 and above
  • largerc – given high importance at DWU tiers 1000 and above
  • xlargerc – given high importance at DWU tiers 400 and above

I hope the above information has been useful; I certainly found that understanding the DWU tiers and the concurrent query/slot concepts clarified what resources are being allocated at each level.  Thoughts, comments and clarifications welcome.


CREATE AN SSAS TABULAR 2016 MODEL USING THE TABULAR OBJECT MODEL AND C#

With SQL Server 2016, programmatically creating and managing Analysis Services Tabular models has been enhanced with the Tabular Object Model (TOM).  TOM can be used to create models, tables, columns, measures, hierarchies, relationships and security, and also to trigger data refreshes.  Why would we want to do this?  In a word: automation.  TOM allows us to create/modify Tabular models and deploy them to the server from code.

There’s more info on MSDN HERE.  The class references are available HERE.  For reference, the following tutorial is an extension of an MSDN article available HERE; the changes and additions are:

  • Uses the new WideWorldImportersDW SQL Server 2016 example data warehouse database
  • Adds Measures, Hierarchies and Relationships to enhance the model

Download Visual Studio 2015 C# Tutorial Project

  • GitHub: Repository HERE.
  • Direct Download: Zip file HERE.

Tutorial Overview

I set out to create a Tabular model using C# and TOM that featured all the functionality I would expect, such as measures, hierarchies and relationships.  I took the MSDN tutorial and extended it using the new WideWorldImportersDW database.  This was a challenge as I consider myself a C# “noob” and I hope this proves useful for others looking to do this too.  The next logical step will be to metadata-drive the creation of the objects; however, at this stage I feel it’s important to get to grips with the TOM itself.

A big thank you to William Relf (https://twitter.com/relfw), my colleague, Data Platform expert and C# whizz.

The example model includes:

  • 2 Dimension tables
  • 1 Fact table
  • 2 Relationships between the Fact and Dimension tables
  • 3 SUM measures
  • 1 display folder
  • 1 hierarchy within the Date Dimension (Year-Month-Date)

The image below shows a visual representation of the model we’ll create.

[Image: tom_02]

Prerequisites

  • SQL Server 2016 Database Engine and Analysis Services Tabular installed (Developer Edition is free HERE)
  • Visual Studio 2015 (the free Community Edition can be used HERE)
  • WideWorldImportersDW SQL database restored onto the SQL Server 2016 Database Engine instance (direct download HERE.  GitHub page HERE)

Download & Install the Libraries

  1. Download and install the SQL_AS_AMO.msi library from HERE.
  2. Verify the install by ensuring the following files are in C:\Program Files\Microsoft SQL Server\130\SDK\Assemblies
    • Microsoft.AnalysisServices.Core.dll
    • Microsoft.AnalysisServices.dll
    • Microsoft.AnalysisServices.Tabular.dll
    • Microsoft.AnalysisServices.Tabular.Json.dll

Creating a blank C# Console Application and Adding the Libraries

  1. Open Visual Studio and create a new Visual C# Console Application called TOM WideWorldImporters
  2. Add a reference to the TOM libraries:
    1. Right-click the References folder in Solution Explorer and click Add Reference
    2. Click Browse and go to the folder C:\Program Files\Microsoft SQL Server\130\SDK\Assemblies
    3. Select the following 3 DLL files
      1. Microsoft.AnalysisServices.Core
      2. Microsoft.AnalysisServices.Tabular
      3. Microsoft.AnalysisServices.Tabular.Json
    4. Click Add then OK
    5. If added successfully you should see the 3 libraries appear in the References folder
  3. Ensure that the following using directives are added at the top of the Program.cs file:
using Microsoft.AnalysisServices;
using Microsoft.AnalysisServices.Tabular;


Adding a Connection to the Tabular Server

We’re now ready to start adding C# code to connect to our Tabular server and create objects.  Please note that all of the following C# code is added between the braces of the Main method in the Program.cs file:

static void Main(string[] args)
    {
        // ALL FOLLOWING CODE GOES HERE
    }


Create a Connection to the Tabular Server.  Replace localhost with the name of your Tabular instance.

string ConnectionString = "DataSource=localhost";
    using (Server server = new Server())
    {
        server.Connect(ConnectionString);

        // ALL FOLLOWING CODE GOES HERE
    }

Defining a new Tabular Model

This checks whether a database named Wide World Importers already exists on the server; if it does, GetNewName appends an incremented number so that the database can still be created.

string newDB = server.Databases.GetNewName("Wide World Importers");
    var blankdatabase = new Database()
    {
        Name = newDB,
        ID = newDB,
        CompatibilityLevel = 1200,
        StorageEngineUsed = StorageEngineUsed.TabularMetadata,
    };

blankdatabase.Model = new Model()
    {
        Name = "Wide World Importers Model",
        Description = "The Wide World Imports Tabular data model at the 1200 compatibility level."
    };

Defining a Tabular Data Source

This provides the Tabular model with a connection to the data source which contains the tables and data needed to populate our model.  Note the ImpersonationMode setting: in this instance I’m using the service account that the Tabular service is running under, which by default is NT Service\MSSQLServerOLAPService.  This service account has been added as a login to the SQL Server database engine instance and granted read permissions on the WideWorldImportersDW database.

blankdatabase.Model.DataSources.Add(new ProviderDataSource()
{
    Name = "WideWorldImportersDW_Source",
    ConnectionString = "Provider=SQLNCLI11;Data Source=localhost;Initial Catalog=WideWorldImportersDW;Integrated Security=SSPI;Persist Security Info=false",
    ImpersonationMode = Microsoft.AnalysisServices.Tabular.ImpersonationMode.ImpersonateServiceAccount,
 });
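
For reference, a sketch of the permissions setup described above, run against the database engine instance (the service account name assumes a default Tabular instance):

-- Add the Tabular service account as a login
CREATE LOGIN [NT Service\MSSQLServerOLAPService] FROM WINDOWS;
GO

-- Grant read access to the source database
USE WideWorldImportersDW;
GO
CREATE USER [NT Service\MSSQLServerOLAPService] FOR LOGIN [NT Service\MSSQLServerOLAPService];
ALTER ROLE db_datareader ADD MEMBER [NT Service\MSSQLServerOLAPService];
GO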

Defining Columns

We now define each of the columns used in our model.  For each column we need the Name, DataType and SourceColumn properties to be populated; the intended use of the column determines which other properties need setting.  For example, each dimension table has a unique key column, which is represented by setting the IsUnique property to true.

Note that the Date_Month column is sorted by the Date_MonthNumber column, this ensures that the month names are sorted by calendar month and not alphabetically.

The available column properties are HERE.

 //COLUMN DEFINITIONS
 //DIMENSION TABLES
 //DATE
 DataColumn Date_Date = new DataColumn()
 {
     Name = "Date",
     DataType = DataType.DateTime,
     SourceColumn = "Date",
     IsUnique = true,
     FormatString = "yyyy-mm-dd",
 };

 DataColumn Date_MonthNumber = new DataColumn()
 {
     Name = "Month Number",
     DataType = DataType.Int64,
     SourceColumn = "MonthNumber",
     IsHidden = true,
 };

 DataColumn Date_Month = new DataColumn()
 {
     Name = "Month",
     DataType = DataType.String,
     SourceColumn = "Month",
     SortByColumn = Date_MonthNumber
 };

 DataColumn Date_Year = new DataColumn()
 {
     Name = "Year",
     DataType = DataType.String,
     SourceColumn = "Year",
 };

 //EMPLOYEE
 DataColumn Employee_EmployeeKey = new DataColumn()
 {
     Name = "Employee Key",
     DataType = DataType.Int64,
     SourceColumn = "EmployeeKey",
     IsHidden = true,
     IsUnique = true,
 };

 DataColumn Employee_EmployeeName = new DataColumn()
 {
     Name = "Employee Name",
     DataType = DataType.String,
     SourceColumn = "Employee",
 };

 DataColumn Employee_WWIEmployeeID = new DataColumn()
 {
     Name = "WWI Employee ID",
     DataType = DataType.Int64,
     SourceColumn = "WWIEmployeeID",
     SummarizeBy = AggregateFunction.None,
 };

 DataColumn Employee_IsSalesPerson = new DataColumn()
 {
     Name = "Is Sales Person",
     DataType = DataType.String,
     SourceColumn = "IsSalesPerson",
 };

 //FACT TABLE
 //ORDER
 DataColumn Order_SalesPersonKey = new DataColumn()
 {
     Name = "Sales Person Key",
     DataType = DataType.Int64,
     SourceColumn = "SalesPersonKey",
     IsHidden = true,
 };

 DataColumn Order_Date = new DataColumn()
 {
     Name = "Date",
     DataType = DataType.DateTime,
     SourceColumn = "OrderDateKey",
     IsHidden = true,
 };

 DataColumn Order_TotalExcludingTax = new DataColumn()
 {
     Name = "TotalExcludingTax",
     DataType = DataType.Decimal,
     SourceColumn = "TotalExcludingTax",
     IsHidden = true,
 };

 DataColumn Order_TaxAmount = new DataColumn()
 {
     Name = "TaxAmount",
     DataType = DataType.Decimal,
     SourceColumn = "TaxAmount",
     IsHidden = true,
 };

 DataColumn Order_TotalIncludingTax = new DataColumn()
 {
     Name = "TotalIncludingTax",
     DataType = DataType.Decimal,
     SourceColumn = "TotalIncludingTax",
     IsHidden = true,
 };

Defining Hierarchies

By using the Hierarchy class we’re able to define a 3-level hierarchy called Calendar Year which uses the Year, Month and Date columns.  The Ordinal property sets the order in which the columns are placed, therefore Year-Month-Date is represented as 0-1-2.

 Hierarchy H1 = new Hierarchy()
 {
     Name = "Calendar Year",
 };

 H1.Levels.Add(new Level()
 {
     Column = Date_Year,
     Ordinal = 0,
     Name = Date_Year.Name
 });

 H1.Levels.Add(new Level()
 {
     Column = Date_Month,
     Ordinal = 1,
     Name = Date_Month.Name
 });

 H1.Levels.Add(new Level()
 {
     Column = Date_Date,
     Ordinal = 2,
     Name = Date_Date.Name
 });


Defining Tables

We now define the 3 tables used in our model, noting that we are using the column and hierarchy objects previously defined.  The Orders table also includes measures, which are DAX calculations.

 //TABLES -------------------
 //DATE
 blankdatabase.Model.Tables.Add(new Table()
{
    Name = blankdatabase.Model.Tables.GetNewName("Date"),
    Partitions =
    {
        new Partition()
        {
            Name = "All Dates",
            Source = new QueryPartitionSource()
            {
                DataSource = blankdatabase.Model.DataSources["WideWorldImportersDW_Source"],
                Query = @"SELECT Date,Month,[Calendar Month Number] as MonthNumber,[Calendar Year] as Year
                          FROM Dimension.Date",
            }
        }
    },
    Columns =
    {
        Date_Date,
        Date_Year,
        Date_Month,
        Date_MonthNumber
    },
    Hierarchies =
    {
        H1
    }
});

//EMPLOYEE
blankdatabase.Model.Tables.Add(new Table()
{
    Name = blankdatabase.Model.Tables.GetNewName("Employee"),
    Partitions =
    {
        new Partition()
        {
            Name = "All Employees",
            Source = new QueryPartitionSource()
            {
                DataSource = blankdatabase.Model.DataSources["WideWorldImportersDW_Source"],
                Query = @"SELECT [Employee Key] as EmployeeKey,Employee,[WWI Employee ID] as WWIEmployeeID,CASE [Is Salesperson] WHEN 1 THEN 'Yes' ELSE 'No' end as IsSalesPerson
                          FROM Dimension.Employee",
            }
        }

    },
    Columns =
    {
        Employee_EmployeeKey,
        Employee_EmployeeName,
        Employee_WWIEmployeeID,
        Employee_IsSalesPerson
    }
});

//ORDERS
blankdatabase.Model.Tables.Add(new Table()
{
    Name = blankdatabase.Model.Tables.GetNewName("Orders"),
    Partitions =
    {
        new Partition()
        {
            Name = "All Orders",
            Source = new QueryPartitionSource()
            {
                DataSource = blankdatabase.Model.DataSources["WideWorldImportersDW_Source"],
                Query = @"SELECT [Salesperson Key] as SalesPersonKey,[Order Date Key] as OrderDateKey,[Total Excluding Tax] as TotalExcludingTax
                          ,[Tax Amount] as TaxAmount,[Total Including Tax] as TotalIncludingTax
                          FROM Fact.[Order]",
            }
        }
    },
    Columns =
    {
        Order_SalesPersonKey,
        Order_Date,
        Order_TotalExcludingTax,
        Order_TaxAmount,
        Order_TotalIncludingTax
    },
    Measures =
    {
        new Measure()
        {
            Name = "Total Excluding Tax",
            Expression = "SUM('Orders'[TotalExcludingTax])",
            FormatString = "#,###.##",
        },
        new Measure()
        {
            Name = "Tax Amount",
            Expression = "SUM('Orders'[TaxAmount])",
            FormatString = "#,###.##",
            DisplayFolder = "Tax",
        },
        new Measure()
        {
            Name = "Total Including Tax",
            Expression = "SUM('Orders'[TotalIncludingTax])",
            FormatString = "#,###.##",
        },
    }
});

Defining Relationships

As the Orders table is related to both the Date and Employee tables, we need to add relationships to the model.  The relationship cardinality is set consistent with many-to-one relationships (fact to dimension).

SingleColumnRelationship relOrderToDate = new SingleColumnRelationship()
{
    Name = "Order_Date_Date_Date",
    ToColumn = Date_Date,
    FromColumn = Order_Date,
    FromCardinality = RelationshipEndCardinality.Many,
    ToCardinality = RelationshipEndCardinality.One
};

blankdatabase.Model.Relationships.Add(relOrderToDate);

SingleColumnRelationship relOrderToEmployee = new SingleColumnRelationship()
{
    Name = "Order_EmployeeKey_Employee_EmployeeKey",
    ToColumn = Employee_EmployeeKey,
    FromColumn = Order_SalesPersonKey,
    FromCardinality = RelationshipEndCardinality.Many,
    ToCardinality = RelationshipEndCardinality.One
};

blankdatabase.Model.Relationships.Add(relOrderToEmployee);

Deploying the Tabular Model to the Server

At this point the model has not yet been deployed to the server; this next step does just that when the program is executed.

try
{
    server.Databases.Add(blankdatabase);
    blankdatabase.Update(UpdateOptions.ExpandFull);

    Console.WriteLine("Deployed to server successfully");
}
catch (Exception ex)
{
    // Surface the failure reason rather than swallowing the exception
    Console.WriteLine("Deployment to server failed: " + ex.Message);
    return;
}


Loading Data into the Tabular Model

Once the model has been deployed, a data refresh operation is requested and executed.

blankdatabase.Model.RequestRefresh(Microsoft.AnalysisServices.Tabular.RefreshType.Full); //request data refresh
blankdatabase.Update(UpdateOptions.ExpandFull); //execute data refresh

Console.WriteLine("Data loaded...");


Output the Objects from the Tabular Model

We now output the objects created.

Console.Write("Database ");
Console.ForegroundColor = ConsoleColor.Green;
Console.Write(blankdatabase.Name);
Console.ResetColor();
Console.WriteLine(" created successfully.");

Console.WriteLine("The data model includes the following table definitions:");
Console.ForegroundColor = ConsoleColor.Yellow;
foreach (Table tbl in blankdatabase.Model.Tables)
{
    Console.WriteLine("\tTable name:\t\t{0}", tbl.Name);
    //Console.WriteLine("\ttbl description:\t{0}", tbl.Description);

    foreach (Measure measures in tbl.Measures)
    {
        Console.WriteLine("\tMeasure name:\t\t{0}", measures.Name);
    }

    foreach (Column columns in tbl.Columns)
    {
        Console.WriteLine("\tColumn name:\t\t{0}", columns.Name);
    }

    foreach (Hierarchy hierarchy in tbl.Hierarchies)
    {
        Console.WriteLine("\tHierachy name:\t\t{0}", hierarchy.Name);
    }
}

foreach (Relationship relationship in blankdatabase.Model.Relationships)
{
    Console.WriteLine("\tRelationship name:\t\t{0}", relationship.Name);
}
Console.ResetColor();
Console.WriteLine();

Console.WriteLine("Press Enter to close this console window.");
Console.ReadLine();

Run the Project

Now run the program (click Start or press F5) and the console should appear with messages indicating the actions performed.

Browse the Tabular Model

We can now connect to the model using a client tool such as Excel or Power BI Desktop and browse the objects.  Power BI Desktop is a free download and available HERE.

[Image: tom_03]

Browsing the Tabular model using Power BI Desktop

Next Steps

Now that we can create Tabular objects in this “version 1” project, a number of improvements present themselves:

  • Use metadata to drive the creation of objects; foreach loops would reduce the overall code size and enable dynamic object creation
  • Possible use of SQL Server Extended Properties to drive the metadata
  • Separate the Create and Load logic
  • Error handling

Comments, suggestions and critique are encouraged, and I thank you if you have read this far!

THE IMPORTANCE OF DATA MANAGEMENT

In the New Year, instead of diving right back into the development/coding of data solutions, I thought I’d take a step back from technology.  Why?  To focus again on a set of principles designed to aid the process of developing and managing data: Data Management.  With the current advances in Data/Big Data and Business Intelligence technologies, and the sharp focus on delivering “breakthrough insights” to information-hungry organisations, it has become even more important to harness the power of effective Data Management.

Without a Data Management policy, the following issues can affect an organisation’s ability to remain competitive:

  • Systems designed and deployed in isolation have great difficulty in being integrated
  • Information cannot be shared easily between systems if the semantics of these systems are different
  • Without data sharing and accessibility, people within the organisation may not have the information they need to carry out their role effectively
  • The same data is created and saved repeatedly causing data duplication and confusion

At the beginning of 2012 I had the opportunity to study Data Management under the tutelage of DAMA UK’s Keith Gordon.  The BCS book Principles of Data Management that accompanied the course is available for purchase and I fully recommend it.  The principles outlined within the book are very much geared to tackling practical real-world issues, having been developed and tested through many years on the front line.

[Image: datamanagement]

I’ve cherry-picked a few core areas that the book covers; please be aware that certain areas are enormous subjects in their own right and can’t be completely covered in a single chapter.

Data & The Enterprise

Data is a key asset and a resource to be shared and collaborated on across the organisation.  For information to be delivered accurately and on time, data must be properly managed.

Database Development, Data Modelling & Corporate Data Modelling

Before diving right in, creating data structures and filling them with vast amounts of data, we must first model the data and ask questions about the validity of that model.  Do attributes share common themes?  Can these common themes be re-used in data models across the organisation?

Data Definition & Naming Conventions

When creating data structures, it’s important to define common and shared definitions, as this eases data integration.  This is especially difficult when buying off-the-shelf software, but may help to identify integration problems before they occur.

Data Quality

If an organisation wants to derive insights and useful, actionable information from data, then the quality of that data is paramount.  Missing or incorrect values will skew the value of that data.

Resources needed for Data Management

What skills and technology are needed to establish a data management function within an organisation?  Do you need database developers, database administrators, data stewards?

What is clear from this book is that the principles highlight a fundamental fact: to “win at data” takes a community effort from all members of an organisation.  Directors, IT managers, developers and systems users all play a role in delivering an effective Data Management initiative.

New Kimball Group Data Warehousing Toolkit book for SQL Server 2008 R2

Wait for it… not released yet, but bound to be a good read, and I’ll definitely be buying this.  Worth it if you already work with data warehousing and are upgrading to or working with SQL Server 2008 R2; essential if you’re just starting out with DW.

The Microsoft Data Warehouse Toolkit: With SQL Server 2008 R2

Synonyms and their incredible usefulness

Whilst working on a recent project which included quite a few linked server calls within several stored procedures, I looked at using SQL Server synonyms.  The problem with linked servers in SQL queries is that if the linked server changes, e.g. from Development to Test and then to Live, the SQL query must also be updated, and you may end up with multiple versions of the same piece of code.  You can of course use dynamic SQL, but I’m sure many of us would prefer not to.  An easier way is to create a synonym on each server (development, test, live etc.) and reference the synonym name (for example “OrdersTable”) in your SQL query.  This way, when you come to deploy your SQL code onto each server, you will not have to change any of the actual SQL code.  Of course you will have multiple SQL scripts to create synonyms, but since each is a small piece of code the overhead is far less than maintaining multiple versions of your actual SQL code.

Creating a Synonym to a Linked Server table:

CREATE SYNONYM [dbo].[OrdersTable] FOR [LinkedServer].[YourDatabase].[dbo].[OrdersTable]
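
As a sketch of the per-environment pattern (server, database and column names here are illustrative), each environment creates the synonym pointing at its own target, and the querying code never changes:

-- On the development server: the synonym points at a local table
CREATE SYNONYM [dbo].[OrdersTable] FOR [DevDatabase].[dbo].[OrdersTable]

-- On the live server: the same synonym name points at the linked server
CREATE SYNONYM [dbo].[OrdersTable] FOR [LinkedServer].[YourDatabase].[dbo].[OrdersTable]

-- Identical query code deploys unchanged to every server
SELECT OrderID, OrderDate FROM [dbo].[OrdersTable]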

Designing effective dashboards

The powers that be love dashboards, I mean they LOVE dashboards.  While showing as many gauges, metrics, bar graphs and traffic lights as possible may at first seem to be the way to your company CEO’s heart, it may also be your undoing if you don’t follow some simple steps to ensure what you show is useful, easy to read and even actionable.

Read the excellent Common Pitfalls of Dashboard Design document from Stephen Few here:

Download PDF document

ProClarity Dashboard

ProClarity Dashboard is a piece of technology I’ve been itching to use for the last couple of years.  I’m sure we can make great use of it even though official support for ProClarity products will end in 2012.  Due to the “underground” nature of database development it’s rare that the business gets to see an end product on their screen, and it’s nice to be able to involve the database team in these kinds of projects.

ProClarity Dashboard example; this one is actually showing Reporting Services information such as number of report requests, top users, success rate etc.: