DWU: UNDERSTANDING MEMORY ALLOCATION IN AZURE SQL DATA WAREHOUSE

At the start of a recent investigation into utilising Microsoft’s cloud data warehouse service, Azure SQL Data Warehouse (general page HERE), I jumped right in to provisioning a service, creating tables and loading data.  After this initial jump start I took a step back to understand the DWU concept.  The following is information from the Concurrency and Workload Management documentation HERE.

DWUs, or Data Warehouse Units, are the concept behind how Azure SQL Data Warehouse allocates compute resources.  When provisioning an Azure SQL Data Warehouse (ASDW) there are currently 12 DWU tiers to choose from.  These tiers range in price and for comparison are in the following list, I’ve chosen 3 regions to show DWU compute prices per tier in GBP, USD and EUR as at 27th September 2016.  Prices for all available regions is HERE.  Please note that storage costs are a separate cost and are outlined in the Storage section of the pricing page.

Price Per Hour (Per Month)
Tier £ UK South $ West US € West Europe
DWU100 1.02 (755.99) 1.51 (1125.00) 1.28 (948.71)
DWU200 2.03 (1,511.97) 3.02 (2,250.00) 2.55 (1,897.43)
DWU300 3.05 (2,267.96) 4.54 (3,375.01) 3.83 (2,846.14)
DWU400 4.06 (3,023.94) 6.05 (4,500.01) 5.10 (3,794.86)
DWU500 5.08 (3,779.93) 7.56 (5,625.01) 6.38 (4,743.57)
DWU600 6.10 (4,535.91) 9.07 (6,750.01) 7.65 (5,692.29)
DWU1000 10.16 (7,559.86) 15.12 (11,250.02) 12.75 (9,487.15)
DWU1200 12.19 (9,071.83) 18.15 (13,500.03) 15.30 (11,384.57)
DWU1500 15.24 (11,339.79) 22.68 (16,875.04) 19.13 (14,230.72)
DWU2000 20.32 (15,119.72) 30.24 (22,500.05) 25.50 (18,974.29)
DWU3000 30.48 (22,679.57) 45.36 (33,750.07) 38.25 (28,461.44)
DWU6000 60.97 (45,359.15) 90.73 (67,500.14) 76.51 (56,922.87)

A DWU100 running for 1 year in the UK South region would cost £9,071.88 whilst a DWU6000 in the same region would cost £544,309.80.  A significant cost difference, but as the DWU tier can be reconfigured at any point in the use of the service and takes no longer than a few minutes to re-provision, different tiers can be used throughout the day depending on the workload.  The workload splits would typically be loading data and querying.

Concurrency Limits Per Tier

At each DWU tier there are limits to the concurrent queries and concurrent slots that are allocated.  There is a one-to-many relationship between concurrent queries and concurrent slots in that a single query can consume many slots, depending on the DWU tier.  The concept is taken further with the use of resource classes, of which there are 4:

  • smallrc (default)
  • mediumrc
  • largerc
  • xlargerc

A resource class is assigned to a user login and depending on the DWU tier can take 1 or many slots.  There is a maximum of 32 concurrent queries and 240 concurrent slots available.  Please note that ASDW accepts 1,024 concurrent connections, any queries not running will be queued.

The following table shows:

  • The total memory in GB allocated to each tier
  • The total memory in GB allocated to each resource class in each tier
  • Maximum concurrent queries and slots
Memory Allocation Per System GB Concurrency Slots Used Per Resource Class Total Memory GB (Est)
DWU smallrc mediumrc largerc xlargerc Queries Slots smallrc mediumrc largerc xlargerc
100 6 6 12 23 4 4 1 1 2 4 24
200 6 12 23 47 8 8 1 2 4 8 48
300 6 12 23 47 12 12 1 2 4 8 72
400 6 23 47 94 16 16 1 4 8 16 96
500 6 23 47 94 20 20 1 4 8 16 120
600 6 23 47 94 24 24 1 4 8 16 144
1000 6 47 94 188 32 40 1 8 16 32 235
1200 6 47 94 188 32 48 1 8 16 32 282
1500 6 47 94 188 32 60 1 8 16 32 355
2000 6 94 188 375 32 80 1 16 32 64 469
3000 6 94 188 375 32 120 1 16 32 64 705
6000 6 188 375 750 32 240 1 32 64 128 1,409

Total Memory GB (Est) was calculated by multiplying the slots available with the memory consumption per slot.  E.g DWU100 has 4 slots available, a smallrc resource class has 6GB allocated and uses 1 slot therefore 4 x 6 = 24GB. DWU1000 has 40 slots available therefore 5 mediumrc at 47GB each = 235GB.  It’s a simplistic calculation but gives an estimate of total system memory available at each tier.

ASDW achieves an MPP (massively parallel processing) architecture by distributing the workload across 60 distributions, therefore to calculate the memory allocated to each distribution across the tiers and resource classes, divide the memory allocated by 60.  For example, DWU100 for a smallrc resource class user will allocate 100MB to each distribution (6GB / 60), whereas at DWU6000, an xlargerc resource class user would have 12.5GB allocated to each distribution (750GB / 60).

Scenarios

As we can now see the memory resources available at each tier and within each resource class, we are able to calculate how many concurrent queries can be run.  For example we look at the total number of slots available per tier and then calculate how many slots each resource classes uses at that tier.  Note that we cannot exceed 32 concurrent queries running.

At DWU100 – 4 slots available:

  • 4 concurrent queries run by resource class smallrc
  • or 4 concurrent queries run by resource class mediumrc
  • or 2 concurrent queries run by resource class largerc
  • or 1 concurrent query run by resource class xlargerc

At DWU 400 – 16 slots available:

  • 16 concurrent queries run by resource class smallrc
  • or 4 concurrent queries run by resource class mediumrc
  • or 2 concurrent queries run by resource class largerc
  • or 1 concurrent query run by resource class xlargerc

At DWU2000 – 80 slots available:

  • 32 concurrent queries run by resource class smallrc
  • or 18 concurrent queries run by 16 resource class smallrc PLUS 2 largerc
  • or 20 concurrent queries run by 16 resource class smallrc PLUS 4 mediumrc
  • or 2 concurrent queries run by 1 resource class mediumrc PLUS 1 xlargerc

We can mix and match resource classes to fit with our total concurrent slot allocation and our workload requirements.  If we need to change DWU tiers to suit a particular workload requirement then we can do that then switch back to a more appropriate tier.  We also have the advantage of being able to pause compute and incur no charge, however the ASDW will not be accessible.  For example:

dwutiers

Query Importance

There are 2 importance allocations – medium and high.  Any query which uses 16 concurrent slots or more is considered high importance and is given 3 times more CPU cycles than a medium importance query.  For example, it’s not until DWU400 that importance allocation is considered as this is the first tier to use 16 slots.

  • smallrc – never given high importance
  • mediumrc – given high importance at DWU tiers 2000 and above
  • largerc – given high importance at DWU tiers 1000 and above
  • xlargerc – given high importance at DWU tiers 400 and above

I hope the above information has been useful as I certainly found understanding the DWU tiers and the concurrent query/slots concepts very useful in understanding what resources are being allocated.  Thoughts, comments and clarifications welcome.

Advertisements