ETL as a Service – Lifting & Shifting your existing SSIS ETL to Azure

Why a trend towards PaaS over IaaS ETL and how this relates to SSIS

Businesses are moving more towards environments where infrastructure is invisible to the business; where Platforms as a Service (PaaS) are developed, deployed and managed in the cloud to reduce operational infrastructure overhead costs as well as increase the scalability and availability of services.

But why is PaaS more attractive compared to on-premise or Infrastructure as a Service (IaaS) solutions?

  • Savings – PaaS services often use a Pay as You Go (PAYG) billing model, meaning you only pay for what you use and when you use it (for example, there are no costs due to idle server times).
  • Scalability – PaaS provides huge scalability and availability on demand.
  • Less overhead – PaaS requires no patching or maintenance of application infrastructure – it’s all taken care of. Therefore, there is less staff overhead from a maintenance perspective which allows staff to take part in activities that deliver a definite and measurable business benefit. This may include new solution developments, improving existing solutions, or data analytics.

3 images

This did not apply to SSIS

Microsoft’s incumbent data integration technology that handles the extract-, transform and load (ETL) of large volumes of data, SQL Server Integration Services (SSIS), has played a key role in many organisations’ data platforms, but unfortunately has not been available as a Service (PaaS).

So for businesses with existing investments in SSIS, wishing to move their data analytics solutions into the cloud and leverage the advantages of PaaS over IaaS, would not have been able to do so with their existing SSIS workloads. They would have had to embark on a full ETL redevelopment using an alternative data integration technology such as Azure Data Factory, or they would have had to use Azure VMs running SSIS to host their SSIS packages.

The good news

The good news is that Microsoft (MS) recently announced Azure-SSIS Integration Runtime (SSIS-IR) public preview, as well as the public announcement Azure SQL Database Managed Instance (SQLDBMI). Together these two new PaaS Services allow businesses to leverage the advantages of PaaS for their ETL workloads, without a major redevelopment.

In this article

  • We discuss what this means for businesses.
  • We discuss some capabilities still missing in public preview but in the pipeline of work for General Availability of the product.
  • We discuss how the recent announcement of SQLDBMI goes hand in hand with shifting your SSIS solutions into PaaS.
  • We touch on some of the other benefits of SQLDBMI.
  • We walk the reader through creating an SSIS IR instance as well as deploying, running and monitoring solution execution.

What are the business benefits

Savings, Scalability and Less overhead

SSIS workloads can now leverage the attractive cost advantages through a PaaS PAYG model – this includes instantaneous scale up and down to meet varying workloads, paying only for what you use, and no hidden costs through servers being idle.

By moving your SSIS workloads to PaaS, you minimise overhead costs, time and skillsets required to manage its supporting infrastructure. Infrastructure often involves costly hardware- (including depreciation, physical space and upgrades), plus backup-, network-, redundancy- and failover costs. Most of these costs simply do not apply to PaaS solutions.

Your staff can also focus on where there is a maximum business benefit – i.e. the solutions themselves rather than infrastructure or the setup, configuration, maintenance and licensing.

In addition, use what you previously developed

SSIS-IR allows businesses to easily lift and shift existing ETL workloads into the cloud so there is no need for redevelopment in ADF.  Developers familiar with the extensive functionality and components available in SSIS can easily continue to develop using proven technology such as Visual Studio and then deploy to their SSIS-IR instance in the same fashion they would for on-premise solutions.

What does this mean in $ terms

piggybank

A cost analysis between the 3 options (running SSIS in an on-premise server environment, running it on a cloud-based VM or as a PaaS service) based on ETL processes that is required to run twice a day and takes around 1 hour per run (~60 hours a month) shows a clear cost advantage when using PaaS.

Please note that the table below is NOT an official quote and any cost estimates must be obtained from your licenced software provider. Costs taken from the following sources:

CostMatrix

scalability

This option also leaves the business with the flexibility to easily scale up or down they compute power as their needs change. Doubling the computing power would take the annual cost from $550 to $870.

people

Regarding FTE overhead and maintenance – even if a conservative 30% of DBA time is saved by adopting PaaS, this will have a major cost saving and productivity increase to the Business.

What is currently missing? And what is coming?

SSIS-IR is still in public preview.  Therefore, there are some features not yet available as at the time of authoring this article – January 2018. These features are all expected to be available upon General Availability of the product:

  • 3rd party assemblies and API’s are not supported in SSIS-IR in the preview, only the stock SSIS components are supported.
  • Only East US, East US2, West Europe & North Europe host SSIS-IR in the preview.
  • There is no portal GUI available for provisioning your SSIS-IR instance; PowerShell scripts must be used to provision, stop/start and scale up/down.
  • You cannot schedule the execution of the SSIS packages from within SSIS-IR in preview and packages must be run
    • This is where another upcoming release by Microsoft, Azure SQL Database Managed Instance (SQLDBMI) comes to play.

SQLDBMI

The PaaS database service, Azure SQL Database Managed Instance (SQLDBMI) was announced at the recent MS Ignite conference and is set to add some key features into a more powerful version of Azure SQL Database.  This will be similar to what you would normally expect to see in the on-premise version of SQL Server and makes a very compelling case for migrating existing databases to the cloud. One of the key features in SQLDBMI is the SQL Agent, this will enable the scheduling, automation and execution of SSIS-IR packages.

Also in SQLDBMI

Additional to SQL Agent in SQLDBMI there are many other great features which are on their way.  The public preview of SQLDBMI is set to be available around Q1 2018. So, stay tuned for the following:

  • DB Mail
    • An enterprise solution for sending email messages through the SQL Server database engine
  • R / Machine Learning Services
    • Develop and deploy data science solutions that uncover new insights as an in-database service
  • Service Broker
    • Native support for messaging and queuing applications.
  • CDC
    • Change Data Capture records data manipulation operations that are applied to SQL server tables in an easily consumed relational format
  • Replication
    • Enables the copying and distributing of data and database objects from one database to another and the synchronising between them to maintain consistency.
  • Resource Governor
    • Used to manage the SQL Server workload and system resource consumption by specifying limits that incoming application requests can use

Conclusion

Together SSIS-IR and SQLDBMI will make it much easier for businesses to lift and shift existing Business Intelligence architectures (notably data storage in a fully functional Database as a Service environment, and batch based Extract, Transform and Load solutions) straight into the cloud without the need for additional costs of redevelopment. The benefits of reduced operational overhead and running costs, and increased scalability and availability are hard to overlook when comparing on-premise or IaaS based solutions with PaaS based architectures.

For those interested in more of the technical aspects of SSIS-IR, the remainder of this article will walk through a set-up of SSIS-IR. Knowledge of SQL Databases and SQL Server Integration Services are required for the remainder of the article.

Walkthrough of setting up Azure-SSIS Integration Runtime

An existing Azure SQL Server is required – this is where SSISDB will reside.  We preloaded the server with an Azure SQLDB – Adventure works (not required) which were used for some SSIS tests.

SQLDB Provisioning Screen
Resource group with Azure SQL Server and SQLDB

Missing functionality in the preview – Next, because there is no GUI in private or public preview, SSIS instance must be provisioned using PowerShell scripts.  The parameters, variables and example scripts can be found here: https://docs.microsoft.com/en-us/azure/data-factory/tutorial-deploy-ssis-packages-azure

After provisioning, the following objects should appear in your resource group.

SQLDB Provisioning Screen 2
Resource group after provisioning showing ADFv2 and SSISDB

After connecting to your Azure SQL Server through SSMS, you can see that provisioning has created a SSISDB database. To see the Integration Services Catalog, you must connect to your Azure SQL Server and SSISDB database directly through the advanced options.

Integration Services Catalog
Integration Services Catalog menu

You can now go ahead and create folders and deploy packages to the Azure-SSIS Integration Runtime environment just as you would with a regular SSIS server.  We created a few simple test cases in visual studio and deployed the project to the newly created Azure-SSIS.

SSIS Packages
SSIS Project packages

These packages can be configured in the same way as standard integration services, including parameters, environment variables and connection managers.

SSIS Package Configuration
SSIS Package configuration

Running these packages and checking the integration services catalogues reports allow for all the logging associated with standard SSIS.

SSIS Package Execution
SSIS Package execution
SSIS Package Run Log
SSIS Package run log

Leave a Reply

Your email address will not be published. Required fields are marked *