Smart Wastewater network – our SA Water case study

This solution enables better customer service through improved management of wastewater assets via an industry-leading smart wastewater network management solution. It incorporates a leading-edge architecture built on the Azure platform, using key technologies including Databricks, IoT, Stream Analytics, App Service, and more.

Read our case study here: Smart Wastewater network – our SA Water case study

Databricks: beyond the guff, business benefits and why businesses should care. Here’s a cheat-sheet to get you started

Search for info on Azure Databricks and you’ll likely hear it described along the lines of “a managed Apache Spark platform that brings together data science, data engineering, and data analysis on the Azure platform”. The finer nuances and, importantly, information about the business benefits of this platform can be trickier to come by. This is where our ‘cheat sheet’ comes in. This is the first of a series designed to assist you in deciphering this potentially complicated platform. Feel free to also read the second article in the series, distilling information from data.

What is it?

Databricks is a managed platform in Azure for running Apache Spark. Apache Spark, for those wondering, is a distributed, general-purpose, cluster-computing framework. It provides in-memory data processing capabilities and development APIs that allow data workers to execute streaming, machine learning or SQL workloads—tasks requiring fast, iterative access to datasets.

There are three common data worker personas: the Data Scientist, the Data Engineer, and the Data Analyst. Through Databricks, they’re able to collaborate on big data projects and acquire, engineer and analyse data, wherever it exists, in parallel. The bigger picture is that they are therefore all able to contribute to a final solution which is then brought to production.

  • Databricks is not a single technology but rather a platform that can, thanks to all its moving parts, personas, languages, etc., appear quite daunting. With the aim of simplifying things, our cheat sheet starts with a high-level snapshot of the workloads performed on Databricks by our Data Scientist, Data Engineer and Data Analyst personas.
  • We’ll then look at some real business benefits and why we think businesses should be paying attention. Lastly, we’ll delve into two related workloads:
    • Data transformation, and
    • Queries for visual analysis.

Our subsequent cheat sheets will start to unpick the remaining workloads.

The image below shows a high-level snapshot of the workloads performed by our three data worker personas. The workloads in the coloured sections form (to varying degrees) the basis for the contents of our cheat sheet.

  • Data engineering forms, in our opinion, the largest cohort of workloads:
    • Data acquisition – i.e. how data is acquired for transformation, data analysis and data science using Databricks. This could potentially fall beyond the realm of Databricks, because data can be leveraged from wherever it exists (for example Azure Blob or Azure Data Lake stores, Amazon S3, etc.) and data may already be hosted in those stores as a result of some preceding ETL process. Databricks can of course also acquire data.
    • Data transformation – discussed later in this article, focussing on the ETL processes within Databricks (ETL within).
  • Data analysis takes on two flavours:
    • Queries – these could overlap heavily with the world of the Data Scientist, especially if the languages used are Python or R and if the intent is machine learning and predictive analytics. But Data Analysts could, of course, also perform queries for ‘on the fly’ data analysis.
    • Queries for visual analysis – queries are also performed to ready data for visual analysis. This is discussed later in this article; however, it must be noted that the lines between this kind of query and the Data Transformation performed by the Data Engineer can become very blurred. This in itself demonstrates the collaborative and parallel nature of work that Databricks allows.
  • Data science covers machine learning and associated algorithms, with predictive and explanatory analytics as the end goal. Here too, queries are performed, and the lines are similarly blurred with the queries performed by the Data Analyst and the Data Engineer.
  • Underpinning all of this are the workloads involved in moving the solutions to production states.

These workloads are logical groupings only, aimed at clearing what could otherwise be muddy waters for the untrained eye. Queries may, for example, be performed, then used for transformations, data science and visual analysis.

So, without any further ado, let’s look at why businesses should be watching Databricks very closely!

Why Databricks? – Beyond the guff, business benefits and importantly, why businesses should care

If you search on Google for ‘Apache Spark’ you’ll find loads of buzzwords – “open-source”, “distributed”, “big data”, etc. At first glance, this can look like marketing babble and appear completely removed from a business’s actual data challenges. So let’s dispense with the buzzwords and focus on the business challenges.

Note also: although Apache Spark (and therefore Databricks too) is positioned in the big data camp, its application is not limited to big data workloads. So, if some of the challenges we list below apply to your data landscape (big data or not), read on.

Time to market

Challenge – data warehouses take too long to deliver business benefit

Benefit – Databricks is naturally geared towards agility through its support for parallel collaboration, which, in turn, leads to improved responsiveness to change. This means the time it takes to deliver data workloads is reduced

Parallel collaboration rather than seriality

Challenge – participants in the data solution process are dependent on others completing their tasks before they can contribute. These challenges are a result of serial workloads

Benefit – parallel collaboration delivers maximum agility. It means that the three main data personas, i.e. the Engineer, the Scientist and the Analyst, can collaborate in parallel on delivering the data elements that will form part of a final data deliverable. As the Engineer acquires the data, the Analyst, the Scientist and indeed the Engineer start contributing, all in parallel, to the logic that transforms and manipulates the data. This, in turn, contributes to a reduction in the time it takes for solutions to get to market

Responsiveness and nimbleness

Challenge – companies change, requirements change, and businesses may not know exactly what they want or need from data that is stored in a variety of formats in different locations

Benefit – companies frequently generate thousands of data files, hosted in diverse formats including CSV, JSON and XML, from which analysts need to extract insights

The classic approach to querying this data is to load it into a central data warehouse. But this involves the design and development of databases and ETL. This works well but requires a great deal of upfront effort, and the data warehouse can only host data that fits the designed schema. This is costly, time-consuming and difficult to change.

With the data warehouse approach, insights can only be extracted after the data is transformed upon load.

Databricks presents a different approach and allows insights to be extracted and transformed upon query from vast amounts of data stored cheaply in its native format (such as XML, JSON, CSV, Parquet, and even relational database and live transactional data) in Blob Stores. With Databricks, data is read directly from the raw files, and by using SQL queries, data is cleansed, joined and aggregated – hence the term transform upon query.

Transforming the data each time a query runs means this approach is much more geared towards quick turnaround and becomes more responsive to change. But it requires superior performance.

Performance

Challenge – workloads (such as queries) serving analytics and data science are run often and transform the data each time the query runs (transform upon query). Logic dictates that this will not perform as well as data that is transformed once upon load and then materialised for reuse.

Benefit – Databricks provides a performant environment that handles the transform upon query paradigm. This is done by utilising a variety of mechanisms, such as:

  • Databricks includes a Spark engine that is faster and performs better through various optimisations at the I/O layer and processing layer:
    • For example, Spark clusters are configured to support many concurrent queries and can be scaled to handle increased demand.
  • It includes high-speed connectors to Azure storage (i.e. Azure Blob and Azure Data Lake stores)
  • It uses the latest generation of Azure hardware (Dv3 VMs), with NVMe SSDs capable of even faster I/O performance.

A managed big data (or in our opinion, all data) platform

Challenge – The data landscape is becoming increasingly complex, fragmented and costly to maintain.

Benefit – “Databricks is a managed platform (in Azure) for running Apache Spark – that means that you neither have to learn complex cluster management concepts nor perform tedious maintenance tasks to take advantage of Spark. Databricks also provides a host of features to help its users to be more productive with Spark. It’s a point and click platform for those that prefer a user interface, such as data scientists or data analysts.” – https://docs.databricks.com/_static/notebooks/gentle-introduction-to-apache-spark.html

Not just Azure Blob Storage – access data where it lives

Challenge – Data is not necessarily stored in Azure Blobs

Benefit – Databricks connections are not limited to Azure Blob or Azure Data Lake stores; it can also connect to Amazon S3 and other data stores such as Postgres, Hive, MySQL, Azure SQL Database, Azure Event Hubs, etc. via JDBC (Java Database Connectivity). So, you can immediately start to enjoy the cost, flexibility and performance benefits offered by Databricks for your existing data

Cost of the cluster

Challenge – Big data solutions tend to cost a lot of money

Benefit – The Databricks File System (DBFS) is a layer over your data (where it lives) that allows you to mount the data, making it available to other users in your workspace and persisting the data after a cluster is shut down. Data is not synced, but mounted, which means you do not double pay for storage.

When a Databricks cluster is shut down (which is also done automatically at an interval you specify when not in use), it stops costing you money, so you only pay for what you use

Furthermore, Azure Databricks leverages the economies of scale provided by Azure. Analysis workloads (interactive workloads for analysing data collaboratively with notebooks) on a Premium F4 instance (4 virtual CPUs and 8 GB RAM) running 24 x 7 will, for example, only cost you $380 per month. And Data Engineering workloads (automated workloads for running fast and robust jobs via API or UI) for the same tier will, for example, only cost you $307 per month.

*Note that the pricing above is in AUD and is an estimate only as per the Azure Pricing Calculator.

Australian region

Challenge – some big data solutions, such as the first generation of Azure Data Lake, are not available in the Australian regions as at the date of first publication of this article

Benefit – Databricks can be provisioned in the following Australian regions:

  • Australia Central
  • Australia Central 2
  • Australia East
  • Australia South East

Like everything, there are some downsides/realities to consider

SQL, R, Python, Scala – can be daunting

SQL has become the “lingua franca” for most Data Engineers and Data Analysts, whereas the same applies to R and Python for Data Scientists. These personas collaborate on Databricks using notebooks as interfaces to the data, which allows them to create runnable code, visualisations and narrative.

Suddenly these personas gain visibility over code written by other personas in the same notebook. As notebooks can consist of multiple languages, this can seem quite daunting to anyone unfamiliar with the other languages, especially considering that the languages used in Databricks (R, Python, Scala and SQL) each have their peculiarities.

Obviously this is only an issue if you are unfamiliar with such an environment. For those with good coverage of SQL, R, Python and Scala, this is a benefit, as they can easily work with multiple languages in the same Databricks notebook, i.e. personas can use their preferred language irrespective of the choice of other personas. All that needs to be done is to prepend the cell with the appropriate magic command, such as %python, %r, %sql, etc.
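As a minimal illustration (assuming a Databricks notebook whose default language is Python, and a hypothetical table name), cells can mix languages like this:

```python
# Illustration only: in a Databricks notebook whose default language is Python,
# a cell can switch language with a magic command on its first line, e.g.
#
#   %sql
#   SELECT COUNT(*) FROM apple_watch_health   -- hypothetical table name
#
#   %r
#   summary(health_df)                        -- hypothetical R data frame
#
# A plain Python cell needs no magic command at all:
display(spark.sql("SELECT COUNT(*) FROM apple_watch_health"))
```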

From another viewpoint, however, this diversity of languages can be a strength in the right business environment: the workflow naturally dissipates technical debt and encourages capability sharing.

Learning curve

There will often be a requirement for personas to become more familiar with a broader set of languages and with the notebook environment, to make it easier to follow what is happening across the whole notebook. This will make for easier collaboration and is in line with a move from purely serial to more parallel workloads.

Case study – Data Transformation and Visual Analysis

The use case described in this section is used as a vehicle for a more technical deep dive into the workloads shown in the coloured sections of the Databricks Workflow image above (i.e. Data Transformations and ETL within Databricks, and Queries for visual analysis).

Our use case – IoT and wearable devices, such as Apple Watches, are currently under a substantial spotlight, as there is a lot of interest in what can be gleaned from the data they produce (see our article June’s story as an example – http://blog.exposedata.com.au/2018/09/03/artificial-intelligence-in-aged-care-junes-story/). In our use case, Apple Watch data is brought into Azure, from where the datasets are mounted to Databricks; ETL processes then transform and load the data, and finally queries are performed.

An Apple Watch is used to generate data we will use in this user story. An app on the watch integrates with Azure and streams some data into Azure Blob Storage (this app and stream are not within the scope of this article as Data Acquisition will be discussed in a subsequent article).

The data manifests itself as CSV files in Azure Blob store > Container:

Data Engineering > Data Transformation > ETL within


This section assumes that data is already available in an appropriate store for mounting (in this case Azure Blob store). We notionally call the next steps “ETL within Databricks” as they represent a logical ETL that will extract and validate the data, apply a schema, then load the data ready for use (for example, for analytical querying). ETL within Databricks should not be confused with ETL to get data into Azure in the first place (which will be discussed in a subsequent article).
ETL within Databricks is conceptually the same as the ETL concepts we know from conventional BI workloads, in that you first extract the data, then transform it, and then load it, but it is done in a much nimbler fashion and it adheres to the notion of the transformation of data upon query, rather than upon load.
The common steps associated with our two workloads, i.e. ETL within and queries to ready the data for visual analysis are shown visually in the image below:

Remember that Spark is the engine used by Databricks, and SQL, Scala, Python, R and Java use that engine to perform the various workload tasks.

In the sections below, we will first mount our Apple Watch data (this is the extract step); we will then either transform the data and load it into a table using SQL (the amber route shown above), or create a data frame and load it as a Parquet file (the green route above). Later we will deal with analysis of the loaded data, readying it for, for example, visual analysis. For now, let’s focus on the ETL.

The queries shown in each step below are examples of what could be done and should give the reader a starting point from where to build more complicated ETL within Databricks and subsequent queries. Databricks is a massively flexible platform, so the sample queries may be made much more complex or approached in an entirely different way.

Extract

In the first step we mount the data held in our Azure Blob store to the Databricks File System (DBFS). This represents the “Mounted Stores in DBFS” step in the image above (we are not focussing on the JDBC step in this use case).

We first generated a SAS URL for the Azure Blob store to use as a variable, then used it in the query.

Mounting means creating a pointer to the store, which means that the data never actually syncs. The mount point is simply a path representing where the Blob Storage container or a folder inside the container is mounted in DBFS.
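As the original notebook screenshots are not reproduced here, the following is a minimal Python sketch of what the mount could look like in a Databricks notebook; the storage account, container, SAS token and mount point names are placeholders only.

```python
# Sketch only: mount an Azure Blob container into DBFS using a SAS token.
# All names below are placeholders; dbutils is a Databricks notebook utility.
storage_account = "<storage-account>"
container = "<container>"
sas_token = "<generated-sas-token>"

dbutils.fs.mount(
    source=f"wasbs://{container}@{storage_account}.blob.core.windows.net",
    mount_point="/mnt/applewatch",
    extra_configs={
        f"fs.azure.sas.{container}.{storage_account}.blob.core.windows.net": sas_token
    }
)
```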
Optional – We may quickly validate the mount by listing the contents of the mount point.

Optional – We can also validate the data itself by looking at the content of any of the files within our mount point. Both optional checks are sketched below.
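A minimal sketch of both optional checks, run from a Databricks Python notebook (the file name is a placeholder):

```python
# Sketch only: list the mounted files, then peek at the first bytes of one file.
display(dbutils.fs.ls("/mnt/applewatch"))
print(dbutils.fs.head("/mnt/applewatch/<one-of-the-csv-files>.csv", 1000))
```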

Transform

As per the Transform steps, there are two options: a SQL path (shown in amber) and a Scala/Python/R/Java path (shown in green). The reader can jump to the Scala/Python/R/Java path if they want to bypass the SQL sections, which to many may already be familiar.

Transform and Load using SQL (Option A)

We use SQL to create a table in DBFS which will “host the data” via metadata, then infer the schema from the files in our Azure Blob store container. Note that the schema can be explicit rather than inferred. In our use case all our files have the same structure, and the schema can therefore be inferred. In cases where structures differ, standardisation queries will precede this step.
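A sketch of what this step could look like; the article runs it as a SQL cell, while here it is wrapped in spark.sql() from Python, and the table name and mount path are placeholders.

```python
# Sketch only: define a table over the mounted CSV files, inferring the schema
# from the file headers. Table name and path are placeholders.
spark.sql("""
  CREATE TABLE IF NOT EXISTS apple_watch_health
  USING CSV
  OPTIONS (path '/mnt/applewatch', header 'true', inferSchema 'true')
""")
```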

It is worth noting that in Databricks a table is a collection of structured data. Tables in Databricks are equivalent to Data Frames in Apache Spark.

Optional – We can now perform all manner of familiar SQL queries. It is also worth noting that data can be visualised on the fly using the options in the bottom left corner. In the first example, we review the data we had just loaded, in the second we do a simple record count.
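For example (using the same placeholder table name as above):

```python
# Review a sample of the freshly loaded data, then do a simple record count.
display(spark.sql("SELECT * FROM apple_watch_health LIMIT 100"))
display(spark.sql("SELECT COUNT(*) AS record_count FROM apple_watch_health"))
```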

Transform and Load using Scala (Option B)

Tables are familiar to any conventional database operator. Let’s now extend this concept to include Data Frames. A Data Frame is essentially the core Transformation layer in this alternative ETL path – it is a dataset organised into named columns. It is conceptually equivalent to a table in a relational database but with richer optimisations under the hood. Data Frame code follows a “spark.read.option” pattern.
In the next query, we read the data from the mount and infer the headers (we know that all our files have the same format, so no preceding column standardisation is required), we select only certain columns of value to us, and we transform the column names as a subsequent step, as loading the data to Parquet restricts us from using “restricted characters” such as “(” and “,”.
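An equivalent PySpark sketch of this step (the original walkthrough uses Scala); the selected column names are hypothetical examples of Apple Watch export headers that contain characters Parquet disallows.

```python
# Sketch only: read the mounted CSVs into a Data Frame, keep a few columns of
# interest and rename them to remove characters Parquet does not allow.
# Column names are hypothetical.
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/mnt/applewatch")
      .select("Start", "Steps (count)", "Heart Rate (count/min)")
      .withColumnRenamed("Steps (count)", "Steps")
      .withColumnRenamed("Heart Rate (count/min)", "HeartRate"))
```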

We lastly load the data into a Parquet file in DBFS. Whilst blob stores like AWS S3 and Azure Blob are the data storage options of choice for Databricks, Parquet is the storage format of choice. Parquet files are highly efficient, column-oriented data files that show massive performance increases over other options such as CSV. For example, Parquet compresses repeated data in a given column and preserves the schema from a write.
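The load step could then look like this (the output path is a placeholder):

```python
# Sketch only: persist the transformed Data Frame as Parquet in DBFS.
df.write.mode("overwrite").parquet("/tmp/apple_watch/health_data.parquet")
```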

Queries for Visual Analysis

Once we have Extracted, Transformed and Loaded the data we can now perform any manner of query-based analysis. We can for example query the Parquet file directly, or we can create a table from the Parquet file and then query that, or we can bake the final query into the Table create.

Let’s first query the Parquet directly:
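For example (using the placeholder path from the load step):

```python
# Query the Parquet file directly, without defining a table first.
display(spark.sql(
    "SELECT * FROM parquet.`/tmp/apple_watch/health_data.parquet` LIMIT 100"
))
```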

Now let’s create a table from the Parquet file’s metadata, which can then be used by BI tools such as Power BI.
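A sketch of the table definition over the Parquet file (names remain placeholders):

```python
# Sketch only: expose the Parquet file as a table for BI tools such as Power BI.
spark.sql("""
  CREATE TABLE IF NOT EXISTS apple_watch_curated
  USING PARQUET
  LOCATION '/tmp/apple_watch/health_data.parquet'
""")
```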

In the final query, we query the table and prepare the data for visual analysis in something like Power BI. We select the maximum number of steps taken by our Apple Watch wearer per day (we only loaded two days’ worth of data).
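A sketch of that final query, using the hypothetical column names from earlier:

```python
# Maximum number of steps recorded per day, ready for a visual in Power BI.
display(spark.sql("""
  SELECT to_date(Start) AS Day, MAX(Steps) AS MaxSteps
  FROM apple_watch_curated
  GROUP BY to_date(Start)
  ORDER BY Day
"""))
```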

We will, in subsequent articles, introduce many of the other workloads associated with Databricks, building on the concepts we used in this article.

Author: Etienne Oosthuysen; Contributor: Rajesh Kotian

Blockchain in bits – A technical insight


In our previous two articles, we articulated several real-life use cases for Blockchain implementations, and we elaborated conceptually on how Blockchain differs from current and previous data storage architectures, as well as on other conceptual benefits of Blockchain as a platform.

In this article, we touch upon the technical components of Blockchain networks and Smart Contracts, and we walk through a technical implementation of a viable Blockchain application using the Microsoft Azure platform.

What is Blockchain?

The blockchain is a shared ledger which stores data differently to typical database platforms and solves several challenges by avoiding double spending and the need for trusted authorities or centralised computing servers. Furthermore, Blockchain as a technology has evolved since the introduction of the Bitcoin Blockchain in 2008 (invented by Satoshi Nakamoto), and is now solving recognisable business problems beyond cryptocurrencies.

In addition to the concepts discussed in the previous article, below are some additional descriptions of Blockchain components before we dive into the technical walk-through:

Blocks – A block is a valid record/transaction in the Blockchain that can’t be altered or destroyed. It is a digital footprint based on a cryptographic hash which remains in the system as long as the system is alive. Since the Blockchain is decentralised, the blocks are replicated across the network nodes, thus making them immutable and secure.

Cryptographic hash – Cryptographic hash functions are cryptography algorithms that generate hash values for a given piece of data. They ensure the authenticity, integrity and security of the data.

Nodes –  A node is a computer/server/virtual machine that participates in a Blockchain network. Nodes store all the blocks and transactions generated in the system. A peer-to-peer (P2P) architecture connects nodes of a Blockchain. When a device is attached to the network as a node, all blocks are downloaded and synchronised. Even if one node goes down, the network is not impacted.

Miner Node – Miner nodes create the blocks for processing the transactions. They validate new transactions and add blocks to the Blockchain. Any node can be a miner node, since all the blocks in the network are replicated across each node including the miner node; hence the failure of any miner node is not a single point of failure. It is advisable to use high-compute machines as miner nodes, since mining consumes a lot of power and resources.

How a Blockchain transaction works

A Blockchain transaction must complete a set of precursory activities to ensure integrity and security. These steps make the Blockchain network a unique proposition for a trusted computing paradigm.

Let’s look at the Blockchain transaction lifecycle.

  1. A user initiates a transaction on Blockchain through a “wallet” or on a web3 interface.
  2. The transaction is validated by the set of computing nodes called miners using Cryptographic hash functions.
  3. Miner nodes create blocks based on the transaction using crypto-economic mechanisms like Proof of Work (PoW) or Proof of Stake (PoS).
  4. The block is synchronised with the other nodes in the Blockchain network.
Blockchain transaction lifecycle

Types of Blockchain networks

Before setting up a Blockchain, one must determine the type of network required. There are three types of Blockchain Network applications.

Public Blockchain:

  • An open (public) network ready for use at any given point in time. Anyone can read the transactions and deploy decentralised apps that use the underlying blocks. No central authority controls the network.
  • These Blockchain networks are “fully decentralised”.
  • Use case: Ethereum Cryptocurrency Blockchain can be used efficiently for managing payments or running Blockchain apps globally.

Consortium Blockchain:

  • A group of nodes controls the consensus process. The right to read may be public, but participation within the Blockchain can be limited to consortium members by using API calls to limit access to the contents of the Blockchain.
  • For example, a statutory body or an organisation may implement a regulatory Blockchain application that allows selected organisations to participate in validating the process.
  • These Blockchain networks are “Partially decentralised”.
  • Use case: Reserve Bank of Australia (RBA) can set up a Blockchain network for processing and controlling specific banking transactions across banks based on statutory compliance requirements. Participating banks implement Blockchain nodes to authenticate transactions in the network.

Private Blockchain:

  • Similar to any other centralised database application that is controlled and governed by a company or organisation. The organisation has complete write and read permissions, although the public may be allowed to see specific transactions at the Blockchain network administrator’s discretion.
  • These Blockchain networks are “Centralised”.
  • Use case: A company can automate its supply chain management using Blockchain technology.
Types of Blockchains

Implementing Blockchain on Azure

Blockchain on Azure is Blockchain as a Service (BaaS): an open, flexible and scalable platform. Organisations can opt for BaaS to implement solutions on a federated network based on security, performance and operational processes without investing in physical infrastructure.

Azure BaaS provides a perfect ecosystem to design, develop and deploy cloud-based Blockchain applications. Rather than spending hours building out and configuring the infrastructure across organisations, Azure automates these time-consuming pieces, allowing you to focus on building out your scenarios and applications. Through the administrator web page, you can configure additional Ethereum accounts to get started with smart contracts, and eventually application development.

Consortium Blockchains can be deployed using:

Ethereum Consortium Leader

  • To start a new multi-node Ethereum Consortium network, implement the Ethereum Consortium Leader.
  • This creates a primary network for the other multi-node members to join.

Ethereum Consortium Member

  • To join an existing Ethereum Consortium network, deploy the Ethereum Consortium Member.

Private Blockchains can be deployed using:

Ethereum Consortium Blockchain

  • To create a private network, use the Ethereum Consortium Blockchain template
  • The template builds a private network within minutes on the Azure cloud

Below are links that provide users with a step-by-step approach to deploying a Blockchain network on the Azure cloud.

Once deployed you will receive the following details:

  • Admin Site: A website you can navigate to showing the status of the nodes on your Ethereum network.
  • Ethereum-RPC-Endpoint: An endpoint for connecting to your Ethereum network via an API like Truffle or web3.js (see the connection sketch after this list).
  • Ssh-to-first-tx-node: To interact with your Blockchain, log in using your Secure Shell (SSH) client. I’m currently working on Windows, so I’ll be using PuTTY (https://www.putty.org/) to log in, but you can use any SSH client to connect to the console. On a Mac, you can just copy and paste the “ssh” line into your terminal.
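As a minimal sketch, and assuming recent versions of the web3.py library, connecting to the Ethereum-RPC-Endpoint from Python could look like this; the endpoint URL is a placeholder to be replaced with the value from your deployment output.

```python
# Hedged sketch only: connect to the deployment's RPC endpoint with web3.py.
# The endpoint URL below is a placeholder, not a real address.
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://<ethereum-rpc-endpoint>:8545"))

print("Connected:", w3.is_connected())       # basic connectivity check
print("Latest block:", w3.eth.block_number)  # current chain height
print("Accounts:", w3.eth.accounts)          # accounts managed by the node
```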

Interacting with Your Azure Blockchain Using Geth

Geth is a multipurpose command line tool that runs a full Ethereum node implemented in Go. It offers three interfaces: the command line subcommands and options, a JSON-RPC server and an interactive console.

Steps to connect the Blockchain instance:

  • SSH into the Azure server using PuTTY or a command-line interface
  • Use the geth attach command to connect to the Blockchain console

  • Geth loads its modules and an interactive command prompt becomes available

  • Examples of geth commands

You can access the network using the Mist Ethereum wallet or any other Ethereum compatible wallet.

Mist Ethereum wallet

Smart Contracts in action

“Smart Contracts: Building Blocks for Digital Free Markets” – Nick Szabo

Smart contracts are a set of terms and conditions that must be met to allow for something to happen between parties. They are just code in the form of blocks and are immutable. Smart contracts:

  • Are anonymous.
  • Are secured using encryption so that they are safe.
  • Can’t be lost since they are duplicated into other Blockchain nodes.
  • Speed up the business process.
  • Save money since there is no need for any third party to validate and go through the contract terms.
  • Are accurate since they avoid errors that happen during manual execution of any contracts.
Example of how a smart contract works

In the above example, the following are the actions captured:

  1. Mark uses the healthcare consortium network to record his details. The details are persisted in the blockchain through a smart contract. A smart contract can hold all the required variables and attributes.
  2. Once the smart contract has acquired all the mandatory information and requirements, it is then deployed into the healthcare consortium network. A transaction is initiated for further consultation.
  3. Healthcare consortium network validates the transaction based on the logic defined in the smart contract. Mark has been detected with some health issues and the contract/health record is automatically sent to Dr John for further analysis and consultation.
  4. Dr John accesses the record and recommends Dr Anne for specialised treatment. The contract is automatically executed and sent to Dr Anne for further action.
  5. Dr Anne provides necessary treatment to Mark. The details of the treatment are persisted in the smart contract.

There are various tools to write and deploy a smart contract; the most commonly used are:

  • Languages: Solidity
  • IDEs: Solidity Browser, Ethereum Studio
  • Clients: geth, eth, Ethereum Wallet
  • APIs & frameworks: Embark, Truffle, DAPPLE, Meteor, web3.js, ethereumj, BlockApps
  • Testing: TestRPC, testnet or a private network
  • Storage: IPFS, Swarm, Storj
  • Dapp browsers: MetaMask, Mist

An example of a Solidity script can be found below.

Solidity script

Blockchain and Data Analytics

Perhaps the most critical development in information technology is the growth of data analytics and platforms in the Big Data, Machine Learning and Data Visualisation space. Analytics platforms and data lakes can source Blockchain data using federated APIs built on top of the Blockchain. Since the provenance and lineage of the data are well established, data from the Blockchain can be helpful in developing a productive data platform for data analytics, machine learning capabilities or AI development.

The following diagram is a simplistic view for integrating data analytics with Blockchain.

Blockchain Data Analytics
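As a simple, hedged illustration of the integration pattern in the diagram (assuming a reachable RPC endpoint placeholder and recent versions of the web3.py and pandas libraries), the sketch below pulls a handful of recent blocks and flattens them into a DataFrame that could be landed in a data lake or analytics platform.

```python
# Hedged sketch: source a few recent blocks from a Blockchain node via its RPC
# endpoint (placeholder URL) and flatten them into a pandas DataFrame ready
# for downstream analytics. Not a production pipeline.
import pandas as pd
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://<ethereum-rpc-endpoint>:8545"))

latest = w3.eth.block_number
blocks = [w3.eth.get_block(n) for n in range(max(0, latest - 9), latest + 1)]

df = pd.DataFrame([{
    "number": b.number,               # block height
    "timestamp": b.timestamp,         # Unix epoch seconds
    "tx_count": len(b.transactions),  # transactions in the block
    "gas_used": b.gasUsed,            # total gas consumed
} for b in blocks])

print(df.head())
```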

Conclusion

Before an organisation starts any technology assessment and implementation of a Blockchain, even if just for R&D, it should consider what a Blockchain would mean for the organisation through potential use cases and process improvement opportunities. Moreover, ensure some of the basic concepts described here and in the second article in the series are understood vis-à-vis your identified use cases.

Only then proceed to the technology side of things.

Blockchain has the potential to be a fantastic technology through its federated computing paradigm. But do not lose sight of the process and people aspects associated with this.

Networks Asset Data Mart – our Energy Infrastructure Provider case study


Exposé designed and developed a solution that saw an increasingly temperamental Networks Asset Analytical solution move to the Exposé-developed Enterprise Analytics Platform.

The solution now:

• Allows staff to focus on business-critical tasks by utilising the data created by the system.
• Reduces support costs due to the improved system stability.
• Frees up IT resources for other projects that improve business productivity.

exposé case study – Energy Infrastructure Provider – Networks Asset Data Mart

See another case study here

Blockchain, lifting the lid on confusing concepts

In the previous article, An Internet of Value – Blockchain, beyond the hype and why CxO’s must take note, we articulated why blockchain is more than just hype and provided some real-life use cases as to where/how blockchain is already helping businesses. But we want to dig deeper within this article, the second in a series of three, to explain how blockchain differs architecturally from its traditional database predecessors so it can do a better job of solving these real-world problems. We also describe some of the terminology used so that readers can better understand Blockchain.

This article demystifies two important concepts: Blockchain and Smart Contracts in relation to Blockchain. It then brings the concepts of Blockchain and Smart Contracts together through the prism of the Internet of Things (IoT) and finally concludes with Analytics.

But first, let’s explain why the evolution away from traditional databases towards Blockchain has been necessary to more readily share immutable data.

Blockchain Revisited

To restate what blockchain is – in its simplest form, it is a distributed database system, where there is no one master (primary) database, but many databases that are all considered primary, and where all parties participate in populating entries into the respective databases and receive the entries of the other participants. Blockchain started life as the technology that underpins cryptocurrencies. Whether cryptocurrencies have a long and bright future is very much an open question, but the technology that evolved because of them does indeed have a long and bright future.  Because of this, most businesses will have to start considering what this means for them and take steps to ready themselves for this new and important part of the technology landscape.

 

A precursor to Blockchain – a Centralised approach to data

Just about any organisation that uses digital infrastructure would be using one or more databases to store meaningful and up-to-date information. These are truly silos of data; secure from ‘outsiders’, private, and often customised for a business’ need to operate efficiently.

The use of databases has expanded over the last several decades, as businesses need to open up the data they have securely stored in their databases to other businesses or users. Quite often this is done by using APIs (Application Programming Interfaces – a controlled opening to a database) for other parties to read from and/or write to the database securely. To ensure data is not lost and is always available, replication occurs so that there is redundant data ready to replace any data loss that might occur. Yet, no matter how well this is designed, it is vulnerable to single points of failure. Integrity can also be compromised if unrestricted access is given to a user who does harm to the underlying data.

As brilliant as database environments are, have been, and will be in the future, there are weaknesses.


A typical database environment has a Primary database which stores and manages changes to data, whether by inserting, updating or deleting. The ‘Replica’ database symbolises a conceptual backup database which can be used if the Primary fails, to avoid loss of data as well as loss of up-time. These databases have evolved over time to be able to handle millions of concurrent transactions, servicing thousands of users at any one time. Cost is, and will likely continue to be, the biggest contributing factor in how powerful the database environment becomes, as it is the hardware of the database environment that limits how many concurrent transactions or users it can feasibly handle. The point is, there is ‘nearly’ no conceptual restriction on how powerful a realistic database environment can be, as long as the user is willing to pay for it.

As time has gone on, isolated database environments are not always what the 21st century requires. More and more, integration between environments is required to ensure an efficient and fast way of moving data. This comes in many forms, such as the movement of money from a payer on one system to a payee on another, but also opening up a subset of data from a database so it can be interacted with by 3rd parties. An example of this is allowing a customer to log into their account and insert updated contact details, but also to read other data such as their Health Insurance Claims History. It is insecure to allow a 3rd party direct access to a database, so an API (Application Programming Interface) is typically developed.

The development of the API is managed by the owners of the database, who create it for a specific interaction between a 3rd party and the underlying database. The API will typically contain very specific rules about what an end user is able to do within the database. These APIs can then be shared with 3rd party developers so that code can be added to websites, mobile apps or other applications not controlled by the database owner, allowing indirect interaction with the data under a controlled process.

Although APIs (which are used extensively) do enable more interactivity between systems, there are still issues with this database architecture, such as:

  • There is still a single point of failure
  • Lack of transparency
  • Data can be modified without any governance (governance is owned by the database owners – therefore the strength of this governance will be vastly different between different database environments)
  • Reliance on the database owners to develop APIs or other mechanisms so 3rd parties can access the data
  • Reliance on 3rd parties to handle contracts between transacting parties (more on this in Smart Contracts)


Blockchain – a Decentralised approach to data

Conceptually, the architecture of Blockchain goes a long way towards solving many of the problems highlighted above within a typical database environment. An implementation of Blockchain can be considered a set of shared databases, all sharing the same schema and all acting as primary databases. There is no single point of failure; all databases participate by populating data, which is replicated throughout all the databases within the Blockchain, albeit with a slight lag time.

Blockchain also gets its name from the way transactions are confirmed: collections of transactions are stored as blocks in the database, effectively creating a chain of data in which each block is immutably coupled to the previous and next block. This makes it near impossible to change or remove a transaction within the blockchain, as doing so would break the chain, and the nodes processing the blockchain will never let that happen. Once data is inserted into the blockchain, it is locked in forever.

For an organisation that has no need to share data or be more transparent (even internally), there is no real benefit of Blockchain. But such organisations will become rare as commercial trends are to share, collaborate and to open key data assets for mutual benefit. Therefore, Blockchain presents a huge opportunity to make this process possible and secure.


In a Blockchain architecture, transactions are written to blocks, which are replicated to the other participants in the blockchain network/consortium. All participants then validate the transactions, and only then are the blocks added to the chain to create a tamper-proof audit log.


Smart Contracts

In some of the use cases described in our previous article, An Internet of Value – Blockchain, beyond the hype and why CxO’s must take note, we mentioned “Smart Contracts”. But what are they, and how can they assist?

Smart Contracts are one of the key opportunities provided by Blockchain – a concept first introduced by Nick Szabo all the way back in 1994.

What are Smart Contracts?

Contracts that set out the terms and conditions that must be met to allow something to happen are commonplace. In centralised models, these contracts between the two transacting parties are facilitated by 3rd parties (for example a bank, regulatory body, government, etc.).

A Smart Contract is, like any other contract, language (actually a tiny piece of code stored inside a Blockchain) that describes a set of terms and conditions that must be met to allow for something to happen. It automatically verifies fulfilment and then executes the agreed terms. Smart Contracts, therefore, remove the need for 3rd parties as it allows parties to have an agreement to transact directly with each other.

Smart Contracts and Blockchains are immutable, in other words, they cannot be tampered with, and they are distributed, meaning the outcome of the contract is validated by everyone in the network.

For those interested in seeing what a smart contract actually looks like, here is an example. The third and final article in this series will also walk through an implementation of Blockchain and Smart Contracts.

IoT, using Blockchain and Smart Contracts (Blockchain IoT, or BIoT)

There were 20 billion connected IoT devices in 2017. This is projected to grow to 75 billion by 2025.

Centralised models for storing data, as described at the beginning of this article simply won’t be able to cope with security and volume demands. Plus, any transactional contracts will have to rely on 3rd parties.

The decentralised insertion of data into a Blockchain eliminates a single point of failure and is a tamper-proof way to store data. As a result, security against rogue participants is increased (in this case device spoofing and impersonation).

Regarding the IoT, each legitimate device can write to a Blockchain, and they can easily identify and authenticate to each other without the need for central brokers or certification authorities. The eco-system can scale massively as it can support billions of devices without the need for additional resources.

Smart Contracts extend the Blockchain functionality to include the contract between transacting parties. This removes any IoT ecosystem’s reliance on 3rd parties to handle transactional contractual arrangements.

Analytics

Blockchain now also presents huge opportunities for Advanced Analytics.

Data analytics will be crucial in tracking Blockchain activities to help organisations that use Blockchain make more informed decisions. For every use case of Blockchain mentioned (sport, health, insurance, asset management and delivery) come data analytics opportunities, especially when coupled with machine learning to find those nuggets deep inside the trillions of transactions.

Whether in a traditional database, unstructured text with big data or this new concept of the blockchain, data analytics can use this data to help users and IoT devices.  Moreover, AI, as well as predictive analytics, can help users make informed decisions, making the human ecosystem ever more efficient.

Conclusion

Blockchain and Smart Contracts, along with their combination with IoT, have plentiful use cases. The data technology sector has the ability to guide this new epoch of technology forward, ironing out the creases along the way so that this technology becomes as common as the internet is today. The idea of blockchain is very similar to the internet; it is networked and decentralised, requiring protocols (shared schemas) to be followed to allow it to communicate effectively. Everyone can see how life-changing the internet has become in our day-to-day lives. Organisations who visualise and implement these benefits sooner will be the leaders of the next technical revolution. This is absolutely where Exposé can help.

In the next article, we will delve a lot deeper into a technical implementation of Blockchain and Smart Contracts.

Joint Authors: Etienne Oosthuysen (Exposé, Head of Technology and Solutions) and Cameron Wells (Exposé, Technical Lead)

An Internet of Value – Blockchain, beyond the hype and why CxO’s must take note

A Blockchain, in its simplest form, is a distributed database system where there is no one master (primary) database, but many databases that are all considered primary. All parties participate in populating entries into the respective databases and receive the entries of the other participants.

But how does this apply to your business, and is this profoundly going to change how the world works? Let’s look at an analogy: Imagine I create a song and generate a file of my recording in mp3 format on a USB stick. I can give two of my friends a copy of this; they can do the same, and so on. With thousands of eventual copies going around, it will be impossible to establish which was the real version I own and which I ideally wanted to use in exchange for royalties. By the way, if I ever created and recorded a song, I doubt very much that I would garner thousands of fans. I am just not Dave Grohl 😊

This is where Blockchain comes in. It is a shared ledger that is used to record any transaction and track the movement of any asset whether tangible, intangible or digital (such as my mp3). It is immutable, so participants cannot tamper with entries, and it is distributed, so all participants share and validate all the entries.

Blockchain will allow “my fans” 😊 to enter into a contract with me directly. As they stream the song, payment goes directly from their wallet into mine. The information about what was listened to and what I was paid, is verified by all the databases in the network and cannot be changed. There are no middlemen (like a central streaming service, or a record label), so the contract (a digital smart contract) is between those that listen to my song and me directly.

It is at this point important to mention that Blockchain is not Bitcoin, or any other cryptocurrency, although it did start life as the technology that underpins cryptocurrencies. This article, the first in a series of three, looks beyond its use in cryptocurrencies, and rather highlights use cases to show CxOs why it is so important to take note of Blockchain and to start controlled proofs of concept (POCs) and research and development (R&D) in this technology now. We look at some examples across a wide range of industries and use a courier-based use case to delve deeper into what Blockchain could mean for organisations using the Internet of Things (IoT).

Sport

Dope testing and cheating have been quite topical lately, with large portions of the Russian contingent banned from the Rio Olympics in 2016, and again from the Winter Games in South Korea in 2018, for systemic manipulation of tests. Blockchain will make the test results immutable and open the results up to all that participate in the data cycle. Even if the athlete changes sports, that data will be available to participating sporting organisations. http://www.vocaleurope.eu/how-technology-can-transform-the-sports-domain-blockchain-2-0-will-boost-anti-doping-fight-sports-investments-and-e-sports/

Health

Some countries are planning health data exchanges with the aim of addressing a lack of transparency and improving trust in patient privacy as well as fostering better collaboration. Blockchain will provide practitioners and providers with better access to health, patient and research information. Adoption of Blockchain will lead to closer collaboration and better treatment and therapies, sooner.

Blockchain in healthcare is real and imminent. This study from IBM shows how pervasive Blockchain is expected to become with 16% of 200 responding health executives aiming to implement a Blockchain solution shortly. https://www.ibm.com/blogs/think/2017/02/Blockchain-healthcare/

Banking

 Australia’s Commonwealth Bank collaborated with Brighann Cotton and Wells Fargo to undertake the world’s first global trade transaction on Blockchain between independent banks – an unbroken digital thread that ran between a product origin and its final destination, capturing efficiencies by digitising the process and automating actions based on data. https://www.commbank.com.au/guidance/business/why-blockchain-could-revolutionise-the-shipping-industry-201708.html

CommBank is taking this a few steps further with an appointed head of Blockchain and a whopping 25 proofs of concept over the past five years, including the ability to transfer funds peer-to-peer offshore within minutes rather than days, and the issuing of smart contracts. http://www.innovationaus.com/2017/12/CBA-outlines-a-blockchain-future

Insurance

Customers and insurers will be able to manage claims better, transparently and securely. Claim records, which are tamper-proof once written to the chain, will streamline the claim process and minimise claimant fraud such as multiple claims for the same incident.

With Smart Contracts, payments can be triggered as soon as certain minimum conditions are met. There are also many smart contract rules that could ascertain when a claim is fraudulent, automatically denying the claim. https://www2.deloitte.com/content/dam/Deloitte/ch/Documents/innovation/ch-en-innovation-deloitte-Blockchain-app-in-insurance.pdf

Courier Delivery

Couriers deliver millions of items each day, very often crossing vast geographical distances and across multiple sovereign boundaries with unique laws and processes.

These businesses, who often make heavy use of IoT devices, will benefit hugely from Blockchain to improve the ability to track every aspect of a package delivery cycle and minimise fraud.

There were 20 billion connected IoT devices in 2017, and this is projected to grow to 75 billion by 2025. https://www.statista.com/statistics/471264/iot-number-of-connected-devices-worldwide/

The current centralised approach for insertion and storage of IoT data (see the image below) simply won’t be able to cope with volume demands, and transactional contracts will have to rely on multiple 3rd parties. Also, managing data security can be very complex because data will flow across many administrative boundaries with different policies and intents.

In contrast, the Blockchain decentralised peer-to-peer approach for insertion and storage of IoT data eliminates issues with volume demand (the data is stored across a potentially unlimited number of databases). There is no single point of failure that can bring the whole IoT network to a halt (computation and storage are shared and there is no one primary). It supports tamper-proofing (all participating databases validate a transaction, which is then shared and becomes immutable), which means increased security from rogue participants such as IoT device spoofers and impersonators. (Spoofing can occur when security is breached through a poorly secured device on a shared IoT network; if such a device is hacked, the whole network is compromised, as it will treat the intruder as a trusted, authenticated participant.)

Delving deeper into our Courier Delivery use case – Blockchain and IoT, creating an Internet of Value

In a courier parcel delivery ecosystem, the movement of parcels is tracked at every step of the delivery process via IoT devices that read a barcode or another form of identification that can be picked up by the sensor. From the original warehouse to a vehicle, a plane, another warehouse, and finally your home.

By using Blockchain, each sensor participates in the chain and records “possession” of the delivery item (and so also the location). Each time it is read by a new sensor, the new location is broadcast to, inserted, then shared and agreed on by the remaining participants on the Blockchain. Every block is subsequently a transaction that is unchangeable once inserted into the blockchain.

Each Blockchain entry (i.e. the barcode, the location of the package and a date-time stamp) is encrypted into the Blockchain. The “possession” steps are tracked no matter who is involved in the package delivery process (from the origin, which could be the delivery at an Aus Post outlet, to an Aus Post vehicle to the airport, to QANTAS en route to the US, to a DHL distribution centre in a US airport, and finally to a DHL delivery vehicle en route to the destination address). This enhances trust in the system as there is no need to adhere to and interface with a single primary system, and package tracking is put on steroids. If you have ever sent anything abroad, you would know that granular tracking effectively ends at the border. This won’t be the case with Blockchain. https://www.draglet.com/Blockchain-applications/smart-contracts/use-cases

Conclusion

It must be noted that Blockchain technology has not been around for very long and is rapidly evolving. Widespread commercialisation beyond cryptocurrencies is still in its infancy. But all indications are that it will be a hugely disruptive technology.

The many examples of important players taking this technology seriously move Blockchain beyond hype.

CxOs may ask why they should invest in something they cannot yet fully understand, but a very similar question was probably asked in the ’90s about the internet. The learning curve will no doubt be steep, but that makes investing in targeted R&D and POCs early all the more important, so that they do not get caught off guard once commercialisation starts increasing.

In the next article, Blockchain, lifting the lid on confusing concepts, we will delve a little bit deeper and describe the concepts in more depth.

Internet of Mice


The Internet of Mice – Our IoT and Advanced Analytics Solution

Understanding how animals involved in research move and eliminating as much human handling as possible makes for a much more humane environment for the animals. The outcome is more accurate results for the researchers. See how our IoT and Advanced Analytics solution developed for our customer strives towards a humane research environment and delivers more intelligent insights to researchers.

See more about IoT

Our YouTube channel


We have a growing list of videos on our YouTube channel where you can find some selected case studies, test drives and solutions. Get an inside look at the world of Smart Analytics.

Topics include: Advanced Analytics, Cognitive Intelligence, Artificial Intelligence, Augmented and Virtual Reality, IoT and Business Intelligence.

Feel free to subscribe as we are constantly adding new videos.

Our YouTube channel

 

Get more from your Retail data with Predictive Analytics

This case study showcases our solution that allows Sales and Marketing to match customers to the products they are most likely to buy using retail data and predictive analytics. This case study is just as relevant today as it was just shy of one year ago when we created it. Combining this solution with Cognitive Intelligence such as facial recognition (as shown in the article here) provides even more opportunity in the retail sector.