Envision and deliver the Central Data Analytics Platform – our Department for Health and Wellbeing case study

See how we envisioned and delivered a real-time Central Data Analytics Platform using modern best practice. Among other things, it provides a 360-degree clinical view of a patient in real time, drawing on previously fragmented data from unrelated systems across the South Australian health networks. This platform will ultimately improve patient outcomes and care.

Read our case study here.

Azure Purview, does it fill the data governance blind spot for Microsoft? (Part 2)

Welcome back to Part 2 of this blog on Azure Purview. In this instalment continuing from Part 1, I will go through a detailed review and highlights of Purview.

Azure Purview – classifications and sensitivity:

  • Classifications, either Purview’s own 100+ prebuilt ones or your own (BYO), mark and identify data of a specific type found within your data estate. We could easily see where all the credit card numbers and Australian Tax File Numbers were located across the data lake.
  • Sensitivity labels define how sensitive certain data is, and they are applied when one or more classifications and conditions are found together. We could clearly find the data with a sensitivity of ‘secret’. An immediate application of this could be to support the IRAP data classifications defined by the Australian Signals Directorate, or PCI requirements in the financial sector.

When the scan completes, all data meeting the classification rules will be discoverable. Purview’s sensitivity labelling, by contrast, needs a couple of hours to reflect the new assets and auto-label sensitivity, after which those labels too will be discoverable. It is also possible to view insight reports for classifications and sensitivity labels.
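To make the relationship between classifications and sensitivity labels concrete, here is a minimal sketch in Python. It is not Purview’s API or its internal logic. The classifier table, the label rules and all function names are hypothetical, and the patterns are deliberately simplified (16-digit card numbers and 9-digit TFNs only). It shows pattern-plus-checksum classification, then a label that applies only when the required classifications are found together.

```python
import re

def luhn_ok(digits: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def tfn_ok(digits: str) -> bool:
    """Weighted mod-11 checksum for 9-digit Australian Tax File Numbers."""
    weights = (1, 4, 3, 7, 5, 8, 6, 9, 10)
    return sum(int(d) * w for d, w in zip(digits, weights)) % 11 == 0

# Hypothetical classifier table: name -> (pattern, validator).
CLASSIFIERS = {
    "Credit Card Number": (re.compile(r"\b\d{16}\b"), luhn_ok),
    "Australian Tax File Number": (re.compile(r"\b\d{9}\b"), tfn_ok),
}

def classify(values):
    """Return the set of classification names matched by any value."""
    found = set()
    for value in values:
        for name, (pattern, validator) in CLASSIFIERS.items():
            if any(validator(m) for m in pattern.findall(value)):
                found.add(name)
    return found

# Hypothetical label rules, checked in order: a label applies when every
# classification it requires is present on the asset.
LABEL_RULES = [
    ("Secret", {"Credit Card Number", "Australian Tax File Number"}),
    ("Confidential", {"Credit Card Number"}),
]

def sensitivity_label(classifications):
    for label, required in LABEL_RULES:
        if required <= classifications:
            return label
    return None

sample = ["card 4111111111111111", "tfn 123456782"]
print(classify(sample), sensitivity_label(classify(sample)))
```

The checksum validators are there to cut false positives: any 9-digit number matches the TFN pattern, but only a fraction pass the mod-11 check.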

Azure Purview classification and sensitivity label examples

Azure Purview – business glossary

We could easily employ a glossary to overlay business-friendly terms over the metadata, translating the physical vocabulary into a standard vocabulary the business can understand. Remember, data is the business’ asset more so than that of ICT, so a business vocabulary is important.

Purview has, as I previously mentioned, more than 100 system classifiers that it uses during scans to classify data automatically, and it can also apply your own BYO classifications to data assets and schemas. But it was easy to override these with the business glossary, and anything a human had overridden was never replaced by subsequent automated scans. For example, we overrode CC# with Credit Card Number.
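The behaviour we observed, where human curation survives later scans, can be sketched as a simple precedence rule. This is only an illustration of the idea under my own assumptions, not how Purview stores its metadata. The `ColumnAnnotation` type, the `source` field and both functions are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ColumnAnnotation:
    name: str     # term shown to users for this column
    source: str   # "scan" or "human"

def apply_scan_result(catalog, column, detected_name):
    """Record a scan result unless a human has already curated the column."""
    existing = catalog.get(column)
    if existing is not None and existing.source == "human":
        return existing  # human curation takes precedence over automation
    catalog[column] = ColumnAnnotation(detected_name, "scan")
    return catalog[column]

def apply_human_override(catalog, column, business_term):
    """A data steward replaces the detected name with a glossary term."""
    catalog[column] = ColumnAnnotation(business_term, "human")
    return catalog[column]

catalog = {}
apply_scan_result(catalog, "payments.cc_no", "CC#")
apply_human_override(catalog, "payments.cc_no", "Credit Card Number")
apply_scan_result(catalog, "payments.cc_no", "CC#")  # a later rescan
print(catalog["payments.cc_no"].name)
```

Tracking where each annotation came from is what makes the rule possible. Without the `source` field, a rescan could not tell a steward’s edit from an earlier automated result.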

The glossary is also where data stewards and owners are set, two core elements of effective data governance.

Terms can also be subject to workflow and approvals so that the glossary does not become a free-for-all.

Azure Purview – show lineage at various levels

We could see a bird’s-eye view of the data estate, including very impressive lineage at the asset, column and process levels, as well as the full data map:

  • At the data asset/entity level, i.e., where the entity/asset came from.
Azure Purview Entity level lineage
  • At the column level, i.e., where the attribute came from.
Azure Purview column level lineage
  • At the process level, i.e., how data was moved/orchestrated.
Azure Purview process level lineage
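Conceptually, all three lineage views answer the same question over a graph of (source, process, target) hops: what feeds this asset, and via which process? The sketch below is a hypothetical illustration, not Purview’s data model. The edge list and asset names are invented, and `upstream` is an ordinary breadth-first walk.

```python
from collections import deque

# Hypothetical lineage edges: (source, process, target).
EDGES = [
    ("raw/customers.csv", "copy_to_lake", "lake/customers.parquet"),
    ("lake/customers.parquet", "build_customer_dim", "dw.dim_customer"),
    ("raw/orders.csv", "copy_to_lake", "lake/orders.parquet"),
    ("lake/orders.parquet", "build_sales_fact", "dw.fact_sales"),
    ("dw.dim_customer", "build_sales_fact", "dw.fact_sales"),
]

def upstream(target, edges):
    """Breadth-first walk returning every (source, process, target) hop
    that directly or indirectly feeds `target`."""
    incoming = {}
    for src, proc, dst in edges:
        incoming.setdefault(dst, []).append((src, proc))
    hops, seen, queue = [], {target}, deque([target])
    while queue:
        node = queue.popleft()
        for src, proc in incoming.get(node, []):
            hops.append((src, proc, node))
            if src not in seen:   # visit each upstream asset once
                seen.add(src)
                queue.append(src)
    return hops

for src, proc, dst in upstream("dw.fact_sales", EDGES):
    print(f"{src} --[{proc}]--> {dst}")
```

The same walk gives column-level lineage if the nodes are columns (for example "dw.fact_sales.customer_id") rather than whole assets. Collecting only the process names from the hops gives the process-level view.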

Azure Purview – gain insights

It is also easy to see insights across many of these concepts – across the data assets, glossary, scans, classifications, sensitivity, and file extensions.

The images below show only some insight examples for the glossary, for classification and for sensitivity:

Azure Purview Glossary insights
Azure Purview Classification Insights
Azure Purview Sensitivity Insights

Conclusion – is Azure Purview a worthy data governance tool?

Does Purview hit the mark? As I said before, at the time of writing there were still some kinks for Microsoft to sort out, and my correspondence with their product team suggests they are working on them. Looking at it purely from a data cataloguing perspective, it ticks many boxes, and at a very compelling price point.

But data governance is broader than just cataloguing, and even though Purview crosses the boundary into some aspects that would not normally sit within a data catalog (which is a good thing), other areas still require attention, notably master data, data quality and data security. BUT we all know this is just the first module, so watch this space!

Originally published by Etienne Oosthuysen at https://www.makingmeaning.info/post/azure-purview-does-it-fill-the-data-governance-blind-spot-for-microsoft

Azure Purview, does it fill the data governance blind spot for Microsoft? (Part 1)

Data enablement – can Azure Purview help

For several years now, I have been evangelising about the need for data enablement in organisations. In a nutshell, this means getting data to consumers (be it final consumers of reports, or consumers who go on to create data solutions such as data engineers, data scientists and business focussed data modellers) faster, and accepting that data transformations, data modelling and new data solutions (models, reports, dashboards, etc.) will occur as part of a highly agile, rapid development process and by multiple data workers, some of whom are external to ICT.

Technology has now reached the point on the maturity curve where this enablement will accelerate and become the norm, replacing old-school, linear data workloads performed only by central BI/ICT teams. Core to these technologies are Data Lakes for storage at scale (the ‘lake’ of your ‘lake house’), Synapse or Databricks for on-demand transformations and the virtual data warehouse layer (the ‘house’ of your ‘lake house’), Power BI for business models and visualisations (the ‘front door’ to the whole thing) and, of course, resources to move data into and around the ecosystem.

BUT all this enablement now demands a massive rethink of governance, in terms of both methodology and technology. Long, laborious, theory-heavy data governance methodologies simply won’t keep up with the rapid growth of the internal data market and the many workers across the organisation who take part in data-related activities. An alternative, much more pragmatic methodology is required, and it must be supported by technology that possesses two crucial things: (1) Artificial Intelligence to simplify and accelerate the process of data cataloguing and classification, and (2) crowd sourcing so that users across the business can quickly add to the collective knowledge of the data assets of the business. And it is in the technology space where Azure had a massive blind spot.

Introducing Azure Purview.

The word Purview simply means the ‘range of vision’, and when it comes to data, the greater the range of this vision and the clearer the objects you see, the better. Will Purview live up to this definition of its more generic namesake, and will it cover the blind spot I previously mentioned?

The current generally available version is the first of multiple planned modules for Purview, i.e., the Data Catalog module. This first module supports AI based cataloguing and classification of data across your data estate, curation, and data insights. Users will in addition be able to use and maintain business glossaries, expressions to classify data based on patterns beyond the out-of-the-box classifications (let’s call these bring your own (BYO) expressions to cover additional patterns), provide visibility over ownership and custodianship, show lineage, etc.
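To illustrate what a BYO expression might look like in principle, here is a minimal Python sketch of a regex-based classifier that fires only when enough sampled values in a column match. It is an assumption-laden illustration rather than Purview’s rule format. The `make_classifier` helper, the `min_match_ratio` parameter and the employee ID pattern are all hypothetical.

```python
import re

def make_classifier(name, data_pattern, min_match_ratio=0.6):
    """Build a classifier that returns `name` when at least
    `min_match_ratio` of the non-empty sampled values match the pattern."""
    regex = re.compile(data_pattern)

    def classifier(sample_values):
        values = [v for v in sample_values if v]  # ignore empty cells
        if not values:
            return None
        hits = sum(1 for v in values if regex.fullmatch(v))
        return name if hits / len(values) >= min_match_ratio else None

    return classifier

# Hypothetical in-house pattern: employee IDs such as "EMP-004217".
employee_id = make_classifier("Employee ID", r"EMP-\d{6}")

print(employee_id(["EMP-004217", "EMP-118830", "n/a"]))  # 2 of 3 values match
print(employee_id(["EMP-004217", "unknown", "tbc"]))     # 1 of 3 values match
```

The threshold lets a column with a few dirty values still be classified, while one stray match does not label an otherwise unrelated column.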

This will be of immediate benefit to anyone seeking pragmatic data governance. It immediately provides a heap of knowledge about the data in your data estate via 100+ out-of-the-box scanning rules, something that would otherwise have required resource-intensive and error-prone human activity. It also lets data workers augment or override the AI scanning in a crowd-sourcing ecosystem, or BYO scanning rules.

In the recent road test, we dumped a whole load of data into Azure Data Lake and set Purview scanning loose over it to do its AI and built-in classifier magic. The results looked pretty good and go a long way towards fulfilling the requirement I mentioned before for pragmatic and accelerated data governance.

In the next blog, I will go through a detailed review and highlights of Azure Purview. Stay tuned!