Artificial Intelligence in Advanced Analytics Platforms


Artificial intelligence (AI) encompasses technologies such as machine learning, natural language processing, deep learning, cognition, and machine reasoning. AI is usually described as the simulation of human intelligence in computer systems, giving them the human-like ability to hear, see, think and then reason. One of the newest technology applications in business, Computer Vision, is an AI field that deals with how computers can be made to gain a high-level understanding of images and videos. Its sub-domains include video tracking, object recognition, learning, motion estimation and image restoration.

According to a survey conducted by Narrative Science, a substantial share of businesses already use AI in some form or another, a figure set to rise to over 60% by the end of 2018.

Let’s look at a typical use case we are working on right now, after which we will compare two exciting entrants into the area of Computer Vision.

Use case:

Marketing activities are centred around smart advertising on online platforms. The business wants advertising to be tailored to a person’s demographics, such as race, gender and age, which can increase the return for the company placing the advertisement.

Related use cases (especially around emotion) are discussed in the short video blog here: https://exposedata.wordpress.com/2017/01/12/cognitive-intelligence-meets-advanced-analytics/

Two platforms compared:

The two emerging major services set to disrupt the Computer Vision market are Microsoft Cognitive Services and Amazon (AWS) Rekognition. These services aim to place AI capabilities such as Computer Vision in the hands of analytics developers and analysts by providing APIs and SDKs that can be integrated into applications with just a few lines of code. The added benefit is the integration with each vendor’s larger cloud-based offering, which gives businesses a quicker ROI, higher reliability, and lower cost.

Let’s have a look at Microsoft’s Cognitive Services and Amazon’s Rekognition across Object Identification, Text Recognition, Face Detection, Emotion (in depth) and Price.

Object Identification:

Amazon and Microsoft both provide APIs and SDKs to read, analyze and label objects in images. Both services could identify and label the objects included in the uploaded image (with a calculated level of confidence, as shown). However, Microsoft can also analyze videos in real time in addition to images. Figures 1 and 2 show the results of the two platforms respectively.


Figure 1: Microsoft object identification results


Figure 2: Amazon object identification results
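To give a feel for how little code is involved, here is an indicative Python sketch that labels a local image with both services. It assumes the boto3 SDK for Amazon Rekognition and a Microsoft Computer Vision API key provisioned in the westus region; the file name, region, key handling and endpoint are placeholders rather than a definitive implementation.

import boto3
import requests

with open("shop_front.jpg", "rb") as f:          # any local test image
    image_bytes = f.read()

# Amazon Rekognition: label detection via the boto3 SDK
rekognition = boto3.client("rekognition", region_name="us-east-1")
aws_result = rekognition.detect_labels(Image={"Bytes": image_bytes},
                                       MaxLabels=10, MinConfidence=70)
for label in aws_result["Labels"]:
    print(label["Name"], round(label["Confidence"], 1))

# Microsoft Computer Vision API: tag and describe the same image
ms_endpoint = "https://westus.api.cognitive.microsoft.com/vision/v1.0/analyze"
ms_result = requests.post(
    ms_endpoint,
    params={"visualFeatures": "Tags,Description"},
    headers={"Ocp-Apim-Subscription-Key": "<your-key>",
             "Content-Type": "application/octet-stream"},
    data=image_bytes).json()
for tag in ms_result["tags"]:
    print(tag["name"], round(tag["confidence"], 2))

Both calls return a list of labels with confidence scores, which is what Figures 1 and 2 illustrate.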

If you need to process videos, then Microsoft Cognitive Services provides the superior service. It can also detect adult content and categorise images or videos. However, if you are using images only, both products step up to the plate very well.

Text Recognition:

Similar to object identification, we conducted a test to analyze images that include text. Unfortunately, Amazon doesn’t yet provide a full text recognition service. The Microsoft offering can find, analyze and return text in different languages. Figures 3 and 4 present the results from Microsoft and Amazon after analyzing the text included in the uploaded images.


Figure 3: Microsoft Text Recognition Result


Figure 4: Amazon Text Recognition Results

If you need to analyse text within images, the Microsoft service is at present the only option. Amazon only indicates that the uploaded image contains text, whereas Microsoft returns the actual text (even across multiple languages).
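For completeness, here is an indicative sketch of the Microsoft OCR call described above (again assuming a key in the westus region; the file name, region and API version are placeholders).

import requests

with open("sign.jpg", "rb") as f:
    image_bytes = f.read()

ocr_endpoint = "https://westus.api.cognitive.microsoft.com/vision/v1.0/ocr"
result = requests.post(
    ocr_endpoint,
    params={"language": "unk", "detectOrientation": "true"},   # "unk" lets the service detect the language
    headers={"Ocp-Apim-Subscription-Key": "<your-key>",
             "Content-Type": "application/octet-stream"},
    data=image_bytes).json()

# The response nests the recognised text as regions -> lines -> words
for region in result["regions"]:
    for line in region["lines"]:
        print(" ".join(word["text"] for word in line["words"]))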

Face Detection:

One of the main applications of Computer Vision in AI is face detection. This can be extended to deriving human demographics such as gender, age, emotion, whether the person is wearing glasses, facial hair, ethnicity, etc. Figures 5 and 6 show our results.


Figure 5: Microsoft Face Detection

Figure 6: Amazon Face Detection

Both Microsoft and Amazon are able to find demographic information such as gender, age, whether the person is wearing glasses, whether they have a beard, etc. Microsoft goes one step further, as faces can be grouped by visual similarity (for example, verifying that two given faces belong to the same person). In addition, Microsoft can process real-time video of people.
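As an indicative sketch (not production code), the demographic attributes above can be retrieved from Amazon Rekognition with a single boto3 call; the Microsoft equivalent is a POST to the Face API detect endpoint with returnFaceAttributes set to the attributes of interest. The file name and region below are placeholders.

import boto3

with open("crowd.jpg", "rb") as f:
    image_bytes = f.read()

rekognition = boto3.client("rekognition", region_name="us-east-1")
faces = rekognition.detect_faces(Image={"Bytes": image_bytes},
                                 Attributes=["ALL"])    # "ALL" returns the demographic attributes

for face in faces["FaceDetails"]:
    age = face["AgeRange"]                              # e.g. {"Low": 25, "High": 35}
    print(face["Gender"]["Value"],
          str(age["Low"]) + "-" + str(age["High"]) + " yrs",
          "glasses" if face["Eyeglasses"]["Value"] else "no glasses",
          "beard" if face["Beard"]["Value"] else "no beard")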

Emotion in Depth:

Computer Vision analyses a person’s emotion by studying their face, returning percentage scores for anger, contempt, disgust, fear, happiness, neutrality, sadness and surprise.


Figure 7: Microsoft Emotion in Depth

If a business requires the analysis of someone’s emotion, then Microsoft can analyze and measure each of the emotions listed above for every detected face. Amazon only returns a percentage for detected smiles. Also, Microsoft can process both images and real-time video.
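A minimal sketch of the Microsoft emotion call is shown below, assuming the standalone Emotion API endpoint available at the time of writing (the region, key and file name are placeholders).

import requests

with open("face.jpg", "rb") as f:
    image_bytes = f.read()

emotion_endpoint = "https://westus.api.cognitive.microsoft.com/emotion/v1.0/recognize"
faces = requests.post(
    emotion_endpoint,
    headers={"Ocp-Apim-Subscription-Key": "<your-key>",
             "Content-Type": "application/octet-stream"},
    data=image_bytes).json()

# One entry per detected face, with a score for each of the eight emotions
for face in faces:
    scores = face["scores"]
    top = max(scores, key=scores.get)
    print("dominant emotion:", top, round(scores[top] * 100), "%")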

Service Price:

This is not a quote, but a simple cost comparison based on the respective Microsoft and Amazon pricing pages:

For Object Identification and Text Recognition, Amazon is priced at $1.00 per 1000 images, compared to Microsoft’s $1.50 per 1000 images.

For Demographic Recognition (e.g. gender, age, wearing glasses, etc.), Amazon is priced at $1.00 per 1,000 images. Microsoft has a free plan if the number of calls is less than 30,000 per month; above that, prices range from $1.50 down to $0.65 per 1,000 calls depending on volume. In addition, Emotion “in depth” is priced separately at $0.10 per 1,000 calls.
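As a quick worked example using only the list prices quoted above (prices at the time of writing), processing 100,000 images per month for object or text recognition would cost roughly:

images_per_month = 100000

aws_cost = images_per_month / 1000 * 1.00   # $1.00 per 1,000 images on Amazon
ms_cost = images_per_month / 1000 * 1.50    # $1.50 per 1,000 images on Microsoft

print(aws_cost)   # 100.0 -> about $100 per month on Amazon
print(ms_cost)    # 150.0 -> about $150 per month on Microsoft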

Amazon Rekognition (all services): https://aws.amazon.com/rekognition/pricing/

Microsoft object and text identification: https://www.microsoft.com/cognitive-services/en-us/computer-vision-api

Microsoft face detection: https://www.microsoft.com/cognitive-services/en-us/face-api

Microsoft emotion in depth: https://www.microsoft.com/cognitive-services/en-us/emotion-api

Summary of services:

The following table provides a summary of the Computer Vision services offered by Microsoft and Amazon (at the time of writing).

Capability              Microsoft                                      Amazon
Object identification   Images and real-time video                     Images only
Text recognition        Full text extraction, multiple languages       Detects presence of text only
Face detection          Demographics, face grouping, real-time video   Demographics from images
Emotion in depth        Eight emotions scored per face                 Smile detection only

Conclusion:

Although Microsoft’s Computer Vision offering is in some areas more mature than Amazon’s, it must be noted that Amazon’s Computer Vision services are much newer. We have seen a lot of investment by both vendors in this area, so expect Amazon to close the gaps in due course. At the time of writing, however, Microsoft is certainly leading the pack in Computer Vision. But watch this space.

Power BI and SharePoint Online – together at last!

Microsoft recently released a feature to enable organizations to easily insert Power BI reports into their SharePoint Online pages.

In his blog post, Senior Program Manager Lukasz Pawlowski explained that the new web part for SharePoint Online will enable the addition of Power BI reports without requiring any coding by SharePoint authors.

The way the feature will work is simple.

  1. Publish your Power BI report to your Power BI service account
  2. Get the URL to the report from the File menu in Power BI service
  3. Add the Power BI (preview) web part to your SharePoint Online page
  4. Paste the URL of the report when prompted
  5. To finish, save and publish your page
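For reference, the link copied in step 2 is the report’s URL in the Power BI service; a purely hypothetical example (the identifiers are placeholders, not real values) would look something like:

https://app.powerbi.com/groups/me/reports/<report-id>/ReportSection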

This new feature is currently in preview and is only available to Office 365 tenancies set to “First Release”, as it relies on a new authentication method that has so far only been made available to those tenancies. This authentication allows users to see reports based on their organisational sign-in without having to sign in again.

The use of this new feature requires users to have a Power BI Pro license as well.

Further details on how to use this new feature have been provided by Microsoft here.

Power BI Service Updates

Microsoft recently released an announcement highlighting changes that had been made to the Power BI service environment. These included:

Power BI admin role

–          An O365 admin can now assign a Power BI admin who will have access to tenant-wide usage metrics, and be able to control tenant-wide usage of Power BI features.

Power BI audit logs globally available

–          In public preview are audit logs for Power BI to enable admins to track usage of the platform

Public preview: Email subscriptions

–          Power BI will send a screenshot of a subscribed report page directly to your inbox whenever the data changes, along with a link back to the report

New APIs available for custom visuals developers

Real-time streaming generally available

–          Real-time streaming has moved from preview to general availability. This allows users to easily stream data to Power BI via the REST API, Azure Stream Analytics, or PubNub (a minimal REST push example is sketched after this list)

Push rows of data to Power BI using Flow

–          A Power BI connector for Microsoft Flow that pushes rows of data to a Power BI streaming dataset without writing a single line of code

New Microsoft Azure AD content pack

–          A new content pack for Microsoft Azure Active Directory to quickly visualize how it is being used within an organization.
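To illustrate the push model referenced above, the sketch below posts one row to a streaming dataset’s push URL (the URL is shown in the dataset’s “API info” dialog in Power BI; the workspace, dataset and key placeholders and the row fields are purely illustrative).

import json
import requests

# Push URL copied from the streaming dataset's API info dialog; placeholders only
push_url = ("https://api.powerbi.com/beta/<workspace-id>/datasets/"
            "<dataset-id>/rows?key=<api-key>")

rows = [{"timestamp": "2017-03-01T09:30:00Z", "sensor": "line-1", "value": 42.0}]

response = requests.post(push_url, data=json.dumps(rows),
                         headers={"Content-Type": "application/json"})
response.raise_for_status()   # a 200 OK means the row landed in the streaming dataset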

Further details can be found here: http://bit.ly/2lwYHsu

Is the Data Warehouse Dead?


I am increasingly asked by customers – Is the Data Warehouse dead?

In technology terms, 30 years is a long time, and that is how old the Data Warehouse is – an old timer. Can we consider it a mature yet productive worker, or is it a worker gearing up for a pension?

I come from the world of Data Warehouse architecture, and in the mid to late noughties (2003 to 2010), whilst working for various high-profile financial service institutions in the London market, Data Warehouses were considered all-important and companies spent a lot of money on their design, development, and maintenance. The prevailing consensus was that you could not get meaningful, validated and trusted information to business users for decision support without a Data Warehouse (whether it followed an Inmon or a Kimball methodology – the pros and cons of which are not under the spotlight here). The only alternative for companies without the means to commit to the substantial investment typically associated with a Data Warehouse was to allow report writers to develop code against the source system databases (or a landed version thereof), but this, of course, led to a proliferation of reports, caused a massive maintenance nightmare and went against every notion of a single trusted source of the truth.

Jump ahead to 2011, and businesses started showing a reluctance to invest in Data Warehouses – a trend that accelerated from that point onward. My observations of the reasons for this range from the cost involved, the lack of quick ROI, a low take-up rate and the difficulty of aligning it with ongoing business change to, more recently, a change in the variety, volume and velocity of data that businesses are interested in.

In a previous article, “From watercooler discussion to corporate Data Analytics in record time” (https://exposedata.wordpress.com/2016/09/01/from-watercooler-discussion-to-corporate-data-analytics-in-record-time/), I stated that the recent acceleration of changes in the technology space “…now allows for fast response to red-hot requirements…”, and that the “…advent of a plethora of services in the form of Platform-, Infrastructure- and Software as a Service (PaaS, IaaS and SaaS)…” are proving to be highly disruptive in the Analytics market, and true game changers.

Does all of this mean the Data Warehouse is dead or dying? Is it an old timer getting ready for a pension, or does it still have years of productive contribution to the corporate data landscape left?

My experience across the Business Intelligence and Data Analytics market, across multiple industries and technologies, has taught me that:

A Data Warehouse is no longer a must-have for delivering meaningful, validated and trusted information to business users for decision support. As explained in the previous article, the PaaS, SaaS and IaaS services that focus on Data Analytics (for example the Cortana Intelligence Suite in Azure (https://www.microsoft.com/en-au/cloud-platform/cortana-intelligence-suite) or the Amazon Analytics Products (https://aws.amazon.com/products/analytics/)) allow for modular solutions that can be provisioned as required, which collectively answer the Data Analytics challenges and ensure data gets to users (no matter where it originates, or what its format, velocity or volume) fast, validated and in a business-friendly format.

But this does not mean that these modular Data Platforms, built from a clever mix of PaaS, SaaS and IaaS services, can easily provide some of the fundamental services of a Data Warehouse (or more accurately, of components typically associated with a Data Warehouse), such as:

  • Where operational systems do not track history, yet the analytical requirements need that history to be tracked (for example through slowly changing dimensions, type 2 – a minimal illustration of this pattern follows this list).
  • Where business rules and transformations are so complex that it makes sense to define them through detailed analysis and to hard-code them into the landscape as code and materialised data, in structures that the business can understand and that are often reused (for example dimensions and facts resulting from complex business rules and transformations).
  • Where complex hierarchies are required by the reporting and self-service layer.
  • To satisfy regulatory requirements such as proven data lineage, reconciliation, and retention required by law (for example for Solvency II, Basel II and III, and Sarbanes-Oxley).
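To make the first of these concrete, below is a minimal, illustrative sketch of slowly changing dimension type 2 handling in Python/pandas. The function and the column names (valid_from, valid_to, is_current) are assumptions for illustration, not a prescribed implementation.

import pandas as pd

def apply_scd2(dim, incoming, key, tracked, load_date):
    """Expire changed rows and append new versions (slowly changing dimension, type 2).

    dim      : dimension table with [key] + tracked + valid_from, valid_to, is_current
    incoming : latest source snapshot with [key] + tracked (one row per key)
    """
    current = dim[dim["is_current"]].set_index(key)
    snapshot = incoming.set_index(key)

    common = snapshot.index.intersection(current.index)
    changed = [k for k in common
               if not snapshot.loc[k, tracked].equals(current.loc[k, tracked])]
    new = snapshot.index.difference(current.index).tolist()

    # Expire the superseded current rows
    expire = dim[key].isin(changed) & dim["is_current"]
    dim.loc[expire, "is_current"] = False
    dim.loc[expire, "valid_to"] = load_date

    # Append a new current version for changed and brand-new keys
    additions = snapshot.loc[changed + new, tracked].reset_index()
    additions["valid_from"] = load_date
    additions["valid_to"] = pd.NaT
    additions["is_current"] = True
    return pd.concat([dim, additions], ignore_index=True)

This is the same history-tracking logic a Data Warehouse load would traditionally implement in ETL or SQL; the point is simply that when the source system does not keep history, something in the landscape still has to.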

Where these requirements exist, a Data Warehouse (or more accurately, the components typically associated with one) is required. But even in those cases, those components will merely form part of a much larger Data Analytics landscape: they will perform the workloads described above, while the larger data story is delivered by complementary services.

In the past, Data Warehouses were key to delivering optimized analytical models that normally manifested themselves in materialized Data Mart Star Schemas (the end result of a series of layers such as ODS, staging, etc.) Such optimized analytical models are now instead handled by business-friendly metadata layers (e.g. Semantic Models) that source data from any appropriate source of information, bringing fragmented sources together in models that are quick to develop and easy for the business to consume. These sources include those objects typically associated with a Data Warehouse/ Mart (for example materialized SCD2 Dimensions, materialized facts resulting from complex business rules, entities created for regulatory purposes, etc.) and they are blended with data from a plethora of additional sources. The business user still experiences that clean and easy to consume Star Schema-like model. The business-friendly metadata layer becomes the Data Mart, but is easier to develop, provides a quicker ROI, is much more responsive to business change, etc.

Conclusion

The Data Warehouse is not dead, but its primary role as we knew it is fading. It is becoming complementary to the larger Data Analytics Platforms we see evolving. Some of its components will continue to fulfil a central role, but they will be surrounded by all manner of services, and collectively these will fulfil the organisation’s data needs.

In addition, we see the evolution of Data Warehouse as a Service (DWaaS). This is not a Data Warehouse in the typical sense of the word as discussed in this article, but rather a service optimised for analytical workloads. Can it serve those requirements typically associated with a Data Warehouse, such as SCD2, materialisation due to complex rules, hierarchies or regulatory requirements? Absolutely. But its existence does not change the need for those modular, targeted architectures and for a much larger Data Analytics landscape using a variety of PaaS (including DWaaS), IaaS and SaaS. It merely makes the hosting of typical DW workloads much simpler, better performing and more cost-effective. Examples of DWaaS are Microsoft’s Azure SQL DW and Amazon’s Redshift.

 

 

Visualizations are the new black

This is the 3rd in a series of 3 articles:

First, we looked at Colouring with Numbers – can data present a better picture

The second showed how a story is worth a thousand visuals

Now we conclude with how Visualisations are the new black

 

In my last blog, “A story is worth a thousand visuals”, we discussed how to design the layout of a report to entice the audience and lead them through to the information that is of importance.

Now that we have gotten the audience to this point it would be all for nothing if they can’t effectively interpret what they are seeing. This is where the choice of visualization to present this information is paramount. Choose the right visualization and the audience can understand and interpret the information clearly. Choose the wrong one and the information can become lost or misinterpreted.

“So how do I know that I have chosen the right visualization?”

Glad you asked.

You won’t.

No matter how you believe the information should be displayed, it’s ultimately the audience you are delivering to who will determine whether what you are portraying is effective.

To help make a visualization as effective as possible, these are the five rules I use:

Rule 1 – “Always consult with your audience”

You will always be closer to the data than your audience, and you will naturally use this to establish your own beliefs around the correct way to represent information. If you consult with your audience, they will help ensure you maintain an objective view. If you can’t consult with your audience, try at least to seek an independent reviewer. If you find you are having to explain what they are looking at, this is probably a good indication that the visualization you have created isn’t achieving its intended purpose. Remain objective and open to critique. Everyone perceives information in different ways, and you have to remember that this needs to be received by an audience that may not see the information the same way you do.

Rule 2 – “Understand what it is you are trying to say”

If you can’t understand what it is you are trying to visualize, how can you effectively translate it into something that can be understood by others?

Before choosing any visualization, stop and take a minute to ensure you have formulated the question to which this visualization is going to provide the answer.

Always check your understanding of the true intent of the question. Interrogate further when required. Make sure the breadth and depth of what is being requested are captured, ensuring the specific detail actually sought can be represented within your visualization.

Along with understanding what it is you are trying to visualize, check that the data that you are using is accurate and correctly represents the question and answer.

Rule 3 – “Choose an appropriate visualization”

The goal of data visualization is to communicate information as efficiently and clearly as possible to an audience, to enable analysis and understanding. It tries to reinterpret complex data to make it more accessible, and as much as the interpretation of data is a science, the presentation of the data is an art.

Edward Tufte, a noted leading figure in data visualization, wrote in his book “The Visual Display of Quantitative Information” the following principles for effective visualization:

“Excellence in statistical graphics consists of complex ideas communicated with clarity, precision and efficiency. Graphical displays should:

  • show the data
  • induce the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production or something else
  • avoid distorting what the data has to say
  • present many numbers in a small space
  • make large data sets coherent
  • encourage the eye to compare different pieces of data
  • reveal the data at several levels of detail, from a broad overview to the fine structure
  • serve a reasonably clear purpose: description, exploration, tabulation or decoration
  • be closely integrated with the statistical and verbal descriptions of a data set.

Graphics reveal data. Indeed, graphics can be more precise and revealing than conventional statistical computations.” (Tufte, 1983)

So where do we start?

A good guide to determining which chart type helps to present which type of information was developed by Dr. Andrew Abela and is shown below in Figure 1. It’s based on the four analytical models of comparison, composition, distribution, and relationship.


Figure 1 (Abela)

This provides a great starting point for choosing a visual representation of the data. But remember to use this as a guide. Always assess if the visual is staying true to what you are trying to present.

Some good resources to assist further with understanding types of visualizations include:

www.datavizcatalogue.com

http://labs.juiceanalytics.com/chartchooser/index.html

http://annkemery.com/essentials/

 

Rule 4 – “Use color to enhance the visual and not detract”

Colour can be a powerful tool to draw your audience in and focus their attention. But it has to be used with care as it can just as easily detract and cause confusion. Once again Edward Tufte provides us with some guidance here:

“…avoiding catastrophe becomes the first principle in bringing color to information: Above all, do no harm.” (Tufte, Envisioning Information, 1990)

We process color before we are even consciously aware that we are interpreting it and we can use this to our advantage when presenting a visualization to provide the audience with clarity and direction on interpreting the information.

Colour should be used sparingly and only applied when doing so adds meaning to the data.

For example, consider the following two column charts. Both display exactly the same information.


Figure 2


Figure 3

The chart in Figure 2 is harder to interpret than Figure 3 due to the selection of individual colors for each column. When the audience sees Figure 2, they instinctively try to apply a meaning to the color scheme. It is better to remove this mental fatigue and use a single color, as in Figure 3, as the audience will then interpret this as all the same data, with the comparison occurring at the individual column level.

Try to use soft colors predominantly, reserving more intense colors for drawing attention to specific points of interest.


Figure 4


Figure 5

As shown in Figure 5, increasing the lightness of the surrounding colors allows the intended data point to be drawn into focus, in comparison to the same information represented in Figure 4.
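A quick matplotlib sketch of this technique (the data and hex colours below are purely illustrative): mute every column and let a single saturated colour carry the point of interest.

import matplotlib.pyplot as plt

categories = ["Q1", "Q2", "Q3", "Q4"]          # illustrative data only
sales = [120, 135, 180, 150]

colours = ["#c6c6c6"] * len(sales)             # soft grey for context
colours[2] = "#d62728"                         # one intense colour for the point of interest

fig, ax = plt.subplots()
ax.bar(categories, sales, color=colours)
for side in ("top", "right"):                  # mute helper elements too
    ax.spines[side].set_visible(False)
ax.set_ylabel("Sales")
plt.show()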

In concert with trying to highlight the relevant data, helper information such as axes, data labels, background colors and borders should be muted so as not to detract from the information being presented. Figures 6 – 9 below show some examples of how this may look when not taken into consideration.


Figure 6


Figure 7


Figure 8


Figure 9

To ensure consistency and cohesiveness throughout your visuals, establish a color palette that you can use. The palette should enable you to display data of the following types: sequential, diverging and categorical.

Sequential color palettes are used to organize quantitative data from high to low using a gradient effect. You generally want to show a progression rather than a contrast, and a gradient-based color scheme allows you to show this progression.


Figure 10

Diverging palettes show information that moves outward from an identified central point of the data range. A typical diverging palette uses two different sequential palettes that diverge from a shared light color toward dark colors at each extreme, providing a natural visual order that assists the audience in interpreting the progression.


Figure 11

Categorical color palettes are used to highlight categories of data. With categorical data, you typically want to create a lot of contrast to ensure the visual distinction between each category. To do this use different hues to represent each of your data points.


Figure 12

After establishing your palette, ensure you include complementary colors with reduced brightness, so that you can use colors from your primary palette to highlight and the secondary palette to support. If we were to do this with Figure 12, it would create a palette as shown in Figure 13 below.


Figure 13
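If you build your visuals in code, the three palette types (and a muted secondary version) can be set up in a few lines. The matplotlib colormaps below are stand-ins for the palettes shown in Figures 10 to 13, not a recommendation of specific colours.

import matplotlib.pyplot as plt

sequential = plt.get_cmap("Blues")          # one hue, light to dark, for ordered quantities
diverging = plt.get_cmap("RdBu")            # two ramps meeting at a neutral midpoint
categorical = plt.get_cmap("tab10").colors  # distinct hues for unordered categories

# A muted "secondary" palette: the same hues at reduced opacity for supporting series
muted = [(r, g, b, 0.4) for r, g, b in categorical]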

Fortunately, there are many websites that can assist you in establishing your color palettes without having to have an intimate understanding of color theory. Some recommendations I can make are:

http://paletton.com

http://colorbrewer2.org/

http://tools.medialab.sciences-po.fr/iwanthue/

http://www.colorhexa.com/

https://color.adobe.com/create/color-wheel/

A final word on the use of color wouldn’t be complete without recognizing accessibility requirements. Approximately 10% of males and 1% of females have poor colour perception, commonly referred to as colour blindness. It is recommended that, when designing your palettes, you choose colours that accommodate this. Colorhexa has a good visual tool to help you understand how a colour is perceived by people with different types of colour perception.

Rule 5 – “Ensure clarity in your visualisation”

When designing your visualisation, remember that the key is to communicate information as clearly and quickly as possible. Only visualise information that is relevant and enhances what is being interpreted.

Now that you have constructed your visual, stand back and look at it. Squint. Is there anything that detracts or confuses the information you are trying to present?

An example of how too much noise can cause confusion is illustrated in the example below in Figures 14 and 15.

In Figure 14, the Republican staff of the American Joint Economic Committee released a chart to demonstrate the complexity of the American Affordable Care Act.


Figure 14

An American citizen, Robert Palmer, felt that the chart was purposely designed to exaggerate what is already a complex topic by making the chart itself difficult to read. He therefore redrew it, as shown in Figure 15, to demonstrate that, while the topic remains complex, the information can still be presented clearly.


Figure 15

(Palmer)

TL;DR

In summary, if you have skimmed to the bottom of this looking for the quick answers, here are my rules for visualisation:

Rule 1 – “Always consult with your audience”

Rule 2 – “Understand what it is you are trying to say”

Rule 3 – “Choose an appropriate visualisation”

Rule 4 – “Use colour to enhance the visual and not detract”

Rule 5 – “Ensure clarity in your visualisation”

 

References

Abela, D. A. (n.d.). Charts. Retrieved from Extreme Presentation: https://extremepresentation.com/design/7-charts/

Palmer, R. (n.d.). Retrieved from Flickr: http://www.flickr.com/photos/robertpalmer/3743826461/

Tufte, E. (1983). The Visual Display of Quantitative Information. Cheshire, Connecticut: Graphics Press.

Tufte, E. (1990). Envisioning Information. Graphics Press.

 

 

 

Cognitive Intelligence meets Advanced Analytics


Acquiring knowledge of anonymous customers through Cognitive Intelligence is the next generation of customer-based Business Intelligence.

Human behaviour and characteristics such as speech, demographics, and emotion can now be expressed digitally and blended with Advanced Analytics. Exposé applies this across a number of different use cases, as shown in the video.

 

Or navigate to – https://www.youtube.com/watch?v=XkeCLp7noyo

See more on Advanced Analytics

Advanced Analytics and Big Data Platform – RAA Case Study


An Exposé case study of our advanced analytics and big data platform for RAA, which allows for the acquisition and blending of large volumes of fragmented geospatial data, transforms it using massive processing capacity, uses predictive analytics to assess the risk of millions of properties, and provides interactive geospatial visualisations of the blended data and results.

This video case study shows a solution summary:

See the full case study here: expose-case-study-raa

See another big data solution here