This case study showcases our solution that allows Sales and Marketing to match customers to the products they are most likely to buy, using retail data and predictive analytics. The case study is as relevant today as it was when we created it almost a year ago. Combining this solution with cognitive intelligence such as facial recognition (as shown in the article here) provides even more opportunity in the retail sector.
Utilising Internet-connected devices and real-time advanced analytics, we created a solution that provides better client care while reducing cost, time and potential health risks.
The solution helps the organisation overcome customer wellbeing challenges: medication and food must be kept at constant temperatures. This clever solution uses the Azure IoT Suite and the Cortana Intelligence Suite of technologies to create a proactive monitoring solution that ensures the maximum wellbeing of aged care clients.
A reporting solution that enables an education provider to identify student attrition rates, monitor potential causes of attrition and predict which students are likely to withdraw in the future.
Our case study covers a higher education institution and the 360-degree student attrition solution we designed and developed: an intelligent way to understand attrition in the past and use it to predict attrition in the future.
It allows the institution to foster relationships with at-risk students before they withdraw, ultimately decreasing attrition and avoiding revenue leakage.
An Exposé case study around our advanced analytics solution for the ‘voice of business in South Australia’, Business SA. The solution was an important component of a large digital transformation program that saw Business SA transition to a modern, automated and simplified organisation, underpinned by the following technology changes:
• Adopted a cloud-first strategy, reducing Business SA’s dependence on resources that provided no market differentiation
• Simplified the technology landscape with a few core systems, each performing specific functions
• Established a modular architecture better able to accommodate change
• Implemented a digital strategy to support automated, self-service and 24/7 service delivery
• Improved data quality through simpler and more intuitive means of data entry and validation
• Utilised the latest desktop productivity tools, providing instant mobility capabilities
An Exposé case study around our advanced analytics and big data platform for RAA. The platform allows for the acquisition and blending of large volumes of fragmented geospatial data, transforms it using massive processing capacity, applies predictive analytics to assess the risk of millions of properties, and provides interactive geospatial visualisations of the blended data and results.
Market segmentation is a common practice in marketing and sales in order to better understand – and therefore be better able to target – customers. This same principle, though, can be applied to any business problem where the division of a diverse population into sub-populations based on similarities (or differences) would be advantageous.
Fortunately, rather than having to slice every variable an infinite amount of ways, we can utilise unsupervised learning algorithms to produce groupings of samples based on the similarity of data features.
This post will discuss the application of a technique to accomplish this, using a mixture of technologies (SQL, Python and R) and algorithms (Self-Organising Maps (SOM) and Hierarchical Clustering).
The following example is taken from a pool of insurance claims, with the aim to understand sub-populations contained within the data to allow for appropriate monitoring and exception reporting. Specifically, we want to know if the mix of claims changes for the worse.
Claims by nature have a structure whereby the majority of the portfolio consists of low-value claims, reducing quickly to very few at the highest values (for the statistically minded, the value generally follows a log-normal distribution). A typical partitioning strategy is to slice by claim duration (shorter claims are in general cheaper), and this is suitable in a lot of cases when dealing with claims in aggregate. But given the lack of granularity, the challenge is that when there is a change in expected durations, a second level of analysis is required to establish the “who”.
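As an aside, the skewed shape described above is easy to see in simulation. A minimal sketch in numpy (the distribution parameters here are invented for illustration, not taken from the real portfolio):

```python
import numpy as np

# Simulate a claims portfolio whose values follow a log-normal
# distribution; mean/sigma are made-up illustrative parameters.
rng = np.random.default_rng(42)
claims = rng.lognormal(mean=8.0, sigma=1.2, size=10_000)

# The heavy right tail pushes the mean well above the median,
# so the majority of claims sit below the portfolio average.
print(np.median(claims) < claims.mean())
print((claims < claims.mean()).mean())  # fraction of claims below the mean
```

This is exactly why a handful of high-value claims can dominate aggregate monitoring, motivating the sub-population view.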
By using unsupervised learning, we will essentially encode the “who” into the cluster exemplars, so we can then focus on the more important question: “So what are we going to do about it?”
Obtaining the data was, fortunately, a trivial task, as it was all contained in an on-premises SQL Server database, and the data was as clean as it was going to get. Five years of historical records were used as the training sample.
Execution of the modelling was done using the R Kohonen package, and subsequent clustering of the SOM model by the hclust function.
Finally, to glue everything together into a processing pipeline I used Python 2.7 with:
sqlalchemy: to connect to the database;
numpy / pandas: for data massaging;
rpy2: to connect to R.
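As a rough sketch of how the acquisition end of that pipeline fits together: the connection string, table and column names below are hypothetical stand-ins, and an in-memory SQLite database replaces the actual SQL Server so the sketch is self-contained.

```python
import pandas as pd
from sqlalchemy import create_engine

# Stand-in for the on-premises SQL Server DSN used in the real pipeline.
engine = create_engine("sqlite://")

# Hypothetical claims table so the sketch runs end to end.
pd.DataFrame({
    "claim_id": [1, 2, 3],
    "duration_days": [12, 45, 230],
    "total_paid": [850.0, 4200.0, 61000.0],
}).to_sql("claims", engine, index=False)

# Pull the training sample into pandas for massaging before modelling.
training = pd.read_sql("SELECT * FROM claims", engine)
print(training.shape)
```

From here the massaged frame would be handed to R via rpy2 for the kohonen modelling step.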
I will not go into detail around the technical implementations of SOM and Hierarchical Clustering; there are far better explanations out there than I could hope to provide (in Tan 2006, for example). However, to provide a simple overview, a SOM is an m×n grid of nodes, to which samples are assigned based on the similarity (or dissimilarity) measure used; commonly, and in our case, this is Euclidean distance.
It is an iterative algorithm with random initialisation. At each step, the node codebook (a vector identical in structure to the input data, with feature values that represent the samples assigned to the node) is updated, and the samples are re-evaluated as to which node they belong. This continues for a set number of iterations; however, we can check whether the codebooks are still changing using a ‘Change Plot’. Figure 3 is an example of the ‘Change Plot’ from the Kohonen package, where we observe that after 50 or so iterations, minimal change is occurring.
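To make the mechanics concrete, here is a minimal toy SOM in plain numpy (not the kohonen package; the grid size, learning-rate and neighbourhood schedules are invented for illustration). It tracks the mean codebook movement per iteration, which is the quantity the ‘Change Plot’ visualises:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((500, 3))                      # samples x features
grid_h, grid_w = 4, 4
codes = rng.random((grid_h * grid_w, 3))      # one codebook per node

# Node coordinates on the grid, used by the neighbourhood function.
coords = np.array([(i, j) for i in range(grid_h) for j in range(grid_w)], dtype=float)

changes = []
for it in range(50):
    alpha = 0.5 * (1 - it / 50)               # decaying learning rate
    radius = 2.0 * (1 - it / 50) + 0.5        # shrinking neighbourhood
    old = codes.copy()
    for x in X:
        bmu = np.argmin(((codes - x) ** 2).sum(axis=1))   # best matching unit
        d = np.linalg.norm(coords - coords[bmu], axis=1)  # grid distance to BMU
        h = np.exp(-(d ** 2) / (2 * radius ** 2))         # neighbourhood weight
        codes += alpha * h[:, None] * (x - codes)         # pull codebooks toward sample
    changes.append(np.abs(codes - old).mean())

# Codebook movement should shrink as training converges.
print(changes[0] > changes[-1])
```

The `changes` list plotted against iteration number is the toy equivalent of figure 3.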
When we have a SOM model we are happy with, Hierarchical Clustering allows us to condense the grid into a smaller number of clusters for further evaluation. Clustering is performed over the node codebooks, after which a number of clusters is selected. Note that one could use the nodes of the SOM themselves as the clustering, but depending on the number of samples used to train the model, a SOM grid generally contains too many nodes to be useful.
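In Python, this condensing step might look like the following sketch using scipy's hierarchical clustering (random codebooks stand in for a trained model, and the cluster count of four is arbitrary):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# 'codes' stands in for the trained codebook matrix (nodes x features),
# e.g. a 6x6 grid of nodes.
rng = np.random.default_rng(1)
codes = rng.random((36, 3))

Z = linkage(codes, method="ward")                      # agglomerative clustering
node_cluster = fcluster(Z, t=4, criterion="maxclust")  # cut tree into <= 4 clusters

# One cluster id per SOM node.
print(len(node_cluster), node_cluster.min(), node_cluster.max())
```

In the original pipeline the equivalent step was performed in R with `hclust` over the kohonen codebooks.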
The process by which the modelling was undertaken is depicted in figure 4.
As can be seen from figure 4, the Python script initiates everything and uses the results from the SQL and R calls.
This approach was taken primarily due to my own limitations in R programming; I am far more comfortable developing in Python and know my Python code will be more efficient than my R code purely due to proficiency.
My selection of R and the Kohonen package was based on my research into a suitable implementation of SOM to use. kohonen has an implementation called SuperSOM, a multi-layered SOM that trains each layer separately, which I thought would be ideal for temporal features in the source data (e.g. layer 1 = features at t1, layer 2 = features at t2, etc.).
Finally, the data set was not big by any stretch so whilst training a SOM can be a compute-intensive task, in this case, anything more than a decent laptop was not required – on my i7 laptop, training of the SuperSOM took only 3 seconds to run 80 iterations against ~5,000 samples.
Both Quantisation Error (QE) and Topological Error (TE) were used to evaluate the quality of the model, where:
QE = the mean distance from node samples to the node codebook. A low QE ensures that the node codebook is close to the samples allocated to it.
TE = the mean distance to a node’s nearest node. A low TE means that similar nodes are positioned close to each other on the grid.
TE is particularly important in our case as we wish to cluster the result; non-contiguous clusters are a side effect of a relatively high TE.
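Using the definitions above, both metrics can be computed directly. A sketch with placeholder data standing in for a trained SOM:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Placeholders: random samples and codebooks in place of a trained model.
rng = np.random.default_rng(2)
X = rng.random((200, 3))
codes = rng.random((16, 3))
assignments = np.argmin(cdist(X, codes), axis=1)   # each sample -> nearest node

# QE: mean distance from each sample to its node's codebook.
qe = np.mean(np.linalg.norm(X - codes[assignments], axis=1))

# TE (as defined above): mean distance from each node to its nearest node.
d = cdist(codes, codes)
np.fill_diagonal(d, np.inf)        # exclude each node's zero self-distance
te = d.min(axis=1).mean()

print(qe, te)
```

Lower values of both indicate codebooks that fit their samples well and sit near similar neighbours on the grid.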
The clustering was used to create data exemplars around which monitoring was developed to understand any changes in the population mix. Interventions could then be focused on the relevant sub-population based on the dominant features of the samples in that group (i.e. if group “A” is a high-cost group and feature “X” is particularly dominant, how can we influence this feature to move samples into a lower-cost group?). It is for this targeting reason that marketing has historically been a large user of clustering.
New samples can be allocated to a cluster by first allocating them to the nearest node, then assigning them the cluster id of that node.
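A sketch of that scoring step (the codebooks and node cluster ids below are random placeholders for the trained model):

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(3)
codes = rng.random((16, 3))                    # trained node codebooks (placeholder)
node_cluster = rng.integers(1, 5, size=16)     # cluster id per node (placeholder)

new_samples = rng.random((5, 3))
nearest_node = np.argmin(cdist(new_samples, codes), axis=1)  # nearest node per sample
sample_cluster = node_cluster[nearest_node]                  # inherit node's cluster id
print(sample_cluster)
```

This keeps scoring cheap: new claims never need the full model retrained to receive a cluster label.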
Other Use Cases
Marketing: To group customers based on buying habits, product type purchases, purchase value etc to tailor marketing activities and/or focus on higher value customers.
Customer Service: Triage call types based on historical interactions into high/medium/low priority or risk.
Government: Understanding the demographics of users of services in order to better tailor customer experience.
A few implementation notes:
SOM uses random initialisation, so a seed needs to be set to obtain repeatable results;
Min-max scaling between 0 and 1 was used given the presence of binary variables;
In a single-SOM scenario, one can simply use dist(som$codes) to get a distance matrix for clustering purposes, but for a SuperSOM we need to handle the layers. So a function was created to calculate the average weighted distance across layers to output as the distance matrix. The weighting was based on layer weight and also node distance (i.e. the further the nodes are from each other, the higher the weighting) to form contiguous clusters.
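A sketch of such a function in numpy (the original was written in R against the kohonen codebooks; the layer weights and the exact grid-distance weighting scheme here are illustrative assumptions, not the original implementation):

```python
import numpy as np
from scipy.spatial.distance import cdist

def weighted_layer_distance(layer_codes, layer_weights, grid_coords):
    """Average weighted distance across SuperSOM layers.

    layer_codes  : list of (nodes x features) codebook matrices, one per layer
    layer_weights: weight per layer (summing to 1)
    grid_coords  : (nodes x 2) positions of the nodes on the SOM grid
    """
    grid_d = cdist(grid_coords, grid_coords)   # node-to-node grid distance
    grid_w = 1.0 + grid_d / grid_d.max()       # farther-apart nodes weigh more
    total = np.zeros_like(grid_d)
    for codes, w in zip(layer_codes, layer_weights):
        total += w * cdist(codes, codes)       # feature-space distance per layer
    return total * grid_w                      # penalise distant nodes -> contiguous clusters

rng = np.random.default_rng(4)
coords = np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)
layers = [rng.random((16, 3)), rng.random((16, 2))]  # two layers, different features
D = weighted_layer_distance(layers, [0.6, 0.4], coords)
print(D.shape)
```

The resulting matrix D can then be fed to the hierarchical clustering step in place of dist(som$codes).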