Streamlining the Process of Process Mining

How do we actually do process mining?

Much discussion around process mining centres on the algorithms and technical methods used to analyse data and to visualise the analysis results.

But practitioners need to think seriously not just about these issues but also about the broader methodologies into which these tools and techniques are embedded – or the process of process mining.

There have been limited but important contributions by both researchers and practitioners in this space, including the ‘L*’ (L-star) process mining project methodology, which is summarised in the following figure.

L star process mining methodology

Adapted from: Process Mining: Discovering and Improving Spaghetti and Lasagna Processes

But the experience of process mining for many practitioners is that real process mining projects can be much more challenging than the neat sequence of steps above might suggest. Indeed, the process of process mining itself has bottlenecks!

The good news is that both researchers and vendors have recognised these challenges and many new tools are emerging which can assist us in alleviating these bottlenecks and streamlining the process of process mining. This blog post will explore some of these developments.

Streamlining the process of getting data

While the data requirements for process mining are relatively simple and easy to explain, actually getting hold of the data, at the right level of granularity, can be a challenge.

And the need to involve expensive technical resources like database and reporting analysts to get this data can be a further challenge.

Process mining tool vendors are responding to these realities by adding functionality to enhance the ease with which process data can be accessed by end users.

Fluxicon Disco – Airlift 

One such example is Airlift, which aims to let Disco users retrieve process data while minimising the involvement of these expensive technical resources.

From Fluxicon Blog – Disco 1.7.0

Airlift does this by providing the following functionality:

  • A user-friendly interface that enables non-technical users to request process data from within Disco;
  • Translation of the user’s request, which is passed through an appropriate interface to back-end data sources; and
  • Delivery of the returned data directly into Disco for analysis.

While this approach likely still requires technical resources to configure the initial data extracts and the interfaces to Airlift, the advantage is that once these are configured, business users can independently retrieve and analyse updated data extracts on an ad hoc basis.

Celonis system integrations  

When we talk about extracting process logs, we often really mean hunting through the data structures of sometimes very complex applications to find events captured in the data that correspond to steps in the business process we want to analyse.

One of the reasons we require expensive technical resources with good application knowledge to do process mining is precisely to create and validate such mappings.
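To make this concrete, the sketch below shows what such a mapping might look like in code: raw status-change records from an application are translated into event-log rows. The table and field names (document IDs, status codes like `CRTD`) are hypothetical – real mappings depend entirely on the application’s schema.

```python
# Sketch: mapping raw application records to process mining events.
# Status codes and field names are hypothetical illustrations -- a real
# mapping must be created and validated against the application's schema.

STATUS_TO_ACTIVITY = {
    "CRTD": "Create purchase order",
    "APPR": "Approve purchase order",
    "GRCV": "Record goods receipt",
    "PAID": "Post payment",
}

def to_event_log(raw_rows):
    """Translate raw status-change rows into (case, activity, timestamp) events."""
    events = []
    for row in raw_rows:
        activity = STATUS_TO_ACTIVITY.get(row["status"])
        if activity is None:
            continue  # status codes with no process meaning are dropped
        events.append({
            "case_id": row["doc_id"],
            "activity": activity,
            "timestamp": row["changed_at"],
        })
    return sorted(events, key=lambda e: (e["case_id"], e["timestamp"]))

raw = [
    {"doc_id": "PO-1", "status": "CRTD", "changed_at": "2014-10-01T09:00"},
    {"doc_id": "PO-1", "status": "APPR", "changed_at": "2014-10-01T11:30"},
    {"doc_id": "PO-1", "status": "XXXX", "changed_at": "2014-10-01T12:00"},
]
log = to_event_log(raw)
```

Even in this toy form, the hard part is visible: someone with application knowledge has to decide which status codes are process steps and which are noise.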

The approach by Celonis Process Mining to streamlining access to process mining data is to create standardised mappings for a number of the most popular enterprise applications including Oracle, SAP and HP.

Celonis system integrations

 From Celonis Process Mining IT Systems

This may be attractive in particular for smaller organisations which lack the resources or capability to extract, validate and transform their application data to enable it to be process mined.

Streamlining the process of analysing data

But it’s not just the provision of data that we can look to streamline.

Apart from our initial analysis of a new data set, we often want to repeat the same analysis on an updated or similar version of a data set we’ve already analysed – for example, to check whether problems we’ve identified are persisting, or whether problems we thought we’d fixed are still occurring.

In this situation, it can be time consuming to perform the same data transformations and tool configurations each time we want to do this.

A number of tools are available which fully or partially automate process analysis. While they’re not all enterprise-ready, they provide some great insights as to how such functionality might be implemented and more generally how process mining tools might evolve in future.


RapidProM

RapidMiner is a very popular visual programming environment for data analysis. The following image shows how RapidMiner users can assemble analysis components into analysis workflows by ‘piping’ outputs between different components.

From RapidMiner

RapidProM was presented at BPM 2014 by Ronny Mans. It extends the standard RapidMiner tool by incorporating around 40 ProM plugins which users can assemble into process mining workflows.

The following is an example RapidProM workflow which reads a log file, mines a process model and then checks the conformance of the model against the log.

Rapidprom workflow example

From RapidProM User Guide

Once we’ve assembled such a workflow we can run it by hitting the play button. We can save it and run it later. And we can share it with other users.

And because RapidProM extends an existing data analysis tool, we can combine standard RapidMiner functions, including statistical and data analysis functions, with ProM functions to do more sophisticated forms of process analysis than are supported by current process mining tools.

The popularity of RapidMiner suggests there is strong demand for code-free interfaces for data analysis – which in turn suggests that a large proportion of process mining practitioners are potential RapidProM users.


PMLAB

While RapidMiner offers a compelling illustration of how process mining automation might be implemented, its visual programming paradigm is unlikely to appeal to everyone.

PMLAB was presented at BPM 2014 by Josep Carmona and takes a slightly different approach to process mining automation.

PMLAB implements a number of core process mining functions in the Python programming language, enabling users to author short Python scripts that chain these functions together into process mining workflows.

Shown below is a short script in which a very simple log is created and a BPMN model is mined from it.

PMLAB workflow example

From PMLAB User Guide
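To give a flavour of the scripting idea without reproducing PMLAB’s actual API, here is a generic Python sketch of the same pattern: each step is a plain function, and chaining the functions forms a minimal process mining workflow. The function names and the directly-follows ‘model’ are illustrative, not PMLAB’s.

```python
# Sketch of the script-based chaining idea (not PMLAB's actual API):
# create a small log, then mine a directly-follows model from it.

def create_log(traces):
    """A 'log' here is simply a list of traces (lists of activity names)."""
    return list(traces)

def mine_dfg(log):
    """Mine a directly-follows graph: the set of (a, b) pairs where
    activity b directly follows activity a in some trace."""
    dfg = set()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            dfg.add((a, b))
    return dfg

# Chaining the functions together is the whole 'workflow'.
log = create_log([["a", "b", "c"], ["a", "c"]])
model = mine_dfg(log)
```

The appeal of this style is that the workflow is an ordinary script: it can be versioned, parameterised and re-run on updated data with no manual reconfiguration.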

The code-driven analysis paradigm of PMLAB clearly distinguishes it from the point and click user interfaces of popular existing process mining tools like ProM and Disco. And it obviously limits the accessibility of PMLAB for a significant group of process mining practitioners – i.e. business users and non-technical analysts.

But for people from data mining, statistics and machine learning backgrounds who are already familiar with languages like Python and R, incorporating process mining tools into these languages may be preferable to learning a new tool.

And given Python is the fastest growing tool among the large and growing data science community, it’s arguable that the availability of robust process mining tools for Python would increase the uptake of process mining among this community.

Fluxicon Disco – Recipes 

While not offering full process mining automation, Disco includes functionality for reusable log filters which can accelerate process mining and hence are worth exploring in this context also.

Filters in Fluxicon Disco allow logs to be filtered using combinations of both process criteria (e.g. ‘include only cases where activity B follows activity A’) and attribute criteria (e.g. ‘include only cases from region C’).

Such filters are a very powerful tool for comparing different logs, and for comparing different views or aspects of a single log.

But such filters can take some time to configure and once we’ve configured and applied them to a particular data set we often want to share them, reuse them later or apply them to different data sets.
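The idea of a saved, reusable combination of filters can be sketched in code. The filter names below mirror Disco’s two kinds of criteria, but the implementation is purely illustrative – Disco itself exposes this through its user interface, not an API.

```python
# Sketch: reusable log filters as composable predicates over cases.
# A 'case' is a dict with an activity sequence and attributes; the filter
# semantics loosely mirror Disco's criteria but are illustrative only.

def follows(a, b):
    """Keep cases where activity b occurs after activity a."""
    def pred(case):
        acts = case["activities"]
        return a in acts and b in acts and acts.index(b) > acts.index(a)
    return pred

def attribute_equals(key, value):
    """Keep cases whose attribute matches the given value."""
    return lambda case: case["attributes"].get(key) == value

def apply_recipe(log, *filters):
    """A 'recipe' is just a saved combination of filters, reusable on any log."""
    return [case for case in log if all(f(case) for f in filters)]

log = [
    {"activities": ["A", "B"], "attributes": {"region": "C"}},
    {"activities": ["B", "A"], "attributes": {"region": "C"}},
    {"activities": ["A", "B"], "attributes": {"region": "D"}},
]
recipe = (follows("A", "B"), attribute_equals("region", "C"))
result = apply_recipe(log, *recipe)
```

Once the recipe is defined, applying it to next month’s log is a one-liner – which is exactly the time saving that makes reusable filters attractive.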

Recipes enable us to do just this. The following figure shows an example of a recipe within Disco. Note in particular the options to favourite and export the recipe.

Disco example recipe

Disco recipes are certainly a great tool for rapidly asking the same or similar questions of new data sets and hence for accelerating the data analysis process.

And because they can be accessed through Disco’s existing point and click user interface, of the three tools presented, they are clearly the most accessible option for accelerating process mining for existing ProM or Disco users.

Where to from here?

I hope the above highlights have demonstrated that streamlining the process of process mining is a very fast moving space with some very exciting contributions from researchers and vendors alike.

But I’ve only really scratched the surface – so I’d love to learn more about other tools or approaches beyond those discussed in this post.




BPM 2014 highlights – Hybrid Process Mining

The ‘spaghetti’ problem

One of the most perplexing outcomes for new process miners can be generating a ‘spaghetti model’!

spaghetti model

From Wil van der Aalst – Process Mining – Chapter 12 – Analyzing Spaghetti Processes

You can see above that such models take their name from the proliferation of events and flows, which makes the model practically unreadable.

While there are approaches to minimising the issue, including filtering event logs before or during analysis, existing process mining approaches have limitations such that while we can minimise the ‘spaghetti’ problem, we may not be able to eliminate it entirely.

“Empty the dishwasher, mop the floor and put out the garbage!”

One such limitation is that most approaches to managing processes to date (including process mining) typically emphasise the specification of a sequence of activities comprising a process and the order in which these activities MUST be completed.

Because not only the goal of the process but also how that goal is arrived at are prespecified, such approaches can be described as imperative.

basic workflow

More recently, a perceived misalignment has been identified between such approaches and the rise of knowledge work in which workers increasingly use judgement to decide for a particular case which activities need to be completed and in what order.

“Clean the kitchen. Just don’t touch the stove!”

A key tool in this emerging space is declarative process modelling which is contrasted by proponents with traditional imperative approaches.

declarative model

From Technische Universiteit Eindhoven (TU/e)

While the notation in the above diagram may be unfamiliar, you can see that:

  1. The activities aren’t simply connected into a single sequence. This means the health worker can choose to perform the medical subprocess and the financial subprocess in whichever order is most appropriate each time the process is performed.
  2. There are also many different types of connectors between the activities, and these specify a range of relationships beyond simple arrows indicating execution order. For example, the ‘not co-existence’ connector between ‘give tetanus vaccine’ and ‘prescribe antibiotics’ specifies that if the health worker performs one of these activities, they shouldn’t then perform the other.

Thus a declarative model specifies only constraints on how the process can be performed without attempting to specify exactly how the process should be performed.
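The constraint-checking idea is easy to make concrete. Below is a sketch of a Declare-style ‘not co-existence’ check applied to individual traces; the constraint semantics follow the description above, while the implementation is illustrative.

```python
# Sketch: checking a Declare-style 'not co-existence' constraint against
# traces. The constraint forbids both activities appearing in the same
# case, in either order.

def not_coexistence(a, b):
    """Return a checker that passes unless a trace contains both a and b."""
    return lambda trace: not (a in trace and b in trace)

check = not_coexistence("give tetanus vaccine", "prescribe antibiotics")

ok_trace = ["register patient", "give tetanus vaccine", "bill patient"]
bad_trace = ["register patient", "give tetanus vaccine", "prescribe antibiotics"]
```

Any trace that satisfies all declared constraints is a valid execution of the process – which is why a declarative model can stay small while permitting many different orderings.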

Because there are ordinarily relatively fewer constraints on how a process is performed than individual steps performed in a process this should result in less complicated models which are easier for everyone to understand, right?

And by combining this approach with process mining we should be able to generate models from process data which are not only simpler but which also reflect the actual process performed rather than just the process as designed, right?

OK, let’s try it

The following is an example of applying declarative process mining to real world data:

declarative model spaghetti

From Pedro H. Piccoli Richetti, Fernanda Araujo Baião, Flávia Maria Santoro – Declarative Process Mining: Reducing Discovered Models Complexity by Pre-Processing Event Logs

So what happened? For starters, the number of constraints inferred from the real world process data seems to increase rapidly with the number of steps in the process.

But perhaps real world processes have both declarative and imperative elements, and a purely declarative process mining approach may be a poor fit for constructing models of such processes.

So what’s new?

Work by Tijs Slaats, Hajo Reijers and Fabrizio Maggi presented at BPM 2014 involved the development of a hybrid process mining method, i.e. one combining both imperative and declarative approaches.

Firstly, the following is a process model generated from a real world process data set using a popular imperative process mining algorithm:

more spaghetti

From Tijs Slaats, Hajo Reijers and Fabrizio Maggi – The Automated Discovery of Hybrid Processes

The result is a familiar ‘spaghetti’ model.

However, the next diagram is a process model generated from the same data set using the hybrid process mining approach.

hybrid model

From Tijs Slaats, Hajo Reijers and Fabrizio Maggi – The Automated Discovery of Hybrid Processes

The hybrid approach generates a much simpler process model than the imperative process mining approach.

And it does this by neatly decomposing the process into a hierarchy of processes each of which can be either imperative (e.g. Root) or declarative (e.g. D2.1).
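A minimal way to picture the output is a tree of subprocesses, each tagged with its paradigm. The sketch below assumes such a structure; the subprocess names follow the paper’s example (`Root`, `D2.1`), while `I2.2` and the data layout are hypothetical.

```python
# Sketch: a hybrid model as a hierarchy of subprocesses, each tagged as
# imperative or declarative. 'Root' and 'D2.1' follow the paper's naming;
# 'I2.2' and the dict layout are hypothetical illustrations.

hybrid_model = {
    "name": "Root",
    "paradigm": "imperative",   # an ordered flow at the top level
    "children": [
        {"name": "D2.1", "paradigm": "declarative", "children": []},
        {"name": "I2.2", "paradigm": "imperative", "children": []},
    ],
}

def count_by_paradigm(node, paradigm):
    """Count subprocesses in the hierarchy that use the given paradigm."""
    n = 1 if node["paradigm"] == paradigm else 0
    return n + sum(count_by_paradigm(c, paradigm) for c in node["children"])
```

The point of the hierarchy is that each subprocess can be rendered in whichever notation suits it, so the structured parts stay as flowcharts while the flexible parts become small constraint models.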

hybrid model detail


From Tijs Slaats, Hajo Reijers and Fabrizio Maggi – The Automated Discovery of Hybrid Processes

What’s the potential business impact?

Given that business users find simpler process models much more comprehensible and engaging, if these initial results can be replicated more widely, hybrid process mining has the potential to significantly enhance the value of mined process models to business users.

So what’s next?   

It would be great to see further testing of this approach against a broad range of process logs to confirm whether the approach can be expected to generate much simpler models more generally.

Further, evaluating process mining algorithms typically involves checking them against four competing criteria: fitness, simplicity, precision and generalisation. Such an analysis would show whether the hybrid approach’s gains in simplicity come at the expense of the other evaluation criteria.
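As a rough intuition for one of these criteria, the sketch below computes a naive fitness-style measure: the fraction of directly-follows pairs observed in a log that the model allows. Real evaluations use more sophisticated measures such as token replay or alignments; this simplification is only illustrative.

```python
# Sketch: a naive fitness-style measure -- the fraction of directly-follows
# pairs observed in the log that the model allows. Illustrative only; real
# evaluations use token replay or alignment-based fitness.

def directly_follows(log):
    """Collect the (a, b) pairs where b directly follows a in some trace."""
    return {(a, b) for trace in log for a, b in zip(trace, trace[1:])}

def naive_fitness(log, model_pairs):
    """Share of observed directly-follows pairs that the model permits."""
    observed = directly_follows(log)
    if not observed:
        return 1.0
    return len(observed & model_pairs) / len(observed)

log = [["a", "b", "c"], ["a", "c"]]
model_pairs = {("a", "b"), ("b", "c")}  # this model does not allow a -> c
score = naive_fitness(log, model_pairs)
```

A model that is too simple will typically score poorly on a measure like this, which is exactly the trade-off such an evaluation would surface.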

Similarly, performance will be a significant determinant of the usefulness of this approach in the real world, so performance benchmarking against existing algorithms would also be good to see.



BPM 2014 highlights


BPM 2014 – one of the main events on the Business Process Management (BPM) calendar – was held last month in Eindhoven, Netherlands.

Though the conference attracts mostly academic attendees there was plenty of work presented which should flow through to exciting and powerful new tools for practitioners over the next few years.

The last few years have seen increasing interest in process mining within the BPM community and BPM 2014 was no exception with a number of process mining sessions and demos.

I’ll be posting here shortly about some of these highlights.
