Ever wondered what the link was between flight safety events and shopping baskets? No, me neither. But nonetheless, there is one. Let’s talk about it.
One of our main jobs at Flight Data Services (as a flight safety company) is to generate events and alerts when the aircraft that we monitor exceed certain tolerances. For example, a customer might want us to monitor if their pilots regularly fly outside of their allocated altitude range (i.e. a level bust) — which is largely a safety issue, as this is a risk factor for mid-air collisions. Similarly, an airline might want to know if an engine runs too hot for too long (as this will increase maintenance costs and potentially cause safety issues).
So, what’s the point of this post? Well, any safety organisation would like to flag up as many potential safety problems as possible, to avoid the risk of overlooking something. It’s common for operators to set event thresholds low and generate as many events as possible, in the hope of catching all genuine safety events.
The problem is that setting these thresholds at such low values causes a considerable number of false positives (i.e. invalid events) — and for each event that gets generated, a trained human (i.e. an analyst) has to manually inspect the data and make sure that the event was genuine. This often ends up not being the case, as there are various causes of unusual patterns of data — and thankfully, very few of them are actually due to pilot error or malfunctioning aircraft.
In actual fact, the events that we see are more likely to be invalid — potentially caused by faulty sensors, random interference in the electrical systems in the aircraft, the submission of an incorrect flight record, or any one of a hundred other things. This makes false positives one of the main costs of the analysis department, as events are expensive to validate — and this also reduces the time available to each analyst for investigating real events.
In our database, we record quite a few important pieces of metadata about each event: things like the takeoff and landing airports, the airline operator, the type of aircraft, the phase of flight that the event occurred in and so on. It turns out that this metadata is useful to the data scientists here. We can apply data mining techniques to analyse frequent patterns in the data and then potentially pinpoint the source of invalid events.
So what does all of this have to do with shopping baskets?
Beer and nappies
Let’s talk about something called Market Basket Analysis. This refers to a set of techniques, developed during the 1990s and early 2000s, that originated in several forward-thinking supermarkets and were designed to find items that are frequently purchased together during a single shopping trip. The information gained from these techniques was then used to inform store layouts and so on.
This (possibly apocryphal) story will hopefully give a fun bit of background about the rationale behind the technique. Essentially, a consultant at a data mining firm in the 1990s was conducting some affinity analysis on a database of several million transactions, and discovered that between 17:00 and 19:00 on Friday evenings, there would be a spike in transactions containing both beer and nappies. Somewhat amusingly, it turned out that it was young fathers coming home from a week at work, buying some beers for the weekend, and topping up their little one’s nappy stash. According to some sources, this did genuinely happen. Whatever the reality, it’s a fun story.
How it’s useful
There are various algorithms (A-Priori, Eclat, FP-Growth, etc.), but they all aim to achieve roughly the same thing: given a large number of lists containing various items (in this case, shopping baskets), find the most “frequent itemsets”. An itemset is just what it sounds like: a set of one or more items that appear together in a basket. The point is to discover things that are frequently bought together (and sets of things that are related).
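To make the idea of a frequent itemset concrete, here is a minimal sketch in Python. It is a brute-force count rather than A-Priori, Eclat or FP-Growth (those reach the same answer far more efficiently on large datasets), and the function name, the thresholds and the toy baskets are all invented for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(baskets, min_support=0.5, max_size=3):
    """Return {itemset: support} for every itemset of up to max_size items
    that appears in at least min_support (a fraction) of the baskets.

    Brute force: every combination in every basket is counted, which is
    only practical for small examples like this one.
    """
    counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicate items within a basket
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    total = len(baskets)
    return {
        itemset: count / total
        for itemset, count in counts.items()
        if count / total >= min_support
    }


# Toy data, made up for this example.
baskets = [
    ["beer", "nappies", "crisps"],
    ["beer", "nappies"],
    ["milk", "bread"],
    ["beer", "nappies", "milk"],
]

result = frequent_itemsets(baskets, min_support=0.5)
for itemset, support in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

In this toy data, beer and nappies appear together in three of the four baskets, so the pair survives the 50% support threshold while combinations such as milk and bread do not.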
Interestingly enough, we can apply these same techniques to event analysis to discover frequent patterns in sets of invalid events. That is, we can build a classification model based on rules generated by a market basket analysis algorithm, and then look at the rules it generates to get an idea of what combinations of features are overrepresented in invalid events. This has been fairly useful in uncovering patterns that we were previously unaware of. Using these techniques, we’ve been able to pinpoint various interesting nuggets of information: for example, that some aircraft were being processed using the wrong data frame, that some others were misreporting certain parameters, that some were being sent with incorrect flight records, and so on.
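As a rough illustration of what “overrepresented in invalid events” can mean, the sketch below treats each event’s metadata as a basket of (field, value) pairs and ranks combinations by lift, i.e. how much more often events with that combination are invalid compared with events overall. This is a simplified stand-in and not the classification model described above; the field names, the function name and the toy events are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def overrepresented_patterns(events, min_count=2, max_size=2):
    """Rank metadata combinations by lift towards the 'invalid' label.

    Each returned row is (combination, count, confidence, lift), where
    confidence is the share of events with that combination that were
    invalid, and lift is confidence divided by the overall invalid rate.
    """
    total = len(events)
    base_rate = sum(1 for e in events if e["invalid"]) / total
    if base_rate == 0:
        return []  # no invalid events, so nothing can be overrepresented

    seen = Counter()
    seen_invalid = Counter()
    for event in events:
        # The event's "basket": its metadata as sorted (field, value) pairs.
        features = sorted((k, v) for k, v in event.items() if k != "invalid")
        for size in range(1, max_size + 1):
            for combo in combinations(features, size):
                seen[combo] += 1
                if event["invalid"]:
                    seen_invalid[combo] += 1

    rows = []
    for combo, count in seen.items():
        if count < min_count:
            continue  # too rare to say anything useful
        confidence = seen_invalid[combo] / count
        rows.append((combo, count, confidence, confidence / base_rate))
    # Highest lift first; ties broken by how many events match.
    return sorted(rows, key=lambda row: (-row[3], -row[1]))


# Hypothetical events, invented for this example.
events = [
    {"aircraft_type": "A", "phase": "climb", "invalid": True},
    {"aircraft_type": "A", "phase": "cruise", "invalid": True},
    {"aircraft_type": "A", "phase": "climb", "invalid": True},
    {"aircraft_type": "B", "phase": "climb", "invalid": False},
    {"aircraft_type": "B", "phase": "cruise", "invalid": False},
    {"aircraft_type": "B", "phase": "climb", "invalid": True},
]

rows = overrepresented_patterns(events)
for combo, count, confidence, lift in rows:
    print(combo, count, round(confidence, 2), round(lift, 2))
```

A lift well above 1 for a given combination (here, every event from the made-up aircraft type “A” is invalid) is the kind of hint that would prompt a closer look at, say, the data frame used for that aircraft.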
We ended up developing a set of algorithms to pinpoint these data problems. These have since been built into a tool that other employees at Flight Data Services can use to explore events that they’ve invalidated during the last few months.
So that’s the link between shopping baskets and safety events — some snazzy data mining algorithms. In my next post, I’ll cover these concepts in a little bit more detail.