A Sample of America
Dave Jesse on June 10, 2016

After a few simple blog posts, here is a slightly trickier one to get your head around. We’ll need to do some background plodding before we get to the meat of the matter. I think it’s worth the effort because, as with all the best stories, there is a surprise at the end.

Statistical Sampling

We need to start with some statistical theory, and although this can be done with elegant mathematics, I think it’s best done with chocolate bars. I want you to imagine you are making chocolate bars and it’s your job to make sure that every chocolate bar is the same length and weight.

OK, you start in an artisan craft kitchen and can measure every bar. As you are a skilled chef they are very similar in length and weight, but inevitably, as these are hand crafted, there is a little variation which you put down to the nature of a hand-made product.


With the success of your business, you invest in machinery and soon you are making thousands of these a day. The machine makes them very consistently, but they still have some variation and you need to make sure that they all meet or exceed the weight shown on the packaging. Now, as the variation is very predictable you find that if you take a sample of 20 bars, the distribution of weights in your sample is very similar to the distribution of weights of all the bars made that day.

So, each day you sample 20 bars and track the distribution of weights of the sample. This gives you an estimate of the weights of all the bars made each day and you use this to make sure that the machine produced satisfactory chocolate bars for that day.
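For those who prefer code to chocolate, here is a minimal sketch of that daily check. The target weight, the spread and the number of bars are all invented for illustration, and I have assumed the weights scatter normally around the target, which a real machine may not do.

```python
import random
import statistics

def simulate_day(n_bars=5000, target_g=50.0, spread_g=0.5, sample_size=20, seed=1):
    """Simulate one day's production and a random sample drawn from it."""
    rng = random.Random(seed)
    # Hypothetical machine: weights scatter normally around the target.
    production = [rng.gauss(target_g, spread_g) for _ in range(n_bars)]
    # Pick bars at random from the whole day's output.
    sample = rng.sample(production, sample_size)
    return production, sample

production, sample = simulate_day()
print(f"all bars : mean={statistics.mean(production):.2f} g, "
      f"sd={statistics.stdev(production):.2f} g")
print(f"20 sample: mean={statistics.mean(sample):.2f} g, "
      f"sd={statistics.stdev(sample):.2f} g")
```

The two lines of output should be close to each other, which is the whole point: 20 bars chosen at random tell you a lot about 5,000.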

There are some issues with sampling that we take for granted. Firstly, that the machine makes bars of varying weights at random (we’ll skip the meaning of the word random, otherwise this blog will never end). Secondly, we assume that we can take a sample of the bars which has the same distribution as the entire production. This is not as easy as it may seem. For example, should you take 20 samples all at the same time, or one every ten minutes, or right at the end of the day? And if the bars are produced in rows on a production line, should they all be from the edge of the machine or from the middle or randomly spaced?

Then there are the non-obvious issues. If the machine can occasionally make a bar with no chocolate coating at all, will any get caught by a sampling technique? If the sample size is small, it will probably not catch any of the naked bars. Then there is the question of different machines. If I sample bars only from one machine, can I say anything at all about the output of a second machine? Especially if one is making dark chocolate bars and the other milk chocolate, will sampling one be representative of the output of the factory as a whole?
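To put a rough number on the naked bar question, here is a small sketch. It assumes each bar is naked independently with the same probability, and the rate of one in a thousand is made up purely for illustration. The function name is my own.

```python
def chance_of_catching(defect_rate, sample_size):
    """Probability that a random sample contains at least one defective bar."""
    # Chance that every sampled bar is fine, subtracted from 1.
    return 1.0 - (1.0 - defect_rate) ** sample_size

# Hypothetical rate: one naked bar per 1,000 produced.
for n in (20, 200, 2000):
    print(f"sample of {n:>4}: {chance_of_catching(0.001, n):.1%} chance of seeing a naked bar")
```

With these invented numbers a sample of 20 catches a naked bar only about one day in fifty, and you need a sample in the thousands before you can expect to see one on most days.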

Enough of chocolate bars – you get the idea that sampling for production quality assurance measurements requires some care to make sure the results are meaningful.

What’s Special about America?

Let’s get back to Flight Data Monitoring, or as it is called in America, Flight Operations Quality Assurance.

So once upon a time in an office in Washington, a gentleman was asked the question: “If Flight Operations Quality Assurance is a Quality Assurance process, what sample size do I need to provide a representative sample?” The answer he gave was that 10% of flights was sufficient.

The logic here is that, because all flights are the same, by monitoring one flight in ten an operator can obtain a representative measure of all the flights.

Now as all operators are trying to reduce costs, there is great pressure to adopt the 10% sampling approach and reduce the cost of FOQA significantly, so some operators take this approach. Let’s see what the consequences are.

All Flights Are The Same

For all flights to be the same, all pilots would have to fly the same way. This may be true on a dull day, but we remember Capt Chesley Burnett “Sully” Sullenberger III of “Miracle on the Hudson” fame as an outstandingly professional pilot because he stood out from the crowd.

The Naked Bar Problem

Let’s say that an exceptional event occurs. I’ll let you use your imagination about what this could be. Now, if only 10% of flights are monitored, the pilot has a 90% chance of this not being spotted. Frankly, those are pretty good odds for the pilot, and they mean the chance of spotting any one exceptional event is very low. This is the main reason why sampling is not acceptable outside America.

The Two Machine Problem

If you have a mixed fleet, it is important to make sure that the sample is representative of each aircraft type, and perhaps also of each route you fly. This sampling issue is the equivalent of trying to sample output from milk and dark chocolate bar lines at the same time.
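One common way to deal with this is to sample each group separately, usually called stratified sampling. The sketch below is only an illustration under my own assumptions: the fleet sizes, the type labels and the function name are invented, and a real programme might want to group by route as well.

```python
import random
from collections import defaultdict

def stratified_sample(flights, fraction, key, seed=0):
    """Sample the same fraction from each group (stratum) of flights."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for flight in flights:
        groups[key(flight)].append(flight)
    chosen = []
    for members in groups.values():
        # Take at least one flight per group, so a small fleet is never left out.
        k = max(1, round(len(members) * fraction))
        chosen.extend(rng.sample(members, k))
    return chosen

# Hypothetical mixed fleet: 900 flights on one type, 100 on another.
flights = [{"id": i, "type": "jet" if i < 900 else "turboprop"} for i in range(1000)]
sample = stratified_sample(flights, 0.10, key=lambda f: f["type"])
print(sum(f["type"] == "jet" for f in sample), "jet flights sampled")
print(sum(f["type"] == "turboprop" for f in sample), "turboprop flights sampled")
```

A plain random 10% of all 1,000 flights could easily end up with very few turboprop flights by chance. Sampling within each type guarantees each one its share.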

What’s Really Special About America?

Imagine a minor incident occurs and the crew concerned inform the safety department, for example by submitting an Aviation Safety Action Program (ASAP) report. The safety department are keen to understand what happened and get hold of the data for this flight and process it along with the normal 10% sample flights.

To return to the chocolate metaphor, we now have a sampling process which is selectively adding abnormal bars. The random sampling will catch the correct proportion of abnormalities, but by adding bias into the sample, the sample is no longer representative of the population as a whole and we can no longer use this to derive representative statistics.

In the same way, if our sampled flights are no longer representative of the flights as a whole, the statistics of event rates, parameter distributions etc. are not representative either. Furthermore, if there are no records of the degree of distortion of the sample, it is not possible to try to compensate for this effect.
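The size of this effect is easy to underestimate, so here is a rough simulation. Every number in it is invented: I assume 5% of flights contain an event, a random 10% of flights are sampled, and crews report one in five of the eventful flights, which are then added to the sample. None of these figures come from a real operator.

```python
import random

def event_rates(n_flights=10000, true_rate=0.05, sample_fraction=0.10,
                report_rate=0.20, seed=42):
    """Return (true, random-sample, biased-sample) event rates."""
    rng = random.Random(seed)
    # True means the flight contained an event. All rates here are invented.
    flights = [rng.random() < true_rate for _ in range(n_flights)]
    sampled = set(rng.sample(range(n_flights), int(n_flights * sample_fraction)))
    # Crews report some eventful flights, and those get added to the sample.
    reported = {i for i, eventful in enumerate(flights)
                if eventful and rng.random() < report_rate}
    biased = sampled | reported

    def rate(indices):
        return sum(flights[i] for i in indices) / len(indices)

    return rate(range(n_flights)), rate(sampled), rate(biased)

true_rate, sampled_rate, biased_rate = event_rates()
print(f"all flights:            {true_rate:.1%}")
print(f"random 10% sample:      {sampled_rate:.1%}")
print(f"sample + reported ones: {biased_rate:.1%}")
```

With these made-up inputs the random sample lands close to the true rate, while the sample with reported flights added shows a rate well over double. Unless someone records which flights were added and why, there is no way to work back from the inflated figure to the real one.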

So What?

The conclusion is that FOQA statistics for American carriers who carry out flight sampling can show increased event rates due to selective addition of eventful flights. This may be controllable for those within the operation who understand the extent to which their sample has been distorted, but it undermines the value of such data for de-identified benchmarking with other operators and across continents.