Well, I cheated. The autocorrelation plot at the end of the last blog was particularly poor because I included the text header of the file and only processed the first 1000 words. It looked like this:
Simply by removing the text header (which is clearly not part of our repeating patterns) the plot improves to this:
I say “improves” because what we are looking for here is clear spikes rising above the background noise. The spike at around 130 is promising, and indicates that this is repetitive data, especially as there is one at about half that number. (Remember from the last blog that a spike appears where the data matches itself when displaced by that number of tines on the comb, so a spike here shows that the data repeats every 130 or so samples.)
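For anyone who wants to try this at home, here is a minimal Python sketch of the comb. For each displacement (lag), it multiplies every sample by the one that many places further on and adds up the products. It is an illustration under simple assumptions (the samples are already plain numbers, the function names are my own and the test signal is synthetic), not the code behind these plots.

```python
import random


def autocorrelation(samples, max_lag):
    """Correlation scores for lags 1..max_lag; scores[k - 1] belongs to lag k."""
    mean = sum(samples) / len(samples)
    centred = [s - mean for s in samples]  # remove the average so only the pattern counts
    scores = []
    for lag in range(1, max_lag + 1):
        # Slide the "comb" lag places along and add up the products of aligned samples.
        scores.append(sum(a * b for a, b in zip(centred, centred[lag:])))
    return scores


def strongest_lag(samples, max_lag):
    """Lag with the highest autocorrelation score."""
    scores = autocorrelation(samples, max_lag)
    return scores.index(max(scores)) + 1


# Synthetic example: a random 130-sample pattern repeated ten times.
rng = random.Random(1)
pattern = [rng.randint(0, 4095) for _ in range(130)]
signal = pattern * 10
print(strongest_lag(signal, 200))
```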
Look at the Data
Actually, the first thing we do whenever we get some data is to look at it in a hex editor. The heading text is easy to read:
I can see that the operator has not entered his customer name yet!
The data that follows the heading text is not so easy to read. It looks like this.
Looking at the data, you can see that the numbers line up, with the columns headed 1, 5, 9 and 13 all showing the value 03. This tells us that the data is stored in 32-bit records (which agrees with the ARINC 429 source we have been told about), so I took each four-byte record as a value and converted it before making the correlation plots. Because computers differ in the order in which they store the bytes of a multi-byte value, I reordered the bytes so that the data shown as 00 03 00 00 became 00 00 00 03 (look up little endian if you’re interested).
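The effect of byte order is easy to see with Python's built-in `int.from_bytes`. This sketch (the helper name is mine) splits a byte string into four-byte records and reads the 00 03 00 00 pattern from the hex editor both ways. Which order is correct for a given recorder still has to be established from the data, as the rest of this post does.

```python
def words_from_bytes(data, byteorder):
    """Split bytes into unsigned 32-bit records using byteorder 'big' or 'little'."""
    usable = len(data) - len(data) % 4  # ignore a trailing partial record
    return [int.from_bytes(data[i:i + 4], byteorder) for i in range(0, usable, 4)]


record = bytes([0x00, 0x03, 0x00, 0x00])  # the recurring pattern in the hex dump
print(hex(words_from_bytes(record, "big")[0]))     # 0x30000
print(hex(words_from_bytes(record, "little")[0]))  # 0x300
```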
We can plot the raw data values like this if we wish:
Although the plot is full of values, it is hard to tell what the repetition period is. One thing you can’t see from the graph at this scale is that the lowest values are all 3, not zero as one might expect. This is because I converted the data as recorded and have not yet separated the data part from the whole ARINC 429 word.
We need to delve into ARINC 429 data formats for a while. This data format is used to transfer digital data on an aircraft. Each value is held in a 19-bit space, then a label is added to identify the data and some housekeeping information is also included. The resulting package is broadcast to every receiver connected to the transmitter. The next piece of data is then transmitted, and so on.
Here is the overall word format, showing the arrangement of the 32 bits of data:
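In code, pulling the standard fields out of a word is just shifting and masking. The sketch below follows the standard ARINC 429 layout (label in bits 1 to 8, SDI in 9 and 10, data in 11 to 29, SSM in 30 and 31, parity in 32). It assumes bit 1 is the least significant bit of the stored integer. It deliberately ignores one real-world wrinkle: the label is transmitted most significant bit first, so decoders often bit-reverse it.

```python
def decode_arinc429(word):
    """Split a 32-bit ARINC 429 word into its standard fields (bit 1 = LSB)."""
    return {
        "label": word & 0xFF,            # bits 1-8 (not bit-reversed here)
        "sdi": (word >> 8) & 0x3,        # bits 9-10
        "data": (word >> 10) & 0x7FFFF,  # bits 11-29: the 19-bit data field
        "ssm": (word >> 29) & 0x3,       # bits 30-31
        "parity": (word >> 31) & 0x1,    # bit 32
    }


print(decode_arinc429(0x00000300))
# {'label': 0, 'sdi': 3, 'data': 0, 'ssm': 0, 'parity': 0}
```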
Compare this with the data we have, where the frequent pattern is 00 03 00 00 in hexadecimal, or in binary:
0000 0000 0000 0011 0000 0000 0000 0000
This puts binary 1s in bits 18 and 17. That doesn't make sense: these are data bits, and they would then be set in every piece of data.
Similarly, if we do not swap the bytes, we get 00 00 00 03, which puts the binary 1s in bits 2 and 1 and makes every label the same. This is also strange.
These 1s most probably lie in either the Sign/Status Matrix (SSM) or the Source/Destination Identifier (SDI) bits. As they are aligned to a byte boundary (i.e. 03, not 60, which is how the SSM bits 30 and 31 would appear), they must be in the SDI bits.
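The bit-counting in this argument can be checked mechanically. This small helper (my own, for illustration) lists which of the standard's bit numbers, 1 to 32 with bit 1 least significant, are set in each candidate reading. The 0x60000000 line shows what a byte-aligned SSM pattern would look like for comparison.

```python
def set_bits(word):
    """1-indexed bit numbers (bit 1 = least significant) that are set in word."""
    return [n + 1 for n in range(32) if (word >> n) & 1]


candidates = {
    "0x00030000 (1s in the data field)": 0x00030000,
    "0x00000003 (1s in the label)": 0x00000003,
    "0x00000300 (1s in the SDI)": 0x00000300,
    "0x60000000 (1s in the SSM)": 0x60000000,
}
for name, word in candidates.items():
    print(name, set_bits(word))
```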
If the data is for an ARINC 717 FDR, there will be synchronisation words, including the ever-memorable hex value 247. A quick search turns one up, looking like this:
This confirms our assumption that the bytes do not need swapping and that the 03 lies in the SDI. Broken into its fields, this record is:
This puts our old friend 03 in the SDI, as we expected. The SSM is always zero, and the twelve data bits of ARINC 717 are held in bits 28-17.
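Written as code, the layout we have found for this recorder becomes a three-line decoder. It is a sketch that assumes each record has already been read into a 32-bit integer in the right byte order, with bit 1 as the least significant bit. The function name is illustrative.

```python
def decode_fdr_word(word):
    """Return (value, sdi, ssm) using this recorder's layout (bit 1 = LSB)."""
    value = (word >> 16) & 0xFFF  # bits 17-28: the 12 ARINC 717 data bits, 0..4095
    sdi = (word >> 8) & 0x3       # bits 9-10: 3 for normal words in this file
    ssm = (word >> 29) & 0x3      # bits 30-31: always zero in this file
    return value, sdi, ssm


# A synchronisation word (hex 247) with SDI = 3, built by hand for illustration.
sync_record = (0x247 << 16) | (3 << 8)
print(hex(sync_record), decode_fdr_word(sync_record))  # 0x2470300 (583, 3, 0)
```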
Diverging from the ARINC 429 standard, the data words carry no label and the parity bit is not used. Even more unexpectedly, the data is one byte out from the normal byte locations, thanks to an odd (in both senses of the word) 1,183-byte file header.
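Because 1,183 is not a multiple of four (1,183 = 4 × 295 + 3), the records only fall into place if we skip exactly the header bytes before splitting the rest into four-byte words. Here is a minimal sketch, assuming the header length found for this file. The byte order is left as a parameter to be chosen from the evidence above, and the names are mine.

```python
HEADER_BYTES = 1183  # odd-sized text header found in this particular file


def read_records(raw, byteorder, header_bytes=HEADER_BYTES):
    """Skip the file header, then split the body into unsigned 32-bit records."""
    body = raw[header_bytes:]
    usable = len(body) - len(body) % 4  # drop any trailing partial record
    return [int.from_bytes(body[i:i + 4], byteorder) for i in range(0, usable, 4)]


# Synthetic file: a 1,183-byte text header followed by two records.
fake_file = b"H" * HEADER_BYTES + bytes([0x00, 0x03, 0x00, 0x00]) * 2
print([hex(w) for w in read_records(fake_file, "little")])  # ['0x300', '0x300']
```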
Extracting The FDR Data
Now we know where to find the FDR data bits, we can convert these into number values and see what we’ve got.
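Here is a sketch of that conversion, plus a search for the hex 247 synchronisation word. It assumes the records are already 32-bit integers and uses the bit 17 to 28 layout found above. Bear in mind that 0x247 can also turn up by chance in ordinary data, so a single match is only a hint.

```python
SYNC_WORD = 0x247  # the ARINC 717 sync pattern searched for earlier


def fdr_values(records):
    """Extract the 12 data bits (bits 17-28, bit 1 = LSB) from each record."""
    return [(r >> 16) & 0xFFF for r in records]


def sync_positions(values, sync=SYNC_WORD):
    """Indices of every sample equal to the sync pattern."""
    return [i for i, v in enumerate(values) if v == sync]


records = [(0x247 << 16) | 0x300, (0x123 << 16) | 0x300, (0x247 << 16) | 0x300]
values = fdr_values(records)
print(values, sync_positions(values))  # [583, 291, 583] [0, 2]
```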
These values are correctly between 0 and 4095 (the full range of 12-bit binary data) and show some periodicity. Let’s do the autocorrelation step to check it’s what we expect. By the way, I used 1024 lines to try to make the pattern clear and I set up my autocorrelation plot as red bars with black outline. With many bars, the red gets lost and all you can see is the black outline!
First, you can see repetitive spikes with the slope we expect (remember the 4, 3, 2, 1 pattern from the last blog?). The documentation says that the data rate is 512 words per second, and we can see one spike at around 500 that stands slightly above the sloping trend of the other spikes. To go back to the comb analogy, data sampled once per second will be on each tine, and we will have to move the comb 512 increments along the time scale before the tines line up again. Therefore we will see a good match at 512.
The complication is that we also have data at twice that sample rate, which will match at 256 steps, and data at four samples per second, which will match at 128 steps. Think of a comb with several sets of tines, with shorter ones in the gaps. Here I can introduce you to the SalonChic 8″ Deluxe Triple Teasing Comb, which beautifully illustrates this.
Imagine sliding two of these over each other: the red tines will line up frequently, and the black ones less often. Hence spikes at different numbers of steps. Because our data rates are all multiples of one another, the lines are all harmonics (this is not the case when monitoring helicopter gearboxes, but I digress…).
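The arithmetic behind these spike positions is simply the word rate divided by the sample rate. Here it is as a tiny Python sketch using the 512 words per second from the documentation. The list of rates is just an example, and the function name is mine.

```python
WORDS_PER_SECOND = 512  # data rate quoted in the documentation


def expected_lag(samples_per_second, words_per_second=WORDS_PER_SECOND):
    """Number of words the comb must move before a parameter's samples realign."""
    if words_per_second % samples_per_second:
        raise ValueError("sample rate must divide the word rate exactly")
    return words_per_second // samples_per_second


for rate in (1, 2, 4, 8):
    print(f"{rate} per second -> spike at a lag of {expected_lag(rate)} words")
```

Because every one of these rates divides 512 exactly, every parameter also realigns at a lag of 512, which is why that spike stands out.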
Let me zoom into the section around the 500th line.
The spike is actually at line 516. That is, when we move our data 516 samples along, it matches the original pattern very well. The problem is that this is not the expected value of 512. These things are not approximate: the data is repeating every 516 samples, which is most strange.
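Reading the exact position off a zoomed plot works, but you can get the same answer by computing the correlation over just a window of lags and taking the largest. The sketch below does this on a synthetic signal that repeats every 516 samples. The signal, the 480 to 540 window and the function name are illustrative choices, not the recorder data.

```python
import random


def peak_lag(samples, low, high):
    """Lag between low and high (inclusive) with the largest autocorrelation."""
    mean = sum(samples) / len(samples)
    centred = [s - mean for s in samples]

    def score(lag):
        # Sum of products of each sample with the one `lag` places later.
        return sum(a * b for a, b in zip(centred, centred[lag:]))

    return max(range(low, high + 1), key=score)


rng = random.Random(7)
pattern = [rng.randint(0, 4095) for _ in range(516)]
signal = pattern * 6  # repeats every 516 samples, like the raw FDR data
print(peak_lag(signal, 480, 540))
```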
Back to the Data
We must have more data than we should, so the hunt was on for this rogue data. To separate it out, we need something in the ARINC 429 word that identifies the rogue samples. Tests were carried out on the SSM bits, the SDI bits and the data labels, any one of which could signify a change in the data content.
The test of the SDI bits showed that nearly all the samples had SDI = 3 but some had SDI = 0 (a few 1 and 2 values were also present). These ‘rogue’ words arise every 127th word, are not mentioned in the documentation for this data and have strange “data” values associated with them. As an observation, a period of 127 words is most peculiar. If a process outputs an extra word in each frame or subframe, you would see a period of 2^n + 1, such as 129 or 513. Equally, if the data arose from an asynchronous process, the interval would occasionally vary, but all the data we tested had a fixed interval of 127 words.
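The SDI test itself is only a few lines. This sketch assumes the SDI of each word has already been extracted (for example from bits 9 and 10) and that 3 is the normal value, as it is in this file. It reports where the odd ones are and the gaps between them, so a fixed interval such as 127 stands out at once. The synthetic input is built to mimic that pattern, and the function name is mine.

```python
def rogue_intervals(sdis, expected_sdi=3):
    """Positions of words whose SDI is not expected_sdi, and the gaps between them."""
    positions = [i for i, s in enumerate(sdis) if s != expected_sdi]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return positions, gaps


# Synthetic stream: 126 normal words then one rogue word, repeated five times.
sdis = ([3] * 126 + [0]) * 5
positions, gaps = rogue_intervals(sdis)
print(positions)          # [126, 253, 380, 507, 634]
print(sorted(set(gaps)))  # [127]
```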
Without any more clues, all we can do is throw them away. Here are the same two plots with the rogue data removed.
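Throwing them away amounts to a filter on the SDI bits. Here is a minimal sketch, assuming 32-bit records with the SDI in bits 9 and 10 and 3 as the value to keep. Any word carrying SDI 0, 1 or 2 is dropped. The function name is illustrative.

```python
def remove_rogue(records, keep_sdi=3):
    """Keep only records whose SDI (bits 9-10, bit 1 = LSB) equals keep_sdi."""
    return [r for r in records if ((r >> 8) & 0x3) == keep_sdi]


normal = (0x123 << 16) | (3 << 8)  # data 0x123, SDI 3
rogue = 0x7FF << 16                # arbitrary data, SDI 0
stream = [normal, normal, rogue, normal]
print(len(remove_rogue(stream)))   # 3
```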
The peak line is now at 512, exactly as we expected. Removing one sample every 127 lines has moved this peak four lines to the left.
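The size of that shift is consistent with the rogue-word rate. If exactly one word in every 127 is rogue, only 126 in every 127 are genuine, so 512 genuine words are spread over about

$$512 \times \frac{127}{126} = \frac{65024}{126} \approx 516.06$$

raw words. That is 512 genuine words plus four rogue ones, which is where the original peak sat.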
The autocorrelation is cleaner than before, with very clear lines at multiples of 64 lines. This is because 64 words is the sampling interval of normal acceleration, which is recorded at 8 Hz (512/8 = 64).
Before and After
These steps have taken the data from a muddle at the end of the last blog, where the peaks could hardly be seen, to clearly identified spikes at the right intervals, ready for conversion to engineering units.
I hope you see why combs are an essential tool in engineering, even for those of us with little hair!