
## come_to_think

### The prevalence of sawteeth

A while ago I was taking my blood pressure every morning, with an apparatus that eventually proved so unreliable as to be useless. Tabulating the results, I noticed a prevalence of zigzags: if the reading was higher yesterday than the day before, that seemed to make it less likely that today's reading would be higher than yesterday's. That seemed plausible, in that an increase would make it less likely that the latest reading was below the mean.

Inquiring about this on alt.sci.math.probability, I was rewarded with the following argument. Suppose that the readings are independently and identically distributed (probably a good approximation, though of course one can imagine events that would affect the blood pressure for periods longer than a day) and that ties are not allowed (not strictly true, but the precision of the readings was such that ties never actually occurred). Then, for any successive triple of readings, all orderings are equally probable -- say 123, 132, 213, 231, 312, 321, where the numerals are mere ordinals representing relative magnitude. Of these, three (123, 132, 231) present an initial increase, of which only one (123) presents yet another increase. Thus the odds are 2 to 1 against a further increase; and likewise, after a decrease, the odds are 2 to 1 against a further decrease. Strings of sawteeth are therefore fairly probable.
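The 2-to-1 claim is easy to check by simulation. A quick Monte Carlo sketch (the function name and the choice of normal readings are mine; any continuous distribution would do, which is exactly the point):

```python
import random

def p_increase_after_increase(trials=100_000, seed=0):
    """Estimate P(reading 3 > reading 2 | reading 2 > reading 1)
    for i.i.d. draws -- normal here, but any continuous law works."""
    rng = random.Random(seed)
    further = total = 0
    for _ in range(trials):
        a, b, c = (rng.gauss(0, 1) for _ in range(3))
        if b > a:               # an initial increase...
            total += 1
            further += c > b    # ...followed by a further increase?
    return further / total

print(round(p_increase_after_increase(), 3))  # close to 1/3: odds 2 to 1 against
```

Swapping `rng.gauss(0, 1)` for `rng.random()` or any other continuous generator leaves the answer near 1/3, illustrating the distribution-independence discussed below.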

This can be generalized. Suppose one has experienced a sequence of n increases; what are the odds against the next reading being higher still? Among the n+2 numbers then in hand, there are (n+2)! equiprobable permutations, but of these, only n+2 begin with an increasing sequence of n+1 numbers -- one for each choice of which number is placed last, since the remaining n+1 then have a unique increasing arrangement. Of those n+2, only one has the (n+2)nd number larger than all the rest. So the odds against a further increase are n+1 to 1: each increase makes a further increase modestly less probable.
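The counting argument can be verified exactly by brute force, enumerating every ordering of the n+2 readings (a sketch; the function name is mine):

```python
from fractions import Fraction
from itertools import permutations

def p_next_higher(n):
    """Exact P(next reading higher | n consecutive increases), by listing
    every ordering of the n+2 readings involved (ties assumed impossible)."""
    qualifying = higher = 0
    for perm in permutations(range(n + 2)):
        if all(perm[i] < perm[i + 1] for i in range(n)):  # first n+1 entries increasing
            qualifying += 1
            higher += perm[n + 1] > perm[n]
    return Fraction(higher, qualifying)

for n in range(1, 5):
    print(n, p_next_higher(n))  # 1/(n+2), i.e. odds of n+1 to 1 against
```

For n = 1 this recovers the 1/3 of the triple argument; the enumeration confirms that exactly n+2 of the (n+2)! permutations qualify and exactly one of those continues upward.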

It surprises me that this result is independent of the distribution. In a hasty effort, I did not succeed in finding it in Feller or on the Web. Tests that are independent of the data distribution are known as "non-parametric statistics", and while they are less informative than the parametric variety, they are incredibly useful for their generality.

I just performed a statistical test looking for "batch effects" in sets of experiments that consisted of three case/control pairs performed on three different days. It used exactly this kind of ordering argument, asking whether the order of the three cases was independent of the order of the three controls. It was not, which revealed the day on which the experiment was performed as a relevant variable.
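A minimal sketch of such a test, with invented numbers standing in for the real measurements (the helper names and data are hypothetical, not the actual analysis): rank the cases and the controls by day, count how many days agree in rank, and get an exact p-value by enumerating all orderings of the controls, which are equally likely under the null of independence.

```python
from itertools import permutations

def rank(xs):
    """Ranks of xs, 0 = smallest (ties assumed absent)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = pos
    return r

def exact_p_same_order(cases, controls):
    """Exact one-sided p-value for agreement between the day-orderings of
    cases and controls, under the null that all control orderings are
    equally likely."""
    rc, rk = rank(cases), rank(controls)
    stat = sum(a == b for a, b in zip(rc, rk))      # observed agreement
    perms = list(permutations(rk))
    at_least = sum(sum(a == b for a, b in zip(rc, p)) >= stat for p in perms)
    return at_least / len(perms)

cases    = [1.2, 2.9, 2.1]   # hypothetical case readings, days 1-3
controls = [0.8, 2.5, 1.7]   # hypothetical control readings, days 1-3
print(exact_p_same_order(cases, controls))  # 1/6, perfect agreement of orderings
```

With only three pairs the smallest achievable one-sided p-value is 1/6, so in practice one would combine evidence across several such experiment sets.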

If you like this stuff, download the R language and play with its tutorials. That will give you permanent nerd credentials. I will also mention that the idea that all orderings are equally probable is only valid if the time between measurements is long enough. For example, if you measure the height of the tide at 10-minute intervals, you will almost ALWAYS get the ordering 1,2,3 or 3,2,1, since the measurement frequency is way higher than the highest frequency components of the tide itself. Similarly, if you measure exactly once a week, you'll get the same effect due to "beating" between the tidal frequencies and the measurement frequency, even though the measurement frequency is relatively low.
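The oversampling effect is easy to demonstrate with a toy tide: sample a pure sine (a crude stand-in for the 12.42-hour principal lunar constituent; the function name and parameters are mine) every 10 minutes and count how many successive triples come out monotone.

```python
import math

def fraction_monotone_triples(period_hours, step_minutes, n=1000):
    """Fraction of successive triples that are strictly monotone when a
    pure sine of the given period is sampled at the given interval."""
    step = step_minutes / 60.0
    xs = [math.sin(2 * math.pi * t * step / period_hours) for t in range(n)]
    mono = sum(1 for a, b, c in zip(xs, xs[1:], xs[2:])
               if a < b < c or a > b > c)
    return mono / (n - 2)

# ~74 samples per period: nearly every triple is 1,2,3 or 3,2,1, so the
# "all orderings equally likely" assumption fails badly at this rate
print(round(fraction_monotone_triples(12.42, 10), 3))
```

Under the i.i.d. assumption only 2 of the 6 orderings are monotone, so a fraction far above 1/3 is direct evidence that successive samples are strongly dependent.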

There is probably some theorem that tells you how frequently you need to (randomly?) sample for all orderings to be equally probable, but I don't know what it is. The case of measurements made by a flaky instrument a day apart probably qualifies. I'll look it up, but I'm afraid I'm at an age when one loses rather than gains nerd credentials. 