Sunday, February 28, 2021

I saw this gem at www.askpython.com:
Someone asked me if we can have a lambda function without any argument?
Yes, we can define a lambda function without any argument. But, it will be useless because there will be nothing to operate on.
Using lambda function without any argument is plain abuse of this feature.

Obviously the author has never heard of a thunk.
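
For the record, a thunk is a zero-argument procedure whose whole point is to delay a computation until somebody demands the value. A minimal sketch in Java (the expensive method is a made-up stand-in for any costly computation):

import java.util.function.Supplier;

public class ThunkDemo {
    static int expensive () {
        System.out.println ("computing...");
        return 42;
    }

    public static void main (String[] args) {
        // A zero-argument lambda is a thunk: nothing runs until
        // someone calls get () to demand the value.
        Supplier<Integer> thunk = () -> expensive ();

        System.out.println ("thunk created, nothing computed yet");
        System.out.println (thunk.get ());  // prints "computing..." then 42
    }
}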

Tuesday, February 2, 2021

Descriptive statistics

Suppose I ran a benchmark. Being thorough, I ran it several hundred times, but that just produces a big pile of numbers and we need a way to summarize the results concisely.

The naive way would be to report the average run time and perhaps the variance or standard deviation. This would allow us to assert that the benchmark takes, say, 10.5 ± 1.2 seconds. A reader would feel confident that the bulk of the benchmark runs would take somewhere between 9.3 seconds and 11.7 seconds, and that many more runs would take close to 10.5 seconds than close to either end of that range. The mean (average) and the standard deviation are the characteristic statistics of a normal (Gaussian) distribution. Rather than give the results of hundreds of runs, I summarize by saying that the distribution is Gaussian and giving the characteristic statistics of the best-fitting Gaussian distribution.
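
Computing these is straightforward. A quick sketch, with made-up run times:

public class GaussianStats {
    public static void main (String[] args) {
        double[] times = {9.3, 10.1, 10.5, 10.9, 11.7};  // hypothetical run times in seconds

        double mean = 0;
        for (double t : times) mean += t;
        mean /= times.length;

        double sumsq = 0;
        for (double t : times) sumsq += (t - mean) * (t - mean);
        double stddev = Math.sqrt (sumsq / times.length);

        System.out.printf ("%.1f ± %.1f seconds%n", mean, stddev);  // 10.5 ± 0.8 seconds
    }
}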

But what if the distribution isn't Gaussian? By reporting the statistics of the best-fitting Gaussian, at best I've reported something irrelevant; at worst I've misled the reader about the nature of the distribution.

It turns out that program run times do not typically follow a Gaussian distribution. A number of different factors influence the run time of a program, and they combine multiplicatively. The resulting distribution is heavily skewed towards the shorter times, but it has a long tail of longer run times. It will look a bit like this:

But a curious thing is going on here. If we plot the exact same curve but use a logarithmic scale for the X axis, we get this:

Our old friend the Gaussian distribution appears. Run times of computer programs typically follow a Gaussian distribution if you plot them on a logarithmic scale. This is called a log-normal distribution and it is one of the most frequently encountered distributions (after the Gaussian).
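
You can convince yourself of this with a little simulation. Here I fake a run time as the product of a bunch of independent slowdown factors (the number of factors and their range are made up); the log of such a product is a sum, so by the central limit theorem it comes out approximately Gaussian:

import java.util.Random;

public class LogNormalDemo {
    public static void main (String[] args) {
        Random rng = new Random (42);
        int runs = 100_000, factors = 20;
        double[] logs = new double[runs];

        // Simulate each "run time" as the product of independent
        // multiplicative slowdowns, each between 1.0 and 1.5.
        for (int i = 0; i < runs; i++) {
            double t = 1.0;
            for (int j = 0; j < factors; j++)
                t *= 1.0 + 0.5 * rng.nextDouble ();
            logs[i] = Math.log (t);
        }

        // The logs cluster symmetrically around their mean,
        // i.e. the run times themselves are log-normal.
        double mean = 0;
        for (double v : logs) mean += v;
        mean /= runs;
        double sumsq = 0;
        for (double v : logs) sumsq += (v - mean) * (v - mean);
        System.out.printf ("log-scale mean %.3f, std dev %.3f%n",
                           mean, Math.sqrt (sumsq / runs));
    }
}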

When reporting a log-normal distribution, it makes sense to report the mean and standard deviation of the Gaussian that appears after you transform to a logarithmic scale. If you take the mean on the log scale and transform back, you end up with the geometric mean. If you do the same with the standard deviation, you end up with something called the geometric standard deviation. The latter is just a little tricky: instead of a margin of error that you add to or subtract from the mean, you have a factor by which you multiply or divide the geometric mean. For example, I might assert that a particular benchmark takes 10.5 ×/÷ 1.2 seconds, meaning that the majority of runs took between (/ 10.5 1.2) = 8.75 seconds and (* 10.5 1.2) = 12.6 seconds.
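
Concretely, the computation is just the ordinary mean and standard deviation applied to the logs, transformed back at the end. A sketch (the sample times are made up):

public class GeometricStats {
    // Geometric mean: exponentiate the mean of the logs.
    static double geometricMean (double[] xs) {
        double sum = 0;
        for (double x : xs) sum += Math.log (x);
        return Math.exp (sum / xs.length);
    }

    // Geometric standard deviation: exponentiate the standard deviation
    // of the logs. It is a dimensionless factor, not an offset.
    static double geometricStdDev (double[] xs) {
        double mean = 0;
        for (double x : xs) mean += Math.log (x);
        mean /= xs.length;
        double sumsq = 0;
        for (double x : xs) sumsq += (Math.log (x) - mean) * (Math.log (x) - mean);
        return Math.exp (Math.sqrt (sumsq / xs.length));
    }

    public static void main (String[] args) {
        double[] times = {9.1, 10.4, 10.6, 10.9, 12.8};  // hypothetical run times in seconds
        System.out.printf ("%.2f ×/÷ %.2f%n",
                           geometricMean (times), geometricStdDev (times));
    }
}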

When you see a paper reporting the mean and standard deviation of some benchmark runs, you should be instantly suspicious. Benchmark runs don't usually follow a Gaussian distribution and it is likely that the author is unaware of this fact and is mindlessly applying irrelevant statistics. But if you see a paper where the geometric mean and geometric standard deviation are reported, you can have much more confidence that the author thought about what he or she was trying to say.

I find it particularly annoying when I see a benchmark harness that will run a benchmark several times and report the mean and standard deviation of the runs. The author of the harness is trying to be helpful by reporting some statistics, but the harness is likely reporting the wrong ones.

Monday, February 1, 2021

Oddity noticed while running a benchmark

I've heard it said that using a trampoline to achieve tail recursion incurs a cost of around a factor of two. I was curious whether that was accurate and where that cost comes from, so I decided to measure it with a micro-benchmark. I chose the Tak benchmark as a starting point because it does little besides make function calls and it is fairly well known. The basic Tak benchmark is this:

public static int Tak (int x, int y, int z) {
    // Only the outermost recursive call is in tail position;
    // the three inner calls merely compute its arguments.
    return (! (y < x))
        ? z
        : Tak (Tak (x - 1, y, z),
               Tak (y - 1, z, x),
               Tak (z - 1, x, y));
}

Tak (32, 16, 8) => 9
When called with arguments 32, 16, and 8, the benchmark makes an additional 50,510,520 calls to itself, of which 1/4 (12,627,630) are self tail calls. One run of the benchmark consists of calling Tak (32, 16, 8) ten times. On my laptop this takes around 900 ms.
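
For comparison, here is roughly what a trampolined version looks like. This is a sketch of the general technique, not the exact harness I measured: the self tail call returns a thunk instead of recursing, and a driver loop bounces until a value appears. The extra cost is an allocation and an indirect call per tail call.

import java.util.function.Supplier;

public class TrampolineTak {
    // A step is either a finished value or another thunk to bounce.
    static final class Step {
        final boolean done;
        final int value;
        final Supplier<Step> next;
        Step (int value) { this.done = true; this.value = value; this.next = null; }
        Step (Supplier<Step> next) { this.done = false; this.value = 0; this.next = next; }
    }

    // One bounce of Tak: either the answer, or the self tail call
    // reified as a thunk for the driver loop to invoke.
    static Step takStep (int x, int y, int z) {
        if (! (y < x)) return new Step (z);
        // The inner calls are not in tail position; they recurse normally.
        int a = tak (x - 1, y, z);
        int b = tak (y - 1, z, x);
        int c = tak (z - 1, x, y);
        return new Step (() -> takStep (a, b, c));
    }

    // Driver loop: keep bouncing until a step carries a value.
    static int tak (int x, int y, int z) {
        Step step = takStep (x, y, z);
        while (! step.done) step = step.next.get ();
        return step.value;
    }

    public static void main (String[] args) {
        System.out.println (tak (32, 16, 8));  // => 9
    }
}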

I ran the benchmark 250 times and recorded how long it took. Here are the times plotted against the run number:

For the first 50 runs of the benchmark the average time seems to slowly increase from around 830 ms to around 930 ms, then it hovers around 900 ms for the next 50 runs, then it settles down to around 890 ms after that. Mostly it stays in a fairly narrow band around the average time, but now and then the time jumps to over 1000 ms for a couple of runs.

I don't know why the average run time drifts like this, but the effect is visible on several different benchmarks. Since it takes the better part of a minute to perform the first 50 runs, the period over which the run time increases is relatively long, on the order of 30 to 40 seconds. My guess is that as the benchmark runs, the processor heats up and slows down slightly in order to control the heat.