Jack and Jill run a race. They’re both the same age, wearing the same track shoes and other similar running attire. The weather is favorable — a pleasant, windless summer afternoon, and they’re running on that fancy red rubberized track. Jill soundly beats Jack by a margin of about half a second. Therefore, girls are faster than boys.
Obviously there’s a problem here. The race was as fair and controlled as it could be, and the results are indisputable. The conclusion drawn from those results, however, is bogus. That seems painfully apparent in this case, but strangely, when you change “Jack” and “Jill” to “C++” and “C#” (or any two arbitrary languages), people happily accept the conclusion even though it is just as bogus as the one about the superiority of the female form on the racetrack.
Benchmarking the performance of languages themselves is not possible. A language is little more than a grammar and a set of rules, all nicely wrapped up in a thick document that comprises the standard for that language. Languages don’t have an intrinsic speed, except in the form of possible complexity and efficiency guarantees for algorithms in the language’s standard library. When people (including myself) discuss the relative performance of languages, the topic at hand is really particular implementations of those languages being applied to particular solutions to particular problems within a particular context. In most cases this distinction is implicitly understood by all parties involved, but it’s worth mentioning here because, well, you never know.
Keep in mind as well that performance is very fickle, influenced by a vast array of subtle factors. It is therefore very difficult, and in some cases downright impossible, to obtain reliable, meaningful performance benchmarks that hold up in general. Note my emphasis on “particular” above: we’re dealing with specifics here. The results of a specific test (Jack and Jill) do not necessarily imply anything about the general case (boys and girls). You cannot really make definitive, universal claims about the performance of programming languages. A benchmark can only provide conclusive results within the constraints of its own context, so it is very important to know what that context was. Based on that context, you can determine for yourself whether the benchmark has any relevance to the problems you need to solve.
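To make the point about context concrete, here is a minimal sketch in Python. It is my own illustration, not a real benchmark: it counts comparisons instead of measuring time so that the numbers are repeatable, and the function names and the choice of the two sorting algorithms are assumptions made purely for the example. The same two "contestants" trade places depending on nothing more than the input they are given.

```python
def insertion_sort_comparisons(values):
    """Sort a copy with insertion sort and return the number of comparisons."""
    items = list(values)
    comparisons = 0
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if items[j] <= key:
                break
            items[j + 1] = items[j]  # shift the larger element right
            j -= 1
        items[j + 1] = key
    return comparisons


def merge_sort_comparisons(values):
    """Sort a copy with top-down merge sort and return the number of comparisons."""
    def sort(items):
        if len(items) <= 1:
            return items, 0
        mid = len(items) // 2
        left, left_count = sort(items[:mid])
        right, right_count = sort(items[mid:])
        merged = []
        count = left_count + right_count
        i = j = 0
        while i < len(left) and j < len(right):
            count += 1
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, count

    _, total = sort(list(values))
    return total


# Two contexts for the very same pair of algorithms.
already_sorted = list(range(1000))
reversed_input = list(range(999, -1, -1))

for label, data in (("already sorted", already_sorted), ("reversed", reversed_input)):
    print(label,
          "insertion:", insertion_sort_comparisons(data),
          "merge:", merge_sort_comparisons(data))
```

On the already-sorted input insertion sort needs fewer comparisons than merge sort; on the reversed input it needs far more. Neither result, taken alone, tells you which algorithm is "faster" in general, and comparison counts are themselves only one narrow measure of performance.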
So far I’ve been focusing on benchmarks that are sound but whose results are misconstrued. There is another danger: the benchmark itself might be flawed. This can happen to anybody, even experienced veterans (a recent article in the professional magazine Game Developer contained a particularly flawed benchmark on the relative performance of managed and unmanaged code, for example). In most cases, these flawed benchmarks come from programmers who only know one of the two subject languages well. Consequently, the test case in their “native” language is well-written and well-optimized, employing language-specific idioms and techniques that increase performance, while the other test case tends to be a direct port of the first with only the minimal syntactical changes required to get it compiling and running. The test case for the unfamiliar language might end up employing idioms that make no sense in that language, or relying on assumptions that no longer hold (for example, in C++ dynamic memory allocation tends to involve a linear walk through the available blocks of memory to find a free chunk of the appropriate size, which is very different from the simple increment that tends to comprise allocation in C#). If those assumptions are not the subject of the benchmark itself, they can interfere with the results and skew the benchmark beyond usability.
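The allocation example in the parenthetical above can be sketched as a toy simulation. This is only an illustration of the two strategies as I described them, written in Python for brevity; the class names `FreeListAllocator` and `BumpAllocator` are hypothetical, and real C++ and C# runtimes are far more sophisticated than either model. Each `allocate` call returns the number of steps it took, so the difference in cost is visible without timing anything.

```python
class FreeListAllocator:
    """Toy first-fit allocator: walks a list of free block sizes (hypothetical model)."""

    def __init__(self, free_block_sizes):
        self.free_blocks = list(free_block_sizes)

    def allocate(self, size):
        """Return how many blocks were inspected before one was large enough."""
        for steps, block in enumerate(self.free_blocks, start=1):
            if block >= size:
                # Carve the request out of the first block that fits.
                self.free_blocks[steps - 1] = block - size
                return steps
        raise MemoryError("no free block is large enough")


class BumpAllocator:
    """Toy bump allocator: allocation just advances an offset (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.next_free = 0

    def allocate(self, size):
        """Return the number of steps taken, which is always one."""
        if self.next_free + size > self.capacity:
            raise MemoryError("heap exhausted")
        self.next_free += size
        return 1


# A fragmented heap: 99 small free blocks followed by one large block.
fragmented = FreeListAllocator([8] * 99 + [64])
bump = BumpAllocator(capacity=1024)

print("free-list steps:", fragmented.allocate(32))
print("bump steps:", bump.allocate(32))
```

A test case that allocates many small objects in a tight loop exercises these two models very differently, so porting it line for line from one language to the other ends up measuring the allocators as much as whatever the benchmark was meant to measure.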
Keep this in mind the next time you see an unqualified claim that “language X is faster than language Y.” Even if the claim is accompanied by a detailed description of the benchmark and its results, analyze the results for yourself and question them if necessary. Judge the results as what they are (results from a specific situation) and don’t fall into the trap of assuming that the specific proves the general.