How High Can You Jump?

By Kevin Tatterson

I’ve invented a new phrase: "highly correlated single metric" (hcsm). An hcsm is a single test for assessing the fitness of someone or something. The result of an hcsm test would be highly correlated with the combined results of conducting numerous tests.

For example, there are many different types of fitness tests. SWAT teams and Navy SEALs use a number of tests for assessing fitness: maximum number of push-ups in 1-2 minutes, long jump, pull-ups, timed sprints, vertical jump, and more. Studies show, however, that the vertical jump test is highly correlated with overall athletic performance.

Imagine that: admission into elite Special Forces could depend entirely on how high you can jump. It’s reasonable, right? I mean, the facts state that a strong correlation exists! We could streamline the admission process down to a simple, single vertical jump test. We’d save time and money, and we wouldn’t have to think. In fact, we could fully automate the test and get rid of any human involvement. Just walk up to a special machine that measures your vertical jump – if you jump high enough, you’re in!

Of course I jest. If this silly fiction became reality, everyone who wanted to be in the Special Forces would simply practice their vertical jump and nothing else. As a result, their overall fitness would suffer, and our Special Forces would be made up of people who could do nothing well except jump.

What is funny about the hcsm is how strongly human nature craves one. We seem to be programmed to believe that hcsms exist, and we almost always believe the marketers who spin them. Given these tendencies, all of us should think carefully when presented with hcsm-style comparative data.

Here are some places where you can find hcsms:

CPUs. More than five years ago, the hcsm was clock speed. Today, with multiple cores and ever-evolving chip architectures, clock speed is just a small part of the equation. Industry-provided benchmarks (BAPCo SYSmark, SPEC, etc.) try to fill the hcsm need, but they can be controversial. Consider that AMD recently resigned from BAPCo because SYSmark doesn’t show any benefit for AMD’s latest chip, which AMD spent more than three years designing and bringing to market (AMD integrated GPU capabilities into its Llano APU).
GPUs / video cards. About 15 years ago, when the first generation of 3-D games was really taking off (Quake, Unreal, etc.), GPU benchmarks were written to compare 3-D video card performance. Guess what happened? For a brief period of time, GPU chips were designed to maximize the benchmark scores, but they delivered little real-world performance improvement. Wisely, benchmarks have since evolved to reflect real-world performance (Futuremark 3DMark).
LCD televisions. Buying an LCD TV with the “best specs” has turned into a sad experience. Marketers want you to focus on hcsms like “contrast ratio” and “response time”. However, these metrics have all but lost their meaning, with Sony claiming an “infinite contrast ratio” and different manufacturers measuring “response time” in different ways (GTG = gray-to-gray, or BWB = black-white-black). Maximum PC published a great article debunking these myths.
Interviewing. The book Sway: The Irresistible Pull of Irrational Behavior (by Ori and Rom Brafman) states that the hcsm for interviewing is a technical interview. Reducing the interview process to a strictly technical one, however, is akin to the Special Forces focusing solely on the vertical jump test, with all the same negative consequences.
Supercars. The latest season of the BBC’s Top Gear has poked fun at the fact that supercar manufacturers are obsessed with using the Nürburgring test track as the ultimate hcsm. As a result, other real-world automotive features and comforts are ignored. Ironically, however, the only numerical measurement Top Gear produces and ranks is lap times around its own test track.
Software Quality. Almost every VP of R&D and Director of Quality has strong hcsm tendencies when it comes to their product’s quality. Few agree on the hcsm, though: bug flow rates, the number of static code analysis issues, the number of open defects, code complexity, customer satisfaction surveys, and so on. (Admittedly, it is typical to use more than one of these metrics.) Whatever the case, all of these measures can be influenced by outside factors and/or “fudged”.

As you can see, on its own, the hcsm is actually pretty weak. It leaves many questions unanswered, even a good one can be gamed or faked, and marketers spin meaningless hcsms of their own.

Now to the crux: what would be the highly correlated single metric for a 3D Modeler? I surveyed some of Spatial’s most experienced Modeling developers and staff for their thoughts:

Jeff Happoldt & Karthick Chilaka gave the same response: "Market share. Strong market share indicates that the Modeler must work. If it’s been accepted by the majority of the market, it must have the best qualities."
Vivekan Iyengar: "Given a complex model, push numerous cutting planes through it in all three dimensions. This will result in many difficult, near-coincident intersections. This heavily tests surface/surface intersections."
John Sloan: "This isn’t possible. Every Modeler is its own ecosystem and has evolved under selection pressure; its environment has dictated its growth. As a result, every Modeler will have different strengths and weaknesses. Forced to choose, however, I’d evaluate Boolean/Intersection robustness."
My answer: "The size of the release package. The thinking: assuming all Modelers have equal functionality, a concise package implies careful architecting."

Here’s the real story, though: when I asked my coworkers this question, they balked (heck, even I balked at my own question). They didn’t like the feeling of being cornered into a "single metric". Intuitively, they wanted to evaluate many different aspects of the Modeler.

I guess that’s just it. A highly correlated single metric should be thought of as nothing more than what it is: a statistic. You’d be foolish to make decisions based solely on hcsms. In my opinion, the best selection decisions are made under careful scrutiny, in real-world situations, and that takes time and effort.