July 18, 2010

Mathematical Statistics (1) in Public: indecency, or pearls before swine?

This post is a re-edit of a half-serious note inspired by the choice of words in a snippet of tongue-in-cheek conversation about professional statistics. Hopefully at least a few people will find it an enjoyable read.

Here is the inspiration, with no apologies for the really awful pun. Stop reading this post now if you can't suffer punning. That's the mood I was in, so the pun is required for accurately setting the scene.


[...]In statistics one should be opinionated. After all, it is generally not considered a field of discreet mathematics. This is a conclusion most often reached when indiscretion prevails, no?


I disagree - in statistics, one should not be opinionated, one should be *right*. And clear, succinct, and to the point.

Well... I pretty much completely agreed with her, and felt that her rebuttal didn't really address what I had said. So already being in a very silly mood, I wrote some explanations of how I thought the words used applied to statistics.


Discreet:
Advised in survey preparation and use, and in development of techniques which will be subject to intellectual property protections (2). In announcing results, discretion greatly increases the chance that someone else will get credit for the work (3).

Opinionated:
Use at exactly the same time as being discreet, at all times in the planning stage of a study, and in the execution and data preparation. Stop immediately upon reaching EDA or analysis. If you are not strongly opinionated in the pre-analysis, you get a bad formulation of a poorly thought out question, a biased crappy design, crappy biased data, results that are pure crap, and a bunch of people saying "One time, at band camp, I read 'How to Lie with Statistics'...". During analysis of the data, shut up and let the data and the PLANNED analysis have their turn. If you want to outsmart the data to support an opinion during the analysis phase, quit mathematics, go back to school for an MBA, and sign up as a derivatives trader or "investment banker". (4)

Right:
There's the question - there in methods, here is the answer - here in results. Statistics are a measure, like the time or the mass of the moon, and thus have an accuracy and an error in lieu of right and wrong. Beliefs and opinions, and even informed opinions, theses, hypotheses, and other vanities of Man (5) can be right and/or wrong... so of course we may have a wrong idea of what question we thought we answered. Re-check the question... is that what you thought you asked? (6) Bad execution could lead to a "wrong" answer, but we can find out what you did correctly answer by reading your notes. You have notes on the process, right?? As to having a "right" answer, meaning an accurate one, well... we have methods for two questions (7): how confident am I that this is the right answer, and how right do I think the answer is. Bayes or non-Bayes is the choice; "rightness" is then *measured*.
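For anyone who wants to see "rightness" get measured, here is a minimal sketch in Python of one reading of those two questions: a non-Bayes confidence interval and a Bayes credible interval for the same mean. Everything specific in it is an assumption made up for illustration: the measurements, the known noise standard deviation `sigma`, the Normal prior, and the function names. It shows the shape of the idea, not a recommended analysis.

```python
from math import sqrt
from statistics import NormalDist, fmean


def confidence_interval(data, sigma, level=0.95):
    """Non-Bayes interval for the mean, assuming a known noise sd `sigma`."""
    n = len(data)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal quantile
    centre = fmean(data)
    half_width = z * sigma / sqrt(n)
    return centre - half_width, centre + half_width


def credible_interval(data, sigma, prior_mean, prior_sd, level=0.95):
    """Bayes interval for the mean under a Normal(prior_mean, prior_sd) prior."""
    n = len(data)
    prior_precision = 1 / prior_sd ** 2
    data_precision = n / sigma ** 2
    # Conjugate Normal-Normal update with known noise variance.
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * fmean(data))
    posterior = NormalDist(post_mean, sqrt(post_var))
    tail = (1 - level) / 2
    return posterior.inv_cdf(tail), posterior.inv_cdf(1 - tail)


# Hypothetical measurements of some quantity, invented for this example.
measurements = [9.8, 10.4, 10.1, 9.6, 10.3, 10.0, 9.9, 10.2]

ci = confidence_interval(measurements, sigma=0.3)
cred = credible_interval(measurements, sigma=0.3, prior_mean=9.0, prior_sd=1.0)

print("confidence interval:", ci)
print("credible interval:  ", cred)
```

With these made-up numbers the credible interval comes out slightly narrower and is pulled a little toward the prior mean, which is exactly the opinion you committed to before the analysis started.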

Clear:
Look at "rightness". See that part about the question? You had better know as exactly as you can what question you answered, or you will quickly leave the field of mathematics and enter the field of mumbling. Beyond that, "clear" is a communication issue. (8)

Succinct:
The less succinct the design and analysis, the less likely you or anyone else is to understand what was done and what happened. Suspect anything that requires too much explanation or is excessively sophisticated (9) of being misunderstood and misapplied.

To the point:
As phrased, this is not part of mathematical statistics, but part of communications. See "succinct" for a related topic in mathematics. (10)


(1) My latest job search has revealed to me the semantic difference between mathematical statistics and statistics. "Statistics" is what a manager with an MBA and an Excel plug-in does and "has 10+ years of experience" in. "Mathematical Statistics" is the kind you study in school and have publications in.

(2) Carefully note that pure mathematics is unpatentable, categorized as properly being either a discovered work of God, or a discovered element of Nature and in either case a discovery, not an invention. Applied math has fewer such protections, so long as the application of the math(s) is part of the patent application.

(3) Think of it this way: if the results generate scandal with your name on it, then 'everyone knows' it's your work!

(4) Yes, this does lead to the conclusion that one should announce results in an unopinionated but indiscreet manner. It's no longer opinion, so announce the discovery with great excitement, not with great opinion.

(5) Capitalization used to indicate the common name of a species.

(6) Seriously, of "description", "methods", "analysis/results", and "discussion", only "methods" and "results" really matter. "Description" and "discussion" are opinion pieces about what the author thinks was asked, and what they think the results might mean in relation to what they think was asked. Check them with propaganda filters fully powered.

(7) The choice of which question should be asked seems to be a religious issue among mathematicians. And like most religions, they have sophisticated definitions for what they actually measure that, when carefully evaluated, can only be answered if you have access to all possible universes, meaning that humanity can never actually test against their "gold standards" and just has to depend on the chosen method as a matter of faith. Like the "do parallel lines in this universe actually stay the same distance apart" problem, but with even less hope of ever having a usable answer. (11)

(8) Only use it when you want to be taken seriously.

(9) Check the pre-nineteenth century definitions for "sophisticated" to really understand how much to trust sophistication. While the dictionary might say that these definitions are obsolete, using them seems to have a tendency to make sentences with "sophisticated" in them more correct than the "modern" definition does.

(10) By the way, "to the point" is a highly advisable communications strategy: people have short attention spans. Two hundred words for a press release, five hundred for a press article, as few as possible for a published paper. When two papers about the same discovery are published, observation has led me to suspect that the shorter one both gets read more and gets more credit. Make it shorter than what I wrote here, capisce?

(11) For any English as a second language readers, yes, it really is "a usable" and not "an usable". _Spoken_ English governs the use of 'a' versus 'an', and 'usable' is pronounced as though there is a 'y' at the beginning. Just like saying "an 's.d.r.'," in which 's' sounds as though it starts with an 'e', but vice versa. Too many misuses in academic papers, making them noticeably harder to read smoothly, have made this a pet peeve of mine.
