November 4, 2012
I support the Oxford comma.
Consider this wish: "Live healthy, learn much, rock and roll." Super-stardom, or a rocking chair and rolling in your grave? Depends on how many things are in that list, right? For the Oxford comma camp there are three things, so "rock and roll"! For the non-comma camp there are four... "rock" and "roll" to you, I guess.
In terms of social acceptability, note that the Oxford comma is the subject of a top-40 hit song; "omission of serial comma" is not.
Before you dismiss me as an extremist, please note that I am forgiving in the case of the omission of the comma before the abbreviation "etc."
August 14, 2012
Privacy, Identity, and Third-Party Service Providers
"We may share information with third party service providers."
Ever read that (or equivalent) in a privacy policy?
It means: "Access to your personal information by this site, including information that you have not provided directly to us, is restricted only by expense, and anything you do here may be linked to widely available databases tracking your behavior." This is also true offline, and has been true since before the 1990s. And about combining offline and online data: this was never effectively restricted. As soon as your address existed in the database of an online company that allowed "third party services", they could buy the results of your warranty cards, vehicle registrations and other public records, magazine subscriptions, credit card purchasing profiles, and more. Surprised?
Would you be more surprised to find out that data on how often your doctor prescribes each type of medicine is also available? And that this availability has been considered constitutionally protected as part of the corporate "freedom of speech" of the BUYING corporation? Imagine if we as individuals could demand information because it might change how we say something! (Yes, that's a power that corporations have claimed as part of their rights due to their "personhood".)
You may rightly wonder how all this is available. If you read the fine print on various contracts you'll see a lot of "may be shared with third party service providers" and "may be used for marketing purposes". Those two together are the glue for the corporate identity markets: the former is taken to mean "I can share all the data I have with a data aggregator to get the information about you that I don't have". And the latter: "and it can be shared with any other company that also wants to market to you or to any of the characteristics that have been attributed to you."
What data links these databases together? Anything they can get. An address (shipping, billing, receive our packet, free gift, ...)? Great! A phone number? Great! A cell phone number? Even better: location, and more individual than a house phone! A credit card number? Not so great: we'll have to pay the credit card company to give us the name/address/phone of the people who bought from us, because the card data is protected by law. But rock solid reliability, since they have your billing address (and probably your primary email address, and a couple of phone numbers - after all, you have to "call from your home phone to activate", right? Even if you never gave them your home phone before that...). And then we'll have to pay again to link it to any other databases. An email? Okay... they're unreliable, but as long as you've bought from someone and given them that email, or given that email to someone you do business with (electronic billing, maybe?) we can probably link it to your "real" identity.
Yes, with school and other events, this blog has, alas, been neglected. On the upside, I am remembering it again, and have a bit of a backlog of draft posts, like the above. The topics may broaden a bit, as I explore a bit more about teaching and communicating scientific and technical skills.
July 27, 2010
Scientific Grammar
"The Science of Scientific Writing" (Gopen and Swan, American Scientist, Nov-Dec 1990) is an article that does a great job of documenting these problems and showing how to fix them. The article stresses a simple pattern: start with the familiar, end with the new. As they put it:
"In our experience, the misplacement of old and new information turns out to be the No. 1 problem in American professional writing today."
Gopen and Swan back their thesis up with "worked examples." Taking passages from published articles, they show how to revise them for clarity.
The article ends with seven rules to summarize what they have found. I am putting the rules here to remind me of them, and to entice the reader unfamiliar with them to visit the original article and learn what they are reminders for.
- Follow a grammatical subject as soon as possible with its verb.
- Place in the stress position the "new information" you want the reader to emphasize.
- Place the person or thing whose "story" a sentence is telling at the beginning of the sentence, in the topic position.
- Place appropriate "old information" (material already stated in the discourse) in the topic position for linkage backward and contextualization forward.
- Articulate the action of every clause or sentence in its verb.
- In general, provide context for your reader before asking that reader to consider anything new.
- In general, try to ensure that the relative emphases of the substance coincide with the relative expectations for emphasis raised by the structure.
July 26, 2010
Missing people in US phone surveys
For a long time, "random dialing" was a great way to get a random sample that could include about 99% of the population (the 1% without phones were generally assumed not to matter for most purposes - everyone imagined jails and wilderness hermits in cabins). While incoming calls were free to telephone subscribers, and most of the population had phones, this was almost ideal. It wasn't actually sampling people, but households, since there was generally one phone per house. However, with this sampling frame it was possible to cleanly stratify by household size, and get back to estimating individuals fairly easily. Early in the 20th century, there was some problem with the number of households that had no telephone service, but as this shrank toward 1%, the error became negligible.
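Here is a minimal sketch of one way to get from households back to individuals. It assumes that each dialed household is equally likely to be sampled and that one randomly chosen member answers, so a person in a larger household is less likely to be the one interviewed and each response is weighted by household size. The function name and the sample data are hypothetical, made up for illustration.

```python
def individual_level_mean(responses):
    """Estimate a per-person mean from one response per sampled household.

    responses: list of (household_size, value) pairs, where value is the
    answer given by one randomly chosen member of that household.
    Assumption: households are sampled with equal probability, so a person's
    chance of being interviewed is proportional to 1 / household_size.
    """
    total_weight = sum(size for size, _ in responses)
    weighted_sum = sum(size * value for size, value in responses)
    return weighted_sum / total_weight


# Hypothetical data: a one-person household answering 1.0 ("yes")
# and a four-person household answering 0.0 ("no").
sample = [(1, 1.0), (4, 0.0)]

# Unweighted mean treats each household as one unit.
household_mean = sum(value for _, value in sample) / len(sample)

# Weighted mean treats each household as standing in for all its members.
person_mean = individual_level_mean(sample)

print(household_mean, person_mean)
```

The unweighted figure describes households; the weighted one is the estimate for people, which is usually what a survey question is after.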
In the early 21st century, we now have a different problem: mobile phones often charge for incoming calls, so they are not allowed to be "random dialed", and more and more households are relying on them exclusively. How many? Well... I'll show you a picture.
You can clearly see a fairly dramatic age bias. It appears that 25-year-olds are at about 50% reachability by "random dialing", while other age groups may be as high as 90% reachable. Less obviously, that 50% has a pretty good chance of being correlated with other aspects of their lives. Anything that can be done? Maybe. At the very least, stratify your sample by age and use data like that in the above chart to correct for reachability by age group.
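The correction by age group can be sketched roughly as follows. Each respondent is weighted by the inverse of their age group's reachability, so under-reached groups count for more. The 50% and 90% figures are the approximate values read off the chart above; the group labels, the function name, and the survey responses are hypothetical.

```python
def reachability_weighted_mean(responses, reachability):
    """Mean of survey values, weighting each respondent by 1 / reachability.

    responses: list of (age_group, value) pairs.
    reachability: dict mapping age_group to the fraction of that group
    that random dialing can reach (between 0 and 1, exclusive of 0).
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for group, value in responses:
        weight = 1.0 / reachability[group]
        total_weight += weight
        weighted_sum += weight * value
    return weighted_sum / total_weight


# Approximate reachability from the chart: about 50% and about 90%.
reachability = {"around 25": 0.5, "older": 0.9}

# Hypothetical survey: two equally sized groups in the population,
# but only 50 of the younger and 90 of the older people were reached.
responses = [("around 25", 1.0)] * 50 + [("older", 0.0)] * 90

raw_mean = sum(value for _, value in responses) / len(responses)
corrected_mean = reachability_weighted_mean(responses, reachability)

print(round(raw_mean, 3), round(corrected_mean, 3))
```

This only corrects for who could be reached within each age group. It does nothing about differences between the reachable and unreachable people of the same age.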
If you have another sampling technique available, you can try to use it to infer differences within an age group between with-house-phone and without-house-phone, and then adjust your results appropriately for that. If it's at all controversial though, brace yourself. Even though you should now have more valid results, the people who agree with the groups that were originally overrepresented will now direct very pointed charges at you of "lying with statistics." Even though all you actually did was "question with statistics to the best of your ability."
Update: The National Marine Fisheries Service is trialling a switch to postal surveys in light of the increasing problems with telephone surveys.
July 22, 2010
Parenting alone: the googlefight
From Google searching, with English language set:
| "single mother" | 3,030,000 |
| "single father" | 568,000 |
| "single parent" | 5,400,000 |
From "googlefight.com":
| "single mother" | 24,000,000 |
| "single father" | 12,800,000 |
| "single parent" | 3,670,000 |
Major news stories have been based on less than this. Easily accessible and reliable data is great, but when it's not, the easy should not replace the reliable. Check your data before taking it seriously.
The serious data, for the US, from the US Census Bureau (Jan. 2010 press release):
In 2009, 12 percent of the 1.7 million father-only family groups with children under 18 were maintained by an unemployed father, compared with 7 percent in 2007. Of the 9.9 million mother-only family groups, 10 percent were unemployed in 2009 compared with 6 percent in 2007.
Or reformatted, the 2009 Census data:
| "single mother" | 9,900,000 |
| "single father" | 1,700,000 |
| sum, single parents | 11,600,000 |
And a lot of unemployment.
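One quick way to check the three sources against each other is to compare the ratio of "single mother" to "single father" in each. The sketch below uses only the numbers from the tables above; the dictionary layout and function name are made up for illustration. Keep in mind that the two search sources count page hits for a phrase, while the Census counts family groups, so only the ratios are even loosely comparable.

```python
# Counts copied from the three tables in this post.
counts = {
    "Google search": {"single mother": 3_030_000, "single father": 568_000},
    "Googlefight": {"single mother": 24_000_000, "single father": 12_800_000},
    "US Census 2009": {"single mother": 9_900_000, "single father": 1_700_000},
}


def mother_to_father_ratio(source_counts):
    """Return how many 'single mother' counts there are per 'single father'."""
    return source_counts["single mother"] / source_counts["single father"]


ratios = {
    source: mother_to_father_ratio(source_counts)
    for source, source_counts in counts.items()
}

for source, ratio in ratios.items():
    print(f"{source}: {ratio:.1f} single mothers per single father")
```

On these numbers, the plain Google counts and the Census figures give ratios in the same neighborhood, while the Googlefight ratio is much lower.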
PS: The pattern in the unemployment numbers is recurring for US data. Women have greater unemployment than men when unemployment is low, and men have greater unemployment than women in times of high unemployment. All kinds of odd questions are suggested by this: Do women have more stable jobs? Is gender-correlated pay inequality causally related to apparent gender-correlated job security? And if so, which way? Would low pay cause secure work, or secure work cause low pay?
PPS: There's no immediately obvious link on Googlefight to find out how they come up with their numbers. Anyone know why it's so different from what I see "fresh from Google"?

