January 31, 2014

Thoughts on Archival Quality Systems

Sometimes a bit of perspective can be had by considering a new viewpoint, and today I'm having a bit of that experience as I consider the process of repairing, refurbishing, or upgrading interactive installations for museums. The idea of a concrete archive of knowledge, trying to pass itself on to the future, is really interesting. As someone who does a lot with code and algorithms, and a fair amount with electronics and small carpentry, I can see an installation as a set of these parts, but that view seems wrong: the real point is preserving, explaining, and sharing knowledge. All of the technical and material aspects of the design need to be subordinate to this purpose.

There are two big challenges with installations: people, and time.

People are the whole point, but people are also really rough on things that are not their own. This is particularly true of young people, who, because they are still learning, are the usual target audience for museum installations. So an installation piece needs to be both durable and maintainable.

Time is central to most museums: preserving knowledge is a significant part of most of their missions. Unlike an artifact, an installation is expected to change over time and stay current with the latest knowledge. Fortunately, it is not usually expected to do this without human intervention (whew!), but it does need to be fully prepared for upgrades and replacement parts.

Funding for installation development is almost always fairly short term, so installations should be cheap to maintain. Further, due to the vagaries of staffing in non-profits, they should not require any particularly esoteric skills to maintain. Assume that at some point in the future, some poor volunteer is going to be called on to determine if "the old can be set up or fixed for ." Part of your goal as the original designer should be to make this person successful, and thus perpetuate the knowledge that the installation represents.

In my next post, I plan to elaborate these thoughts into more concrete steps. I hope to see from that whether this point of view favors particular design methods, or whether it holds any surprises about what to emphasize.

November 4, 2012

I support the Oxford comma.

With a highly divided election approaching, I thought I'd make my political stance clear: I support the Harvard, Oxford, and serial comma.

Consider this wish: "Live healthy, learn much, rock and roll." Super-stardom, or a rocking chair and rolling in your grave? It depends on how many things are in that list, right? For the Oxford comma camp there are three things, so "rock and roll"! For the no-comma camp there are four... "rock" and "roll" to you, I guess.

In terms of social acceptability, note that the Oxford comma is the subject of a top-40 hit song, while "omission of serial comma" is not.

Before you dismiss me as an extremist, please note that I am forgiving in the case of the omission of the comma before the abbreviation "etc."

August 14, 2012

Privacy, Identity, and Third-Party Service Providers

"We may share information with third party service providers."

Ever read that (or equivalent) in a privacy policy?

It means: "Access to your personal information by this site, including information that you have not provided directly to us, is restricted only by expense, and anything you do here may be linked to widely available databases tracking your behavior." This is also true offline, and has been true since before the 1990s. As for combining offline and online data: this was never effectively restricted. As soon as your address existed in the database of an online company that allowed "third party services", they could buy the results of your warranty cards, vehicle registrations and other public records, magazine subscriptions, credit card purchasing profiles, and more. Surprised?

Would you be more surprised to find out that data on how often your doctor prescribes various types of medicine is also available? And that this availability has been considered constitutionally protected as part of the corporate "freedom of speech" of the BUYING corporation? Imagine if we as individuals could demand information because it might change how we say something! (Yes, that's a power that corporations have claimed as part of their rights due to their "personhood".)

You may rightly wonder how all this is available. If you read the fine print on various contracts you'll see a lot of "may be shared with third party service providers" and "may be used for marketing purposes". Together, those two phrases are the glue for the corporate identity markets: the former is taken to mean "I can share all the data I have with a data aggregator to get the information about you that I don't have". And the latter: "and it can be shared with any other company that also wants to market to you, or to people with any of the characteristics that have been attributed to you."

What data links these databases together? Anything they can get. An address (shipping, billing, receive our packet, free gift, ...)? Great! A phone number? Great! A cell phone number? Even better: location, and more individual than a house phone! A credit card number? Not so great: we'll have to pay the credit card company to give us the name/address/phone of the people who bought from us, because the card data is protected by law. But rock solid reliability, since they have your billing address (and probably your primary email address, and a couple of phone numbers - after all, you have to "call from your home phone to activate", right? Even if you never gave them your home phone before that...). And then we'll have to pay again to link it to any other databases. An email? Okay... they're unreliable, but as long as you've bought from someone and given them that email, or given that email to someone you do business with (electronic billing, maybe?) we can probably link it to your "real" identity.
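
To make the linking step concrete, here is a minimal Python sketch of the simplest kind of record linkage: an exact match on a normalized phone number. The records, field names, and the `normalize_phone` and `link_on_phone` helpers are hypothetical illustrations, assuming US-style ten-digit numbers; real linkage systems presumably combine many keys and tolerate near-matches.

```python
import re

def normalize_phone(raw):
    """Keep only the digits; drop a leading US country code '1' (assumes US numbers)."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits

def link_on_phone(left, right):
    """Attach fields from `right` to each `left` record with the same normalized phone."""
    index = {normalize_phone(rec["phone"]): rec for rec in right}
    linked = []
    for rec in left:
        match = index.get(normalize_phone(rec["phone"]))
        if match is not None:
            extra = {k: v for k, v in match.items() if k != "phone"}
            linked.append({**rec, **extra})
    return linked

# Hypothetical toy data: a store's orders and a purchased marketing list.
orders = [
    {"order_id": 101, "phone": "(555) 010-4477", "item": "tent"},
    {"order_id": 102, "phone": "555.010.9000", "item": "kayak"},
]
marketing_list = [
    {"phone": "1-555-010-4477", "name": "A. Example", "address": "12 Elm St"},
    {"phone": "555-010-1234", "name": "B. Sample", "address": "9 Oak Ave"},
]

for row in link_on_phone(orders, marketing_list):
    print(row)
```

Only order 101 links here: the two numbers are formatted differently, but once reduced to digits they are the same key, and the order now carries a name and address the store never collected.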

Yes, with school and other events, this blog has, alas, been neglected. On the upside, I am remembering it again, and have a backlog of draft posts, like the one above. The topics may broaden somewhat as I explore more about teaching and communicating scientific and technical skills.

July 27, 2010

Scientific Grammar

Scientific writing is sometimes hard to read because of bad grammar, even more than because of strange abbreviations and technical terminology. This is sadly expected in journal articles, even though clear writing will make it more likely that someone will read far enough through your research to use it and cite it. It is also one reason that so many whitepapers are written by non-experts: the sponsoring organization wants people to read them, not fear them.

"The Science of Scientific Writing" (Gopen and Swan, American Scientist, Nov-Dec 1990) is an article that does a great job of documenting these problems and showing how to fix them. The article stresses a simple pattern: start with the familiar; end with the new. As they put it:

"In our experience, the misplacement of old and new information turns out to be the No. 1 problem in American professional writing today."

Gopen and Swan back their thesis up with "worked examples." Taking passages from published articles, they show how to revise them for clarity.

The article ends with seven rules that summarize their findings. I am putting the rules here to remind myself of them, and to entice readers unfamiliar with them to visit the original article and learn what they are reminders for.
  1. Follow a grammatical subject as soon as possible with its verb.
  2. Place in the stress position the "new information" you want the reader to emphasize.
  3. Place the person or thing whose "story" a sentence is telling at the beginning of the sentence, in the topic position.
  4. Place appropriate "old information" (material already stated in the discourse) in the topic position for linkage backward and contextualization forward.
  5. Articulate the action of every clause or sentence in its verb.
  6. In general, provide context for your reader before asking that reader to consider anything new.
  7. In general, try to ensure that the relative emphases of the substance coincide with the relative expectations for emphasis raised by the structure.
This article won't replace "The Elements of Style" by Strunk and White, but it is a useful addendum for the scientific writer.

July 26, 2010

Missing people in US phone surveys

People often like to denigrate statistics derived from survey data, but the objection I hear most frequently ("But five thousand is less than 0.1% of 300 million!") is not actually a significant source of error: the precision of a random sample depends mainly on how many people are sampled, not on what fraction of the population they represent. The error to watch for more carefully is sampling bias.
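
A quick calculation shows why the sample-size objection misses. Assuming a simple random sample and the usual normal approximation for a proportion (p = 0.5 is the worst case, z = 1.96 for roughly 95% confidence), the margin of error for 5,000 respondents is about 1.4 percentage points, and the finite population correction for 300 million people barely changes it. `margin_of_error` is an illustrative helper, not a standard library function.

```python
import math
from typing import Optional

def margin_of_error(n: int, population: Optional[int] = None,
                    p: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion from a simple random sample.

    p = 0.5 gives the widest (worst-case) margin; z = 1.96 is roughly 95%
    confidence. If `population` is given, apply the finite population correction.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    if population is not None:
        standard_error *= math.sqrt((population - n) / (population - 1))
    return z * standard_error

print(round(margin_of_error(5000), 4))                          # 0.0139
print(round(margin_of_error(5000, population=300_000_000), 4))  # 0.0139
```

Note what this formula does not capture: it assumes every member of the population was equally likely to be sampled, which is exactly what sampling bias violates.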

For a long time, "random dialing" was a great way to get a random sample that could include about 99% of the population (the 1% without phones were generally considered negligible for most purposes; everyone imagined jails and wilderness hermits in cabins). While incoming calls were free to telephone subscribers, and most of the population had phones, this was almost ideal. It wasn't actually sampling people but households, since there was generally one phone per house. However, with this sampling frame it is possible to cleanly stratify by household size, and get back to estimating individuals fairly easily. Early in the 20th century, there was some problem with the number of households that had no telephone service, but as this shrank toward 1%, the error became negligible.
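
One common way to get from households back to individuals, shown here as a weighting rather than a strict stratification, assumes one adult is interviewed per household: an adult in a household of k adults had only a 1-in-k chance of being the one chosen, so each answer is weighted by k. The data and the `weighted_share` helper are hypothetical.

```python
def weighted_share(responses):
    """Estimate the share of *individuals* answering "yes" from a household sample.

    Each response is (adults_in_household, answered_yes). Assuming one adult was
    interviewed per household, an adult in a k-adult household had a 1-in-k
    chance of being picked, so each answer gets weight k.
    """
    total_weight = sum(k for k, _ in responses)
    yes_weight = sum(k for k, answered_yes in responses if answered_yes)
    return yes_weight / total_weight

# Hypothetical responses: (adults in household, answered "yes"?)
sample = [(1, True), (1, True), (2, False), (3, False)]
print(f"{weighted_share(sample):.3f}")  # 0.286 (i.e. 2/7), versus 2/4 = 0.5 unweighted
```

The two "yes" answers came from single-adult households, so once the larger households are counted at full weight, the individual-level share drops well below the raw household share.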

In the early 21st century, we now have a different problem: mobile phones often charge for incoming calls, so they are not allowed to be "random dialed", and more and more households are relying on them exclusively. How many? Well... I'll show you a picture.

You can clearly see a fairly dramatic age bias. It appears that 25-year-olds are only about 50% reachable by "random dialing", while other age groups may be as much as 90% reachable. Less obviously, whether someone falls in that unreachable half has a pretty good chance of being correlated with other aspects of their life. Anything that can be done? Maybe. At the very least, stratify your sample by age and use data like that in the above chart to correct for reachability by age group.
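
A minimal sketch of that correction, assuming you know each age group's reachability (the fraction reachable by random dialing, read off a chart like the one above) and that a group's respondents stand in only for its reachable part: weight each group's result by respondents divided by reachability. The group labels, counts, and reachability figures are made up for illustration, loosely echoing the 50% and 90% figures; `reachability_adjusted_share` is a hypothetical helper.

```python
def reachability_adjusted_share(groups: dict) -> float:
    """Combine per-age-group "yes" shares, correcting for reachability.

    `groups` maps an age-group label to (respondents, share_yes, reachability).
    Each group is weighted by respondents / reachability, which assumes the
    respondents represent only the reachable part of their group.
    """
    weights = {label: n / reach for label, (n, _, reach) in groups.items()}
    total = sum(weights.values())
    return sum(weights[label] * groups[label][1] for label in groups) / total

# Hypothetical survey: (respondents, share answering "yes", reachability)
groups = {
    "18-29": (100, 0.60, 0.5),
    "30+": (900, 0.40, 0.9),
}
unadjusted = (sum(n * share for n, share, _ in groups.values())
              / sum(n for n, _, _ in groups.values()))
print(f"unadjusted: {unadjusted:.3f}")                           # 0.420
print(f"adjusted:   {reachability_adjusted_share(groups):.3f}")  # 0.433
```

This still assumes that, within an age group, reachable and unreachable people answer alike; the next paragraph's second sampling technique is a way to check that assumption.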

If you have another sampling technique available, you can try to use it to infer differences within an age group between people with and without a house phone, and then adjust your results accordingly. If the topic is at all controversial, though, brace yourself. Even though you should now have more valid results, the people who agree with the originally overrepresented groups will now pointedly accuse you of "lying with statistics," when all you actually did was "question with statistics to the best of your ability."
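
Within a single age group, that adjustment can be sketched as a weighted blend: the phone survey's estimate stands for the reachable fraction, and the other technique's estimate stands for the rest. The function and numbers are hypothetical, and the sketch assumes each estimate is itself unbiased for its own subgroup.

```python
def blended_group_share(reachability: float, share_reachable: float,
                        share_unreachable: float) -> float:
    """Estimate a group's overall share as a mixture of its reachable and unreachable parts."""
    if not 0.0 <= reachability <= 1.0:
        raise ValueError("reachability must be between 0 and 1")
    return reachability * share_reachable + (1.0 - reachability) * share_unreachable

# Hypothetical 25-year-olds: 50% reachable by random dialing; the phone survey
# says 60% "yes", and a second survey of people without a house phone says 70%.
estimate = blended_group_share(0.5, 0.60, 0.70)
print(f"{estimate:.2f}")  # 0.65
```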

There are many biases that will be harder to find. The most common: the survey was planned by people who wanted to reach "regular people", that is, people like themselves. They drew up a sampling plan that seemed like a good way to meet "random" people, often the same way they themselves meet random people: in the places where they met the people they now hang out with, people they are generally similar to and agree with. It feels very validating to have your survey agree with you, so people who have run a survey this way may be a bit defensive if you suggest that it is perhaps, just maybe, a little tiny bit biased. Unfortunately, these surveys seem to be some of the most common ones in politics.

Update: The National Marine Fisheries Service is trialling a switch to postal surveys in light of the increasing problems with telephone surveys.