Good question! Log transformation of data is a common technique for dealing with several problems: typically anything where the scatterplot of a relationship looks log-normal, "looks like an exponential curve", has "too long" a right tail, or numerous other things. I will leave aside the question of whether the log transform is the right one in a particular instance, and proceed directly to today's (well, yesterday's) question:
When analyzing the residual deviations after the model is fit, what does the deviation of the log of the variable of interest mean? And make it friendly, statistics guy!
Well... ouch. I don't like the implication of "y'all are hard to understand". But then again, a lot of people feel that way, and that's why analysts are paid to be analysts. So what to do? What do "regular people" want to see, that is also a good representation of the reality?
Percent deviation from predicted.
And here's what they don't want to hear (but you want to do):
- If you have "standardized residuals" (how very statistical of you!), first multiply by the RMSE to get actual deviations.
- Exponentiate the deviations (raise e to the power of each one), and respect the sign of the deviation!
- Now you have the ratio of measured over predicted. If it's greater than 1, then subtract one and multiply by 100 to get the percent "high". If it's less than one, then subtract the ratio from one and multiply by 100 to get the percent "low".
Example, just for practice:
Say your RMSE is 0.5. Then one deviation high (positive) is e to the 0.5: about 1.65; and one deviation low is e to the -0.5: about 0.61.
So one up is about 65% high, and one down is about 39% low.
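For anyone who would rather check the arithmetic in code, here is a minimal Python sketch of the steps above. It assumes the model was fit on the natural log of the variable; the function name `percent_deviation` and its arguments are made up for illustration and are not from any particular package.

```python
import math

def percent_deviation(std_resid, rmse):
    """Turn a standardized residual from a natural-log model into a percent high/low."""
    # Step 1: multiply by the RMSE to get the actual log-scale deviation.
    log_dev = std_resid * rmse
    # Step 2: exponentiate to get the ratio of measured over predicted.
    ratio = math.exp(log_dev)
    # Step 3: express the ratio as a percent above or below the prediction.
    if ratio >= 1:
        return (ratio - 1) * 100, "high"
    return (1 - ratio) * 100, "low"

# The practice example: RMSE of 0.5, one deviation up and one down.
for z in (1.0, -1.0):
    pct, direction = percent_deviation(z, rmse=0.5)
    print(f"{z:+.1f} standardized residual -> {pct:.0f}% {direction}")
```

If the model used log base 10 instead, the exponentiation step would be `10 ** log_dev` rather than `math.exp(log_dev)`.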
One additional note here: symmetrical percent bands, like 10% to 40% high or low, will be misleading because the "high" band is actually expected to have smaller counts than the "low" band if a log model is correct. The reason is that the same percent span covers less of the log scale on the high side than on the low side. But this is the way that people have become accustomed to having data presented, and people think of it as "fair", despite this potential bias.
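To see the size of that bias, here is a rough Python sketch. It assumes the log-scale residuals are normal with mean zero and standard deviation equal to the RMSE (0.5, as in the practice example), which is my assumption for illustration and not something every log model guarantees. The helper name `band_probability` is hypothetical.

```python
import math
from statistics import NormalDist

def band_probability(lo_pct, hi_pct, rmse, direction):
    """Share of residuals expected in a percent band, under the normality assumption."""
    if direction == "high":
        # 10% to 40% high means a ratio between 1.10 and 1.40.
        a = math.log(1 + lo_pct / 100)
        b = math.log(1 + hi_pct / 100)
    else:
        # 10% to 40% low means a ratio between 0.60 and 0.90.
        a = math.log(1 - hi_pct / 100)
        b = math.log(1 - lo_pct / 100)
    dist = NormalDist(mu=0.0, sigma=rmse)  # assumed residual distribution
    return dist.cdf(b) - dist.cdf(a)

high = band_probability(10, 40, rmse=0.5, direction="high")
low = band_probability(10, 40, rmse=0.5, direction="low")
print(f"10-40% high: {high:.3f}")
print(f"10-40% low:  {low:.3f}")
```

Under these assumptions the "low" band comes out noticeably larger than the "high" band, even though both are labeled 10% to 40%.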