Not too long ago, I discovered that I can actually skim the arXiv quite fast with RSS feeds. Skimming the headlines in hep-ex, you can just ignore anything with “unparticle” or “New limits on [unmotivated model]-type models” in the title. For all other fields, if I can’t understand the title, I just skip it. The problem lately has been the physics and statistics feeds, which are often too goddamned funny and/or fascinating and/or horrifying to pass up. In “physics,” which is lots of sociology and bio, half the fun is trying to differentiate crackpots from interesting articles that are just speculative. No, looking at their affiliations is not sufficient. Reading stats, I get pangs of longing from how tech in the last decade has allowed people to measure and model crazy systems like ant social dynamics and human voting behavior.
Take The Nonequilibrium Nature of Culinary Evolution, for example. They take four cookbooks (one medieval, one consisting of three editions spanning 50 years) and model the frequency of ingredients’ appearances. They further attempt to model the development of recipes as an evolutionary process. Cool idea, but poor execution. The first problem is that they only include cookbooks from Brazil and Europe, and then argue “cultural invariance” of some parameters. Fail. They plot everything log-log without stating goodness-of-fit. Sketchy. I’m fairly sure their Zipf-fit statistics are insensitive to ingredient outliers, which are really the heart of a culinary tradition. Their evolutionary approach is phenomenological, which is A-O.K., but they draw some feels-too-good-to-be-true conclusions therefrom, i.e. that cultural specificity is immune to alien encroachment. This is clearly violated by, say, Italian cuisine before and after the introduction of the tomato. If you tried this with near and far Asian cookbooks, and a net database like about.com or Carnegie Mellon’s Recipe Database, you’d be getting somewhere.
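For the curious: the goodness-of-fit check I’m complaining about is cheap to do. Here’s a minimal sketch (toy recipe data of my own invention, not the paper’s cookbooks) of ranking ingredients Zipf-style and fitting the log-log slope by least squares, with an R² reported alongside the exponent:

```python
import math
from collections import Counter

# Toy recipe corpus -- purely illustrative stand-in for a cookbook.
recipes = [
    ["flour", "egg", "salt", "butter"],
    ["flour", "sugar", "egg", "butter", "vanilla"],
    ["tomato", "salt", "olive oil", "garlic"],
    ["flour", "salt", "water", "yeast"],
    ["egg", "salt", "pepper", "butter"],
]

# Rank ingredients by frequency, as in a Zipf plot.
counts = Counter(ing for r in recipes for ing in r)
freqs = sorted(counts.values(), reverse=True)

# Least-squares fit of log(freq) vs log(rank), plus the R^2 that
# a bare log-log plot doesn't give you.
xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
ys = [math.log(f) for f in freqs]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = my - slope * mx
ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - my) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot

print(f"Zipf exponent ~ {-slope:.2f}, R^2 = {r_squared:.3f}")
```

Even this naive fit tells you whether a straight line on log-log axes is actually earned; a proper treatment would compare against alternative heavy-tailed distributions rather than eyeballing.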
Much more precise is Bayesball: A Bayesian Hierarchical Model for Evaluating Fielding in Major League Baseball. I can’t say I’m a fan of baseball, but I am a fan of strategy development using data generated for other purposes. They take precision camera data generated from MLB 2002-2005, which has a ball-in-play resolution of 4×4 feet. Assuming that a field is a quarter circle a few hundred feet in radius, that’s several thousand pixels. That’s not too shabby. They state 120K balls-in-play, and take into account information sharing across players and co-blame when neighboring fielders fail to make a play. They do a model comparison with MLE, but it’s not clear to me exactly how well this comparison is performed. I’d love to hear what informed baseball fans think of the ratings for the 10 best and 10 worst fielders.
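The back-of-envelope pixel count goes like this (the 330-ft radius is my assumption, roughly a typical outfield fence distance; only the 4×4-ft cell size comes from the paper):

```python
import math

# Field modeled as a quarter circle; radius is an assumed typical
# fence distance, NOT a figure from the paper.
radius_ft = 330.0
cell_ft = 4.0  # ball-in-play resolution stated in the paper

quarter_circle_area = math.pi * radius_ft ** 2 / 4
n_cells = quarter_circle_area / cell_ft ** 2

print(f"~{n_cells:,.0f} cells of {cell_ft:.0f}x{cell_ft:.0f} ft")
```

With those numbers you land in the low thousands of cells, which is a perfectly respectable spatial grid for 120K balls-in-play.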