Why RSS+arXiv=Awesome

Not too long ago, I discovered that I can actually skim the arXiv quite fast with RSS feeds. Skimming the headlines in hep-ex, you can just ignore anything with “unparticle” or “New limits on [unmotivated model]-type models” in the title. For all other fields, if I can’t understand the title, I just skip it. The problem lately has been the physics and statistics feeds, which are often too goddamned funny and/or fascinating and/or horrifying to pass up. In “physics,” which is lots of sociology and bio, half the fun is trying to differentiate crackpots from interesting articles that are just speculative. No, looking at their affiliations is not sufficient. Reading stats, I get pangs of longing from how tech in the last decade has allowed people to measure and model crazy systems like ant social dynamics and human voting behavior.
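If you would rather have a script do the skipping, the title rules above can be sketched in a few lines of Python. This is only a rough illustration: the sample feed, its titles, and the names `SKIP_PATTERNS`, `item_titles`, `worth_reading`, and `skim` are all made up for the example, and a real arXiv feed would have to be downloaded and may use a different XML layout.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample feed, invented for illustration. A real feed would be
# downloaded instead and may use a different XML layout.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>hep-ex (sample)</title>
<item><title>Unparticle signatures at colliders</title></item>
<item><title>New limits on leptophobic-type models</title></item>
<item><title>Measurement of the top quark mass</title></item>
</channel></rss>"""

# Title patterns to skip, following the rules of thumb in the post.
SKIP_PATTERNS = [
    re.compile(r"unparticle", re.IGNORECASE),
    re.compile(r"new limits on .*-type models", re.IGNORECASE),
]


def item_titles(feed_xml):
    """Return the title of every <item> element in the feed."""
    root = ET.fromstring(feed_xml)
    return [item.findtext("title", default="") for item in root.iter("item")]


def worth_reading(title):
    """True if the title matches none of the skip patterns."""
    return not any(pattern.search(title) for pattern in SKIP_PATTERNS)


def skim(feed_xml):
    """Keep only the titles that survive the skip patterns."""
    return [title for title in item_titles(feed_xml) if worth_reading(title)]


if __name__ == "__main__":
    for title in skim(SAMPLE_FEED):
        print(title)
```

The "if I can’t understand the title" rule is left as an exercise for a human.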

Take The Nonequilibrium Nature of Culinary Evolution, for example. They take four cookbooks (one being medieval, one consisting of three editions spanning 50 years), and model the frequency of ingredients’ appearances. They further attempt to model the development of recipes as an evolutionary process. Cool idea, but poor execution. First problem is that they only include cookbooks from Brazil and Europe, and then argue “cultural invariance” of some parameters. Fail. They plot everything log-log without stating goodness-of-fit. Sketchy. I’m fairly sure their Zipf-function stats are insensitive to ingredient outliers, which are really the heart of a culinary tradition. Their evolutionary approach is phenomenological, which is A-O.K., but they draw some feels-too-good-to-be-true conclusions therefrom, i.e. that cultural specificity is immune to alien encroachment. This is clearly violated by, say, Italian cuisine before and after the introduction of the tomato. If you tried this with near and far Asian cookbooks, and a net database like about.com or Carnegie Mellon’s Recipe Database, you’d be getting somewhere.

Much more precise is Bayesball: A Bayesian Hierarchical Model for Evaluating Fielding in Major League Baseball. I can’t say I’m a fan of baseball, but I am a fan of strategy development using data generated for other purposes. They take precision camera data generated from MLB 2002-2005, which has a ball-in-play resolution of 4×4 feet. Assuming that a field is a quarter circle with R = 375 feet, that’s like having $\frac{1}{4} \times \pi \times 375^2 / 16 \approx 83 \times 83$ pixels. That’s not too shabby. They state 120K balls-in-play, and take into account player information sharing, and co-blame when neighboring fielders fail. They do a model comparison with MLE, but it’s not clear to me exactly how well this comparison is performed. I’d love to hear what informed baseball fans think of the ratings for 10 best and 10 worst.
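For anyone who wants to check the pixel estimate, here is the same back-of-the-envelope arithmetic as a small Python sketch. The quarter-circle field with R = 375 feet is my own simplifying assumption from above, not something taken from the paper, and the function name `equivalent_grid_side` is made up for the example.

```python
import math


def equivalent_grid_side(radius_ft=375.0, cell_ft=4.0):
    """Count cell_ft x cell_ft cells in a quarter circle of the given radius.

    Returns the number of cells and the side length of a square grid
    with the same number of cells.
    """
    area = 0.25 * math.pi * radius_ft ** 2   # quarter-circle area in square feet
    cells = area / (cell_ft ** 2)            # each cell covers cell_ft^2 square feet
    return cells, math.sqrt(cells)


if __name__ == "__main__":
    cells, side = equivalent_grid_side()
    print(f"{cells:.0f} cells, about {side:.0f} x {side:.0f}")
```

Changing `radius_ft` shows how sensitive the "image size" is to the assumed field dimensions.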


4 responses to “Why RSS+arXiv=Awesome”

  1. You can use Yahoo Pipes to make your own very precise RSS feed. I use it to filter cond-mat and quant-ph for keywords like “josephson junction” and “quantum information.”

    See a nice tutorial here:

  2. Hi Homer,

    interesting article. I just wanted to say I’m a little worried how you say “take into account player information sharing”. They don’t model player interaction (a la game theory), but rather they do a partial-pooled estimate for individual performance. That is, players with fewer measurements get an estimate that’s mostly a function of more informed estimates of similar players (i.e. an infielder who’s been in one game and got a safe score of 12 will get an estimated safe score much closer to the mean infielder score, a sort of regression). But maybe you knew that – the phrasing was just a bit odd to me so I decided to jump in.

    It’s cool that they used the camera – it seems like that’s the great frontier in sports statistics, using actual physical data. In my opinion, sports is the metrosexualized version of military statistics, which is way out of fashion compared to anytime up to and including the 70s. In this way, they are finally taking the mantle and using all-out empirics.

    I wonder how they got this data? Ah, yes, they got a grant from ESPN. (yawn) A thousand other people could have done what they did, they just happened to have some political connection. (Which of course means that the real work they didn’t publish; arXiv and even real journals are more or less self-promotion mechanisms for this kind of thing). I have several ideas to extend their analysis, as of course does anyone who knows Bayesian methods, but we can’t. The true, and dreary, way to see the world…

    Their comparison to MLE is gratuitous and silly in any context but one-quarter to one-half of a lecture in a graduate intro stats class – read e.g. ch. 5 of _Bayesian Data Analysis_, 2nd ed., Gelman et al., if you want to learn about it.

    As compared to this, the food article may actually be insane; Zipf’s law always makes me nervous, and even so I do wonder, if it is true and truly cross-cultural, whether it implies anything interpretable about humanity or is just a trivial fact about the nutritional value of ingredients. … But it reminds me of something I read in Saveur: how ingredients enter the diet for economic reasons and then become ‘frozen’, and thus we should judge authenticity in cuisine by reference to the context. I think he had an example of an organ meat casserole from rustic Italy which sells for >$20 in some restaurant and would take you cross-town trips even in NY to gather the ingredients. Which is ridiculous since it was peasant subsistence food; making a cheap and rich sauce to accompany and make generic-filler hotdogs edible would be much more “authentic” as far as cooking goes, and the organ dish should be served at a ren faire.

  3. Hey Tim, I was trying to be vague with the “information sharing” term, but I forgot that it already has strong game theoretic connotations. I’ll be honest. I work with pretty trivial statistical methods. I can’t pretend to understand the nuts and bolts of that paper.

    The first time I heard about Zipf’s law (in the context of highway connectivity) the speaker called it “The Signature of Intelligent Activity” or some such nonsense, failing to mention that most of the distributions in the universe can be acceptably fit to that family of functions. I get real skeptical when someone brings it out for anything other than a useful fit. In the end, everything looks good on a log-log plot. Without goodness-of-fit values or relative difference plots, those plots say nothing.

    The Saveur thing is a cool observation. I guess that until the last 50 years or so, food was inseparable from the family, and was therefore subject to the same cultural inertia. I wonder if you could argue some kind of diet diversity increase that would be separate from increases in economic production or transportation speed of goods, just based on the decoupling of diet from the family structure.

  4. Pingback: Bayesball: Update « Imaginary Potential
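
The partial pooling described in comment 2 can be illustrated with a toy calculation. This is only a minimal sketch assuming a simple normal-normal model with known variances, which is not the hierarchical model the Bayesball paper actually fits. The function name `partial_pooled_mean` and every number below are hypothetical, chosen only to show the shrinkage effect.

```python
def partial_pooled_mean(player_mean, n_obs, group_mean, within_var, between_var):
    """Shrink a player's raw average toward the group average.

    Assumes a normal-normal model with known variances (an illustrative
    simplification). The weight on the player's own data grows with the
    number of observations.
    """
    if n_obs == 0:
        return group_mean  # no data at all: fall back on the group average
    data_precision = n_obs / within_var    # information in the player's own data
    prior_precision = 1.0 / between_var    # information carried by the group
    weight = data_precision / (data_precision + prior_precision)
    return weight * player_mean + (1.0 - weight) * group_mean


if __name__ == "__main__":
    # Hypothetical numbers: both players have a raw score of 12 and the
    # group average is 0, but they have very different amounts of data.
    rookie = partial_pooled_mean(12.0, 1, 0.0, within_var=25.0, between_var=1.0)
    veteran = partial_pooled_mean(12.0, 400, 0.0, within_var=25.0, between_var=1.0)
    print(f"rookie estimate:  {rookie:.2f}")
    print(f"veteran estimate: {veteran:.2f}")
```

With these made-up inputs the one-game player is pulled almost all the way to the group average, while the player with many observations keeps an estimate close to the raw score.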
