I went to the weekly computing seminar yesterday, because it was on a statistical data mining tool that is being used simultaneously by physics experiments and marketing firms. The speaker is a physics professor who used to work for the PLUTO and DELPHI collaborations and is now variously associated with CDF and CMS. The company, Phi-T, is now fully private and employs a couple dozen ex-physicists, or physicists, depending on how much of a purist you are. The software is proprietary and closed source, and the speaker was severely vague about what specific tools were actually used, but there are ready-made interfaces for C++/ROOT/C#/Lisp, so it’s (supposedly) trivial to use, with a discounted academic licence.
So what is it? Basically, you have a vector of measurables, like detector channels, and some target, like, say, a resonance mass. Or age, profession, and number of kids, with the target being “How much will this person cost us in health care in the next n years?” You then train the thing on your historic or simulated data, and it generates Bayesian posterior distributions for new data. This is pretty common in the neural computing literature, but this thing seems actually practical.
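For concreteness, here is roughly how I picture that workflow. This is a toy sketch with off-the-shelf tools, emphatically not Phi-T’s actual algorithm; every choice below (the classifier, the binning, the variable names) is mine: discretize the target, train a classifier on the input vector, and read the predicted class probabilities as a binned posterior for each new event.

```python
# Toy sketch of "train on historic data, get a posterior per event"
# (my guess at the workflow, not Phi-T's algorithm).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# fake "historic" data: x = vector of measurables, y = target (e.g. a mass)
n = 5000
x = rng.normal(size=(n, 3))
y = 2.0 * x[:, 0] + np.sin(3.0 * x[:, 1]) + 0.3 * rng.normal(size=n)

# discretize the target so the output becomes a probability histogram
edges = np.quantile(y, np.linspace(0, 1, 21))
y_bin = np.digitize(y, edges[1:-1])          # bin index 0..19 per event

clf = GradientBoostingClassifier(n_estimators=50).fit(x, y_bin)

# binned "posterior" for one new event: one probability per target bin
x_new = rng.normal(size=(1, 3))
posterior = clf.predict_proba(x_new)[0]
print(posterior.round(3), posterior.sum())   # sums to 1 by construction
```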
The only really fascinating thing is the generality of the thing, which was (supposedly) applied with minimal expert consultation on problems ranging from car insurance premiums to B_s mixing at CDF. Here’s a list of refereed journal articles with their stamp. So what’s inside? A neural network, you say? No! The guy said that in most applications they skip the neural net entirely and just use “other” statistical methods. It’s clear that he was using some kind of input decorrelation like principal component analysis, but he wouldn’t say what specifically. He used a bunch of phrases that were cryptic to me, like “zero layer network” to mean something other than a perceptron (I asked), and “zero iteration training” of a network. Maybe these things mean something to y’all statters, but nothing to me. Anyways, the output of whatever-it-is was a discretized probability histogram that got splined together.
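Since he only hinted at the ingredients, here is my guess at what “input decorrelation” and “splining a discretized histogram” could look like with standard tools. To be clear, PCA and a cubic spline are my stand-ins, not anything reverse-engineered from the product:

```python
# Two guessed ingredients: decorrelate inputs with PCA, and smooth a
# binned probability histogram with a spline (standard tools, my choices).
import numpy as np
from sklearn.decomposition import PCA
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(1)
x = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=1000)

# 1) input decorrelation: the PCA components are uncorrelated
x_dec = PCA().fit_transform(x)
print(np.corrcoef(x_dec.T).round(3))         # ~identity off-diagonal

# 2) spline a discretized probability histogram into a smooth curve
edges = np.linspace(-4.0, 4.0, 21)
hist, _ = np.histogram(x[:, 0], bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
density = CubicSpline(centers, hist)         # evaluate anywhere, e.g. density(0.3)
```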
I’m unconvinced that the “default settings” he mentioned could both schedule re-stocks for the largest book distributor in Germany AND find the X(3872) resonance, but what do I know? He also said that the company’s own stocks were controlled by this thing, but that selling it for this purpose is somehow illegal. Anyone know what he was talking about? Here’s a paper on it, by the speaker.
In the end, the talk was a sales-pitch/head-hunt, but if anyone out there needs to solve a highly nonlinear problem and has a cushy grant, go nuts.
Sorry to say this but that tag is quite a monster – it should read “Bayesian methods”. Yeesh.
PCA is a horrible choice of pre-processing step for a few reasons (without reweighting it is just a rotation, so it buys you nothing for any training afterward that is rotationally symmetric), but yes, it is probably a proprietary something else.
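To make that concrete, here is a quick numerical check (standard libraries, toy data of my own): plain PCA is a centred orthogonal rotation, so pairwise distances, and hence anything a rotation-invariant learner sees, are untouched.

```python
# Check that plain PCA (no rescaling) is just a centred rotation:
# the component matrix is orthogonal and pairwise distances are preserved.
import numpy as np
from sklearn.decomposition import PCA
from scipy.spatial.distance import pdist

rng = np.random.default_rng(2)
x = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # correlated features

pca = PCA()                        # keep all components, no whitening
x_rot = pca.fit_transform(x)

print(np.allclose(pca.components_ @ pca.components_.T, np.eye(5)))  # True: orthogonal
print(np.allclose(pdist(x), pdist(x_rot)))                          # True: distances unchanged
```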
“Zero layer network” sounds meaningless at face value. Zero iteration sounds like it follows immediately from zero layers…
The rest of it made me think that maybe it’s a working version of WinBUGS (http://www.mrc-bsu.cam.ac.uk/bugs/), which earns its name in both respects (Windows-only and full of bugs). It is kind-of free.
The free (and supposedly better performing) version is here: http://www-ice.iarc.fr/~martyn/software/jags/
The “GS” in both of these stands for “Gibbs sampling”, and as I understand it, they use slice sampling/importance sampling to implement the Gibbs steps for distributions whose conditionals have no closed form.
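In case “slice sampling inside Gibbs” sounds mysterious, here is a bare-bones toy of the idea (my own illustration, nothing to do with the BUGS/JAGS internals) for a made-up 2-D density whose conditionals have no standard form:

```python
# Toy "slice sampling within Gibbs" for p(x, y) ∝ exp(-(x^2 + y^2 + (x*y)^4)/2),
# whose full conditionals are non-standard (my illustration, not BUGS/JAGS code).
import numpy as np

rng = np.random.default_rng(3)

def slice_sample(logp, x0, w=1.0, max_steps=50):
    """One slice-sampling update for a 1-D log-density (stepping out + shrinkage)."""
    logy = logp(x0) + np.log(rng.random())      # height of the slice
    left = x0 - w * rng.random()                # random initial bracket of width w
    right = left + w
    for _ in range(max_steps):                  # step the bracket out
        if logp(left) < logy:
            break
        left -= w
    for _ in range(max_steps):
        if logp(right) < logy:
            break
        right += w
    while True:                                 # shrink until a point lands on the slice
        x1 = rng.uniform(left, right)
        if logp(x1) >= logy:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1

def log_cond(u, v):                             # log p(u | v), up to a constant
    return -0.5 * (u * u + (u * v) ** 4)

x, y = 0.0, 0.0
samples = []
for _ in range(5000):                           # Gibbs: alternate the two conditionals
    x = slice_sample(lambda u: log_cond(u, y), x)
    y = slice_sample(lambda v: log_cond(v, x), y)
    samples.append((x, y))
samples = np.asarray(samples)
print(samples.mean(axis=0), samples.std(axis=0))
```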
Anyway, it seems (from your description and the website) like the step up from those proggies to NeuroBayes is its claimed “density reconstruction”, whereby it learns probability distributions ab initio. Quite a claim – it’s easy to see how this is a gruesomely under-specified problem (unless you have infinitely many data points). I’m sure that this software will find a solution for these problems… Of course I can give you one too: the Dirac mass on your observed values (i.e. the empirical distribution itself).
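And just to show that the Dirac “solution” is not purely a joke: with no smoothness constraint, maximum likelihood really does want to pile spikes on the observed points. A tiny demonstration (my toy, unrelated to the product), using a KDE whose in-sample log-likelihood grows without bound as the bandwidth shrinks:

```python
# Unconstrained "density reconstruction" is under-specified: the in-sample
# log-likelihood of a KDE keeps rising as the bandwidth shrinks toward
# Dirac spikes on the data (toy demonstration).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(4)
data = rng.normal(size=200)

for bw in [1.0, 0.1, 0.01, 0.001]:
    kde = gaussian_kde(data, bw_method=bw)
    print(bw, np.sum(np.log(kde(data))))    # keeps rising as bw shrinks
```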
Then again, you can do much worse than just using normal assumptions (maybe auto-checking a few heavy-tailed/one-sided/clever alternatives with some information criterion to guard against overfitting) and churning on that. 95% of the result for 5% of the effort…
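Something like this, say (standard scipy fits on toy data of my own; the candidate distributions and AIC are just one reasonable choice):

```python
# Fit a normal and a heavier-tailed alternative by maximum likelihood and
# let an information criterion (AIC) arbitrate between them.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
data = rng.standard_t(3, size=1000)          # secretly heavy-tailed data

candidates = {
    "normal":    (stats.norm, stats.norm.fit(data)),
    "student_t": (stats.t,    stats.t.fit(data)),
}
for name, (dist, params) in candidates.items():
    loglik = np.sum(dist.logpdf(data, *params))
    aic = 2 * len(params) - 2 * loglik
    print(f"{name}: AIC = {aic:.1f}")        # the t fit should win comfortably
```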
(btw, regarding the illegal thing: I knew someone who claimed to have been almost kicked off the Floor by his company for running proprietary strategies in Perl instead of the standard C/C++. Finance is a weird tribe, and I’m sure they live in fear of their regulators, giving wide berth to what laws exist while flagrantly exploiting the _undiscovered_ evils.)