Saturday, December 8, 2012

Brute Force: Missing Data (Part 4)

A Clockwork Orange, Relax by Frankie Goes To Hollywood and Never Mind the Bollocks. Which is the odd one out, and why?

The answer is Never Mind the Bollocks, because the other two were censored: from cinema release in the former's case and from Radio 1 in the latter's. No-one will ever know for sure whether Relax would have made it to the top of the pops if this hadn't happened, but it seems that Relax being absent from our radios actually helped it reach the dizzy heights of the charts.

In mathematics, when things go missing there's work to be done: extra adjustments and amendments to the full model need to be made, often at some cost. Dealing with missingness can seriously complicate matters. Filling in what isn't there needs to take into account what might have been there, and how the missing parts affect all of the other parts too.

Removing a component of a mathematical model can lead to involved and complex structures.

Now the problem is that this new missing-data model, with all of its extra bells and whistles, is much harder to handle than the simpler, better-behaved model we had when we knew all the parts. Because of this, often the only way to get any work done is to resort to simulation.

Censored data is a common example of a missing data problem. Censored data is data where we do not necessarily observe the true value. In some instances, instead of the true value of whatever it is we're observing, we only know that it is no bigger, or no smaller, than some threshold value. In the case of time, for example, what's called right-censoring happens when we stop observing something at some point, so we only have information about it up to then and not afterwards. For example, if people were monitored for infection whilst in hospital but not once they were discharged, then the censored time would be the time they leave hospital.


Let's suppose that we hadn't heard Relax on the radio up to a few days before the chart countdown, but then our radio broke, so for all we know it may have been played freely after that. If we wanted to make a guess about how well it was going to do in the charts, we'd be missing a bit of important information (we'll leave whether banning it was a good or bad thing for sales for the time being).

Censoring means that if we want to calculate some statistics using this data, say, then we need to account for the fact that some of it is censored. For example, if we wanted the mean average, this is not the simple mean, the sum of all the values divided by how many values there are. This is because some of the values we have are not actually the true values but only lower limits on them. If we were to calculate the average as described, the resulting figure would be an underestimate. In our hospital example we would estimate the average time to infection to be smaller than it actually is, because we wouldn't be taking into account the time after the patients leave hospital and before they get infected (if at all).
If we do take this censoring into account when we're trying to get our answer, things get more involved and complicated. One approach would be to sensibly fill in the missing values using what information we do have at our disposal. Another is to change how we do the calculation. For example, the mean time to infection can be calculated from the probabilities of not yet having had an infection at each time, something called the survival function.
When the models are bigger and more complex, coming up with an alternative method, like the survival function formula, becomes even more difficult, and the filling-in, or imputation, method begins to look more appealing. This can mean imputing lots and lots of missing values, over and over again, so that we get an idea of how the values we're coming up with affect the output. If you think this sounds like a job for brute force, you'd be dead right.
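To make the hospital example concrete, here is a minimal Python sketch using invented numbers. It assumes that infection times are exponential with a mean of 10 days and that discharge days are uniform between 0 and 40. It compares the naive mean of the observed times with the area under a Kaplan-Meier estimate of the survival function (the probability of not yet being infected by time t). Kaplan-Meier is one standard way of turning those probabilities into a mean, though not necessarily the method the post has in mind.

```python
import random

def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) = P(not yet infected by time t).

    events[i] is True for an observed infection, False for a censored time
    (the patient was discharged first). Returns the step points
    [(t, S(t)), ...] and the largest observed time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    steps = [(0.0, 1.0)]
    i = 0
    while i < len(data):
        t = data[i][0]
        infections = removed = 0
        # Process tied times together.
        while i < len(data) and data[i][0] == t:
            infections += data[i][1]
            removed += 1
            i += 1
        if infections:
            surv *= 1 - infections / at_risk
            steps.append((t, surv))
        at_risk -= removed
    return steps, data[-1][0]

def restricted_mean(steps, tau):
    """Area under the step function S(t) from 0 to tau."""
    ends = [t for t, _ in steps[1:]] + [tau]
    return sum(s * (end - start) for (start, s), end in zip(steps, ends))

random.seed(1)
TRUE_MEAN = 10.0  # hypothetical mean number of days to infection
n = 5000
times, events = [], []
for _ in range(n):
    infection = random.expovariate(1 / TRUE_MEAN)
    discharge = random.uniform(0, 40)  # hypothetical discharge day
    times.append(min(infection, discharge))
    events.append(infection <= discharge)

naive = sum(times) / n  # treats discharge times as if they were infection times
steps, tau = kaplan_meier(times, events)
km = restricted_mean(steps, tau)
print(f"naive mean {naive:.2f}, survival-based mean {km:.2f}, true mean {TRUE_MEAN}")
```

With these settings the naive figure should come out well short of 10, at around 7.5 days. The survival-based figure should land close to 10. It falls slightly short because the survival curve can only be estimated up to the last observed time.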
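The brute-force alternative can be sketched in the same invented setting: fill in every censored time with a plausible value, compute the mean, and repeat many times. This sketch rests on an assumption: infection times are taken to be exponential. Under that model, the extra time after a censoring point is just a fresh draw from the fitted distribution (the "memoryless" property). A full multiple-imputation scheme would also redraw the model's rate each time to reflect uncertainty about it. This version skips that step for brevity.

```python
import random
import statistics

random.seed(2)

# Hypothetical hospital data: true infection times are exponential with mean
# 10 days, but patients are only watched until discharge (uniform, 0-40 days).
n = 2000
times, events = [], []
for _ in range(n):
    infection = random.expovariate(1 / 10)
    discharge = random.uniform(0, 40)
    times.append(min(infection, discharge))
    events.append(infection <= discharge)

# Fit the assumed exponential model: the maximum-likelihood rate is
# (number of observed infections) / (total time under observation).
rate = sum(events) / sum(times)

def impute_once(times, events, rate):
    """Fill in each censored time by adding a draw from the fitted model,
    then return the mean of the completed data."""
    filled = [t if e else t + random.expovariate(rate) for t, e in zip(times, events)]
    return sum(filled) / len(filled)

# Brute force: impute over and over again and look at the spread of answers.
estimates = [impute_once(times, events, rate) for _ in range(500)]
print(f"naive mean:   {sum(times) / n:.2f}")
print(f"imputed mean: {statistics.mean(estimates):.2f} "
      f"(spread across imputations: {statistics.stdev(estimates):.2f})")
```

The spread across the 500 imputations indicates how much the filled-in values move the answer around. That is exactly the "over and over again" part of the argument.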


Simulation Talk



If you read a newspaper last week you may have come across the Twin story. If you didn’t hear about it, let me fill you in.
This was a paired cohort trial across Europe, consisting of nearly 10k recruited-at-birth twin kids. Their parents agreed to raise one child healthy and one unhealthy up to 18 years of age. So, at last, science would be able to definitively answer questions about lifestyle.
The day-to-day habits of each twin were very different. For example, one twin took up all kinds of fitness hobbies like tennis, football, rugby and marathon running; the other took up Playstation and crisps. One twin would do boxing and the other box-sets.
But maintaining this trial wasn’t always easy. Small children can be really stubborn when they want to be. We all know that kids are selfish, and even when involved in the noble pursuit of concrete scientific evidence they can still only look out for Number One. An example exchange over the dinner table went something like this:
  •  (twin) Muuuum, but I don’t want any more fags
  •  (mum) Well if you don’t smoke all your fags you won’t have any pudding! Here’s a lighter, now start smoking!!
Some kids were on 50-a-day. That’s fags and Big Macs, not to mention the booze. This was like Super Size Me for 0 to 18 year olds, to prove once and for all the effects of a bad lifestyle on health.
The results? We had thousands of real-live Danny Devitos and Arnold Schwarzeneggers.
Of course, this isn’t a real trial. No children were forced to chain smoke or maintain a heavy drinking habit, mainly because of annoying things like “ethics” and “morals” getting in the way.
But there is a branch of scientific research that inhabits a world where you can make babies smoke and down flaming shots. No, not in Middlesbrough but the world of computer simulation.

Computer simulations allow us to experiment with “what-if” scenarios. What if I stopped smoking at 30? What if I get off the bus one stop early? What if I eat my 5-a-day? The simulated world is a bit like Sim City or World of Warcraft but without the elves. We can investigate the effect of different interventions, like giving over-60s statins, or we can compare the effects of disease prevention against treatment.

Teams of computer scientists, statisticians, public health experts and clinicians develop superfast computer models that harness new computing power to produce answers quicker than ever, so we can simulate more and more people in more and more detail, giving us better and better answers.
Probabilistic microsimulations can follow someone from birth to their inevitable death, and can tell individual stories for tens of millions of people, like the entire UK population.
In this time of austerity, these models can help decide where best to put the available money for maximum impact.
But, like an American infomercial, this isn’t all. There’s a way that these models can be used more directly to help you and me.
I went to the doctor’s recently and I swear he was looking up what was wrong with me on Wikipedia. I wouldn’t have minded so much, but he said it was a toss-up between the plague and thrush.
But the PCs on GPs’ desks can be put to use with the simulation models too. “Informatics” aims to provide desktop tools that help the doctor and patient. Live evidence can be communicated to the patient about the effects of any change (or not) in their lifestyle. Using interactive widgets like sliders, dials and infographics, the patient can be involved in the process and take ownership of their health decisions, which should make those decisions more motivating and effective. They can see the likely outcomes of different courses of action.
With more data available than ever before, detailed models, and powerful, web-based, user-friendly interactive tools to use them, hopefully in the future there’ll be fewer Danny Devitos and more Arnold Schwarzeneggers.
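As a toy illustration of a probabilistic microsimulation, here is a sketch that follows simulated people year by year from birth until death. It answers the "what if I stopped smoking at 30?" question by comparing scenarios. Every number in it is invented for illustration. The mortality curve is a made-up Gompertz-style curve on which risk roughly doubles every eight years. Smoking is assumed to start at 16 and to double the yearly risk. The real models mentioned above are far more detailed.

```python
import math
import random

def annual_death_prob(age, smoker):
    """Hypothetical yearly probability of dying: risk roughly doubles every
    eight years. Invented numbers, not taken from any real life table."""
    q = 0.0001 * math.exp(0.085 * age)
    if smoker:
        q *= 2.0  # hypothetical relative risk for a current smoker
    return min(1.0, q)

def simulate_life(quit_age, rng):
    """Follow one person year by year from birth; return their age at death.

    quit_age=None means they never smoke. Otherwise they smoke from 16
    (hypothetical) until quit_age; math.inf means they never quit.
    """
    age = 0
    while True:
        smoker = quit_age is not None and 16 <= age < quit_age
        if rng.random() < annual_death_prob(age, smoker):
            return age
        age += 1

def mean_age_at_death(quit_age, n=5000):
    # Person i gets the same random numbers in every scenario ("common random
    # numbers"), so differences between scenarios come from the intervention
    # rather than from the luck of the draw.
    return sum(simulate_life(quit_age, random.Random(i)) for i in range(n)) / n

scenarios = {"never smoked": None, "quit at 30": 30, "never quit": math.inf}
results = {label: mean_age_at_death(q) for label, q in scenarios.items()}
for label, mean_age in results.items():
    print(f"{label:>12}: mean age at death {mean_age:.1f}")
```

Reusing each simulated person's random numbers across scenarios is a design choice. It means that each individual's story can only get shorter as their smoking increases. Any difference between the scenarios is therefore down to smoking, not noise.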

Sunday, December 2, 2012

Brute Force (Part 3): Bruce & Ellie


The three great essentials to achieve anything worthwhile are: Hard work, Stick-to-itiveness, and Common sense.


 Thomas Edison

The book Thinking, Fast and Slow personifies its two ways of thinking, System 1 and System 2, as characters: System 1 is the fast-thinking, intuitive, snap-judgment guy and System 2 is the slow-thinking, logical, thorough kind of guy. But System 2 is lazy and would rather put his feet up and let System 1 take responsibility for decisions if he can get away with it.

In the same vein, meet Bruce Force and Eleanor Gent (Ellie to her friends).

Bruce is a stereotypical man’s man. He walks around in a lumberjack shirt or a leather jacket with the collar up. He is, or at least likes to think he is, an Alpha male. He likes to take charge and order other people around. He thinks he knows the best way to do things, and that best way is to get stuck in. Often he hasn’t the faintest idea. His house is full of DIY disasters where he just ploughed straight in, only to find a little later down the line that he’d done it upside-down and back-to-front. The common occurrence of a failed IKEA construction is never his fault. Perhaps they’ve packed the wrong part. For Bruce, the very idea of consulting the instructions is laughable. He is persistent though. He is the very embodiment of Edison’s Stick-to-itiveness.

Ellie is altogether a different character. She wears thick milk-bottle-top glasses and a sensible bob haircut, which she cuts herself. She always has an eye for a bargain and can make, literally, pounds of savings by studiously buying what’s on offer at the supermarket and using vouchers and accrued Clubcard points. She prides herself on being prepared and always has a torch, spare batteries and a blanket in the boot of her economical car for when she goes camping, just in case. She is never one to rush into a decision and hates being pressurised, even when she’s deciding what to have in her Subway sandwich. Her home is an efficient, organised machine. She is never left looking for her car keys. And when she moves house everything is boxed up and labelled for a smooth, alphabetised transition at the other end. She is not one for spontaneity, and her social calendar is booked up months in advance.

So Ellie and Bruce are two peas in very different pods. Neither always does things "the right way", and each has their moments of success and failure. Bruce is probably better on a night out, but Ellie would be good to have around the next day to help make cups of tea.

But this isn't to say that Bruce and Ellie are always at odds. In fact, things can really get done when they work together. The Four Colour Problem is an example of when they worked as a team to solve a problem that had resisted solution for over a century. Ellie works out the things that could be a possible solution, and then Bruce charges in, like a wound-up Duracell bunny, to try them all out. Their skills can complement each other in cases like this. It's fair to say that a long-term relationship between the two is unrealistic, though. Bruce’s sock drawer would drive Ellie mad.

Monday, November 26, 2012

Brute Force (Part 2): hail the monkeys

Great ideas originate in the muscles
Thomas Edison

The phenomenal increase in computing power in recent times has allowed previously fanciful methods of attacking a problem to become feasible. Inspiration is no longer as prized as it once was since the perspiration can be off-loaded to the computer to silently toil away.

In many ways, this is a triumph for the monkey over Shakespeare; a case of the ape getting one over the bard. The famous analogy used to explain the concept of infinity is apt to explain the victory of brute force over sophistication too.

If a room full of an infinite number of monkeys (not a room I'd like to spend much time in) were each to bash away at a keyboard, then they would surely come up with all of the works of Shakespeare, an infinite number of times indeed. This is the limiting case of a brute force approach, taken ad nauseam. In this scenario the monkeys represent a lack of sophistication: blind trial and error, truly random attempts. But given enough monkey power, perhaps not an infinite amount, we may still get something that resembles a half-decent play. Or Jesus Christ Superstar, perhaps. This concession to approximation can be found everywhere in modern mathematics, statistics and computing, and the results are often good enough for whatever question they are required to answer. In this analogy, monkeys equate to computing power, the blue-collar workforce of science. When monkeys are cheap labour, why not just ring up the job centre and hire some more, rather than spending time and effort trying to train one up? Monkey creative writing classes have taken a detrimental hit with this glut of cack-handed typists.
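To see how quickly "enough monkey power" grows, here is a small brute-force sketch of one monkey typing uniformly at random on a keyboard. The keyboard is assumed to have only the 26 lowercase letters. The sketch counts the keystrokes needed before the word "ape" turns up. For a word like this, which doesn't overlap with itself, the average waiting time works out to 26 to the power of the word's length: 17,576 keystrokes for three letters. By the same sum, a single 30-letter line would take around 10^42 keystrokes.

```python
import random
import string

def keystrokes_until(target, rng, alphabet=string.ascii_lowercase):
    """Count random keystrokes until the last len(target) characters typed
    spell out `target`."""
    window = ""
    count = 0
    while window != target:
        # Keep only the most recent len(target) characters.
        window = (window + rng.choice(alphabet))[-len(target):]
        count += 1
    return count

rng = random.Random(42)
trials = [keystrokes_until("ape", rng) for _ in range(200)]
average = sum(trials) / len(trials)
print(f"average keystrokes to type 'ape': {average:.0f} (theory: 26**3 = {26**3})")
```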


So, you may well ask, what’s the big problem? Style has been usurped by strength, but if it gets the job done as well as or even better than before, isn’t that a good thing?

The answer to this question is not black and white, as is often the way.

On one side, the old way of doing things, which we’ll simply call the elegant way from now on, provided a deeper understanding of the problem. It forced us to delve deep and see things that perhaps the cruder approaches wouldn’t. Being made to compare with other problems, or to think outside the box, has revealed the kind of connections and understanding that only come from serious brain work.

On the other hand, simply getting down to it and getting on with it has allowed previously nightmare-inducing problems to be combatted and defeated. Scientific impasses are not what they used to be. It’s fair to say that both approaches have their place. Like some slippery politician, there really is no straight answer to which one to use. It depends. It’s just important not to get attached to doing things one way over the other without appreciating which tool is best for the job at hand. Sometimes cracking a nut with a sledgehammer is just silly, unless the nut is as big as a house. But then, in the latter case, maybe we should really be worried about how big the squirrel is.

Sunday, November 25, 2012

Brute Force: The End of Elegance (Part 1) Intro

Our greatest weakness lies in giving up. The most certain way to succeed is always to try just one more time.
 Thomas Edison


This is the story of how brawn has got one over on brains, how the high school jock has come out on top over the class nerd. This is the story of how humans have been defeated by machines. But before you rush down into your bunkers and prepare yourselves for a rationed diet of baked beans and spam for the foreseeable future, I’m not talking about an I, Robot or Terminator kind of robot revolution overthrowing their human oppressors. What I am talking about is how the machine, the computer, has replaced much of the imagination, subtlety and beauty of human thought with sheer, bloody-minded elbow grease: the power to crunch numbers at breathtaking speeds that, had anyone suggested them only a few years ago, would have been thought loonier than George Loony starring in Loonraker.

Some of the best mathematics, for whatever "best" means, is often referred to as elegant. To call something elegant in the mathematics world is high praise indeed. Elegance can mean the simple solution to a seemingly complex problem, or a surprising route from A to B, by-passing the ugly commuter towns of C, D, E, F and G. An elegant solution is often short, quick and tidy. It’s the mathematical form of efficiency, but with style: imagine a mix of German and Italian stereotypes (without the Fascism). An elegant solution is more likely to arise from some serious head scratching. By working the grey matter before even picking up a pencil or a piece of chalk, we hope to save time and resources, as well as creating something with an artistic bent. Like reading the Sky+ manual before plugging it in, the setting-up will be less painful. Less haste, more speed was the mathematician’s mantra. To mix metaphors, step back and survey the landscape before diving in.

In the days when the only things at the mathematician’s disposal were pen and paper, they had little choice but to mull over the problem at hand until they could spot a clever way to tackle it. This might include wrapping up the problem in a form they were more familiar with and could get a better grip on, or approximating bits of the problem so the maths wasn't so impenetrable. Or, in some of the most elegant cases, this meant looking at the problem from a whole new perspective, standing on one leg on the table with one eye shut. These kinds of insights can produce the "eureka" moment. Unfortunately, such moments are unpredictable, and the chin-stroking approach can be frustrating and lead to whole estates of dead ends, regardless of how many hot baths you take or apple trees you sit under.

In the world of science in recent years, things have changed. When inspiration has long deserted the cause and perspiration is all that is left there is another option. Elegance and beauty have been replaced by raw power and brute force. Like Linford Christie replacing Carl Lewis or Drogba replacing Messi. Now, often the first port of call for a mathematician is to plunge straight into pummeling the numbers. Act first and think later, like a cop working outside of the law, set on revenge.

Saturday, November 10, 2012

Conditional Life Expectancy


I got a card on my last birthday that said

Birthdays are good for you. Statistics prove that the people who have the most live the longest
Larry Lorenzoni

I googled Larry and it turns out he has a suspect background with children, but that aside it's a pretty good quote for a card. I then realised that it is actually an interpretation of the conditional life expectancy that you get in life tables. The headline figure from most life tables is life expectancy at birth, but as you get older, by the simple fact that you've survived up to that point, your life expectancy changes. Having made it to another birthday, you can expect to live to an older age than you could before. It's not just the birthday you've reached: your expected time of death is nudged further off into the future too. The gap between now and the fateful day may get smaller, but there'll always be a bit more life tagged on to where you are today.
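Here is a sketch of how a life table gives conditional life expectancy. It uses an invented mortality curve, not real data. l(x) is the fraction of newborns still alive at exact age x. Given that you have reached age x, the expected number of further complete years lived is e(x), the sum of l(k)/l(x) over all ages k above x. Adding x to that gives an expected age at death, and the question is how this figure changes with each birthday.

```python
import math

def life_table(max_age=120):
    """Hypothetical survivorship l(x): fraction of newborns alive at exact age x."""
    l = [1.0]
    for age in range(max_age):
        q = min(1.0, 0.0001 * math.exp(0.085 * age))  # invented death probability
        l.append(l[-1] * (1 - q))
    return l

def expected_age_at_death(l, x):
    """Expected age at death (in whole years) given survival to exact age x.

    e(x) = sum over k > x of l(k) / l(x) is the expected number of further
    complete years lived; adding x turns it into an age.
    """
    return x + sum(l[x + 1:]) / l[x]

l = life_table()
for x in (0, 40, 65, 80, 95):
    print(f"reached {x:>2}: expected age at death {expected_age_at_death(l, x):.1f}")
```

The printed ages should only ever go up, whatever mortality curve you plug in. Writing p(x) for the chance of surviving the year after age x, e(x) = p(x)(1 + e(x+1)). So one more birthday raises the expected age at death by (1 - p(x))(1 + e(x+1)), which can never be negative.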

Looking at the quote another way, it's a confusion of cause and effect. It's not that people who have the most birthdays live the longest, but rather that people who live the longest have the most. A birthday is a consequence of ageing and not the other way around.

Then, of course, you could just say that it's a birthday card and stop sucking all the fun out of it. Unfortunately, though, I think the fun-sucking happens more and more with age.