Saturday, December 8, 2012

Brute Force: Missing Data (Part 4)

A Clockwork Orange, Relax by Frankie Goes To Hollywood and Never Mind the Bollocks. Which is the odd one out and why?

The answer is Never Mind the Bollocks, because the other two were censored: the former from cinema release and the latter from Radio 1. No one will ever know for sure whether Relax would have made it to the top of the pops if this hadn't happened. But it seems that Relax being absent from our radios actually helped it reach the dizzy heights of the charts.

In mathematics, when things go missing there's work to be done: extra adjustments and amendments to the full model need to be made, often at some cost. Dealing with missingness can seriously complicate matters. Filling in what isn't there needs to take into account what might have been there and how the missing parts affect all of the other parts too.

Removing some component of a mathematical model can lead to involved and complex structures.

Now the problem is that this new missing-data model, with all of its extra bells and whistles, is much harder to handle than the simpler, better-behaved model we had when we knew all the parts. Because of this, often the only way to get any work done is to resort to simulation.

Censored data is a common example of a missing data problem. Censored data is data where we do not necessarily observe the true value. In some instances, instead of the true value of whatever it is that we're observing, we only know that it is no bigger or no smaller than some threshold value. In the case of time, for example, what's called right-censoring is when we stop observing something at some time, so we only have information about it up to that point and not afterwards. For example, if people were monitored for infection whilst in hospital but not once they were discharged, then the censoring time would be the time they leave hospital.


Let's suppose that we hadn't heard Relax on the radio up to a few days before the chart countdown but then our radio broke, so for all we know it may have been played freely after that. Then if we wanted to make a guess about how well it was going to do in the charts, we'd be missing an important bit of information (we'll leave aside for the time being whether banning it was a good or bad thing for sales).

Censoring means that if we want to calculate some statistics from this data then we need to account for the fact that some of it is censored. For example, if we wanted a mean average then this is not simply the sum of all the values divided by how many values there are. This is because some of the values we have are not the true value but just a lower limit on it. If we were to calculate the average as described then the resulting figure would be an underestimate. For example, we would estimate that the average time to infection in our hospital example is smaller than it actually is, because we wouldn't be taking into account the time after the patients leave hospital and before they get infected (if at all).
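To see the underestimate in action, here is a minimal sketch in Python. Everything in it is made up for illustration: the function name simulate_hospital_stays, the exponential infection times, the 10-day average and the fixed 7-day hospital stay are all assumptions, not anything from a real study.

```python
import random

def simulate_hospital_stays(n_patients, mean_time_to_infection, stay_length, seed=0):
    """Simulate true infection times and what we would actually record.

    Assumptions for illustration only: infection times are exponential
    and every patient is discharged after the same stay_length days.
    """
    rng = random.Random(seed)
    true_times = [rng.expovariate(1.0 / mean_time_to_infection)
                  for _ in range(n_patients)]
    # We stop watching at discharge, so we record the smaller of the two.
    recorded = [min(t, stay_length) for t in true_times]
    censored = [t > stay_length for t in true_times]
    return true_times, recorded, censored

true_times, recorded, censored = simulate_hospital_stays(10000, 10.0, 7.0)
true_mean = sum(true_times) / len(true_times)
naive_mean = sum(recorded) / len(recorded)

print(f"true mean time to infection:  {true_mean:.2f} days")
print(f"naive mean of recorded times: {naive_mean:.2f} days")
print(f"patients censored at discharge: {sum(censored)}")
```

Because every recorded time is either the true time or something smaller, the naive mean can only come out too low.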
If we do take this censoring into account when we're trying to get our answer then things get more involved and complicated. One approach would be to sensibly fill in the missing values using what information we do have at our disposal. Another is to change how we do the calculation. For example, the mean of the censored times can be estimated using the probabilities of not yet having an infection at certain times, something called a survival function.
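One standard way of estimating those "not yet infected" probabilities from censored data is the Kaplan-Meier estimator, and the area under the resulting curve gives a mean up to some time horizon. The sketch below is my own choice of method to illustrate the idea, with a tiny invented data set; the function names kaplan_meier and restricted_mean are hypothetical.

```python
def kaplan_meier(times, event_seen):
    """Return [(t, S(t))] at each time where an infection was seen.

    event_seen[i] is True if patient i was infected at times[i],
    and False if they were censored (left hospital) at times[i].
    """
    data = sorted(zip(times, event_seen))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group everyone who shares this time.
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if events > 0:
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

def restricted_mean(curve, horizon):
    """Area under the step-shaped survival curve from 0 to horizon."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in curve:
        if t > horizon:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (horizon - prev_t)

# Invented data: days until infection (True) or discharge (False).
times = [2, 3, 3, 5, 7, 7]
event_seen = [True, True, False, True, False, False]

curve = kaplan_meier(times, event_seen)
naive_mean = sum(times) / len(times)
km_mean = restricted_mean(curve, horizon=7)
print(curve)
print(naive_mean, km_mean)
```

The survival-based figure comes out larger than the naive mean, but it is still only a mean "up to day 7": we can't say anything about what happens beyond the last time anyone was watched.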
When the models are bigger and more complex, coming up with an alternative method, like the survival time formula, becomes even more difficult and the filling-in, or imputing, method begins to look more appealing. This can mean imputing lots and lots of missing values, over and over again, so that we get an idea of how the values we're coming up with are affecting the output. If you think this sounds like a job for brute force you'd be dead right.
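Here is a rough sketch of the brute-force, fill-it-in-over-and-over idea. It leans on some loud assumptions: the simulated data, the exponential model (which conveniently means the extra time after discharge has the same distribution as the original time), and the helper names impute_once and repeated_imputation are all mine. A proper multiple imputation would also allow for uncertainty in the guessed mean, which this does not.

```python
import random
import statistics

def impute_once(recorded, censored, mean_guess, rng):
    """Fill in each censored time with discharge time plus a random extra wait."""
    return [t + rng.expovariate(1.0 / mean_guess) if c else t
            for t, c in zip(recorded, censored)]

def repeated_imputation(recorded, censored, n_repeats=200, seed=1):
    """Impute many times; return the average and spread of the resulting means."""
    n_events = sum(1 for c in censored if not c)
    if n_events == 0:
        raise ValueError("need at least one uncensored time")
    # Standard estimate of an exponential mean under right-censoring:
    # total time under observation divided by number of events seen.
    mean_guess = sum(recorded) / n_events
    rng = random.Random(seed)
    means = [statistics.mean(impute_once(recorded, censored, mean_guess, rng))
             for _ in range(n_repeats)]
    return statistics.mean(means), statistics.stdev(means)

# Made-up data: true mean of 10 days, everyone discharged after 7 days.
data_rng = random.Random(0)
true_times = [data_rng.expovariate(1.0 / 10.0) for _ in range(2000)]
recorded = [min(t, 7.0) for t in true_times]
censored = [t > 7.0 for t in true_times]

naive_mean = statistics.mean(recorded)
imputed_mean, imputed_spread = repeated_imputation(recorded, censored)
print(f"naive mean:   {naive_mean:.2f}")
print(f"imputed mean: {imputed_mean:.2f} (spread across imputations {imputed_spread:.2f})")
```

The spread across imputations is the useful by-product: it gives a feel for how much the filled-in values are driving the answer.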


Simulation Talk



If you read a newspaper last week you may have come across the Twin story. If you didn’t hear about it, let me fill you in.
This was a paired cohort trial across Europe, consisting of nearly 10k recruited-at-birth twin kids. Their parents agreed to raise one child healthy and one unhealthy up to 18 years of age. So, at last, science would be able to definitively answer questions about lifestyle.
The day-to-day habits of each twin were very different. For example, one twin took up all kinds of fitness hobbies like tennis, football, rugby and marathon running; the other took up PlayStation and crisps. One twin would do boxing and the other box-sets.
But maintaining this trial wasn’t always easy. Small children can be really stubborn when they want to be. We all know that kids are selfish and even when involved in the noble pursuit of concrete scientific evidence they can still only look out for Number One. An example exchange over the dinner table went something like:
  •  (twin) Muuuum, but I don’t want any more fags
  •  (mum) Well if you don’t smoke all your fags you won’t have any pudding! Here’s a lighter, now start smoking!!
Some kids were on 50-a-day. That’s fags and Big Macs, not to mention the booze. This was like Super Size Me for 0-to-18-year-olds, to prove once and for all the effects of bad lifestyle on health.
The results? We had thousands of real-live Danny DeVitos and Arnold Schwarzeneggers.
Of course, this isn’t a real trial. No children were forced to chain smoke or maintain a heavy drinking habit, mainly because of annoying things like “ethics” and “morals” getting in the way.
But there is a branch of scientific research that inhabits a world where you can make babies smoke and down flaming shots. No, not in Middlesbrough but the world of computer simulation.

Computer simulations allow us to experiment with “what-if” scenarios. What if I stopped smoking at 30? What if I get off the bus one stop early? What if I eat my 5-a-day? The simulated world is a bit like Sim City or World of Warcraft but without the elves. We can investigate the effect of different interventions, like giving over-60s statins, or we can compare the effects of disease prevention against treatment.

Teams of computer scientists, statisticians, public health experts and clinicians develop superfast computer models, harnessing new computing power to produce answers quicker than ever. This means we can simulate more and more people in more and more detail, giving us better and better answers.
Probabilistic microsimulations can follow someone from day one at birth to their inevitable death, and can tell individual stories for tens of millions of people, like the entire UK population.
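To give a flavour of what a microsimulation does, here is a toy sketch in Python. It is nothing like a real public health model: the yearly death risks, the doubling of risk for smokers, the assumption that quitting returns your risk to normal straight away, and all the function names are invented purely for illustration.

```python
import random

def annual_death_prob(age, smoker):
    """Hypothetical chance of dying in the coming year (not real mortality data)."""
    prob = 0.0001 * (1.09 ** age)   # risk creeps up with every birthday
    if smoker:
        prob *= 2.0                 # invented penalty for smoking
    return min(prob, 1.0)

def simulate_life(rng, start_smoking_age=None, quit_age=None, max_age=120):
    """Follow one simulated person year by year; return their age at death."""
    for age in range(max_age):
        smoker = (start_smoking_age is not None
                  and age >= start_smoking_age
                  and (quit_age is None or age < quit_age))
        if rng.random() < annual_death_prob(age, smoker):
            return age
    return max_age

def mean_age_at_death(n_people, seed=0, **scenario):
    """Run the same what-if scenario for lots of people and average the outcome."""
    rng = random.Random(seed)
    total = sum(simulate_life(rng, **scenario) for _ in range(n_people))
    return total / n_people

never = mean_age_at_death(5000)
smoker = mean_age_at_death(5000, start_smoking_age=18)
quitter = mean_age_at_death(5000, start_smoking_age=18, quit_age=30)

print(f"never smokes:        {never:.1f}")
print(f"smokes from 18:      {smoker:.1f}")
print(f"smokes 18, quits 30: {quitter:.1f}")
```

Each call to simulate_life is one individual story; the "what if I stopped smoking at 30?" question is answered by running many stories under each scenario and comparing the averages. A real model would track many more things (weight, blood pressure, diseases, treatments) with risks taken from actual data.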
In this time of austerity these models can help with decisions about where best to put the available money for maximum impact.
But, like an American infomercial, this isn’t all. There’s a way that these models can be used more directly to help you and me.
I went to the doctor’s recently and I swear he was looking up what was wrong with me on Wikipedia. I wouldn’t have minded so much but he said it was a toss-up between the plague and thrush.
But these PCs on GPs’ desks can also be put to use with the simulation models. “Informatics” aims to provide desktop tools that can help the doctor and patient. Live evidence can be communicated to the patient about the effects of any change (or not) in their lifestyle. Using interactive widgets like sliders, dials and infographics, the patient can be involved in the process and take ownership of their health decisions, which is more likely to be motivating and effective. They can see the likely outcomes of different courses of action.
With more data available than ever before, detailed models, and powerful, user-friendly, web-based interactive tools to use them, hopefully in the future there’ll be fewer Danny DeVitos and more Arnold Schwarzeneggers.

Sunday, December 2, 2012

Brute Force (Part 3): Bruce & Ellie


The three great essentials to achieve anything worthwhile are: Hard work, Stick-to-itiveness, and Common sense.


 Thomas Edison

The book Thinking, Fast and Slow describes two ways of thinking, System 1 and System 2, by giving them human characteristics: System 1 is the fast-thinking, intuitive, snap-judgment guy and System 2 is the slow-thinking, logical, thorough kind of guy. But System 2 is lazy and will prefer to put his feet up and let System 1 take responsibility for decisions if he can get away with it.

In the same vein, meet Bruce Force and Eleanor Gent (Ellie to her friends).

Bruce is a stereotypical man’s man. He walks around in a lumberjack shirt or leather jacket with the collar up. He is, or at least likes to think he is, an Alpha male. He likes to take charge and order other people around. He thinks he knows the best way to do things and that best way is to get stuck in. Often he hasn’t the faintest idea. His house is full of DIY disasters where he just ploughed straight in, only to find a little later down the line that he’s done it upside-down and back-to-front. The common occurrence of a failed IKEA construction is never his fault. Perhaps they’ve packed the wrong part. For Bruce, the very idea of consulting the instructions is laughable. He is persistent though. He is the very embodiment of Edison’s Stick-to-itiveness.

Ellie is altogether a different character. She wears thick milk-bottle-top glasses and a sensible bob haircut, which she cuts herself. She always has an eye for a bargain and can make, literally, pounds of savings by studiously buying what’s on offer at the supermarket and using vouchers and accrued Clubcard points. She prides herself on being prepared and always has a torch, spare batteries and a blanket in the boot of her economical car for when she goes camping, just in case. She is never one to rush into a decision and hates being pressurised, even when she’s deciding what to have on her Subway sandwich. Her home is an efficient, organised machine. She is never left looking for her car keys. And when she moves house everything is boxed up and labelled for a smooth, alphabetised transition at the other end. She is not one for spontaneity and her social calendar is booked up months in advance.

So Ellie and Bruce are two peas in very different pods. Neither always does things "the right way" but each has their moments of success and failure. Bruce is probably better on a night out but Ellie would be good to have around the next day to help make cups of tea.

But this isn't to say that Bruce and Ellie are always at odds. In fact, things can really get done when they work together. The Four Colour Problem is an example of when they worked as a team to solve a previously unsolved problem. Ellie works out which things could be a possible solution and then Bruce charges in, like a wound-up Duracell bunny, to try them out. Their skills can complement each other in cases like this. It's fair to say that a long-term relationship between the two is unrealistic though. Bruce’s sock drawer would drive Ellie mad.