Random Observations

Sunday, September 1, 2013

I'm out of there

It definitely wasn't them, it was me. I suppose we'd just grown apart but it still hurt that they'd found someone else so quickly. Someone more fun and interesting.

The feeling of getting dumped is one of the worst things. Great writers, poets and musicians have written countless works on the topic. Romeo and Juliet begins with Romeo in the throes of unrequited love. Blood On The Tracks, arguably Bob Dylan's best album, is considered his "break-up" album. Nearly everyone (bar the 40 year old virgin types) has felt the sense of rejection and self-pity common to so many painful splits.

Well here's my contribution to these annals of great works. Last week I got unceremoniously dumped...by a horde of 12-year-olds! Or at least it felt like I was being dumped.

I'd been taking part in the event I'm a Scientist, Get Me Out of Here, a free online event where school kids get to hang-out with and quiz scientists. It’s an X Factor-style competition between scientists from all different backgrounds where the students are the judges. Imagine hundreds of mini Simon Cowells. Groups of scientists are split-up into "Zones" of about half a dozen and the students challenge the scientists with all kinds of wacky, irreverent questions in online chat rooms. Each half hour slot with the kids is a non-stop, carpal tunnel inducing, fast-paced frenzy of txt spk and inquisition. A whole event lasts a week and each day someone gets voted out by the students for being their least favourite scientist. Other than the education and fun aspects there's a £500 prize for the winning scientist in each Zone to go on and communicate their work with the public after they've recuperated and regained the feeling in their fingers.

In my case, it seemed to all start so promisingly.

Like most early stages of a relationship we wanted to know a little about each other. What do you do for a living? Do you like your job? Why do we have two nostrils?

But it wasn't long before someone else caught their eye. And as my jealousy grew, they brazenly shunned me. I would sit in the virtual equivalent of the corner of a room and watch as they would chat and laugh with the other scientists. With every new emoticon a dagger was plunged deeper into my self-esteem as I realised that perhaps I wasn't such a groovy, down-with-the-kids statistician after all.

And soon enough I'd joined the ranks of Stacey Solomon, Will Young and Girls Aloud. I'd been evicted.

To justify my failure I've convinced myself that it wasn't a level playing field. I'm a statistician. I was in a Zone with 5 other scientists from widely varying fields. In particular, the final 3 scientists were young, attractive and enthusiastic life science types. One of them works in marine science in a lab overlooking Table Mountain in South Africa and another works in the jungle doing something with monkeys. In actual fact, the answers we all gave to the questions asked of us weren't so different. However, the profile of monkey hugging and swimming with dolphins was clearly better than being the guy with a really good calculator. But of course this is all sour grapes. My Zone winner won because she deserved it.

The I’m a Scientist event was initially run in 2008 and has gone from strength to strength. It now runs twice a year, with a smaller event in March leading up to the main June event. I strongly encourage any scientist to take part. After all, it's better to have loved and lost than never to have loved at all.

Friday, January 25, 2013

Brute Force: Agent Orang-utan

Accompanying any new Bond film is a cornucopia of branding; from watch sponsorships to cars and mobile phones. Bond is seen modelling sharp suits and designer shades. But there is another kind of agent modelling that has become increasingly popular in recent times due to available computing resources and the preference for our friend brute force.

Agent-based modelling is not something to do with 007. It’s an approach that enables what is called parallelisation, another drinking buddy to brute force.

The idea is to split-up the work-load into neat packets of autonomous tasks. Then divvy-up the task between the available resources and let them each just get on with it. Each worker or “agent” is assigned certain behaviours like always go left say, and, like a worker bee or ant, they knuckle down to the task in hand without worrying what everyone else is up to.

Because agents aren’t very social-types they can do their particular job at the same time as other agents, hence in parallel. At certain times they may report back to base with an update of how they’re doing or what they found-out. This could be when something happens or at a set time agreed with HQ.

Agents get their heads down. If you give them a job to do they’ll carry-on doing it happily for however long you like, just don’t make it too complicated.

For example, if you wanted to find something you’d lost, car keys say, then you could set a team of agents on to the task of finding it. They’ll beaver away looking for it, maybe searching a small area each or randomly zig-zagging about the place. The trade-off may be between quantity and quality. Imagine you only had a set amount of cash to pay the search team with. If you were to have one expensive guy hunting then he may move quicker and be more skilled than any individual cheap-labour agent, but there’s still only one of him. In computing terms, a separate agent could be a computer or one of the processors on a computer.
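
To make the car keys search concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the "house" is a small grid, each agent is a thread that only looks at its own rows, and the names search_strip and parallel_search are made up. Python threads are only a stand-in for agents here; for heavy number crunching, separate processes or machines would be closer to the real thing.

```python
from concurrent.futures import ThreadPoolExecutor

def search_strip(agent_id, rows, grid, target="keys"):
    """One agent scans only its own rows and reports back what it found."""
    for r in rows:
        for c, item in enumerate(grid[r]):
            if item == target:
                return agent_id, (r, c)
    return agent_id, None

def parallel_search(grid, n_agents=4):
    """Share the rows out between agents and let each one get on with it."""
    # Agent i takes rows i, i + n_agents, i + 2 * n_agents, ...
    strips = [range(i, len(grid), n_agents) for i in range(n_agents)]
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        reports = pool.map(search_strip, range(n_agents), strips,
                           [grid] * n_agents)
        # "Reporting back to base": keep only the agents that found something.
        return [(agent, spot) for agent, spot in reports if spot is not None]

# A hypothetical 6x6 house with the keys hidden at row 4, column 2.
house = [["" for _ in range(6)] for _ in range(6)]
house[4][2] = "keys"
found = parallel_search(house)
print(found)
```

No agent needs to know what the others are up to, which is exactly why the job splits so neatly in space.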

Splitting-up the work is not always such a simple thing to do though. Not everything can be parallelised and sometimes the effort of getting things into a fit shape so that it can be is just not worth it. In the lost car keys example it’s obvious how to share the task around, in space. But for other problems it’s not so straightforward. Imagine a task in which each step relies on the previous one, like on a production line, say making a car. The parts could be made in parallel but when it comes to putting the car together certain things have to go in first and others later, like say, the steering column before the steering wheel. These things must therefore be done in serial, i.e. one after the other. Even in the searching problem it may be that the agents don’t find anything at first, then they all meet-up, have a chat and decide to change how they’re going to search as a group, like focusing more on one area over another. Each of these get-togethers would be a serial event.

Agent-based systems can be relatively little hassle because once they’ve been set on their merry little way they can be left unsupervised to get on with it. The downside to this is that they’ll only act in the way that you’ve told them to, which often means in a simple, loner sort of way, a bit like a Goth high school drop-out.

Brute Force: simulation

Simulation has got a pretty negative “rep” in recent years. It has come to be associated with faking it, inauthenticity and downright cheating. Simulation in sport is to feign injury or a foul in order to get an advantage or disadvantage the opposition. Players regularly throw themselves to the floor on a football pitch, arms out-stretched, simulating a heinous foul so they can get a free kick or penalty or to get the other player booked or sent-off. Calling this simulation, rather than, say, cheating, is a relatively recent thing. Some football cultures see this drama as being clever, having the skill to convince the referee to be swayed in your favour. But this viewpoint is in the minority, especially in the UK, and for a player to practise simulation is to be a bad boy.

Another use of the word applies when a participant is not being enough of a bad boy. By this I’m referring to the When Harry Met Sally sort of simulation. That is, simulation by the fairer sex in the act of love-making; hair swishing, table thumping faking. This particular form of simulation is also bad news, especially if your mates hear about it.

But it is not true that all simulation involves making a scene in one form or another. Faking it can be a good thing, and no-one’s feelings should get hurt in the process.

Simulation in a general sense is to make pretend and human beings do simulation every day, innumerable times. Simulation is what we do in our heads when we imagine something that isn’t real. Usually this is thinking of a fictional future to see what happens. These simulations take split seconds and often we’re not even aware of them. Simulation can range from the fun (imagining we win the lottery and what we would do with the money) to preventing serious peril (crossing the road in heavy traffic). Simulation in this form is called prediction and is how we can make half decent decisions. Prediction also happens when we want to approach a beautiful woman on the other side of the room. Our eyes meet hers and then “boom”, the mind races with chat-up lines and openers. We imagine what the response might be, from the worst case scenario of a glass of wine in the face to the best: a wedding, kids, retirement on the beach.

Friday, January 4, 2013

Stats Apps

I was reading recently about projects which "gamify" your PhD. That means they take a PhD topic and make it into a computer game. The idea is that this is a way to engage with the public, part of science communication. The FameLab and I'm a Scientist, Get Me Out of Here schemes are other ways to get people interested by pitching PhDs in fun and simple terms.
In a similar vein to making computer games out of PhD projects I thought I'd see what apps there are out there around statistics. I have an Android phone so looked on the Google Play site. What I found was rather a lot, more than I expected at least. Generally they are simple tools for recording and displaying simple summaries and graphs of small data samples. The user enters the data by hand and then can choose things like descriptive statistics or histograms and regressions.
The other sort of app is the explanation tool. This is really a condensed version of Wikipedia.

I thought I'd try and come-up with some of my own ideas for stats apps:

something to do with public health/cardiovascular disease/lifestyle risk factors. Maybe a calculator that takes how much exercise you do, your diet and your weight and gives the risks of things

something that uses census data and makes it specific to you (where you live, age groups,...)

[UPDATE 5/1/13 just found this which looks like a slick Beta version of what I had in mind!]

something like a rumour/infection app. you can only get it by Bluetooth from another mobile (after the initial population has been set-up). so the idea is to gather data as part of an experiment to see how people interact and the spread of a "rumour"

then it returns interesting statistics to the mobile app like what number in the chain you are, how long after initial infection (speed), network diagrams of infected with phone names labels (degrees of separation)
the infection rate and recovery rate could be affected by imposing a time limit on how long the phone is infectious (can transmit app), prizes and incentives for passing on the app (e.g. every 100 people there's a prize draw for an iPad)...
practically, there could be 2 big buttons on the phone's home screen: one for when the infected person tries to pass it on i.e. mentions it to their friend and another to send it
in a time of bird flu, swine flu, etc this could be useful information about how people interact. it could also show how word-of-mouth or gossip works- often considered the best form of advertising.
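
To get a feel for what the rumour app might report, here is a rough simulation sketch in Python. All the numbers are assumptions chosen for illustration (200 phones, 2 chance meetings per step, a 30% chance of passing the app on, a 5 step infectious time limit) and the function name simulate_rumour is made up; a real app would of course record real meetings rather than random ones.

```python
import random

def simulate_rumour(n_phones=200, contacts_per_step=2, p_pass=0.3,
                    infectious_steps=5, n_steps=50, seed=1):
    """Spread an app from phone 0 through random meetings between phones."""
    rng = random.Random(seed)
    infected_at = {0: 0}      # phone id -> time step at which it got the app
    chain_position = {0: 1}   # phone id -> what number in the chain it is
    for step in range(1, n_steps + 1):
        # Only phones still inside the time limit can transmit the app.
        spreaders = [p for p, t in infected_at.items()
                     if step - t <= infectious_steps]
        if not spreaders:
            break
        for p in spreaders:
            for _ in range(contacts_per_step):
                other = rng.randrange(n_phones)
                if other not in infected_at and rng.random() < p_pass:
                    infected_at[other] = step
                    chain_position[other] = chain_position[p] + 1
    return infected_at, chain_position

infected_at, chain_position = simulate_rumour()
print("phones reached:", len(infected_at))
print("longest chain:", max(chain_position.values()))
```

Shortening infectious_steps or lowering p_pass are the knobs that correspond to the time limit and the incentives in the notes above.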

Saturday, December 8, 2012

Brute Force: Missing Data (Part 4)

A Clockwork Orange, Relax by Frankie Goes To Hollywood and Never Mind the Bollocks. Which is the odd-one-out and why?

The answer is Never Mind the Bollocks because the other two were censored, from cinema release in the former's case and from Radio 1 in the latter's. No-one will ever know for sure if Relax would have made it to the top of the pops if this hadn't happened. But it seems like Relax being absent from our radios actually helped it reach the dizzy heights in the charts.

In mathematics, when things go missing there's work to be done, with extra adjustments and amendments to the full model needing to be made, often at some cost. Dealing with missingness can seriously complicate matters. Filling-in what isn't there needs to take into account what might have been there and how the missing parts affect all of the other parts too.

Removal of some component of a mathematical model can lead to involved and complex structures.

Now the problem is, this new missing-model with all of the extra bells and whistles is much harder to handle than the more well-behaved simpler model when we knew all the parts. Because of this situation, often the only solution to getting any work done is to resort to simulation.

Censored data is a common example of a missing data problem. Censored data is data where we do not necessarily observe the true value. In some instances, instead of the true value of whatever it is that we're observing, we only have the information that it is no bigger or smaller than some threshold value. In the case of time, for example, what's called right-censoring is when we stop observing something at some time so we only have information about it up to that point and not afterwards. For example, if people were monitored for infection whilst in hospital but not once they were discharged then the censored time would be the time they leave hospital.

Let's suppose that we hadn't heard Relax on the radio up to a few days before the chart count down but then our radio broke, so it may have been played freely for all we know after that. Then if we wanted to make a guess about how well it was going to do in the charts we're missing a bit of important information (we'll leave whether banning it was a good or bad thing for sales for the time being).

Censoring means that if we want to calculate some statistics, say, using this data then we need to account for the fact that some of the data is censored. For example, if we wanted a mean average then this is not the simple mean as a sum of all the values divided by how many values there are. This is because some of those values we have are not actually the true value but just a lower limit of it. If we were to calculate an average as described then the resulting figure would be an underestimate. For example, we would estimate that the average time of infection in our hospital example is smaller than it actually is because we wouldn't be taking into account the time after the patients leave hospital and before they get infected (if at all).
If we do take this censoring into account when we're trying to get our answer this means that things get more involved and complicated. One approach would be to sensibly fill-in the missing values using what information we do have at our disposal. Another is to change how we do the calculation. For example, the mean estimate for the censored time can be calculated using the probabilities of not yet having an infection at certain times, something called a survival function.
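
As a rough illustration of changing how we do the calculation, the Python sketch below compares the naive mean with a mean taken as the area under an estimated survival curve. The estimator used is the standard Kaplan-Meier one, which is my choice for the example rather than anything specific to the hospital story, and the five patients' times are invented. An observed flag of False means the patient was discharged (censored) at that time.

```python
def kaplan_meier(times, observed):
    """Return the steps [(t, S(t))] of the Kaplan-Meier survival estimate."""
    event_times = sorted({t for t, seen in zip(times, observed) if seen})
    surv, steps = 1.0, []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)   # still being followed at t
        events = sum(1 for u, seen in zip(times, observed) if seen and u == t)
        surv *= 1 - events / at_risk
        steps.append((t, surv))
    return steps

def restricted_mean(times, observed):
    """Area under the survival curve from 0 to the longest follow-up time."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in kaplan_meier(times, observed):
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    area += prev_s * (max(times) - prev_t)
    return area

# Hypothetical days until infection; False marks a discharge, not an infection.
times = [2, 4, 4, 6, 10]
observed = [True, False, True, False, True]

naive = sum(times) / len(times)
adjusted = restricted_mean(times, observed)
print(naive, adjusted)   # roughly 5.2 versus 7.2
```

The naive figure treats the two discharge times as if they were infection times, which is why it comes out smaller.
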
When the models are bigger and more complex, coming up with an alternative method, like the survival time formula, becomes even more difficult and the filling-in or imputing method begins to look more appealing. This can mean imputing lots and lots of missing values, over and over again, so that we get an idea about how the values we're coming up with are affecting the output. If you think this sounds like a job for brute force you'd be dead right.
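
Here is a minimal Python sketch of the brute force imputing idea, under some loudly stated assumptions: the invented times are days until infection, time to infection is assumed to follow an exponential distribution (so the extra wait after discharge does not depend on how long the patient has already gone without infection), and the rate is fixed at a simple estimate rather than being treated as uncertain, which a proper multiple imputation would also account for. The function name impute_means is made up.

```python
import random

def impute_means(times, observed, n_repeats=1000, seed=0):
    """Fill in censored times over and over; return the mean from each go."""
    rng = random.Random(seed)
    # Assumed exponential model: infections seen per day of follow-up.
    rate = sum(observed) / sum(times)
    means = []
    for _ in range(n_repeats):
        # A censored time is only a lower limit, so add a random extra wait.
        filled = [t if seen else t + rng.expovariate(rate)
                  for t, seen in zip(times, observed)]
        means.append(sum(filled) / len(filled))
    return means

# Hypothetical data; False marks a patient discharged before any infection.
times = [2, 4, 4, 6, 10]
observed = [True, False, True, False, True]

means = impute_means(times, observed)
print("naive mean:", sum(times) / len(times))
print("average imputed mean:", sum(means) / len(means))
print("spread of imputed means:", min(means), "to", max(means))
```

The spread across the repeats is the point of doing it many times: it shows how much the made-up values are driving the answer.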

Simulation Talk

If you read a newspaper last week you may have come across the Twin story. If you didn’t hear about it, let me fill you in.

This was a paired cohort trial across Europe, consisting of nearly 10k recruited-at-birth twin kids. Their parents agreed to raise one child healthy and one unhealthy up to 18 years of age. So, at last, science would be able to definitively answer questions about lifestyle.

The day-to-day habits of each twin were very different. For example, one twin took-up all kinds of fitness hobbies like tennis, football, rugby, marathon running; the other took-up Playstation and crisps. One twin would do boxing and the other box-sets.

But maintaining this trial wasn’t always easy. Small children can be really stubborn when they want to be. We all know that kids are selfish and even when involved in the noble pursuit of concrete scientific evidence they can still only look-out for Number One. An example exchange over the dinner table went something like

(twin) Muuuum, but I don’t want any more fags
(mum) Well if you don’t smoke all your fags you won’t have any pudding! Here’s a lighter, now start smoking!!

Some kids were on 50-a-day. That’s fags and Big Macs, not to mention the booze. This was like Supersize Me for 0 to 18 year olds to once-and-for-all prove the effects of bad lifestyle on health.

The results? We had thousands of real-live Danny DeVitos and Arnold Schwarzeneggers.

Of course, this isn’t a real trial. No children were forced to chain smoke or maintain a heavy drinking habit, mainly because of annoying things like “ethics” and “morals” getting in the way.

But there is a branch of scientific research that inhabits a world where you can make babies smoke and down flaming shots. No, not in Middlesbrough but the world of computer simulation.

Computer simulations allow us to experiment with “what-if” scenarios. What if I stopped smoking at 30? What if I get off the bus one stop early? What if I eat my 5-a-day? The simulated world is a bit like Sim City or World of Warcraft but without the elves. We can investigate the effect of different interventions like giving over 60s statins or we can compare the effects of disease prevention against treatment.

Teams of computer scientists, statisticians, public health experts and clinicians develop superfast computer models, harnessing new computing power to produce an answer quicker than ever so we can simulate more and more people in more and more detail giving us better and better answers.

Probabilistic microsimulations can follow someone from day one at birth to inevitable death and can tell individual stories for 10s of millions of people, like the entire UK population.
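
A toy version of such a microsimulation, re-running the imaginary twin trial, might look like the Python sketch below. The yearly risk of dying is entirely made up (it rises with age and is simply doubled for the unhealthy twin), so the numbers that come out mean nothing medically; the point is only the shape of the method, following each simulated person year by year.

```python
import random

def simulate_life(smoker, rng, max_age=100):
    """Follow one simulated person year by year; return their age at death."""
    for age in range(max_age):
        # Hypothetical yearly risk of dying: grows with age, doubled if smoking.
        risk = 0.0001 * 1.09 ** age
        if smoker:
            risk *= 2
        if rng.random() < min(risk, 1.0):
            return age
    return max_age

def twin_trial(n_pairs=10_000, seed=42):
    """Raise one twin healthy and one unhealthy; return the two mean lifespans."""
    rng = random.Random(seed)
    healthy = [simulate_life(False, rng) for _ in range(n_pairs)]
    unhealthy = [simulate_life(True, rng) for _ in range(n_pairs)]
    return sum(healthy) / n_pairs, sum(unhealthy) / n_pairs

healthy_mean, unhealthy_mean = twin_trial()
print("healthy twins:", healthy_mean, "unhealthy twins:", unhealthy_mean)
```

A "what-if" such as stopping smoking at 30 would just be another rule inside simulate_life, with no children harmed in the process.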

In this time of austerity these models can help decisions about where to best put the available moneys for maximum impact.

But, like an American infomercial, this isn’t all. There’s a way that these models can be used more directly to help you and me.

I went to the doctor's recently and I swear he was looking up what was wrong with me on Wikipedia. I wouldn’t have minded so much but he said it was a toss-up between the plague and thrush.

But these PCs on GPs' desks can also be put to use with the simulation models. “Informatics” aims to provide desktop tools that can help the doctor and patient. Live evidence can be communicated to the patient about the effects of any change (or not) in their lifestyle. Using interactive widgets like sliders, dials and infographics the patient can be involved in the process and take ownership of their health decisions, which is more likely to be motivating and effective. They can see what the likely outcomes are from which courses of action.

With more data available than ever before, detailed models and powerful, web-based, user-friendly interactive tools to use them, hopefully in the future there’ll be fewer Danny DeVitos and more Arnold Schwarzeneggers.

Sunday, December 2, 2012

Brute Force (Part 3): Bruce & Ellie

The three great essentials to achieve anything worthwhile are: Hard work, Stick-to-itiveness, and Common sense.

Thomas Edison

The book Thinking, Fast and Slow describes two ways of thinking, System 1 and System 2, as if they were characters: System 1 is the fast thinking, intuitive, snap judgment guy and System 2 is the slow thinking, logical, thorough kind of guy. But System 2 is lazy and will prefer to put his feet up and let System 1 take responsibility for decisions if he can get away with it.

In the same vein, meet Bruce Force and Eleanor Gent (Ellie to her friends).

Bruce is a stereotypical man’s man. He walks around in a lumberjack shirt or leather jacket with the collar up. He is, or at least likes to think he is, an Alpha male. He likes to take charge and order other people around. He thinks he knows the best way to do things and that best way is to get stuck in. Often he hasn’t the faintest idea. His house is full of DIY disasters where he just ploughed straight-in only to find a little later down the line he’s done it upside-down and back-to-front. The common occurrence of a failed IKEA construction is never his fault. Perhaps they’ve packed the wrong part. For Bruce, the very idea of consulting the instructions is laughable. He is persistent though. He is the very embodiment of Edison’s Stick-to-itiveness.

Ellie is altogether a different character. She wears thick milk bottle top glasses and a sensible bob haircut, which she cuts herself. She always has an eye for a bargain and can make, literally, pounds of savings by studiously buying what’s on offer at the supermarket and using vouchers and accrued Clubcard points. She prides herself on being prepared and always has a torch, spare batteries and a blanket in the boot of her economical car for when she goes camping, just in case. She is never one to rush into a decision and hates being pressurised even when she’s deciding what to have on her Subway sandwich. Her home is an efficient, organised machine. She is never left looking for her car keys. And when she moves house everything is boxed-up and labelled for a smooth, alphabetised transition at the other end. She is not one for spontaneity and her social calendar is booked-up months in advance.

So Ellie and Bruce are two peas in very different pods. Neither always does things "the right way" but each has their moments of success and failure. Bruce is probably better on a night out but Ellie would be good to have around the next day to help make cups of tea.

But this isn't to say that Bruce and Ellie are always at odds. In fact, things can really get done when they work together. The Four Colour Problem is an example of when they worked as a team to solve a previously unsolved problem. Ellie works-out some of the things that could be a possible solution and then Bruce charges-in, like a wound-up Duracell bunny, to try some of them out. Their skills can complement each other in cases like this. It's fair to say that a long-term relationship between the two is unrealistic though. Bruce’s sock drawer would drive Ellie mad.