It definitely wasn't them, it was me. I suppose we'd just grown apart but it still hurt that they'd found someone else so quickly. Someone more fun and interesting.
The feeling of getting dumped is one of the worst things. Great writers, poets and musicians have written countless works on the topic. Romeo and Juliet begins with Romeo in the throes of unrequited love. Blood on the Tracks, arguably Bob Dylan's best album, is considered his "break-up" album. Nearly everyone (bar the 40-Year-Old Virgin types) has felt the sense of rejection and self-pity common to so many painful splits.
Well, here's my contribution to these annals of great works. Last week I got unceremoniously dumped...by a horde of 12-year-olds! Or at least it felt like I was being dumped.
I'd been taking part in I'm a Scientist, Get Me Out of Here, a free online event where school kids get to hang out with and quiz scientists. It’s an X Factor-style competition between scientists from all different backgrounds where the students are the judges. Imagine hundreds of mini Simon Cowells. Scientists are split up into "Zones" of about half a dozen, and the students challenge them with all kinds of wacky, irreverent questions in online chat rooms. Each half-hour slot with the kids is a non-stop, carpal-tunnel-inducing, fast-paced frenzy of txt spk and inquisition. A whole event lasts a week, and each day someone gets voted out by the students for being their least favourite scientist. Besides the education and fun, there's a £500 prize for the winning scientist in each Zone, to spend on communicating their work to the public once they've recuperated and regained the feeling in their fingers.
In my case, it seemed to all start so promisingly.
Like most early stages of a relationship we wanted to know a little about each other. What do you do for a living? Do you like your job? Why do we have two nostrils?
But it wasn't long before someone else caught their eye. And as my jealousy grew, they brazenly shunned me. I would sit in the virtual equivalent of the corner of a room and watch as they chatted and laughed with the other scientists. With every new emoticon a dagger was plunged deeper into my self-esteem as I realised that perhaps I wasn't such a groovy, down-with-the-kids statistician after all.
And soon enough I'd joined the ranks of Stacey Solomon, Will Young and Girls Aloud. I'd been evicted.
To justify my failure I've convinced myself that it wasn't a level playing field. I'm a statistician. I was in a Zone with 5 other scientists from widely varying fields. In particular, the final 3 scientists were young, attractive and enthusiastic life science types. One of them works in marine science in a lab overlooking Table Mountain in South Africa and another works in the jungle doing something with monkeys. In actual fact, the answers we all gave to the questions asked of us weren't so different. However, the profile of monkey hugging and swimming with dolphins was clearly better than being the guy with a really good calculator. But of course this is all sour grapes. My Zone winner won because she deserved it.
The I’m a Scientist event was first run in 2008 and has gone from strength to strength. It now runs twice a year, with a smaller event in March leading up to the main June event. I strongly encourage any scientist to take part. After all, it's better to have loved and lost than never to have loved at all.
Random Observations
Mostly this and some of that
Sunday, September 1, 2013
Friday, January 25, 2013
Brute Force: Agent Orang-utan
Accompanying any new Bond
film is a cornucopia of branding; from watch sponsorships to cars and mobile
phones. Bond is seen modelling sharp suits and designer shades. But there is
another kind of agent modelling that has become increasingly popular in recent
times due to available computing resources and the preference for our friend
brute force.
Agent-based modelling is not something to do with 007. It’s an approach that enables what is
called parallelisation, another drinking buddy to brute force.
The idea is to split up the
workload into neat packets of autonomous tasks, then divvy up the tasks between
the available resources and let them each just get on with it. Each worker or
“agent” is assigned certain behaviours, like “always go left”, say, and, like a worker
bee or ant, they knuckle down to the task in hand without worrying what
everyone else is up to.
Because agents aren’t very
social types they can do their particular job at the same time as other agents,
hence in parallel. At certain times they may report back to base with an update
on how they’re doing or what they’ve found out. This could be when something
happens or at a set time agreed with HQ.
Agents get their heads down.
If you give them a job to do they’ll carry on doing it happily for however long
you like; just don’t make it too complicated.
For example, if you wanted
to find something you’d lost, car keys say, then you could set a team of agents
on to the task of finding it. They’ll beaver away looking for it, maybe
searching a small area each or randomly zig-zagging about the place. The
trade-off may be between quantity and quality. Imagine you only had a set
amount of cash to pay the search team with. If you hired one expensive
guy to do the hunting, he may move quicker and be more skilled than any individual
cheap-labour agent, but there’s still only one of him. A separate agent could be a
computer or one of the processors on a computer.
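The key hunt can be sketched in a few lines of Python. This is only a toy, and everything in it is made up for illustration: the 20-by-20 "room", the spot where the keys are hiding and the names `search_rows` and `run_search` are all my own assumptions. Each agent is handed its own set of rows, searches them without talking to anyone, and reports back to HQ at the end.

```python
from concurrent.futures import ThreadPoolExecutor

GRID_SIZE = 20   # hypothetical room, split into 20 x 20 squares
KEYS = (13, 7)   # hypothetical hiding place (row, column)


def search_rows(agent_id, n_agents):
    """One agent checks every n_agents-th row, starting from its own id.

    Returns (agent_id, squares_checked, location_or_None).
    """
    checked = 0
    for row in range(agent_id, GRID_SIZE, n_agents):
        for col in range(GRID_SIZE):
            checked += 1
            if (row, col) == KEYS:
                return agent_id, checked, (row, col)
    return agent_id, checked, None


def run_search(n_agents):
    """Set n_agents off on their own rows and collect their reports."""
    # Threads keep the sketch short. In standard CPython they take turns
    # on pure-Python work, so for a real speed-up you would more likely
    # use processes (concurrent.futures.ProcessPoolExecutor).
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        return list(pool.map(search_rows, range(n_agents), [n_agents] * n_agents))


if __name__ == "__main__":
    for agent_id, checked, found in run_search(4):
        print(f"agent {agent_id}: checked {checked} squares, found {found}")
```

With four agents, agent 1 finds the keys after checking 68 squares; a lone agent working through the same room row by row has to check 268 before stumbling on them.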
Splitting up the work is not
always such a simple thing to do though. Not everything can be parallelised and
sometimes the effort of getting things into a fit shape so that they can be is
just not worth it. In the lost car keys example it’s obvious how to share the
task around: in space. But for other problems it’s not so straightforward.
Imagine a task where each step relies on the previous one, like on a production
line, say making a car. Things could be made in parallel but when it comes to
putting the car together certain things have to go in first and others later,
like, say, the steering column before the steering wheel. These things must
therefore be done in serial, i.e. one after the other. Even in the searching
problem it may be that the agents don’t find anything at first, so they all
meet up, have a chat and decide to change how they’re going to search as a
group, like focusing more on one area than another. Each of these get-togethers
would be a serial event.
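How much the serial bits cost can be put into rough numbers with a rule of thumb known as Amdahl's law. That is my addition rather than anything the examples above rely on. If a fraction p of the job can be shared out among n agents and the rest has to be done one step after another, the best speed-up you can hope for is 1 / ((1 - p) + p / n). A minimal sketch, with the function name `speedup` being my own invention:

```python
def speedup(parallel_fraction, n_agents):
    """Best-case speed-up under Amdahl's law.

    parallel_fraction: share of the work (0 to 1) that can be split up.
    n_agents: how many agents share the parallel part.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_agents)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, round(speedup(0.9, n), 2))
```

So if 90% of the search can be shared out, ten agents make it a bit over 5 times quicker, not 10, and no number of agents will ever get past 10 times.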
Agent-based systems can be
relatively little hassle because once they’ve been set on their merry little
way they can be left unsupervised to get on with it. The downside to this is
that they’ll only act in the way that you’ve told them to, which often means
in a simple, loner way, a bit like a Goth high school drop-out.
Brute Force: simulation
Simulation has gained a pretty negative “rep” in recent years. It has come to be
associated with faking it, inauthenticity and downright cheating. Simulation in
sport is to feign injury or a foul in order to gain an advantage or disadvantage
the opposition. Players regularly throw themselves to the floor on a football
pitch, arms outstretched, simulating a heinous foul so they can get a free kick
or penalty or get the other player booked or sent off. Calling this
simulation, rather than, say, cheating, is a relatively recent thing. Some
football cultures see this drama as being clever, having the skill to sway
the referee in your favour. But this viewpoint is in the minority,
especially in the UK, and for a player to practise simulation is to be a bad
boy.
Another sense of simulation
applies when a participant is not being enough of a bad boy. By this I’m
referring to the When Harry Met Sally sort of simulation. That is, simulation
by the fairer sex in the act of love-making: hair-swishing, table-thumping
faking. This particular form of simulation is also bad news, especially if your
mates hear about it.
But it is not true that all
simulation involves making a scene in one form or another. Faking it can be a
good thing, and no-one’s feelings need get hurt in the process.
Simulation in a general
sense is to make pretend and human beings do simulation every day, innumerable
times. Simulation is what we do in our heads when we imagine something that isn’t
real. Usually this is thinking of a fictional future to see what happens. These
simulations take split seconds and often we’re not even aware of them. Simulation
can range from fun (imagining we win the lottery and what we would do with the money)
to preventing serious peril (crossing the road in heavy traffic). Simulation in
this form is called prediction and is how we can make half-decent decisions.
Prediction also happens when we want to approach a beautiful woman on the other
side of the room. Our eyes meet hers and then “boom”, the mind races with
chat-up lines and openers. We imagine what the response might be, the worst
case scenario of a glass of wine in the face to the best- a wedding, kids,
retirement on the beach.
Friday, January 4, 2013
Stats Apps
I was reading recently about projects which "gamify" your PhD. That means they take a PhD topic and make it into a computer game. The idea is that this is a way to engage with the public, as part of science communication. FameLab and the I'm a Scientist, Get Me Out of Here scheme are other ways to get people interested by pitching PhDs in fun and simple terms.
In a similar vein to making computer games out of PhD projects I thought I'd see what apps there are out there around statistics. I have an Android phone so looked on the Google Play site. What I found was rather a lot, more than I expected at least. Generally they are simple tools for recording data and displaying simple summaries and graphs of small data samples. The user enters the data by hand and then can choose things like descriptive statistics, histograms or regressions.
The other sort of app is the explanation tool. This is really a condensed version of Wikipedia.
I thought I'd try and come up with some of my own ideas for stats apps:
- something to do with public health/cardiovascular disease/lifestyle risk factors. Maybe a calculator that takes how much exercise you do, your diet and your weight and gives the risks of things
- something that uses census data and makes it specific to you (where you live, age groups,...)
- something like a rumour/infection app. You can only get it by Bluetooth from another mobile (after the initial population has been set up). So the idea is to gather data as part of an experiment to see how people interact and how a "rumour" spreads
- then it returns interesting statistics to the mobile app like what number in the chain you are, how long after the initial infection you caught it (speed), network diagrams of the infected with phone names as labels (degrees of separation)
- the infection rate and recovery rate could be affected by imposing a time limit on how long the phone is infectious (can transmit the app), prizes and incentives for passing on the app (e.g. for every 100 people there's a prize draw for an iPad)...
- practically, there could be 2 big buttons on the phone's home screen: one for when the infected person tries to pass it on, i.e. mentions it to their friend, and another to send it
- in a time of bird flu, swine flu, etc. this could be useful information about how people interact. It could also show how word-of-mouth or gossip works, often considered the best form of advertising.
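Before building the rumour app you could play it out on a computer first. The sketch below is a rough guess at how that might look, not a design for the real thing: the function name `spread_rumour` and every number in it (200 phones, 3 starters, 2 meetings a day, a 30% chance of passing it on, 3 infectious days) are assumptions of mine. It records the day each phone caught the app and its number in the chain.

```python
import random


def spread_rumour(n_phones=200, n_seeds=3, contacts_per_day=2,
                  p_pass=0.3, infectious_days=3, n_days=30, seed=1):
    """Toy rumour spread. All default values are hypothetical.

    Returns two dicts keyed by phone id: the day it was infected and its
    position in the chain (0 for the initial population).
    """
    rng = random.Random(seed)
    day_infected = {}
    chain_position = {}
    for phone in rng.sample(range(n_phones), n_seeds):
        day_infected[phone] = 0
        chain_position[phone] = 0

    for day in range(1, n_days + 1):
        # A phone can pass the app on for infectious_days after catching it.
        infectious = [phone for phone, caught in day_infected.items()
                      if day - caught <= infectious_days]
        for source in infectious:
            for _ in range(contacts_per_day):
                target = rng.randrange(n_phones)  # a random meeting
                if target not in day_infected and rng.random() < p_pass:
                    day_infected[target] = day
                    chain_position[target] = chain_position[source] + 1
    return day_infected, chain_position


if __name__ == "__main__":
    days, chain = spread_rumour()
    print("phones reached:", len(days))
    print("longest chain:", max(chain.values()))
```

Shortening `infectious_days` or lowering `p_pass` is the simulated version of the time limit and the incentives in the list above.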
Saturday, December 8, 2012
Brute Force: Missing Data (Part 4)
A Clockwork Orange, Relax by Frankie Goes To Hollywood and Never Mind the Bollocks. Which is the odd one out and why?
The answer is Never Mind the Bollocks because the other two were censored: from cinema release in the former's case and from Radio 1 in the latter's. No-one will ever know for sure if Relax would have made it to the top of the pops if this hadn't happened. But it seems like Relax being absent from our radios actually helped it reach the dizzy heights of the charts.
In mathematics, when things go missing there's work to be done: extra adjustments and amendments to the full model need to be made, often at some cost. Dealing with missingness can seriously complicate matters. Filling in what isn't there needs to take into account what might have been there and how the missing parts affect all of the other parts too.
Removal of some component of a mathematical model can lead to involved and complex structures.
Now the problem is that this new missing-data model, with all of its extra bells and whistles, is much harder to handle than the simpler, better-behaved model we had when we knew all the parts. Because of this, often the only way to get any work done is to resort to simulation.
Censored data is a common example of a missing data problem. Censored data is data where we do not necessarily observe the true value. In some instances, instead of the true value of whatever it is that we're observing, we only know that it is no bigger, or no smaller, than some threshold value. In the case of time, for example, what's called right-censoring is when we stop observing something at some time, so we only have information about it up to that point and not afterwards. For example, if people were monitored for infection whilst in hospital but not once they were discharged, then the censoring time would be the time they leave hospital.
Let's suppose that we hadn't heard Relax on the radio up to a few days before the chart countdown but then our radio broke, so it may have been played freely after that for all we know. Then if we wanted to make a guess about how well it was going to do in the charts we'd be missing a bit of important information (we'll leave aside whether banning it was a good or bad thing for sales for the time being).
Censoring means that if we want to calculate some statistics using this data then we need to account for the fact that some of the data is censored. For example, if we wanted a mean then the simple average, the sum of all the values divided by how many values there are, won't do. This is because some of the values we have are not the true values but just lower limits on them. If we were to calculate the average in the simple way then the resulting figure would be an underestimate. For example, we would estimate that the average time to infection in our hospital example is smaller than it actually is, because we wouldn't be taking into account the time after the patients leave hospital and before they get infected (if at all).
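To see the underestimate happen, here is a small simulated version of the hospital example. It is a sketch under assumptions I've picked purely for illustration: infection times and discharge times are both drawn from exponential distributions, with made-up averages of 10 and 8 days, and the name `simulate_censored` is mine. Because it's a simulation we get to peek at the true infection times, which we never could in real life.

```python
import random


def simulate_censored(n=10000, mean_infection=10.0, mean_discharge=8.0, seed=42):
    """Simulate right-censored infection times (all numbers hypothetical).

    Returns the true times, what we would actually record, and a flag
    saying whether each record was cut short by discharge.
    """
    rng = random.Random(seed)
    true_times, observed, is_censored = [], [], []
    for _ in range(n):
        infection = rng.expovariate(1.0 / mean_infection)
        discharge = rng.expovariate(1.0 / mean_discharge)
        true_times.append(infection)
        # We only ever see whichever comes first.
        observed.append(min(infection, discharge))
        is_censored.append(discharge < infection)
    return true_times, observed, is_censored


if __name__ == "__main__":
    true_times, observed, is_censored = simulate_censored()
    print("true mean:    ", sum(true_times) / len(true_times))
    print("naive mean:   ", sum(observed) / len(observed))
    print("share censored:", sum(is_censored) / len(is_censored))
```

With these particular assumptions the naive average should, in theory, sit at roughly 4.4 days against a true average of 10, so it misses by more than half.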
If we do take this censoring into account when we're trying to get our answer, things get more involved and complicated. One approach would be to sensibly fill in the missing values using what information we do have at our disposal. Another is to change how we do the calculation. For example, the mean of the censored times can be estimated using the probabilities of not yet having an infection at certain times, something called a survival function.
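One standard way of estimating those "not yet infected" probabilities is the Kaplan-Meier estimator. I'm naming a specific method here; the paragraph above doesn't commit to one. The mean is then the area under the resulting step curve, measured up to some cut-off time. A minimal sketch on five made-up patients, with the function names `kaplan_meier` and `restricted_mean` being my own:

```python
def kaplan_meier(observed, is_censored):
    """Return [(time, probability of not yet being infected)] at each
    time where at least one infection was actually seen."""
    data = sorted(zip(observed, is_censored))
    at_risk = len(data)
    surviving = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Gather everyone whose record ends at time t.
        while i < len(data) and data[i][0] == t:
            if not data[i][1]:
                events += 1
            leaving += 1
            i += 1
        if events:
            surviving *= 1.0 - events / at_risk
            curve.append((t, surviving))
        at_risk -= leaving
    return curve


def restricted_mean(curve, horizon):
    """Area under the step curve from time 0 up to the horizon."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= horizon:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (horizon - prev_t)


if __name__ == "__main__":
    # Hypothetical data: days observed, and whether discharge cut it short.
    days = [2, 3, 3, 5, 8]
    cut_short = [False, True, False, False, True]
    curve = kaplan_meier(days, cut_short)
    print(curve)
    print("naive mean:", sum(days) / len(days))
    print("mean up to day 8:", restricted_mean(curve, 8))
```

For these five patients the naive average is 4.2 days whereas the area under the curve up to day 8 is 4.9. Since the curve hasn't dropped to zero by day 8, even 4.9 is only a lower limit on the full mean.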
When the models are bigger and more complex, coming up with an alternative method, like the survival function formula, becomes even more difficult and the filling-in, or imputing, method begins to look more appealing. This can mean imputing lots and lots of missing values, over and over again, so that we get an idea about how the values we're coming up with are affecting the output. If you think this sounds like a job for brute force you'd be dead right.
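Here is roughly what that brute-force filling-in might look like on five made-up patients. It is a deliberately simplified sketch and leans on a big assumption that I'm adding myself: that infection times are exponential, so the extra waiting time after discharge can be drawn from the same distribution. The names `impute_once` and `brute_force_mean` are hypothetical, and a proper multiple imputation would also allow for uncertainty in the estimated rate.

```python
import random
import statistics


def impute_once(observed, is_censored, rng):
    """Fill in each censored time once and return the mean of the lot."""
    n_events = sum(1 for c in is_censored if not c)
    # Estimated infection rate: infections seen per day of observation
    # (assumes exponential infection times).
    rate = n_events / sum(observed)
    filled = [t + rng.expovariate(rate) if c else t
              for t, c in zip(observed, is_censored)]
    return statistics.mean(filled)


def brute_force_mean(observed, is_censored, n_repeats=1000, seed=0):
    """Repeat the filling-in many times; return the average answer and
    how much it wobbles from one fill-in to the next."""
    rng = random.Random(seed)
    means = [impute_once(observed, is_censored, rng) for _ in range(n_repeats)]
    return statistics.mean(means), statistics.stdev(means)


if __name__ == "__main__":
    # Hypothetical data: days observed, and whether discharge cut it short.
    days = [2, 3, 3, 5, 8]
    cut_short = [False, True, False, False, True]
    estimate, wobble = brute_force_mean(days, cut_short)
    print("naive mean:", statistics.mean(days))
    print("imputed mean:", estimate, "+/-", wobble)
```

For this data the imputed mean should hover around 7 days, well above the naive 4.2, and the wobble gives a feel for how much the answer depends on the values we made up.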
Simulation Talk
If you read a newspaper last week you may have come across
the Twin story. If you didn’t hear about it, let me fill you in.
This was a paired cohort trial across Europe, consisting of
nearly 10k recruited-at-birth twin kids. Their parents agreed to raise one child
healthy and one unhealthy up to 18 years of age. So, at last, science would be
able to definitively answer questions about lifestyle.
The day-to-day habits of each twin were very different. For
example, one twin took up all kinds of fitness hobbies like tennis, football,
rugby, marathon running; the other took up PlayStation and crisps. One twin
would do boxing and the other box-sets.
But maintaining this trial wasn’t always easy. Small
children can be really stubborn when they want to be. We all know that kids are
selfish and even when involved in the noble pursuit of concrete scientific evidence
they can still only look out for Number One. An example exchange over the
dinner table went something like
- (twin) Muuuum, but I don’t want any more fags
- (mum) Well if you don’t smoke all your fags you won’t have any pudding! Here’s a lighter, now start smoking!!
Some kids were on 50-a-day. That’s fags and Big Macs, not to
mention the booze. This was like Supersize Me for 0 to 18 year olds to
once-and-for-all prove the effects of bad lifestyle on health.
The results? We had thousands of real-live Danny DeVitos and
Arnold Schwarzeneggers.
Of course, this isn’t a real trial. No children were forced
to chain smoke or maintain a heavy drinking habit, mainly because of annoying
things like “ethics” and “morals” getting in the way.
But there is a branch of scientific research that inhabits a
world where you can make babies smoke and down flaming shots. No, not in Middlesbrough
but the world of computer simulation.
Computer simulations allow us to experiment with “what-if”
scenarios. What if I stopped smoking at 30? What if I get off the bus one stop
early? What if I eat my 5-a-day? The simulated world is a bit like Sim City or
World of Warcraft but without the elves. We can investigate the effect of
different interventions like giving over 60s statins or we can compare the
effects of disease prevention against treatment.
Teams of computer scientists, statisticians, public health
experts and clinicians develop superfast computer models, harnessing new
computing power to produce an answer quicker than ever so we can simulate more
and more people in more and more detail giving us better and better answers.
Probabilistic microsimulations can follow someone from day
one at birth to inevitable death and can tell individual stories for 10s of
millions of people, like the entire UK population.
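A toy version of such a microsimulation fits in a few lines. To be clear, this is a sketch with invented numbers, not a real model: the yearly risk of dying, how fast it grows with age and the doubling for smokers are all hypothetical, as are the names `simulate_life` and `compare_twins`. Each simulated "twin pair" shares the same run of luck, so the only difference between them is that one quits smoking at 30.

```python
import random

BASE_RISK = 0.0002    # hypothetical yearly chance of dying at age 0
RISK_GROWTH = 1.09    # hypothetical yearly multiplier as we age
SMOKER_FACTOR = 2.0   # hypothetical extra risk while smoking


def simulate_life(rng, quit_age=None, max_age=110):
    """Follow one simulated person year by year; return age at death.

    quit_age=None means they smoke for life (from birth, as only a
    simulation would allow).
    """
    for age in range(max_age):
        risk = BASE_RISK * RISK_GROWTH ** age
        if quit_age is None or age < quit_age:
            risk *= SMOKER_FACTOR
        if rng.random() < min(risk, 1.0):
            return age
    return max_age


def compare_twins(n_pairs=2000, quit_age=30):
    """Run each pair with identical random draws so only the habit differs."""
    smoker_ages, quitter_ages = [], []
    for pair in range(n_pairs):
        smoker_ages.append(simulate_life(random.Random(pair)))
        quitter_ages.append(simulate_life(random.Random(pair), quit_age=quit_age))
    return smoker_ages, quitter_ages


if __name__ == "__main__":
    smokers, quitters = compare_twins()
    print("mean age at death, lifelong smoker:", sum(smokers) / len(smokers))
    print("mean age at death, quit at 30:     ", sum(quitters) / len(quitters))
```

Giving both twins the same random draws is a common trick in simulation: any gap between them is then down to the "what if" and not to one twin simply being luckier.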
In this time of austerity these models can help decisions
about where to best put the available moneys for maximum impact.
But, like an American infomercial, this isn’t all. There’s
a way that these models can be used more directly to help you and me.
I went to the doctor’s recently and I swear he was looking up
what was wrong with me on Wikipedia. I wouldn’t have minded so much but he said
it was a toss-up between the plague and thrush.
But these PCs on GPs’ desks can also be put to use with the
simulation models. “Informatics” aims to provide desktop tools that
can help the doctor and patient. Live evidence can be communicated to the
patient about the effects of any change (or not) in their lifestyle. Using interactive
widgets like sliders, dials and infographics the patient can be involved in the
process and take ownership of their health decisions, which is more
likely to be motivating and effective. They can see what the likely outcomes are
from which courses of action.
With more data available than ever before, detailed models
and powerful, web-based and user-friendly interactive tools to use them,
hopefully in the future there’ll be fewer Danny DeVitos and more Arnold Schwarzeneggers.
Sunday, December 2, 2012
Brute Force (Part 3): Bruce & Ellie
The three great essentials to achieve anything worthwhile are: Hard work, Stick-to-itiveness, and Common sense.
Thomas Edison
The book Thinking, Fast and Slow describes two ways of
thinking, System 1 and System 2, as if they were human characters: System 1 is
the fast-thinking, intuitive, snap-judgment guy and System 2 is the slow-thinking,
logical, thorough kind of guy. But System 2 is lazy and will prefer to
put his feet up and let System 1 take responsibility for decisions if he can get
away with it.
In the same vein, meet Bruce
Force and Eleanor Gent (Ellie to her friends).
Bruce is a stereotypical man’s
man. He walks around in a lumberjack shirt or leather jacket with the collar
up. He is, or at least likes to think he is, an Alpha male. He likes to take
charge and order other people around. He thinks he knows the best way to do
things and that best way is to get stuck in. Often he hasn’t the faintest idea.
His house is full of DIY disasters where he just ploughed straight in only to
find a little later down the line he’s done it upside-down and back-to-front.
The common occurrence of a failed IKEA construction is never his fault. Perhaps
they’ve packed the wrong part. For Bruce, the very idea of consulting the
instructions is laughable. He is persistent though. He is the very embodiment
of Edison’s Stick-to-itiveness.
Ellie is altogether a
different character. She wears thick milk bottle top glasses and a sensible bob
haircut, which she cuts herself. She always has an eye for a bargain and can
make, literally, pounds of savings by studiously buying what’s on offer at the
supermarket and using vouchers and accrued Clubcard points. She prides herself
on being prepared and always has a torch, spare batteries and a blanket in the
boot of her economical car for when she goes camping, just in case. She is
never one to rush into a decision and hates being pressurised even when she’s
deciding what to have on her Subway sandwich. Her home is an efficient,
organised machine. She is never left looking for her car keys. And when she moves
house everything is boxed up and labelled for a smooth, alphabetised transition
at the other end. She is not one for spontaneity and her social calendar is
booked up months in advance.
So Ellie and Bruce are two peas in very different pods. Neither always does things "the right way" but each has their moments of success and failure. Bruce is probably better on a night out but Ellie would be good to have around the next day to help make cups of tea.
But this isn't to say that Bruce and Ellie are always at odds. In fact, things can really get done when they work together. The Four Colour Problem
is an example of when they worked as a team to solve a previously unsolved
problem. Ellie works out some of the things that could be a possible solution
and then Bruce charges in, like a wound-up Duracell bunny, to try some of them
out. Their skills can complement each other in cases like this. It's fair to say that a long-term
relationship between the two is unrealistic though. Bruce’s sock drawer would drive
Ellie mad.
