BBC Question Time has become one of those TV programmes that I now rarely watch without also reading and interacting with the #bbcqt hashtag on twitter. Clearly I’m not alone – last night there were approximately 36,000 tweets on the hashtag over the hour or so that the programme was on. That’s a lot of data about a TV programme – and given the programme’s political nature there must be some really interesting information in there about politicians and the way people react to what they say.
So I built this prototype to play around with the data. (edit – the one for the 31st Jan is here, 7th Feb here, 14th Feb here, 21st Feb here, 28th Feb here, 7th Mar here, 14th Mar here, 21st Mar here, 11th Apr here, 18th Apr here)
Last night I captured every tweet using the #bbcqt hashtag posted between 10.30pm and 11.45pm (the programme runs for an hour from 10.35pm) from the twitter API (with this volume of tweets you need to be sneaky to avoid crashing into the API limits… but it's possible).
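The post doesn't show the capture code, but the "be sneaky" part boils down to two things: space your calls out so you stay under the rate limit, and only ask for tweets newer than the last one you've seen. Here's a minimal sketch of that idea – `fetch_page` is a hypothetical stand-in for a real Twitter search call, and the interval is illustrative:

```python
import time

class TweetPoller:
    """Sketch of polling a hashtag while staying under API rate limits.

    fetch_page is a stand-in for a real Twitter search call: it takes a
    since_id and returns a list of (tweet_id, text) pairs newer than it.
    (Hypothetical - the actual capture code isn't shown in the post.)
    """

    def __init__(self, fetch_page, min_interval=5.0):
        self.fetch_page = fetch_page
        self.min_interval = min_interval  # seconds between API calls
        self.since_id = 0                 # highest tweet id seen so far
        self._last_call = 0.0

    def poll(self):
        # Space calls out so we never exceed the rate limit
        wait = self.min_interval - (time.time() - self._last_call)
        if wait > 0:
            time.sleep(wait)
        self._last_call = time.time()
        # Only request tweets newer than the last one we saw,
        # so repeated polls never re-fetch the same tweets
        page = self.fetch_page(self.since_id)
        if page:
            self.since_id = max(tweet_id for tweet_id, _ in page)
        return page
```

Run that in a loop for the duration of the programme, appending each page to a file, and you have the raw data without hammering the API.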
Before the programme I wrote a quick bit of code so that during the show I could capture which person was speaking when.
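That "who is speaking when" logger really only needs to record a timestamp each time a panellist starts talking. Something along these lines would do it (the structure and names here are illustrative, not the original code):

```python
from datetime import datetime

# Each entry is (timestamp, speaker): call mark() during the show
# whenever a panellist starts talking.
speaker_log = []

def mark(speaker, now=None):
    """Record that `speaker` started talking at `now` (default: right now)."""
    now = now or datetime.utcnow()
    speaker_log.append((now, speaker))

def speaker_at(t):
    """Return whoever was speaking at time t, i.e. the last mark before t."""
    current = None
    for ts, who in speaker_log:  # log is appended in time order
        if ts <= t:
            current = who
        else:
            break
    return current
```

Afterwards, `speaker_at` lets you line any tweet's timestamp up against the person who was talking at the time.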
Afterwards I put together some code to:
- divide the tweets up into ones that were obviously about the panellists and ones that were just generic and then further divide them up into one-minute chunks
- remove all of the rubbish bits (punctuation, inconsequential words etc) from each tweet
- run each tweet through a naive bayesian classifier to classify its sentiment as positive, negative or neutral (classifier code on GitHub).
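The first two steps above – splitting tweets into per-panellist and generic streams, bucketing by minute, and stripping out the rubbish – can be sketched like this. The stopword list and panellist keyword matching are my assumptions about how it might work, not the actual code, and the sentiment classifier itself is on GitHub so it isn't reproduced here:

```python
import re
from collections import defaultdict
from datetime import datetime

# Illustrative stopword list - the real "inconsequential words" set
# would be longer
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "it", "rt"}

def clean(tweet):
    """Strip punctuation and inconsequential words from a tweet."""
    words = re.findall(r"[a-z']+", tweet.lower())
    return [w for w in words if w not in STOPWORDS]

def minute_bucket(ts, start):
    """Which one-minute chunk of the programme a tweet falls into."""
    return int((ts - start).total_seconds() // 60)

def bucket_tweets(tweets, start, panellists):
    """Split tweets into per-panellist and generic streams, keyed by minute.

    tweets is a list of (timestamp, text); panellists maps a name to the
    keywords that identify them (e.g. surname, twitter handle).
    """
    buckets = defaultdict(list)  # (minute, subject) -> [cleaned word lists]
    for ts, text in tweets:
        words = clean(text)
        subject = "generic"
        for name, keywords in panellists.items():
            if any(k in words for k in keywords):
                subject = name
                break
        buckets[(minute_bucket(ts, start), subject)].append(words)
    return buckets
```

Each cleaned word list can then be fed to the bayesian classifier, and the per-minute positive/negative/neutral counts are what the graphs plot.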
With the data cleaned up and analysed I then coded up a front end to display the information (for the technical people, it uses D3.js with rickshaw.js as the graphing library).
Things I Noticed
- I like how you can clearly see twitter react just after someone has spoken – obvious really, but nice to see the data doing what you would expect it to.
- There are some interesting points where one of the panellists has clearly struck a chord on a particular topic – more positive sentiment than negative after certain comments.
Things to Improve
- The classification is trained on some generic good word/bad word data – I reckon much more accurate sentiment classification would be gained by training the classifier on actual #bbcqt data (especially as there's some quite choice Anglo-Saxon swearing that the current classifier doesn't recognise)
- I ran out of time to pursue it, but there's some really interesting information in the word frequencies within the tweets – maybe one to develop later
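For what it's worth, the word-frequency idea mentioned above is only a few lines to prototype. A rough sketch (the stopword list is again illustrative):

```python
import re
from collections import Counter

# Illustrative stopwords, plus the hashtag itself so it doesn't dominate
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "it",
             "in", "for", "on", "rt", "bbcqt"}

def word_frequencies(tweets, top_n=10):
    """Count the most common words across a batch of tweet texts,
    ignoring stopwords and very short words."""
    counts = Counter()
    for tweet in tweets:
        for w in re.findall(r"[a-z']+", tweet.lower()):
            if w not in STOPWORDS and len(w) > 2:
                counts[w] += 1
    return counts.most_common(top_n)
```

Run per minute, or per panellist, that would surface which topics spiked the conversation at each point in the programme.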
I’m interested to find out whether there is an appetite for this kind of (very niche, I know) analysis – do political parties monitor this stuff? Is there some valuable feedback in there for them?