Wednesday, January 18, 2017

Brexit 12 objectives

These are the 12 objectives for Britain’s Brexit negotiations, as set out by Prime Minister Theresa May.

Issues Brexiters really care about and will likely get
2. Control of our own laws
5. Control of immigration
9. New trade agreements with other countries

Things that are not measurable
1. Certainty wherever possible
11. Co-operation on crime, terrorism and foreign affairs
12. A phased approach, delivering a smooth, orderly Brexit

Things they had before Brexit
4. Maintaining the common travel area with Ireland
6. Rights for EU nationals in Britain and British nationals in the EU
8. Free trade with European markets

Measurable things I think they won't get
3. Strengthening the United Kingdom
7. Enhancing rights for workers
10. A leading role in science and innovation

I am willing to pick measurable metrics for these last three: the percentage of people in Scotland who want independence, where the UK stands in global rankings of workers' rights, and patent and paper output, which are metrics of a country's innovation.

There are also potential negative economic consequences. Inflation, consumer debt, sterling's value and GDP growth are all useful metrics.

Friday, January 13, 2017

Irish Election Spending 2016

In the 2016 Irish general election, who paid the most for each vote and for each seat?
Total spending was €8,394,832.89 (report here). With an electorate of 3,305,110, that is about €2.54 per registered voter, which is under half of what is spent on a US presidential vote.
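The per-voter figure is just total spending divided by the electorate. A quick check in R, using the two numbers quoted in this post (the variable names are my own):

```r
total_spending <- 8394832.89  # euros, total from the spending report
electorate <- 3305110         # size of the electorate

# Euros spent per registered voter
per_elector <- total_spending / electorate
print(round(per_elector, 2))
```

Note that this divides by the electorate and not by votes actually cast, so spending per vote cast would be somewhat higher.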
On a per seat and per vote basis

And on a Per Seat Basis


Party,"Votes,1st pref.",Seats,Spending
Fine Gael,544140,50,2768881.50
Fianna Fáil,519356,44,1687916.29
Sinn Féin,295319,23,650190.38
Labour Party,140898,7,1083718.38
AAA–PBP,84168,6,266942.48
Ind 4 Change,31365,4,51669.18
Social Dem,64094,3,190586.93
Green Party,57999,2,146792.27
and the R code is

library(dplyr)    # mutate()
library(ggplot2)  # ggplot(), geom_bar(), theme(), labs()

# One colour per party, in alphabetical order of the Party names
party_colours <- c("#E5E500", "#66BB66", "#6699FF", "#99CC33", "#FFC0CB", "#CC0000", "#008800", "#752F8B")

data <- read.csv("spending.csv", header=TRUE)
# read.csv turns the "Votes,1st pref." header into Votes.1st.pref.
datat <- mutate(data, perV = Spending/Votes.1st.pref., perS = Spending/Seats)

# Euros per first preference vote
q <- ggplot(data=datat, aes(x=Party, y=perV, fill=Party)) + geom_bar(stat="identity") + scale_fill_manual(values=party_colours)
q <- q + theme(axis.text.x = element_text(angle = 90, hjust = 1))
q <- q + theme(legend.position="none")
q <- q + labs(title = "General Election Spending 2016")
q <- q + labs(y = "Euros Per Vote")
q  # draw the chart before q is reused below

# Euros per seat
q <- ggplot(data=datat, aes(x=Party, y=perS, fill=Party)) + geom_bar(stat="identity") + scale_fill_manual(values=party_colours)
q <- q + theme(axis.text.x = element_text(angle = 90, hjust = 1))
q <- q + theme(legend.position="none")
q <- q + labs(title = "General Election Spending 2016")
q <- q + labs(y = "Euros Per Seat")
q
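To see the numbers behind the charts without plotting, something like the following should work in base R. It is only a sketch: the table is pasted inline instead of being read from spending.csv, and the short column names are my own relabelling.

```r
# The spending table from this post, pasted inline so the snippet is self-contained
csv_text <- 'Party,"Votes,1st pref.",Seats,Spending
Fine Gael,544140,50,2768881.50
Fianna Fáil,519356,44,1687916.29
Sinn Féin,295319,23,650190.38
Labour Party,140898,7,1083718.38
AAA–PBP,84168,6,266942.48
Ind 4 Change,31365,4,51669.18
Social Dem,64094,3,190586.93
Green Party,57999,2,146792.27'

spending <- read.csv(text = csv_text, header = TRUE)
# Replace the mangled header names with shorter ones (my own choice)
names(spending) <- c("Party", "Votes", "Seats", "Spending")

spending$perV <- spending$Spending / spending$Votes  # euros per first preference vote
spending$perS <- spending$Spending / spending$Seats  # euros per seat won

# Most expensive votes first
ranked <- spending[order(-spending$perV), c("Party", "perV", "perS")]
print(ranked, row.names = FALSE)
```

On these figures the Labour Party comes out highest on both measures and Ind 4 Change lowest.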




Wednesday, June 01, 2016

The Name of the Youngest Ever Modern Olympics Gold Medal Winner is Unknown

In the 1900 Olympics the Dutch rowing team were short a cox. They used a rower in the semifinal, Hermanus Brockmann, but decided his 60kg weight was too much of a handicap.

So the rowers, François Brandt and Roelof Klein, picked a ten-year-old French boy (25kg) out of the crowd and asked him to cox for them.

They won the gold. And took a photo with the boy. But his identity has never been established.

Thursday, May 19, 2016

Dying at Work in the US

The Occupational Safety & Health Administration (OSHA) tracks workplace fatalities in the US. It publicly releases CSV records of each year's workplace deaths.

The data contains the date, location and a description for 4000 fatalities over five years. I created columns for state, zipcode, number of people and cause.

The most common interesting words in these descriptions are

  • 813 fell
  • 708 struck
  • 642 truck
  • 452 falling
  • 382 crushed
  • 352 head
  • 263 roof
  • 261 tree
  • 258 electrocuted
  • 244 ladder
  • 238 vehicle
  • 226 trailer
  • 197 machine
  • 186 collapsed
  • 180 forklift

Not common but interesting

  • 10 lightning
  • 48 shot
  • 4 dog
  • 2 bees
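Word counts like these can be produced by splitting each description into lowercase words and tabulating them. A minimal sketch in base R, assuming the descriptions sit in a character vector; the three descriptions and the stop-word list below are made up for illustration and are not taken from the OSHA data:

```r
# Made-up example descriptions, standing in for the real OSHA text
descriptions <- c(
  "Worker fell from ladder",
  "Worker struck by falling tree",
  "Worker fell from roof"
)

# A tiny hypothetical stop-word list, used to drop uninteresting words
stop_words <- c("worker", "from", "by", "the", "a", "was")

# Lowercase, split on anything that is not a letter, drop empties and stop words
words <- unlist(strsplit(tolower(descriptions), "[^a-z]+"))
words <- words[nchar(words) > 0 & !(words %in% stop_words)]

# Tabulate and sort, most frequent first
word_counts <- sort(table(words), decreasing = TRUE)
print(word_counts)
```

What counts as an "interesting" word is a judgment call, so the stop-word list would need tuning against the real descriptions.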

and here is a map I made of the states where they happen

I have created a repository to try to augment the OSHA data and clean it up when errors are found.

The repository is on GitHub here.

If you use it I'll give you edit rights and you can help improve it.

Sunday, May 15, 2016

Handpicked by Amazon

Whenever I check some product on Amazon, for the next few days I get that product in the advertisements on Facebook.

Handpicked?

Why would Amazon lie like this?

Thursday, April 21, 2016

Can you Judge a Book by its Cover?

"they've all got the same covers, and I thought they were all o' one sample, as you may say. But it seems one mustn't judge by th' outside. This is a puzzlin' world." The Mill on the Floss by George Eliot
What is the correlation between people's ratings of a book's cover and the ratings the book receives? This post is about a game devised to get people to rate book covers, and it gives some great visualisations comparing a book's Goodreads rating to its cover rating. They gathered over 3 million ratings of 100 covers.

I took their data and got the average rating for each of the covers they tested. I then scraped the Goodreads average rating, number of ratings and number of reviews for each of these 100 books. The data table and the code I used to scrape and aggregate are here. There are all sorts of accuracy warnings you can imagine around these results. The main one is that the books and their covers all look pretty good to me; they are not at the self-published fan fiction end of the market.

The variables here are:

  • num_ratings: number of Goodreads ratings
  • rating: average Goodreads rating of the book
  • num_reviews: number of people who have actually written a review
  • cover_rating: average rating people gave the cover of the book

> cor(rating,cover_rating)

[1] 0.1609114

> cor(num_ratings,num_reviews)

[1] 0.9597442

> cor(rating,num_ratings)

[1] 0.2141307

> cor(rating,num_reviews)

[1] 0.2658916

> cor(num_ratings,cover_rating)

[1] 0.3059627

> cor(num_reviews,cover_rating)

[1] 0.3307553

So no, you can't judge a book by its cover: the correlation between the two ratings is only 0.16. You can guess the number of ratings from the number of reviews (0.96). You can't guess how highly rated a book is from the number of ratings (0.21). Having a good cover might increase the number of reviews your book gets by a bit (0.33).
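One way to put these correlations in perspective is to square them, which gives the share of variance one variable explains in a simple linear fit. A small sketch using the values printed in the R output (the labels are my own):

```r
# Correlations copied from the cor() output in this post
correlations <- c(
  rating_vs_cover      = 0.1609114,
  ratings_vs_reviews   = 0.9597442,
  rating_vs_ratings    = 0.2141307,
  rating_vs_reviews    = 0.2658916,
  ratings_vs_cover     = 0.3059627,
  reviews_vs_cover     = 0.3307553
)

# Squared correlation: share of variance explained by a simple linear fit
r_squared <- correlations^2
print(round(r_squared, 3))
```

Read this way, the cover rating accounts for under 3% of the variation in book ratings, while the number of reviews accounts for over 90% of the variation in the number of ratings.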

The conclusion is you shouldn't judge a book by its cover. Or by its number of sales (ratings). But people probably do judge books by their cover a bit.

Monday, March 07, 2016

Maps to hide places

Logashkino was a military base in Siberia. Over 30 years Soviet mapmakers moved it around on maps to throw off enemies. "How to Lie with Maps" talks about how the Soviets would shift the location of military bases on maps. These maps show one small base (now abandoned) and the local river, and how they moved around over those 30 years in an attempt to confuse enemies.