Labour stands to gain in the metro mayor elections

As May 6th’s ‘Super Thursday’ elections come into view, all eyes are on whether Nicola Sturgeon will secure her majority for independence, and whether Labour will cling on in Hartlepool. But Super Thursday also features 8 of England’s biggest electoral tests, in the form of combined authority mayoral elections.

These elections will give individual politicians huge electoral mandates, an unusual occurrence in British politics, so politicos are characteristically poring over the ‘electability’ and personal qualities of the candidates. But do candidates really make a difference?

In most metro mayor elections so far, the evidence suggests not much. In 4 of the last 13 elections (5 in London, 8 combined authorities), both Labour and the Conservatives have been within 5% of what you’d expect based on the region’s last general election vote. In four more, both have been within 10%. The average deviation of the three main parties from the region’s ‘partisanship’ is just 6.4% – despite this number being dragged up by Labour’s 38% collapse in 2000.

Labour                 Conservative           Lib Dem
Burnham 2017 (+22%)    Johnson 2012 (+17%)    Cantrill 2017 (+9%)
Dobson 2000 (-38%)     Bowles 2017 (-19%)     Paddick 2008 (-9%)

Biggest over and under-performers in metro mayor elections

So what should our expectations be for the upcoming elections?

With no consideration of incumbency at all, the metro mayor map looks very favourable to Labour. Although the party lost four out of the six mayors up for election in 2017, this was in the context of an abysmal national picture, with the party languishing 17% behind the Conservatives. Since then, Labour has pushed the Conservative lead down to 7% and Labour-voting West Yorkshire and London have been added to the 2021 map.

By combining 2019 election results with current polling and some random variation, the expectation would be that Labour hold Liverpool (94% probability), London (88%) and Greater Manchester (86%) pretty comfortably, add West Yorkshire (79%), and gain the West Midlands (70%), Tees Valley (66%) and West of England (66%) narrowly from the Conservatives.

To be clear: this isn’t a forecast. It is a ‘prior expectation’ based only on each region’s partisanship and current polling. But it is useful to compare parties’ performances against.

Region                            Lab    Con    Lib    Other
Liverpool                         94%    0%     0%     5%
London                            88%    9%     0%     3%
Greater Manchester                86%    11%    0%     3%
West Yorkshire                    79%    17%    0%     4%
West Midlands                     70%    24%    0%     6%
Tees Valley                       66%    34%    0%     0%
West of England                   66%    25%    0%     9%
Cambridgeshire and Peterborough   26%    72%    2%     0%

Win probabilities for 2021 metro mayor elections on current polling with no incumbency effect
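
One way a prior expectation of this kind could be produced is sketched below. It is not the model behind the table: the uniform national swing, the normally distributed noise, the two-party simplification, the function name and every number are assumptions made purely for illustration.

```python
import random


def labour_win_probability(lab_2019, con_2019, lab_swing, con_swing,
                           noise_sd=6.0, n_sims=10_000, seed=1):
    """Share of simulations in which Labour finishes ahead of the Conservatives.

    All inputs are in percentage points. The uniform national swing, the
    normal noise and noise_sd are illustrative assumptions, not the
    article's actual model.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    wins = 0
    for _ in range(n_sims):
        lab = lab_2019 + lab_swing + rng.gauss(0, noise_sd)
        con = con_2019 + con_swing + rng.gauss(0, noise_sd)
        wins += lab > con
    return wins / n_sims


# Hypothetical regions: (Labour 2019 share, Conservative 2019 share).
safe_region = labour_win_probability(50, 30, lab_swing=2, con_swing=-2)
marginal_region = labour_win_probability(38, 44, lab_swing=2, con_swing=-2)
print(f"safe: {safe_region:.0%}, marginal: {marginal_region:.0%}")
```

The point of the sketch is only that a region's partisan lean plus a national swing fixes the centre of the distribution, while the random variation decides how often the trailing party still comes out ahead.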

It is also useful because it can allow us to see what sort of effect an incumbency bonus might have. This is probably one of the most unpredictable elements of the elections, with only three past mayoral elections featuring an incumbent (2012, 2008 and 2004) and all of them in London. This is far too few to make assumptions about other cities, but the incumbents in these elections (Johnson in 2012 and Livingstone in 2008 and 2004) over-performed the model by +17.4%, +2.6% and -1.1% respectively. This is an average of +6.3%, incidentally the same margin by which Sadiq Khan is currently over-performing in polls.

In the other metro mayor elections, the incumbency effect is likely to be smaller. The mayoral positions are less established, probably have lower name recognition, and are covered by less regional media. We can still safely assume that the figure is higher than 1.75%, the predicted incumbency effect for MPs in general elections (Middleton, 2018).

Between 1.75% and 6.3%, the range of win probabilities for Labour varies a lot:

When the incumbency effect is high (6.3%), Labour goes from being favourites to (narrow) underdogs in the West Midlands (49%) and Tees Valley (44%), whilst all but sealing the deal in Greater Manchester (95%), London (96%) and Liverpool (97%).

Whether the incumbency effect is big or not, Labour should hope to gain. If it doesn’t, it could signal some serious issues. None of Labour’s incumbent mayors are at any risk, and West Yorkshire should be a relatively simple win. Labour probably have a slight edge in the West of England, which has swung away from the Conservatives since 2017, and are highly competitive in both Tees Valley and the West Midlands.

What about the supplementary vote?

One other question going into the metro mayor elections, which adds a layer of uncertainty, is the effect of the supplementary vote electoral system. While the system might have a significant ‘psychological’ effect, changing the way parties campaign and voters vote, I don’t think it will have a significant ‘mechanical’ effect.

In the 13 metro mayor elections, the first round winner has won the second round every time. In 81 Police and Crime Commissioner elections, which also use SV, the first round winner has won the second round in all but 6 elections. The effect of SV doesn’t seem to benefit or disadvantage any one party that significantly, either, with both Labour and the Conservatives losing out in PCC elections.

In the 13 metro mayor elections, Labour won about 55% of third party voters in the second round, compared with 46% for the Conservatives. This sounds like a big difference but first round winners typically only need around 39% to win, so in practice it makes little difference (only one Conservative candidate received less than 39% of second preferences, in Sheffield City Region in 2018). It is also partly explained by Labour generally having bigger first round leads than the Conservatives, which does appear to have an impact on second round performance.

Comparing the model with and without a second round bonus for Labour shows how few scenarios there are where this makes a difference:

Region                            Labour win probability    Labour win probability
                                  (2nd round advantage)     (no 2nd round advantage)
Liverpool                         96%                       96%
London                            94%                       90%
Greater Manchester                92%                       90%
West Yorkshire                    79%                       77%
West of England                   66%                       64%
West Midlands                     57%                       56%
Tees Valley                       52%                       52%
Cambridgeshire and Peterborough   26%                       25%

Using an incumbency effect of 4% (mid-way between MPs and London Mayors)

Here, the effect is bigger in places where the Liberal Democrats and other parties might get a higher share of the vote (a 4% increase in London, for example), but still very small. With the advantage, Labour wins on average 5.6 out of 8 mayoralties; without it, 5.5.
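
The two averages can be checked directly from the table, because the expected number of wins is simply the sum of the individual win probabilities. A minimal sketch, with variable and function names of my own choosing:

```python
# Labour win probabilities from the table above (4% incumbency effect),
# as (with 2nd round advantage, without 2nd round advantage).
labour_win_prob = {
    "Liverpool": (0.96, 0.96),
    "London": (0.94, 0.90),
    "Greater Manchester": (0.92, 0.90),
    "West Yorkshire": (0.79, 0.77),
    "West of England": (0.66, 0.64),
    "West Midlands": (0.57, 0.56),
    "Tees Valley": (0.52, 0.52),
    "Cambridgeshire and Peterborough": (0.26, 0.25),
}


def expected_wins(probabilities):
    """Expected number of wins is the sum of the individual win probabilities."""
    return sum(probabilities)


with_advantage = expected_wins(p[0] for p in labour_win_prob.values())
without_advantage = expected_wins(p[1] for p in labour_win_prob.values())
print(f"{with_advantage:.1f} vs {without_advantage:.1f}")
```

This holds whether or not the eight races move together, since an expected total does not depend on how the individual results are correlated.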

So, who is going to win?

As I said above, none of these figures are a forecast. 13 elections is far too small a sample size to make judgements about many of the variables that affect election results. That said, it does seem likely that most metro mayor elections reflect their region’s partisan lean pretty closely. In terms of incumbency effect, my personal guess would be that it’s worth around 4%, but that’s only worth as much as any guess. Given that, Labour is probably on track for four easy wins and four competitive races, if national polling stays where it is.

If I had written a book, I might offer to eat it if Labour loses Liverpool. But other than that we’ll have to wait and see.

Data and code here.

If the Alba Party gets more than 5%, it could be catastrophic for unionist parties (but it probably won’t)

When it comes to political news, there is little more sensational than a prominent politician setting up a new party. Today, Alex Salmond, former First Minister of Scotland, announced he would be starting a new pro-Independence party – The Alba Party – to contest the Scottish Parliament election in May. Will it make any difference?

Before I start, I should point you to Ballot Box Scotland, who are providing incredibly detailed Scottish polling and election analysis, and Ben Walker at the New Statesman, who is modelling polls for the upcoming election. I’m sure BBS and Ben will have lots to say about the Alba Party over the course of this campaign, so I suggest you check them out.

The first instinct of most (English) readers to the news of a new pro-Independence party is probably that it will damage the SNP by splitting the Yes vote, but since Scotland uses a form of Proportional Representation, the effect of spoiler parties is very different.

Scottish Parliament elections are conducted on two ballots, a constituency ballot (using First Past the Post) for 73 local MSPs, and a regional party list ballot, which proportionally allocates 56 MSPs to make the overall result as proportional as possible. Since it seems unlikely that the Alba Party will contest local constituencies, the regional lists are where it really makes a difference.

Since the SNP is so dominant in the constituencies section, winning 59 out of 73 seats in 2016, the party wins few regional list MSPs (4 in 2016). Current polling suggests the SNP will continue to dominate the constituency section. The SNP’s constituency position is so strong, in fact, that the proportional regional lists are unable to correct the overall result to be fully proportional.

This is where the Alba Party could cause trouble. If a significant number of Independence supporters vote SNP on their constituency ballot, but Alba Party on the regional list, the SNP will continue to win their constituency seats, but the Alba Party may be able to pick up a number of list MSPs.

Here, a small number of votes stand between the Alba Party having no impact at all and wreaking havoc. Ben Walker has approximately modelled the number of votes needed for the Alba Party to win a seat in each region, here:

Another way to look at it is to see what would have happened if the Alba Party had stood in 2016. This is a very approximate measure, given the changing political winds since 2016, but by giving The Alba Party a varying percentage of list votes from the SNP, we can begin to see the possible effect:

If the party wins less than 5% of the vote (which is equivalent to 12% of SNP voters defecting), it gets no MSPs and the effect on other parties is minimal (the Conservatives gain one at the SNP’s expense). If the party wins 10% however (24% of the SNP vote), it begins to make significant gains, mostly at the expense of Labour, the Greens and the Conservatives.

The net effect of a successful Alba Party would be a significantly more pro-Independence Scottish Parliament overall, as the SNP keeps its 59 constituency MSPs, joined by a group of Alba Party list MSPs. The Yes/No balance is illustrated below:

These modelled numbers depend on constituency vote shares staying the same, and Alba Party supporters being drawn perfectly from the SNP with no other votes changing, so it is not useful for predicting seats in 2021, but it is an illustration of the disruptive effect the Alba Party could have if it wins more than 5% of the vote.
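
To see why a few percentage points matter so much, here is a rough sketch of how list seats are handed out in a single region. Each region elects 7 list MSPs, and each seat goes to the party with the highest list vote divided by one more than the seats it has already won there, constituency seats included. The vote totals, the seat counts and the function names below are invented for illustration and are not the 2016 figures.

```python
def allocate_list_seats(list_votes, constituency_seats, n_list_seats=7):
    """Hand out regional list seats one at a time using the D'Hondt divisor."""
    seats = dict.fromkeys(list_votes, 0)
    for _ in range(n_list_seats):
        def quotient(party):
            # Constituency seats already won count against a party here.
            already_won = constituency_seats.get(party, 0) + seats[party]
            return list_votes[party] / (already_won + 1)
        winner = max(list_votes, key=quotient)
        seats[winner] += 1
    return seats


def with_alba(list_votes, defection_rate):
    """Move a share of the SNP's list vote to a hypothetical Alba list."""
    votes = dict(list_votes)
    moved = votes["SNP"] * defection_rate
    votes["SNP"] -= moved
    votes["Alba"] = moved
    return votes


# Invented region: the SNP wins all 9 constituencies, so its list vote is
# divided by 10 before it can compete for a list seat.
region_votes = {"SNP": 130_000, "Con": 65_000, "Lab": 55_000, "Green": 20_000}
region_constituency_seats = {"SNP": 9}

for rate in (0.0, 0.12, 0.24):
    result = allocate_list_seats(with_alba(region_votes, rate), region_constituency_seats)
    print(f"{rate:.0%} of SNP list voters defect:", result)
```

In this made-up region the SNP's list votes elect nobody however they are cast, so moving them to a party with no constituency seats costs the SNP nothing, while the new party only starts winning seats once its vote beats the other parties' quotients.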

However, going from a standing start to 5% of the vote in 6 weeks is a very tall order. It has never been achieved in the Scottish Parliament’s history. Excluding the 1999 elections, the best result for a new party was in 2003, when the Scottish Senior Citizens Party won 1.5% and managed to scrape 1 MSP in Central Scotland, where it won 6.5% of the vote.

With Alex Salmond’s high name recognition, it seems possible that the Alba Party could pull off a similar result, winning a single digit percentage of votes and number of MSPs. If Salmond stands as a list candidate, his personal vote might be enough to win him a seat, in a similar vein to Margo MacDonald in 2003, re-elected in 2007 and 2011, but at this point it is worth pointing out how unpopular Salmond is.

As Opinium’s Chris Curtis writes, the former First Minister’s net favourability (favourable – unfavourable) is at a catastrophic -60% among Scottish voters. Even among Yes voters, only 22% have a favourable view of him. Given the Alba Party is likely to be the Alex Salmond Party in practice, this suggests a ceiling of support around 10%. Without the campaigning infrastructure of the established parties, it may be hard even to get this 10% to turn out.

While some have talked about Yes voters tactically supporting the Alba Party despite Salmond, to deliver independence, I think this is unlikely to make a significant difference. As Ballot Box Scotland’s Allan Faulds has argued in the past about the Scottish Greens, the effect of people trying to game the Scottish Parliament’s electoral system is probably overstated. Most people are just not so plugged in to the trials and tribulations of the D’Hondt formula.

I would guess that pollsters are currently scrambling to ask Scottish voters about the Alba Party, so we may have more evidence to consider its potential soon, but with a flurry of news coverage in the next few days, poll results may just be a flash in the pan (anyone remember The Independent Group polling at 18%?). The best bet is to wait and see.

Weaver Vale Constituency Labour Party becomes the 166th to back PR, one quarter of all CLPs

Weaver Vale CLP became the 166th local branch of the Labour Party to pass a motion calling for Proportional Representation this week. That means over a quarter of CLPs have passed a motion in recent years, making PR one of the most called-for policies in Labour history.

The renewed push for PR among Labour activists comes with the formation of Labour for a New Democracy, a coalition of organisations pushing for PR, including the Labour Campaign for Electoral Reform, Make Votes Matter, Open Labour, the Electoral Reform Society and others. If you would like to push for PR within Labour, you can sign up here.

CLPs in Wales, Scotland and all of the English regions have backed PR. The static map below shows which CLPs have passed motions, or you can explore the interactive map here. You can also see the full list of CLPs which have passed motions here.

CLPs marked in red have passed a motion calling on Labour to back Proportional Representation

The map above shows broad support from diverse parts of the UK, from Dwyfor Meirionnydd to Dewsbury and St Ives to Sunderland South. This includes 28 of Labour’s top 100 target seats in England and Wales – key marginals such as Colne Valley, Hastings and Rye, Pendle and Warrington South – as well as the CLPs of both Keir Starmer (Holborn and St Pancras) and Jeremy Corbyn (Islington North).

Recent polling suggests over three quarters of Labour members back PR. During his leadership campaign, Keir Starmer said “we’ve got to address the fact that millions of people vote in safe seats and they feel their vote doesn’t count”. Labour’s officially neutral position on changing the voting system is becoming more and more untenable.

Labour for a New Democracy is building momentum for the next Labour conference, sending speakers to Labour Party meetings, supporting members to pass motions in CLPs and trade union branches, and building alliances across the Labour movement. Please sign up to get involved.

Which states should have the first four primaries? The case for picking the weirdest ones

This week, Democratic National Committee chair Tom Perez declared his support for changing the order of presidential primaries and caucuses so that Iowa and New Hampshire are toppled from their respective ‘first in the nation’ positions.

Perez argues that more diverse states should be prioritised in selecting the Democrats’ future presidential candidates, presumably driven partly by the dismal performance of Joe Biden in 90(ish)% white non-hispanic Iowa and New Hampshire, before he went on to win elsewhere with a diverse electoral coalition.

The debate over which states should go first has led a lot of people to argue that the most ‘representative’ states should be prioritised – that is, those states whose demographics match the nation as a whole. NPR created its own ‘perfect state index’, to measure this similarity, with Illinois coming out as the most representative of the country. But is the most representative state the best to go first?

In reality, US social geography means there are very few ‘normal’ states. For example, while Illinois is only 3.9% different from the racial makeup of the US as a whole, the next closest state is Connecticut at 12.8%. The majority of states are over 37% different. When it comes to winning a primary or the eventual general election, the candidate will not have to win a ‘representative’ state, but a mixture of very unrepresentative states. If the first four primary states were representative of the country, they might each be 63% white non-hispanic, which is actually a very poor test of whether a candidate has strength with both white voters and voters of colour.

The same is true for many demographic features – wealth, education, religiosity etc. America is not a country of diverse states, but a diverse country of many fairly homogenous states and some very diverse states.

Leading the primary process with ‘representative’ states also waters down the voting strength of groups which reform is supposed to amplify. While candidates are currently expected to reach out to African American voters in South Carolina and Hispanic voters in Nevada, if the first states were representative of the country, these groups would not form a critical majority in any of the first primaries.

The solution is to choose four unusual (‘weird’?) states to kick off the primary process, ones which will each have a unique test for presidential candidates.

To find which states fit these criteria, I used Principal Components Analysis (PCA). PCA works by converting multiple variables of data into a number of ‘components’, each of which is orthogonal (at a right angle) to the others. This essentially means finding components which explain most variation in the data, simplifying the correlations between the variables into a single component, then adding another component to explain the largest proportion of the remaining variation, until all variation has been explained.

PCA is useful for this analysis because it means we can place the states in a multi-dimensional ‘space’, based on components which best explain the variation in the data. This way, we can see the distance between states and pluck out some of the extremes. My PCA is based on each state’s percentage African-American, percentage Hispanic/Latino, percentage ‘Other’ race (this implicitly means the percentage white non-hispanic is being considered), percentage with a bachelor’s degree or higher, median income, urbanisation, and percentage who attend church.

For illustration, the first two principal components (explaining approximately 63% of variation in the data) are plotted below, along with the effect each variable has on a state’s position on the two axes.

This diagram looks like a bit of a mess at first glance, but to try and simplify: we can see that by moving along the x axis (principal component 1), median income, urbanisation, and the percentages with a degree, hispanic or other race decrease, while the percentages who are African American or attend church increase. On the y axis (PC2), increasing the value reduces urbanisation and the percentages who are African American or attend church, and increases the percentage of other race. This way, demographically similar states are grouped together, e.g. the deep South in the bottom right, or Maryland, New York and New Jersey in the bottom left.

These are just the first two dimensions but each state has a vector position across 7 principal components (one for each of the variables listed above), together explaining 100% of variation in the data.
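
For readers who want to see the mechanics, here is a minimal PCA sketch using NumPy. The figures are invented for five imaginary states and three variables, and standardising each variable first is my assumption about a sensible setup, not a description of the exact analysis above.

```python
import numpy as np


def principal_components(data):
    """Return (scores, explained_ratio) from a rows-by-variables table."""
    X = np.asarray(data, dtype=float)
    # Standardise each column so variables on big scales (income) do not dominate.
    # This scaling is an assumption made for the sketch.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the component directions, which are orthogonal to each other.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt.T  # each row's position on every component
    explained_ratio = S**2 / np.sum(S**2)
    return scores, explained_ratio


# Invented figures for five imaginary states:
# % African American, % with a degree, median income ($000s).
made_up_states = [
    [35.0, 22.0, 45.0],
    [2.0, 38.0, 62.0],
    [30.0, 40.0, 85.0],
    [3.0, 25.0, 50.0],
    [12.0, 30.0, 60.0],
]
scores, explained_ratio = principal_components(made_up_states)
print(explained_ratio.round(2))
```

Each row of `scores` is a state's position in component space, and the explained ratios add up to 1 when every component is kept, which is what "together explaining 100% of variation" means.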

There is a strong case that the first four states should be small, so there is not an inbuilt advantage for candidates with more money or access to expensive media markets. This also helps prevent an insurmountable number of delegates being chosen before most states have voted. For these reasons, I excluded states with population greater than 6.6 million. I also (possibly unfairly) excluded Alaska and Hawaii, so that the first four primaries would happen in the 48 states of the contiguous US.

To find the best first four states, I looped through all combinations of four states and summed the euclidean distance (in principal components) between all pairs of states. For example, the first combination is Alabama, Arkansas, Colorado and Connecticut. By adding the distance between pairs (AL-AR, AL-CO, AL-CT etc) the summed distance is 26.1. The four states with the greatest distance are….

*drumroll*

Maryland, Mississippi, New Mexico and Vermont

These states represent the extremes of states’ demographics. Mississippi, for example, is the state with the highest African American population. New Mexico has the highest hispanic population, Vermont is the least religious state, and Maryland has the highest median income.

But beneath these headline statistics too, they represent many of the different types of American life. For example, both Maryland and Mississippi have significant black populations, but in Maryland black people are highly urbanised while in Mississippi they are predominantly rural. Both Maryland and Vermont are highly educated and high income, but while Maryland is diverse and urbanised, Vermont is very white and rural. Both Mississippi and New Mexico have low median incomes, but New Mexico is less religious and more educated.

Each state poses a completely different electoral challenge to primary candidates and as a group they include the largest possible proportion of the US’s diversity. If a candidate can survive primaries in New Mexico, Vermont, Mississippi and Maryland, they can win anywhere.

For reference, here are some of the most divergent and most homogeneous combinations:

Rank      States                      Distance
1         MD, MS, NM, VT              38.47181
2         MD, MS, NH, NM              37.94593
3         MD, ME, MS, NM              37.59456
4         MD, MS, NM, WV              37.15109
5         MD, MS, MT, NM              36.80862
7,843     IA, NH, NV, SC (current)    24.32512
31,461    IA, KS, ND, NE              7.472435
31,462    IA, KS, ND, NE              7.414509
31,463    IA, ID, ND, NE              7.375798
31,464    IA, ID, KS, NE              6.943583
31,465    IA, KS, MO, NE              6.919990
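
The search behind this ranking can be sketched in a few lines. The positions below are invented two-dimensional points for six imaginary states rather than the real principal component scores, and the function names are my own. The ranking running to 31,465 combinations is consistent with choosing four states from 31 eligible ones.

```python
from itertools import combinations
from math import comb, dist


def summed_distance(group, positions):
    """Sum of Euclidean distances between every pair of states in `group`."""
    return sum(dist(positions[a], positions[b]) for a, b in combinations(group, 2))


def most_spread_out(positions, size=4):
    """Try every combination of `size` states and keep the most spread-out one."""
    return max(combinations(sorted(positions), size),
               key=lambda group: summed_distance(group, positions))


# Invented two-dimensional positions for six imaginary states.
positions = {
    "A": (0, 0), "B": (10, 0), "C": (0, 10),
    "D": (10, 10), "E": (5, 5), "F": (4, 6),
}
best = most_spread_out(positions)
print(best, round(summed_distance(best, positions), 2))
print(comb(31, 4))  # number of four-state combinations from 31 states
```

Brute force is fine at this scale: a few tens of thousands of combinations, each needing only six pairwise distances.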

What’s up with the Sunday Times MRP projection?

Twitter has been full of discussion today of a Sunday Times MRP model which reportedly showed that if an election were held today, Boris Johnson would lose his majority and his seat. It’s a striking result but not totally inconsistent with recent national polls which have shown Labour making gains. It caught my attention, though, because of the vast amount of discussion it was generating, relative to normal polls. At least part of the reason is that this is an ‘MRP’ model.

MRP stands for Multi-Level Regression and Poststratification. In short, the MRP method means taking a large sample of poll respondents (in this case over 20,000) and using a model (usually logistic regression) to predict how each individual respondent will vote, based on a combination of individual-level data (e.g. race, gender, age, education) and information about their local area (in this case, presumably, their parliamentary constituency). These levels are the ‘multi-level’ part of MRP. The final part of the method, poststratification, involves using this model to predict how each demographic group in each local area might vote. For example, the model might predict that a white, 50 year old woman who went to university has a 30% chance of voting Labour.

In the poststratification stage, this percentage is predicted for every demographic group, then multiplied by that group’s population in an area. This gives a final predicted vote share for the party in each area. Under electoral systems which base election results on geographic areas – such as in the UK and US – it is argued that MRP can successfully predict election results on an area-by-area basis. Part of MRP’s popularity was that in 2016 it performed much better than state polls at predicting Donald Trump’s surge in some key swing states. Then, in 2017, YouGov’s MRP model was one of the only predictions to correctly suggest a hung parliament. However, MRP is still just a statistical model and relies on regular polling data.
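
The poststratification step is just a population-weighted average, as this small sketch shows. The groups, populations and all predictions other than the 30% example from above are invented, and a real model would have many more cells per constituency.

```python
def poststratify(predicted_share, population):
    """Population-weighted average of the model's prediction for each group."""
    total = sum(population.values())
    return sum(predicted_share[g] * population[g] for g in population) / total


# Invented groups for one imaginary constituency. The 30% figure echoes the
# example in the text; the other numbers are made up.
predicted_labour_share = {
    "white woman, 50, degree": 0.30,
    "white man, 30, no degree": 0.45,
    "white man, 70, no degree": 0.25,
}
population = {
    "white woman, 50, degree": 10_000,
    "white man, 30, no degree": 20_000,
    "white man, 70, no degree": 10_000,
}
labour_share = poststratify(predicted_labour_share, population)
print(f"{labour_share:.1%}")
```

Because the weights come from census-style population counts rather than from the poll itself, a constituency can get an estimate even when very few respondents live there.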

With that in mind, it’s easy to see why the Sunday Times’ MRP model, produced by FocalData, is getting a lot of attention. I am not a subscriber to the Sunday Times, so I could not see the full article, but helpfully FocalData released their constituency data publicly. It was when I started scrolling through these results that I began to have some doubts about their validity. The Liberal Democrats, for example, were predicted to get 8.7% nationwide – a fairly solid result for the party in post-2015 terms – but lose all but two of their seats. Meanwhile, Labour appeared to be making huge gains in many seats it had never been competitive in before (including my home constituency – North Cornwall – where the FocalData model predicts Labour to go from 9% to 30%).

Taking a closer inspection of the data reveals a worrying pattern. For all parties which the model predicts (Conservative, Labour, Liberal Democrat, Green, Brexit, SNP and Plaid), the predicted change in vote from 2019 appears to be massively dependent on the 2019 vote share itself. All parties performed worse in their best seats from 2019, and relatively better in their worst seats.

In the plots below, I use Pippa Norris’s past election results data to compare the FocalData prediction to the 2019 election results. In these plots, the x-axis is the 2019 election result and the y-axis is the FocalData predicted vote share. The black line represents x=y, as a visual aid. Points above x=y are constituencies where parties improved on their 2019 vote share in the FocalData projection. For each plot, a linear trend line shows how the FocalData model predicts parties performing relatively worse in their best seats, whilst relatively better in their worst seats (that is, the blue regression line tilts to the right of the black x=y line).

This is pretty unusual. Although there is often some correlation between election results in one election and party swing at the next, this is usually much smaller. For comparison, the same chart is reproduced for the change between 2017 and 2019, without the Brexit Party which stood no candidates in 2017:

Here you can see a much more typical relationship. Constituency results vary between the two elections, but this variance is fairly similar across the range of seats (from unwinnable to marginal to safe), as is shown by the fact that the linear trend line has a gradient of close to 1 (matching x=y).

To show this in another way, I performed a very simple linear regression for each of the Conservatives, Labour and the Liberal Democrats at elections in 2015, 2017 and 2019 (previous elections are not comparable due to boundary changes), to see what proportion of the variance in vote share change is explained by the preceding election vote share (R²).

Here we can see how different the FocalData predictions are to previous election results. The previous highest proportion of variation explained by vote share was in 2015 for the Liberal Democrats. This has a simple explanation: the Liberal Democrats collapsed everywhere, so in places where they had higher vote shares, they had further to fall. Aside from that, vote share at the previous election typically explains under 10% of change in vote share at the next. For the FocalData model, it explains over 70% for all three parties, and a staggering 95% for the Liberal Democrats.
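
As a minimal illustration of the statistic being compared here, the sketch below computes R-squared for a simple linear regression of vote share change on previous vote share. The five seats and both sets of changes are invented; they only show what the two ends of the scale look like.

```python
def r_squared(x, y):
    """R-squared of a simple linear regression of y on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Invented data: previous vote share (%) in five imaginary seats.
previous_share = [10, 20, 30, 40, 50]
unrelated_change = [1, -2, 2, -2, 1]   # no linear link to previous share
dependent_change = [4, 2, 0, -2, -4]   # falls steadily as previous share rises

print(r_squared(previous_share, unrelated_change))
print(r_squared(previous_share, dependent_change))
```

With one predictor and an intercept, R-squared is the squared correlation between the two series, so values near 0 mean previous vote share tells you almost nothing about the change and values near 1 mean it tells you almost everything.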

By way of comparison, I repeated the regression, this time including the estimated Leave vote of each constituency (from Chris Hanretty’s estimates). One might expect that Leave vote has a large impact on vote share changes, given how much it impacts our politics. Instead, the combined vote share and Brexit vote models yielded increases in R² of just 1% for the Conservatives, 5.3% for Labour and 0.3% for the Liberal Democrats.

What does this mean in practice?

In terms of the validity of these predictions, the relationships I have identified lead to some pretty big questions. While it would theoretically be possible for all parties to perform relatively poorly in their best seats and well in their worst seats, this seems extremely unlikely. On top of this, there are so many examples of constituency predictions which seem wildly unrealistic on the face of it that there must be some underlying issue.

One prominent example is Brighton Pavilion. Here, despite the Greens more than doubling their support nationwide in the FocalData model, Caroline Lucas is projected to have her majority cut by 15%, from 34% to 19%. Although individual constituencies do not always go the same direction as the country, what reason is there to think Brighton Pavilion would swing 15% against the Greens during the party’s best election ever? More so, what demographic or local variables in the MRP model could cause this to happen?

Another is Twickenham, a Liberal Democrat stronghold. Here the Liberal Democrat vote is projected to fall by 24% while the Liberal Democrat vote nationwide falls by 2.9%. This would be a bigger decrease for the Liberal Democrats in Twickenham than their catastrophic 2015 defeat. Part of the reason the FocalData model resulted in only two Liberal Democrat seats is similarly huge declines across the seats they won in 2019. Meanwhile the Liberal Democrat vote stays almost wholly intact across much of the rest of the country.

There are many more examples which include huge swings, and these swings do not appear to be particularly well correlated with virtually any demographic or political indicators.

Why might this have happened?

Why this might have happened is a question I have been really stuck with. I am not an expert on MRP, although I have briefly looked at the method in the past. My original instinct was that the model used to predict voting intention was underfit. With a sample of 22,000 (1/5th of YouGov’s final MRP model in 2019), the average constituency had just 34 respondents. It might simply be that the model was not picking up constituency-level variation, and was flattening all constituencies towards the national picture (thereby making Labour seats less Labour, Conservative seats less Conservative, and so on). This might be the case.

However, if the model was underfit one would also expect there to be examples of large swings in the opposite of the predicted direction (i.e. there should be areas where demography overstates a party, as well as understates). In fact, swing by constituency is extremely well predicted by previous vote share (residual standard error of just 2.234), to the extent that it almost looks like someone multiplied the result in 2019 in each constituency by some factor to arrive at these predictions, plus some random noise.

Given this lack of variation, it might instead be that the model is overfit, but without full methodology it is difficult to tell how. One potential method which could have caused overfitting is if FocalData used some form of auxiliary model to estimate the distribution of individual-level votes from 2019 across demographic groups. That is, if they modelled current voting intention based on another model of 2019 votes.

(EDIT: my conversation with FocalData’s CEO here suggests this might be the case)

This other model could mean that the variables predicting individuals changing their vote are dominated by their 2019 vote, leading the model to predict similar proportions of voters to change their vote (for each party) for all demographic groups. For example, if the model predicted that 10% of Labour voters would vote for someone else, that would decrease the Labour vote by 0.5% somewhere where the party won 5% in 2019, but 8% somewhere where the party won 80% of the vote in 2019. This would explain why vote share change is so highly correlated with 2019 vote shares.
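
The arithmetic in that example, as a minimal sketch. The 10% loss rate is the hypothetical one used above, and the function name is invented.

```python
def change_under_proportional_loss(share_2019, loss_rate=0.10):
    """Change in vote share (points) if a fixed fraction of 2019 voters leave."""
    return -loss_rate * share_2019


for share in (5, 40, 80):
    print(share, round(change_under_proportional_loss(share), 1))
```

Under this assumption the change is an exact multiple of the 2019 share, so a regression of change on previous vote share would fit almost perfectly, with any remaining scatter coming from whatever else the model adds.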

In any case, since I started writing this post I noticed The Guardian and The Daily Mail also picked up the story on the projection. Clearly, the MRP label and relatively large sample size give the model a credence not afforded to most conventional polls. In that case, it also ought to be carefully scrutinised.

What can mapping a Hungarian by-election tell us about authoritarianism?

In October, voters in Borsod-Abaúj-Zemplén’s 6th district voted in an interim election to replace Fidesz MP Ferenc Koncz, who had died in a motorcycle accident in July. With Fidesz’s two-thirds majority at stake, the by-election became an important test for the opposition’s ability to turn out voters. In 2018, Fidesz did not win a majority of voters in the district (around 49%), so the hope was that a joint opposition candidate could win in a two-horse race. Instead, Fidesz won with an increased share of the vote (51%) while the opposition won 46%.

This interim election shows one of the key weaknesses of the opposition going into the 2022 elections. Although the opposition parties can usually rely on a narrow majority between them in opinion polls, actually turning out these voters for joint candidates is much harder. The opposition candidate in Borsod-Abaúj-Zemplén, for example, had been the Jobbik candidate in 2018 and was revealed to have made anti-semitic and racist comments. While he was clearly a weak candidate, that the opposition failed to turn out voters in this interim election may show the difficulties of winning single-member districts, even with joint opposition candidates.

This problem is, of course, by design. The electoral system passed by Fidesz in 2011, through its complicated tier system, penalises fragmentation and increases the importance of single-member districts. These districts are skewed, so that there are a small number of very liberal, left-leaning districts – mostly in Budapest – while the vast majority of other districts have comfortable Fidesz majorities. The electoral system has rewarded Fidesz’s strategy of swinging to the right, devouring the Jobbik base and leaving the liberal opposition to squabble over central Budapest. The only strategy open to the opposition is to run joint candidates and a joint list – as they plan to in 2022.

The centrality of single-member districts in the new electoral system increases the importance of political geography, which is why I was pleased to see that the interim election results were posted on a precinct level (as they usually are) including a map of each precinct (which is new). Precinct level election results – if rolled out to every district – could be key to better understanding voting patterns and potentially spotting cases of gerrymandering.

Mapping Hungarian election results by precinct has been done before (and possibly elsewhere that I have not seen) but the lack of precinct geography data makes the process slow and imprecise. In this case, precincts were shown on Google Maps, from which I could scrape latitude and longitude data. Unfortunately, this data was not perfect, as can be seen in this first map.

The precincts do not cover all urbanised areas, they are overlapping in places, and in some cases they have very questionable boundaries. Still, we can already begin to see the broad voting patterns. Tiszaújváros, in the south of the district, voted overwhelmingly for the opposition, in many areas by over 50%. This is the largest town in the district, so makes up a large proportion of the opposition vote share. Meanwhile, the other towns – Szerencs, Tokaj and Szikszó – were more evenly split, while rural areas went heavily for Fidesz (some by upwards of 70%).

This is mostly consistent with what I expected. While Jobbik previously performed well in rural areas, the new joint opposition is mostly confined to towns, which tend to be more educated and younger (although I haven’t looked at census data below district-level for this blog post). Fidesz has maxed out its support in rural communities that tend to be overrepresented in the new electoral system. In this case, for example, the district is very efficient for Fidesz, with rural areas neutralising more liberal towns – a pattern which is seen across the country.

To get a better look at these patterns, and to solve the problems of overlapping precincts (i.e. to make the map look prettier) I used a Voronoi diagram based on the centroids of the precincts used above. The Voronoi diagram means that the whole area of the district is coloured according to its nearest centroid, creating a much more visually pleasing image and allowing the patterns to be parsed instantly.
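The idea behind the Voronoi colouring can be sketched in a few lines: every point on the map takes the colour of whichever centroid is closest. This is only an illustration, not the code used for the map. The centroid coordinates are made up, and straight-line distance on raw coordinates is an approximation that is reasonable only over a small area like a single district.

```python
from math import hypot


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (straight-line distance)."""
    x, y = point
    return min(
        range(len(centroids)),
        key=lambda i: hypot(x - centroids[i][0], y - centroids[i][1]),
    )


def rasterise(centroids, xmin, xmax, ymin, ymax, steps):
    """Label each cell of a steps x steps grid with its nearest centroid.

    The resulting grid is a coarse, pixelated Voronoi diagram."""
    grid = []
    for r in range(steps):
        y = ymin + (ymax - ymin) * (r + 0.5) / steps  # cell midpoint
        row = []
        for c in range(steps):
            x = xmin + (xmax - xmin) * (c + 0.5) / steps
            row.append(nearest_centroid((x, y), centroids))
        grid.append(row)
    return grid


# Three hypothetical precinct centroids (x, y).
centroids = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
grid = rasterise(centroids, 0.0, 4.0, 0.0, 3.0, steps=4)
for row in reversed(grid):  # print with the largest y at the top
    print(row)
```

Each cell index can then be mapped to that precinct's winning margin to colour the district.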

Again, we can see opposition support concentrated in Tiszaújváros while rural areas are much more pro-Fidesz. However, this is not a particularly accurate representation of the precincts, which generally have similar locations but dramatically different shapes, especially in the largely uninhabited areas in the northern portion of the district. And, obviously, it’s important to remember that land doesn’t vote.

To better represent the voters of the district, I used David Zumbach’s very helpful R script to produce an animated comparison of precinct results, with circles approximately proportional in size to the number of voters, similar to his map of Swiss referendum results.

In Borsod-Abaúj-Zemplén’s 6th district, a large number of votes were cast in the densely populated Tiszaújváros, which made up about 17% of votes in the interim election, and voted heavily for the opposition. In the remaining 83%, Fidesz won a modest majority, handing it the district as a whole.

In general, the precincts were extremely lopsided, with the average margin for the winner in each precinct being over 24%. This means that precincts’ positions on either side of electoral boundaries are extremely consequential. Under the 2010 electoral districts, for example, Tiszaújváros was wholly contained by a much smaller electoral district, making an opposition victory more likely (MSZP won the district in 2006).

Mapping precincts may seem inconsequential, but precinct data can be key to studying electoral trends and electoral systems. In an electoral authoritarian regime like Hungary, the latter is particularly important. More precinct data could help us understand Hungary’s opaque redistricting process and anticipate Fidesz’s electoral strategy. Democracy is about much more than voting, but in a system designed to hide democratic expression in many subtle ways, precinct voting data can be illuminating.

Uniform swing outperformed state polls for the first time in US polling history in 2020

The 2020 election had a huge focus on state polls – far more so than in 2016, when more national polls were commissioned and discussed. This makes sense given the distorting effect of the electoral college. The renewed focus was supposed to ensure that another upset would not happen, with higher quality polls in swing states; journalists running fewer stories solely about national polling; and more polls conducted in the upper midwest, which unexpectedly swung to Trump in 2016. Despite this focus, state polls in 2020 performed even worse than in 2016, by some measures.

Average absolute error for state polls and national polls, taken from 538 averages here.

One particularly striking thing about the 2020 election results was the very low variation in swing across states. Aside from a couple of notable exceptions (looking at you, Florida), states moved far more uniformly than state polls predicted. For example, none of the ‘long-shot’ Trump states – Ohio, Iowa, South Carolina, Alaska, Montana, Kansas, Missouri, Utah – swung as heavily towards Biden as polls suggested they might. Meanwhile the supposed gap between Wisconsin and Michigan – where Biden polled very well – and Pennsylvania, where he lagged slightly, did not materialise. The majority of states moved between 1% and 5% towards Democrats vs 2016, with the lowest standard deviation of state-level changes since at least 1976 (and probably much longer).
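For anyone wanting to reproduce the spread measure, a rough sketch is below. The state names and margins are placeholders rather than real results, and I am assuming the population standard deviation here; the sample version would work just as well for comparing years.

```python
from statistics import pstdev


def state_swings(prev_margins, curr_margins):
    """Change in Democratic margin (points) for every state present in both years."""
    return {
        state: curr_margins[state] - prev_margins[state]
        for state in prev_margins
        if state in curr_margins
    }


# Placeholder Democratic margins (positive = Democratic lead), not real data.
prev = {"State A": -10.0, "State B": 2.0, "State C": 15.0}
curr = {"State A": -7.0, "State B": 4.0, "State C": 19.0}

swings = state_swings(prev, curr)
spread = pstdev(list(swings.values()))
print(swings)
print(f"standard deviation of swings: {spread:.2f} points")
```

A lower value means states moved more nearly in lockstep with each other.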

The Ridgeline Plot visualises the change in distribution of state-level changes in Democratic margin. For these, I relied heavily on very useful guides to the JoyPy library here and here.

This ridgeline plot shows the striking consistency of 2020 election swings, having clearly the lowest standard deviation of the density plots. Another pattern is the difference between election years and re-election years. Both 2012 and 2004 have much lower state-level deviation than their preceding cycles. In a more polarised environment, re-election years are far more focussed around opinions on the President, which are best predicted by voting patterns of four years before. If either Biden or Trump runs again in 2024, we should expect little variation in state-level swings.

Given the pattern of falling state-level deviation in recent cycles, it may be the end of huge state swings. In 1976, for example, Georgia swung from a 50% margin for Nixon to a 34% margin for Carter. The Democratic margin increased by 84% in Georgia, while it increased by 25% nationwide. This swing is so huge that it makes the x-axis of my ridge plot look a bit silly, and it is practically unthinkable in 2020.

The confluence of lower deviation in swing and lower accuracy in state polls means that in 2020 uniform swing outperformed state polls in swing states for the first time. By uniform swing, I mean taking the change in Democratic margin nationwide predicted by polls since the last election and applying it uniformly to every state. In election years with significant localised dynamics, this approach fails abysmally. In Georgia in 1976, for example, a uniform swing would have predicted a Carter loss by 26%. But in re-election years, and especially in heavily Republican or Democratic states, uniform swing performs surprisingly well.
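Uniform swing as defined above is simple enough to sketch in a few lines. The function names are my own, and the Georgia inputs are the rounded figures quoted earlier (a 50-point Nixon margin and a 25-point national swing), so the output lands at roughly 25 rather than the 26 quoted above, which presumably reflects the poll-predicted swing rather than the rounded actual one.

```python
def uniform_swing(prev_margins, national_swing):
    """Apply the same national change in Democratic margin to every state."""
    return {state: margin + national_swing for state, margin in prev_margins.items()}


def mean_abs_error(predicted, actual):
    """Average absolute miss (points) over states present in both dicts."""
    states = [s for s in predicted if s in actual]
    return sum(abs(predicted[s] - actual[s]) for s in states) / len(states)


# Rounded figures from the Georgia example: Democratic margin of -50 in 1972,
# national swing of +25 towards the Democrats, actual 1976 margin of +34.
prediction = uniform_swing({"Georgia": -50.0}, national_swing=25.0)
error = mean_abs_error(prediction, {"Georgia": 34.0})
print(prediction)  # Carter predicted to lose Georgia heavily
print(f"miss: {error:.0f} points")
```

The same `mean_abs_error` function can be run on state polling averages to compare the two approaches on equal terms.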

As we can see in the plots above, the average error of uniform swing follows the same pattern as standard deviation – lower in recent cycles and reelection years. State polls are usually stronger in swing states while uniform swing is usually fairly consistent across all states.

Of course uniform swing should not replace state polling – this is the first time uniform swing has outperformed state polls in the history of US presidential polling and it seems unlikely to happen again in 2024 – but perhaps uniform swing should be used to anchor our expectations, especially outside of the swing states. In 2020, polls in a large number of deep-red states were suggesting huge swings towards Biden – far larger than national polls implied. Weighting our expectations slightly more towards the national picture might have made us more sceptical of presidential and Senate polls in states Trump won by huge margins in 2016.

Meanwhile, while state polls have now had above average error for three elections in a row, they are still an important tool for predicting election results. The most important thing is to interpret them as fuzzy/imprecise indicators. As a general rule, we can say any state polling with a lead under 7% has a significant (but still small) chance for an upset, while any state polling with less than a 5% lead should be considered competitive. In 2020, this would have given us a pretty clear indication of where the election was headed, but with a bit less shock when North Carolina (polling lead 1.7%) and Florida (polling lead 2.5%) failed to materialise for Biden.

Results of logistic regression of probability Democrats win a state predicted solely by the 538 polling average on the day of the election
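A regression of this kind can be sketched without any libraries. To be clear, the data below is simulated (a polling lead plus a normally distributed polling error with an assumed standard deviation of 5 points), not the 538 averages, and the fitting routine is a plain Newton-Raphson loop that I wrote for illustration rather than the method used for the plot.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, iterations=25):
    """Fit P(win) = sigmoid(a + b * lead) by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p          # gradient w.r.t. intercept
            g1 += (y - p) * x    # gradient w.r.t. slope
            h00 += w             # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def win_probability(lead, a, b):
    """Estimated probability that the leading party carries the state."""
    return sigmoid(a + b * lead)


# Simulated state polls: lead in points, outcome decided by lead plus error.
rng = random.Random(0)
leads = [rng.uniform(-20, 20) for _ in range(400)]
wins = [1 if lead + rng.gauss(0, 5) > 0 else 0 for lead in leads]

a, b = fit_logistic(leads, wins)
for lead in (1.7, 2.5, 5.0, 7.0):
    print(f"lead {lead:>4}: P(win) = {win_probability(lead, a, b):.2f}")
```

The fitted curve is what turns a raw polling lead into the kind of fuzzy indicator described above: small leads sit close to a coin flip, and the probability only approaches certainty for leads well into double digits.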

Bijlmer: Back to the City of the Future

This was originally posted as an article for The Outward Bound: Bijlmer: Back to the City of the Future

Crouching over my new bike, despairingly trying to reattach the chain, I hear a shout in Dutch from the side of the road. I look up, confused. Surely it can’t be directed at me. Having only arrived in Amsterdam two days before, I am not used to being spoken to by strangers in the street. What could this man want? Clearly, my confusion betrays my nationality, as the man on the pavement grins and switches to English – ‘Can I help with your bike at all? Looks like your chain is loose!’

I thank him, embarrassed, and start dragging my bike away, reflecting that the people of Bijlmer, my neighbourhood in Amsterdam’s Zuid-Oost, seem to be much friendlier than the imposing buildings that surround us.

Before long, I find myself on the ground floor of one of these buildings. Built in the 1960s, Bijlmer was originally conceived by modernist architects as a ‘city of the future’, embracing the design principles of Le Corbusier. While elements of this design style exist in cities across the world, Bijlmer was intended to be the model modernist city, embracing concrete, elevated roads and high rises. The tower blocks are organised along hexagonal, ‘honeycomb’, grids – huge, impersonal, and reaching ten floors or more. Inside, the bike mechanic says he can fix my bike but unfortunately ‘I can’t replace your clothes’. We laugh at the bike oil covering my t-shirt and jeans and he offers me soap to wash my hands. I guess that is what you get for buying a rusty second-hand bike for only €40.

The cramped bike repair shop is typical of the diverse independent businesses run out of the tower block’s ground floor. Further around the hexagon, a shop window advertises driving courses in Dutch, English, Farsi and Arabic, and a travel agent features a large ‘Surinam Airways’ logo which looks like it was designed in the 1980s. These small shops and businesses provide a streak of colour to the concrete structures. In the centre of the hexagon is a wide grassy space spotted with trees, with a cycle path running through it.

This diversity, and particularly the connection with Suriname, is the story of Bijlmer, a neighbourhood which proudly claims to be home to over 150 nationalities. After construction, the modernist dream failed to live up to expectations and many of the blocks lay empty. Meanwhile, after Suriname’s independence in 1975, thousands of Surinamese used their Dutch citizenship to move to the Netherlands, where they struggled to find accommodation. Racist landlords and housing associations in many areas, including Bijlmer, enforced quotas for black tenants – despite Surinamese immigrants having full citizenship and flats being empty. Eventually, black activists took action and began to break into Bijlmer apartments, squatting until the Dutch government relented and offered them full tenancy agreements. These squatters were trailblazers who helped establish a vibrant black Dutch community in Bijlmer, to the chagrin of racist whites.

Sadly, racist perceptions of majority-black Bijlmer coupled with an opioid crisis meant the neighbourhood was increasingly alienated from Amsterdam and fell into disrepair. Through the 1980s, the area declined steadily. In 1992, further tragedy struck when a cargo plane crashed into a block of flats, killing at least 43 people. However, since the mid-1990s the area has experienced remarkable regeneration. Many of the largest tower blocks were demolished, to be replaced by smaller individual units, and the transport focus of the area was moved from cars to bikes, leading to the removal of elevated roads. To be clear, this gentrification was controversial, with many tenants going through lengthy legal battles to try and save their homes, but unlike many gentrification projects, the new residents of smaller units tended to be second-generation immigrants who formed Amsterdam’s black middle class, rather than whites. Indeed, many black residents who left Bijlmer in the 1980s returned, meaning the area retained a distinctive identity.

Walking back to my apartment to wait for my bike to be fixed, I can see why Bijlmer has become one of Amsterdam’s coolest neighbourhoods. The shift in focus from cars to bikes means wide open green spaces, flooded with sunlight, and the remaining tower blocks are divided by cycle paths and canals, softening their harsh exteriors. Here, the importance of public space is obvious. Statues and art installations are everywhere, and murals scale the buildings. Markets sell fresh produce and goods from around the world. The people of Bijlmer, in a very physical way, have reclaimed the concrete concourses and relegated the grey buildings to the background of vibrant community life. Another British student living in the area says it reminds him of Stratford, London: a diverse area encroached on by newer sports arenas (in Bijlmer’s case the Johan Cruijff ArenA – home to Amsterdam’s Ajax football club), shopping and entertainment complexes. But unlike many neighbourhoods, Bijlmer manages to retain its sense of community, punctuated by bustling markets, huge murals and diverse independent businesses. As my British friend puts it, ‘at least here I know I can still buy plantain and shea butter.’

Classifying Parliamentary Constituencies by Petition Signatures

Parliament’s official petition website (petition.parliament.uk) has gained considerable traction in recent years, bringing attention to wide-ranging issues. From 2010 to 2015, the most popular petition on the website received 328,000 signatures, with 39 more receiving over 100,000 signatures. In the 2017-2019 parliament, by contrast, the most popular petition received over 6 million signatures while 75 others received over 100,000 signatures. These petitions are a valuable source of data for measuring public opinion, especially since petition signatures are reported at the level of Westminster Parliamentary Constituencies.

As you might expect, there is significant variation between different constituencies. Some local petitions garner support in concentrated areas while many national issues have unevenly distributed signatures. These patterns can tell us about the nature of public opinion and interests.

Clark, Lomax and Morris (2017) use this data to classify parliamentary constituencies into four groups: Domestic Liberals (N=110), International Liberals (115), Nostalgic Brits (276), and Rural Concerns (149). These groups are closely related to Brexit, with Domestic and International Liberals having high numbers of signatures for anti-Brexit petitions whilst Nostalgic Brits and Rural Concerns have relatively few. In 2020, a very different set of petitions is popular (in a very different political climate). I thought it would be interesting to see how constituencies cluster with the new set of petitions (since December 2019). These are the top 50 petitions of the 2019 parliament:

Using the K-means clustering algorithm* on the standardised petition data by constituency, we get four clusters based on the 50 petitions. These clusters are summarised below:

  1. Liberal Towns (N = 114)
    • Typical constituencies: Birmingham Edgbaston, Wycombe, Croydon South
    • Higher support for petitions concerning education (eg 2, 4, 15 and 39)
    • Higher support for international issues such as Yemen and China (8 and 16)
    • Lower support for animal welfare petitions (eg 26, 31, and 36)
  2. Urban and Student Issues (N = 56)
    • Typical constituencies: Putney, Manchester Withington, Hammersmith
    • Much higher support for petitions relating to racism or ethnic minorities (eg 5, 9 and 20)
    • More concern about economic impact of Coronavirus, especially on the arts (eg 11, 13 and 27)
    • Lower support for animal welfare petitions
  3. Animal Welfare and Public Services (N = 386)
    • Typical constituencies: Newark, Fareham, Fylde
    • Much higher support for petitions relating to animal welfare
    • Higher support for anti-immigrant petitions (eg 19, 46)
    • Lower support for petitions relating to racism or education
  4. Devolved Regions (N = 94)
    • All constituencies in Northern Ireland, most in Scotland, many in Wales
    • Lower support for most petitions but particularly those in devolved areas eg education
    • Wolverhampton South East is an unexpected (and only English) member of this group, which may be because of lower engagement with the petitions website generally
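For readers unfamiliar with the method, the standardise-then-cluster step looks roughly like the sketch below. It is a bare-bones K-means written from scratch for illustration, not the code behind the clusters above, and the six rows of signature counts are invented. In practice a library implementation would be the sensible choice.

```python
import random
from statistics import mean, pstdev


def standardise(rows):
    """Convert each column (petition) to z-scores across rows (constituencies)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iterations=50, seed=0):
    """Lloyd's algorithm: assign to nearest centre, then recompute centres."""
    rng = random.Random(seed)
    centres = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda j: sq_dist(row, centres[j])) for row in rows
        ]
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = [mean(col) for col in zip(*members)]
    return labels, centres


# Invented signature counts: 6 constituencies x 2 petitions.
signatures = [
    [10, 200], [12, 210], [11, 190],  # high on petition 2
    [90, 20], [95, 25], [85, 30],     # high on petition 1
]
labels, centres = kmeans(standardise(signatures), k=2)
print(labels)
```

Each constituency ends up in exactly one cluster, which is the hard classification that makes the groups easy to map.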

You can explore the map of constituencies by their clusters here.

So, how do these clusters compare on the petitions? Box plots for all 50 are below.

Some of the most noticeable are: 15, with a high score for Liberal Towns and virtually no signatures from regions where education is devolved; 3, in which urban areas (particularly central London) expressed concern about Coronavirus before the lockdown was imposed; 31, one of the animal welfare petitions that Cluster 3 was named for; and 25, one of the only petitions for which the Devolved Regions group had the highest number of signatures.

Finally, out of interest, I looked at the most prominent petitions in each constituency (that is, the petition which over-performed most in that constituency relative to others). The map to explore this data is here.
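One way to define "over-performed most" is to standardise each petition across constituencies and pick, for each constituency, the petition with the highest z-score. The sketch below assumes that definition. The petition labels, seat names and counts are all invented for illustration.

```python
from statistics import mean, pstdev


def most_prominent(signatures):
    """signatures: {petition: {constituency: count}}.

    Returns {constituency: petition with the highest z-score there}."""
    best = {}
    for petition, counts in signatures.items():
        values = list(counts.values())
        mu = mean(values)
        sd = pstdev(values) or 1.0  # guard against identical counts
        for seat, n in counts.items():
            z = (n - mu) / sd
            if seat not in best or z > best[seat][1]:
                best[seat] = (petition, z)
    return {seat: petition for seat, (petition, _) in best.items()}


# Invented counts for two petitions across three hypothetical seats.
example = {
    "aviation": {"Seat A": 300, "Seat B": 50, "Seat C": 40},
    "immigration": {"Seat A": 100, "Seat B": 400, "Seat C": 120},
}
print(most_prominent(example))
```

Note that a seat can be "won" by a petition it signed at below-average rates, as long as it under-performed even more on everything else.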

There are lots of patterns to be found on the map but I spotted two interesting ones from a quick look. Firstly, the higher number of signatures for “Support the British aviation industry during the COVID-19 outbreak” around Heathrow and Gatwick:

And secondly, a higher number of signatures for “Take action to stop illegal immigration and rapidly remove illegal immigrants” for most of Kent and Thurrock.

There are loads of odd/quirky patterns on the map which I’ve probably missed so check them out here and here and let me know!

*Clark, Lomax and Morris use a Gaussian Mixture Model (GMM) to find their clusters, so this isn’t a perfect like-for-like comparison. I found that because GMM uses soft classification (giving a finite probability that each constituency will belong to a class) and the clusters can be non-spherical, the GMM method made it harder to see some distinctions between clusters in the 2019 Parliament data.

Coronavirus, Individualism and Social Solidarity

This was originally posted as an article for the Social Review: Coronavirus, Individualism and Social Solidarity

After a brief hiatus, politics is back in full flow. Normal disputes have replaced the relative unity of the first weeks of lockdown. But we can be sure that post-pandemic politics will not look the same as the politics we left behind. The coronavirus pandemic has impacted the lives of millions of people and will undoubtedly inform the politics of years to come. For the left, this can be an opportunity. The pandemic revealed the extent to which we rely on each other. This can hopefully be the basis for a new sense of social solidarity and the death knell for Thatcherite individualism. Unlike during the 2008 financial crash, the most recent comparable crisis, the Labour Party is in opposition with fresh leadership. In this context, Starmer’s Labour cannot win by replicating New Labour’s appeal but should forge its own distinctive message based on social solidarity.

In the 1970s and 80s, Margaret Thatcher’s Conservatives embraced the idea of the ‘economic man’ – cold, calculating and concerned only with the wellbeing of himself and his family. Rather than seeing them as members of communities, Thatcherism reimagined the public as consumers, expecting the lowest price for government services. This paved the way for mass privatisation and the decimation of the welfare state. Thatcherism ended the post-war consensus and reconfigured politics around a new common sense: economic liberalism coupled with traditional conservative themes of family, nation and law and order. Though Thatcher is dead, the model whose creation she oversaw has lived on. While the Blair and Brown governments expanded the role of the state in some respects, their reforms were hamstrung by New Labour’s continued commitment to Thatcherite rhetoric on individualism, embracing the language of efficiency, marketisation, competition and value-for-money.

Covid-19 has shattered this individualistic view of society. While the effects of the pandemic are certainly felt disproportionately by some groups, the coronavirus can be spread by anyone. Almost everyone knows someone who is shielding because of an underlying health condition, an essential worker with increased exposure to the disease, or an older person who is at acute risk. Ironically, social distancing has revealed how interconnected we all are. While it might seem in most individuals’ immediate interests to carry on life as normal, the pandemic requires collective action and collective sacrifice. The Thatcherite model is not equipped for this. As Boris Johnson himself has said, “what the coronavirus crisis has already proved is that there really is such a thing as society”.

Coronavirus can be seen as presenting many with a kind of prisoners’ dilemma. For many individuals, especially the young and healthy, the rational response to coronavirus, at least in theory, is to carry on life as normal, enjoying social interaction and preserving material wealth. But everyone is bound to suffer as a result of individualistic rationalism, from an overwhelmed health system, economic harm (caused by the huge loss of life as well as by plummeting consumer activity), and the suffering of loved ones. The economic man is doomed to failure in light of coronavirus.

Some countries have taken a Hobbesian approach to this problem, concluding that only coercive power can overcome coordination problems. China’s huge state capacity has been mobilised in the service of widespread surveillance and implementing restrictions on potential virus-spreaders. In Hungary, Prime Minister Viktor Orbán announced a state of emergency, granting his government the right to rule by decree indefinitely. While this has now ended, Orbán used the opportunity to cement his power. In Singapore, stay-home notices are enforced with regular texts and calls from the government, with potential penalties of up to six months in jail, fines worth thousands of pounds, or both.

In the UK, we should not confuse intrusive or inconvenient social distancing measures with the reach of a totalitarian state. Frankly, the state here lacks the capacity to truly enforce social distancing. Instead, lockdown was made possible by a sense of social solidarity. Indeed, researchers from UCL and the LSE found that fear of deterrence or catching the virus was not a predictor of lockdown compliance. Instead, people were motivated by social norms and support for the NHS. Researchers found that 87% of people agreed that ‘observing the social distancing laws shows other people in my community that I care for their safety’ and 82% agreed that ‘following the social distancing rules helps me feel that I am part of the collective fight against the pandemic’. This is a massive shift in our expectations of the public, from atomised economic individuals to participants in a broader community.

As yet, it is not obvious how this shift will affect public policy preferences but early research suggests a clear change in perceptions. Polling for More In Common revealed that the number of people who see Britain as a society ‘where people look after each other’ has tripled. BritainThinks found that only 12% of people want life to return to normal “exactly as it was before” once the pandemic is over. Meanwhile, support for a Universal Basic Income (UBI) has surged, with respondents specifically citing that UBI would support those who do not usually rely on welfare. In the wake of the pandemic, people are looking for solutions which recognise a shared experience of the pandemic which has touched on almost everyone in some way.

For the left, this is an unparalleled opportunity. Since 1979, Labour has struggled to adapt to Thatcherite hegemony, governing for only 13 out of 41 years. New Labour’s success was built in part on convincingly adopting the language of individualism. Now, Coronavirus may finally signal the end of the Thatcherite paradigm, leaving Labour with a real chance to shape a new hegemony as it did in the 1940s. Keir Starmer’s party must resist the temptation to try and turn back the clock to New Labour, and it must decisively reject the language of individualism and citizens-as-consumers. The re-emergence of social solidarity necessitates a renewed commitment to universalism and cooperative ideals. This approach can address the immediate concerns of those most affected by the pandemic – essential and precarious workers, BAME communities, those with underlying health conditions – as well as drawing on the burgeoning social solidarity of more unlikely potential Labour supporters.

Coronavirus presents the Labour Party with an unprecedented opportunity to reshape society, and we would do well to grasp it with both hands.