A Wichita State University statistician, Beth Clarkson, has been in the news lately for a study she conducted on voting results in several states, and specifically for her effort to get her home county to release election records so she can sort out whether voter fraud took place in the 2014 election. In a nutshell, she replicated an earlier paper by Choquette and Johnson that looked at the 2008 and 2012 elections. The authors sorted the precincts in a county or state by the number of votes cast in each, then plotted a “cumulative precinct vote tally”: a running vote share for a particular candidate that starts with the smallest precincts and adds progressively larger ones moving rightward in the charts.
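To make the construction concrete, here is a minimal Python sketch of a cumulative precinct vote tally as described above. The function name, the tuple layout, and the sample numbers are my own assumptions for illustration; they are not taken from either paper.

```python
def cumulative_tally(precincts, min_votes=0):
    """Running vote share for one candidate, smallest precincts first.

    precincts: iterable of (votes_cast, candidate_votes) pairs (hypothetical layout).
    min_votes: precincts with fewer votes cast than this are left out.
    Returns a list of (cumulative_votes_cast, cumulative_share) pairs.
    """
    kept = sorted((p for p in precincts if p[0] >= min_votes), key=lambda p: p[0])
    running_cast = 0
    running_candidate = 0
    tally = []
    for votes_cast, candidate_votes in kept:
        running_cast += votes_cast
        running_candidate += candidate_votes
        tally.append((running_cast, running_candidate / running_cast))
    return tally


# Made-up precincts, deliberately out of order.
sample = [(900, 540), (500, 200), (650, 325)]
tally = cumulative_tally(sample)
for cast, share in tally:
    print(cast, round(share, 4))
```

Each point on the chart is therefore the candidate's share across all precincts up to that size, not the share in the single precinct being added.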

The strange effect that Choquette, Johnson, and Clarkson find is that there is an upward trend in these charts after a certain point in precincts that used certain types of electronic voting equipment, which they take as an indication of vote fraud. While Dr. Clarkson doesn’t specify the exact method of fraud she suspects, Choquette and Johnson theorize that “vote-flipping” is occurring – in more populated precincts, where it is easier to hide and more efficient, votes are changed from one candidate to the preferred candidate of the fraudster, leaving the total number of votes cast the same.

In this post, I’m going to replicate the 2014 Kansas Senate race results Dr. Clarkson presents, as this has become the newsworthy case given Clarkson’s challenge in her home county, and provide a couple of comparisons to explain why this trend occurs.

I followed the links Clarkson posted and merged together the election results and voting methods, and do get the same results she does in her charts. For instance, this is the cumulative vote tally for precincts with 500 or more votes cast that used the DRE ES&S equipment, the most common one in her universe of precincts:

This appears to be identical to the chart she has in her paper, and I get a significant positive trend when carrying out a linear regression, as she does.

The first nitpick is with the way Clarkson carries out her linear regression. Because of the way the cumulative vote tally works, if the first couple of precincts (i.e., the ones at or closest to 500 votes cast) are outliers, the regression can look more significant than it really is. In this case, as you can see from the graph, those precincts are a lot more Democratic than average. The first couple of outlier points drag down the regression line on the left side, making the positive slope greater than it would be otherwise. Indeed, if we change just the first two observations to be as Republican as they are currently Democratic, we get a significant *negative* slope. A clearer picture emerges when we change our cutoff for inclusion to 503 votes instead of 500, which cuts off those first two observations. That gives the following chart, which does not have a significant slope in a linear regression:
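The leverage of those first points can be illustrated with a toy example. The sketch below uses made-up precincts, not the Kansas data: every precinct is split exactly 50/50 except the two smallest, which lean Democratic. It fits an ordinary least squares line to the cumulative share against cumulative votes cast, which is my assumption about the regression setup, and it reports only the slope, not a significance test.

```python
def cumulative_series(precincts, min_votes=500):
    """Return (cumulative votes cast, cumulative Republican share) as two lists.

    precincts: iterable of (votes_cast, republican_votes) pairs (hypothetical layout).
    Precincts below min_votes are dropped before sorting by size.
    """
    kept = sorted((p for p in precincts if p[0] >= min_votes), key=lambda p: p[0])
    xs, ys = [], []
    total = rep = 0
    for votes_cast, rep_votes in kept:
        total += votes_cast
        rep += rep_votes
        xs.append(total)
        ys.append(rep / total)
    return xs, ys


def ols_slope(xs, ys):
    """Slope of the least squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Made-up data: two small Democratic-leaning precincts, then 50 evenly split ones.
outliers = [(500, 150), (502, 151)]
typical = [(t, t // 2) for t in range(504, 704, 4)]
precincts = outliers + typical

slope_500 = ols_slope(*cumulative_series(precincts, min_votes=500))
slope_503 = ols_slope(*cumulative_series(precincts, min_votes=503))
print("cutoff 500:", slope_500)
print("cutoff 503:", slope_503)
```

With the two outliers included, the cumulative share climbs from about 30% toward 50% and the fitted slope is positive, even though no precinct after the first two differs from any other. Moving the cutoff to 503 removes them and the slope is flat.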

Still, though, people may find the positive trend starting about halfway into the graph worrying. It’d be nice if there were an independent check that could be carried out to see if this pattern holds in a non-voting-machine situation.

And it turns out there is something we can look at: Kansas is a closed primary state, meaning that voters need to register with a party to participate in the primaries. More than two-thirds of the precincts that make up the DRE ES&S chart above are located in Dr. Clarkson’s home county of Sedgwick, and the county provides voter registration breakdowns by party for each precinct. Granted, about a third of the county’s registered voters register with no party affiliation, but graphing the cumulative Republican share of registered voters, with precincts sorted by votes cast in 2014 as above, results in this pattern:

That’s essentially the same pattern as found above; in fact, the cumulative vote share for Republicans in Sedgwick County correlates with the cumulative registered voter share at r = 0.88. A vote fraud argument would need to explain both the change in vote tallies *and* the change in party registration in the official voter file. The simplest explanation seems to be the best here: the vote share for Republicans increases as precinct size gets larger because Republicans disproportionately live in larger precincts.
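For readers who want to run the same kind of comparison, here is a minimal sketch of the Pearson correlation between two cumulative series. The two lists are invented placeholders, not the Sedgwick County figures, and the function name is my own.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical cumulative series, both ordered by votes cast per precinct.
cum_vote_share = [0.40, 0.42, 0.47, 0.50, 0.52]  # cumulative Republican vote share
cum_reg_share = [0.35, 0.36, 0.40, 0.43, 0.44]   # cumulative Republican registration share

r = pearson_r(cum_vote_share, cum_reg_share)
print(round(r, 2))
```

The important detail is that both series must be built from the same precincts in the same order, so that the i-th point of each refers to the same running set of precincts.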

Another interesting question is whether this trend can be explained by some other factor that correlates with precinct size, such as household income. Dr. Clarkson doesn’t rule out this possibility, but Choquette and Johnson quite forcefully claim that there is no such factor, and provide demographic tests on page 6 of their paper. However, the fatal flaw of their income checks is that they do not run a cumulative tally by precinct, as they do for vote totals, but instead by county. Essentially, they’re not testing the possible correlation at all – removing the neighborhood-to-neighborhood variation that precincts provide covers up any potential effects that might be found.

To be fair, one major hurdle is getting income data at the precinct level; the Census just doesn’t provide that information. It does provide household income through the American Community Survey at the tract level, a unit somewhat larger than the precinct: Sedgwick County has about half as many tracts as it has precincts that reported results. Still, though, we can get a rough cut of median household income by assigning each precinct the figure for the tract that contains it and, if the precinct is split across two or more tracts, weighting the tract figures by the share of the precinct’s population that falls into each tract.
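As a rough illustration of that assignment, the sketch below computes a population-weighted income figure for one precinct. The tract IDs, incomes, and overlap populations are all made up, and the function and data layout are hypothetical. Note that a weighted average of tract medians is only an approximation, not the precinct's true median.

```python
def precinct_income(overlaps, tract_income):
    """Population-weighted income estimate for a single precinct.

    overlaps: list of (tract_id, precinct_population_in_that_tract) pairs.
    tract_income: dict mapping tract_id to median household income.
    A precinct that sits entirely inside one tract has a single overlap
    and simply inherits that tract's figure.
    """
    total_population = sum(pop for _, pop in overlaps)
    if total_population == 0:
        raise ValueError("precinct has no population in any tract")
    weighted = sum(tract_income[tract] * pop for tract, pop in overlaps)
    return weighted / total_population


# Made-up tract figures.
tract_income = {"A": 40000, "B": 60000}

whole = precinct_income([("A", 800)], tract_income)
split = precinct_income([("A", 300), ("B", 100)], tract_income)
print(whole, split)
```

In the split case, three quarters of the precinct's residents live in tract A, so the estimate lands a quarter of the way from A's figure to B's.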

We can then sort the precincts by votes cast in the same way as the first chart, and keep a running average of household income weighted by the number of voters in each precinct (which will boost the final income figure versus all residents, since people with higher incomes are more likely to vote). This gives us the following figure:
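A minimal sketch of that running average follows, again with invented numbers. I am assuming "number of voters" means votes cast in the precinct, and the tuple layout is hypothetical.

```python
def cumulative_weighted_income(precincts):
    """Running voter-weighted average income, smallest precincts first.

    precincts: iterable of (votes_cast, income_estimate) pairs.
    Returns a list of (cumulative_votes_cast, running_average_income) pairs.
    """
    ordered = sorted(precincts, key=lambda p: p[0])
    running_votes = 0
    running_income_x_votes = 0.0
    series = []
    for votes_cast, income in ordered:
        running_votes += votes_cast
        running_income_x_votes += income * votes_cast
        series.append((running_votes, running_income_x_votes / running_votes))
    return series


# Made-up precincts: (votes cast, estimated median household income).
sample = [(1500, 60000), (500, 40000)]
income_series = cumulative_weighted_income(sample)
for votes, avg in income_series:
    print(votes, avg)
```

Because each precinct's income counts in proportion to its votes, larger precincts move the running average more than smaller ones, exactly as they do in the cumulative vote share.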

This again looks like our original chart, and it correlates with cumulative vote share at r = 0.69.

There’s not a straight linear relationship between income and propensity to vote Republican; there are certainly lower-income rural areas that lean quite conservative. However, most of these rural voters are in precincts too small to make Dr. Clarkson’s 500-vote cutoff. Only 509 of the 3,786 precincts statewide (13%) make her cut, which leaves this largely an examination of urban and suburban precincts, where the relationship between income and Republican preference may be more linear.

This just reinforces the point hinted at earlier: voters are not randomly assigned to precincts of different sizes, so the expectation that Choquette, Johnson, and Clarkson have of finding random results in their cumulative vote share graphs rests on a faulty assumption. While the charts may be explainable through vote fraud, there are other, perfectly innocuous explanations that can be put forward as well.