No Panaceas

Saturday, October 11, 2003
On the other hand: This is why legislators hate academics. Most of us can't give a definitive answer to anything. While driving home yesterday I thought about the Schwartzman effect some more and decided that there is another way to test for punchcard error. By the way, I think examining Schwartzman's vote is something more than an exercise in trivia. What makes this case interesting is that it's a situation where we have reason to believe that virtually no one voted for this guy on purpose. Thus most of his vote total, probably 99%+, is error -- human or machine. This makes this case a better one to examine than, say, the butterfly ballot fiasco in 2000, where Buchanan was a well-known candidate. With the butterfly ballot it was hard to disentangle Buchanan's intended vote from his accidental vote. Here that's not a problem.

On Friday I compared Schwartzman's vote in non-punchcard counties with his vote in punchcard counties. I argued that the difference between the two groups of counties was neither statistically nor substantively significant. Another way to think about the problem, though, is to develop a model of Schwartzman's vote. I think we can assume that Schwartzman's vote was primarily a function of Schwarzenegger's. Yet Schwarzenegger's vote varied across counties, and implicitly the analysis I did on Friday did not take this variation into account. So let's model it directly.

[Somewhat Technical Bits - (skip this para if you want to get to the bottom line) - Without getting deep into the technical specifics, here is what I did. Using OLS, I regressed Schwartzman's county-by-county vote totals on Schwarzenegger's. This initial result was exactly what you would expect: Schwarzenegger's vote correlated very very highly and positively with Schwartzman's. To get at the real issue, though, I needed to put in a few more variables. First, I included the number of votes cast in each county as a control. (You'd expect both Schwarzenegger's and Schwartzman's vote totals to rise as the number of votes cast in the county rises.) Then I included a dummy variable for punchcard/non-punchcard counties. Then I interacted (multiplied) the punchcard variable with Schwarzenegger's vote variable. This let me estimate the impact of Schwarzenegger's vote conditional on the presence of the punchcard ballot.]
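For readers who want to see the mechanics, here is a minimal sketch of that specification in plain Python: ordinary least squares solved through the normal equations, with an intercept, Schwarzenegger's vote, total votes, a punchcard dummy, and the punchcard x Schwarzenegger product term. The county figures and the "true" coefficients are invented for illustration only; they are not the recall returns, and they are chosen so the two slopes echo the per-vote figures of roughly .002 and .003 mentioned further on (votes here are in thousands). The helpers `solve`, `ols`, and `design_row` are hypothetical names, and the sketch omits standard errors.

```python
# Minimal OLS sketch with a punchcard dummy and a punchcard x Schwarzenegger
# product term. All county figures are HYPOTHETICAL, made up for illustration.


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - rest) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def design_row(schwarz, total, punch):
    """Columns: intercept, Schwarzenegger vote, total votes, punchcard dummy,
    and the punchcard x Schwarzenegger product term."""
    return [1.0, schwarz, total, punch, schwarz * punch]


# Hypothetical counties: (Schwarzenegger votes, total votes, punchcard?), in thousands.
counties = [
    (790, 1900, 1), (400, 900, 1), (150, 420, 1), (20, 35, 1),
    (600, 1200, 0), (300, 800, 0), (100, 180, 0), (10, 40, 0),
]
# Hypothetical coefficients used only to generate noise-free Schwartzman votes.
true_beta = [50.0, 2.0, 0.1, 30.0, 1.0]
rows = [design_row(s, t, p) for s, t, p in counties]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in rows]

beta = ols(rows, y)
# The slope on Schwarzenegger's vote is beta[1] without punchcards and
# beta[1] + beta[4] with them; the product term carries the difference.
slope_other = beta[1]
slope_punch = beta[1] + beta[4]
print(f"non-punchcard slope: {slope_other:.3f} per 1,000 votes")
print(f"punchcard slope:     {slope_punch:.3f} per 1,000 votes")
```

Because the outcomes are generated without noise, the fit recovers the made-up coefficients; with real returns the interesting quantity is whether the product-term coefficient is distinguishable from zero.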

Bottom-Line: I found two things: 1) regardless of voting method, the more people voted for Schwarzenegger, the more people voted for Schwartzman; and 2) the effect was slightly more pronounced -- and the difference is statistically significant -- in the seven punchcard counties.

The graph that follows shows the predicted Schwartzman vote for punchcard ballots and non-punchcard ballots:

The blue line shows Schwartzman's predicted vote in punchcard ballot counties given particular vote totals for Schwarzenegger. The purple line shows the same for non-punchcard counties. Look at the chart with care. What you should get from it is that the slopes are very very shallow. (The vertical axis only goes up to 2700 while the horizontal axis extends out to 800,000.) Specifically, the slopes are about .003 and .002 for blue and purple, respectively. Nonetheless, the two slopes are statistically different from each other, and each is statistically different from zero.

What else do they tell us? Let's take Los Angeles County, a county that uses punchcard ballots. Schwarzenegger got just shy of 790,000 votes in LA County, so I have set that as the upper bound. The model predicts that given 790,000 votes for Schwarzenegger and punchcard ballots, Schwartzman gets 2539 votes. An LA-sized county without punchcard ballots yields Schwartzman 1385 votes, a difference of 1154. Is this a substantial amount? It's banal to say that any error is too much error, so how much is acceptable? Obviously it depends on the likelihood of a super-close election.

But let's call it substantial for the sake of argument. Is the effect truly caused by punchcard ballots, or are we just picking up a different variant of human error? Namely, is there something about these seven counties -- like a lot of English-as-a-second-language speakers -- that may exacerbate error? I have some ideas for testing this notion, but I'd need more sophisticated data than I have now.

[More Technical Stuff -- Stop now if stats makes your eyes glaze over. First, now that I think about it, I did not run an F-test comparing the main-effects and product-term models. So it might be that interacting punchcards and votes does not add much when looked at from an R-square perspective. Second, technically OLS is not perfectly suited for this type of data. (Hey, it's not like I'm getting paid for this.) The data is censored on one end -- there are no negative votes -- so something like tobit might be more suitable. I seriously doubt the results would differ all that much, but if I were writing a paper on this I'd take this issue into account. Obviously the model might also be specified differently. One could use proportions, for example, and then eschew the control for total votes. My guess is such a model wouldn't get a very good fit. Plus, since the denominators vary so much, you'd probably get a nasty case of heteroskedasticity, and I, for one, don't believe that heteroskedasticity should be discussed on a family blog. Finally, the fits are terrific (R-square above .90), but the residuals do show a few counties that stray from the line a bit. These outliers might provide some further clues about the impact of the other ballot types that I've clumped together into one category. I may return to this issue later.]
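The comparison mentioned in that aside is the standard nested-model F-test: does adding the product term reduce the residual sum of squares by more than chance would? Below is a small sketch of the arithmetic. The residual sums of squares are made-up placeholders, not results from the actual regression; the counts assume California's 58 counties and the five coefficients of the product-term model (intercept, Schwarzenegger vote, total votes, punchcard dummy, product term). The function name `nested_f` is hypothetical.

```python
# Nested-model F-test: main-effects model (restricted) vs. the model that adds
# the punchcard x Schwarzenegger product term (full).
# The residual sums of squares below are HYPOTHETICAL placeholders.


def nested_f(rss_restricted, rss_full, q, n, k_full):
    """F statistic for q added terms.

    rss_restricted: residual sum of squares without the added terms
    rss_full:       residual sum of squares with them
    q:              number of added terms (1 for a single product term)
    n:              number of observations (counties)
    k_full:         coefficients in the full model, intercept included
    """
    numerator = (rss_restricted - rss_full) / q
    denominator = rss_full / (n - k_full)
    return numerator / denominator


# 58 counties, 5 coefficients in the full model, 1 added term.
f_stat = nested_f(rss_restricted=120.0, rss_full=100.0, q=1, n=58, k_full=5)
print(f"F(1, 53) = {f_stat:.2f}")
```

When only one term is added, this F equals the square of the t statistic on that term, so it carries the same significance information as the test on the product-term coefficient; the change in R-square is the separate, effect-size side of the question.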

Friday, October 10, 2003
Turnout update: California's Secretary of State now reports 8,374,681 ballots cast. Given 21,122,481 potential voters, that yields an actual turnout rate of 39.6%. The Secretary of State incorrectly portrays turnout as 54.4%, a figure based only on registered voters.
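A quick check of both figures, as a sketch: the same ballot count divided by the potential-voter estimate and by the roughly 15.4 million registered voters cited in the October 8 entry. Because 15.4 million is a rounded figure, the registered-voter result is only an approximate reconstruction of the Secretary of State's 54.4%. The function name `turnout` is illustrative.

```python
# Turnout under two denominators, using the figures quoted in the posts.
BALLOTS = 8_374_681            # ballots cast, per the Secretary of State (Oct. 10)
POTENTIAL_VOTERS = 21_122_481  # adult citizens minus prisoners (McDonald's estimate)
REGISTERED = 15_400_000        # approximate registered voters, rounded as in the Oct. 8 post


def turnout(ballots, base):
    """Return turnout as a percentage of the chosen denominator."""
    return 100 * ballots / base


pct_potential = turnout(BALLOTS, POTENTIAL_VOTERS)
pct_registered = turnout(BALLOTS, REGISTERED)
print(f"share of potential voters:  {pct_potential:.1f}%")   # 39.6%
print(f"share of registered voters: {pct_registered:.1f}%")  # 54.4%
```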

The Schwartzman Effect: One of the recall election's curiosities was the ninth-place finish of George B. Schwartzman. Mickey Kaus (scroll down to October 8) at Slate raises the pertinent question. Was this because of machine error, or was it simply because some voters confused Schwartzman with a well-known actor who was also a candidate? In an update, Kaus notes that one of his readers found that the Schwartzman effect came from the punchcard machines. I'm less sure. Here are some numbers.

First, while it is true that Schwartzman came in ninth, he nonetheless received fewer than 11,000 votes. That is less than 0.2% of the total vote. So the numbers are small and it would take an exceptionally close election for this error to matter. That said, what are the differences across counties? Using data from the Secretary of State's website, I divided Schwartzman's vote totals into two categories: votes from the seven counties that used punchcards and votes from counties that did not use punchcards. Schwartzman's vote totals were as follows:

Punchcard counties: 5336
Other counties: 5621

So the punchcard counties provided a bit less than half of Schwartzman's votes. Yet these counties -- Los Angeles, San Diego, Santa Clara, Sacramento, Solano, Mendocino and Sierra -- made up more than forty percent of the votes cast for governor. What we need to consider is the proportion of the vote Schwartzman received from the two groups of counties. Did Schwartzman's vote proportions differ significantly across the county types? Here are the numbers:

Punchcard counties: .00166
Other counties: .00125

Thus there is a .00041 difference between the two proportions. From a substantive perspective that's getting really small-fry. Across a million votes, .00041 adds up to a 410-vote difference. Is the effect statistically significant? Very technically it is, since these numbers are populations, not samples. But if you treat them like a sample and run a difference-in-proportions test, then the effect is not statistically significant at anything approaching acceptable levels of confidence. (For people wanting to check my math, the overall vote totals were punchcard = 3220204 and non-punchcard = 4490601.)
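The shares and comparisons in the last few paragraphs can be reproduced from the vote totals given above. This is a sketch of the arithmetic only; the dictionary and variable names are illustrative.

```python
# Schwartzman's vote share by county type, from the totals quoted in the post.
schwartzman = {"punchcard": 5336, "other": 5621}
all_votes = {"punchcard": 3_220_204, "other": 4_490_601}

shares = {k: schwartzman[k] / all_votes[k] for k in schwartzman}
gap = shares["punchcard"] - shares["other"]

total_schwartzman = sum(schwartzman.values())
total_votes = sum(all_votes.values())
punch_share_of_schwartzman = schwartzman["punchcard"] / total_schwartzman
punch_share_of_all = all_votes["punchcard"] / total_votes

print(f"punchcard share: {shares['punchcard']:.5f}")
print(f"other share:     {shares['other']:.5f}")
print(f"gap per million votes: {gap * 1_000_000:.0f}")
print(f"Schwartzman total: {total_schwartzman} ({100 * total_schwartzman / total_votes:.2f}% of all votes)")
print(f"punchcard counties: {100 * punch_share_of_schwartzman:.1f}% of his votes, "
      f"{100 * punch_share_of_all:.1f}% of all votes")
```

With unrounded proportions the gap comes to about 405 votes per million rather than 410; the 410 figure follows from the rounded .00041.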

I am no fan of punchcard voting, but I think the more likely explanation is simply that voters confused Schwartzman with Schwarzenegger.

UPDATE: I've looked at the question in a slightly different way this (Saturday) morning. I'll post the results when I have a chance.

Further UPDATE: I've now posted a new analysis. Like a true social scientist I contradict what I posted here (kinda).

Wednesday, October 08, 2003
Turnout: I hate the way the press -- aided by election officials -- report turnout. Today's reports are ecstatically stating that yesterday's turnout rate was huge. It was perhaps 60%, maybe even more than 71% once all the votes are counted.

Uh, no. Let's just say that this is manure of the male bovine extraction.

This business of 60-71% is based on turnout among registered voters. Such a statistic ignores the millions of potential voters who did not register. To use an extreme example, let's say that you have a state with 100 adult citizens. Ten of these people registered to vote and seven of them cast a ballot. By the media's logic this is a turnout rate of 70%, despite the fact that only 7% of the potential electorate voted.

So what was the turnout yesterday? According to Michael McDonald at George Mason University, the number of potential voters in California (adult citizens minus prisoners) is 21,122,481. Okay. Thus far a little over 7.6 million votes have been cast in the recall part of the election. Let's be very generous and say that turnout reaches 11 million. That would be 71% of California's 15.4 million registered voters, but the true turnout rate would be 52%. Looking at the current numbers, I would say that eleven million is very very optimistic. A more realistic turnout of ten million yields a rate of about 47%.

In truth, 47% is pretty good, especially for a non-presidential election. But obviously it is a long way from 71%.
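The turnout scenarios in the paragraphs above can be reproduced with a short sketch: each projected ballot total is divided by registered voters (the basis for the press figures) and by potential voters. The 15.4 million registered-voter figure is the rounded number from the text, and `scenario` is an illustrative name.

```python
# Turnout scenarios: projected ballots over registered vs. potential voters.
POTENTIAL_VOTERS = 21_122_481  # adult citizens minus prisoners (McDonald's estimate)
REGISTERED = 15_400_000        # approximate registered voters


def scenario(projected_ballots):
    """Return (% of registered voters, % of potential voters) for a projected total."""
    return (100 * projected_ballots / REGISTERED,
            100 * projected_ballots / POTENTIAL_VOTERS)


results = {n: scenario(n) for n in (10_000_000, 11_000_000)}
for n, (reg, pot) in results.items():
    print(f"{n:,} ballots: {reg:.1f}% of registered, {pot:.1f}% of potential voters")
```

Ten million ballots come to about 47.3% of potential voters, while the same ballots would be reported as roughly 65% of registered voters.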

This is not just an empty exercise in number-crunching. Reporting numbers in the 60-70% range helps create an illusion that this election truly reflects the will of Californians. The reality is that only about a quarter of potential voters in California chose the recall and a bit less than that chose Schwarzenegger to be governor. (By the way, you can say exactly the same type of thing about Davis's previous two victories.)

Just in case you thought intelligence was a prerequisite for getting elected to Congress: Everybody already knows about Rep. Ballenger's (R-NC) claim that one reason his marriage failed was that the Council on American-Islamic Relations was headquartered near his home. But what's getting ignored is his further claim that his marriage failed because of House ethics rules. Check out this bit:

In addition to CAIR, Ballenger told the newspaper that another source of stress on the marriage was the 1995 decision by "holier-than-thou Republicans" in the House to ban gifts from lobbyists. The meals and theater tickets from lobbyists once meant "a social life for [congressional] wives," Ballenger said. His wife agreed, saying, "Just a dinner now and then" would do no harm.

Hmmm. So all this time we've been told that marriages are threatened by things like unfaithfulness, financial problems, no-fault divorce laws, and gays and lesbians.

Now it turns out that the reason why people get divorced is because there is no free lunch.

The Supremes should pop open a bottle of champagne: The people who should be happiest about the California election results are federal judges, especially the justices on the Supreme Court. The decisiveness of both the recall election and the governor's race means that neither Davis nor Bustamante has grounds to pursue an equal protection claim. Plus the ostrich initiative -- which would have banned the collection of information pertaining to race -- failed too. Its passage would have provoked a bunch of difficult and contentious court cases. So now the Supremes can focus on the important stuff, like whether Virginia can extend a water pipe farther across the Potomac.

Monday, October 06, 2003
Just in case you thought racism was dead: Here is Pat Robertson talking about Morgan Freeman, in defense, I guess, of Limbaugh's McNabb comments:

He started off playing a chauffeur in 'Driving Miss Daisy,' and then they elevated him to head of the CIA, and then they elevated him to president and in his last role they made him God. I just wonder, isn't Rush Limbaugh right to question the fact, is he that good an actor or not?

A white mule in the garden: The Cajun humorist Justin Wilson used to tell this story about how a farmer was showing off his garden to a neighbor. The neighbor commented that it was indeed a mighty fine garden but don't you know it would be a whole lot better if it had a white mule in it. All the best gardens feature a white mule. The farmer puzzled over why his garden needed a white mule but, since he needed a mule anyway, he decided that he might as well get a white one.

The story goes on from there, but the ESPN-Rush Limbaugh saga reminds me of the white mule story. Basically some really bright executive at ESPN -- who along with Howard Dean's staff probably thinks the Jets are in Philadelphia -- decided that what ESPN needed to spice up their pre-game show was a clown. I imagine that the other executives reacted about the same as the farmer. "A clown, why do we need a clown for a football show?" "All the other shows have them," came the reply. So the executives looked around and decided that they needed a new analyst anyway, so it might as well be a clown.

So they went and hired Rush Limbaugh. He must have been the logical choice since Jimmy Kimmel, Jerry Glanville, Deion Sanders, Dennis Miller, and that doughboy now on Fox were already in use or used up. Except somebody forgot to tell these executives that Rush Limbaugh's brand of humor is the type that provokes protest and scares major advertisers away. Oops. You'd think they'd know that his reservoir of "common fan" musings would dry up faster than his supply of oxycontin. (I'm sorry, I couldn't resist. Neither could Arnold, apparently.)

I just hope ESPN fires the executive who came up with this idea before he suggests they get a Playboy model to do weather reports.