2022 Ontario Provincial Election Forecast

Methodology

The Signal is built on a Bayesian dynamic linear model. This type of model forms the basis of forecasting models currently used for US elections, such as those by the New York Times and FiveThirtyEight, with which many readers will be familiar.

Our variant of the model accounts for two biases in the polling industry. First, we account for the fact that pollsters differ systematically from one another in whether they over- or under-represent certain voters. For example, compared to the polling industry average, some pollsters might over-represent Conservative Party voters; others, NDP voters. The model accounts for these differences dynamically, such that each newly released poll is adjusted by our current estimate of that pollster’s bias. Polls from multiple years are used to calculate these “house biases”, and the estimates are themselves recalculated each time a new poll is released. Second, we account for bias in the polling industry as a whole by using data from previous elections. In the Canadian context, these biases are relatively small but not negligible: even small differences in national vote share can have relatively large effects on seat share.
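
As a rough illustration of the house-bias idea, and not the model's actual code, the sketch below estimates each pollster's bias as the average gap between its polls and the industry average, then subtracts that gap from a new poll. It is a static simplification of the dynamic estimate described above. The function names, the pollster labels and the use of a plain mean over all polls as the "industry average" are assumptions made for this example.

```python
from collections import defaultdict


def estimate_house_effects(polls):
    """Estimate a per-pollster bias for one party.

    polls: list of (pollster, share) tuples, with share as a fraction.
    Returns {pollster: bias}, where bias is the pollster's mean share
    minus the mean share across all polls (assumed industry average).
    """
    industry_average = sum(share for _, share in polls) / len(polls)

    by_pollster = defaultdict(list)
    for pollster, share in polls:
        by_pollster[pollster].append(share)

    return {
        pollster: sum(shares) / len(shares) - industry_average
        for pollster, shares in by_pollster.items()
    }


def adjust_poll(pollster, share, house_effects):
    """Remove the estimated house bias from a newly released poll."""
    # Pollsters with no history are left unadjusted (bias assumed zero).
    return share - house_effects.get(pollster, 0.0)


# Hypothetical polls of one party's vote share by two made-up pollsters.
past_polls = [
    ("Pollster A", 0.38),
    ("Pollster A", 0.40),
    ("Pollster B", 0.34),
    ("Pollster B", 0.36),
]
effects = estimate_house_effects(past_polls)
adjusted = adjust_poll("Pollster A", 0.40, effects)
```

In this sketch a pollster that usually reads high for a party has its next poll nudged down by the same amount. Re-running `estimate_house_effects` whenever a poll arrives would mimic, very loosely, the recalculation described above.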

Because there are many days on which polls are not released and because polls contain sampling error, the model uses information about where vote intention stood on one day to inform where it stands the next day. If a new poll is released, vote share estimates for that day effectively become a weighted average of the newly released poll and the estimate of where vote intention stood on the previous day. This means that outlier (and all other) polls are effectively pulled in toward the previous day’s forecast. Visually, this means that vote intention across time will appear relatively smooth, as we would expect it to in reality.
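
The weighted-average behaviour described above can be sketched as one step of a simple local-level filter for a single party's vote share. This is a minimal illustration under stated assumptions, not the model itself: the function name, the variance values and the random-walk assumption for day-to-day change are all hypothetical.

```python
def daily_update(prev_mean, prev_var, drift_var, poll=None, poll_var=None):
    """Advance the vote share estimate by one day.

    prev_mean, prev_var: yesterday's estimate and its variance.
    drift_var: assumed variance of true day-to-day change (random walk).
    poll, poll_var: today's poll result and its sampling variance, if any.
    """
    # Predict: carry yesterday's estimate forward and widen its uncertainty.
    prior_mean = prev_mean
    prior_var = prev_var + drift_var

    if poll is None:
        # No poll today: the forecast is simply the carried-forward estimate.
        return prior_mean, prior_var

    # Update: the weight on the poll grows as the poll becomes more precise
    # relative to the carried-forward estimate.
    weight = prior_var / (prior_var + poll_var)
    post_mean = prior_mean + weight * (poll - prior_mean)
    post_var = (1.0 - weight) * prior_var
    return post_mean, post_var


# Day without a poll: the mean is unchanged and uncertainty grows.
mean_1, var_1 = daily_update(0.35, 0.0004, 0.0001)

# Day with an outlying poll at 40%: the estimate moves only part of the way.
mean_2, var_2 = daily_update(0.35, 0.0004, 0.0001, poll=0.40, poll_var=0.0005)
```

With these made-up numbers the poll and the previous estimate are equally precise, so the new estimate lands halfway between them rather than jumping to the poll's value.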

This differs from other forecasting models in Canada, where forecasts can jump around drastically from one poll release to the next. Unfortunately, such jumps lead many commentators in Canada to speculate that large changes are occurring in the electorate for campaign-related reasons, even when no substantial change is occurring in reality. An added benefit of our approach is that vote intention can be estimated, and a new forecast released, for each day of the campaign, even when no new poll is released.

To estimate regional-level vote share, we run a separate model with the same basic structure as the national model. We then adjust the regional vote share results proportionally so that they match the estimates from the national-level forecast. At the riding level, we start from the vote share each party achieved in each riding in the 2019 Canadian federal election and scale these shares proportionally to match the estimated regional (and national) vote share forecast. Uncertainty in these projections is highest at the individual riding level. Because riding-level predictions are derived from national and regional vote share estimates and from vote share in the 2019 federal election, rather than from local polling data (which do not exist in sufficient numbers), they should be interpreted with due caution.
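
The text does not spell out the exact adjustment procedure, so the following is only one plausible reading of the riding-level step: a proportional swing. Each party's past riding share is scaled by the ratio of its forecast regional share to its past regional share, and the results are renormalized to sum to one. The function name, party labels and numbers are hypothetical.

```python
def proportional_swing(riding_past, region_past, region_forecast):
    """Project riding-level shares from a regional forecast.

    riding_past: {party: share} in the riding at the previous election.
    region_past: {party: share} in the region at the previous election.
    region_forecast: {party: share} currently forecast for the region.
    All three dicts must use the same party keys, with nonzero past
    regional shares.
    """
    # Scale each party's past riding share by its regional change ratio.
    scaled = {
        party: riding_past[party] * region_forecast[party] / region_past[party]
        for party in riding_past
    }
    # Renormalize so the projected shares sum to one.
    total = sum(scaled.values())
    return {party: value / total for party, value in scaled.items()}


# Hypothetical two-party example.
projection = proportional_swing(
    riding_past={"Party X": 0.50, "Party Y": 0.50},
    region_past={"Party X": 0.40, "Party Y": 0.60},
    region_forecast={"Party X": 0.50, "Party Y": 0.50},
)
```

Because every riding in a region receives the same ratios, local factors such as candidate effects are ignored, which is one reason riding-level figures carry the most uncertainty.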