# The Conversation: Use machine learning to predict March Madness upsets in your bracket


“Beware the Ides of March.” Yes, it’s finally that time of year again: when the emperors of college basketball must watch their backs, lest the lowly bottom seeds of the tournament strike.

Before March 15, millions around the world will fill out their March Madness brackets. In 2017, ESPN received a record 18.8 million brackets.

The first step to a perfect bracket is correctly choosing the first round. Unfortunately, most of us can’t predict the future. Last year, only 164 of the submitted brackets were perfect through the first round – less than 0.001%.

Many brackets are busted when a lower-seeded team upsets the favored higher seed. Since the field expanded to 64 teams in 1985, there have been at least eight upsets per year on average. If you want to win your bracket pool, you had better pick at least a few upsets.

We’re two math Ph.D. candidates at the Ohio State University who have a passion for data science and basketball. This year, we decided it would be fun to build a computer program that uses a mathematical approach to predict first-round upsets. If we’re right, a bracket picked using our program should perform better through the first round than the average bracket.

###### Fallible humans

It’s not easy to identify which of the first-round games will result in an upset.

Say you have to decide between the No. 10 seed and the No. 7 seed. The No. 10 seed has pulled off upsets in its past three tournament appearances, once even making the Final Four. The No. 7 seed is a team that’s received little to no national coverage; the casual fan has probably never heard of them. Which would you choose?

If you chose the No. 10 seed in 2017, you would have gone with Virginia Commonwealth University over Saint Mary’s of California, and you would have been wrong. Thanks to a decision-making fallacy called recency bias, humans can be tricked into relying too heavily on their most recent observations when making a decision.

Recency bias is just one of many biases that can infiltrate someone’s picking process. Maybe you’re biased toward your home team, or maybe you identify with a player and desperately want him or her to succeed. All of this influences your bracket in a potentially negative way. Even seasoned professionals fall into these traps.

###### Modeling upsets

Machine learning can defend against these pitfalls.

In machine learning, statisticians, mathematicians and computer scientists train a machine to make predictions by letting it “learn” from past data. This approach has been used in many diverse fields, including marketing, medicine and sports.

Machine learning techniques can be likened to a black box. First, you feed the algorithm past data, essentially setting the dials on the black box. Once the settings are calibrated, the algorithm can read in new data, compare it to past data and then spit out its predictions.

Figure: A black box view of machine learning algorithms. (Image: Matthew Osborne, CC BY-SA)
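To make the dial metaphor concrete, here is a minimal sketch of the two phases: calibrating on past games, then predicting a new one. It is not the authors' program. It assumes a single hypothetical feature, `strength_gap` (how much stronger the favorite looks than the underdog), uses made-up data, and picks logistic regression, one of the algorithms the article names, as the box. The two "dials" are the weight `w` and the bias `b`.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Set the dials (w, b) by gradient descent on the average log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # predicted probability minus outcome
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


def upset_probability(w: float, b: float, strength_gap: float) -> float:
    """Read in a new game and return the box's estimated upset probability."""
    return sigmoid(w * strength_gap + b)


# Synthetic past games, NOT real tournament data.
# strength_gap is a hypothetical feature; upset is 1 if the underdog won.
past_strength_gaps = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0]
past_upsets = [1, 1, 0, 1, 0, 0, 0, 0]

# Phase 1: calibrate the dials on past data.
w, b = fit_logistic(past_strength_gaps, past_upsets)

# Phase 2: predict new games with the calibrated box.
p_close = upset_probability(w, b, 1.0)     # evenly matched teams
p_mismatch = upset_probability(w, b, 5.5)  # heavy favorite
print(f"close game: {p_close:.2f}, mismatch: {p_mismatch:.2f}")
```

On this toy data the learned weight comes out negative, so the box assigns a higher upset probability to the close game than to the mismatch.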

In machine learning, there are a variety of black boxes available. For our March Madness project, the ones we wanted are known as classification algorithms. These help us determine whether or not a game should be classified as an upset, either by providing the probability of an upset or by explicitly classifying a game as one.

Our program uses a number of popular classification algorithms, including logistic regression, random forest models and k-nearest neighbors. Each method is like a different “brand” of the same machine; they work as differently under the hood as Fords and Toyotas, but perform the same classification job. Each algorithm, or box, has its own predictions about the probability of an upset.
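As an illustration of how differently two "brands" can work, here is a hedged sketch of k-nearest neighbors. Unlike logistic regression, it fits no dials at all: it stores the past games and lets the k most similar ones vote. The two features per game and all of the data are hypothetical, invented only for this example, and the 0.5 cutoff for calling an upset is an assumption rather than something the article specifies.

```python
import math


def knn_upset_probability(past_games, new_game, k=3):
    """Return the fraction of the k most similar past games that were upsets.

    past_games: list of (features, upset) pairs, where upset is 0 or 1.
    new_game:   feature tuple for the game to predict.
    """
    # Similarity here is plain Euclidean distance between feature tuples.
    by_distance = sorted(past_games, key=lambda game: math.dist(game[0], new_game))
    nearest = by_distance[:k]
    return sum(upset for _, upset in nearest) / k


def classify(probability, threshold=0.5):
    """Turn a probability into an explicit upset / no-upset call."""
    return probability >= threshold


# Synthetic past games, NOT real tournament data.
# Features are hypothetical: (seed gap, scoring-margin gap).
past_games = [
    ((1.0, 0.5), 1),
    ((1.0, 2.0), 1),
    ((3.0, 1.0), 1),
    ((3.0, 4.0), 0),
    ((5.0, 6.0), 0),
    ((7.0, 9.0), 0),
    ((9.0, 12.0), 0),
]

p_close = knn_upset_probability(past_games, (3.0, 2.5))
p_mismatch = knn_upset_probability(past_games, (8.0, 10.0))

print(f"close game: p={p_close:.2f}, upset call: {classify(p_close)}")
print(f"mismatch:   p={p_mismatch:.2f}, upset call: {classify(p_mismatch)}")
```

The same function gives both kinds of output mentioned above: a probability of an upset, and an explicit classification once a threshold is applied. Because the method relies on distances, features on very different scales would normally be rescaled first; that step is omitted here for brevity.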

We used the statistics of all 2001 to 2017 first-round teams to set the dials on our black boxes. When we tested one of our algorithms with the 2017 first-round data, it had about a 75% success rate. This gives us confidence that analyzing past data, rather than just trusting our gut, can lead to more accurate predictions of upsets, and thus better overall brackets.
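The article does not say exactly how the success rate was computed. A simple reading, assumed here, is plain accuracy: the share of games where the predicted call (upset or no upset) matched the actual result. The sketch below uses eight made-up games, not real 2017 results.

```python
def success_rate(predicted, actual):
    """Fraction of games where the predicted call matches the real outcome."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one game to score")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical test games: 1 means upset, 0 means the favorite won.
actual_outcomes = [0, 0, 1, 0, 1, 0, 0, 1]
model_calls     = [0, 0, 1, 1, 0, 0, 0, 1]

rate = success_rate(model_calls, actual_outcomes)
print(f"success rate: {rate:.0%}")  # 6 of 8 calls match, so this prints 75%
```

A common convention, assumed here rather than taken from the article, is to score a model on games it did not see while its dials were being set, so the success rate reflects prediction rather than memory.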

What advantages do these boxes have over human intuition? For one, the machines can identify patterns in all of the 2001-2017 data in a matter of seconds. What’s more, since the machines rely only on data, they may be less likely to fall for human psychological biases.

That’s not to say that machine learning will give us perfect brackets. Even though the box bypasses human bias, it’s not immune to error. Results depend on past data. For example, if a No. 1 seed were to lose in the first round, our model would be unlikely to predict it, because that has never happened before.

Additionally, machine learning algorithms work best with thousands or even millions of examples. Only 544 first-round March Madness games have been played since 2001 (17 tournaments with 32 first-round games each), so our algorithms won’t correctly call every upset. Echoing basketball expert Jalen Rose, our output should be used as a tool in conjunction with your expert knowledge (and luck!) to choose the correct games.

We’re not the first people to apply machine learning to March Madness and we won’t be the last. In fact, machine learning techniques may soon be necessary to make your bracket competitive.

You don’t need a degree in mathematics to use machine learning — although it helps us. Soon, machine learning may be more accessible than ever. Those interested can take a look at our models online. Feel free to explore our algorithms and even come up with a better approach yourself.

Matthew Osborne and Kevin Nowland are Ph.D. candidates in mathematics at The Ohio State University, the No. 5 seed in the West regional of the men’s college basketball tournament. This was first published on The Conversation — “This March Madness, we’re using machine learning to predict upsets.”
