Slots, Simulations, and Statistics with Selenium
Tags: gambling, python, selenium, web scraping
Categories: Data Science
As you may have seen in a previous post, I enjoy playing slot machines! They are a losing game for sure, but I admire their complexity and entertainment value. Slots come in a wide variety of styles. Some are simple three-reel machines that only allow wins on a single payline. Others have 720 ways to win. But in the long run, every slot is built to win for the house.
A key term to remember is the RTP, or “return to player”. It is the percentage of the total amount wagered that one would expect to receive back after playing an infinite number of games. Of course, nobody has an infinite bankroll or an infinite amount of time, so this number is theoretical. But it means that after millions of gamblers cycle through and play a given machine over many years, the house profit should be approximately 100% minus the RTP. Online slots typically have an RTP in the low-to-mid 90s (percent). Land-based casinos are more expensive to maintain, so their returns tend to be lower.
While slots are entirely based on luck, the random number generators that determine the outcome of each game are extremely complicated. Some online casinos rely entirely on a pseudo-random number generator, or pRNG. This means an algorithm generates numbers that appear random but could be predicted if the algorithm and its internal state (the seed) were known. There are also designs like Fortuna that combine a pRNG with an “entropy accumulator”, which collects unpredictable inputs from physical sources and uses them to reseed the pRNG. Either way, it is extremely important for online establishments to do all their calculations server-side, where players cannot inspect or tamper with the generator.
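To make the predictability point concrete, here is a minimal sketch using Python's built-in `random` module. This is only an illustration: the seed value and the `spin_reels` helper are made up, and Python's generator (Mersenne Twister) is not what a real casino would use.

```python
import random

def spin_reels(rng, n_reels=3, n_symbols=10):
    """Return one hypothetical reel outcome as a tuple of symbol indices."""
    return tuple(rng.randrange(n_symbols) for _ in range(n_reels))

# Assumed seed; in a real system this state would be kept secret server-side.
seed = 12345

# Two generators with the same seed produce the same "random" sequence.
casino_rng = random.Random(seed)
observer_rng = random.Random(seed)

casino_spins = [spin_reels(casino_rng) for _ in range(5)]
predicted_spins = [spin_reels(observer_rng) for _ in range(5)]

# The observer who knows the seed predicts every spin exactly.
print(casino_spins == predicted_spins)  # True
```

Anyone who learns the generator's state can reproduce every future outcome, which is why that state should never leave the server.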
As a gambler, I am not so interested in the exact nature of the RNG. What I am interested in is the probability distribution of the returns. Two games that have the same RTP may return that money in very different ways. Some games frequently return small amounts of money, while others yield large but infrequent wins. These are termed low volatility and high volatility games, respectively. So the volatility, or variance, of a game is important for choosing a playing style:
- Low volatility games: if you have a small bankroll, or want to play for as long as possible
- High volatility games: if you’re looking for a big win and have the bankroll to wait for it
- Medium volatility games: somewhere in between
The only way for us to get a sense of the volatility of a game is to play it. And we’d have to play many games to get a decent estimate of the variance. Luckily for us, there are demo versions of games on websites like Vegas Slots Online that allow us to play for free. My objective was to code something up that could play many, many games and record all the results. Unfortunately, a lot of these demos are served in Flash, which is difficult to automate in a browser. But I discovered that many older games by IGT are easy to drive and scrape using a tool like Selenium. Thus, Slotenium was born!
Slotenium is a tool I developed to repeatedly play slot games, record the results, and save them in a CSV file. Over many games, I could estimate the RTP and the variance of the returns, and get a sense of the shape of the probability distribution.
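The general shape of such a play-and-record loop can be sketched as follows. This is not Slotenium's actual code: the URL, the CSS selectors, and the assumption that the page shows the last win as plain text are all hypothetical, and a real game would also need to wait for the reels to stop before reading the result. A fake spin function stands in for the browser so the recording part can be run anywhere.

```python
import csv
import os
import random
import tempfile

def make_selenium_spin(url, spin_selector, win_selector, timeout=30):
    """Build a spin function backed by a real browser.

    The URL and selectors are placeholders; each game needs its own,
    found by inspecting the page.
    """
    # Imported lazily so the rest of the sketch runs without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    driver.get(url)
    wait = WebDriverWait(driver, timeout)

    def spin():
        # Click the spin button once it becomes clickable.
        wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, spin_selector))).click()
        # Assumption: the win amount is visible as text such as "12.50".
        win_text = wait.until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, win_selector))
        ).text
        return float(win_text or 0)

    return spin

def record_spins(spin, n_games, wager, path):
    """Play n_games and write one CSV row per game: game number, wager, payout."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["game", "wager", "payout"])
        for i in range(n_games):
            writer.writerow([i + 1, wager, spin()])

# Stand-in for the browser, with made-up payouts.
_demo_rng = random.Random(0)

def fake_spin():
    return _demo_rng.choice([0, 0, 0, 0.5, 2, 10])

path = os.path.join(tempfile.gettempdir(), "slotenium_demo.csv")
record_spins(fake_spin, n_games=100, wager=1.0, path=path)
```

Swapping `fake_spin` for the function returned by `make_selenium_spin(...)` would drive a real browser instead, given working selectors for the game in question.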
Siberian Storm by IGT is a fun game that I like to play whenever I’m in Atlantic City. According to VegasSlotsOnline, the RTP is about 96%. I used Slotenium to play the game 11,718 times over several sessions and stored the results in a CSV file. Then I binned the results into win categories, expressed as a range of multiples of the wager, and plotted the PDF:
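The binning step can be sketched like this. The bin edges and the sample payouts below are invented for illustration; they are not the bins or the data behind the plot.

```python
import bisect
from collections import Counter

# Assumed bin edges, in multiples of the wager.
EDGES = [1, 2, 5, 10, 50]
LABELS = ["(0, 1x)", "[1x, 2x)", "[2x, 5x)", "[5x, 10x)", "[10x, 50x)", "50x+"]

def win_category(payout, wager):
    """Map a payout to a labelled range of multiples of the wager."""
    multiple = payout / wager
    if multiple == 0:
        return "loss"
    return LABELS[bisect.bisect_right(EDGES, multiple)]

def empirical_distribution(payouts, wager):
    """Fraction of games falling in each category (sums to 1)."""
    counts = Counter(win_category(p, wager) for p in payouts)
    total = len(payouts)
    return {label: counts[label] / total for label in ["loss"] + LABELS}

# Made-up payouts for ten games at a wager of 1.
payouts = [0, 0, 0, 0.5, 0, 2, 0, 0, 12, 0]
dist = empirical_distribution(payouts, wager=1.0)

for label, prob in dist.items():
    print(f"{label:>10}: {prob:.1%}")
```

Strictly speaking, the result is a probability mass over bins rather than a continuous density, but it plays the same role as the plotted PDF.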
Unsurprisingly, the probability of winning decreases as the size of the win increases. The probability of losing entirely is 63.6%. This is comparable to another game by IGT, the 20-line Cleopatra slot machine, which lost 64.2% of the time over 10 million rounds of simulations (kudos to Casino Guru for their analysis!).
I calculated the average RTP for my simulations as 96.64%, which is pretty close to the published RTP of about 96%! The plot below shows how the calculated RTP evolves as I added more simulations.
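A running RTP like this is just a cumulative ratio of total payout to total wagered. Here is a minimal sketch, assuming a fixed wager per game and using made-up payouts:

```python
from itertools import accumulate

def running_rtp(payouts, wager):
    """RTP after each game: total paid out so far / total wagered so far."""
    totals = accumulate(payouts)
    return [total / (wager * (i + 1)) for i, total in enumerate(totals)]

# Made-up payouts for ten games at a wager of 1.
payouts = [0, 0, 3, 0, 1.5, 0, 0, 5, 0, 0]
curve = running_rtp(payouts, wager=1.0)

for game, rtp in enumerate(curve, start=1):
    print(f"after game {game:2d}: RTP = {rtp:.1%}")
```

With only a handful of games the curve swings wildly after every win. It takes thousands of games before it settles near the long-run value.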
As I mentioned earlier, two games with the same average RTP can have very different probability distributions. One way to capture these differences is to break down the average RTP into a product of two measures: the average RTP conditioned on winning (RTPW); and the probability of winning. Low volatility games shift more weight onto the probability of winning, while high volatility games shift more weight onto the RTPW.
For Siberian Storm, the average RTPW is about 2.66 times the wager. We arrive at this measure by dividing the average RTP as a fraction (0.9664) by the probability of winning (0.364). This is considered a medium volatility game, which is also true of 20-line Cleopatra. Compare this to 1-line Cleopatra, which has an RTP of 95.02% but only an 11.3% chance of a hit. This corresponds to an RTPW of 8.4 times the wager – a highly volatile game!
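The decomposition of RTP into hit probability times RTPW can be checked directly on recorded results. This is a small sketch with made-up payouts and a hypothetical helper name:

```python
def decompose_rtp(payouts, wager):
    """Return (rtp, hit_rate, rtpw), where rtp == hit_rate * rtpw."""
    n = len(payouts)
    wins = [p for p in payouts if p > 0]
    rtp = sum(payouts) / (wager * n)
    hit_rate = len(wins) / n
    # Average return per wager, counting only the games that paid something.
    rtpw = sum(wins) / (wager * len(wins)) if wins else 0.0
    return rtp, hit_rate, rtpw

# Made-up payouts for ten games at a wager of 1.
payouts = [0, 0, 3, 0, 1.5, 0, 0, 5, 0, 0]
rtp, hit_rate, rtpw = decompose_rtp(payouts, wager=1.0)

print(f"RTP      = {rtp:.4f}")
print(f"hit rate = {hit_rate:.4f}")
print(f"RTPW     = {rtpw:.4f}")
print(f"product  = {hit_rate * rtpw:.4f}")
```

The same identity is what lets the RTPW be recovered from just two published numbers, the RTP and the hit rate, without the raw game data.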
Coefficient of Variation (CV)
The RTPW is a useful measure but it is incomplete, as it does not describe the entire shape of the probability distribution. Another measure of volatility (and the more common one) is the variance of the return, or the average squared deviation of the observations from the mean. If we take the square root of the variance and divide it by the mean RTP, we get the coefficient of variation (CV), which is (in my opinion) the most useful measure of volatility.
Siberian Storm has a CV of about 6.12. Using the probability distribution from Casino Guru, I get a CV for 20-line Cleopatra of about 5.58 – similar to that of Siberian Storm. 1-line Cleopatra has a much higher CV of 14.64, confirming that this game is highly volatile.
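For anyone who wants to compute a CV themselves, here is a minimal sketch. It uses the population standard deviation, matching the "average squared deviation" definition above, and compares two invented games with the same RTP. The second helper covers the case of a published probability table instead of raw results. Both function names and all of the data are illustrative.

```python
import math
import statistics

def coefficient_of_variation(payouts, wager):
    """CV of per-game returns: population standard deviation / mean."""
    returns = [p / wager for p in payouts]
    return statistics.pstdev(returns) / statistics.fmean(returns)

def cv_from_distribution(multiples, probs):
    """CV from a table of (return multiple, probability) pairs."""
    mean = sum(m * p for m, p in zip(multiples, probs))
    var = sum(p * (m - mean) ** 2 for m, p in zip(multiples, probs))
    return math.sqrt(var) / mean

# Two made-up games, both with an RTP of 95%.
low_vol = [0, 1.9] * 5        # pays 1.9x half the time
high_vol = [0] * 9 + [9.5]    # pays 9.5x one time in ten

print(coefficient_of_variation(low_vol, wager=1.0))   # about 1.0
print(coefficient_of_variation(high_vol, wager=1.0))  # about 3.0
print(cv_from_distribution([0, 9.5], [0.9, 0.1]))     # about 3.0
```

Both games return the same amount on average, but the CV triples when the same money arrives in one big win instead of five small ones.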
You can check out the Slotenium project on GitHub and try it for yourself! For now, it only works on a limited selection of games by IGT. In a future version, I hope to increase the game selection and even expand to other gaming companies (Ballys, Aristocrat, Konami, etc.).