Using In-Game Player Behavior to Measure the Impact of a New Release

 

Introduction

Digital sales of video games are expected to amount to $7.8 billion this year, with about half of that figure coming from digital content downloads and microtransactions. What that means is that gaming companies can't just release a game and wait for the profit to roll in: gamers expect new content, and they expect it regularly. 

In fact, a large share of additional content revenue (39%) is generated in the first three to six months after a game's initial release. To understand how that additional content is performing, a gaming company needs to be able to track the behavior of players in the game before and after each release.

In this post, I will examine the impact of a World of Warcraft (WoW) game release from November 13, 2008 — Wrath of the Lich King (WoLK) — on the number of daily active users (DAUs) to understand if the players were more engaged after the release. Ultimately, I want to compute the number of new DAUs that can be attributed to the release itself.

About the Dataset

To perform this analysis, I used the Kaggle WoW Avatar History dataset. This dataset contains records of approximately 38,000 avatars collected in 2008. It includes information about character level, race, class, location, and social guild. Using this dataset, I can look at individual sessions when each avatar was active as well as his or her level and location in the game. It is important to note, however, that this dataset was collected from only one WoW server. Therefore, it represents a small subset of the WoW player base.

More importantly, this dataset overlaps with the WoLK release. That means I have DAU data from before the release and for the 1.5 months after it, which helps me understand the impact the release had on player engagement.

In the cell below, I plotted a time series of DAUs in 2008. The vertical bar represents the release of WoLK. You can see immediately that if the release had any effect on this particular server, it was a relatively small one: there is no major uptick in the DAU counts post-release.


import pandas as pd
import matplotlib.pyplot as plt

wow_df = pd.read_csv('data/wowah_data.csv')
# The raw column names carry leading spaces; strip them:
wow_df.rename(columns={' race': 'race', ' zone': 'zone',
                       ' level': 'level',
                       ' charclass': 'charclass',
                       ' guild': 'guild',
                       ' timestamp': 'timestamp'}, inplace=True)

wow_df['timestamp'] = pd.to_datetime(wow_df['timestamp'], format='%m/%d/%y %H:%M:%S')
wow_df['date'] = wow_df['timestamp'].dt.date
wow_df['level_bin'] = wow_df['level'].apply(
    lambda x: "level {} - {}".format(x // 10 * 10 + 1, (x // 10 + 1) * 10))

# Let's get the number of unique active characters per date:
dau = wow_df.groupby(['date', 'char'], as_index=False).first()
dau = dau.groupby(['date']).size()

# Let's plot the DAU time series for 2008.
dau.plot(figsize=(12, 8), title="World of Warcraft Kaggle Dataset DAU Counts")
plt.axvline(x=pd.to_datetime('2008-11-13', format="%Y-%m-%d").date(),
            color='red', linestyle='--')
plt.xlabel("Date")
plt.ylabel("Daily Active User Counts (DAUs)")
plt.show()
   

You can immediately see that there is a strong seven-day seasonality. There may be a hint of a yearly seasonality pattern as well, but I don't have a long enough baseline to evaluate it.
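A quick way to quantify that seven-day pattern is the lag-7 autocorrelation of the series. The snippet below is a self-contained sketch on a synthetic DAU-like signal (the numbers are made up, not taken from the dataset); the same check applies to the real dau series computed above:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the DAU series (the real one comes from the Kaggle
# dataset): a strong weekly cycle plus noise over all of 2008.
rng = np.random.default_rng(0)
days = pd.date_range('2008-01-01', '2008-12-31', freq='D')
weekly = 200 * np.sin(2 * np.pi * np.arange(len(days)) / 7)
dau_sim = pd.Series(1800 + weekly + rng.normal(0, 30, len(days)), index=days)

# A weekly pattern shows up as a much stronger autocorrelation at a lag of
# 7 days than at off-cycle lags.
acf_7 = dau_sim.autocorr(lag=7)
acf_3 = dau_sim.autocorr(lag=3)
print(acf_7, acf_3)
```

On a series with a strong weekly cycle, the lag-7 autocorrelation sits near 1 while off-cycle lags are much weaker or even negative.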

Methodology: Bayesian Structural Time Series

Now, I'll dive into the model I used to measure the impact of the WoLK release on user engagement. First of all, user engagement is represented here by the number of DAUs. A user/player is defined as active on a given day if at least one record of his or her avatar is found in the dataset for that particular day.
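As a minimal illustration of this definition, here is a toy version of the computation on a made-up session log (the character IDs and timestamps are hypothetical): DAU is just the number of distinct characters observed per day.

```python
import pandas as pd

# Hypothetical session log: one row per avatar observation (all values made up).
records = pd.DataFrame({
    'char': [101, 101, 102, 103, 101, 102],
    'timestamp': pd.to_datetime([
        '2008-11-13 01:00', '2008-11-13 09:30', '2008-11-13 22:10',
        '2008-11-14 03:00', '2008-11-14 03:10', '2008-11-14 23:59',
    ]),
})

# A player is active on a day if at least one record exists for that day,
# so the DAU count is the number of distinct characters per date.
records['date'] = records['timestamp'].dt.date
toy_dau = records.groupby('date')['char'].nunique()
print(toy_dau)   # 2 active characters on 11-13, 3 on 11-14
```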

I used Bayesian structural time series (BSTS) models to infer what would happen to a time series signal (in this case, the DAU count) in the absence of a treatment (the WoLK release that occurred on November 13, 2008). The difference between the measured DAUs post-release and the model-inferred no-treatment case is the impact of WoLK on DAUs, or the causal effect. Causal effect is often used to estimate the impact of marketing campaigns on a key performance metric, for example.

I used the R package CausalImpact and a method developed by Kay H. Brodersen and collaborators at Google, which you can read more about here. I'll dive into this methodology in an upcoming post.

In a nutshell, the causal impact of a treatment (i.e., the WoLK release) is the difference between the observed value of the response (i.e., DAU time series) and the unobserved value that would have been obtained under the alternative treatment — in the absence of the new game release. CausalImpact performs posterior inference on the counterfactual model.

The main idea here is to use the time-series behavior of the DAU itself prior to the WoLK release to infer what the series will do post release. Moreover, a benefit of this technique is that you can use unrelated, control time series that are predictive of DAUs prior to the WoLK release to infer what the DAU time series would be in a counterfactual scenario.
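As a sketch of what "predictive of DAUs prior to the WoLK release" means in practice, one could rank candidate control series by their correlation with DAUs over the pre-period only. Everything below is synthetic, and the article names are hypothetical:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 300  # days in the pre-intervention window (arbitrary)

# Synthetic pre-period DAU series with a weekly cycle.
dau_pre = pd.Series(1800 + 200 * np.sin(2 * np.pi * np.arange(n) / 7)
                    + rng.normal(0, 30, n))

# Two hypothetical candidate controls: one shares the weekly rhythm
# (so it is predictive of DAUs), one is pure noise.
candidates = pd.DataFrame({
    'article_a': 5000 + 400 * np.sin(2 * np.pi * np.arange(n) / 7)
                 + rng.normal(0, 100, n),
    'article_b': rng.normal(3000, 150, n),
})

# Rank candidates by |correlation| with DAUs over the pre-period only.
corr = candidates.corrwith(dau_pre).abs().sort_values(ascending=False)
print(corr)
```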

Another strong assumption being made here is that the relationship between the DAU series and the control series remains the same after the release. For these particular controls, I used traffic data associated with Wikipedia articles about sports, celebrities, politics, etc. These series should not have been affected by the treatment because they are completely unrelated to WoLK.


The figure above is from Brodersen's research, and the panels illustrate causal impact. Suppose you are interested in the time series represented by the black line in Panel A. In January 2014, an intervention happened, in this case a new product release. You can see that the time series increased quite a bit after that event. The question is: how much of that increase can be attributed to the release?

The time series in red and green, which are data from other markets, are unaffected by the release. These two series are used to model the black line in the pre-intervention/treatment period. The model is represented by the blue line in all three panels. The idea is to use the model built with these series in the pre-intervention era and apply it in the post-intervention period. This gives you the counterfactual. The difference between observed data (in black) and counterfactual (in blue) gives you the impact (Panel B). The cumulative impact is shown in Panel C.

The assumption being made here is that the control time series (also known as the predictor time series) are not affected by the intervention event. In this case, I'm confident that our time series are not affected by the event, since I picked Wikipedia traffic data from articles that are not even remotely related to WoW.

You can think of the pre-intervention DAU time series as a combination of these predictor time series. In the post-intervention time period, these series are used to model a baseline of what would have happened to the series without the intervention (WolK release). The difference between the observed DAU in the post-intervention period and the baseline model can be attributed to the intervention.
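The last two paragraphs can be sketched with a much simpler, frequentist stand-in for the BSTS model: fit an ordinary least-squares relationship between the target and the controls on the pre-period only, project it forward as the counterfactual, and read off the difference. All data below are synthetic, with a known lift of +50 injected after the simulated intervention:

```python
import numpy as np

rng = np.random.default_rng(1)
n_pre, n_post = 200, 45

# Two synthetic control series (stand-ins for Wikipedia traffic).
x1 = rng.normal(1000, 50, n_pre + n_post)
x2 = rng.normal(500, 20, n_pre + n_post)

# The target is a linear mix of the controls plus noise, with a known
# lift of +50 injected in the post-intervention period.
y = 0.8 * x1 + 1.5 * x2 + rng.normal(0, 10, n_pre + n_post)
y[n_pre:] += 50

# Fit the target-vs-controls relationship on the pre-period only...
X = np.column_stack([np.ones(n_pre + n_post), x1, x2])
beta, *_ = np.linalg.lstsq(X[:n_pre], y[:n_pre], rcond=None)

# ...project it into the post-period to get the counterfactual,
# and attribute the remainder to the intervention.
counterfactual = X[n_pre:] @ beta
pointwise_effect = y[n_pre:] - counterfactual
print(pointwise_effect.mean())   # close to the injected +50
```

CausalImpact goes considerably further: it models local trend and seasonality with a BSTS model and returns full posterior uncertainty on the effect, but the pre-period-fit, post-period-project logic is the same.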

Time Series Predictors

I used Wikipedia traffic data associated with the top 1,000 articles by traffic volume in 2008 as my predictor time series. In total, I used 69 time series to model the counterfactual, all of which are unaffected by the intervention. (If I were performing this analysis using contemporary data, I could use the DataScience.com Platform job scheduler to schedule daily data updates on predictor series and DAUs counts.)

In the cell below, you can see the Python script I developed to scrape Wikipedia traffic data:


import json
from urllib.request import urlopen

import pandas as pd


def format_grok_data(data, token):
    """ Formatting the results from the Wikipedia data scrape.

    Parameters
    ----------
      data: dict
        time series data scraped from Wikipedia

      token: string
        article name associated with the time series.

    Return
    ------
      formatted dataframe
    """
    # Believe it or not, there are bad entries with these dates:
    bad_dates = ['2008-02-30', '2008-02-31', '2008-04-31',
                 '2008-06-31', '2008-09-31', '2008-11-31']
    tmp = pd.DataFrame(list(data['daily_views'].items()),
                       index=data['daily_views'].keys())
    tmp.drop(0, axis=1, inplace=True)
    tmp.rename(columns={1: token}, inplace=True)
    for x in bad_dates:
        tmp.drop(x, inplace=True, errors='ignore')
    tmp.index = pd.to_datetime(tmp.index).date

    # When data are missing (recorded as 0), impute with the mean of the
    # whole series. There are better ways to do this, but let's start with that.
    tmp = tmp.replace(to_replace=0, value=tmp[token].mean())
    tmp.sort_index(inplace=True)
    return tmp


def pull_token_ts(tokens):
    """ For each element in the tokens list, pull the daily traffic
    time series from Wikipedia.

    Parameters
    ----------
        tokens (list):
          list of Wikipedia articles you want to get a
          time series of traffic for in 2008

    Return
    ------
        dataframe:
          dataframe with the page views for all tokens. Each column
          name corresponds to a token.
    """
    # All months of 2008:
    months = ['200801', '200802', '200803', '200804', '200805', '200806',
              '200807', '200808', '200809', '200810', '200811', '200812']

    # all_df contains the dataframes of all tokens:
    all_df = pd.DataFrame()
    for t in tokens:
        print('Doing token {}'.format(t))
        t_df = pd.DataFrame()
        for m in months:
            url = "http://stats.grok.se/json/en/{}/{}".format(m, t)
            print(url)
            jsonurl = urlopen(url)
            data = json.loads(jsonurl.read())
            # Formatting the output:
            df = format_grok_data(data, t)
            t_df = pd.concat([t_df, df])
        if all_df.empty:
            all_df = t_df
        else:
            all_df = all_df.join(t_df, how='left')

    return all_df
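The comment in format_grok_data admits that imputing zeros with the whole-series mean is crude. One of the "better ways" it alludes to is a month-level mean, which respects seasonal variation in traffic. Here is a sketch on toy data (the series values below are made up):

```python
import numpy as np
import pandas as pd

# Toy traffic series with zero entries marking missing days.
idx = pd.date_range('2008-01-01', '2008-02-29', freq='D')
views = pd.Series(np.r_[np.full(31, 100.0), np.full(29, 200.0)], index=idx)
views.iloc[5] = 0.0    # missing day in January
views.iloc[40] = 0.0   # missing day in February

# Replace each zero with the mean of the non-zero values in its own month,
# instead of the global mean used in format_grok_data above.
masked = views.mask(views == 0)
filled = masked.fillna(masked.groupby(masked.index.month).transform('mean'))
print(filled.iloc[5], filled.iloc[40])   # 100.0 and 200.0
```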
 

Here, I've provided an example of how I pulled traffic time series of articles that were among the most popular of 2008:


# See https://en.wikipedia.org/wiki/Wikipedia:Most_read_articles_in_2008 for help

tokens1 = ['Baseball', 'Hockey', 'Football', 'John_Kerry',
           'Barack Obama', 'Michael_Jackson', 'The_Big_Bang_Theory', 'United_States',
           'MySpace', 'Miley_Cyrus', 'Japan', 'Lost', '2008_Summer_Olympics', 'YouTube',
           'Locus', 'India', 'New_York_City', 'Bigfoot']

tokens1_df = pull_token_ts(tokens1)
     

I stored the time series to disk. In the cell below, I am loading these predictor series.


all_df = pd.read_csv('time_series_data/counterfactuals.csv',
                     index_col=0)
all_df.index = pd.to_datetime(all_df.index)

Each column contains the number of visits for one Wikipedia article. A sample of the data is shown in the table below. 

 

date        Baseball  Hockey  Football  John_Kerry  Barack.Obama
2008-01-01      1827     985      2125        1261         12883
2008-01-02      2774    1127      3203        2080         21781
2008-01-03      3706    1070      2999        2645         35879
2008-01-04      2866    1139      2721        4429        202338
2008-01-05      2382    1026      2438        2453         97396

Running CausalImpact

I'm now ready to run CausalImpact on the dataset. You will notice that I loaded the Jupyter extension rpy2.ipython with %load_ext rpy2.ipython. This allows me to switch to R cells by calling the %%R magic command on the first line of each R cell.


    
%%R

options(warn=-1)
library(plyr)
library(CausalImpact)
library(jsonlite)
library(quantmod)
library(zoo)
library(ggplot2)

First, I loaded the DAU daily traffic data along with the predictors series into R data frames.


%%R

# Read the predictor series from disk:
predictors <- read.csv('time_series_data/counterfactuals.csv', header=TRUE)
names(predictors)[1] <- 'date'

# Read the DAU data from disk:
dau <- read.csv('time_series_data/dau.csv', header=TRUE)
    

Then I performed some manipulations to transform the data into a format that CausalImpact accepts. 


%%R
# In this cell we'll merge DAU data with predictors
# and create a "ci_zoo" object that we'll pass to CausalImpact
all <- data.frame()
all <- dau
all <- merge(all, predictors, by.x='date', by.y='date', all.x=TRUE, all.y=TRUE)

# Remove all NA's from the dataframe:
no_na <- rowSums(is.na(all)) == 0
ci <- all[no_na, ]

# Remove the date column (first column)
ci <- ci[, -1]

# Convert to zoo
ci_zoo <- zoo(ci)

In the cell below, I set the parameters for the package CausalImpact. The important parameters to set up are the start and end dates of the intervention. I used row ids instead of dates to cut the ci_zoo array into pre-intervention and post-intervention datasets.

I also set optional parameters for niter and nseasons:

  • niter: the number of MCMC samples to draw (1,000 is the default).
  • nseasons: the period of the seasonal component (7 here, matching the weekly cycle).

I also varied the parameter prior.level.sd within reasonable bounds, [0.001, 0.1], and found that it didn't change the conclusions of my analysis.


%%R
# Causal impact start/end dates:
data_start_date <- '2008-01-01'
# Here we only have a little over 1.5 months of post-intervention data.
intervention_start_date <- '2008-11-13'
intervention_end_date <- '2008-12-31'

# Convert to row IDs:
intervention_start_id <- which(dau$date == intervention_start_date)
intervention_end_id <- which(dau$date == intervention_end_date)

model_args <- list(niter=1000, nseasons=7)
   

In the cell below, I ran the causal impact model to generate three plots:

  • Original chart (top): The data are shown in black and the model as a dashed blue line. The vertical dashed line marks the moment the intervention started.
  • Pointwise chart (middle): The residuals (data minus model).
  • Cumulative chart (bottom): The cumulative effect of the intervention.

In this case, the dashed blue line indicates a net positive effect on DAUs. However, the shaded areas, which represent 95% posterior intervals, encompass the zero line, meaning the null hypothesis (no effect on DAU) can't be ruled out at a statistically significant level.


%%R

# Let's run the CausalImpact model on the DAU and predictor data:
pre.period <- c(1, intervention_start_id - 1)
post.period <- c(intervention_start_id, intervention_end_id)
impact2 <- CausalImpact(ci_zoo, pre.period, post.period,
                        model.args = model_args)
plot(impact2)
    

 

I printed a summary of the results in the cell below. You can see that the posterior probability of a causal effect (82%) is not high enough to be statistically significant, so it's hard to rule out the null hypothesis. You can also see that the 95% interval for the absolute effect crosses 0, which implies that the null result (no impact) can't effectively be ruled out with enough certainty.

%%R
# Print the summary of the analysis below
summary(impact2)
 


Posterior inference {CausalImpact}

                         Average        Cumulative    
Actual                   1880           52634         
Prediction (s.d.)        1827 (58)      51170 (1619)  
95% CI                   [1715, 1944]   [48030, 54440]
                                                      
Absolute effect (s.d.)   52 (58)        1464 (1619)   
95% CI                   [-65, 164]     [-1806, 4604] 
                                                      
Relative effect (s.d.)   2.9% (3.2%)    2.9% (3.2%)   
95% CI                   [-3.5%, 9%]    [-3.5%, 9%]   

Posterior tail-area probability p:   0.1798
Posterior prob. of a causal effect:  82%

For more details, type: summary(impact, "report")

I also performed other analyses over different time periods within the pre-intervention window. I set the treatment period to be one to two months and cut the data after that intervention period. The purpose of these analyses was to make sure that my methodology mitigates false positives. In other words, I selected periods over which no intervention occurred, and CausalImpact found no statistically significant effect over those periods.

This methodology does not, however, allow me to evaluate how false negatives may plague the analysis. A set of simulations could potentially help me understand false negative rates better.
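Such a simulation could look like the sketch below, which uses a simplified regression counterfactual rather than CausalImpact itself: inject a known lift into a synthetic series many times and count how often a naive 95% interval on the mean effect excludes zero. The detects helper and all numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)

def detects(lift, n_pre=200, n_post=45, sims=200):
    """Fraction of simulations in which an injected lift is detected."""
    hits = 0
    for _ in range(sims):
        # One synthetic control series and a target that tracks it.
        x = rng.normal(1000, 50, n_pre + n_post)
        y = 0.9 * x + rng.normal(0, 10, n_pre + n_post)
        y[n_pre:] += lift  # inject the (possibly zero) lift post-intervention

        # Regression counterfactual: fit pre-period, project post-period.
        X = np.column_stack([np.ones(n_pre + n_post), x])
        beta, *_ = np.linalg.lstsq(X[:n_pre], y[:n_pre], rcond=None)
        effect = y[n_pre:] - X[n_pre:] @ beta

        # Naive 95% interval on the mean effect; "detected" if it excludes 0.
        half = 1.96 * effect.std(ddof=1) / np.sqrt(n_post)
        if effect.mean() - half > 0:
            hits += 1
    return hits / sims

fp_rate = detects(0.0)    # false-positive rate: should be small
power = detects(20.0)     # detection rate for a real lift: should be high
print(fp_rate, power)
```

Comparing the detection rate at zero lift (false positives) with the rate at a plausible lift size gives a rough sense of the false-negative risk for a given effect magnitude and post-period length.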

In Conclusion

Hopefully, you now better understand how a game release (treatment) can affect user engagement as measured by daily active user (DAU) counts.

Using BSTS techniques in CausalImpact, I was able to infer the posterior distribution of a counterfactual time series. What I found was, there is no statistically significant evidence that the release of WoLK had an impact on DAUs. However, it is worth noting two major limitations of the data I used:

  • The data cover only a short period before and after the intervention. There are likely cyclical patterns in DAU counts over the course of a year, and I was unable to capture those seasonality effects.
  • I used data from a single game server. The data could represent a biased sample of the player population.