I'm now ready to run CausalImpact on the dataset. Note that I loaded the rpy2.ipython Jupyter extension with `%load_ext rpy2.ipython`, which lets me switch to R by starting a cell with the `%%R` magic command.

```
%%R
options(warn=-1)
library(plyr)
library(CausalImpact)
library(jsonlite)
library(quantmod)
library(zoo)
library(ggplot2)
```

First, I loaded the daily active users (DAU) traffic data, along with the predictor series, into R data frames.

```
%%R
# Read the predictor series from disk:
predictors <- read.csv('time_series_data/counterfactuals.csv', header=TRUE)
names(predictors)[1] <- 'date'
# Read the DAU data from disk:
dau <- read.csv('time_series_data/dau.csv', header=TRUE)
```

Then I performed some manipulations to transform the data into a format that CausalImpact accepts.

```
%%R
# Merge the DAU data with the predictors and create the
# "ci_zoo" object that we'll pass to CausalImpact:
all <- merge(dau, predictors, by='date', all=TRUE)
# Remove all rows containing NA's:
no_na <- rowSums(is.na(all)) == 0
ci <- all[no_na, ]
# Remove the date column (first column):
ci <- ci[, -1]
# Convert to a zoo object:
ci_zoo <- zoo(ci)
```

In the cell below, I set the parameters for CausalImpact. The most important ones are the start and end dates of the intervention; I used row IDs instead of dates to split the ci_zoo array into pre-intervention and post-intervention datasets.

I also set optional parameters for niter and nseasons:

- niter: number of MCMC samples to draw (1,000 is the default).
- nseasons: period of the seasonal component (7 here, for weekly seasonality).

I also varied the parameter prior.level.sd within reasonable bounds ([0.001, 0.1]) and found that it didn't change the conclusions of my analysis.

```
%%R
# Causal Impact start/end date:
data_start_date <- '2008-01-01'
# Here we only have a little over 1.5 months of post-intervention data.
intervention_start_date <- '2008-11-13'
intervention_end_date <- '2008-12-31'
# Convert to row IDs:
intervention_start_id <- which(dau$date == intervention_start_date)
intervention_end_id <- which(dau$date == intervention_end_date)
model_args <- list(niter=1000, nseasons=7)
```

In the cell below, I ran the causal impact model to generate three plots:

- **Original chart (top):** the data are shown in black and the model as a dashed blue line. The vertical dashed line marks the moment in time when the intervention started.
- **Pointwise chart (middle):** the residuals (data minus model) are displayed.
- **Cumulative chart (bottom):** the cumulative effect is displayed.

In this case, the dashed blue line indicates a net positive effect on DAU. However, the shaded areas, which represent 95% posterior intervals, straddle the zero line, meaning that the null hypothesis (no effect on DAU) can't be ruled out at a statistically significant level.

```
%%R
# Run the CausalImpact model on the DAU and predictor data:
pre.period <- c(1, intervention_start_id - 1)
post.period <- c(intervention_start_id, intervention_end_id)
impact2 <- CausalImpact(ci_zoo, pre.period, post.period,
                        model.args = model_args)
plot(impact2)
plot(impact2)
```
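The prior.level.sd sensitivity check mentioned earlier can be sketched as follows. The grid of values is an assumption within the [0.001, 0.1] bounds I explored, not the exact values I used:

```
%%R
# Re-fit the model for a few values of prior.level.sd and compare
# the estimated average absolute effect and its 95% interval:
for (sd_level in c(0.001, 0.01, 0.1)) {
    impact_sd <- CausalImpact(ci_zoo, pre.period, post.period,
                              model.args = list(niter = 1000, nseasons = 7,
                                                prior.level.sd = sd_level))
    s <- impact_sd$summary
    cat(sprintf("prior.level.sd = %.3f: AbsEffect = %.1f [%.1f, %.1f]\n",
                sd_level, s$AbsEffect[1],
                s$AbsEffect.lower[1], s$AbsEffect.upper[1]))
}
```

If the intervals all straddle zero across this range, the non-significant conclusion is robust to the choice of prior.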

I printed a summary of the results in the cell below. The posterior probability of a causal effect (82%) falls short of conventional significance thresholds, so it's hard to rule out the null hypothesis. You can also see that the 95% interval of the absolute effect includes 0, which again means the null result (no impact) can't be ruled out with enough certainty.

```
%%R
# Print the summary of the analysis below
summary(impact2)
```

```
Posterior inference {CausalImpact}

                          Average        Cumulative
Actual                    1880           52634
Prediction (s.d.)         1827 (58)      51170 (1619)
95% CI                    [1715, 1944]   [48030, 54440]

Absolute effect (s.d.)    52 (58)        1464 (1619)
95% CI                    [-65, 164]     [-1806, 4604]

Relative effect (s.d.)    2.9% (3.2%)    2.9% (3.2%)
95% CI                    [-3.5%, 9%]    [-3.5%, 9%]

Posterior tail-area probability p:   0.1798
Posterior prob. of a causal effect:  82%

For more details, type: summary(impact, "report")
```
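As the last line of the output suggests, CausalImpact can also produce a verbose, auto-generated interpretation of these numbers:

```
%%R
# Print a plain-English write-up of the analysis results:
summary(impact2, "report")
```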

I also performed other analyses over different time periods within the pre-intervention period. I set the treatment period to one or two months and cut the data right after that fake intervention period. The purpose of these placebo analyses was to make sure that my methodology mitigates false positives: I selected periods over which no intervention occurred, and CausalImpact indeed found no statistically significant effect.
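A placebo run like the ones described above can be sketched as follows; the specific placebo dates here are hypothetical examples, not the ones I actually used:

```
%%R
# Pretend an intervention happened inside the pre-intervention period
# (hypothetical example dates) and check that CausalImpact finds no
# significant effect there:
placebo_start_id <- which(dau$date == '2008-06-01')
placebo_end_id <- which(dau$date == '2008-07-31')
# Cut the data right after the fake intervention period:
placebo_zoo <- ci_zoo[1:placebo_end_id, ]
placebo_impact <- CausalImpact(placebo_zoo,
                               c(1, placebo_start_id - 1),
                               c(placebo_start_id, placebo_end_id),
                               model.args = model_args)
# The 95% interval of the effect should include 0:
summary(placebo_impact)
```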

This methodology does not, however, allow me to evaluate how false negatives may plague the analysis. A set of simulations could potentially help me understand false negative rates better.
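One way such a simulation could work is to inject a synthetic lift of known size into the post-intervention data and check whether the model detects it. A minimal sketch, where the 5% lift is an arbitrary choice:

```
%%R
# Inject a synthetic 5% lift into the response series (first column)
# after the intervention date and re-run CausalImpact:
post_ids <- intervention_start_id:intervention_end_id
boosted <- ci_zoo
boosted[post_ids, 1] <- boosted[post_ids, 1] * 1.05
sim_impact <- CausalImpact(boosted, pre.period, post.period,
                           model.args = model_args)
# If the 95% interval of the absolute effect excludes 0, a lift of this
# size is detectable; repeating this over many lift sizes and noise
# draws would estimate the false-negative rate.
summary(sim_impact)
```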