It’s an exciting time to be a data professional, especially at a nimble tech startup where you can see demand growing day by day. The variety of business problems, the curiosity of stakeholders, and the steady arrival of new frameworks, technologies, and algorithmic improvements continue to raise the bar for what we can expect from the field.
Compared to most traditional sciences, data science is remarkably business-ready and impact-driven. However, the practice of data science is unevenly distributed across businesses, depending on factors such as location, the nature of the business, the complexity of its problems, financial standing, drive for innovation, and internal hunger for data.
Large tech companies have fairly large data teams and are far more mature in terms of data science adoption, scale, complexity, and the scope of problems their data teams solve. Although these teams continue to do inspiring work and contribute to research and technology, the target audience for this overview is stakeholders at tech startups and tech-enabled SMBs that are relatively new to adopting this field.
What follows are the four steps we took to build our data team from scratch at Instapage, which you may find helpful as you start out.
Step 1: Defining Data Workflow
Instapage helps its customers create great post-click experiences in the digital advertising landscape with its conversion-optimized, fully integrated landing page platform. I joined Instapage in August 2016 as its first data hire, and what was most intriguing and promising about the position was the opportunity to build the data strategy from scratch and scale it across the company. From the start, our management team understood that a successful data team would require more than any single person could offer.
The problem, though, was that there was too much work and not enough full-time data minds on staff. So the first thing we did was partner with the Marketing Operations, Engineering, and Architecture teams to develop a data workflow and set expectations with our stakeholders across the Marketing, Product, Customer Experience, Finance, and Sales departments. The initial meetings helped us understand the business problems and objectives each team was tackling, the resources we had available, and the deliverables expected (reports, dashboards, widgets, snapshots, presentations, documents, etc.).
This flow-chart represents our Stakeholders-Partners Data Workflow:
Establishing this workflow helped our business teams speed up decision-making, with the data team playing a pivotal role in informing and optimizing decisions with facts and quantitative insights.
Step 2: Software Orchestration and Architecture Setup
To date, our strategy has been to understand and optimize the current software in place, set up new software to scale problem solving and data access, and build out data analytics architecture—all while making sure there’s room to scale the machine learning architecture as business needs arise.
The bulk of our data architecture is a combination of Google Cloud and self-hosted services. We use Segment as our data mediation layer and event pipeline that collects and tracks data from our website/application/cloud sources and sends it to our warehouse, marketing automation, sales, and support stacks.
As a whole, these are the tools our team uses to accomplish a variety of needs:
- Analytics for data team: A combination of Redshift, S3, and self-hosted Postgres
- Self-service analytics at scale for business teams: Heap, Google Analytics, Amplitude, and in-house built Application Admin UI
- Enterprise-wide data visualization: Tableau, ShinyApps, Data Studio, Heap, ChartMogul
- Statistical analysis: R, Python, Bash, and SQL
- Experimentation needs for marketing: A combination of Google Optimize, Optimizely
On top of using these tools, we realized significant efficiencies by using ETL tools such as Stitch, Heap SQL, and Segment to sync data to our warehouses. While a solid quantitative stack is paramount to our success, we are also immensely proud of our qualitative and market research efforts.
At a high level, this is how our architecture setup functions together:
What you see above is a 1,000-foot overview of our data analytics stack, and a simplified one at that. The actual stack is considerably more complex, with hundreds of services and servers working together as an ecosystem.
Of course, the Architecture and Engineering teams have played a significant role in helping us with data unification and setting up data analytics. Pawel Wiszowaty, VP of Engineering at Instapage, describes the transformation as the following:
“It has been very interesting to have this different perspective on the data that we're storing and processing. Instead of thinking about scalability and response times, we had to bend our minds to make data more accessible through BI tools without sacrificing anything in the process. Extracting and transforming data for BI purposes became a project of its own.
Also, sometimes the data that we had just wasn’t enough. In these cases, we worked closely with the BI team to establish a new set of events that we could track and push to Segment for further processing.”
By doing the above, we can collect all the data relevant to our business and products. The resulting warehouse is a one-stop shop that helps inform our business decisions at every level, across every project.
Step 3: Data Mediation, Event Tracking & QA
While having an established architecture is great, what is even more important is ensuring that your company is accurately tracking and collecting important events and properties. As the maxim goes, “Right data is better than no data, which is better than bad data.” To ensure our success here, we follow a three-pronged approach:
- Doing QA on the data tracking and warehousing side to quantify data reliability
- Maximizing data sharing between multiple cloud apps to improve insights and actionability for our Support, Sales, and Marketing teams
- Defining new events in collaboration with Product Managers, Marketing Managers, and others
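To make the first prong concrete, a minimal event-QA check can validate each tracked event against an expected schema before the data is trusted downstream. This is only a sketch: the event names and required properties below are hypothetical, not our actual tracking plan.

```python
# Minimal event-QA sketch: validate tracked events against an expected schema.
# Event names and required properties below are hypothetical examples.
EXPECTED_SCHEMA = {
    "page_published": {"user_id", "page_id", "timestamp"},
    "signup_completed": {"user_id", "plan", "timestamp"},
}

def qa_events(events):
    """Split events into (valid, invalid) lists. An event is invalid if its
    name is unknown or it is missing any required property."""
    valid, invalid = [], []
    for event in events:
        required = EXPECTED_SCHEMA.get(event.get("event"))
        if required is None or not required <= event.get("properties", {}).keys():
            invalid.append(event)
        else:
            valid.append(event)
    return valid, invalid
```

Counting the invalid bucket over time gives a simple data-reliability metric to report alongside dashboards.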
We store all of this data in NoSQL databases managed by the Architecture team, and it is securely passed on to our data mediation layer (Segment) for marketing automation and product analytics purposes. Data encryption and anonymization checks ensure that no user identifiers are exposed to third-party software.
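One common way to implement such an anonymization check, sketched below with Python's standard library, is to replace raw user identifiers with a keyed hash before events leave for third-party tools. The key and field names here are illustrative assumptions, not our actual implementation.

```python
import hashlib
import hmac

# Sketch: replace PII fields with keyed (HMAC-SHA256) digests so third-party
# tools never see raw identifiers. The key below is a placeholder; a real
# deployment would load it from a secret store and rotate it.
SECRET_KEY = b"placeholder-key-kept-out-of-source-control"

def anonymize_event(event, pii_fields=("user_id", "email")):
    """Return a copy of the event with PII fields replaced by hex digests."""
    clean = dict(event)
    for field in pii_fields:
        if field in clean:
            digest = hmac.new(SECRET_KEY, str(clean[field]).encode(), hashlib.sha256)
            clean[field] = digest.hexdigest()
    return clean
```

Because the hash is keyed and deterministic, the same user still maps to the same opaque ID across tools, which preserves joins for analytics without exposing the identifier itself.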
Head of Marketing Operations, Stefano Mazzalai, has played an instrumental role in this effort. He states,
“Properly QAing all events and properties has played a pivotal role at Instapage in making sure that every team can trust their data and make confident decisions, as well as ensuring proper personalization and data privacy as frameworks such as GDPR become effective.”
With all this in place, we are able to securely send our data to multiple applications. This ensures that multiple teams have access to the data they need and that the data is reliable as a whole. This ultimately enables data democratization and autonomous decision-making for multiple business teams.
Step 4: Winning with Statistical Data Analysis
This is the part where the data team is finally able to prove that all the investment, research, implementation, and infrastructure efforts genuinely pay off. From a financial perspective this is of the utmost importance, which is why we do extensive analysis and visualization so that more teams can benefit from the data we collect.
Some of the problems we can solve on our own include:
- Marketing attribution
- Retention modeling
- Product analytics and impact quantification
- General profitability problems
- Enterprise clients analytics
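To give a flavor of one item on this list, retention modeling in its simplest form reduces to cohort bookkeeping: group users by signup period and track what fraction remains active N periods later. The sketch below uses made-up field names and a simplified data shape, not our production schema.

```python
from collections import defaultdict

# Simplified cohort-retention sketch. 'signup_month' and 'active_months'
# are illustrative fields, not our production schema.
def cohort_retention(users):
    """users: iterable of dicts with 'signup_month' (int) and
    'active_months' (set of ints). Returns {cohort: {offset: rate}}."""
    cohorts = defaultdict(list)
    for user in users:
        cohorts[user["signup_month"]].append(user)

    retention = {}
    for month, members in cohorts.items():
        size = len(members)
        offsets = defaultdict(int)  # months-since-signup -> active user count
        for user in members:
            for active in user["active_months"]:
                offsets[active - month] += 1
        retention[month] = {o: n / size for o, n in sorted(offsets.items())}
    return retention
```

The resulting cohort-by-offset table is the usual starting point for retention curves and churn forecasting.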
Many teams have also partnered with us to optimize other areas of business, such as:
- Experimentation efforts
- Conversion analytics
- Product pricing (retaining self-serve customers while kicking off Enterprise Sales)
- Financing efforts
- Advertising optimization
- Payments success analytics
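For the experimentation and conversion items above, the workhorse readout is often a two-proportion significance test. The sketch below, using only Python's standard library, is a simplified check of the kind a dedicated testing tool performs, not a replacement for one.

```python
import math

# Two-proportion z-test sketch for an A/B experiment readout.
def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (z, two_sided_p) for conv_* conversions out of n_* visitors."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF (via the error function).
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p
```

For example, 100 conversions out of 1,000 visitors versus 150 out of 1,000 yields a positive z well above 3, i.e. a difference unlikely to be noise at conventional thresholds.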
Our data team is also responsible for enabling data access and offering quantitative training for self-serve analytics at scale. In doing so, we speed up autonomous decision-making among stakeholders.
Statistical analysis, data mining, and data visualization have helped uncover patterns that we have used to alter sign-up periods, change payment processing, adjust product pricing, and generate novel takeaways for our audience.
The Road Ahead: Synergy of Professionals and Software
Today, my team actively works with ETL, querying and visualization, analytics tools, and the cloud to drive efficiency. Our strategy continues to be using software for the undifferentiated heavy lifting so that professionals can spend their time on novel, high-value business problems. It’s a win-win both financially and in terms of quality of work.
Building a data team has helped our firm substantially. Stakeholders are able to improve and validate their decisions faster and with less effort by collaborating with us. Our past successes have convinced us that continuing to expand the team will open up even greater opportunities down the road.