In today’s enterprise landscape, the relationship between IT, data scientists, and data engineers at many organizations is largely dysfunctional. Why? There are many reasons, but a primary contributing factor is that data scientists rely too heavily on IT for the tools and environments they need, and on data engineers to put their work into production. This creates a domino effect: data science work slows down, deployment takes longer, and ROI ultimately suffers.
The solution? Empowering data scientists with the tools and resources they need to be self-sufficient, which in turn reduces the burden on IT and data engineers. Let’s look at two important ways a data science platform can help.
As enterprises expand the size of their data science teams, it falls on IT to support those data scientists effectively. Right now, many IT teams are struggling to simultaneously standardize and scale data analysis, which, as we discuss further in our recent white paper, commonly leads to one of two scenarios.
In the first scenario, IT provisions remote machines for data scientists to work on, adding the tools, packages, and dependencies they need — a process that can be both time-consuming and complex.
In this setup, data scientists often don’t have the credentials needed to customize environments themselves; instead, they must submit a request and wait for IT to make changes, which can hold up work for days or weeks.
Alternatively, in the second scenario, data scientists work on individual machines and manage tools and resources themselves. While this approach gives data scientists the flexibility to install the tools they need for a particular project, it doesn’t scale. Code written in a data scientist’s personal environment might not run in another environment with different tools or packages, which becomes a serious problem if the code is implemented elsewhere in the organization.
When it comes to launching environments, the threefold goal of empowering data scientists, reducing burden on IT, and standardizing data analysis for scalability may seem like a tall order, but it doesn’t have to be.
In a third scenario, which is far more effective than the previous two, IT uses Docker, or another containerization technology, to set up base environments with the packages, languages, and tools data scientists need. Data scientists can then launch environments as needed from the base image or template that IT maintains. From the IT perspective, containers reduce tool sprawl and the time spent maintaining environments. For data scientists, the ability to launch preconfigured environments makes it much easier to run self-serve analyses. As a result, data science work is standardized and completed faster, and everyone is happier in the process.
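As a sketch of what such a template might look like, here is a hypothetical Dockerfile that an IT team could publish as a shared base image. The package list and versions are illustrative assumptions, not a recommendation:

```dockerfile
# Hypothetical base image IT publishes for data science teams.
FROM python:3.11-slim

# Pin a shared set of analysis packages so every launched
# environment runs the same versions.
RUN pip install --no-cache-dir \
        pandas==2.2.2 \
        scikit-learn==1.5.0 \
        jupyterlab==4.2.0

WORKDIR /workspace

# Data scientists launch JupyterLab from this template as needed.
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--no-browser"]
```

Because every environment starts from this one definition, code written on one data scientist’s machine runs identically on another’s, and IT maintains a single image rather than dozens of bespoke setups.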
When a data scientist hands over a model to be put into production, the engineering team has to go through many steps before the model is ready to deploy, including refactoring the model code and rewriting it in a production stack language.
Currently, at many companies, a problematic pattern plays out: a data scientist hands a model to a data engineer and grows frustrated when it isn’t put into production quickly. In turn, the data engineer grows frustrated with the data scientist’s inefficient code and unrealistic expectations.
This kind of climate quickly leads to resentment, which Stitch Fix’s Vice President of Data Platform Jeff Magnusson elaborates on in this article. In a culture where data scientists are viewed, to quote Magnusson, as the “thinkers” and data engineers as the “doers,” an unfair dynamic is established. Data engineers assume sole responsibility for implementation and receive the brunt of the blame if an initiative is not successful, while data scientists receive the credit if a project goes well. This, Magnusson writes, “is at the heart of the contention and misalignment between the teams.”
However, companies are starting to overcome this issue by giving data scientists the ability to deploy models behind REST APIs, a feature of our data science platform. The engineering team can then take the API code and integrate it anywhere, without rewriting it. This increases the data scientist’s autonomy, reduces burden on data engineers, and expedites the process of extracting value from data-driven insights.
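To illustrate the general pattern (not our platform’s specific implementation), here is a minimal sketch of a model served behind a REST endpoint using only the Python standard library. The model, its weights, and the `/predict` route are hypothetical stand-ins; in practice the data scientist’s trained model would be loaded here:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for a trained model: a simple linear scorer with
# illustrative weights.
WEIGHTS = {"tenure": 0.4, "purchases": 0.2}
BIAS = -1.0

def predict(features):
    """Score one record; returns a value clamped to [0, 1]."""
    score = BIAS + sum(WEIGHTS.get(k, 0.0) * v for k, v in features.items())
    return max(0.0, min(1.0, score))

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        # Read the JSON request body and score it.
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(features)}).encode()
        # Return the prediction as JSON.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve the model on port 8080:
# HTTPServer(("0.0.0.0", 8080), PredictHandler).serve_forever()
```

Once the model sits behind an HTTP interface like this, engineering can call it from any stack via a `POST /predict` request, without rewriting the model code itself.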
Why Self-Sufficiency Is Important
With just 22% of companies getting their expected ROI from data science work, now is the time to make adjustments that improve collaboration between teams and drive a faster time to value on data science projects. While there are many approaches to standardizing, managing, and deploying data science work, a data science platform is the only solution that brings together tools that address every step of the process.
Ultimately, using a data science platform empowers data scientists with the autonomy to launch environments and deploy models without deferring to IT and data engineers, improving working relationships and ensuring that data-driven insights can be implemented sooner. Want to see what a data science platform can do for your company? Request a demo.