Challenges With Data Science Adoption

Integrating cloud technology, AI/ML/data science, and the day-to-day business enablers of an organization into a streamlined value-delivery chain is essential to realizing the full potential of the resources we have at hand. IT leaders play a key role in making this happen, and understanding the challenges will help them address those challenges.

This article was originally published on Medium: Challenges With Data Science Adoption by Rajesh Verma

A decade in the making

We are a couple of months away from 2021, and we are still in awe of data science. In one form or another, the ecosystem around data science has been more than a decade in the making, outside of powerhouses like Google, Microsoft, and Amazon.

Yet, sadly, the adoption and application of data-science-enabled features in the mainstream delivery pipeline is peripheral at best. Anecdotal use of data science serves well at conferences, in blog posts, and in thought-provoking intellectual gossip.

Data science moved from labs straight into the wild, and it’s time we domesticated it and made it useful as a general tool rather than a niche one.

Current State

Adoption of a new platform, technology, fashion, tool, or the like, needs a conducive environment to flourish. The data science and AI universe is no exception to this. Some of the key components for mass adoption (besides supply and demand) are:

  1. Talent Pool

  2. Ease of Use

  3. Accessibility

Talent Pool

As we all know (and if you don’t, wake up and welcome to the 21st century), data is the new oil, and as Harvard Business Review put it in 2012, “Data Scientist” is the sexiest job of the 21st century.

For over a decade, a thriving industry has developed around building the workforce needed to adopt data science. To name a few of its products: self-help courses, old-style physical textbooks, free code repositories, boot camps, bachelor’s and master’s degree programs, hundreds of articles, blog posts, free tutorials, etc.

Ease of Use

Data science algorithms have matured and become easy to use. You no longer have to write the logic from scratch, and fairly high-level APIs are available, as long as you understand which tools to use.

Packages like scikit-learn have become a de facto standard for data science and machine learning.

Open-source platforms like TensorFlow have pushed the boundary further by simplifying the interface for large-scale machine learning and deep learning (that is, neural network) models and algorithms.

Accessibility

All major cloud providers, namely GCP, Azure, and AWS, have made the virtualization of compute, storage, and network convenient. Regulatory compliance, industry standards, and security concerns used to be major inhibitors to cloud migration. However, cloud service providers are proving to be much more agile and flexible than internal IT departments in meeting changing regulatory needs and responding to security threats.

With an abundance of supply and demand, a ready talent pool, easy-to-use models, and a readily accessible “as-a-service” model for infrastructure, platform, and software, it’s time to develop a sound adoption strategy.

Adoption Strategies

Until we merge cloud, data science, and the day-to-day business enablers of an organization into an integrated value-delivery chain, we will not realize the full potential of the resources we have at hand. As leaders, beneficiaries, and contributors, we should:

  1. Migrate to the cloud

  2. Integrate ML/AI/DS into the standard project plan

  3. Redefine the role of a data scientist

Migrate to the cloud

The ongoing pandemic and the hyperactive hurricane season should be enough to add urgency to your cloud migration plans, assuming that you have not started yet. Using your legacy on-premise platform might still provide value today, but it will not prepare you for tomorrow.

All innovation and the ecosystem around it are being built in the cloud at a fraction of the cost, for you to use as a service. To reap the benefits of this rapidly expanding set of services, you have to be willing to move away from the traditional on-premise model.

Integrate ML/AI/DS into the standard project plan

Leaders & Program Managers have to understand the value that ML/AI/DS can bring to the organization. Again, this value has to be more than an ego-boosting or technical excellence exercise. It has to be tangible, measurable, and repeatable to make a meaningful impact on the cost, revenue, and/or services that your organization provides.

Most current Program & Project Managers come from the standard application development mindset and follow waterfall or agile practices. While there is nothing wrong with that, those practices are simply not designed to integrate seamlessly into the data pipeline.

The key differentiator is mindset: traditional models are designed to eliminate ambiguity, are heavily rule-based, and stress being 100% accurate. The data science world, on the other hand, embraces ambiguity, learns from the data, and moves ahead on a certain degree of (in)correctness. Implementation leaders will have to learn to appreciate the benefits of both and blend them to deliver value to the organization.
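The contrast between the two mindsets can be sketched in a few lines of Python. This is a toy illustration only, not something from the original article: the transaction amounts, the labels, the fixed limit of 1000, and all function names are hypothetical. The hand-written rule is decided up front, while the learned cutoff is chosen from labeled history and accepted together with its measured error rate.

```python
# Toy illustration; all data, names, and thresholds are hypothetical.

def rule_based_flag(amount, limit=1000.0):
    """Traditional approach: a fixed rule decided up front."""
    return amount > limit


def learn_threshold(amounts, labels):
    """Data-driven approach: pick the cutoff with the fewest training errors.

    Returns the chosen cutoff and the fraction of examples it still gets wrong.
    """
    best_cutoff, best_errors = None, len(amounts) + 1
    for cutoff in sorted(set(amounts)):
        # Predict "flag" when the amount is at or above the candidate cutoff.
        errors = sum((a >= cutoff) != y for a, y in zip(amounts, labels))
        if errors < best_errors:
            best_cutoff, best_errors = cutoff, errors
    return best_cutoff, best_errors / len(amounts)


def accuracy(predict, amounts, labels):
    """Fraction of examples where the prediction matches the label."""
    hits = sum(predict(a) == y for a, y in zip(amounts, labels))
    return hits / len(amounts)


# Hypothetical labeled history: amount and whether reviewers flagged it.
AMOUNTS = [120.0, 340.0, 560.0, 610.0, 700.0, 880.0, 950.0, 1500.0]
LABELS = [False, False, False, True, False, True, True, True]

if __name__ == "__main__":
    cutoff, train_error = learn_threshold(AMOUNTS, LABELS)
    print("fixed rule accuracy:", accuracy(rule_based_flag, AMOUNTS, LABELS))
    print("learned cutoff:", cutoff, "training error:", train_error)
```

On this made-up data the learned cutoff is still wrong some of the time, which is the point: the data-driven approach ships with a known, nonzero error rate instead of a promise of 100% accuracy.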

Redefine the role of a data scientist

I will use some excerpts from Valliappa Lakshmanan’s book Data Science on the Google Cloud Platform to demonstrate why the data scientist role is not a role in isolation. It is a role that adds value by actively participating in the product development life cycle as opposed to purely advising.

We look at data engineers as an inclusive term for anyone who can “shape business outcomes by performing data analysis”.

To perform data analysis, you begin by building statistical models that support smart (not heuristic) decision making in a data-driven way. It is not enough to simply count and sum and graph the results using SQL queries and charting software — you must understand the statistical framework within which you are interpreting the results, and go beyond simple graphs to deriving the insight toward answering the original problem.

Thus, we are talking about two domains: (a) the statistical setting in which a particular aggregate you are computing makes sense, and (b) understanding how the analysis can lead to the business outcome we are shooting for.

This desire to have engineers who know data science and data scientists who can code is not Google’s alone — it’s across the industry.

Lakshmanan, Valliappa. Data Science on the Google Cloud Platform. O’Reilly Media. Kindle Edition.
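As a rough illustration of the excerpt's point about going beyond counting and summing, the sketch below reports a difference in averages together with an approximate 95% interval, so a reader can judge whether the difference might be noise. It is not taken from the book: the delivery-time samples and the function name are hypothetical, and the normal approximation is a simplifying assumption.

```python
import math
import statistics


def mean_difference_interval(a, b, z=1.96):
    """Difference in means (b - a) with an approximate 95% interval.

    Assumption: a normal approximation with unpooled sample variances.
    For samples this small, a t-based interval would be somewhat wider.
    """
    diff = statistics.mean(b) - statistics.mean(a)
    se = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return diff, (diff - z * se, diff + z * se)


# Hypothetical delivery times in minutes, before and after a process change.
BEFORE = [42.0, 38.0, 45.0, 40.0, 44.0, 39.0]
AFTER = [37.0, 41.0, 36.0, 39.0, 35.0, 40.0]

if __name__ == "__main__":
    diff, (low, high) = mean_difference_interval(BEFORE, AFTER)
    print(f"difference: {diff:.2f} min, approx. 95% interval: ({low:.2f}, {high:.2f})")
    print("interval excludes zero:", not (low <= 0.0 <= high))
```

A plain SQL average would stop at the difference itself; the interval is what ties the aggregate back to a statistical setting and, from there, to a business decision.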

Summary

For an organization to meet the demands of the future, it is critical to facilitate rapid adoption of data science into the delivery pipeline.

People involved in the delivery chain will need a shift of mindset; the building Process must integrate data science into the project plan; and the final Product will have to adopt the innovative toolset by moving to the cloud.
