To retain market leadership in the algorithm economy, enterprises require new ways to maximize the value of data and AI with citizen data scientists. Don’t think citizen data science is viable? Think again. In my work with DataRobot over the past two years, I’ve seen successful citizen data science programs expanding in some of the world’s largest companies. As with any new initiative, there are lessons learned. The biggest lesson so far is not to underestimate the positive impact citizen data science can deliver…when rolled out responsibly!

Since the technical barriers to applying data science have been reduced with automated machine learning, I have not seen talent gaps holding customers back. I usually see cultural, change management, and data literacy issues as the most common challenges to overcome. O’Reilly shared similar results in a recent survey of enterprise AI adoption.

I also often hear “we’re too busy moving data to <insert big cloud vendor name here>.”

You need to do both: get started with AI while moving data around.

When someone tells me their staff is too busy moving data, I envision rearranging deck chairs on the Titanic. Everyone has been too busy moving data around for decades. We’ve been moving data in and out of mainframes, app storage, Excel files, relational databases, data warehouses, data lakes, and the cloud, and the list goes on. AI is the next key battle to win. No industry is immune.

What is a citizen data scientist?

Gartner defines a citizen data scientist as a person who creates or generates models that use advanced diagnostic analytics or predictive and prescriptive capabilities, but whose primary job function is outside the field of statistics and analytics. You will most likely find them solving problems with data in business analyst, data analyst, business intelligence, data engineering or software engineering roles within a line of business.

Citizen data science talent will NOT replace your data scientists.

Citizen data scientists will help you scale data science and improve productivity, allowing traditionally educated data scientists to focus on more complex projects and serve as mentors leading the way forward responsibly.

When I talk to groups using citizen data science, the conversations I’m having are similar to the ones I had during the earlier move from traditional BI to self-service BI. In that transition, I didn’t know a single BI professional who was replaced by self-service BI. If you were a BI professional, you ended up delivering more value, faster, by establishing a solid self-service BI foundation for the masses and/or working on many more, shorter projects.

When scaling AI with citizen data science, you’ll see the same rapid value patterns.

What is different? Citizen data science has a much steeper learning curve – that you may or may not even recognize.

Governance is a Critical Success Factor

To be successful with citizen data science tools, you will need governance, ongoing practical training, protections within your tools, and guidance from experts. From understanding machine learning basics and defining AI use cases to ramping up on the tools, learning unfamiliar terminology and charts, and evaluating, integrating, and managing models, there is a deep domain-level learning curve that should not be underestimated. You’ll also need to learn the art of translating and explaining AI effectively to the business.

Since quality input prep is crucial for machine learning, much of what I shared in the past about self-service data prep governance is extremely relevant. The market and legal climate for data is complex and ever-changing, especially when it comes to personal data legislation such as the General Data Protection Regulation (GDPR). Data ownership, data usage consent, ethical, compliance, legislative, and privacy policies need to be thoroughly researched and clearly understood. There needs to be a careful balance between data risks and business value.

Fundamental concepts of self-service data preparation governance

Current attitudes towards self-service analytics value a reasonable balance between self-service reporting agility and central governance. They also include key data stewardship features to assign data quality accountability.

Ideal citizen data scientist profile

Who should we upskill? Since I am frequently asked how to identify the right talent to train for citizen data scientist programs, I put together a quick summary of ideal candidate educational, logistical, and technical skills prerequisites.

  • Avid constant learner, willing to attend training and adopt new tools
  • “Go to” resource tasked to solve complex, expensive problems
  • Subject matter knowledge of business area processes and data sources
  • Builds reports in Excel, Tableau, Power BI, Qlik Sense, SAP Lumira, TIBCO Spotfire, Minitab, MicroStrategy, R, Python or similar tools
  • Basic understanding of descriptive statistics
  • Comfortable querying and preparing data sources with SQL, R, Python OR data wrangling skills using tools such as Alteryx, Tableau Prep, Trifacta, Paxata, Datawatch, Informatica, Microsoft Power Query, and other tools

Although programming skills are not necessarily needed for some of the citizen data science tools, I firmly believe basic SQL skills to collect and query data are a necessity. The perfect citizen data science candidate should love gathering, querying, and analyzing data.
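To give a feel for the level of SQL and descriptive statistics I have in mind, here is a minimal sketch using only Python's standard library. The `orders` table, its columns, and the sample values are hypothetical stand-ins for a real business data source, not data from any actual program.

```python
import sqlite3
import statistics

# Hypothetical in-memory table standing in for a real business data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [
        ("East", 120.0),
        ("East", 80.0),
        ("West", 200.0),
        ("West", 100.0),
        ("West", 150.0),
    ],
)

# Basic SQL: group rows by region and aggregate.
rows = conn.execute(
    """
    SELECT region, COUNT(*) AS n, AVG(amount) AS avg_amount
    FROM orders
    GROUP BY region
    ORDER BY region
    """
).fetchall()

# Basic descriptive statistics on the raw values returned by a query.
amounts = [amount for (amount,) in conn.execute("SELECT amount FROM orders")]
summary = {
    "mean": statistics.mean(amounts),
    "median": statistics.median(amounts),
    "stdev": statistics.stdev(amounts),  # sample standard deviation
}
conn.close()

for region, n, avg_amount in rows:
    print(region, n, avg_amount)
print(summary)
```

A candidate who can read and adapt something like this, whether in SQL alone or through a data wrangling tool, probably has enough of a foundation to start with automated machine learning tools.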

Designing a citizen data science strategy

Much like the self-service BI evolution, citizen data science also requires enterprise-level planning, protections, governance, collaboration, compliance documentation, auditing, varied deployment options, and robust model management for AI at scale. Few groups have mastered what it really takes to become an AI-driven enterprise. That is one thing I cherish about my humbling DataRobot experience working with hundreds of data scientists. I’m seeing and hearing what it truly takes to get citizen data science right from a wide range of industries – consulting firms, marketing agencies, banks, fintech, retailers, healthcare providers, manufacturing, telecom, sports teams, etc.

In the upcoming weeks, I’ll be traveling to Australia, Singapore, and Hong Kong to share citizen data science lessons learned and discuss this newer area of analytics. I will walk through early adopter stories, example roadmaps, recommended staffing, upskilling, mentoring, and ongoing governance. If you want to participate, please register for one of the following sessions. If you are interested in a session like this in your city or for your organization, please let me know and I’ll see what I can arrange for you.

Asia / Australia In-Person Round Table Events