Want to Be a Data Scientist? Here are the 5 Skills You'll Need



What are the top 5 skills needed to become a data scientist? originally appeared on Quora, the place to gain and share knowledge, empowering people to learn from others and better understand the world.

It’s a bit hard to summarize the whole field of data science into five skills (especially since the job “data scientist” means different things at different companies), but I’ll give it a shot here. These five skills are roughly ordered from the “hard skills” to “soft skills”.

Skill #1: Programming

This is perhaps the most fundamental part of a data scientist’s skill set: the job of a data scientist is much more applied than that of a traditional statistician. Programming is important in multiple ways, including the three below:

* Being able to program augments your ability to do statistics. If you have a bunch of statistics knowledge but no way to implement it, your statistics knowledge becomes much less useful.

* The ability to analyze large datasets. The datasets you get to work with in industry are not as small and cute as the sample iris dataset - you can easily get data that reaches millions of rows or more.

* You can create tools to do better data science. This includes everything from building systems that your company can use to visualize data, to creating frameworks that automatically analyze experiments, to managing the data pipeline at your company so the necessary data is in the right place at the right time.
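As a rough illustration of the second point, a dataset with millions of rows does not have to fit in memory to be summarized. The sketch below streams rows one at a time and keeps only running totals per group. The function name `mean_by_group` and the column names `country` and `revenue` are hypothetical, chosen only for this example.

```python
import csv
import io
from collections import defaultdict


def mean_by_group(rows, group_col, value_col):
    """Stream over rows (dicts) and return {group: mean of value_col}.

    Only running sums and counts are kept, so memory use depends on the
    number of distinct groups, not on the number of rows.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_col]] += float(row[value_col])
        counts[row[group_col]] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Tiny in-memory stand-in for a large file (made-up values); with real data
# you would pass csv.DictReader(open(path, newline="")) instead.
sample = io.StringIO("country,revenue\nUS,10\nUS,20\nDE,5\n")
print(mean_by_group(csv.DictReader(sample), "country", "revenue"))
# {'US': 15.0, 'DE': 5.0}
```

The same pattern works for any aggregate that can be updated one row at a time, such as counts, sums, minimums, and maximums.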

The normal software engineering training here will help you develop programming skills (although you typically don’t have to go as far as a usual software engineer would).

Skill #2: Quantitative analysis

Quantitative analysis is the heart of a data scientist’s skill set. Much of data science is about understanding the behavior of a particularly complex system by analyzing the data that it produces, both naturally and via experiments. Quantitative analysis skills are important in multiple ways, including the three below:

* Experimental design and analysis: Particularly for data scientists working on consumer internet applications, the way that data is logged and the way that experiments can be run allow for a massive amount of experimentation to test various hypotheses. There are a lot of ways that experiment analysis can go wrong (ask any statistician), so data scientists can help a lot here.

* Modeling of complex economic or growth systems: Typical models like churn models or customer lifetime value models are common here, as well as more complicated models such as supply and demand modeling, economically optimal ways to match providers and suppliers, and methods to model the growth channels of a company to better quantify which growth avenues are the most valuable. The most famous example of this is Uber’s surge pricing.

* Machine learning: Even for data scientists who don’t implement machine learning models themselves, there is tremendous value they can provide in helping create prototypes to test assumptions, select and create features, and identify areas of strength and opportunity in existing machine learning systems.
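To make the experiment analysis point a little more concrete, here is a minimal sketch of one common way to compare conversion rates between two experiment groups: a two-sided, two-proportion z-test. The answer above does not prescribe any particular test, so treat this choice, the function name `two_proportion_z_test`, and the counts in the usage line as assumptions made purely for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    Returns (z, p_value). Uses the pooled-proportion standard error,
    which relies on a normal approximation and so assumes reasonably
    large samples in both groups.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF evaluated at |z|, via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Made-up counts: 200 of 10,000 control users convert vs. 250 of 10,000 treated.
z, p = two_proportion_z_test(200, 10_000, 250, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value here is only as trustworthy as the experiment behind it. Checking results repeatedly before the planned sample size is reached, or testing many metrics at once, can produce "significant" differences by chance, which is one of the ways experiment analysis goes wrong. The sketch also does not guard against the degenerate case where no user (or every user) converts, which would make the standard error zero.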

The need for this skill is a big part of why the data science field is attractive to physicists, statisticians, economists, operations researchers, and many others who are very used to understanding complex systems through top-down approaches (making models) or bottom-up approaches (inferences from data).

Skill #3: Product intuition

Product intuition as a skill is tied to a data scientist’s ability to perform quantitative analysis on the system. Product knowledge means understanding the complex system that generates all of the data that data scientists analyze. This is incredibly important for quite a few reasons, including:

* Generating hypotheses: A data scientist who understands the product well can generate hypotheses about ways the system can behave if changed in a particular manner. Hypotheses are based on “hunches” about how certain aspects of the system can behave - and one needs to know about the system to be able to have hunches about how it works.

* Defining metrics: The traditional analytics skill set includes defining key primary and secondary metrics that the company can use to keep track of success at particular objectives. A data scientist needs to know about the product in order to create product metrics that both (1) measure what is intended and (2) measure something that is worth moving.

* Debugging analyses: Results that are “incredible” are more often caused by bugs than actual “incredible” features of the system. Good product knowledge can help with quick sanity checks and back-of-the-envelope calculations that can help more quickly identify things that might have gone wrong.

Product knowledge usually involves using the product that your company is creating. If that’s not possible, at least try to get to know the people who actually use the product.


Skill #4: Communication

This skill significantly increases the leverage of all of the previous skills. It is particularly important and can help distinguish a good data scientist from a great one. Good communication can manifest in various ways, including:

* Communicating insights: Some data scientists call this “storytelling”. The important thing here is to communicate insights in a clear, concise, and valid way, so that others in the company can effectively act on those insights.

* Data visualization and presentation: Sometimes there’s nothing more effective and satisfying than a good graph for making or conveying a point.

* General communication: Working as a data scientist almost always means working as part of a team - including working with engineers, designers, product managers, operations, and more. Good general communication can help facilitate trust and understanding, which is incredibly important for someone who is entrusted with being a steward of the data.

Skill #5: Teamwork

This last skill ties together the other four skills. A data scientist in particular cannot exist in isolation, and from what I’ve seen does best when deeply embedded in the rest of the company (or at least within the product development org).

Teamwork is important for many reasons, including:

* Being selfless: This includes offering help and mentorship to others, and putting the company’s mission before your own personal career ambitions.

* Constant iteration: A data scientist thrives on feedback, and most parts of the data scientist’s work will involve back-and-forth iteration and feedback with others to reach an impactful solution.

* Sharing knowledge with others: Since the data scientist profession is quite new, there is basically no one with the complete set of skills, especially if you collect together all of the possibly useful statistical techniques, frameworks, libraries, languages, and tools. Because knowledge will be spread out across the data scientists and the organizations, it is particularly useful for data scientists to be constantly sharing their knowledge, methods, and results with each other.

Conclusion

The first two skills, programming and quantitative analysis, are perhaps what most people first think of when they think about the skills of a data scientist. While those are important and form the technical foundation of a data scientist’s skill set, I want to emphasize that three of these five most important skills are not technical skills.

The third skill is important in general for any product or service-focused company, and the fourth and fifth skills are critical for any job you do where you work with other people!

Good luck and best wishes on your own path to becoming a data scientist!


Photo Credit: fandijki/Getty Images