The Engineer's Guide to Hiring Top Data Talent

Hey there, Mr. Founding Engineer. I’ve been where you are right now — the CEO of your 9-person company has just asked if anyone on the team knows “a good data person”. You’ve got a decent amount of user growth, Amplitude is telling you how many people are signing up day-over-day, and you now need to understand exactly how much your marketing costs, where your users are coming from, and what that dreaded churn rate is. The CEO looks at you and goes “Hey, would you mind taking the lead on this? You’re probably the best person to interview them.”

Well, as a founding engineer, there’s a lot of technology out there to know. You need to know the differences between the front and back end, how to manage deployment – not to mention the nightmare of scaling, especially when you recently gained a ton of popularity in Malaysia and you now need to also work out of an AWS region in Asia. How on earth are you supposed to also know how to hire the right first data hire?

Fear not, Mr. “SQL-is-not-engineering” — this article is here to help. We’ve also included some code to give you some context as to what you’re going to be reading here in the coming months.

The Conductor: Data Engineer

For most young startups, a data engineer is probably the biggest generalist you’re going to hire. Because they understand both software and data, the ideal candidate will be able to help with anything from basic analytics in SQL to deploying and monitoring Airflow and containerized jobs hosted somewhere. They should also understand cloud networking and be able to make sure that your data can move from place to place easily without exposing you to outside risk.

Data Engineers should also be able to work with your software team on designing your data schema. They’ll be able to provide support when it comes to designing your event systems, understand how to set up your database structure, and fundamentally will be the gatekeepers of the infrastructure that the rest of your company will use to answer questions about everything!

For the purposes of this post, let’s do some roleplaying. Assume you are the founding engineer of a content streaming service called “Primeflix”: you own the rights to shows such as “How I Saw Your Aunt”, “Broken Badly”, “The Cubicle”, and many others, and have even produced a hit show called “Ned Tasso”. In this world (and once your stack has reached a “mature” age), this is what the data engineer is going to be doing:

Hiring and Interviewing Data Engineers

In the early days, though, you’ll want to optimize for slightly different use cases because your data engineer will have to do a lot more than just copy data over. (If that’s the only thing that you’re looking for, let me introduce you to my good friend Airbyte.)

Here are some requirements that you’re going to want to look for:

Your first DE is going to hire other DEs. It’s beneficial for them to have a background as a Data Engineer where they’ve been in some sort of a leadership position.
They’re going to be designing the initial version of your data warehouse, so they should have some experience with data warehousing. Anyone who has used dbt, Airflow + SQL, or (better yet) cron + shell scripts should be able to handle this process a-okay.
Your DE should be able to establish culture and standard practices around code style and peer review, along with a disciplined workflow for developing in a dev environment and pushing to production.

For interviewing, I’ve added an example of a question here. The question asks how the candidate would parse a JSON object into different tables and how they’d deal with the relations between them. It’s a useful question that gives you a lot of information about their database design skills and shows you how they think through problems.

Example code: Here’s a notebook with some code that is designed to replicate an event stream from a content streaming platform. Special thanks to Andre Sionek for designing the initial system that we worked off of for our own use!
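To make the interview question above concrete, here is a minimal sketch of one reasonable answer: a nested JSON event is split into a few related tables. This is not the notebook mentioned above. The event shape, table names, and column names are all hypothetical, and SQLite (from Python's standard library) stands in for whatever warehouse you actually use.

```python
import json
import sqlite3

# Hypothetical playback event; the field names are illustrative only.
RAW_EVENT = """
{
  "event_id": "evt-001",
  "event_type": "play",
  "timestamp": "2023-01-15T20:31:00Z",
  "user": {"user_id": 42, "country": "MY", "plan": "premium"},
  "content": {"content_id": 7, "title": "Ned Tasso", "genres": ["comedy", "sports"]}
}
"""

# One table per entity; the events table points at users and content via foreign keys.
SCHEMA = """
CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT, plan TEXT);
CREATE TABLE content (content_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE content_genres (
    content_id INTEGER REFERENCES content(content_id),
    genre TEXT,
    PRIMARY KEY (content_id, genre)
);
CREATE TABLE events (
    event_id TEXT PRIMARY KEY,
    event_type TEXT,
    occurred_at TEXT,
    user_id INTEGER REFERENCES users(user_id),
    content_id INTEGER REFERENCES content(content_id)
);
"""


def load_event(conn, raw):
    """Split one nested JSON event into rows across the related tables."""
    event = json.loads(raw)
    user, content = event["user"], event["content"]
    # Parent rows go in first so the foreign keys on child rows resolve.
    # INSERT OR IGNORE skips rows whose primary key already exists.
    conn.execute("INSERT OR IGNORE INTO users VALUES (?, ?, ?)",
                 (user["user_id"], user["country"], user["plan"]))
    conn.execute("INSERT OR IGNORE INTO content VALUES (?, ?)",
                 (content["content_id"], content["title"]))
    # The genres array becomes one row per (content, genre) pair.
    conn.executemany("INSERT OR IGNORE INTO content_genres VALUES (?, ?)",
                     [(content["content_id"], g) for g in content["genres"]])
    conn.execute("INSERT OR IGNORE INTO events VALUES (?, ?, ?, ?, ?)",
                 (event["event_id"], event["event_type"], event["timestamp"],
                  user["user_id"], content["content_id"]))


def table_counts(conn):
    """Return the row count of each table, for a quick sanity check."""
    tables = ("users", "content", "content_genres", "events")
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables}


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript(SCHEMA)
load_event(conn, RAW_EVENT)
load_event(conn, RAW_EVENT)  # replaying the same event adds no new rows
print(table_counts(conn))
```

A strong candidate will usually talk through the same decisions on their own: which nested objects deserve their own table, what the keys are, how arrays are handled, and what happens when the same event arrives twice.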

The Manipulator: Analytics Engineer

The Analytics Engineer is a bit of a new title, but the job has been around for a while. Because the title was coined by the folks over at dbt Labs, it’s often thought that “Analytics Engineer” mainly describes someone who uses dbt. That’s not exactly the case.

Now, while I recommend that you do use dbt for SQL modeling (there’s really nothing better in the current landscape), the role is broader than one tool: Analytics Engineers are responsible for transforming data. If we take “ELT” (extract, load, transform) as the primary workflow for the modern data stack, then anyone who does the “transform” part of the workflow can theoretically be called an analytics engineer. Do other titles transform data? Of course they do, and we’ll go over those below, but the general job of an analytics engineer is to maintain the transformation code that takes data from the source and turns it into materializations that can be used by other “data consumers” in the organization. In a perfect world, they would maintain the tables that track KPIs, and design and organize how event data is normalized for use.

A good way to think about the job of an analytics engineer is to think about where they sit in the stack. If we look at a company that tracks website and product sessions using Segment (and dumps that data into Snowflake), sells goods through Shopify, and tracks core app data in its own Postgres instance, it’s the job of the AE to take the data after it’s been delivered and create “usable data” from it. A good example of this would be to use the data ingested from Shopify to create an “e-commerce” data mart, and pair that with the Segment data to understand the conversion funnel. The data can then either be used by consumers further down the stack or plugged straight into a BI tool.
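As a rough illustration of the Shopify-plus-Segment example above, here is a small sketch of a conversion funnel built by joining session data to order data. The table and column names are made up for this post and do not match the real Segment or Shopify schemas, and SQLite stands in for Snowflake so the example runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, heavily simplified stand-ins for the raw source tables.
CREATE TABLE segment_sessions (
    session_id TEXT PRIMARY KEY,
    user_email TEXT,
    started_checkout INTEGER  -- 1 if the session reached checkout, else 0
);
CREATE TABLE shopify_orders (
    order_id TEXT PRIMARY KEY,
    customer_email TEXT,
    total_usd REAL
);
INSERT INTO segment_sessions VALUES
    ('s1', 'a@example.com', 1),
    ('s2', 'a@example.com', 0),
    ('s3', 'b@example.com', 0),
    ('s4', 'c@example.com', 1),
    ('s5', 'd@example.com', 0);
INSERT INTO shopify_orders VALUES
    ('o1', 'a@example.com', 30.0),
    ('o2', 'a@example.com', 12.5);
""")

# Count distinct users at each stage, so repeat sessions and repeat
# orders from the same person do not inflate the funnel.
FUNNEL_SQL = """
SELECT
    COUNT(DISTINCT s.user_email) AS visitors,
    COUNT(DISTINCT CASE WHEN s.started_checkout = 1 THEN s.user_email END) AS reached_checkout,
    COUNT(DISTINCT o.customer_email) AS purchasers
FROM segment_sessions AS s
LEFT JOIN shopify_orders AS o
    ON o.customer_email = s.user_email
"""

visitors, reached_checkout, purchasers = conn.execute(FUNNEL_SQL).fetchone()
print(f"visitors={visitors} checkout={reached_checkout} purchasers={purchasers}")
```

In a real project this query would more likely live in a dbt model than in a Python string, but the shape of the work is the same: pick a join key, decide on the grain, and guard against double counting.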

If we take the “Primeflix” example from earlier, you can see how the Analytics Engineer builds off of the work from the Data Engineer. The data is safely copied over, and the analytics engineer can take the various source datasets and merge them to provide additional context.

Hiring and Interviewing Analytics Engineers

Personally, I think that dbt is still a hidden, rare skill amongst folks with this background, so that would be the first thing that I would ask about (or check for on their resume or LinkedIn). That said, dbt is relatively easy to learn, so instead of asking dbt-specific questions, take a look at this good post on their website about hiring an analytics engineer.

To test their fundamental understanding of a data warehouse, you can ask them basic structure questions (e.g. “what is the difference between a foreign key and a primary key?”) or have them paper-code a complex SQL query (preferably with window functions in your flavor of choice).

A good example of some SQL code that transforms the event stream that we created in the “Data Engineering” section can be found here.
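If you want a feel for the kind of window-function query mentioned above, here is a small sketch that finds each user's most-watched title. It is not the linked SQL. The `watch_events` table and its columns are hypothetical, and it assumes the SQLite bundled with your Python is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical event table; one row per viewing session.
CREATE TABLE watch_events (user_id INTEGER, title TEXT, minutes_watched INTEGER);
INSERT INTO watch_events VALUES
    (1, 'Ned Tasso', 30),
    (1, 'The Cubicle', 20),
    (1, 'Ned Tasso', 25),
    (2, 'Broken Badly', 50),
    (2, 'The Cubicle', 10);
""")

TOP_TITLE_SQL = """
WITH totals AS (
    -- Collapse sessions to one row per user and title.
    SELECT user_id, title, SUM(minutes_watched) AS total_minutes
    FROM watch_events
    GROUP BY user_id, title
),
ranked AS (
    -- Rank each user's titles by total watch time, highest first.
    SELECT
        user_id,
        title,
        total_minutes,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY total_minutes DESC) AS rn
    FROM totals
)
SELECT user_id, title, total_minutes
FROM ranked
WHERE rn = 1
ORDER BY user_id
"""

top_titles = conn.execute(TOP_TITLE_SQL).fetchall()
for user_id, title, total_minutes in top_titles:
    print(user_id, title, total_minutes)
```

A good follow-up question is what happens on a tie, and whether `RANK()` or an extra tiebreaker column in the `ORDER BY` would be the better choice.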

The Campaigner: Data Analyst

The Data Analyst is probably the most versatile role and the hardest to “place” into a specific box. It is also arguably the role that spans the most categories, and therefore the one that differs most from company to company and from team to team. At Peloton (where I worked previously), for instance, the data analysts on the data team could do anything from advanced analytics engineering to giving the company’s C-suite weekly briefings on the state of the business. In another part of the business, a Data Analyst owned all of the forecasting and budgeting for marketing. “Data Analyst” is often the most misused title in the industry, and skillsets (and therefore pay bands) can vary enormously from place to place. Still, I’ll try my hardest to explain what a data analyst does, why you need them on your team, and what skillsets they could have.

There’s actually a world in which the data analyst is the first hire you make on your data team (that’s what Peloton did)! For example, if you’re a company that’s just getting started with the modern data stack, you might want to talk to our friends over at Mozart Data to help you get a basic analytics stack set up. They’ll help you connect to all your services and ingest the data into Snowflake. After that, as long as your data analyst can write basic SQL queries, they’ll be able to set up an account, create jobs, and begin answering questions almost immediately (can you tell that we’re fans of Mozart?).

Hiring a Data Analyst (early in the life of a company)

In the world where you don’t have a complex data model from the start, a good data analyst would probably be the best first hire. They have a similar skillset to the analytics engineer, but you will want to over-index on people who enjoy not only coding but also managing stakeholder expectations. This hire will have a skillset that blends an Engineer and a Product Manager, and should be compensated as such.

Hiring a Data Analyst (later in the life of a company)

A lot of the same rules from the above paragraph apply, but the diagram continuing our journey along the “Modern Data Stack” suggests a good place for this hire to sit. Most of the time, they “own” a specific set of KPIs (and will be in charge of changing the definitions as your business evolves), and they are able to create visualizations and drill-downs and to explain why their various KPIs ebb and flow.
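To show what “owning” a KPI definition can look like in practice, here is a tiny sketch of the churn rate mentioned at the top of this post. It uses one common definition (customers active at the start of the period who are no longer active at the end, divided by customers active at the start). Your business may well define it differently, and the function name and inputs are hypothetical.

```python
def churn_rate(active_at_start, active_at_end):
    """Share of customers active at the start of a period who are gone by the end.

    Customers who joined during the period are ignored, since they
    could not have churned from the starting cohort.
    """
    start = set(active_at_start)
    if not start:
        raise ValueError("churn rate is undefined with no customers at the start")
    churned = start - set(active_at_end)
    return len(churned) / len(start)


# Customers 3 and 4 left during the month; customer 5 is new and does not count.
january_start = {1, 2, 3, 4}
january_end = {1, 2, 5}
print(churn_rate(january_start, january_end))
```

The value of having one person own this is that decisions like “do paused subscriptions count as churned?” get made once, written down, and applied everywhere.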

Many roles have an “analyst” twist to them: a good product manager should also be able to do analyst-level tasks, as should people in other jobs. In a world where self-service BI tools have gotten quite good, the line between the Data Analyst and the data consumer has shifted: an analyst who was repeatedly pulling KPIs and updating a spreadsheet years ago is now either performing complex manipulations at the scale of an Analytics Engineer, or using tooling to help business users understand why a specific KPI or metric has shifted. For the modern data stack, try optimizing for analysts who can communicate with stakeholders, can perform data manipulations (in SQL, Excel, or in some use cases R and Python), and are strongly proficient in a BI tool (as of this writing, our favorite is Looker, though that has gotten quite expensive; may we suggest Lightdash?).

The Extrapolator: Data Scientist

The Data Scientist role is also often misused and misunderstood, generally by smaller teams who don’t understand the difference between what a data scientist can do and what they should be doing, or which technologies match their skillsets. A data scientist at a younger startup might be in charge of basic analytics and experimentation, while at a larger company they may be deploying full-scale machine learning models. Because of the variability and noise in the space, skillsets can vary widely from one data scientist to the next and from technology to technology.

In general, a Data Scientist should and will do the following:

Note: There’s a code repository available with this post, here.

