The Data Science team, as part of the Office of the CTO, works with large amounts of data to improve products, conduct security research, and report on the state of the internet at large.
In this role, you'll apply your expertise to build and maintain the tools that enable, support, and distribute the results of this research. You will work with partners across the company to understand and gain access to different types of data. You will work with the data scientists on our team to build tools and POCs that demonstrate our findings. The tools and infrastructure you build will drive research, enhance products, and empower the community.
Understand the data
Where is the data coming from?
How is it generated?
What does it mean?
Are there any noteworthy characteristics of the data (e.g., one column never varies)?
Get the data
Work with data owners to get access to data
Extract, transform, and load (ETL) samples, small slices, and full sets of data
Build and enhance tools that analyze the data
Engage with other teams to enable and empower them using our research
Work with engineers on other teams to adopt methods derived from research
Minimum of 2 years of industry experience
Experience with Scala, Python, and SQL, and with distributed processing using Apache Spark, Hadoop, or Hive
Proficiency with AWS services, including EC2, SQS, VPC networking, S3, Athena, and Glue
Experience automating infrastructure through Terraform or CloudFormation, Chef, and Docker/Kubernetes
Strong communication skills
Strong programming skills in Python, Java, Scala, and Bash
Strong debugging skills, including the ability to reproduce a bug given limited information and/or time
Experience with the Git version control system
Experience with test frameworks across a variety of programming languages