Data Scientist, Machine Learning and Analytics at Color Genomics
Burlingame, CA, US
Color believes that everyone deserves access to high-quality information they can use to improve and maintain their health. We are transforming how clinical-grade genetic testing is done by removing financial, educational, and geographic barriers. Our goal is to expand physician-supported access to genetic testing to help every person, everywhere understand their risk for hereditary disorders.
The data science team at Color is tasked with turning our large, growing corpus of data into actionable insights, using the most advanced methods available, to help individuals improve their health. Some of the work we’ve done recently:
  • Deploy machine learning models throughout multiple components of Color’s stack, both in the genetics domain and for business and operations tasks. Color is the first clinical lab to utilize Deep Learning for variant calling (this is joint work with Google), and one of the few to apply machine learning at scale.
  • Build a unique data platform utilizing BigQuery and AWS to accelerate exploration of computationally intensive tasks such as genome-wide association studies, and a compute engine routinely utilizing thousands of cores for analysis.
  • Build the largest publicly available dataset of genetic variants in cancer genes coupled with detailed phenotypic information.
  • Work with collaborators at Harvard, the National Institutes of Health, Google, UCSF, and other leading institutions to share insights with the community.
You’ll find more details on some of these projects in the publication section of the Color website.
As a member of the team, you’ll innovate on similar projects. In particular, you’ll advance the application of cutting-edge ML to genetics and related clinical data; healthcare is one of the domains where ML will make an outsized impact over the coming years. Ultimately, the technology you build will reach individuals around the world and help them live healthier lives.
Members of the data science team have diverse backgrounds in biostatistics, machine learning, business analytics, and engineering. We enjoy working with other teams throughout the stack, from early data exploration and scientific discovery through rigorous implementation to product integration and follow-up.
You might be a good fit if:
  • You’re passionate about using software and technology to help save lives and prevent catastrophic disease. You want to apply your skills in the service of a greater mission.
  • You have deep knowledge of machine learning and have applied it to real-world problems in an industry setting. You feel comfortable driving all components of an ML project: data collection and cleanup, metric design, development and iteration, post-deploy operations, and so on. You are excited about new technologies, but pragmatic about using common frameworks (TensorFlow, PyTorch, sklearn).
  • You have solid programming skills, good familiarity with Python and Git, and are excited about an environment that mixes science work with production engineering.
  • You have an MS or PhD in a quantitative or computational field.
  • You value strong communication and presentation skills.
  • Familiarity with genetics and the healthcare space is a plus, but not required.