Data Scientist, Genetics at Color Genomics
Burlingame, CA, US
Color believes that everyone deserves access to high-quality information they can use to improve and maintain their health. We are transforming how clinical-grade genetic testing is done by removing financial, educational, and geographic barriers. Our goal is to expand physician-supported access to genetic testing to help every person, everywhere understand their risk for hereditary disorders.
 
The data science team at Color is tasked with turning our large, growing corpus of genotypic/phenotypic data into actionable insights, using the most advanced methods available, to help individuals improve their health. Some of the work we’ve done recently:
  • Developed novel imputation approaches from low-coverage whole-genome sequencing (collected locally, in our state-of-the-art NGS lab);
  • Deployed machine learning models throughout multiple components of Color’s stack, including variant calling and classification;
  • Built the largest publicly available dataset of genetic variants in cancer genes coupled with detailed phenotypic information;
  • Worked with collaborators at Harvard, the Broad Institute, Google, UCSF, and other leading institutes to share insights with the community.
You’ll find more details on some of these in the publication section of the Color website.
 
As a member of the team, you’ll innovate on similar projects: develop data-driven insights to guide our product roadmap and reach a broader market; apply state-of-the-art methods in genomics to improve our assay and offerings; and, ultimately, help individuals around the world use the technology you helped build.
 
Members of the data science team have diverse backgrounds in population genetics, biostatistics, machine learning, business analytics, and engineering. We enjoy working with other teams throughout the stack, from early data exploration and scientific discovery through rigorous implementation and, eventually, product integration and follow-up.
 
You might be a good fit if:
  • You’re passionate about using genetics and statistics to help save lives and prevent catastrophic disease. You want to apply your skills in the service of a greater mission.
  • You’re well-versed in the tools of modern statistical genetics and computational biology: coding in Python, R, or C/C++; common genomics tools such as PLINK, bcftools, and GATK; and significant experience working with large datasets.
  • You have deep knowledge of human genetics and the statistics underlying it: association studies, linkage, heritability, and quantitative genetics. You’ve worked on and published about ancestry modeling, genome-wide association studies, relatedness, phasing, or similar problems.
  • You can plan and execute rigorous statistical analyses on population-scale genetic datasets.
  • You have an MS or PhD in a quantitative or computational field.
  • You value strong communication and presentation skills.