Andy Lane

Ph.D. Data Scientist. I organize, analyze and produce insights from biological, genomic and health data.

A project developed during my time at Insight Data Science to predict the cholesterol content of items on local restaurant menus
A bioinformatics toolchain developed during my postdoc work at Berkeley to plan and analyze the output of CRISPR-EATING, a method I developed for generating complex CRISPR sgRNA libraries from physically isolated gDNA/cDNA samples
A quick tool to convert Pandas DataFrames to CSS-styled rasterized tables for incorporation in printed reports. See the blog post for more details
 

Experience

Google via Adecco

Molecular Biologist

February 2017 – September 2017

Mountain View, CA

  • Grew Accelerated Science molecular biology team from two to five
  • Built a reagent and inventory tracking system using Google internal software tools (Google Apps Script, Google Cloud Platform)
  • Developed lab data aggregation infrastructure for experimental data extraction and quality control
  • Worked with machine learning Research Scientists to custom-design new molecular biology approaches optimized to produce training data for TensorFlow sequence-function model building
  • Developed automation of molecular biology pipelines on liquid handling robotics systems (Agilent Bravo, Formulatrix Mantis)

Insight Health Data Science

Data Science Fellow

September 2016 – December 2016

San Francisco, CA

  • Aggregated data from allrecipes.com, from chain restaurant menus and from local restaurant menus to train, validate and deploy a cholesterol prediction model that estimates the cholesterol content of menu items at local, non-chain restaurants
  • Employed natural language processing approaches to train and evaluate several types of machine learning models, choosing an interpretable logistic regression model that generates easy-to-understand menu analyses, including the reasons each menu item was predicted to be in a particular cholesterol category
  • Deployed www.menusights.com, a web tool that helps users find predicted low cholesterol items on the menus of local restaurants

University of California, Berkeley

Postdoctoral Fellow

September 2012 – September 2016

Berkeley, CA

  • Designed, engineered and patented a combined computational and molecular biology method to generate CRISPR tools at >100-fold lower cost than previous methods. Altmetric scored the publication in the top 5% of all articles in 2015.
  • Developed a computational toolchain to model combinatorial DNA-modifying enzyme activities, predict CRISPR reagent genome-wide specificity and design and output PCR primers used in accompanying downstream wet-lab procedures
  • Published a manuscript in Developmental Cell demonstrating the use of this method to generate sequence-specific CRISPR probes for fluorescence microscopy of large regions of chromosomes in living samples, an application achievable for the first time using this method. PubMed | PDF
  • Wrote an introduction to CRISPR-EATING for Benchling

Education

University of Minnesota

PhD, Molecular Biology and Genetics

September 2006 – May 2012

Minneapolis, MN

Advisor: Dr. Duncan Clarke
Project: The extreme C-terminus of human Topoisomerase IIα defines a novel bi-modular DNA tether essential for the formation of mitotic chromosomes. PubMed | PDF

University of St. Andrews

BSc, Biology

September 2001 – May 2005

St. Andrews, Scotland

  • Minor in French
  • Exchange program at Purdue University, West Lafayette, IN during 2003-4; worked in lab of Dr. Changlu Wang on urban pest management study in Gary, IN.
  • Founding member of French-Science society for scientists studying French at St Andrews

Languages and Tools

Fundamentals


Python, Linux/UNIX & bash, git
  • 5 years' Python experience
  • 10+ years' UNIX/Linux experience
  • I manage projects in git via GitHub.

Genomics and Scientific Computing


BioPython, NumPy, R, BLAST, Genome Analysis Toolkit (GATK)
  • BioPython library for managing and manipulating DNA sequence data on CRISPR-EATING project
  • NumPy used in MenuSights and elsewhere for managing and processing encoded NLP data (word vector embeddings etc.)
  • R for data exploration and visualization (plyr, ggplot) and dose-response curve fitting (drc) to calculate affinity constants from protein/peptide binding data
  • Significant experience with configuring BLAST queries and processing results; built BLAST-based PCR amplicon specificity tool and CRISPR off-target scoring tool as part of CRISPR-EATING (VirtualEATING) toolset
  • GATK for various experimental projects around predicting the action of CRISPR-EATING on cDNA samples

Machine Learning


scikit-learn, TensorFlow
  • For the MenuSights project, built initial models in scikit-learn (final model: a logistic regression) and later extended the work to TensorFlow models using word-vector embeddings

Data Visualization


Plotly, ggplot, matplotlib
  • Published scientific articles using ggplot visualizations during PhD work and using matplotlib to describe CRISPR-EATING libraries
  • Routinely plot machine learning metrics during model building using matplotlib
  • Use Plotly for interactive exploration of e.g. outlier datapoints in ML model building

Databases


SQL (MySQL/PostgreSQL/SQLite), SQLAlchemy ORM
  • Built CRISPR-EATING sgRNA specificity scoring database with versions in MySQL and SQLite
  • Built MenuSights training and test databases in Postgres, handling data using the SQLAlchemy Python ORM

Web Development


Amazon Web Services (AWS), Flask, CSS, HTML, Google Apps Script
  • MenuSights.com and personal websites designed around and hosted on AWS. Leveraged AWS infrastructure for parallel processing of millions of sgRNA specificity scores on the CRISPR-EATING project, with threaded deposition into a MySQL database
  • Built MenuSights.com in Flask
  • Spent 2005-2006 building internal websites with CSS and HTML at Citigroup. Personal site is 80% hand-coded CSS/HTML...
  • While at Google Accelerated Science, built a lab inventory and barcode scanning system using Google Apps Script, a JavaScript-based language for interacting with Google Apps products. In the system, triggered scripts process barcode scanner events to update the quantity of barcode-tagged lab consumables and record their storage location, producing an inventory and a history of their usage
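
The scan-event flow described in the last bullet can be sketched roughly as follows. This is a minimal Python illustration, not the original Apps Script code: the names `Item`, `Inventory` and `on_scan`, and the sample barcode and locations, are hypothetical. It assumes each scan event carries a barcode, a quantity change and a storage location.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Item:
    """Current state of one barcode-tagged consumable (hypothetical model)."""
    barcode: str
    quantity: int = 0
    location: str = ""


@dataclass
class Inventory:
    """In-memory stand-in for the inventory table and its usage history."""
    items: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def on_scan(self, barcode: str, delta: int, location: str) -> Item:
        """Handle one scan event: adjust quantity, record location, log it."""
        item = self.items.setdefault(barcode, Item(barcode))
        item.quantity += delta
        item.location = location
        # Append-only log so usage over time can be reconstructed later.
        self.history.append(
            (datetime.now(timezone.utc), barcode, delta, location)
        )
        return item


inv = Inventory()
inv.on_scan("TIP-200", 10, "Shelf A")   # stock received
inv.on_scan("TIP-200", -3, "Bench 2")   # three units used at the bench
```

Keeping the current state and the event log separate means the inventory view stays cheap to read while the history remains a complete record of usage.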

Other details

🇺🇸 US Permanent Resident/Green Card holder