BillionToOne | Staff Software Engineer | SF Bay Area | Hybrid or Remote
BillionToOne has developed a DNA molecular counter that increases cfDNA diagnostic resolution by over 1,000x. BillionToOne's first product, UNITY, is the first and only non-invasive prenatal test that directly screens an unborn baby for the most common and severe genetic disorders using only a single tube of blood from the pregnant mother without the invasiveness of amniocentesis.
BillionToOne is ranked at the top 5% of Y Combinator companies and has raised $300M+ in funding from prominent VC firms.
We are hiring a Staff Software Engineer to scale up compute-intensive bioinformatics workflows and build infrastructure tools that enable scientists, bioinformaticians and other technical teams at BillionToOne to robustly write and deploy code.
Tools we use include: Python, Django, AWS, Terraform, Sentry, and Datadog.
I'm a CTO as well and have been fascinated by the biotech field. Could you briefly describe what it's like to be a CTO in biotech? What kinds of problems do you solve?
Not the parent commenter, but I have been involved in getting patents. The process itself is not that complicated, but it can be difficult to get a patent that has some value (it's very easy to end up with a worthless one). I recommend reading the Nolo book on patents, if only to help you understand what's going on and to help you select a good attorney.
BillionToOne (YCS17) | Staff/Senior Software Engineer | Remote (US) or Onsite (Menlo Park CA) | www.billiontoone.com
BillionToOne develops innovative diagnostic tests that can affect the lives of millions of patients. Our QCT technology platform improves the resolution of cell-free DNA testing by >1,000x and enables novel tests for both prenatal and oncology care. We've raised over $30M and have launched multiple clinical products in the past 2 years, including an FDA-authorized COVID-19 test.
We are hiring a senior engineer (5+ years experience) to build internal APIs, bioinformatics processing pipelines, and laboratory automation tools, and to help manage CI/CD. We use python, django, rabbitmq, circleci, dbt, postgres, heroku, aws, and a variety of other tools.
If you have experience in full stack web development, love seeing your work positively affect your colleagues, and thrive in a fast-paced entrepreneurial environment, this could be a great opportunity for you.
Please contact me directly at david@billiontoone.com if interested.
I completely agree with the point about integrated REPL/IDE, and wanted to share some of the combinations I have used in the past, since it can be a concrete getting started point for those who are curious. Some of these are not literally repls, but IMO give a similar experience.
- ClojureScript with Figwheel and the web browser
- Clojure with Emacs Cider, Clojure with Cursive
- R and Rstudio
- Matlab
- ipython jupyter notebook
- Pycharm debug breakpoints triggered by unit tests (running the test to drop into a Python REPL at the breakpoint)
3) is almost right--these Sanger sequencing instruments are already widely available across the country for research use. Here in the Bay Area, I can choose from at least 4 different Sanger sequencing services that will run 10,000 samples at $2/sample in 24 hours. For example, see: https://www.mclab.com/DNA-Sequencing-Services.html
Sample collection and accessioning (accessioning is unpacking test tubes one by one and aliquoting them into plates in the lab) is definitely going to require a lot of manpower. I'm hopeful that patients "self swabbing" can help alleviate some of the manpower needs. (Self-swabs are not allowed currently under FDA guidance).
My guess is that self-swabbing is allowed in the Seattle SCAN study because it is a research study. The SCAN study is super fascinating because it would be crazy unusual under normal times for a research study to return results back to patients; I'm very happy they are able to do that, and it speaks to the severity of this pandemic.
> The SCAN study is super fascinating because it would be crazy unusual under normal times for a research study to return results back to patients; I'm very happy they are able to do that, and it speaks to the severity of this pandemic.
The Seattle Flu Study wasn't allowed to communicate back to patients but "By Feb. 25, Dr. Chu and her colleagues could not bear to wait any longer. They began performing coronavirus tests, without government approval. What came back confirmed their worst fear. They quickly had a positive test from a local teenager with no recent travel history." https://www.nytimes.com/2020/03/10/us/coronavirus-testing-de...
I'm glad to hear they're allowed to do the SCAN study!
> The FDA has recently made clear that no at-home self-collection tests are allowed
I think rightfully so. I was on a government video conference where a doctor showed the current CDC testing procedure, which involves sticking a swab into the nose all the way to the back of the throat. They explained that the farther from the lungs the sample is taken, the less accurate it is.
For clarity, a 500-patient study in WA (not yet peer-reviewed) showing self-swabbing to be as effective as swabbing by health care workers has prompted the FDA to allow patients to self-swab at clinics, just not at home: https://medcitynews.com/2020/03/fda-says-patients-can-self-a...
This is a test to see if someone has a current COVID-19 infection. Antibody (serological) tests are also important, but since it is estimated that only ~1% of the US has previously contracted COVID-19, it will be a while before serological testing becomes useful at a population level.
Our initial data show no false-positives and no false-negatives out of all specimens assayed. However, it is early days still and none of the leading tests have real-world data on false-positive and false-negative rates. The crucial parameter here to compare test performance is limit of detection (LOD). We showed we could detect as few as 10 molecules of virus, which is on par with the best RT-qPCR tests.
Cost is definitely an important consideration for roll-out of a widespread test. We anticipate that the cost will be about $15 per test.
They say they have not had false positives or false negatives. If that's not a lie, then that's not a lie. It does not mean the test is perfect, but it likely means the false result rate is low.
That he was simply making a claim about what their initial data show. That is not a claim to perfection. Indeed, the whole point of saying 'initial' is to leave open the possibility that later data show something different.
Personally, I don't think your interpretation is more charitable. If they think that their initial results possibly aren't correct then launching a business and claiming things like "Extremely Accurate" and "on par or better than other COVID-19 tests available" is fraudulent.
Their initial results may be correct, but it is unlikely they’ve had the chance to do enough of them to pin down a failure rate with any precision. They also claim that other available COVID-19 tests haven’t undergone enough verification to establish a solid error rate, and if that’s so “on par or better” on the evaluations that have been run is fair as well.
Dismissing the value of serological testing seems like a self-serving move.
Such tests are immediately of very high value because they allow us to understand immunity to Covid-19, and to actually validate the estimates of cases amongst those who haven’t sought treatment.
These are both of huge value regardless of the percentage of the US population estimated to have previously contracted Covid-19.
By all means market your test which seems like an awesome contribution, but please don’t do so by devaluing other important tools.
[edit: the parent post has been edited to be less dismissive without acknowledgement since I made this comment]
[edit: looks like I’m wrong about the post being edited. Sorry for that. I stand by everything else I say here:
Serological tests are useful and needed right now, not at some future stage. It’s not hard to google to verify this, and it’s irresponsible to downplay the value of a test we need now.]
I don't think we are in any way undervaluing the serological tests. A great serological test would be very useful. However, because of the indirect measurement, they tend not to be very sensitive or specific, so it is fair to say that they become more important not at the peak of the pandemic but at the post-pandemic period.
By the way, as a company, we really don't have much to gain from this test. If it does not get adopted, we'll go back to building our core business which has been growing 100% quarter over quarter. We have poured resources into the current development, because it is the right thing to do. No investor in our Series B round will take into account any non-recurring sell-almost-at-cost revenues that we get from a once-in-a-century pandemic.
For what? "Google for answers" isn't very helpful. It sounds like they're saying serological tests are valuable, but not viable for immediately testing a large population.
Part of the problem with serological testing is false positives. I'm no expert, but my understanding is that the tests available may have false positive rates on the order of 0.5% to 2%
If only 1% are infected, and the false positive rate is 1%, it's quite hard to determine what the test result actually means. On the other hand, some places like New York City are already at 0.7% per-capita known positives, and they're not even able to test everyone with symptoms. Their true infection rate could very well be 10%, which serological testing could confirm.
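The arithmetic here is Bayes' rule: at 1% prevalence and a 1% false-positive rate, roughly half of all positive results are false positives. A minimal sketch, using the illustrative numbers from this comment (not measured rates for any real test):

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """P(truly infected | test positive), via Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# 1% prevalence, perfect sensitivity, 1% false-positive rate:
# only about half of positive results reflect a real infection.
print(round(positive_predictive_value(0.01, 1.0, 0.01), 2))  # 0.5

# At 10% prevalence, the same test becomes far more informative.
print(round(positive_predictive_value(0.10, 1.0, 0.01), 2))  # 0.92
```

This is why the same serological test can be nearly useless at 1% prevalence yet quite useful in a hard-hit city.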
Yes, which is why a better serological test would be extremely useful, the sooner the better. In no way does this undermine the value of BillionToOne’s test, but it is irresponsible for their CEO to be saying that serological tests are not useful yet when they are.
Serological tests are useful and needed right now, not at some future stage.
It takes about a month before an antibody test is accurate in someone who has had a virus, so using them now would only detect cases where the person had the virus in early March. That limits the value of serological testing considerably.
I'm the co-founder and CTO at BillionToOne. I'm happy to answer any questions here. I've also posted a slightly more technical explanation of how the test works and why it can scale here: https://twitter.com/dtsao/status/1247642005510873088?s=21
Edit: Since our site seems to be overwhelmed at the moment, here's a recap:
We’ve been working hard at BillionToOne on a new COVID-19 test that scales testing to everyone in the US. Our test (1) re-purposes existing infrastructure, (2) eliminates time-consuming RNA extraction, and (3) enables a distributed system for COVID-19 testing.
The first thing we figured out is how to run COVID-19 tests on existing automated Sanger sequencers. One sequencer can process up to 3,840 samples per day. There are hundreds of sequencers sitting idle, because they were built for the Human Genome Project over 20 years ago.
It would take only 2 sequencers to surpass the current test capacity for all of California. There are far more than 2 sequencers in California (some individual labs have 10 or more).
We tweaked the protocol so COVID-19 could be detected from sequencing data using linear regression. Basically, we add ~100 copies of a known DNA sequence to help us calculate how much virus nucleic acid is in the specimen. It works just as well as gold-standard RT-qPCR.
Lab workflow for COVID-19 testing is traditionally 1. Specimen accessioning, 2. RNA extraction, 3. RT-qPCR 4. Reporting. RNA extraction, in particular, has been a huge bottleneck in terms of reagent shortages and labor-intensiveness.
We showed that we can skip RNA extraction entirely without affecting test sensitivity and limit of detection.
By skipping RNA extraction and using automated Sanger sequencers, we think we can get to an additional 200,000 samples per day test capacity in existing clinical labs.
A distributed system is often the only way to operate at massive scale. A fully distributed system could have different sites and labs responsible for each process and dynamically re-allocate resources based on availability and capacity.
The Broad institute COVID-19 lab has already started doing this. They are asking for specimens to be submitted in a standardized tube format and pre-barcoded. They have essentially distributed the specimen accessioning work.
Because there is a highly developed service industry for Sanger sequencing with <24 hour turnaround, there is an opportunity to further scale up testing by distributing the work to their (currently) idle sequencers.
Distributed testing could scale from 200k to >1 million tests per day, but would require a change in regulations that currently prohibit it.
Thanks to the BillionToOne team for pulling this work together! Next step is to start manufacturing test kits and obtain Emergency Use Authorization from the FDA. We’re eager to work with clinical Lab Directors and contract kit manufacturers.
Hey, lowly Bio undergrad here, but how are you able to skip the RNA extraction step? I read the paper and you use the viral transport medium, but wouldn't you have to also purify RNA from that (or is it just much easier to extract RNA from that medium)? I also dived into the paper behind the "skip the RNA extraction step" methodology and it basically seems to swap out one RNA extraction kit for another (Qiagen RNeasy Mini kit and the Qiagen RNeasy Micro kit). Couldn't shifting kits from one provider to another introduce supply chain strain? (or am I just oversimplifying it?)
Thanks for the question! The goal of skipping RNA extraction is to decrease the amount of labor necessary for processing samples and also to eliminate a dependency on RNA extraction reagents that have recently become difficult to find. The FDA is very strict about the specific brand and model of kit you use, so showing that you can swap out one RNA kit for another is actually very useful because you will have alleviated some of the supply chain strain (although I agree at high enough load both supply chains will then become limiting).
The way currently available COVID-19 testing works is by detection of viral RNA. Since the amount of viral RNA in a patient sample is too low to detect directly, we first need to amplify it by PCR. However, this viral RNA is packaged within all sorts of proteins and lipids that could make it inaccessible to amplification unless they are first purified away. Furthermore, the sample is shipped in "viral transport medium", which is essentially a cocktail of chemicals designed to preserve the virus. Unfortunately, these preservatives often have the side effect of interfering with PCR amplification, so these too need to be purified from the sample.
However, since RNA extraction is usually the most laborious part of the assay, there has been a lot of interest in optimizing the amplification so that it is resilient to all of these impurities. The preprint referenced in our manuscript (https://www.biorxiv.org/content/10.1101/2020.03.20.001008v1) gave us the initial idea that this could be possible, and much of it comes down to the choice of amplification method (e.g. choice of enzymes and buffers) that you choose.
However, even when you choose a "good" enzyme and buffer, you will still suffer an amplification penalty, and this will cause you to return a false-negative on some affected samples because there was so little virus in the sample to begin with. The innovation we have is to spike-in a correspondingly low level of DNA to the reaction mixture. That way, if you see the low level of DNA without seeing any viral signal, you can be assured that the amplification still worked and that there truly is no virus in the sample.
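The spike-in control logic described above can be sketched as a simple decision rule. This is a toy illustration of the concept; the signal names and threshold are made up, not taken from the actual pipeline:

```python
def call_sample(viral_signal, spikein_signal, detect_threshold=0.1):
    """Toy decision rule for a spike-in-controlled assay.

    The spike-in DNA is added at a known low copy number, so seeing it
    confirms the amplification worked even when no virus is present.
    """
    if viral_signal >= detect_threshold:
        return "positive"    # viral sequence detected
    if spikein_signal >= detect_threshold:
        return "negative"    # no virus, but the reaction clearly worked
    return "invalid"         # neither signal: the amplification failed

assert call_sample(0.8, 0.2) == "positive"
assert call_sample(0.0, 0.5) == "negative"
assert call_sample(0.0, 0.0) == "invalid"  # rerun, don't report negative
```

The key point is the third branch: without the spike-in, a failed reaction would be indistinguishable from a true negative.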
In the UK they're saying there's a shortage of swabs and pipettes even, do you not need these too?
Also, in the UK our independent and uni labs have been saying for almost a week they could extract the RNA differently but the NHS have a fixed approved way that they won't change.
- Are the chemicals you're using more common or would there just be a new shortage of different chemicals?
- Is there a risk you'd be creating a test that didn't work very well, and the US would end up with a bunch of useless tests (e.g. Italy had to abandon a bunch of Chinese sourced tests, UK's anti-body tests are ineffective)?
Our technique would still be affected by shortages in specimen collection (like swabs).
Purely speculative, but I think if swabs remain an issue for too long, alternatives could start coming online, such as even using qtips + saline (no idea if it works, it's just an example). The current swab + Universal / Viral Transport Medium combo is optimized for flexibility; it is designed to work across a very broad range of viruses and bacteria that have different viral loads and shedding characteristics. The current pandemic is pretty much COVID-19 only, so I think it's a priori feasible that a specimen collection procedure can be found that uses common materials. We did try early on to see if saline or other buffers affected the performance of the assay, and it worked fine in those conditions.
We use fairly standard chemicals. I haven't heard from our suppliers about shortages for the chemicals we use. Chemicals and enzymes tend to be relatively fast to scale up for bulk manufacturing.
There's always manufacturing risk that a product will not work as expected. In fact, the first COVID-19 test developed by the CDC did not work as expected, and this delayed testing by several weeks. We de-risk this as much as possible by performing experiments as early as possible, akin to the fail fast mentality of checking for the highest risk failure modes first. Since we don't have a national healthcare system in the US, the manufacturer takes on the vast majority of the risk of a defective product.
There are companies out there working on swabs. e.g. Formlabs designed an autoclavable, 3D-printed nasopharyngeal swab using biocompatible Dental resin, in concert with local hospitals. They received FDA Class 1 Exempt status from the CDC and are printing some 150K per day.
I'm not associated with the company, I just own some of their printers. They've also got some 2000+ volunteers who own printers or have CAD expertise signed up and looking for ways to contribute. Apparently we can't make medically-approved swabs (most of us aren't ISO 13485-certified or FDA-registered), but there's other stuff (e.g. hands-free doorknobs). I'm even contemplating shipping one of my printers back to them to help in the effort.
if the virus is known to live on cardboard or plastic for 48-72 hours, is the viral transport medium even necessary, assuming rapid shipping and processing?
Let’s say it’s 50/50 whether it lives 24h without help. That would be a pretty bad false negative rate for your test, but a 50/50 chance of potentially getting infected by your mail is pretty high.
To be more precise, last info I read modeled the virus with exponential decay, with half-life measured in minutes to hours. After an hour (or 3h or 0.5h), half of the virus is already inactive¹.
Even after ~6 half-lives, remaining 1% of viral load is still potentially dangerous, but it's not a good basis for a test if you want it to be sensitive.
¹ Inactive does not mean destroyed. It may be possible or even easier to detect a partially decomposed virus, even with the current tests. Or not.
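To put numbers on the half-life point: after n half-lives, a fraction 2^-n of the virus is still active. A quick sketch (the 1-hour half-life is illustrative, within the minutes-to-hours range mentioned above):

```python
def remaining_fraction(hours, half_life_hours):
    """Fraction of virus still active after `hours`, assuming exponential decay."""
    return 0.5 ** (hours / half_life_hours)

# With a 1-hour half-life, 6 half-lives leaves ~1.6% active;
# after a 24-hour shipment essentially nothing active remains.
print(f"{remaining_fraction(6, 1.0):.3f}")   # 0.016
print(f"{remaining_fraction(24, 1.0):.2e}")  # ~6e-08
```

Which is exactly why a test that depends on intact, unpreserved virus surviving shipping would have poor sensitivity.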
It's standard practise in lots of kinds of sequencing experiments to use a spike-in. Makes perfect sense to use it here - in fact all the other sequencing based SARS-CoV-2 tests I've seen also use spike-ins.
We have the data both with extraction and without, and show that it does not make a difference with qSanger. In Figure 4, we add VTM directly to PCR reactions. Seracare VTM samples have SARS-CoV-2 viral RNA in a different capsid to prevent infections in a research setting, but otherwise they reflect real-world VTM samples (and are much more realistic than even what EUA requires).
By the way, this robustness is completely expected, as any impurities in VTM would impact spike-in and endogenous viral amplification equally for end-point PCR (so their ratio stays the same). This is not necessarily true for qPCR, where an impurity (due to the lack of RNA extraction) can potentially make a positive sample look like a negative if the viral RNA fails to reverse-transcribe and amplify.
The RNA extraction can also be done without reagents by using a heat reaction on the sample similar to boiling an egg. It is a 5-minute process.
According to an older scientist, Anders Fomsgaard at the Danish Serum Institute, this is how they did it ”in the old days”. He is the father of one of the authors.
This eliminates supply chain problems for reagents and was shared quickly to help in Spain.
Hi, I read through your paper. Interesting method.
In figure 1A, the workflow includes a standard PCR step before Sanger. Workflow-wise, wouldn't it still have the same bottleneck as qPCR test, i.e. limited by 96/384-well instrument runs?
Thanks; that is a good question. In our laboratory, we have a ratio of 10:1 for PCR to qPCR instruments. In the new laboratory that we are constructing, the ratio is 50:1. It was similar at Stanford academic laboratories during my PhD. Standard PCR instruments are inexpensive and very common. qPCR instruments are definitely not as common, as they are very specialized instruments for a few use-cases.
Most clinical laboratories would have 10 to 50 PCR instruments that they can use to run the initial amplification reaction in parallel before Sanger sequencing. Also, Sanger sequencing uses a plate feeder, so you can add new plates on top as the second round of PCR reactions finish.
But, more importantly, the qSanger can by-pass RNA extraction, which seems to be an important bottleneck in the RT-qPCR workflow.
Thank you for what seems to be a breakthrough with implications beyond even the current outbreak.
I had a few questions though:
How do you compare this test to the Abbott machines? Obviously that test is faster, but how does that impact what we can do with it?
For 1m/day to be sufficient, do we need contact tracing programs to be able to find everybody who needs to be tested? How hard will it be to scale these programs?
The Abbott machines are point-of-care devices that typically sit in doctor's offices. One really interesting use case I've heard of for the Abbott machines is to test all OB patients who are coming in to the clinic for routine care to make sure that they are COVID-19 negative. This allows the clinical staff to conserve PPE and use less burdensome precautions.
I think that where the Abbott machines might hit a wall is that they are one at a time, and they require Abbott's consumable test cartridge and device to run (think printer ink / printer). I don't have any firsthand knowledge, but I would anticipate that it is difficult to scale-up manufacturing of the devices rapidly enough to keep pace with the pandemic growth.
We absolutely need contact tracing to find everybody who needs to be tested. We're not working on scaling up contact tracing, but I think several people in the tech community are working on making that easier to perform at scale.
Abbott's test is cartridge based, isothermal, and modular. There should be no technical reason why they cannot build a high-throughput, random-access version.
Whether this is the direction the company wants to spend their resources is another story.
Your company has filed a US patent application (and it appears also a PCT) on the qSanger method, and this application is reference 8 in your document.
If it is specific to COVID-19 testing, we will not seek anything, as long as the end-user is not financially benefiting from it or importantly, selling qSanger kits.
If they need our bioinformatics automation & help with set-up, we would license the method for COVID-19 testing for $3-$5/sample as part of each sample that is being put through our pipeline.
If they ask for 96-well plates with all reagents that are ready to use (so that they just need to add VTM), we would work with manufacturers to produce the reactions and plates, and the price of kit (~$15 per test) would include limited license to use our automated bioinformatics calling pipeline.
When you say 'as long as the end user is not financially benefitting' - is the end user the lab conducting the test?
You said in an earlier comment that the reimbursement for testing is too low to justify buying expensive equipment. You are also proposing to charge half the reimbursed rate for it to run on someone else's equipment.
Are the current equipment owners expected to donate this crucial equipment, because if they are the bottleneck, shouldn't they be the ones compensated to encourage more equipment to be made available?
$15 is half the price of even the bare minimum qPCR kits (e.g., TaqPath). We need to buy the reagents from NEB, IDT, and others and work with a contract manufacturer to mix it into a reaction. Reagent, manufacturing, quality control, and fulfillment cost already add up to ~$11/reaction. That does not take into account any costs associated with developing the assay, supporting the assay, getting it through EUA, customer service, bioinformatics help. And we have to pre-pay for all of the reagent costs in the anticipation of the volume. I anticipate that we will likely end up net negative with this work, and even if it ends up being slightly net positive, it will not impact our valuation in a positive way.
The current equipment owners are already the clinical laboratories. It is unused capacity for them. Other owners are sequencing service providers. The full cost of running an end-to-end Sanger reaction as a provided service is $2-$6, so at the $50 reimbursement price, the laboratories will still be incentivized.
This equipment isn't something a hospital has unless they have a serious desire to do top notch genetic disorder testing, and that kind of hospital is going to use the equipment for this cause. The Abbot machine is something practices with much harder financial constraints have to seriously worry about paying for.
Your company is an inspiration and is doing a tremendous service to the world. I am a federal consultant, and I help a couple of companies in the medical space find and write solicitations. I'd love to help your company free of charge on anything and everything. I have a lot of infrastructure built, such as pricing tools, automated solicitation writing, and a ton of outreach lists. One such list covers clinical laboratories nationwide, with emails to executives and a ton of other information. I also work with a company that sells reagents and contract kits. I've worked at Tesla on the Supply Chain, Capital Equipment team in the past and can help you procure reagents or automated Sanger sequencers, or find labs that have them. Additionally, I am fairly adept at SQL and Python (Pandas for data), which is why I am a regular on Hacker News and how I found this comment. Let me know if there is anything I can do, as I would be very proud to contribute to your mission.
Hi David, what about the Roche cobas 6800/8800 high-throughput diagnostics machines? Are they using the traditional RT-qPCR assay? How are they able to test 400k samples per week?
Roche cobas 6800/8800 instruments are state-of-the-art automated RT-qPCR machines. We need them as well as other COVID-19 testing instruments.
That said, it is easier to ship the test kits than scale the instrumentation. Both Roche and Abbott still need to build hundreds of their instruments before the kits that they are shipping out this week can be used on the daily rate that they are trying to get to. I am not sure with Roche, but Abbott estimates the end of June to have enough machines shipped to achieve 50K per day capacity on their instruments.
Another potential problem with new instrumentation is that reimbursement for COVID-19 tests is very low ($30-$50), so it becomes financially difficult for hospitals and laboratories to buy very expensive instruments and also pay for test kits that cost $30-$50 per test, on par with reimbursement.
We try to avoid both issues by utilizing a currently unused Sanger install base and low-cost reagents.
> We try to avoid both issues by utilizing a currently unused Sanger install base and low-cost reagents.
Good thought! Thank you.
Additional question: are Sanger instruments also very sophisticated and manufactured by only a limited few medical equipment manufacturers? Or is there a need to scale up production of Sanger instruments too? I ask because the lack of testing is a global issue, not only in the U.S. Africa and India together have 2 billion people, and the testing hurdles there are even more challenging.
Those systems process up to 384 or 1056 samples in ~8 hours. The proposed method should support a similar (even somewhat higher) volume than the Roche 8800, but on a different, large installed instrument base. So this should add a lot of capacity on top of existing qPCR capacity.
"When you’re fundraising, it’s AI
When you’re hiring, it’s ML
When you’re implementing, it’s linear regression"
The core of our machine learning is Ax=b :grins:
More seriously, the main reason traditional Sanger sequencing can't be used for COVID-19 testing is that it would be unclear whether a lack of signal is truly due to a lack of virus, or just because the assay failed (happens all the time!)
What we've done is introduce a reference sequencing signal that is biochemically very similar to viral RNA, but produces a distinct vector of electrical signals that is different from the signals emitted by viral RNA. Since we know what both the reference and viral signals look like, we can perform linear regression analysis to fit the linear combination of viral and reference signals that best match our data.
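Mechanically, that fit is ordinary least squares: if r and v are the known reference and viral signal templates and y is the observed trace, solve for the coefficients a, b minimizing ||a·r + b·v − y||². A minimal sketch with synthetic signals (this is an illustration of the idea, not the actual basecalling pipeline):

```python
import numpy as np

# Synthetic "chromatogram" templates for the reference spike-in and the virus.
rng = np.random.default_rng(0)
ref_template = rng.random(200)
viral_template = rng.random(200)

# Observed trace: a mixture of the two templates plus a little noise.
true_ref, true_viral = 1.0, 0.3
observed = true_ref * ref_template + true_viral * viral_template
observed += rng.normal(scale=0.01, size=observed.size)

# Least-squares fit of the linear combination (the "linear regression" step).
A = np.column_stack([ref_template, viral_template])
(ref_coef, viral_coef), *_ = np.linalg.lstsq(A, observed, rcond=None)

# The viral/reference coefficient ratio estimates viral load relative to
# the known ~100-copy spike-in; viral_coef near zero means no virus.
print(round(ref_coef, 2), round(viral_coef, 2))  # ≈ 1.0 0.3
```

Because the spike-in is present at a known copy number, the recovered coefficient ratio gives an absolute estimate of viral molecules, not just a detect/no-detect call.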
> Since we know what both the reference and viral signals look like, we can perform linear regression analysis to fit the linear combination of viral and reference signals that best match our data.
Does this mean you assume a linear relationship between the quantity of viral RNA and the strength of the signal?
I know that back when I used to draw calibration curves from my positive controls, there was usually a sublinear relationship across all sorts of different assays, at least at the upper end.
For HN it's better to drop the fundraising language and use the implementing language, so I've s/machine learning/linear regression/'d your text above.
~~Please change it back. I appreciate ~dvdt's humorous yet candid explanation that they're using the simplest of machine learning techniques, but unless they explicitly say that it is indeed simple linear regression, I think it risks inaccuracy to describe it so specifically.~~
ETA: never mind, ~dvdt signed off on the change. Appreciate the collaboration!
I don't make them sign a contract or anything. It's part of the coaching that we do with YC founders on HN (example here: https://news.ycombinator.com/item?id=22808556). Most of that is at https://news.ycombinator.com/launches. We act as the editors of those posts and work closely with the founders for quite a while beforehand. Today was different, but it's a similar relationship.
I wouldn't make a major edit without letting them know. In the present case, I just let the whole community know, because the edit was a substantial one. Had it been a trivial one, I wouldn't have bothered. The priority is interesting discussion, not punctilious bookkeeping, and you can't necessarily have both.
In the flurry of these threads, the founders are focused on responding to substantive questions. That's what they should be doing: satisfying community curiosity in their area of expertise. It's my job, not theirs, to prevent the discussion from getting derailed by things like the unfortunate use of a buzzword that, unbeknownst to them, the community has been tired of hearing for years already.
If the users weren't YC founders that we have an explicit coaching relationship with, I'd always consult with them before offering help. That would happen by email. By the way, if anyone wants this sort of coaching, we're happy to help as much as we can via hn@ycombinator.com. I give this sort of advice to startup founders, project creators, and article authors all the time, and everyone is welcome to it. Just don't worry if you don't hear back right away. The inbox has gotten to the point where it piles up regularly.
I would not love it if he improved my post, so I am glad to have the clarification that this is only done to YC-affiliated commenters and with, at a minimum, some kind of implicit buy-in from them.
They get basically two superimposed chromatograms from Sanger sequencing, one from the control and one from the patient sample. They need to sort out which chromatogram peak belongs to which and calculate the relative abundance of the sequences in the sample from the relative peak intensities.
The machine learning angle isn't the exciting part here, it's all the rest. Great idea!
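The peak-deconvolution step described above can be sketched very simply: at each position where the control and patient sequences differ, compare the two peak heights and pool them into an abundance estimate. The peak heights below are invented numbers purely for illustration; a real pipeline would first baseline-correct and normalize the traces.

```python
# Hypothetical peak heights (control_peak, sample_peak) at the
# discriminating positions of the two superimposed chromatograms.
peaks = [(820, 410), (900, 450), (760, 380)]

control_total = sum(c for c, s in peaks)
sample_total = sum(s for c, s in peaks)

# Relative abundance of the patient-sample sequence, from relative peak intensities.
sample_fraction = sample_total / (control_total + sample_total)
print(f"estimated sample abundance: {sample_fraction:.2f}")
```

Summing across several discriminating positions before taking the ratio averages out per-position noise, which is presumably why multiple peaks are compared rather than a single one.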
I read the paper and I'm still a bit unsure about how the process works. How specific is it to COVID-19?
Given that a Sanger sequencer does sequence all kinds of mRNA, would you also find other RNA-based viruses like the flu? How much modification would be necessary to diagnose all kinds of RNA viruses?
It sounds like your work has amazing potential. How far have you come already, and what are your biggest challenges moving forwards? How many tests did you manage to process today?
I would not have even considered trying to make use of the existing Sanger sequencing machines. This opens up a whole extra set of hardware to increase testing bandwidth. Thank you.
All the material on your page talks about throughput but not turnaround time, which makes me quite suspicious that it's not very good. Obviously, however, if the test takes 2 weeks then it has very limited utility, as the person will have recovered (or had an adverse outcome) anyway. Since it cannot be administered at the point of care, I imagine you are already up against a day or so of logistics just to get a sample to the testing location.
Can you please comment on the actual realistic turnaround time for the test?
A turnaround time of 24-36 hours should be easily achievable for the performing lab, depending on the time of day the specimen arrives (morning vs afternoon). It takes about 1 hour for the initial RT-PCR amplification, and sequencing takes about 12 hours.
BillionToOne | Senior Software Engineer | Full-time | Onsite | Menlo Park, CA
Do you want to develop prenatal diagnostics that can affect the lives of millions of expecting parents? BillionToOne (Y Combinator S17) is looking for a Senior Software Engineer. We transform diagnostics to be truly grounded in quantitative principles and improve the resolution of cell-free DNA testing by >1000-fold. As engineer #1 you will work directly with the CTO to build backend infrastructure, bioinformatics data processing pipelines, laboratory automation tools, and web-based tools to communicate genetic results to patients. This is a highly impactful position with the opportunity to own engineering end-to-end, from internal prototypes to widely deployed products directly affecting patients.
BillionToOne detects genetic disorders in the baby through a simple blood test of the mother. Our first prenatal test for beta-thalassemia and sickle cell disease is already in clinical trials. Over 100 million people are carriers for these disorders.
We are hiring a senior scientist to lead R&D activities. BillionToOne integrates nucleic acid biochemistry and machine learning to make DNA testing more affordable and accurate. The ideal candidate has DNA wet lab bench experience and is comfortable devising custom analysis pipelines in Python or R.