Medical Breakthrough in Spinal Cord Injuries Was Made by a Computer Program (fastcoexist.com)
169 points by fahimulhaq on Oct 15, 2015 | 59 comments



The substance of the article is quite interesting, but the headline and premise--that it was a computer program and not humans that found the result--are ridiculous. The computer program did not collate and index the raw data and notes. The computer program did not choose the relevant inputs from the sum of all knowledge. And most importantly, the computer program did not write itself.

Software is a tool that humans create and use, not an entity in itself. Even if you think true AI is near at hand, this article describes nothing of the sort.

Houses are far easier to build with saws, hammers, and nails than by manipulating wood, earth, and metal with our bare hands, but that does not mean the tools built the house.


I agree that this wasn't done by the computer (did computers uncover the Higgs Boson?), but I also do not believe humans can take most of the credit: this was the result of a man-machine team-up, and trying to disentangle credit assignment is not a worthwhile activity. Roughly, from a quick reading of a paper thickly frosted with jargon I am unfamiliar with, the method works by searching for stable clusters in a reduced-dimensionality space of the variables and creating networks, which highlight key relationships, for visualization.

Humans are there to explore the visualizations, interpret the network structures, and understand the clusters and variables. The machines are intelligent too; they do the heavy work of comparing large numbers of points in a high-dimensional space, factorizing, and searching for a way to express the data in a manner that makes it easier to uncover promising research directions and hypotheses.
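To make that division of labor concrete, here is a minimal reduce-then-cluster sketch in Python (scikit-learn; the random data and every parameter are stand-ins of mine, not the paper's actual method):

    # Machine: project to a reduced-dimensionality space and test
    # whether the clusters found there are stable under resampling.
    # Human: interpret whatever structure survives.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans
    from sklearn.metrics import adjusted_rand_score

    X = np.random.rand(500, 40)  # stand-in for the experiment variables

    Z = PCA(n_components=2).fit_transform(X)
    base = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)

    scores = []
    for seed in range(20):
        rs = np.random.RandomState(seed)
        idx = rs.choice(len(Z), len(Z), replace=True)
        labels = KMeans(n_clusters=4, n_init=10, random_state=seed).fit_predict(Z[idx])
        # Agreement with the base clustering on the resampled points.
        scores.append(adjusted_rand_score(base[idx], labels))

    print("mean cluster stability (ARI):", np.mean(scores))

Labelings that persist under resampling suggest real structure rather than noise; that is the part worth handing to a human.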

Scanning this, it seems the most valuable contributions are their network visualization and exploratory tools. I think they should be proud of those and see no need to stretch so mightily to connect this to strong AI. As Vinge notes, "I am suggesting that we recognize that in network and interface research there is something as profound (and potentially wild) as Artificial Intelligence."

http://www.nature.com/ncomms/2015/151014/ncomms9581/full/nco...


>I agree that this wasn't done by the computer (did computers uncover the Higgs Boson?), but I also do not believe humans can take most of the credit: this was the result of a man-machine team-up

You realize that they're using software made by a team of mathematicians and software developers, right? If you want to give credit to the software, give credit to the people who wrote the code and discovered the mathematics. This isn't any different than how physicists would use Mathematica.


> And most importantly, the computer program did not write itself.

Perhaps even more importantly, you didn't create yourself either :-)


Whether or not the program wrote itself is the least relevant aspect of its ingenuity.


I think the second big take away is that this was only made possible because the scientists willingly shared their "dark data"-- data and lab notes from failed experiments. I wonder how much data is hoarded privately and never opened up and analyzed like this.


Frequently a lot of this hoarded data is flawed or defective due to improper setup or execution of the experiment. That isn't to say the information in this "dark data" is useless, but it needs to be taken in context. The cleanest data with the best results are put forward into a paper; the chaff is not.


It is also possible that the data is not understood and is therefore thought to be flawed. The article talks about how the black box approach removes human bias from the initial findings.


Shouldn't this be noted? Isn't providing the "best" data risking making data fit your hypothesis?


Not if your dark data is "I forgot to autoclave an instrument and contaminated my samples." In that case without an unexpected positive result it's just error and not worth reporting.


Biology is not my forte, but I imagine "dark data" (with context) is always more valuable than no data.


That's simply not true. One of the most, if not the most, common ways to fail an experiment is through contamination, and there are at least a dozen different types of bacteria in the average lab that are brutally efficient at outcompeting whatever is in your sample, and probably thousands more that are problematic at best. Once your sample is contaminated it is useless, because the number of variables out of your control grows by several orders of magnitude in an already poorly bounded experiment.

Even if you have the best biosafety hood with proper airflow to pull things away from your samples, even the simple mistake of taking off your gloves in the hood or wafting your hands over petri dishes is enough for some skin cells carrying bacteria to wipe out an entire experiment.


Gotcha, I'm understanding the argument now. The problem was probably my gross misuse of the word "always".

I'm seeing there are many more instances than I expected where failures are so systematic or negligible that they are not worth sharing.


That failure should still be documented somewhere.


They are, in millions of lab notebooks around the world that will never see the light of day, and for good reason. There are so many more experiments that end with the unqualified final note "samples contaminated" than successful ones that if biologists spent time tracking down the source or even the type of the contamination, we probably still wouldn't have modern medicine.


Is there any kind of survey of all failed experiments and the causes? What are the numbers? What percentage of experiments fails? How can we be sure the successful trials weren't random if the failures aren't reported in any way?


You seem to be conflating all possible modes of failure under the simplistic designation "failed experiment" and setting up impossible standards for record keeping. The clinical trial equivalent of sample contamination is a patient getting hit by a bus minutes after they receive the first treatment of a trial. Sure, if it's a psychoactive drug you would investigate whether it contributed, and you can include this tiny little blip of data in the thousands of other pages you give to the FDA, but what's the point? The trial was ruined by an unpredictable act of nature, and your resources are much better spent focusing on the other patients than investigating whether the driver was intoxicated or whether the hospital needs more stop signs, which are entirely irrelevant to whether or not your drug works.

I am by no means advocating that well thought out and executed experiments that fail to provide evidence for the experimenter's hypothesis should be locked in a dusty file cabinet forever closed to study, but those are few compared to the total number of experiments that ended due to clumsiness, sleep deprivation, or too many undergrads in the lab. Science is all just human error, through and through.


Last time your build failed because you made a typo, did you package it up and release that version anyway?


Except in this case your build system is configured in such a way that a failed build triggers bundling your /usr/bin as a release. The result is mostly the same everywhere, with slight differences per programmer, and is utterly worthless for anyone.


This could be useful information to someone learning not to make typos!

Or it could not. Probably that one.


A paper is not a black box with inputs and outputs.

Would I release a package? No. Would I release the source code? Why not?

Code isn't irrelevant or useless just because it can't build. This is why we can teach code on a whiteboard.


Bad analogy. There's a lot more to learn from a failed medical experiment than a failed build.


Depends on whether the lessons have already been learned or not.

Stuff like "we fucked up our culture, therefore our cells couldn't do whatever we wanted" is well understood.

In clinical trials where some patients may respond to a treatment and others not, there's definitely a lot more to learn there, if you have a large enough data set and a plurality of controls.


Wrong data is much, much worse than no data, since it may lead you down the wrong path. Think what the false news of a Russian nuclear strike on US soil would have done during the Cold War.


"could" my ass! Think what was literally moments away from happening when Russia actually got false news of a US nuclear strike.

https://en.wikipedia.org/wiki/Stanislav_Petrov

We were one sane man away from World War 3.


This is why I said "with context".

You'd have to be a little stupid to head down the wrong path after someone explained clearly to you why it is in fact wrong.


A recent interesting commentary in Nature suggests researchers should "blind" themselves to their data and instead analyse a similar but altered data set. When they are happy with their analysis, the steps are then applied to the real data. The aim is to prevent confirmation bias.

http://www.nature.com/news/blind-analysis-hide-results-to-se...
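The protocol is easy to mock up. A minimal sketch in Python (pandas; the column names and the shuffle-based blinding scheme are invented for illustration):

    # Blind analysis: develop the pipeline against deliberately
    # perturbed outcomes, freeze it, then apply it once to real data.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "predictor": rng.normal(size=200),
        "outcome": rng.normal(size=200),  # stand-in for real measurements
    })

    # Blinded copy: shuffle the outcome so any "effect" you tune for is fake.
    blinded = df.copy()
    blinded["outcome"] = rng.permutation(blinded["outcome"].to_numpy())

    def analysis(data):
        # All modelling choices, outlier cuts, etc. are made against `data`.
        return np.corrcoef(data["predictor"], data["outcome"])[0, 1]

    analysis(blinded)        # iterate freely here
    print(analysis(df))      # run exactly once, at the very end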


A lot of data is hoarded because scientists often have to compete for limited research funding (at least from what I've seen in the US).


Next up: "let's make funding depend on how much data is shared and how much other people use it."

After that: "researchers overshare data and use each other's data for no reason but to bump the numbers."

It's interesting how every system tends, sooner or later, to be gamified by its players.


I was looking for an SMBC strip that captures this (there's got to be one) but this is the closest I could get: http://www.smbc-comics.com/index.php?id=1624


I'm, unfortunately, very familiar with the LPU (Least Publishable Unit) after having been to grad school. It's actually a thing, especially among pre-tenure people.


The metrics you decide to keep are ultimately what you will try to optimize, too. We are what we measure.


Example from the programming world: GitHub profiles are used as a hiring tool, so programmers start dumping thousands of undocumented, useless "projects" onto GitHub.


Occasionally I wonder how much more advanced our world would be if scientific and government data were simply available by default. Then I smile at my own optimistic viewpoint and come back to reality.


Here is the actual paper: http://www.nature.com/ncomms/2015/151014/ncomms9581/full/nco...

Coming from medicine, I think the title of "medical breakthrough" is too generous. It's a great proof of concept, but all this says is that in rats, high blood pressure (BP) in thoracic spinal cord injuries was associated with worse outcomes. I'd like to see a follow-up on human data from perioperative BP recordings next. If it still holds true, then you can research whether an intervention in BP control makes a difference. I'm not a neurosurgeon, but I'm sure the correlation between BP and SCI outcomes has been looked at before.


Preliminary evidence has been found in humans, looking at the other end of the spectrum with hypotension.

http://online.liebertpub.com/doi/10.1089/neu.2014.3778

We are now looking at the hypertension relationship in humans, as well as mechanistic studies in rats.


The Nature paper uses the phrase "data-driven hypothesis generation," which sounds pretty accurate and consistent with what you're saying.


"The process was outlined in a paper published today in Nature, and hints at the possibility of medical breakthroughs lurking in the data of failed experiments."

If there were some way to reliably make sense of data from negative-results experiments, it would be absolutely revolutionary and would certainly turn our ideas about what constitutes a successful experiment on their head. I am very hopeful for fruitful results from the methods outlined in this article.

I worry about the usage of "failed" experimental data here, though. I've "failed" a lot of experiments for reasons other than not finding the effect I was looking for in my data. Any exploration of data from negative-results experiments needs to be taken very narrowly, with a deep understanding of exactly what effect is being examined.

Experiments are frequently designed incorrectly for studying the effect they want, and are almost always not suitably controlled for examining non-primary effects. Try to find a trend across non-primary effects over a large swath of experiments and I'm sure you will find one, but it may be noise.


> Any exploration of data from negative-results experiments needs to be taken very narrowly, with a deep understanding of exactly what effect is being examined.

I disagree. There's value in data mining previous experiments, just not conclusive value. As long as the results of such data mining are limited to generating new hypotheses (which are then tested by experiments explicitly designed to do so), I think this methodology can have great value. In this particular case, I don't think it's a surprise that perioperative hypertension is associated with worse outcomes, but the hypothesis that controlling BP with medication before surgery and on through recovery might produce better outcomes is worth investigating.


I'm a frontend developer at Ayasdi and we're hiring!!!

http://www.ayasdi.com/company/careers/

Come be a part of future breakthroughs.


I can't see your opportunities because the embedded iframe is hitting too many redirects. Just a heads up. :/


Topological data analysis is amazing; too bad all the hype around DL is leaving it in relative obscurity.


Hey HN folks - I am the co-founder and CEO of Ayasdi. If you have questions about the math/CS aspects of this, happy to answer.


Do you recommend any good primers on topology? I thought this (https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/) was an interesting article and I see what looks like some great papers and videos available at http://www.ayasdi.com/approach/data-scientist/, but I don't know the difference between homotopy and homology (yet) :) .

What kinds of infrastructure/tech do you think will have the most utility for topological data analysis in the near future? E.g., GPUs, Apache Spark, FPGAs, etc.

Any thoughts on an Ayasdi public offering? I'd like to consider investing but I don't have millions of dollars (yet) :) .

Thanks for your time.


Hey,

Some reading material: A very general blog about philosophy : http://radar.oreilly.com/2015/07/data-has-a-shape.html

		A slightly more in-depth blog : https://shapeofdata.wordpress.com/2013/08/27/mapper-and-the-choice-of-scale/

		A very accessible book about topology (especially from an algorithms perspective) : http://www.amazon.com/Computing-Cambridge-Monographs-Computational-Mathematics/dp/0521136091/ref=sr_1_1?ie=UTF8&qid=1444971634&sr=8-1&keywords=topology+for+computing

		Blog exposing persistent homology : https://normaldeviate.wordpress.com/2012/07/01/topological-data-analysis/

		Video exposing persistent homology :
			https://www.youtube.com/watch?v=CKfUzmznd9g

	Some free software:
		Python Mapper by Daniel Müllner : http://danifold.net/mapper/index.html

		JPlex library by Harlan Sexton : http://www.math.colostate.edu/~adams/jplex/index.html

		Dionysus by Dimitriy Morozov : http://www.mrzv.org/software/dionysus/

		Topological Data Analysis in R : https://cran.r-project.org/web/packages/TDA/vignettes/article.pdf

	Infrastructure
		Our tech stack is:
			Backend
				HDFS for storage
				Our ML and math code is hand-rolled C++ and Assembly (7% of LOC)
				All coordination/distributed systems code is in Java
				ZMQ for communication
				Protocol Buffers for protocol
			Frontend
				D3
				Backbone
				Hand-rolled webGL graph visualization (we open sourced it at https://github.com/ayasdi/grapher)

		We currently don't use GPUs or any other fancy hardware primarily because today, our customers use commodity hardware and getting F1000 companies to buy cutting-edge hardware is just plain horrible.

		We have an awesome GPU rig at our offices that we test algorithms on and it can really make our algorithms scream, but again, none of our customers have/are willing to invest in GPUs.

		Apache Spark - interestingly, in our experience, making it work for ML algorithms is really too much work unless you invest the time to understand the framework and its fundamentals. It performs very well for ETL-type tasks, which is what we use it for.

	On a public offering: no comment :)

	If you have more questions - I am easy to find :)
Gurjeet


I'd love to read a couple of journal articles that you recommend to learn about TDA. I do large scale data analysis on health care data at my university and am always on the look-out for interesting techniques.


Gunnar wrote a review article a few years ago called Topology and Data (http://www.ams.org/journals/bull/2009-46-02/S0273-0979-09-01...). It is an amazingly well written and accessible paper for a technical audience.

Pair it with Afra's book (http://www.amazon.com/Computing-Cambridge-Monographs-Computa...)


Thank you!


Hidden relationship mining has taken a few different paths, from TDA to LDA graphical modelling (Michael Jordan, David Blei) to vector-space-driven approaches (Berkeley Lab). Extracting hidden relationships in datasets and using them to form new hypotheses and enable or even make new discoveries is certainly the future...

Lawrence Berkeley National Laboratory vector space for hidden relationships: http://newscenter.lbl.gov/news-releases/2008/07/09/berkeley-...

A Search Engine that Thinks http://newscenter.lbl.gov/feature-stories/2005/03/31/a-searc...

Statistical modeling of biomedical corpora: mining the Caenorhabditis Genetic Center Bibliography for genes related to life span - Blei DM1, Franks K, Jordan MI, Mian IS. - http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1533868


One immediate question is whether this might be a result of overfitting: when enough hypotheses are tested, some will surely be confirmed at any given significance level. Still quite interesting, of course; what is needed now is a follow-up study on other datasets (preferably human), or an experiment to confirm. A better (but less exciting) title would be "A computer program suggests a promising avenue of research".
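The arithmetic behind that worry, as a quick sketch (the hypothesis count is made up):

    # With many hypotheses, false positives at a fixed alpha are expected.
    n_hypotheses = 1000
    alpha = 0.05

    expected_false_positives = n_hypotheses * alpha   # ~50 by chance alone
    bonferroni_alpha = alpha / n_hypotheses           # 5e-05 per test
    # Probability of at least one false positive without correction:
    p_any = 1 - (1 - alpha) ** n_hypotheses           # effectively 1.0

    print(expected_false_positives, bonferroni_alpha, p_any)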


So my main takeaway is that scientists should publish their original data. Frankly, I'm amazed they don't already!


Things like this are why I feel something like Google DeepMind could be a game changer for the sciences if all human research data were available to it. It might never reach the point of true AI, but it would still beat all humans at finding relationships in data that humans can't even remember.


"The process was outlined in a paper published today in Nature Communications". NO LINK TO THE PAPER ARGGGHH!!!!!


Could someone please offer a TL;DR? The article seems very clickbaity and full of teases.


This is the breakthrough: In the case of the spinal cord injury data, Ayasdi’s TDA-driven approach mostly confirmed what researchers already knew: the drugs didn’t work. But the discovery of high blood pressure’s detrimental effects on long-term recovery has immediate implications for human patients, namely whether the use of hypertension drugs immediately after their injuries and before surgery could improve outcomes.


From the article: "In the case of the spinal cord injury data, Ayasdi’s TDA-driven approach mostly confirmed what researchers already knew: The drugs didn’t work."

How this "Ayasdi" company's analysis probably works (based on "Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival" and the original "Mapper" paper "Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition"): They take point cloud data and connect each point with its neighbors (the distance metric that is used is probably domain-specific) to build a proximity graph that approximates a simplicial complex. As input to their algorithm, they also have one or more scalar functions defined on the point cloud data that contain information which is relative to the problem at hand. For example, each point could be a gene, and maybe the scalar function value at that gene could be probability of association with some disease, and the distance between two genes might be the Levenshtein distance between their genetic codes.

With data in this form, they approximate the Reeb graph of one of the scalar functions, which is a sort of "data skeleton." They can do potentially interesting/useful things with it.

The approximation of the Reeb graph reveals zero-cycles (connected components of the simplicial complex) and some one-cycles (handles/tunnels in the graph, sort of like holes in a donut). This "skeleton" of the data allows them to do a variety of things, such as segment the data into components that are (approximately) topologically "simple" (they do not contain any 1-cycles), identify local maxima/minima, find saddle points where forks in the data merge together, and locate "essential saddles" which constitute the high points and low points of handles/tunnels. They can also remove "topological noise", which helps them to separate spurious topological thingies from features that might be important.

Their technique doesn't necessarily recover "true" topological information since a lot of what they do is approximate. There are actually more accurate techniques (e.g., simplicial homology, or fast Reeb graph algorithms) for getting an exact answer, albeit with potentially higher computational cost.
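To make the Mapper recipe concrete, here is a toy sketch in Python (scikit-learn's DBSCAN for the clustering step; the filter function, interval count, and overlap are arbitrary choices of mine, and real implementations such as Daniel Müllner's Python Mapper do much more):

    # Toy Mapper: cover the range of a scalar "filter" function with
    # overlapping intervals, cluster each preimage, and connect clusters
    # that share points to get a skeleton-like graph of the data.
    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 5))   # point cloud
    f = X[:, 0]                     # filter: a scalar function on the points

    n_bins, overlap = 8, 0.25
    lo, hi = f.min(), f.max()
    width = (hi - lo) / n_bins

    nodes = []
    for i in range(n_bins):
        a = lo + i * width - overlap * width
        b = lo + (i + 1) * width + overlap * width
        members = np.where((f >= a) & (f <= b))[0]
        if len(members) == 0:
            continue
        labels = DBSCAN(eps=1.5, min_samples=3).fit_predict(X[members])
        for k in set(labels) - {-1}:  # -1 is DBSCAN's noise label
            nodes.append(set(members[labels == k]))

    # Edge whenever two clusters share points (their cover intervals overlap).
    edges = {(i, j) for i in range(len(nodes))
                    for j in range(i + 1, len(nodes)) if nodes[i] & nodes[j]}

    print(len(nodes), "nodes,", len(edges), "edges")

The resulting graph is the "data skeleton": connected components show up directly, and loops in it correspond to the one-cycles described above.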

Topological data analysis is a big field, and this Ayasdi company appears to mainly use this one approach (but I could be wrong). I think they are trying to lay claim to the term "topological data analysis" and get people with money excited about it.


One quick edit to this description: we (Ayasdi) have generalized the notion of Reeb graphs such that it is no longer limited to single scalar functions. While in the single-scalar-function case the Mapper algorithm is an (extremely efficient) approximation to the Reeb graph, in the multiple-scalar-function case it has no direct theoretical analogue (although the notion of Reeb spaces is similar).

We are generally not trying to lay claim to the phrase "Topological Data Analysis", and we are not going around suing people for using it. In fact, we still support research in academia and actively publish in the field. TDA is the basis of what we do, so it is the most efficient way of describing it.


Auto Summarized Content (Algorithm: Tuatara GS1)

Before Ferguson had thought to use it for probing spinal cord injuries, Carlsson and other researchers had successfully employed TDA to find a unique mutation in breast cancers hiding in data sets that had been publicly available for more than a decade..." In the case of the spinal cord injury data, Ayasdi's TDA-driven approach mostly confirmed what researchers already knew: The drugs didn't work...

Auto Extracted Ranked Tags (Algorithm: Tuatara GS1)

ferguson, injury, spinal, cord, ayasdi, variable, paper, researcher, tda, study, human, team, drug, long-term, carlsson, approach, trial, published, probing, hidden, big, implication, pattern, complex, pressure, analysi, hypertension, experiment, conducted, medical, colleague

http://52.11.1.7/TuataraSum/app/tuatarasum


made possible



