
This is random-forest supervised learning from a set of 4000 historical experiments

(Lots of feature engineering based on domain expertise. This is not end-to-end DL)

Do a smaller set of new experiments to explore a small subset of the solution space.

Retrain the model with these new experiments.

Perform another smaller set of experiments, this time over a more varied sample of the solution space.

Overall, a 10x improvement in predicting whether an untested sample forms a glass (although the entire process is biased toward positive samples)
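For the curious, here is a rough sketch of what that train / experiment / retrain loop might look like with scikit-learn's RandomForestClassifier. All the data below is random noise standing in for the engineered composition features, and the experiment counts and helpers are placeholders of mine, not from the paper:

  # Rough sketch of the iterative loop described above. Nothing here is
  # from the paper; the data and run_experiments() stub are placeholders.
  import numpy as np
  from sklearn.ensemble import RandomForestClassifier

  rng = np.random.default_rng(0)
  n_features = 20

  # Stand-ins for the ~4000 historical experiments and a pool of untested candidates.
  X_hist = rng.normal(size=(4000, n_features))
  y_hist = rng.integers(0, 2, size=4000)          # 1 = formed a metallic glass
  X_candidates = rng.normal(size=(20000, n_features))

  def run_experiments(X):
      """Placeholder for actually synthesizing and testing the picked samples."""
      return rng.integers(0, 2, size=len(X))

  rf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)
  X_train, y_train = X_hist, y_hist

  for round_idx in range(2):                      # two follow-up experimental rounds
      rf.fit(X_train, y_train)

      # Round 1: chase the top-scoring candidates; round 2: sample more broadly.
      scores = rf.predict_proba(X_candidates)[:, 1]
      if round_idx == 0:
          picked = np.argsort(scores)[-200:]                               # exploit
      else:
          picked = rng.choice(len(X_candidates), size=200, replace=False)  # explore

      X_new = X_candidates[picked]
      y_new = run_experiments(X_new)

      # Fold the new measurements back in and retrain on the next pass.
      X_train = np.vstack([X_train, X_new])
      y_train = np.concatenate([y_train, y_new])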

Conclusion: classical ML still rocks.

I really don't see any reason why this could not have been done 10 or even 20 years ago.



> I really don't see any reason why this could not have been done 10 or even 20 years ago.

The advancements in tooling, infrastructure and accessibility of ML in the last 3 years alone have made the difference. That seems obvious.

Maybe your point is that the underlying techniques haven't changed, and thus it would have been possible to have made this discovery decades ago. But isn't that true of even the greatest inventions? Much of what's created or discovered is a function of the environment and conditions surrounding it.

In other words, it's not surprising to see a halo effect in other sectors as a result of tech investment in ML.


I agree that it is exactly this. New tooling has made machine learning easier to use. As a result, people with deep domain knowledge but less machine learning expertise are starting to apply ML to the problems they understand best.

One of the biggest roadblocks to this happening more today is that people don't know how to perform feature engineering to prepare raw data for existing machine learning algorithms. If we could automate this step, it would be a lot easier for subject matter experts to use ML.

For example, I work on an open source Python library called featuretools (https://github.com/featuretools/featuretools/) that aims to automate feature engineering for relational datasets. We've seen a lot of non-ML people use it to make their first machine learning models. We also have demos for people interested in trying it themselves: https://www.featuretools.com/demos.
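For readers who haven't seen it, a minimal sketch of the Deep Feature Synthesis entry point, using the demo data bundled with the library. The API shown here is the older target_entity-style interface and details may differ between versions:

  # Minimal featuretools sketch using its bundled mock customer data.
  # Older-style API; newer releases may name these arguments differently.
  import featuretools as ft

  # Mock customer/session/transaction tables shipped with the library.
  es = ft.demo.load_mock_customer(return_entityset=True)

  # Deep Feature Synthesis: automatically build aggregation and
  # transformation features for each customer from the related tables.
  feature_matrix, feature_defs = ft.dfs(
      entityset=es,
      target_entity="customers",
      max_depth=2,
  )

  print(feature_matrix.head())
  print(feature_defs[:10])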

I expect to see a lot more work in the automated feature engineering space going forward.


That's interesting! So could we see GUI-based machine learning for non-programmers becoming a reality soon?

And how close in performance could this get vs. code-based solutions?


Yes, I think so. Featuretools is actually the core of my company's commercial product.

Performance is a tricky thing to answer. If you care about machine learning performance metrics such as AUC, RMSE, or F1, then I think the answer would be 80%-90% of a hand-coded solution. If you care about how quickly you can build a first solution, then I think the automation would be 5-10x better.


+1 for this tool.


Yeah, the grandparent is hung up on the theory vs. application delay.

By the same logic, nothing in modern CMOS logic or its production process requires physics or chemistry of a vintage later than the 1940's to explain, so why did it take us three quarters of a century to get where we are? Because it's hard. Knowing how it works and figuring out how to do it are two different things.


Traditional engineering has been using machine learning for years for condition monitoring... http://infolab.stanford.edu/~maluf/papers/flairs97.pdf


Shameless plug: I published almost the same thing in a very nearby field 2 years ago:

https://www.nature.com/articles/nature17439


Not shameless at all.

Your paper was referenced after all. [17]


That’s a terrific idea. Would this be applicable to drug discovery?


It's widely used in drug discovery.


> Lots of feature engineering based on domain expertise

Exactly. This is what is required to make machine learning work well.

For most people, the issue with machine learning isn’t that it doesn’t work but that it’s hard to use.

I suspect that if we gave domain experts who often don’t know how to code more power to do feature engineering, then we’d see a lot more applied machine learning research like this.


Ultimately, yes, more power means time (aka money) to pursue a target freely while messing with feature engineering. Brute force a la a full DL stack is not there yet for two reasons: on one side, the search space for novel materials is immense; on the other, novel materials found through ML methods must be stable somewhere in their phase diagram, synthesizable so they can be manufactured properly, and cheap enough to be worth engineering deployment.

The 10x process acceleration (from 20-30 years to 2-3 years) is actually in the search over that space, thanks to ML methods working through several thousand experiments as in the linked article, not in the engineering readiness protocol that takes a candidate novel material from lab confirmation to real application.

Outsiders can help as well by implementing their own pipeline after collecting their niche-specific datasets from journal papers, conference contributions and meeting minutes. I, for example, am interested in novel alloys and steels for Gen IV nuclear and am now creating my own dataset for a first shot, having already got a benchmark from a known, validated and successfully deployed material.


>I suspect that if we gave domain experts who often don’t know how to code more power to do feature engineering, then we’d see a lot more applied machine learning research like this.

With all the recent talk about highly paid AI whiz kids, I wonder whether it wouldn't be much more promising to try to bring basic ML techniques to a really wide range of day-to-day businesses, given how many small businesses are still completely left out.


Do small businesses have enough data to do something useful with ML/DL? If so, what?


https://cloud.google.com/blog/big-data/2016/08/how-a-japanes...

I liked this example very much. A small family business of a handful of people used standard ML to automate their process of classifying cucumbers for their business.

Just imagine how many people we could free from manual labour to seek higher education if even only a fraction of family businesses had a use case like this, and every one of those farmers or small shop owners bogged down by repetitive classification tasks could free up the time of a family member or two. That must be tens of millions of people, if not more, across the whole planet.


> I really don't see any reason why this could not have been done 10 or even 20 years ago.

Because nobody knew enough about both subjects to build an experiment.

That's the good thing about ML getting popular. The easier it is to use, the more people can try to solve multidisciplinary problems with it.


I'm sure there is a treasure trove of ready-to-apply knowledge spread out across many sciences.

Example: the release candidate of the newest version of GIMP added a "new" type of smart blurring, symmetric nearest neighbor, which is surprisingly effective. I looked it up: it is a super simple algorithm, the original paper is from 1987, yet for some reason the only mention of it I found outside of the GIMP page describing it was a wiki for "subsurface science", so a specialisation within geology, I guess.
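It really is simple; here's a rough numpy sketch written from the published description (GIMP's actual implementation may differ in details, e.g. whether the centre pixel joins the average):

  # Symmetric nearest neighbour smoothing for a grayscale image.
  # Sketch from the algorithm description, not GIMP's code.
  import numpy as np

  def snn_filter(img, radius=2):
      """For each pixel, look at symmetric pairs of neighbours around it,
      keep from each pair the value closer to the centre pixel, and output
      the mean of the kept values."""
      h, w = img.shape
      padded = np.pad(img.astype(float), radius, mode="reflect")
      center = padded[radius:radius + h, radius:radius + w]

      # Half the window; each offset and its point-reflected mirror form a pair.
      offsets = [(dy, dx)
                 for dy in range(-radius, radius + 1)
                 for dx in range(-radius, radius + 1)
                 if (dy, dx) > (0, 0)]

      acc = np.zeros_like(center)
      for dy, dx in offsets:
          a = padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
          b = padded[radius - dy:radius - dy + h, radius - dx:radius - dx + w]
          # Per pixel, pick whichever of the symmetric pair is closer to the centre.
          acc += np.where(np.abs(a - center) <= np.abs(b - center), a, b)
      return acc / len(offsets)

  # Example: smooth a noisy random "image".
  noisy = np.random.default_rng(0).random((64, 64))
  smoothed = snn_filter(noisy, radius=2)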


It also has a German wiki article, and has had one for 6 years: https://de.wikipedia.org/wiki/Symmetric_Nearest_Neighbour


Which is still odd: why German but not English?


That's not odd. German Wikipedia is one of the largest, it's about even in quality with the English one, and you'll just as frequently find an article that's only in English as you'll find one that's only in German.


I meant that the paper is originally by English-speaking authors, so one would expect it to be better known in English-speaking scientific circles.


German science is mostly done in English, so German scientists are typically very fluent.


I agree with the sentiment (nothing new methodologically) but have a thought: these methods were in the field of computer science and operations research (maybe). The rise of ML and data science is taking place in the same 20 years that every non-beta science is becoming more quantified. It takes a new generation of researchers to combine the old with the new. ML's popularity and ease of entry (in a broad sense, with tools and information easily available) is only helping the spread.


What is non-beta science?


Sorry, that might be too local: Beta = natural science, alpha = humanities, gamma = social science.

So my point is that even the humanities and social sciences are becoming more empirical (at least in subfields, and the retort that a lot of statistics was founded in the humanities is well taken), and they are using the tools that are popular and widely known.


My guess was “established” science, like something taught in schools.


Thank you so much for mentioning random-forest supervised learning. I did some duckduckgo'ing, came across this [1], and am excited to try it out.

[1] https://labs.genetics.ucla.edu/horvath/RFclustering/RFcluste...
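In case it helps, here is a hedged Python sketch of the general idea behind that kind of RF clustering (train a forest to tell the real data from a column-permuted synthetic copy, then cluster on the forest's proximity matrix). Names, defaults and details here are mine, not the linked authors' code:

  # Unsupervised random-forest clustering, roughly in the spirit of the
  # linked page. Sketch only; not the original R implementation.
  import numpy as np
  from sklearn.ensemble import RandomForestClassifier
  from scipy.cluster.hierarchy import linkage, fcluster

  def rf_cluster(X, n_clusters=3, n_trees=500, random_state=0):
      rng = np.random.default_rng(random_state)

      # Synthetic data: permute each feature column independently, which
      # destroys the joint structure but keeps the marginals.
      X_synth = np.column_stack([rng.permutation(col) for col in X.T])
      X_all = np.vstack([X, X_synth])
      y_all = np.concatenate([np.ones(len(X)), np.zeros(len(X_synth))])

      rf = RandomForestClassifier(n_estimators=n_trees, random_state=random_state)
      rf.fit(X_all, y_all)

      # Proximity of two real samples = fraction of trees in which they
      # land in the same leaf.
      leaves = rf.apply(X)                      # shape: (n_samples, n_trees)
      prox = np.mean(leaves[:, None, :] == leaves[None, :, :], axis=2)

      # Hierarchical clustering on the dissimilarity sqrt(1 - proximity).
      dissim = np.sqrt(1.0 - prox)
      condensed = dissim[np.triu_indices(len(X), k=1)]
      Z = linkage(condensed, method="average")
      return fcluster(Z, t=n_clusters, criterion="maxclust")

  # Example on toy data with two obvious blobs.
  rng = np.random.default_rng(1)
  X_demo = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(5, 1, (50, 4))])
  print(rf_cluster(X_demo, n_clusters=2))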


> I really don't see any reason why this could not have been done 10 or even 20 years ago.

"They started with a trove of materials data dating back more than 50 years, including the results of 6,000 experiments that searched for metallic glass. The team combed through the data with advanced machine learning algorithms developed by Wolverton and Logan Ward, a graduate student in Wolverton’s laboratory who served as co-first author of the paper."



