> I really don't see any reason why this could not have been done 10 or even 20 years ago.
The advancements in tooling, infrastructure, and accessibility of ML in the last 3 years alone have made the difference. That seems obvious.
Maybe your point is that the underlying techniques haven't changed, and thus this discovery could have been made decades ago. But isn't that true of even the greatest inventions? Much of what's created or discovered is a function of the environment and conditions surrounding it.
In other words, it's not surprising to see a halo effect in other sectors as a result of tech investment in ML.
I agree that it is exactly this. New tooling has made machine learning easier to use. As a result, people with deep domain knowledge but less machine learning expertise are starting to apply ML to the problems they understand best.
One of the biggest roadblocks to this happening more today is that people don't know how to perform feature engineering to prepare raw data for existing machine learning algorithms. If we could automate this step, it would be a lot easier for subject matter experts to use ML.
For example, I work on an open source Python library called featuretools (https://github.com/featuretools/featuretools/) that aims to automate feature engineering for relational datasets. We've seen a lot of non-ML people use it to make their first machine learning models. We also have demos for people interested in trying it themselves: https://www.featuretools.com/demos.
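To give a flavour, here's a minimal sketch of an automated feature engineering run with featuretools' Deep Feature Synthesis on its built-in demo data (argument names differ slightly between library versions, so treat this as illustrative):

```python
import featuretools as ft

# Built-in mock dataset: an EntitySet of related tables
# (customers, sessions, transactions).
es = ft.demo.load_mock_customer(return_entityset=True)

# Deep Feature Synthesis stacks aggregation and transform primitives
# across the related tables to build a feature matrix automatically.
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",  # "target_entity" in older versions
    max_depth=2,                        # how deep to stack primitives
)
print(feature_matrix.head())
```

The point is that the domain expert only has to describe how their tables relate; the candidate features (counts, means, time-since, and so on) fall out automatically.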
I expect to see a lot more work in the automated feature engineering space going forward.
Yes, I think so. Featuretools is actually the core of my company's commercial product.
Performance is a tricky thing to answer. If you care about machine learning metrics such as AUC, RMSE, or F1, then I think the answer is 80%-90% of hand-coded feature engineering. If you care about building a first solution, then I think the automation makes you 5-10x faster.
Yeah, the grandparent is hung up on the theory vs. application delay.
By the same logic, nothing in modern CMOS logic or its production process requires physics or chemistry of a vintage later than the 1940s to explain, so why did it take us three quarters of a century to get where we are? Because it's hard. Knowing how it works and figuring out how to do it are two different things.
> Lots of feature engineering based on domain expertise
Exactly. This is what is required to make machine learning work well.
For most people, the issue with machine learning isn't that it doesn't work but that it's hard to use.
I suspect that if we gave domain experts, who often don't know how to code, more power to do feature engineering, then we'd see a lot more applied machine learning research like this.
Ultimately, yes: more power means time, i.e. money, to pursue a target freely while experimenting with feature engineering. Brute force à la a full end-to-end DL stack is not there yet, for two reasons: on one side, the search space for novel materials is immense; on the other, a novel material found through ML methods must be stable somewhere in its phase diagram, synthesizable so it can actually be manufactured, and cheap enough to be worth deploying.

The 10x process acceleration (from 20-30 years down to 2-3) is actually in the search of that space, thanks to ML methods working through several thousand experiments as in the linked article, not in the engineering-readiness protocol that takes a candidate material from lab confirmation to real application. Outsiders can help as well by implementing their own pipeline after collecting niche-specific datasets from journal papers, conference contributions and meeting minutes. I, for example, am interested in novel alloys and steels for Gen IV nuclear reactors and am now creating my own dataset for a first attempt, having already got a benchmark from a known, validated and successfully deployed material.
> I suspect that if we gave domain experts who often don't know how to code more power to do feature engineering, then we'd see a lot more applied machine learning research like this.
With all the recent talk about highly paid AI whiz kids, I wonder whether it wouldn't be much more promising to bring basic ML techniques to a really wide range of day-to-day businesses, given how many small businesses are still completely left out.
I liked this example very much: a small family business of a handful of people used standard ML to automate their process of classifying cucumbers.
Just imagine how many people we could free from manual labour to pursue higher education if even a fraction of family businesses had a use case like this, and every farmer or small shop owner bogged down by repetitive classification tasks could free up the time of a family member or two. That must be tens of millions of people on the whole planet, if not more.
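For anyone curious what such a task takes in practice, here's a rough sketch of an image classifier in Keras; the folder layout, image size, and training settings are my assumptions, not details from the cucumber story:

```python
import tensorflow as tf

# Hypothetical dataset: one subdirectory of photos per cucumber grade.
DATA_DIR = "cucumber_photos/"

train_ds = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR, image_size=(64, 64),
    validation_split=0.2, subset="training", seed=0)
val_ds = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR, image_size=(64, 64),
    validation_split=0.2, subset="validation", seed=0)

num_classes = len(train_ds.class_names)  # one class per grade

# A small CNN is plenty for a first version of a sorting task like this.
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),             # pixels to [0, 1]
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(num_classes),               # logits, one per grade
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(train_ds, validation_data=val_ds, epochs=10)
```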
I'm sure that there is a treasure trove of ready to be applied knowledge spread out over many sciences.
Example: the release candidate of the newest version of GIMP added a "new" type of smart blurring, symmetric nearest neighbor, which is surprisingly effective. I looked it up: it's a super simple algorithm, and the original paper is from 1987, yet for some reason the only mention of it I found outside of the GIMP page describing it was a wiki for "subsurface science", a specialisation within geology I guess.
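For the curious, here's a short NumPy sketch of SNN smoothing on a greyscale image, following the usual description of the algorithm (not GIMP's actual implementation): for each pair of neighbours placed symmetrically about a pixel, keep whichever value is closer to the centre, then average the kept values.

```python
import numpy as np

def snn_filter(img, radius=1):
    """Symmetric nearest neighbour smoothing of a 2-D greyscale image."""
    img = img.astype(float)
    h, w = img.shape
    pad = np.pad(img, radius, mode="edge")
    acc = np.zeros_like(img)
    pairs = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Visit each symmetric pair (dy, dx) / (-dy, -dx) only once.
            if dy < 0 or (dy == 0 and dx <= 0):
                continue
            a = pad[radius + dy : radius + dy + h, radius + dx : radius + dx + w]
            b = pad[radius - dy : radius - dy + h, radius - dx : radius - dx + w]
            # Keep the neighbour of each pair closer in value to the centre;
            # this lets the filter smooth noise without washing out edges.
            acc += np.where(np.abs(a - img) <= np.abs(b - img), a, b)
            pairs += 1
    return acc / pairs

# Example: smooth a noisy step edge.
rng = np.random.default_rng(0)
noisy = np.tile([0.0] * 8 + [1.0] * 8, (16, 1)) + rng.normal(0, 0.1, (16, 16))
smooth = snn_filter(noisy, radius=2)
```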
That's not odd. German Wikipedia is one of the largest, it's about even in quality with the English one, and you'll just as frequently find an article that exists only in English as one that exists only in German.
I meant that the paper was originally by English-speaking authors, so one would expect it to be better known in English-speaking scientific circles.
I agree with the sentiment (nothing methodologically new) but have a thought: these methods came from computer science and operations research (maybe). The rise of ML and data science has taken place in the same 20 years in which every non-beta science has become more quantified. It takes a new generation of researchers to combine the old with the new. ML's popularity and ease of entry (in a broad sense, with tools and information easily available) are only helping it spread.
Sorry, that might be too local: Beta = natural science, alpha = humanities, gamma = social science.
So my point is that even the humanities and social sciences are becoming more empirical (at least in subfields, and the retort that a lot of statistics was founded in the humanities is well taken), and they are using the tools that are popular and widely known.
> I really don't see any reason why this could not have been done 10 or even 20 years ago.
"They started with a trove of materials data dating back more than 50 years, including the results of 6,000 experiments that searched for metallic glass. The team combed through the data with advanced machine learning algorithms developed by Wolverton and Logan Ward, a graduate student in Wolverton’s laboratory who served as co-first author of the paper."
(Lots of feature engineering based on domain expertise. This is not end-to-end DL.)
Do a smaller set of new experiments to explore a small subset of the solution space.
Retrain the model with these new experiments.
Perform another smaller set of experiments, this time over a more varied sample of the solution space.
Overall: a 10x improvement in predicting the glass-forming property of an untested sample (although the entire process is biased toward positive samples).
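A minimal sketch of that experiment-retrain loop, using a scikit-learn random forest and synthetic stand-ins for both the lab measurements and the domain-engineered composition features (my paraphrase of the described workflow, not the paper's code):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Stand-ins: X holds engineered features for every candidate composition;
# run_experiments() plays the role of the (expensive) lab measurement.
n_candidates, n_features = 5000, 40
X = rng.normal(size=(n_candidates, n_features))
def run_experiments(idx):
    return rng.integers(0, 2, size=len(idx))  # 1 = forms a metallic glass

# Seed with "historical" data, like the thousands of past experiments.
labeled = list(rng.choice(n_candidates, size=500, replace=False))
y = dict(zip(labeled, run_experiments(labeled)))

model = RandomForestClassifier(n_estimators=200, random_state=0)
for round_ in range(3):
    model.fit(X[labeled], [y[i] for i in labeled])
    # Score the untested candidates and send a small batch of the most
    # promising ones back to the "lab", then retrain on the results.
    untested = np.setdiff1d(np.arange(n_candidates), labeled)
    proba = model.predict_proba(X[untested])[:, 1]
    batch = untested[np.argsort(proba)[-50:]]
    y.update(zip(batch, run_experiments(batch)))
    labeled.extend(batch)
```

Note the loop deliberately queries where the model is most confident of a positive, which is also where the positive-sample bias mentioned above creeps in.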
Conclusion: classical ML still rocks.
I really don't see any reason why this could not have been done 10 or even 20 years ago.