Yes, I have used AutoKeras in practice, with mixed results. I have also written in-house hyperparameter search tooling to run parametric architecture search across a distributed training environment, with about the same mixed success. I have done this for both large-scale image processing networks and natural language processing networks.
Using AutoML in practice is beyond foolish, given the pricing, except for a really small minority of customers. Not to mention that neural architecture search is not a silver bullet and is frequently of no help at all for model selection. For example, say your trade-off space imposes a severe penalty on runtime, and you have a constraint that your deployed system must be CPU-only: you may trade away accuracy to cut convolutional layers, in a super ad hoc, business-driven way that does not translate into any kind of objective function for NAS libraries to optimize. One of the most important production systems I currently work on has exactly this type of constraint.
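To be clear, a hard constraint like "must fit a CPU latency budget" is the easy part to bolt onto a search loop; it's the fuzzier business-driven trade-offs that resist being written down as an objective. A minimal sketch of the easy part, using toy stand-ins (the accuracy and latency functions below are made up; in practice you would train the candidate and benchmark it on the actual target hardware):

```python
def simulated_accuracy(n_conv_layers):
    """Toy proxy: accuracy improves with depth, with diminishing returns."""
    return 1.0 - 0.5 / (1 + n_conv_layers)

def simulated_cpu_latency_ms(n_conv_layers):
    """Toy proxy: CPU inference latency grows linearly with conv depth."""
    return 5.0 * n_conv_layers

def constrained_grid_search(latency_budget_ms, max_depth=20):
    """Return the most accurate depth whose CPU latency fits the budget."""
    best = None
    for depth in range(1, max_depth + 1):
        if simulated_cpu_latency_ms(depth) > latency_budget_ms:
            continue  # infeasible on the CPU-only target: reject outright
        acc = simulated_accuracy(depth)
        if best is None or acc > best[1]:
            best = (depth, acc)
    return best

depth, acc = constrained_grid_search(latency_budget_ms=40.0)
```

The part NAS tooling can't do for you is deciding that, say, a 2% accuracy drop is acceptable this quarter because ops wants smaller instances, which is the kind of call that actually drives the trade-off.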
Interesting. I agree, it is not trivial to estimate the runtime of a model on a target device. I wonder how Google does it. They've been boasting about precisely this ability: optimizing an architecture under joint constraints on accuracy AND runtime for a target device. And then claiming that they got an architecture better than one a team of engineers had optimized over several years.
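As far as the published record goes, Google's MnasNet work describes measuring latency on the actual target phone and folding it into the RL search reward as a soft constraint, roughly reward = ACC * (LAT / T)^w with w around -0.07, rather than hard-rejecting slow models. A sketch of that reward shape (the function name is mine, and I'm going from memory on the exponent):

```python
def latency_aware_reward(accuracy, latency_ms, target_ms, w=-0.07):
    """MnasNet-style soft latency constraint.

    At the target latency the reward equals the accuracy; exceeding the
    target scales the reward down by (LAT / T)^w instead of discarding
    the candidate, so the search can still explore slightly-slow models.
    """
    return accuracy * (latency_ms / target_ms) ** w
```

The key trick is that latency is measured empirically per candidate, not estimated from FLOPs, which is presumably why they can claim device-specific results.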
It’s all hype coming out of Google. Most of this stuff is meant for foisting overpriced solutions onto unwitting GCP customers who get burnt by vendor lock-in and don’t have enough in-house expertise to vet claims about e.g. overpriced TPUs or overpriced AutoML.