This came up yesterday[1] but if you'd like to know more, Kathy Kleiman's "Proving Ground: The Untold Story of the Six Women Who Programmed the World's First Modern Computer" includes the point of view of Jean Bartik along with the other ENIAC programmers. https://www.hachettebookgroup.com/titles/kathy-kleiman/provi...
I highly recommend "Proving Ground: The Untold Story of the Six Women Who Programmed the World's First Modern Computer" for more context about how the ENIAC came together. It includes the point of view of the programmers of the ENIAC who are often left out of or diminished in other accounts.
Your links sent me down a rabbit hole of IoT stuff and now I'm super excited about it. Had no idea that LTE-M is so widespread.
It's only a matter of time before it's easier and cheaper to get any kind of networked device onto a carrier network than onto the user's Wi-Fi. The implications are kinda scary though: does this mean it's only a matter of time before the majority of personal devices talk to their users through a telco/government middleman network?
Thank you for the link. Interesting, they used the graph resampling and R-tree approaches already in use for certain kinds of machine learning (esp. approximate eigenvalue-based classifiers).
Very well written too. Unfortunately the approach described only works for dense countable data that is linearly separable (eta spectral clusters, in fact an even stronger assumption), which still makes it quite useful as a database index.
I'm still a little confused about why the story posted the other day fell off the front page so quickly. There was some good discussion and a link to a YouTube meta-review that gave a pretty interesting view into the last few days of the company:
Not to be self-serving, but I've always been fascinated by Therac-25. I ended up doing a deep dive a few months back and put together a short 5ish minute podcast episode about it:
Good call. The dataset was large enough that it didn't feel silly to use something like MapReduce but the same thought has been in the back of my head the whole time.
Thanks for the tip, I didn't think to copy the data to ephemeral storage like that. That'll probably speed things up a lot.
I ended up splitting the data into a relatively small number (~200) of ~30MB gzipped files in order to initially saturate the mappers and speed things up. If that's not necessary after moving to ephemeral storage that's fine by me!
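For anyone wanting to try the same kind of split, here's a rough sketch of one way it could be done, assuming line-oriented input. The function name, the `part-NNNNN.gz` naming, and the round-robin distribution are my own illustrative choices, not necessarily what was used here:

```python
import gzip
import os


def split_into_gzip_shards(lines, out_dir, num_shards):
    """Spread lines round-robin over num_shards gzipped files (hypothetical helper)."""
    os.makedirs(out_dir, exist_ok=True)
    # Hypothetical naming scheme; any unique per-shard name would do.
    paths = [os.path.join(out_dir, f"part-{i:05d}.gz") for i in range(num_shards)]
    handles = [gzip.open(p, "wt", encoding="utf-8") for p in paths]
    try:
        for n, line in enumerate(lines):
            # Make sure every record ends with exactly one newline.
            text = line if line.endswith("\n") else line + "\n"
            handles[n % num_shards].write(text)
    finally:
        for h in handles:
            h.close()
    return paths
```

Round-robin keeps the shards roughly the same size when records are similar in length, so with `num_shards=200` each mapper gets a comparable amount of work. Each gzipped file is read whole by a single mapper, since gzip isn't splittable.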
It's not necessary once your files live on ephemeral storage, but it is necessary if you want the distcp operation to be fast. But again, the S3 block filesystem will not have this problem.
With 8 instances and the number of files the input was split into, there were definitely both map and reduce tasks waiting for a runner. I don't know exactly how much, but I'm pretty sure I'm paying a pretty heavy IO tax by using S3.
[1] https://news.ycombinator.com/item?id=40317375