More

tromobne8vb · on June 11, 2018

This is even more unfortunately named than the Spark Web UI project.

tromobne8vb · on Feb 6, 2017

I found it interesting that part of the vulnerability is that the PRNG takes the time the machine 'spins' as a parameter, thus introducing an attack vector.

lolc · on Feb 6, 2017

I don't know whether it's mandated, but I think that a lot of gamblers would want it that way. They want to be able to influence the game. The outome will still be completely random if the other sources of randomness are good, but the outcome still depended on the gambler's timing.

cxseven · on Feb 7, 2017

There has to be something that advances the PRNG to the next output. What's the alternative, advancing a single step each time a random number is actually needed? That also has (potentially much easier to exploit) vulnerabilities.

tromobne8vb · on Sept 21, 2016

This one is also relevant: https://xkcd.com/793/

tromobne8vb · on July 8, 2016

In a lot of companies, including mine, Senior Developer is close to entry level. People with 3+ years experience are being hired at the Senior Developer level.

dozzie · on July 8, 2016

Indeed. I hope this changes when we have more and more people with ten or fifteen or twenty years of experience.

tromobne8vb · on June 20, 2016

As I've been reading about tensorflow lately I feel like I'm missing something regarding distributed processing. How can Tensorflow 'scale up' easily if you are outside of Google? We have big datasets that I want to run learning on but it seems awkward to do with tensorflow. We're big enough that the team managing our cluster is separate than development and it is a huge pain if we need them to go install tools on each node. Even with Spark support it seems like the tensorflow python libraries need to be set up on each machine in the cluster ahead of time.

Am I missing something?

jamesblonde · on June 20, 2016

No, you're not. Google did this with their build engine (Blaze, internally - Bazel is the open-source API, lacking a distributed build platform). Google are doing this with Apache Beam (the API to Google dataflow) - releasing an API for local testing but not releasing the distributed engine.

If you have your data in a Hadoop cluster and are doing image recognition, Yahoo's Cafe on Spark is the only truly distributed engine out there. It uses MPI to share model state between executors.

agibsonccc · on June 20, 2016

Keep in mind there's different kinds of parallelism though. If you mean model parallel, a lot of shops are doing that via RDMA as well as MPI. It depends on how you handle state though.

There's also data parallelism with parameter averaging which we've been doing in deeplearning4j for the last few years. We also support ALOT more than just images. We have the ETL pipelines (kafka etc) to go with it. Watch for a blog post from us on parallel for all (nvidia's blog) where we explain some of this.

I gave a framework agnostic view of the concepts you should consider when looking at distributed deep learning as well:

http://www.slideshare.net/agibsonccc/brief-introduction-to-d...

netinstructions · on June 20, 2016

Have you read through https://www.tensorflow.org/versions/master/how_tos/distribut... ? Since version 0.8 of TensorFlow they've had a way to do distributed processing.

Their blog post in April mentioned it - https://research.googleblog.com/2016/04/announcing-tensorflo...

That said, I haven't actually attempted any distributed processing, but it looks possible. If anyone has actually tried it and can speak to it I would be curious to what people with experience have to say about it.

tromobne8vb · on June 20, 2016

I have read that. I even re-read it before making my post.

That implementation requires starting individual tasks on each node in your cluster.

>To create a cluster, you start one TensorFlow server per task in the cluster. Each task typically runs on a different machine, but you can run multiple tasks on the same machine (e.g. to control different GPU devices).

I'm used to using tools that can roll out to a cluster with more finesse than that. The Spark wrapper seems to provide some capability to do this automatically, but even the Spark wrapper requires installing python libraries on each node.

bhntr3 · on June 21, 2016

Yeah, I'm trying to figure this out too. TensorFlow needs Yarn support. Ideally, Yarn would allocate resources and inform the processes of the various workers in the graph, etc. etc. I see that as the harder part. If you use mesos, then there is some preliminary support for that. https://github.com/tensorflow/tensorflow/issues/1996

Since TensorFlow has native dependencies on CUDA stuff for GPU support, I don't think there's much of a way to get around installing things on every machine. You might be able to package a python env without CUDA for spark to run using conda. Here's an interesting blog post about that: https://www.continuum.io/blog/developer-blog/conda-spark

But I'm not sure I see the point in running TensorFlow without GPU support. And if you're hoping to run GPU machines on an existing spark cluster and intelligently allocate the GPU stuff to the right machine. . . that's gonna be tough. Here's an interesting talk on that from the last spark summit: https://www.youtube.com/watch?v=k6IOWblLQK8&feature=youtu.be

Ultimately, you're probably better off just running your own gpu cluster strictly for your TensorFlow model on ephemeral AWS spot instances.

Or just use Google Cloud Machine Learning. That's what Google wants and expects you to do anyway. Borg is the Borg. You will be assimilated.

nl · on June 21, 2016

TensorFlow without GPU support is very useful for inference.

nl · on June 21, 2016

I think what is going on here is that what we see as complications are actually features, but that doesn't become clear until you are operating with your NNs in production, at scale.

What you want to be able to do is control which devices (CPUs, GPUs or co-processors[1]) execute which part of your model (eg, GPU for training, co-processors for inference, who knows what else).

Yahoo released some code to deal with similar issues, but with Caffe on YARN[2].

[1] https://cloudplatform.googleblog.com/2016/05/Google-supercha...

[2] http://yahoohadoop.tumblr.com/post/129872361846/large-scale-...

tanlermin · on June 20, 2016

would something like conda or anaconda help?

https://docs.continuum.io/anaconda-cluster/index https://www.continuum.io/blog/developer-blog/getting-most-ou...

tromobne8vb · on April 12, 2016

Storm is a few years late. It has largely been eclipsed by the projects you mentioned and others.

In theory, Storm can be set up to run with much lower latency than the others, but it is rare that an app will need that kind of response time.

fludlight · on April 12, 2016

What's the range of latencies here? What types of projects really need the lowest latencies?

anonymousDan · on April 12, 2016

Financial trading algos where low latency is important were the initial big market for stream processing/cep. As another example, imagine you're trying to do real-time fraud detection based on click stream analysis. If your latency is low enough you can potentially prevent suspicious transactions ever happening instead of having to allow them and then recover them somehow later.

d3fun · on April 12, 2016

Storm is actually few years earlier. Its the first real-time processing framework before other projects showed up.

tromobne8vb · on April 13, 2016

True, it showed up first. And then it stagnated for several years. Now it is behind the other alternatives.

tromobne8vb · on Oct 23, 2015

This always cracks me up. In reality they got the price correct and maximized the money that the company got for doing the IPO in the first place. The problem is that there is an expectation that IPO stocks will 'pop' and allow wall street to line their own pockets. Since they didn't, they consider it a failure.

tromobne8vb · on Sept 1, 2015

>You're going to give me nightmares. I still get emails like this from a six-year-old version of my resume.

The first time I read that, I thought it said "emails like they were from a six-year-old". Which is probably also true.

Kalium · on Sept 1, 2015

The six-year-olds I know have a better grasp of grammar.

tromobne8vb · on Aug 20, 2015

My biggest question is what is a 'close call'? Is it something that a layperson would also agree is a close call? Some of the examples in the article don't sound too scary.

"Whizzed underneath aircraft as it approached a runway" leaves a lot of room for interpretation. Other examples in the article do sound more scary, but is that just cherry picking or typical?

acveilleux · on Aug 20, 2015

Technically, the FAA defines a "near mid air collision" as an incident where there was a possibility of collisions with a proximity of less then 500 feet.

Keep in mind that's an airbus A380 is approx. 240 ft. long, 260 ft. wide and 80 ft. tall.

In that context, 500 ft. is actually quite close.

Dylan16807 · on Aug 21, 2015

Though in the context of drones being limited to 400 feet of altitude, the whole flight could be considered a near miss with the ground...

mannykannot · on Aug 21, 2015

The way some people fly, it is.

tromobne8vb · on April 13, 2015

Oh, I wish it were like this everywhere.

We have Product Owners. And PMs. And Dev managers, and QA managers. It's a giant ball of red tape and regret.

At one point the official 'scrum coach' actually convinced everyone that he had a magic formula for converting points to hours and that he could take the pointed backlog and produce a project plan out of it to produce familiar reports for management.

Everyone says we are doing Scrum, even the agile coaching team. But its really just waterfall done using Rally.