Hacker News
How Powerful Are Microsoft Azure’s Free Jupyter Notebooks? (walkingrandomly.com)
112 points by ptoniato on May 21, 2017 | 20 comments



Nice to see this on HN! For folks that don't know what this is... it's a free hosted Jupyter service, provided as a "thank you" to the Python, Jupyter, and general OSS community by the Microsoft Python team.

Regarding the perf numbers: just a heads up that you were probably on that VM all by yourself :). Once enough people sign in, or CPU usage reaches a certain threshold, new VMs are allocated.

PS: On a related note, we are wrapping up at PyCon in Portland, and one of the hit swag items was the "Jupyter Notebook Notebook" :). See pics here:

https://goo.gl/photos/L9C4fq6AsPxU7bfq5

/disclaimer: team lead/


The benchmark here was probably calling into Intel's MKL, which automatically determines the number of cores to use based on matrix size and CPU type.
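For anyone wanting to reproduce this kind of number, here is a minimal Python sketch, assuming the benchmark is a dense n x n matrix multiply (the usual estimate of about 2n³ floating-point operations). `matmul_gflops` and `time_numpy_matmul` are hypothetical helper names; NumPy is optional and only used if installed. MKL honors the `MKL_NUM_THREADS` environment variable, so to pin the thread count, set it before NumPy is first imported.

```python
import os
import time


def matmul_gflops(n: int, seconds: float) -> float:
    """GFLOP/s for one n x n by n x n dense matrix multiply.

    Uses the standard estimate of 2*n**3 floating-point operations
    (n**3 multiplies plus roughly n**3 additions).
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return 2 * n**3 / seconds / 1e9


def time_numpy_matmul(n: int = 2000):
    """Time a single n x n matmul with NumPy, if NumPy is installed.

    NumPy hands the multiply to whatever BLAS it was built against
    (MKL in many Anaconda builds), and that library chooses its own
    thread count. Returns None when NumPy is unavailable.
    """
    try:
        import numpy as np
    except ImportError:
        return None
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    start = time.perf_counter()
    a @ b
    elapsed = time.perf_counter() - start
    return matmul_gflops(n, elapsed)


if __name__ == "__main__":
    print("logical CPUs visible:", os.cpu_count())
    print("MKL_NUM_THREADS:", os.environ.get("MKL_NUM_THREADS", "(unset)"))
    print("GFLOP/s:", time_numpy_matmul())
```

A single timing is noisy, especially on a shared VM, so repeating the multiply a few times and taking the best result gives a fairer figure.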

How well does it behave when you have multiple VMs running on the same hardware?

Also, is it possible to set up prebuilt environments on your platform?


Hi -

* It's possible: Anaconda ships with an MKL-enhanced version of its math libraries.

* There are multiple VMs, and each VM holds multiple Docker containers, one per library/user (a library is a collection of notebooks).

* Right now, no, but we are working on it. You can have a "prep" notebook that readies your environment with !pip, !wget, etc., followed by your actual work notebooks. We'll soon support an initial "install.sh" that runs on startup to perform any prep steps you might have.

thanks!
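A minimal sketch of such a "prep" cell, written in plain Python instead of `!pip` lines. `missing_modules` and `prepare_environment` are hypothetical names, not part of the Azure Notebooks service; the sketch assumes top-level module names and that pip is available in the kernel's environment. Running pip through `sys.executable` installs into the environment of the running kernel, which avoids picking up a different `pip` from PATH.

```python
import importlib
import importlib.util
import subprocess
import sys


def missing_modules(modules):
    """Return the top-level module names that cannot currently be imported."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


def prepare_environment(requirements, dry_run=False):
    """Install any missing packages into the running kernel's environment.

    `requirements` maps an importable module name to its pip package name,
    since the two can differ (e.g. "sklearn" -> "scikit-learn").
    Returns the list of pip package names that were (or would be) installed.
    """
    to_install = [requirements[m] for m in missing_modules(requirements)]
    if to_install and not dry_run:
        # Use this interpreter's pip so packages land in the kernel's environment.
        subprocess.check_call([sys.executable, "-m", "pip", "install", *to_install])
        # Let the import system notice the newly installed packages
        # without restarting the kernel.
        importlib.invalidate_caches()
    return to_install
```

In a prep notebook this might be called as `prepare_environment({"sklearn": "scikit-learn", "requests": "requests"})`; re-running the cell is cheap because modules that already import are skipped.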


Sort of off-topic: for me, Jupyter is 90% of the way to dominating my data analysis / ML workflow, but that last 10% kills it for me:

- Weird behavior when disconnecting / reconnecting to sessions, especially from multiple computers.

- Tendency to flake out on long-running jobs, i.e. 2 hours into a 4-hour algorithm something dies and I have to restart or run it from the terminal.

Unfortunately this relegates it to exploratory viz for me, but maybe that's the intended use case anyway. When I've wanted to build semi-persistent dashboards or check in on running jobs, I've had better luck with ssh+screen, using matplotlib to dump PDFs of results to files that I serve from a web server with a little auto-refresh JavaScript wrapper.
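One common way to limit the damage when a long job dies partway through, whether in a notebook kernel or under screen, is to checkpoint state periodically and resume from the last checkpoint. A minimal sketch, assuming the job can be written as a loop of numbered steps over a picklable state; `run_with_checkpoints` is a hypothetical helper, not a Jupyter feature.

```python
import os
import pickle
import tempfile


def run_with_checkpoints(step, initial_state, n_steps, path, every=10):
    """Run state = step(state, i) for i in range(n_steps), checkpointing to `path`.

    If `path` already holds a checkpoint, resume from it instead of starting
    over, so a crash only loses the work done since the last save.
    """
    start, state = 0, initial_state
    if os.path.exists(path):
        with open(path, "rb") as f:
            start, state = pickle.load(f)
    for i in range(start, n_steps):
        state = step(state, i)
        if (i + 1) % every == 0 or i + 1 == n_steps:
            # Write to a temp file in the same directory, then rename, so a
            # crash mid-write never leaves a truncated checkpoint behind.
            fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
            with os.fdopen(fd, "wb") as f:
                pickle.dump((i + 1, state), f)
            os.replace(tmp, path)
    return state
```

After a crash, re-running the same call picks up from the last saved step rather than from zero; the trade-off is the time spent pickling, which `every` controls.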


What versions are you running? Jupyter, or at least its predecessor IPython Notebook, has always been rock solid for me, running for days and days.

That said, I always made sure to save before disconnect and refresh on reconnect.


This is so true. I love Jupyter notebooks, but running a program for 5 days is a nightmare: you never know when the kernel will just crash. Also, if there is a lot of constant output/logging, you have to make sure it goes to the terminal; if it goes into the notebook itself, the notebook will just freeze.
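A minimal sketch of keeping chatty output out of the notebook with the standard `logging` module. It assumes that in an IPython kernel `sys.__stderr__` is the kernel process's original stderr, which normally ends up in the terminal that launched Jupyter rather than in cell output; `make_job_logger` is a hypothetical name.

```python
import logging
import sys


def make_job_logger(name, logfile=None, stream=sys.__stderr__):
    """Build a logger that writes to a file and/or a raw stream, not to cell output."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records away from the root logger's handlers
    # Drop handlers from a previous run of the same cell so lines aren't duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers = []
    if logfile:
        handlers.append(logging.FileHandler(logfile))
    if stream is not None:  # sys.__stderr__ can be None in some environments
        handlers.append(logging.StreamHandler(stream))
    for h in handlers:
        h.setFormatter(fmt)
        logger.addHandler(h)
    return logger
```

With a log file, progress can then be followed from a separate shell with `tail -f`, and the notebook only has to render whatever summary the cell finally returns.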


Jupyter Notebooks are great since you can put your experimental data in them and share it in an interactive format. They have the potential to become the de facto standard for scientific collaboration.

But what if you add a library that can only load data from Azure ML Studio? Then you can no longer share your notebook. Your notebook has been tainted with proprietary stuff from a specific vendor...

Science is about being able to universally reproduce experiments, and vendor lock-in prevents that. We already have enough problems in scientific publishing with journals.

So if you like Jupyter and want it to become the standard science needs, avoid proprietary extensions. Let's not go back to sharing stuff on paper or its modern equivalent, PDFs.


Your concern is valid. One easy way to avoid depending on a particular cloud's APIs is to either wrap them with your own, or just grab data from other neutral locations, such as GitHub, using !wget, requests, etc. There are pluses and minuses to each approach.
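A sketch of the "wrap them with your own" approach: notebook cells call one small loader, and only that loader knows where the data lives. `load_bytes` is a hypothetical name; this version handles only http(s) URLs and local POSIX-style paths or `file:` URLs, and a cloud-specific backend would be added as another branch of the same function.

```python
import urllib.parse
import urllib.request
from pathlib import Path


def load_bytes(location: str) -> bytes:
    """Fetch raw data from a local path, a file: URL, or an http(s) URL.

    Notebook code calls only this function, so moving the data to another
    host means changing this function, not every notebook cell.
    """
    parsed = urllib.parse.urlparse(location)
    if parsed.scheme in ("http", "https"):
        with urllib.request.urlopen(location) as resp:
            return resp.read()
    if parsed.scheme == "file":
        return Path(urllib.request.url2pathname(parsed.path)).read_bytes()
    if parsed.scheme == "":
        return Path(location).read_bytes()
    raise ValueError(f"unsupported location scheme: {parsed.scheme!r}")
```

The same cell then works whether `location` points at a raw GitHub URL or a copy of the file sitting next to the notebook.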


You can actually run shell commands with '!' in the IPython notebooks on Azure, e.g. dmesg or lscpu.
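The `!lscpu` route shells out; roughly the same basics can be read from plain Python, which also works where those tools aren't installed. A minimal sketch (`cpu_summary` is a hypothetical name). `/proc/cpuinfo` and `os.sched_getaffinity` exist only on Linux, and none of these values reflect a container's cgroup CPU quota, only which CPUs the process may be scheduled on.

```python
import os
import platform


def cpu_summary():
    """Collect basic CPU facts from the standard library and /proc (Linux only)."""
    info = {
        "logical_cpus": os.cpu_count(),  # CPUs on the machine; may be None
        "usable_cpus": (
            len(os.sched_getaffinity(0)) if hasattr(os, "sched_getaffinity") else None
        ),
        "machine": platform.machine(),
        "system": platform.system(),
        "model_name": None,
    }
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("model name"):
                    info["model_name"] = line.split(":", 1)[1].strip()
                    break
    except OSError:
        pass  # not Linux, or /proc unavailable
    return info
```

Comparing `logical_cpus` with `usable_cpus` is a quick way to see whether the process has been pinned to a subset of the host's cores.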


Also, click on the Jupyter logo, then Select New, then Terminal, and you get a full bash prompt into your instance.


I wonder how Microsoft treats the GPL3 licensing of ZeroMQ (used by Jupyter for client-server communication)?

In my workplace, Jupyter needs special approval and can only be installed in limited environments because the company is afraid some developer will inadvertently do something that contaminates our code with GPL3.


Pure FUD. Your workplace attorneys should talk to the FSF; it's vanishingly unlikely that you would end up accidentally licensing anything with the GPL. Besides, the GPL provisions only take effect on distribution /anyway/.


Why would the GPL apply to works created with Jupyter? I understand if you're redistributing a commercial version of it, but using it would be akin to using GCC no?


My understanding is that works created with Jupyter wouldn't have a problem, but that the admins are just trying to minimize the footprint of GPL 3 libraries on our systems. It seems pretty knee-jerk to me, but then I'm not an expert.


In a technical audit during acquisition, for example, any use of GPL warrants extra scrutiny. Even when you're complying it is an extra bunch of work, often with lawyers reviewing, so many people would rather avoid that if possible.


/team lead/

IANAL, but note that we also have Microsoft R (an enhanced version of CRAN R), which provides local multi-threading and cluster-level parallelization + distributed memory support. It and the stock R interpreter have lots of GPL code. And given that both R and Python are now integrated with SQL Server, it seems that the lawyers have become comfortable regarding the separation lines. SQL Server, for example, calls an external script for R/Python.

https://www.microsoft.com/en-us/sql-server/sql-server-r-serv...

https://blogs.technet.microsoft.com/dataplatforminsider/2017...

Thanks!


ZeroMQ is LGPL, not GPL. http://zeromq.org/area:licensing


Good catch! I misremembered. Still the same reasoning from management, unfortunately.



Thanks! Figured there was too much traffic when the main link was unavailable a few minutes earlier. I was wondering whether it was a direct view of a notebook, e.g. via nbviewer (in which case it would be ironic).



