Hacker News

OK, did a quick scan of the free and paid tiers, and I'm wondering about the ability to bring in specialized Python libraries. In my case that's https://github.com/opencog/link-grammar, which almost always requires hand-building and migrating, plus some custom packages on top of my own even more horrible C++. Right now I have to run a JupyterHub AMI on an EC2 instance.


You linked to the OpenCog link parser. Just out of curiosity, what are you using it for?

I also used to keep a beefy EC2 or GCP instance set up with Jupyter, custom kernels (Common Lisp, etc.), and so on. Being able to start and stop them quickly made it really cost-effective.

I ended up getting a System76 GPU laptop a year ago, though, and it is so nice to be able to iterate more quickly; the improvement is especially noticeable for quick experiments.

That said, a custom EC2 or GCP instance, all set up and easily start/stop-able, is probably the cost-effective thing to do unless vanilla Colab environments already work for you.


The short answer is that I just hate n-grams: tear the input apart with Link Grammar, then use the arc and modifier info as input to the statistical processes. So far it's too early to say it works better, but it sure as hell is more interesting. :-) Sadly I have to keep my instances running all the time; they eat all of the public Reuters news feeds. Side note: additional info can be obtained by taking the top N parse candidates for a given sentence and collapsing them into a graph. I get a pretty good signal out of that for "the parser has gone insane".
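The collapse-top-N-parses idea can be sketched without any Link Grammar machinery at all. In this sketch each candidate parse is just a list of (head, dependent, link-type) triples; the function names and the "insanity" heuristic are my own illustration, not anything from the actual pipeline:

```python
from collections import Counter

def collapse_parses(parses):
    """Collapse the top-N parse candidates for one sentence into a single
    graph: keys are (head, dependent, link_type) edges, values count how
    many candidate parses agree on that edge."""
    edge_counts = Counter()
    for links in parses:  # each parse is a list of link triples
        edge_counts.update(links)
    return edge_counts

def looks_insane(edge_counts, n_parses, agreement_floor=0.5):
    """Crude 'parser has gone insane' signal: if almost no edge is shared
    by a majority of the candidates, the parses wildly disagree."""
    if not edge_counts:
        return True
    shared = sum(1 for c in edge_counts.values()
                 if c / n_parses >= agreement_floor)
    return shared / len(edge_counts) < 0.2

# Toy example: three candidate parses of the same sentence.
parses = [
    [("saw", "man", "O"), ("man", "the", "D"), ("I", "saw", "S")],
    [("saw", "man", "O"), ("man", "the", "D"), ("I", "saw", "S")],
    [("saw", "telescope", "O"), ("man", "the", "D"), ("I", "saw", "S")],
]
g = collapse_parses(parses)
print(looks_insane(g, len(parses)))  # mostly consistent parses -> False
```

A real version would weight edges by the parser's own linkage costs rather than raw counts, but the agreement-ratio idea is the same.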


Yes, one of the differences in how we handle notebooks is that everything is actually run in a container behind the scenes. This means that it isn't just the .ipynb that we are hosting: if you install dependencies, libraries, etc., all of it will persist inside the container. This makes it much easier to share your work with others, e.g. I could fork your notebook if you made it public and get all of the installed libraries and compiled dependencies by default. Hope that helps!

Edit: we also have another tool called GradientCI (https://docs.paperspace.com/gradient/projects/gradientci) that might also be of interest. Basically it lets you connect a GitHub repo directly to a project and you can use it to build your container automatically.
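For a library like link-grammar that needs hand-building, the usual pattern would be to bake the compiled library into the image once so every fork inherits it. A hypothetical Dockerfile sketch; the base image, package list, and configure flags are my assumptions (check them against the link-grammar README), not GradientCI specifics:

```dockerfile
# Sketch: compile link-grammar into the notebook image so forks get the
# built library for free. Versions and flags are illustrative only.
FROM python:3.10-slim

RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential autoconf automake libtool pkg-config git swig \
    && rm -rf /var/lib/apt/lists/*

RUN git clone --depth 1 https://github.com/opencog/link-grammar.git /tmp/link-grammar \
    && cd /tmp/link-grammar \
    && ./autogen.sh \
    && ./configure --enable-python-bindings \
    && make -j"$(nproc)" && make install && ldconfig \
    && rm -rf /tmp/link-grammar
```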


Gotcha, yah, I can containerize; it's not like I'm screwing with drivers or whatnot on the AMI. I'll keep an eye on you. Not ready for GPU yet, as I can't even rationally define the vectors I'm extracting from the Link Grammar stuff. Best of luck to you; a lot of people are piling into that battleground, and the Dunning-Kruger effect is rampant. :-)

Side question, if you're willing to entertain it. Tired me tells a notebook to checkpoint, wanders off to bed, comes back the next day, and wonders why it's taking an aeon to open the notebook... oh yeah, damn, I've got gigs of images and pandas crap in it. Do you wrangle this problem, i.e. "please don't save your notebooks as gihugic files representing a mental state you can't possibly remember"?
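Outside of anything platform-specific, the usual fix for this kind of notebook bloat is to strip saved outputs before committing or archiving; `jupyter nbconvert --clear-output --inplace notebook.ipynb` does it from the CLI, and since .ipynb is plain JSON the whole operation fits in a few lines (the function name here is mine):

```python
import json

def strip_outputs(nb):
    """Drop cell outputs and execution counts from a notebook dict.
    The .ipynb format is plain JSON, so this is all it takes to turn a
    multi-gigabyte notebook full of inline images back into a small file."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

# Usage: round-trip a notebook file through the stripper.
# with open("big.ipynb") as f:
#     nb = json.load(f)
# with open("big.ipynb", "w") as f:
#     json.dump(strip_outputs(nb), f)
```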


I should also mention you can just as easily run these on CPU-backed instances as well. The GPU is not a hard requirement.

As for checkpointing data, that is still a relatively difficult problem to solve, and our current recommendation is to use a combination of the persistent /storage directory and the notebook home directory. There are definitely issues with writing 100K+ small files and committing those to the primary Docker layer.

When you get to testing it out, don't hesitate to reach out to us and we can try to see what the best solution is for your particular project. To date there isn't a "one size fits all" solution, but we are working hard on making more intelligent choices behind the scenes to unblock some of these IO constraints.
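One pattern that ties the /storage recommendation to the bloated-notebook complaint earlier in the thread: pickle large intermediate objects into the persistent directory instead of letting them accumulate as inline cell output. The helper names and the artifacts subdirectory are my invention; only /storage itself comes from the comment:

```python
import os
import pickle

def save_artifact(obj, name, storage_dir="/storage/artifacts"):
    """Pickle a large object (DataFrame, model, image batch) into the
    persistent storage directory instead of leaving it inline in the
    notebook, so the .ipynb stays small and fast to reopen."""
    os.makedirs(storage_dir, exist_ok=True)
    path = os.path.join(storage_dir, name + ".pkl")
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return path

def load_artifact(name, storage_dir="/storage/artifacts"):
    """Load a previously saved artifact back into the session."""
    with open(os.path.join(storage_dir, name + ".pkl"), "rb") as f:
        return pickle.load(f)
```

In a notebook cell you would call `save_artifact(df, "reuters_features")` at the end of a session and `load_artifact("reuters_features")` the next morning, rather than relying on the checkpointed output cells.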



