Run Any JavaScript Function in the Cloud (github.com/mentum)
181 points by SwellJoe on Jan 13, 2015 | 58 comments



I find it bizarre how the code examples provided on GitHub use λ. I know JavaScript allows this, but I can't imagine switching to a Greek layout to type λ. Otherwise I have to copy and paste it, which adds a lot of overhead.

I also remember someone suggesting ±/∓ signs for CoffeeScript at one point, justified with "Well, I can type it using <insert fancy keyboard combination> on my Mac."

I know this is slightly off-topic, but I had to point it out because I feel like it stabbed me in the eye.

Anyway, I do like the idea of Lambdaws, kudos to the author.


I'm pretty sure it is meant only for aesthetics and readability of the demo.

Speaking of readability, I actually use a package for my editor that renders the word "lambda" as λ. This way I get to see code that's more pleasant to read, and I don't have to type in (or save) any Greek characters.


Which editor?


Emacs.

There are various code snippets for that floating around, e.g. http://www.emacswiki.org/emacs/pretty-lambdada.el.


> I find it bizarre how the code examples provided on GitHub use λ. I know JavaScript allows this, but I can't imagine switching to a Greek layout to type λ

Like someone else mentioned, you can use the compose key, or if you use an editor like Emacs, there are commands to insert Unicode symbols. If you're a Lisper, you might also have a macro for inserting those :)


DrRacket (the default IDE for the Racket language) has a keyboard shortcut specifically to insert this symbol. I use it a lot!


You can type it on Linux too with XCompose.


You can type it everywhere, somehow with something. The point is that it is needlessly complicated.


A reasonable number of vim users use the RFC 1345 digraphs, since they have a default key binding (ctrl+k followed by the digraph). That makes it easy to insert many of the more common Unicode characters. Lambda's digraph is l*. Emacs also supports inserting Unicode characters using either their RFC 1345 digraph or their TeX symbol.


Hmm. Using toString to serialize a JS function breaks for some functions, doesn't it?

https://github.com/mentum/lambdaws/blob/master/lib/LambdaHel...

There's a discussion of the topic here: http://perfectionkills.com/state-of-function-decompilation-i...
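
For a quick illustration of where it breaks (bound and native functions lose their source):

    function add(a, b) { return a + b; }
    console.log(add.toString());      // full source text, can be shipped and eval'd

    var bound = add.bind(null, 1);
    console.log(bound.toString());    // "function () { [native code] }" in V8 - nothing to ship

    console.log(Math.max.toString()); // native functions are opaque too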


To my understanding, this library is intended to run in a somewhat fixed environment (server-side Node.js).

I'm unsure if you will run into any of the serialization issues mentioned by Kangax, as developers will be using the same (if not very similar) versions of V8.


I feel like there's no sane way for the developer to understand that he's running in a different environment.

Suppose:

   var i = 0;
   var localCaller = remoteBundler(function() {
       // this will be run server-side
       i += 1; // oops: "i" does not exist on the server
   });

   localCaller();

And yes, this could easily be spotted, but I've seen so many people get really stuck on naive errors like this (e.g. MongoDB's reduce function).
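
The root problem is that toString gives you the function's source text but nothing about the environment it closed over, so the bundler can't ship the captured variables along. A tiny illustration:

    var i = 0;
    var f = function () { return i + 1; };
    // The serialized form mentions "i" but carries no trace of its value
    // or of the scope it lives in:
    console.log(f.toString());   // "function () { return i + 1; }"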


Can't the library detect the closure? I'm pretty sure I could do this in Python.


True, but Node.js would still screw up bound functions, wouldn't it? So if you want to schedule a function, you would need a guarantee across all its dependencies that bind hasn't been used.

While this might not be a problem for some use cases, the documentation could at least discuss the topic, so that users of the library apply the functionality with these restrictions in mind.


This is cool. I'm pretty excited about AWS Lambda. Keep in mind its current limitations: a process can't take longer than 60s; only 25 concurrent executions; and the zipped package containing your code and all dependent libraries can't be larger than 20 MB.


Interesting! For people who prefer a less abstracted approach, check out: https://www.npmjs.com/package/grunt-aws-lambda (with accompanying blog post: http://hipsterdevblog.com/blog/2014/12/07/writing-functions-...).


Nice. And for yet another less abstracted approach, check out: http://github.com/rebelmail/node-lambda (also with accompanying blog post: http://www.mot.la/2014-12-07-amazon-lambda-best-practices-de...)


The first time I saw Greek letters in code was the d3.js source:

https://github.com/mbostock/d3/blob/master/src/geo/area.js#L...

It's a real pain for me to type these letters. How do you write them, besides copying and pasting?


My custom keyboard layout[1] has a third and fourth level (Alt+[Key], Alt+Shift+[Key]) containing various useful Greek and mathematical symbols, e.g. λσγδ∀∃∈⊂∞∅ℤℕ←→↓↑

[1]: I used Ukelele on Mac, but there's also Keyboard Layout Creator for Windows, and on Linux you can edit xkb files by hand.


Aside from being pretty cool, what's the use case for this?


> Each event is processed individually so thousands of functions can run in parallel and performance remains consistently high regardless of the frequency of events.

http://aws.amazon.com/lambda/

Now we're talking. If you can carve your application into small, independent tasks, Lambda can run thousands of them in parallel.

This could be cost-effective if you have a large amount of data stored in small chunks in S3, and you need to query it or transform it sporadically.

So instead of keeping terabytes of logs or transactions in Hadoop or Spark on hundreds of machines, keep the data in 100 MB chunks in S3. Then map-reduce with Lambda.

Set up one web server to track chunks and tasks, and have each Lambda instance pull tasks from the server. You could effectively spin up thousands of cores for a few minutes and pay only for the minutes you use.
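
The fan-out could look roughly like this with the plain AWS SDK (names are made up, and the task-tracking server is omitted):

    // Hypothetical fan-out: one async Lambda invocation per 100 MB chunk in S3.
    var AWS = require('aws-sdk');
    var lambda = new AWS.Lambda({ region: 'us-east-1' });

    var chunkKeys = ['data/chunk-0000', 'data/chunk-0001']; // ...thousands more

    chunkKeys.forEach(function (key) {
        lambda.invokeAsync({
            FunctionName: 'processChunk', // assumed to be deployed already
            InvokeArgs: JSON.stringify({ bucket: 'my-data', key: key })
        }, function (err) {
            if (err) console.error('failed to queue ' + key, err);
        });
    });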


It probably won't be suitable for map-reduce. Buried in the FAQ is a statement that a "Lambda" function (scare-quoting because darn it, that name has been taken for longer than anybody working on it has been alive and is still in active use... grrrr... I'd like to see their trademark on that denied) can only run for up to a minute, with the initial default set to 3 seconds ("How long can a Lambda function execute?").

It's suitable for flinging a map-reduce job in response to some event, but I wouldn't try to jam a map-reduce job into those constraints. I mean, sure, yeah, theoretically possible, but really the wrong way to do it. If you're doing a task that even takes a second or two in Lambda you're coming perilously close to being less than an order of magnitude from a hard cutoff, which isn't a great plan in the public cloud. You really ought to be targeting getting in and out of Lambda much faster than that, and anything that needs to be longer being a process triggered in another more permanent instance.


In the preview release, you can set the timeout value to anything between 1 second and 60 seconds. If you don't specify a timeout value, the default is 3 seconds.

I can stream a 100 MB chunk from S3 and map it concurrently as it streams in 10 to 15 seconds. Sixty seconds is more than enough time to process a chunk.
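
Roughly like this, mapping line by line as the chunk streams in (bucket and key are placeholders):

    // Sketch: map a 100 MB S3 chunk as it streams, without buffering it all.
    var AWS = require('aws-sdk');
    var readline = require('readline');
    var s3 = new AWS.S3();

    var stream = s3.getObject({ Bucket: 'my-data', Key: 'data/chunk-0000' })
                   .createReadStream();

    var hits = 0;
    readline.createInterface({ input: stream })
        .on('line', function (line) {
            if (line.indexOf('ERROR') !== -1) hits++; // the "map" step
        })
        .on('close', function () {
            console.log(hits); // partial result, combined later in the reduce step
        });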

The bigger issue is that during the preview, Lambda is limited to 25 concurrent functions.

If Amazon delivers a product where "the same code that works for one request a day also works for a thousand requests a second[1]," then you might be able to analyze hundreds of gigabytes of data in a few seconds, spin up no servers, and only pay for the few seconds that you use.

500 GB = 5000 chunks of 100 MB each.

1000 concurrent tasks, each running 10 seconds, could process 500 GB in 50 seconds.

You would use 5000 Lambda requests out of your free monthly allotment of 1,000,000. You'd also consume 5000 * 0.1 GB * 10 seconds = 5000 GB-seconds of your free monthly allotment of 400,000 GB-seconds.

S3 transfer is free within the same region, and S3 requests cost $0.004 per 10,000 GETs, or $0.002 for this query.

Even after you exhaust the free Lambda allotment, processing 500 GB would cost $0.000000208 (per 100 ms at 128 MB) * 100 * 5000, or about 10 cents.

Scaling this up, querying 10 terabytes would take about 20 minutes to execute, cost about $2 for the query, and about $300 per month for storage.

For sporadic workloads it might be more responsive and much cheaper than spinning up a fleet of machines for Hadoop or Spark.
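
As a quick sanity check of the arithmetic in Node, using the preview-era prices above:

    // Back-of-envelope cost for the 500 GB query described above.
    var invocations = 5000;                // 500 GB in 100 MB chunks
    var unitsPer = 10 * 1000 / 100;        // 10 s per chunk = 100 units of 100 ms
    var lambdaCost = invocations * unitsPer * 0.000000208; // ~$0.104 compute
    var s3GetCost = invocations * (0.004 / 10000);         // ~$0.002 for GETs
    console.log((lambdaCost + s3GetCost).toFixed(3));      // "0.106" - about 10 cents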

[1] http://www.allthingsdistributed.com/2014/11/aws-lambda.html


It would be interesting to work at Amazon and decide how many offline CPUs you should have ready to "spin up" if needed... also considering Moore's law.


It's cool precisely because it's useful! Have an expensive computation task? Just dump it onto the cloud instead of running it locally. This is probably the most streamlined way to do this that I've ever seen.


If you have an expensive computation task and you're paying for computing time, do you really want to be using JavaScript for the purpose?


Probably not, but this isn't purporting to be the best way to do it. It's just a cool thing you can do in your JS code.

Especially nice for spinning something up with zero overhead. Maybe not optimal for production apps. Maybe good if you're constrained on server resources but less so on budget. Maybe good if you're still on the Lambda free tier pricing.


Actually, one of their examples was to generate image thumbnails, where they import the ImageMagick native libraries to do the heavy lifting.

Essentially, this can help offset the need for managing extra servers for those kinds of tasks.


Re: the dead comment below me.

There's nothing mystical about this. The source code is 4 files, which I read. It sends your function to Lambda, and that's cool, and the syntax is really elegant. Enough that I could totally see throwing something together using this to solve some complex problems without really thinking about it.

I don't think this is all the way there, but I really like the idea of programming with APIs like this being as easy to use as language libraries.


A concrete example: video upload and processing. Your frontend handles uploads, then you can offload conversion to AWS while you continue to handle other requests.


Seems like a poor example - if you're building a product, you can easily afford the couple hours it takes to set up an EC2 image and autoscaling, dump a work item in SQS, and pick it up on an EC2 spot instance. And if you're doing video processing, you really really want to use a more efficient language than Javascript (like C) to handle the video processing. Combine the two of these and you'll get roughly a 100x cost saving over dumping a JS function into Amazon Lambda.

I see this being most useful if you have a one-off analytic you need to write against some big data in S3 or RDS. For one-off scripts dealing with the raw AWS APIs is just useless overhead, and the expense of running the script will be negligible.


Having done this a few times from scratch for various companies: there are a ton of moving pieces in almost any processing pipeline. Being able to scale that pipeline without writing the ops code to make it happen is actually magical. I'm not saying everyone should jump out and use this, but it takes a lot of work to:

(a) measure each of the points of your service
(b) deploy your code in an automated manner
(c) deploy your monitoring in an automated manner
(d) make sure your code is under supervision
(e) set up alerting on the monitoring
(f) scale up/down within price constraints as needed
(g) repeat this for all supporting services (queue, db, etc.)
(h) write your actual application code

The potential to handle certain classes of problems via SQS/SNS/S3 pipelines is pretty alluring. You still have to do configuration, but the bet is that the configuration necessary for an SQS/SNS/S3/Lambda pipeline is far lower than what it takes to set up some autoscaling Celery, Resque, or JMS/AMQP system on top of Ubuntu with Chef/Puppet/whatever.


Cool! As someone with more experience than me in this, would you mind responding quickly to these points? I will give my personal opinions but if you can trump with more info that would be cool too. :)

1. I agree that JMS sounds like a hassle but is that really necessary? I would think that you can batch process data on an EC2 instance, then pick it up in your local code directly using AWS APIs... not sure.

2. I am not so familiar with the Lambda system but I'm also not sure how it would scale db as necessary (item "g" in your list) thus overall processing time would still be bottlenecked by other resources (database IO, for example), no? I agree with your points but in all these cloud-compute scenarios I always wonder "Are we trying to reach a theoretical limit of fastest-possible computation, or just reach some reasonable saturation point close to the natural bottlenecks/throttles of our system integrations?".

3. Having been burned a few times now by over-optimizing when considering the cloud, I would probably now first consider just picking a slightly oversized EC2 instance and throwing some high-performing code onto it (Java, C++). Dynamic languages + auto-scalable resources (though I'm talking about web hosting in particular now) seem to drain clients' wallets more than anything. At this point I'd actually recommend anyone with new web infrastructure to just buy a static instance and write optimized Java rather than trying their hand at auto-scaling Ruby/Python/Node. Do you notice a similar issue with your clients regarding code optimization vs. auto-scaling?


This is a cool thought exercise, I suppose: the idea of throwing a particular function from within a program to an offsite parallel-compute engine. I just imagine it will complicate platform integration and bleed money given that it is in JS (plus API headaches abound, as you mention)... If the same concept of dynamic provision/deploy/process/collect could all be done from a Java/C++ app I'd be very intrigued, but I suspect it is already near possible or achieved trivially through AWS APIs, albeit probably with a manual deploy involved.

I think I remember this concept in Matlab from when I did some research in grad school: basically, an instance of Matlab can be set up as a compute server, and the parallel-processing functions of Matlab code on other computers can portion out work to it. This is the ideal model in my mind.

1. Write high-performance code in any language with some function that should be happening remotely in parallel.

2. Configure AWS to auto-provision the resources necessary.

3. Execute code, have it behave as if it is all running locally.

Really all these things can already be achieved with SOA, RMI, message queues, etc., the trick is just in making it transparent to the programmer so there is no deploy step involved. With the right spec it could even become platform agnostic (change small config file somewhere to target different cloud platform... would be nice to see a JSR about that in the near future!).


This is largely the architecture behind MapReduce or Hadoop Streaming. Write high-performance code in any language with an isolated parallelizable function, configure some cloud to auto-provision workers that run that function repeatedly on millions of records, execute code & pretend it's local.


> Combine the two of these and you'll get roughly a 100x cost saving over dumping a JS function into Amazon Lambda.

And then move it off EC2 onto dedicated hardware and you'll see another ~30x cost savings.

Running a permanent transcode cluster on EC2 would be rather insane. Hetzner rents you i7s for $50 per month; the EC2 equivalent (c3.8xlarge) costs $40 per day.

Yes you can cut EC2 cost with spot-instances, but at least in our case that would still have been significantly more expensive than just renting some scrap metal.

If you need cheap, disposable compute for semi-predictable loads then the Hetzner flea market (yes, they really have one!) is hard to beat on bogomips per dollar.


Yeah, dedicated is definitely the way to go if you can afford the ops staff. Actually, if you're really big and can afford the hardware engineers, building your own DCs and computers is the way to go. I've seen the profit margins of some major cloud providers; they're insane. About the only industry more profitable is telecom.

I was targeting my comment toward a startup that'd likely be building a product like the OP suggested, though, with maybe a dozen devs who are all generalists. Setting up a bunch of EC2 instances to pull videos out of SQS/S3 and run them through ffmpeg is something an ordinary full-stack developer can do in a morning, and scaling it up just involves clicking a button. Running and scaling a dedicated server farm reliably usually needs a dedicated ops person to keep it all working.


With the additional cost that you actually need to upload the raw video, which you don't need if you just do it locally without Lambda. Or do I misunderstand this?


This is a pretty neat trick/feature to add on top of a microservice hosting platform like Amazon Lambda: automatic serialization, uploading, and queuing of your microservice from a running application. Normally this is done manually with a source code upload or git push.

If anyone is interested in an alternate open-source microservice platform to AWS Lambda, try checking out http://hook.io http://github.com/bigcompany/hook.io

No hook.io users have requested this novel functionality that Lambdaws performs (yet), but I suspect we'll add it to hook.io in the future.


You can't use this client-side, correct? Because you'd be publishing your AWS key?


Also because people could do arbitrary computations under your account (Bitcoin/Litecoin mining in JS anyone?)


Correct, there's no way to prevent people from seeing your keys.


Can't you implement/configure something using IAM?


This is useful, but at scale (i.e. a constellation of microservices working together) I'm not sure I would want some of this functionality abstracted behind a library. I prefer gulp, and I wrote a couple of articles talking through a simple process for local development, testing, and deployment if anyone finds it useful:

* https://medium.com/@AdamRNeary/developing-and-testing-amazon...

* https://medium.com/@AdamRNeary/a-gulp-workflow-for-amazon-la...


This is really cool, but how do you trigger the execution with SQS? AFAIK, AWS Lambda functions can only be triggered by DynamoDB or Kinesis streams, or S3 events. I didn't know you could trigger them off SQS as well.


It looks like SQS is only used for the out-of-band return path. The execution is triggered with the AWS SDK's invokeAsync option:

https://github.com/mentum/lambdaws/blob/master/lib/LambdaHel...

Documentation on this option:

http://docs.aws.amazon.com/lambda/latest/dg/walkthrough-cust...
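
A minimal sketch of that call (function name and payload invented):

    // Fire-and-forget invocation via the AWS SDK; the response carries only a
    // 202 status, which is why the library needs SQS for the return path.
    var AWS = require('aws-sdk');
    var lambda = new AWS.Lambda({ region: 'us-east-1' });

    lambda.invokeAsync({
        FunctionName: 'myDeployedFunction',       // hypothetical
        InvokeArgs: JSON.stringify({ input: 42 }) // JSON args for the handler
    }, function (err, data) {
        if (err) return console.error(err);
        console.log(data.Status); // 202 = accepted
    });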


I didn't know about invokeAsync. Thanks!


It seems weird to me to have language-specific sandboxing services. We should be able to execute any code safely "in the cloud" (which I'm going to charitably interpret as being automatically distributed over N machines).

You can use things like AppArmor profiles for runtime sandboxing, or Google's NaCl for statically verified binaries. Heck, even Java bytecode would be more flexible than JavaScript.


This is interesting. The main use cases seem to be large computations, since HTTP calls would be more expensive than almost any routine operation.


Well, not that large since it will kill your function if it takes more than 60 seconds to run.


Genuine question: didn't we have Node.js for that? I haven't looked into this properly, but what is the angle here exactly? Thanks in advance.


Read the README; this runs a function on other instances in the cloud, on-demand. I agree the title is confusing. For example, you could write a parallelizable, computationally intense function, and have it run in parallel in AWS, directed from a script on your laptop. Without having to set up servers, upload code, etc.
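
For a feel of it, the README's demo is roughly this shape (paraphrased; the exact API may differ):

    // Rough shape of the idea (paraphrased from the README; not guaranteed exact).
    var lambdaws = require('lambdaws');

    lambdaws.setCredentials('accessKeyId', 'secretAccessKey'); // your AWS creds
    lambdaws.start();

    // An ordinary function, uploaded to and run on AWS Lambda automatically:
    var add = function (a, b, callback) { callback(a + b); };
    var addInTheCloud = lambdaws.create(add);

    addInTheCloud([5, 2], function (result) {
        console.log(result); // 7, computed remotely
    });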


No need to set up Node on an EC2 image, boot the image, distribute data, launch multiple instances, etc.


So could a service like JSFiddle allow npm modules to be loaded in? It could be a cool premium feature.


There's already requirebin, which does that.


The simplest PaaS ever. Nice approach; this should have been done by AWS itself.


Except it was done for free by some upstanding person. If it proves to be successful, expect to see AWS roll it out in 2017 or 2018.


Is it just me, or is this really beast?



