claudiowilson's comments

It's because a lot of the users of gen AI are generating anime waifus. Better gen AI = better waifus. It also helps that devs and programmers are a group that's already more likely to be into anime. Generative AI's killer app is the AI girlfriend / boyfriend.


Aside from being pretty cool, what's the use case for this?


Each event is processed individually so thousands of functions can run in parallel and performance remains consistently high regardless of the frequency of events.

http://aws.amazon.com/lambda/

Now we're talking. If you can carve your application into small, independent tasks, Lambda can run thousands of them in parallel.

This could be cost-effective if you have a large amount of data stored in small chunks in S3, and you need to query it or transform it sporadically.

So instead of keeping terabytes of logs or transactions in Hadoop or Spark on hundreds of machines, keep the data in 100 MB chunks in S3. Then map-reduce with Lambda.

Set up one web server to track chunks and tasks, have each Lambda instance get tasks from the server. You could effectively spin up thousands of cores for a few minutes and only pay for the minutes you use.
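
For what it's worth, a rough sketch of what one of those workers might look like, assuming the coordinator invokes the function with the chunk location in the event and collects partial results at a made-up endpoint; word counting stands in for whatever the real map step would be:

    // Hypothetical worker: the coordinator invokes this function with
    // { bucket: "...", key: "..." } and does the final reduce across chunks.
    var AWS = require('aws-sdk');
    var https = require('https');
    var s3 = new AWS.S3();

    exports.handler = function (event, context) {
      s3.getObject({ Bucket: event.bucket, Key: event.key }, function (err, data) {
        if (err) return context.fail(err);

        // "Map" step: count words in this chunk (stand-in for the real logic).
        var counts = {};
        data.Body.toString().split(/\s+/).forEach(function (w) {
          if (w) counts[w] = (counts[w] || 0) + 1;
        });

        // Report the partial result back to the (made-up) coordinator endpoint.
        var body = JSON.stringify({ key: event.key, counts: counts });
        var req = https.request({
          host: 'coordinator.example.com',
          path: '/partial',
          method: 'POST',
          headers: { 'Content-Type': 'application/json' }
        }, function () { context.succeed('done: ' + event.key); });
        req.on('error', function (e) { context.fail(e); });
        req.end(body);
      });
    };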


It probably won't be suitable for map-reduce. Buried in the FAQ is a statement that a "Lambda" function (scare-quoting because darn it, that name has been taken for longer than anybody working on it has been alive and is still in active use... grrrr... I'd like to see their trademark on that denied) can only run for up to a minute, with the initial default set to 3 seconds ("How long can a Lambda function execute?").

It's suitable for flinging a map-reduce job in response to some event, but I wouldn't try to jam a map-reduce job into those constraints. I mean, sure, yeah, theoretically possible, but really the wrong way to do it. If you're doing a task that takes even a second or two in Lambda you're coming perilously close to being less than an order of magnitude from a hard cutoff, which isn't a great plan in the public cloud. You really ought to be targeting getting in and out of Lambda much faster than that, and anything that needs to run longer should be a process triggered on a more permanent instance.


In the preview release, you can set the timeout value to anything between 1 second and 60 seconds. If you don't specify a timeout value, the default is 3 seconds.

I can stream a 100 MB chunk from S3 and map it concurrently as it streams in 10 to 15 seconds. Sixty seconds is more than enough time to process a chunk.
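
Roughly what I mean, as a sketch (the bucket and key are placeholders, and tallying error lines stands in for the real map step):

    // Stream a chunk out of S3 and map it line-by-line as it arrives,
    // instead of buffering the whole 100 MB first.
    var AWS = require('aws-sdk');
    var readline = require('readline');
    var s3 = new AWS.S3();

    var stream = s3.getObject({ Bucket: 'my-data', Key: 'chunks/00042' })
      .createReadStream();

    var errors = 0;
    readline.createInterface({ input: stream, terminal: false })
      .on('line', function (line) {
        // per-line "map" work goes here; counting error lines as an example
        if (line.indexOf('ERROR') !== -1) errors++;
      })
      .on('close', function () {
        console.log('partial result:', errors);
      });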

The bigger issue is that during the preview, Lambda is limited to 25 concurrent functions.

If Amazon delivers a product where "the same code that works for one request a day also works for a thousand requests a second[1]," then you might be able to analyze hundreds of gigabytes of data in a few seconds, spin up no servers, and only pay for the few seconds that you use.

500 GB = 5000 chunks of 100 MB each.

1000 concurrent tasks, each running 10 seconds, could process 500 GB in 50 seconds.

You would use 5000 Lambda requests out of your free monthly allotment of 1,000,000. You'd also consume 5000 * 0.1 GB * 10 seconds = 5000 GB-seconds of your free monthly allotment of 400,000 GB-seconds.

S3 transfer is free within the same region, and S3 requests cost $0.004 per 10,000 GETs, or $0.002 for this query.

Even after you exhaust the free Lambda allotment, processing 500 GB would cost $0.000000208 per 100 ms (at 128 MB) * 100 units of 100 ms per task * 5000 tasks, or about 10 cents.

Scaling this up, querying 10 terabytes would take about 20 minutes to execute, cost $2 for the query, and about $300 per month for storage.
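
If you want to sanity-check those numbers, here's the same back-of-envelope arithmetic as a snippet (it uses the prices quoted above and ignores the per-request charge and the free tier):

    // Back-of-envelope, using the prices above; ignores the per-request
    // charge (~$0.001 for 5000 requests) and the free tier.
    var CHUNK_GB     = 0.1;            // 100 MB chunks
    var SECONDS      = 10;             // per-chunk processing time
    var LAMBDA_100MS = 0.000000208;    // $ per 100 ms at 128 MB
    var S3_GET       = 0.004 / 10000;  // $ per GET

    function queryCost(totalGB) {
      var chunks  = totalGB / CHUNK_GB;
      var compute = chunks * (SECONDS * 10) * LAMBDA_100MS; // 10 x 100 ms per second
      var gets    = chunks * S3_GET;
      return compute + gets;
    }

    console.log(queryCost(500).toFixed(3));   // 0.106 -> about 10 cents
    console.log(queryCost(10000).toFixed(2)); // 2.12  -> about $2 for 10 TB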

For sporadic workloads it might be more responsive and much cheaper than spinning up a fleet of machines for Hadoop or Spark.

[1] http://www.allthingsdistributed.com/2014/11/aws-lambda.html


It would be interesting to work at Amazon and decide how many offline CPUs you should have ready to "spin up" if needed... also considering Moore's law.


It's cool precisely because it's useful! Have an expensive computation task? Just dump it onto the cloud instead of running it locally. This is probably the most streamlined way to do this that I've ever seen.


If you have an expensive computation task and you're paying for computing time, do you really want to be using JavaScript for the purpose?


Probably not, but this isn't purporting to be the best way to do it. It's just a cool thing you can do in your JS code.

Especially nice for spinning something up with zero overhead. Maybe not optimal for production apps. Maybe good if you're constrained on server resources but less so on budget. Maybe good if you're still on the Lambda free tier pricing.


Actually, one of their examples was to generate image thumbnails, where they import the ImageMagick native libraries to do the heavy lifting.

Essentially, this can help offset the need for managing extra servers for those kinds of tasks.
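
From memory, that example is shaped roughly like this (the bucket names and the 150px size here are made up), with the gm module wrapping ImageMagick:

    // Sketch of an S3-triggered thumbnailer (bucket names are placeholders).
    var AWS = require('aws-sdk');
    var gm  = require('gm').subClass({ imageMagick: true }); // ImageMagick does the work
    var s3  = new AWS.S3();

    exports.handler = function (event, context) {
      var bucket = event.Records[0].s3.bucket.name;
      var key    = event.Records[0].s3.object.key;

      s3.getObject({ Bucket: bucket, Key: key }, function (err, obj) {
        if (err) return context.fail(err);
        gm(obj.Body).resize(150).toBuffer('jpg', function (err, thumb) {
          if (err) return context.fail(err);
          s3.putObject({
            Bucket: bucket + '-thumbs',   // hypothetical output bucket
            Key: key,
            Body: thumb,
            ContentType: 'image/jpeg'
          }, function (err) {
            if (err) return context.fail(err);
            context.succeed('thumbnailed ' + key);
          });
        });
      });
    };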


Re: the dead comment below me.

There's nothing mystical about this. The source code is 4 files, which I read. It sends your function to Lambda, and that's cool, and the syntax is really elegant. Enough that I could totally see throwing something together using this to solve some complex problems without really thinking about it.

I don't think this is all the way there, but I really like the idea of programming with APIs like this being as easy to use as language libraries.


A concrete example: video upload and processing. Your frontend handles uploads, then you can offload conversion to AWS while you continue to handle other requests.


Seems like a poor example - if you're building a product, you can easily afford the couple hours it takes to set up an EC2 image and autoscaling, dump a work item in SQS, and pick it up on an EC2 spot instance. And if you're doing video processing, you really really want to use a more efficient language than Javascript (like C) to handle the video processing. Combine the two of these and you'll get roughly a 100x cost saving over dumping a JS function into Amazon Lambda.
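
For reference, the "dump a work item in SQS" part really is only a few lines (the queue URL and payload shape are made up for illustration):

    // Enqueue a transcode job for the spot-instance workers to pick up.
    var AWS = require('aws-sdk');
    var sqs = new AWS.SQS();

    sqs.sendMessage({
      QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123456789012/transcode-jobs',
      MessageBody: JSON.stringify({ bucket: 'uploads', key: 'videos/abc123.mov' })
    }, function (err, data) {
      if (err) console.error(err);
      else console.log('queued', data.MessageId);
    });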

I see this being most useful if you have a one-off analytic you need to write against some big data in S3 or RDS. For one-off scripts dealing with the raw AWS APIs is just useless overhead, and the expense of running the script will be negligible.


Having done this a few times from scratch for various companies, there are a ton of moving pieces for almost any processing pipeline. Being able to scale that pipeline without writing the ops code to make it happen is actually magical. I'm not saying everyone should jump out and use this, but it takes a lot of work to:

(a) measure each of the points of your service
(b) deploy your code in an automated manner
(c) deploy your monitoring in an automated manner
(d) make sure your code is under supervision
(e) set up alerting on the monitoring
(f) scale up / down and within price constraints as needed
(g) repeat this for all supporting services (queue, db, etc.)
(h) write your actual application code

The potential to handle certain classes of problems via SQS/SNS/S3 pipelines is pretty alluring. You still have to do configuration, but the bet is that the configuration necessary for the SQS/SNS/S3/Lambda pipeline is far lower than what it takes to set up some autoscaling Celery, Resque, or JMS/AMQP system on top of Ubuntu with Chef/Puppet/whatever.


Cool! As someone with more experience than me in this, would you mind responding quickly to these points? I will give my personal opinions but if you can trump with more info that would be cool too. :)

1. I agree that JMS sounds like a hassle but is that really necessary? I would think that you can batch process data on an EC2 instance, then pick it up in your local code directly using AWS APIs... not sure.

2. I am not so familiar with the Lambda system, but I'm also not sure how it would scale the db as necessary (item "g" in your list), so overall processing time would still be bottlenecked by other resources (database IO, for example), no? I agree with your points, but in all these cloud-compute scenarios I always wonder: are we trying to reach a theoretical limit of fastest-possible computation, or just reach some reasonable saturation point close to the natural bottlenecks/throttles of our system integrations?

3. Having been burned a few times now by over-optimizing when considering cloud, I would probably now first consider just picking a slightly oversized EC2 instance and throwing some high-performing code onto it (Java, C++). Dynamic languages + auto-scalable resources (though I'm talking about web hosting in particular now) seem to drain clients' wallets more than anything. At this point I'd actually recommend anyone with new web infrastructure to just buy a static instance and write optimized Java rather than trying their hand at auto-scaling Ruby/Python/Node. Do you notice a similar issue with your clients regarding code optimization vs. auto-scaling?


This is a cool thought exercise I suppose, the idea of within a program throwing a particular function to an offsite parallel-compute engine. I just imagine it will complicate platform integration & bleed money given that it is in js (plus API headaches abound, as you mention)... If the same concept of dynamic provision/deploy/process/collect could all be done from a Java/C++ app I'd be very intrigued, but I suspect it is already near possible or achieved trivially through AWS APIs, albeit it probably with a manual deploy involved.

I think I remember this concept in Matlab back from when I did some research in grad school -- basically an instance of Matlab can be setup as a compute server, and the parallel processing functions of Matlab code on other computers can portion out work to it. This is the ideal model in my mind.

1. Write high-performance code in any language with some function that should be happening remotely in parallel.

2. Configure AWS to auto-provision the resources necessary.

3. Execute code, have it behave as if it is all running locally.

Really, all these things can already be achieved with SOA, RMI, message queues, etc.; the trick is just in making it transparent to the programmer so there is no deploy step involved. With the right spec it could even become platform agnostic (change a small config file somewhere to target a different cloud platform... would be nice to see a JSR about that in the near future!).


This is largely the architecture behind MapReduce or Hadoop Streaming. Write high-performance code in any language with an isolated parallelizable function, configure some cloud to auto-provision workers that run that function repeatedly on millions of records, execute code & pretend it's local.
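
E.g. a Hadoop Streaming mapper is just a program that reads records on stdin and writes key/value pairs to stdout; a word-count mapper might look like this (sketch):

    // Hadoop Streaming mapper: read lines on stdin, emit "word \t 1" on stdout.
    // The framework handles splitting the input and shuffling to reducers.
    var readline = require('readline');

    readline.createInterface({ input: process.stdin, terminal: false })
      .on('line', function (line) {
        line.split(/\s+/).forEach(function (word) {
          if (word) process.stdout.write(word + '\t1\n');
        });
      });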


Combine the two of these and you'll get roughly a 100x cost saving over dumping a JS function into Amazon Lambda.

And then move it off EC2 onto dedicated hardware and you'll see another ~30x cost savings.

Running a permanent transcode cluster on EC2 would be rather insane. Hetzner rents you i7's for $50 per month, the EC2 equivalent (c3.8xlarge) costs $40 per day.

Yes you can cut EC2 cost with spot-instances, but at least in our case that would still have been significantly more expensive than just renting some scrap metal.

If you need cheap, disposable compute for semi-predictable loads then the Hetzner flea market (yes, they really have one!) is hard to beat on bogomips per dollar.


Yeah, dedicated is definitely the way to go if you can afford the ops staff. Actually, if you're really big and can afford the hardware engineers, building your own DCs and computers is the way to go. I've seen the profit margins of some major cloud providers; they're insane. About the only industry more profitable is telecom.

I was targeting my comment toward a startup that'd likely be building a product like the OP suggested, though, with maybe a dozen devs who are all generalists. Setting up a bunch of EC2 instances to pull videos out of SQS/S3 and run them through ffmpeg is something an ordinary full-stack developer can do in a morning, and scaling it up just involves clicking a button. Running and scaling a dedicated server farm reliably usually needs a dedicated ops person to keep it all working.
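
Roughly the shape of that worker, as a sketch (the queue URL, paths, and ffmpeg invocation are illustrative, not a production setup):

    // Rough worker loop for a spot instance: poll SQS, fetch the video from S3,
    // shell out to ffmpeg, upload the result.
    var AWS   = require('aws-sdk');
    var fs    = require('fs');
    var spawn = require('child_process').spawn;
    var sqs   = new AWS.SQS();
    var s3    = new AWS.S3();
    var QUEUE = 'https://sqs.us-east-1.amazonaws.com/123456789012/transcode-jobs';

    function poll() {
      sqs.receiveMessage({ QueueUrl: QUEUE, WaitTimeSeconds: 20 }, function (err, data) {
        if (err || !data.Messages) return poll();
        var msg = data.Messages[0];
        var job = JSON.parse(msg.Body);

        s3.getObject({ Bucket: job.bucket, Key: job.key }, function (err, obj) {
          if (err) return poll();
          fs.writeFileSync('/tmp/in', obj.Body);

          spawn('ffmpeg', ['-y', '-i', '/tmp/in', '/tmp/out.mp4'])
            .on('close', function (code) {
              if (code !== 0) return poll();
              s3.upload({
                Bucket: job.bucket,
                Key: job.key + '.mp4',
                Body: fs.createReadStream('/tmp/out.mp4')
              }, function () {
                sqs.deleteMessage({ QueueUrl: QUEUE, ReceiptHandle: msg.ReceiptHandle },
                  function () { poll(); });
              });
            });
        });
      });
    }

    poll();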


That comes with the additional cost of actually needing to upload the raw video, which you don't need to do if you just process it locally without Lambda. Or do I misunderstand this?


I'm interested to see the reactions this will garner from the anti-GMO crowd.


The usual suspects are of course against this, but one of the reasons we started the company was to change public perception around GMOs. The technology has enormous potential to help the world, but it's being held back by public opinion. What's frustrating is that this opinion is basically not based on science or evidence, but is mobilized because people are against things like biotech patents and large companies dominating the food chain. The glowing plant changes that discussion and we hope that people develop a more nuanced and balanced perspective once they have a GMO in their own home.


I can definitely see the potential for greatness that GMOs have. I'm really glad that you're trying to change the misguided public perception around them.


Plus it's harder to tell your kids they can't have glow-in-the-dark plants than it is to tell them they can't have GMO food.


There was also no mention of which jobs were held by Asian Americans versus immigrant Asians. I think there's a stark difference between an increase in jobs held by Asian immigrants versus an increase in jobs held by Asian Americans. In the latter, the jobs are still held by Americans.


That should be true for Black, White and Latino immigrants as well. The assumption that Asians are immigrants while others are not is a prejudiced one.

It's also not wise to provoke animosity against immigrants generally. Most of these people are going to become "Americans" soon enough. They are here, contributing to the economy and culture, paying taxes etc.

We are a nation of immigrants .. cliched but true. Newcomers should be welcomed and celebrated.


"it shows them that they have the power to shape the world, rather than just experience it"

That is precisely why I love programming so much, it's that statement right there. You have the tools to shape the world right in front of you, all you need is the drive and vision to do it.


I use the ever eloquent, "yolo".


I don't think MongoDB should be replacing existing systems; rather, it should be used as a data store for things that otherwise wouldn't be stored because of their high throughput needs, but that aren't necessarily critical data.


I suppose it's also a way for users to remember the gist of an article that they've read a long time ago? But honestly aside from that it does look like a Foursquare for reading articles...

