Hacker News | netshade's comments

Just a note to anyone looking to go down this road: IO performance is still pretty awful, last I checked. If your workflow is IO-intensive (a non-trivial Rails app, let's say), proceed w/ caution.

https://www.reddit.com/r/bashonubuntuonwindows/comments/5ece... - My last check.


(Addendum) I did a Fast Ring update on a separate machine and did not see substantial performance improvements, but that machine isn't comparable to the one in the linked thread, so I'd be hesitant to say that performance hasn't improved at all, just that IO performance is definitely a weak point of WSL.


Yes - IO perf is not yet where we want it to be.

We have some improvements coming in Creator's Update and more substantial improvements planned in future releases, once we've completed some work with the NT kernel filesystem team.


Is this why Visual Studio is scp'ing source files around, instead of just copying the files to that /mnt/d directory and invoking the compiler there? That seemed like a big hack to me and it wasn't exactly clear why they were doing it that way (near the end, last five minutes).


Wow, that is pretty fast. I played w/ CSV parsing that takes advantage of SIMD string lookahead a while back ( https://gist.github.com/netshade/aa9e836e843c8e84b97a ) and found it to be quite fast as well. I had assumed (perhaps wrongly) that the cost of shuttling data back and forth between the CPU and the GPU would erase any performance gains, so I suspect ( it's been a while! ) that the SIMD approach would be faster than the GPU one. But tbh, after working on it for a bit and comparing it w/ mawk's (http://invisible-island.net/mawk/mawk.html) performance, mawk still beat my approach handily, and did it w/ way more functionality. Which is all to say that mawk is pretty amazing and worth checking out if you're in the market for parsing CSV fast.


Awesome, thanks very much.


Nice, thanks for this!


Would be cool if this included cstore_fdw as well ( https://github.com/citusdata/cstore_fdw ).


At this point during the beta we're not supporting cstore, but it's definitely on our roadmap for the future.


This. I randomly picked Bellingham as a quiet place to go to for a week and do some self-teaching plus vacationing.

Decent food, friendly people, nice coffee shops, great areas to go trail running in. I'm not sure I'd go there for an actual vacation, but for just getting away and trying to learn some new things, it was a great place.


There's a reason they call it "the city of subdued excitement."


I had a hell of a time getting PgPool II w/ PG Streaming Replication set up right. The trivial cases seemed to be fine, but when I started triggering failovers back and forth, I ran into a lot of cases where PgPool II would go stale due to a data file left around in /tmp.

I eventually got it working, but there were way too many informally created scripts that PgPool and PG had to know about in order to trigger failovers, initiate resyncs from WALs, etc. I didn't like it at all, and right around then AWS started offering PG on RDS, so I just moved to that.

So my advice would be: unless you've got someone on the team for whom that isn't much work, you get a lot of benefit from going w/ hosted. RDS Postgres has been pretty great - not exceptional, but for my use cases, okay. Hoping they add cross-region read replicas for PG sometime soon, as that would make a lot of expansion opportunities really easy.


I don't have enough background knowledge to ascertain if this is the correct understanding of the problem, but this explanation reads very well, and is the best 'mechanical' explanation I've seen in this thread so far.

Thank you.


Cool library - I can imagine how moving away from boxing/unboxing could be a huge boost for them.

I've been looking for something that gave SIMD intrinsics to Java programmers - does anyone know if such a thing exists? Could be a nice addition to this lib.


You can't, unless you write it as native code, put your data in direct NIO buffers, and go through the JNI dance.


Ah well, had hoped there might be an already made thing out there. Thanks!


I've been on the fence about trying this on Instrumental; we've been looking at moving to DynamoDB, but I wanted to at least see if it would give drop-in magic performance benefits, so I spun off a new SQS queue of our incoming data this evening and did a write throughput test.

These are only initial impressions, but:

* It's definitely faster. Our write behavior is largely upserts against integers and doubles, and I'm seeing roughly a 100% improvement over stock Mongo 2.4. The machine in question is an m2.2xlarge with a 1000 PIOPS EBS volume attached, and it's doing about 7000 update operations a second (safe mode).

* I'm seeing consistently lower IO utilization than stock Mongo. Stock tends to vary wildly between 200 and 750 write ops, while TokuMX sits at about 250 under sustained write traffic.

* CPU usage is pretty well balanced against all cores, as opposed to stock's behavior.

* It's too early to say whether or not the storage savings will be as good as claimed, but at this point it seems that the TokuMX reprs are about 40% of the stock reprs. Like I said, most of our data is ints and doubles tho.

VERY LARGE CAVEAT: I'm not running the database in a replica set because I'm lazy. So the write throughput numbers are likely the best-case scenario of what you'd actually be running in production.

(edit: line spacing)

If you've got to use MongoDB, it seems pretty nice. If it came with a tiny person that maintained the database for you as well, it'd be a no brainer.


Disclosure: Rackspace owns ObjectRocket and I work at Rackspace.

Have you tried ObjectRocket (if you're in US-East or US-West)? http://objectrocket.com/ High-performance Mongo with replica sets.


I've definitely taken a look at ObjectRocket, and it's a really nice looking service. However, our dataset size would take us right into your custom quote plan, and based on the rate of increase between the different plans, we'd be paying significantly more w/ ObjectRocket than we would w/ DynamoDB (if we do indeed move to DynamoDB).

