Hacker News | netshade's comments

Just a note to anyone looking to go down this road: IO performance is still pretty awful, last I checked. If your workflow is IO-intensive (a non-trivial Rails app, let's say), proceed w/ caution.

https://www.reddit.com/r/bashonubuntuonwindows/comments/5ece... - My last check.


(Addendum) I did a Fast Ring update on a separate machine and did not see substantial performance improvements, but that machine isn't comparable to the one in the linked thread, so I'd be hesitant to say that performance hasn't improved at all, just that IO performance is definitely a weak point of WSL.


Yes - IO perf is not yet where we want it to be.

We have some improvements coming in Creator's Update and more substantial improvements planned in future releases, once we've completed some work with the NT kernel filesystem team.


Is this why Visual Studio is scp'ing source files around, instead of just copying the files to that /mnt/d directory and invoking the compiler there? That seemed like a big hack to me and it wasn't exactly clear why they were doing it that way (near the end, last five minutes).


Wow, that is pretty fast. I played w/ CSV parsing that takes advantage of SIMD string lookahead a while back ( https://gist.github.com/netshade/aa9e836e843c8e84b97a ) and found it to be quite fast as well. I had assumed (perhaps wrongly) that the cost of shuttling data back and forth between the CPU and the GPU would erase any performance gains, so I suspect ( it's been a while! ) that the SIMD approach would be faster than the GPU one. But tbh, after working on it for a bit and comparing it w/ mawk's (http://invisible-island.net/mawk/mawk.html) performance, mawk still beat my approach handily, and did it w/ way more functionality. Which is all to say that mawk is pretty amazing and worth checking out if you're in the market for parsing CSV fast.


Awesome, thanks very much.


Nice, thanks for this!


Would be cool if this included cstore_fdw as well ( https://github.com/citusdata/cstore_fdw ).


At this point during the beta we're not supporting cstore, but it's definitely on our roadmap for the future.


This. I randomly picked Bellingham as a quiet place to go to for a week and do some self-teaching plus vacationing.

Decent food, friendly people, nice coffee shops, great areas to go trail running in. I'm not sure I'd go there for an actual vacation, but for just getting away and trying to learn some new things, it was a great place.


There's a reason they call it "the city of subdued excitement."


I had a hell of a time getting PgPool II w/ PG Streaming Replication set up right. The trivial cases seemed to be fine, but when I started triggering failovers back and forth, I ran into a lot of cases where PgPool II would go stale due to a data file left around in /tmp.

I eventually got it working, but there were way too many informally created scripts that PgPool and PG had to know about in order to trigger failovers, initiate resyncs from WALs, etc. I didn't like it at all, and right around then AWS started offering PG on RDS, so I just moved to that.

So my advice would be: unless you've got someone on the team for whom that isn't much work, you get a lot of benefit from going w/ hosted. RDS Postgres has been pretty great - not exceptional, but for my use cases, okay. Hoping they add cross-region read replicas for PG sometime soon, as that would make a lot of expansion opportunities really easy.


I don't have enough background knowledge to ascertain if this is the correct understanding of the problem, but this explanation reads very well, and is the best 'mechanical' explanation I've seen in this thread so far.

Thank you.


Cool library - I can imagine how moving away from boxing/unboxing could be a huge boost for them.

I've been looking for something that gave SIMD intrinsics to Java programmers - does anyone know if such a thing exists? Could be a nice addition to this lib.


You can't, unless you write it as native code, put your data in direct NIO buffers, and go through the JNI dance.


Ah well, had hoped there might be an already made thing out there. Thanks!


I've been on the fence about trying this on Instrumental; we've been looking at moving to DynamoDB, but I wanted to at least see if it would give drop-in magic performance benefits, so I spun off a new SQS queue of our incoming data this evening and did a write throughput test.

These are only initial impressions, but:

* It's definitely faster. Our write behavior is largely upserts against integers and doubles, and I'm seeing roughly a 100% improvement over stock Mongo 2.4. The machine in question is an m2.2xlarge with a 1000 PIOPS EBS volume attached, and it's doing about 7000 update operations a second (safe mode).

* I'm seeing consistently lower IO utilization than stock Mongo. Stock tends to vary wildly between 200 and 750 write ops, while TokuMX sits at about 250 under sustained write traffic.

* CPU usage is pretty well balanced against all cores, as opposed to stock's behavior.

* It's too early to say whether or not the storage savings will be as good as claimed, but at this point it seems that the TokuMX reprs are about 40% of the stock reprs. Like I said, most of our data is ints and doubles tho.

VERY LARGE CAVEAT: I'm not running the database in a replica set because I'm lazy. So the write throughput numbers are likely the best-case scenario of what you'd actually be running in production.

(edit: line spacing)

If you've got to use MongoDB, it seems pretty nice. If it came with a tiny person that maintained the database for you as well, it'd be a no brainer.


Disclosure: Rackspace owns ObjectRocket and I work at Rackspace.

Have you tried ObjectRocket (if you're in US-East or US-West)? http://objectrocket.com/ High-performance Mongo with replica sets.


I've definitely taken a look at ObjectRocket, and it's a really nice looking service. However, our dataset size would take us right into your custom quote plan, and based on the rate of increase between the different plans, we'd be paying significantly more w/ ObjectRocket than we would w/ DynamoDB (if we do indeed move to DynamoDB).

