Folly - The Faceboook open source library

tptacek · on June 3, 2012

This is all pretty great stuff. Make sure you read through the docs; I'm unlikely to use Fb's C++ code, but I'm sure as hell going to look at making my C vector code do some of what FBVector does:

https://github.com/facebook/folly/blob/master/folly/docs/FBV...

chrisaycock · on June 3, 2012

I poured through the FBVector code because of your comment. It has some really interesting tricks. In a normal array allocation, it always goes for the next-largest-size in the jemalloc memory hierarchy. That is, it goes for multiples of 64 bytes, 256 bytes, 4 KB, or 4 MB. This makes the array cache-efficient.

The push_back() semantics features the standard array doubling technique, but only up to 4 KB; jemalloc can't grow in-place anything smaller than this, so a copy is required. Beyond that cut-off, push_back() will instead grow by 1.5 times the capacity to prevent too much "slack" memory from accumulating.

stianan · on June 3, 2012

Out of curiosity, are you unlikely to use the code because it's C++ or because it comes from Facebook?

tptacek · on June 3, 2012

tomjen3 · on June 3, 2012

I realise this is a dangerous subject, but I am genuinly curious: given the business you run and the work you do for the Freebsd team, why would you object to using c++, but not c?

I could understand why you would object to both, but why only the lower level language.

seiji · on June 3, 2012

Personally, I can't keep all of how C++ works in my brain at once. Most people end up limiting themselves to a "sane subset" of C++ to compensate for spec sprawl. But, everybody picks a different subset, so you can quickly stumble on things you think you know but don't work quite right when working on non-self-originated code.

cperciva · on June 3, 2012

given the business you run and the work you do for the Freebsd team

Are you getting confused between tptacek and me?

tptacek · on June 3, 2012

Hey, why don't you use C++? :)

cperciva · on June 3, 2012

I think Tavis put it best: C is easier to audit because it's transparent. With C++ you can have innocuous-looking source code and have the compiler doing all sorts of crazy things behind your back.

tptacek · on June 3, 2012

I don't work for the FreeBSD team.

joe_the_user · on June 3, 2012

Hmm,

I recently created a private memory allocator so the discussion on the page is somewhat interesting...

Only a tiny minority of objects are genuinely non-relocatable:

Hmm, I'm not exactly what is meant here. Moving a block of memory from here to there in the most general case will leave your pointers dangling and crash you in no short order. Things that pointed to the data you moved just don't any more. If you are disciplined and don't have raw pointers in the block moved, you're good. But as far as I know, that situation requires a very complete understanding of the data in the block you are moving. You can do that. It's just not easy or something that makes sense to stuff "anything" into.

dreiss · on June 3, 2012

If your object is in a vector, it's already possible for the vector to move the object without updating any pointers to it. The only question is whether it is safe for the vector to move the object with memmove instead of invoking a copy/move constructor, essentially skipping the step of telling the object that it is being moved. This will only fail if the object contains pointers to its own sub-objects (or the move constructor updates external pointers). This is fairly rare, especially for objects that you would store directly in a vector.

std::string is a key exception to this, but fbstring is not.

chasingtheflow · on June 4, 2012

Could you elaborate at all on what it means to put an object in a vector/why you would want to do that?

sureshv · on June 3, 2012

Edit: I think the assumption is that an object itself is relocatable inside a container like vector if it marks itself as such. The document itself doesn't actually explain the inner workings.

joe_the_user · on June 3, 2012

Yes, I think the point is that they are indeed glossing over important stuff.

shin_lao · on June 3, 2012

That's all good and well, but I'm not sure I understand what it brings compared to std::vector with a good allocator.

In addition, it does not support concurrent growth.

dreiss · on June 3, 2012

- It adjusts its growth strategy to work well with the properties of the allocator.

- It requires that its members be relocatable and uses memmove to speed up move operations.

shin_lao · on June 4, 2012

1 - Can be done from the allocator; 2 - I don't think it's useful with rvalues references. It also sounds like you want to use std::list...

luser001 · on June 2, 2012

I'm going to be taking a serious look at this for my own projects.

I like Format: https://github.com/facebook/folly/blob/master/folly/docs/For...

Histogram is going to come in handy. :)

fbstring seems to be exactly what the doctor ordered.

I like the emphasis on cooperating with the memory allocator in fbstring and fbvector. If the entire library does that, that's going to a big win for long-running programs: memory fragmentation can slowly increase program footprint, requiring the use of fancy arena allocators etc.

I had fun reading their vector doc: https://github.com/facebook/folly/blob/master/folly/docs/FBV...

I like dynamic: https://github.com/facebook/folly/blob/master/folly/docs/Dyn...

vl · on June 3, 2012

Modern heaps typically have some low-fragmentation technique built-in, for example, Windows ships with Low Fragmentation Heap, which is turned on by default since Vista.

orijing · on June 3, 2012

What is "low-fragmentation heap"? Why would anyone want "high-fragmentation heap"? (since you imply that it's an option)

vl · on June 3, 2012

Low-fragmentation heap puts object of similar size together, so once object is freed, this memory can be reused for other object of similar size without fragmentation. Because of this is has more "slack" - unused memory at the end of the objects that are smaller than their buckets. On other hand, application in steady state is not going slowly increase it's memory use over time.

Also it puts consequently allocated objects (of different size) far away (and thus reduces cache locality), which, in turn may reduce performance for some "allocate a lot of stuff at the beginning and then serve it", etc scenarios, but this is pretty esoteric problem.

Benefits outweigh the concerns, so most apps benefit from the low-fragmentation heaps.

cperciva · on June 3, 2012

Why would anyone want "high-fragmentation heap"?

For short-lived processes, it's faster and uses less memory. The code is also simpler (important if you're writing a malloc in the mid-1980s).

cpeterso · on June 3, 2012

Microsoft Low-fragmentation Heap: http://msdn.microsoft.com/en-us/library/windows/desktop/aa36...

slug · on June 3, 2012

Aren't most of these already within boost or recent C++ standard ? I just took a look at the format and dynamic cases.

shin_lao · on June 3, 2012

Some of them.

For example Boost.Spirit.Karma is an order of magnitude faster than sprintf (I don't know compared to fbformat), with the drawback of a very long compile time (see here: http://tinodidriksen.com/2010/02/07/cpp-convert-int-to-strin...).

Arena, SmallLocks and AtomicHashMap will have to challenge Intel TBB. I submit TBB will be better, but I haven't done the benchmarks.

As for the rest, some of the stuff is simply outdated.

There's better stuff than Synchronized in the C++11 standard. Foreach doesn't make sense with lambdas.

Some of the headers are just trivial and useless functions (MapUtil.h, IntrusiveList.h, Random.h...), etc.

I'm a bit disappointed.

chad_walters · on June 3, 2012

From the OP: "Our primary aim with this 'foolishness' is to create a solution that allows us to continue open sourcing parts of our stack without resorting to reinventing some of our internal wheels."

And: "we think C++ developers might find parts of this library interesting in their own right."

I think the key thing is that these are some basic pieces that a lot of FB's internal C++ code depends on. These pieces are being made available as a stepping stone to open sourcing more significant work.

tudorb · on June 3, 2012

I'm Tudor, one of the main folly authors. I wrote format, Arena / ThreadCachedArena, DiscriminatedPtr, GroupVarint, TimeoutQueue, and various other pieces (parts of String.h, ThreadLocal, etc), and I'm pretty well-versed with the entire library so I can reasonably answer questions (or poke the appropriate people to make a HN account and answer). Ask away.

chrisaycock · on June 3, 2012

Not necessarily specific to Folly, but I've wondered why vector classes don't have a "short vector optimization" like string classes do. That is, why don't they store 24 bytes or so in place? Is it because the required iterators would take-up too much space anyway?

tudorb · on June 3, 2012

folly::small_vector does just that (and it lets you use one bit for a mutex, too! -- we have a lot of memory-constrained apps so we had to design data structures for them).

We might unify that with fbvector eventually.

zem · on June 3, 2012

i just attended alexandrescu's talk on some of the optimisations that went into this [rami sayar liveblogged it here: http://ramisayar.com/fb-cpp-conf/]. very interesting and inspiring remarks on instruction-level parallelism [http://en.wikipedia.org/wiki/Instruction_level_parallelism] and ways to help your code achieve it.

cpeterso · on June 3, 2012

I'm surprised they used mixed-case file names when their class names are lowercase. You have to remember both capitalizations. Mixed-case file names can also be problematic when porting to case-insensitive file systems like Windows' or Mac OS X'.

eliben · on June 3, 2012

++ for being nicely documented

saraid216 · on June 3, 2012

I suppose I'll have to be the first to express shock that they decided to name it Folly. I feel like Poe's Law is expressing itself.

andralex · on June 3, 2012

"Folly" started as an internal codename loosely based on "F"acebook "O"pen source "LL"ibrar"Y". When the time to choose an official name arrived, we found "folly" too funny to not use.

tudorb · on June 7, 2012

Now with discussion group: https://groups.google.com/d/forum/facebook-folly

pjmlp · on June 3, 2012

Very nice addition to my C++ toolset.

bearwithclaws · on June 3, 2012

Is the 3 "o" intentional?

bwei · on June 4, 2012

Very nice.

alainbryden · on June 3, 2012

This is a conspiracy theory, but what if there are bugs in it that they can't find and so they're open sourcing it in the hope that someone else will fix it.

oconnore · on June 3, 2012

That isn't a conspiracy, it's one of the major reasons that anyone open sources anything ever.

tudorb · on June 4, 2012

Not the main purpose, obviously, but if you find any bugs, I'll be more than happy to take a look.

ifewalter · on June 3, 2012

i think you're just paranoid