Citadel and Two Sigma both have had large teams working on this for years. They are making a lot of money doing it but barriers to entry are very high. You need to collect all the data in a way such that you can reconstruct how it looked at any point in time. The vendors can’t be trusted to not make retroactive corrections, so you have to collect it for years before it becomes useful. Doing that takes a lot of time and money.
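To make the "reconstruct how it looked at any point in time" part concrete, here is a minimal sketch of an append-only store with an as-of lookup. Everything in it (the `PointInTimeStore` class, the `record` and `as_of` methods, the `ACME` key and the numbers) is made up for illustration; it is not how any particular fund or vendor does it, just one simple way to keep a vendor's later corrections from overwriting what you originally saw.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


class PointInTimeStore:
    """Hypothetical append-only store: each value is kept with the date we recorded it."""

    def __init__(self):
        # key -> list of (recorded_on, value), kept in recording order
        self._history = defaultdict(list)

    def record(self, key, recorded_on, value):
        """Append a new observation; earlier observations are never overwritten."""
        versions = self._history[key]
        if versions and recorded_on < versions[-1][0]:
            raise ValueError("observations must be recorded in date order")
        versions.append((recorded_on, value))

    def as_of(self, key, when):
        """Return the value as it was known on `when`, or None if not yet seen."""
        versions = self._history.get(key, [])
        dates = [d for d, _ in versions]
        idx = bisect_right(dates, when)  # number of versions recorded on or before `when`
        return versions[idx - 1][1] if idx else None


store = PointInTimeStore()
key = ("ACME", "2017Q1_revenue")  # hypothetical ticker and field
store.record(key, date(2017, 4, 20), 100.0)  # what the vendor first published
store.record(key, date(2017, 6, 1), 92.0)    # the vendor's retroactive correction

print(store.as_of(key, date(2017, 5, 1)))  # 100.0, the value known before the correction
print(store.as_of(key, date(2017, 7, 1)))  # 92.0, the corrected value
print(store.as_of(key, date(2017, 1, 1)))  # None, nothing had been collected yet
```

A backtest that only ever calls `as_of` with the simulated date cannot accidentally see a correction that had not been published yet, which is the reason you have to be collecting the data yourself for years before the history is usable.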
One of the wildest things I found while working on Wall Street was that ad blockers are selling data wholesale to analysts and funds. For example, I reached out to Ghostery asking if they had data about the data analytics tech / ad tech vendors used by industry, website, etc. to estimate market share. A salesperson responded, telling me I could buy that data from them for five figures per year and that a number of my peers and clients were doing the same.
Ghostery used to belong to a company named Evidon, which had this business model of collecting data to sell it in some form (the feature was called "Ghostrank"). In 2017, Ghostery was acquired by Cliqz (which builds an independent and private search engine from Germany), and since then a few things happened: (1) the extension was open-sourced, (2) Ghostrank was completely removed and (3) Ghostery is now developing paid products as a business model. For example we launched Ghostery Insights[1] and Ghostery Midnight[2] recently, both of which are subscription-based.
Thank you for the update! I contacted Ghostery back in 2017, though I don't recall when exactly, and at that time I received a response from a salesperson. I was working on a project to try and understand market share of digital marketing software across the web at the time.
Ghostery seems to have fallen completely out of favor for this behavior, replaced by the EFF's Privacy Badger, actual ad-blockers, and even some built-in browser functionality.
It would be interesting to know which other privacy plugins sell data. Current recommendations seem to pretty unanimously favor plugins which do not, but it is always a moving target.
(1) Ghostery did not "fall completely out of favor", as far as I can see. It is true that in _some communities_ (in particular some subreddits), some people tend to recommend different addons instead (often not based on any technical arguments, though), but this is not a trend that can be generalized (more people continue to recommend Ghostery as a very solid privacy protection suite).
(2) EFF's Privacy Badger is not a replacement for Ghostery; we have written about why in the past[1][2].
(3) Ghostery has an "actual ad-blocker" built in. In fact, it is one of the most efficient out there, as was shown in a study that we published this year[3]. The adblocker as well as the benchmarks are open-source and anyone can run them locally to verify the claims. We also more recently wrote extensively about the internals of this adblocker[4].
>Citadel and Two Sigma both have had large teams working on this for years. They are making a lot of money doing it but barriers to entry are very high. You need to collect all the data in a way such that you can reconstruct how it looked at any point in time. The vendors can’t be trusted to not make retroactive corrections, so you have to collect it for years before it becomes useful. Doing that takes a lot of time and money.
As someone who works in this space, I assure you the barriers to entry are not that high. If I were to sort the various lists of accounts alphabetically, you'd see names like Citadel and Two Sigma flanked by hordes of tiny funds you've never heard of. Many of these funds you've never heard of are 5-10 person businesses or teams within larger funds. Making money is much more of an algorithms problem than a resources problem.
Can also confirm this - the research team I used to work on was just three people. We worked directly with some 30 or so funds, successfully, including Citadel and Two Sigma.
Did you find the work to be intellectually satisfying and creative, or was it somewhat rote in the sense of having to spend lots of energy on data hygiene?
I found it to be very intellectually satisfying and creative! But there were parts of the job that were also very boring, including cleaning data. Cleaning data probably took up about as much time as all the fun analysis and exploration.
No disagreement that there is a huge long tail of small successful quant groups out there.
However, specifically in the realm of sourcing and processing novel raw unstructured datasets and creating signals that other firms don't have, as far as I know the big firms (some others in addition to the two I mentioned) dominate this area, as it's too expensive and time-consuming for smaller groups to do.
Dealing with the corrections is just another product. When I worked with IBES data, which collects all these estimates, you needed to buy a package called something like "As Was" to recreate the estimates as of any arbitrary point in time.
The two funds I mentioned can afford exclusivity deals. When they expire, the rest of the street catches on, but by then they are already on to new ones. At any given time they have multiple in the pipeline.
I worked on an exclusive dataset with Citadel. It’s never as valuable as you’d expect; the last few years were a special case where most funds were lagging on the analytics. At this point it’s nearly democratized. Also, even if you know exactly how a company is doing, that doesn’t mean the stock will follow the company’s real value. The whole thing is way more complex. Two Sigma also doesn’t make as much money as they claim; they’re more of a market maker.
I can also confirm this. I used to curate data and develop equities forecasts professionally for about 30 or so funds, including Citadel and Two Sigma. It’s getting harder to build a successful trading strategy based on “quantamental” analysis alone (“alternative data”) each year.
A lot of fundamental hedge funds turned to this in the early 2010s as awareness of big data became a thing, thinking they could close the performance gap with the quant funds. It didn’t work. The quant funds that purchase this data use it as only one dimension of analysis to confirm a hypothesis which has already been empirically tested across many other inputs.
I have a specific example I can talk about, because my old firm abandoned the data: I found a reliable method for predicting exactly how many Model X and Model S vehicles Tesla sold well before earnings each quarter of 2017, including complete configuration data for each vehicle. Even with that KPI in hand, I couldn’t successfully forecast where the stock would go after each earnings call.
Of course it’s more complicated than that. The end product is more features that can go into all kinds of different trading strategies. For example, as you said, they are a big market maker: if they can more accurately predict earnings surprises with this data, then they can price options more accurately in their MM strategies. Used properly, it’s a lift to everything they do.