Dataset of VCs investing in seed and Series A+ rounds (unicorn-nest.com)
207 points by scherbak on April 29, 2020 | 45 comments



Judging by the 100% match rate with Crunchbase I'm guessing this is an analysis of CB data. It's still valuable, but I'm curious if/how they managed to secure a license to redistribute this publicly. I've spoken to their sales team before and they're very particular about how you can export & share their data.


Can't be used for analytical purposes... Welp looks like no one should even download or look at it. You could violate the ToS by looking at the data and drawing conclusions!


... whoops

    df['Some of TOP industries'].value_counts()[:10]

    Health Care, Biotechnology          129
    Medical, Health Care                104
    Biotechnology, Health Care           99
    Software, Information Technology     90
    Health Care, Medical                 89
    Pharmaceutical, Biotechnology        85
    Medical, Biotechnology               81
    Medical Device, Health Care          74
    Health Care, Medical Device          74
    Therapeutics, Biotechnology          73
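The value_counts() output above treats each comma-separated pair as one value, so "Health Care, Biotechnology" and "Biotechnology, Health Care" are counted separately. A minimal sketch of splitting the pairs into individual tags follows. It uses only the ten rows shown above, so the totals are partial; the names `pair_counts` and `count_tags` are made up for illustration and are not part of the dataset or the original snippet.

```python
from collections import Counter

# Rows copied from the value_counts() output above: (field value, count).
# Only the top 10 rows are available here, so the totals are partial.
pair_counts = [
    ("Health Care, Biotechnology", 129),
    ("Medical, Health Care", 104),
    ("Biotechnology, Health Care", 99),
    ("Software, Information Technology", 90),
    ("Health Care, Medical", 89),
    ("Pharmaceutical, Biotechnology", 85),
    ("Medical, Biotechnology", 81),
    ("Medical Device, Health Care", 74),
    ("Health Care, Medical Device", 74),
    ("Therapeutics, Biotechnology", 73),
]


def count_tags(rows):
    """Sum counts per individual tag, ignoring the order inside each pair."""
    totals = Counter()
    for value, n in rows:
        for tag in value.split(","):
            totals[tag.strip()] += n
    return totals


tag_totals = count_tags(pair_counts)
for tag, n in tag_totals.most_common(3):
    print(tag, n)
```

On the full column the same idea would presumably be expressed with pandas string splitting, but that depends on how the CSV actually encodes the field.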


Feels like someone in legal wanted that written in without appreciating that you don’t release datasets just to look pretty. You release datasets so people can look for insights.

Or was PR the point?


Looks to me like it is basically a sitemap to their website with some additional info sprinkled in.

I'm not really involved much in VC, so I don't know how useful and reliable that information is or how hard it would be to come by otherwise.


I had the exact same question.

Edit: and judging by the site's robots.txt you could just scrape the information yourself and avoid any ToS problems.
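For anyone who wants to check what a robots.txt actually permits before scraping, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt content, the bot name, and the example.com URLs are all hypothetical placeholders, not the real file from the site discussed here. Note that robots.txt only expresses crawling preferences; whether it has any bearing on terms of service is a separate question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# This is NOT the actual file served by any site in this thread.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "example-bot" and the example.com URLs are placeholders.
print(is_allowed(SAMPLE_ROBOTS_TXT, "example-bot", "https://example.com/investors/1"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "example-bot", "https://example.com/private/x"))
```

To check a live site you would fetch its robots.txt first (for example with `set_url` and `read` on the parser) instead of passing a literal string.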


The practice of stirring up controversy to get attention, especially in online venues, and viewing it as a positive rather than a negative, is something I learned a lot about from "Trust Me I'm Lying: Confessions of a Media Manipulator."

If this popped up after the Disney tweet made it to the top of Hacker News and went viral on Twitter, then I suspect this was entirely intentional.


Yeah the terms make no sense.


> Can't be used for analytical purposes

> You could violate the ToS by looking at the data and drawing conclusions!

What does that say about VCs...


> Shall not be used for any scientific or academic research, in commerce, for analytical purposes, for or any mailout or information distribution purposes, as well as for any illegal purpose

Well that's a bummer


Not to worry. It says "shall not" instead of "must not."


RFC 2119:

> MUST NOT This phrase, or the phrase "SHALL NOT", mean that the definition is an absolute prohibition of the specification.

https://tools.ietf.org/html/rfc2119


RFCs aren't legal documents. See https://feltg.com/shall-will-may-or-must/ "The only word of obligation from the list above is must – and therefore, the only term connoting strict prohibition is must not. The interpretation of everything else is up for debate."


Yeah, this is a hard no from me.

It also provides me with little confidence that the data itself is particularly useful, since the terms basically prevent anyone from publicly doing any kind of cross-validation. Scientific peer review didn't appear as some sort of bureaucratic oversight, it appeared because people value information that other experts have reviewed and said "Yeah, that checks out." By even trying to prevent people from doing that, I think the value of this data set is drastically reduced.


Thanks for pointing this out. We actually meant that any use for personal purposes is allowed. It's not as strict as it sounds in the ToS.


Today we published the database of funds for free. This is currently the largest, most complete and most relevant dataset in the world. It contains more than 500,000 data cells about 26,000+ venture funds and more than 30,000 employees who make investment decisions, including the rules of their email formation (we cannot share emails directly because of GDPR). We will update investor profiles on our website on a weekly basis, and this database on a quarterly basis.


Does analyzing the data and incorporating into slide decks for class material constitute "academic research"? I'm trying to stay within the terms of use.


No, that's fine. We don't consider this an academic purpose.


Incredible. What led your company to divulge this valuable data?


We don't see any intrinsic value in owning the dataset. We are working on a tool that optimizes the search of investors and saves dozens of hours. We are happy to share the dataset with those entrepreneurs who know what they need and what type of an investor they are searching for.


What is the reason to ban academic research?


If I want to publish a public visual on some of the most common words in the field "Some of TOP industries" - does this fall broadly under analytical purposes? I'm not sure I even understand the purpose of releasing the data if it's not to be analyzed. In a commercial manner I understand...


We meant to prohibit use in a commercial manner. We will have to adjust the terms accordingly. Thanks for pointing this out.


Is there not massive irony in publishing something "very valuable" that is then restricted from any practical use? It seems to echo the bizarre and tortuous legal world these funds, and their people, live in...


wget https://backend.unicorn-nest.com/investor/csv

They purport to assert some undefined kind of intellectual property rights over the data using a clickwrap contract. I modified the DOM to say "Disagree" before clicking. In the US, I'd think Feist would protect you from any copyright claims on the data, although I haven't looked at it in detail, so it might be creative.


"Your honor, I simply modified the DOM to my liking and now claim ownership of all copyrighted material everywhere!"


That does seem to be the reasoning behind clickwrap contracts in general, yes — that by someone clicking on a button or opening an envelope or whatever, they are signaling their agreement to whatever contract terms are in the DOM or written on the paper or whatever. It's absurd, but if you accept it, it seems that you would have to accept that the fact that the button I clicked said "Disagree" means that clicking on it did not signal any such agreement.


I think this is cool. Not going to bother anyone just yet. For any VCs reading this, do you like this sort of thing or is it intrusive? Feels like your inbox would get powerbombed with shitty dealflow.


Most shitty deal flow gets ignored and it doesn’t serve founders well. The best approach is to spend time to research the firm and the partner you’re looking to seek investment from. You’ll get results when what you’re building matches what the partner’s thesis or perspective is.


I agree. It just seems like basically giving out their email addresses is a bridge too far, even though it's not that hard to figure them out.

What if this data was clustered not just by top industries invested in, but by thesis and ethos?


Yes agreed. That would be very helpful indeed. Even more so, which partners invested in which companies would also be good.


Hit Agree, but no idea if 3 is true. If they scraped it, then they do not own it.

1. Is for information use only

2. Shall not be used for any scientific or academic research, in commerce, for analytical purposes, for or any mailout or information distribution purposes, as well as for any illegal purpose

3. Is the property of Unicorn Nest, all rights reserved.


I used to find the reviews on thefunded interesting (especially the companies that tried to get them suppressed). Anything recent like that?


Very useful. Big job. Thanks Denis


Saving this as a binary Excel file (.xlsb) speeds the spreadsheet up a bit.


From the ToS it seems to belong to Disney


WOW! Thanks for this!


I bet this dataset will be much more popular in investors' networks.


You guys rock! Thanks


For those who want to use this dataset for analytical purposes or research, use this link:

https://backend.unicorn-nest.com/investor/csv

No terms are required to be agreed to if downloading directly.


Wait, are you an owner of this?

> No terms are required to be agreed to if downloading directly.

That’s not how terms work... Downloading something from a deep link doesn’t mean it’s automatically WTFPL. (Unless you’re an owner and you say so, of course.)


Not the owner.

Those terms are only presented, with a prompt to agree, if you go through the landing page. This provides a direct link. Follow local laws.


This doesn't seem to hold up at all. What about bots that scrape a page? They don't see the EULA but can still violate licenses.


While I think that you are legally correct, grabbing that dataset to play with doesn't seem to carry a high realistic risk of punishment. I may be wrong, but that's my sense of things.


The landing page link allows for playing around as long as it doesn't fall under research (my guess is that you cannot cite the data). You cannot republish it either.

But the data are facts and can be used. The presentation falls under copyright. Not sure where this leaves you or us legally. In the end it's a public link, containing public data with the organization of the document being copyrightable. The author has another link somewhere else with terms excluding activities.

Definitely a gray area. Getting punished is unlikely. Being fully in the clear if someone has an ax to grind is also unlikely.

Probably best to just ignore in the end.



