Hacker News

Quote tweets I'd do as a reference and they'd basically have the cost of loading 2 tweets instead of one, so increasing the delivery rate by the fraction of tweets that are quote tweets.
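A back-of-the-envelope sketch of that arithmetic. The function name and both inputs are placeholder assumptions for illustration, not measured Twitter numbers:

```python
def effective_delivery_rate(base_rate: float, quote_fraction: float) -> float:
    """Tweets loaded per second when every quote tweet also loads the tweet it references.

    base_rate      -- tweets delivered per second before counting quoted tweets (assumed)
    quote_fraction -- share of delivered tweets that are quote tweets, in [0, 1] (assumed)
    """
    if not 0.0 <= quote_fraction <= 1.0:
        raise ValueError("quote_fraction must be between 0 and 1")
    # Each quote tweet costs one extra tweet load, so the rate grows by that fraction.
    return base_rate * (1.0 + quote_fraction)


# Hypothetical inputs: 1M deliveries/sec, 5% of them quote tweets.
rate = effective_delivery_rate(base_rate=1_000_000, quote_fraction=0.05)
print(f"{rate:,.0f} tweet loads per second")
```

This only counts one level of quoting; a quoted tweet that itself quotes another is assumed to be shown without expanding further.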

Hashtags are a search feature and basically need the same posting lists as for search, but if you only support hashtags the posting lists are smaller. I already have an estimate saying probably search wouldn't fit. But I think hashtag-only search might fit, mainly because my impression is people doing hashtag searches are a small fraction of traffic nowadays so the main cost is disk, not sure though.

I did run the post by 5 ex-Twitter engineers and none of them said any of my estimates were super wrong; they mainly brought up additional features and things I didn't discuss (which I edited into the post before publishing). It's still possible that they just didn't divulge, or didn't know, some number that I estimated very wrong.



I think the difficult part would be tagging and indexing the relationship between a single tweet and all of its component hashtags (which you would then likely want precomputed metrics on, to avoid having to count index entries on every query, etc.); that is where it would really start to inflate.

Another poster dug into some implementation details that I'm not going to go into. I think you could shoehorn it into an extremely large server alongside the rest of your project, but then the processing overhead and capacity management around the indexes themselves start to take up a more substantial share of processing power. Consider that for each tweet you need to break out what hashtags are in it, create records, and update indexes, and there are often several hashtags in a given tweet.
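To make the per-tweet work concrete, here is a minimal in-memory sketch of breaking out hashtags and updating posting lists. The `HashtagIndex` class, the regex, and the lowercasing are all assumptions for illustration and say nothing about how Twitter actually stores this:

```python
import re
from collections import defaultdict

# Simplified hashtag pattern (assumption): '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text: str) -> set[str]:
    """Return the distinct hashtags in a tweet, lowercased and without the '#'."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


class HashtagIndex:
    """Toy inverted index: hashtag -> list of tweet ids (a posting list)."""

    def __init__(self) -> None:
        self.postings: dict[str, list[int]] = defaultdict(list)

    def add_tweet(self, tweet_id: int, text: str) -> int:
        """Index one tweet and return how many posting lists were touched."""
        tags = extract_hashtags(text)
        for tag in tags:
            self.postings[tag].append(tweet_id)  # one index write per distinct tag
        return len(tags)

    def lookup(self, tag: str) -> list[int]:
        """Return a copy of the tweet ids for a tag, in insertion order."""
        return list(self.postings.get(tag.lstrip("#").lower(), []))


index = HashtagIndex()
writes = index.add_tweet(1, "Shipping today #Rust #rust #db")
writes += index.add_tweet(2, "More #DB talk")
print(writes, index.lookup("#db"))
```

The point is the write amplification: every tweet costs one posting-list update per distinct hashtag, on top of storing the tweet itself.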

When I last ran analytics on the firehose data (ca. 2015/16) I saw something like 20% of all tweets had 3 or more hashtags. I only remember this fact because I built a demo around doing that kind of analytics. That may have changed over time, obviously, but without that kind of information we don't have a good guesstimate even of what storage and index management there looks like. I'd be curious if the former Twitter engineers you polled worked on the data storage side of things. Coming at it from the other end of things, I've met more than a few application engineers who genuinely have no clue how much work a DBA (or equivalent) does to get things stored and indexed well and responsively.


Twitter has full-text search, not just hashtags.

Also, the big data storage isn't text, it's images and videos.


You’re missing metadata in your size estimates.


[flagged]


I’m not sure why you are asking him to do the thing you’ve literally quoted him doing?


I believed he was saying the author should run his idea of indexes on disk taking up a lot of space by the engineers.


If an inspector reviews your house and finds no issues, that is indeed evidence of absence.


This is critically wrong, and misses the point of the cliché entirely.

Absence of evidence, in your case via a clean building inspection, does not mean the building is safe. It just means the checklist of known items was considered and nothing bad found.

Ask a building inspector if their clean report proves nothing is wrong with the building.

They will be firm and quick to inform you that it’s not a warranty — anything not checked was not covered. Items not covered could still be significant problems.

That’s the whole point of the saying. Absence of evidence is not evidence of absence.


I believe you have conflated "proving a negative" with "evidence of absence".


Sure, but it’s not definitive evidence.


But evidence is not necessarily proof.


Sure, but if someone accuses your house of having issues, and you retort that you've had it inspected by professionals, a reply of "Hah! That's evidence, not proof!" is just a bit smarmy.


A few weeks ago there was an incident[0] in Jersey, where some people called fire fighters one evening because they could smell gas, the fire fighters didn’t find any leaks, and the building literally blew up the next morning. Experts make mistakes, and failing to understand that evidence != proof can literally kill people. Sometimes, making the distinction is smarmy; other times, it’s just being sensible.

0. https://news.sky.com/story/amp/jersey-tower-explosion-questi...


Okay, but... we're spitballing database sizes. None of this is safety critical, or even in the general neighborhood of things where it's important enough to go and mathematically prove that our numbers are perfect.


Not necessarily. The inspector could be corrupt, bad at their job, whatever.


>Absence of evidence is not evidence of absence.

>possible that they just didn't divulge


I don’t think that hashtags are a search only feature. In the posts themselves, the hashtags are clickable to view other tweets. I don’t think that qualifies as a search.


It does strike me as a feature you'd typically serve out of some sort of search index since if you had to build search, you'd essentially get indexing of hashtags "for free"


You are probably right and I am wrong. I just looked at a tweet, and clicking the hashtag takes you to the search page with that hashtag typed in. It's probably implemented similarly behind the scenes, though a hashtag search most likely does an exact match instead of the fuzzy search used for regular words and phrases.


It does case-insensitive matching (#hashTag === #hashtag === #HashTag) too.
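A minimal sketch of what that implies for an exact-match lookup key. `hashtag_key` is a hypothetical helper, and using `str.casefold` is an assumption about the normalization, not something Twitter documents here:

```python
def hashtag_key(tag: str) -> str:
    """Normalize a hashtag to the key used for exact-match lookup."""
    # Drop the leading '#' and fold case so all spellings share one key.
    return tag.lstrip("#").casefold()


variants = ["#hashTag", "#hashtag", "#HashTag"]
keys = {hashtag_key(v) for v in variants}
print(keys)
```

All three spellings collapse to a single key, so they would share one posting list rather than three.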



