Hacker News | hrshtr's comments

+1 to the podcast. He seems like a calm guy, and it is good to hear about his thought process and the vision he has. Interestingly, he loved gaming, and he credits playing video games with helping his critical thinking.


> he credits playing video games with helping his critical thinking.

I often hear people attribute part of their success to things they already did, and I think it's usually just a combination of survivorship bias and other similar effects. I've heard people say similar things of nearly every substance or activity: "I am successful and doing X is part of the reason" where X can be anything from illegal substances to something mundane like coffee. Usually when X is something one likes, one tends to find it interesting.

This has been on my mind after recently debating with friends whether some substances like weed or shrooms should be legal and them basically saying they shouldn't because they know people who have taken them and turned their lives for the worse. Then I brought up how you can say similar things about video games (which IMO they are addicted to) and they quickly said it's different and you need to perform some impossibly complex studies to determine anything.

This is likely part of the reason echo chambers form everywhere. If ideas confirm our own beliefs, we hold them to much lower standards than when they don't.


Yup. Everything is ex-post-facto rationalization chock full of survivorship bias, with no way for you to accurately calibrate for the difference in personal and temporal circumstances and accurately map what outcome you would have if you did X.

It's not worth listening to anyone's advice. Only experiment with your own life and find what works for you or die trying.


Not everything - sometimes you can filter useful data when things are clear cut.


The above data covers only engineers hired through the TripleByte platform. I saw only a handful of open positions with Apple. The data reflects the kind of companies that use the platform to hire, which probably doesn't include high-paying companies such as Google, FB, Uber, etc.


These companies have massive hiring pipelines already.


+1 to the book. It has been a great resource for me to understand a lot of concepts in distributed systems.


That's true. I am thinking that Nazar is more like a spam filter and monitors user behavior.


Pretty much, yeah.


I was just talking with my friend about having something like this. It takes a while to get from a right swipe to booking an actual date.


Was using Thrift or Protobuf an option?


I'd like to know this too. As a passerby, those seem to have solved serialization, so I'm curious why you need rowfiles instead of e.g. protobuf.


One reason off the top of my head: using such a communication protocol would require changes to the other services consuming it.


So did switching to their homebrew serialization format -- in fact, most of the article is about how they managed the changes (which touched codebases at multiple sites in a fairly large organization).


Those switches all occurred at the pipeline level, leaving the map-reduce platform untouched. Switching our base logs to something like Parquet, Thrift or Protobuf would be a much larger project. We do support writing and reading Parquet to allow us to interface with other big data systems.


We started developing rowfiles around 2005. Thrift wasn't open sourced until 2007. I couldn't find a date for protobuf's release, but I don't think it was standard outside of Google at that time. We use protobufs internally, and have a number of Rows whose field values are byte[]s containing protobufs. One big thing our rowfiles give us is fast indexing. The only other big data format I know of that gives that is Kudu, which uses the same indexing scheme.


> Kudu

Do you mean Apache Arrow?


Nope, Kudu https://kudu.apache.org/. Although from Arrow's homepage it looks like it works with Kudu. "Apache Arrow is backed by key developers of 13 major open source projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm making it the de-facto standard for columnar in-memory analytics."


Former Kudu developer here.

Kudu was designed to be a columnar data store. Which means that if you had a schema like table PERSON(name STRING, age INT), you would store all the names together, and then all the ages together. This lets you do fancy tricks like run-length encoding on the age fields, and so forth.
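
To make the columnar idea concrete, here is a minimal Python sketch. It is not Kudu code and does not reflect Kudu's actual on-disk format; the sample rows and the helper names (`to_columns`, `rle_encode`, `rle_decode`) are made up for illustration. It only shows why keeping all the ages together makes run-length encoding pay off.

```python
from itertools import groupby

# Hypothetical rows for the PERSON(name STRING, age INT) table mentioned above.
rows = [("Ann", 30), ("Bob", 30), ("Cy", 30), ("Dee", 41), ("Eve", 41)]


def to_columns(rows):
    """Turn row tuples into one list per column (a toy columnar layout)."""
    names = [name for name, _ in rows]
    ages = [age for _, age in rows]
    return {"name": names, "age": ages}


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original list."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out


columns = to_columns(rows)
age_runs = rle_encode(columns["age"])
print(age_runs)  # [(30, 3), (41, 2)]
```

Five age values shrink to two runs here. In a row-oriented layout the ages would be interleaved with the names, so equal values would never sit next to each other.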

There is also a Kudu RPC format which uses protocol buffers in places. But Kudu also sends some data over the wire that is not encoded as protocol buffers, to avoid the significant overhead of protocol buffer serialization / deserialization.

Apache Arrow is a separate project which started taking shape later. Essentially it was conceived as a way of allowing multiple applications to share columnar data that was in memory. I remember there being a big focus on sharing memory locally in a "zero-copy" fashion by using things like memory mapping. I never actually worked on the project, though. Developers of projects like Impala, Kudu, etc. thought this was a cool idea since it would speed up their projects. I don't know how far along the implementations are, though. Certainly Kudu did not have any integration with Arrow when I worked on it, although that may have changed.
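
For anyone unfamiliar with the "zero-copy" idea, here is a rough sketch using only Python's standard library. This is not Arrow's API or file format; the file name and the helpers `write_int_column` and `sum_column_zero_copy` are hypothetical. It only illustrates reading a column of integers straight out of a memory-mapped file without first copying it into a separate buffer.

```python
import mmap
import os
import tempfile
from array import array


def write_int_column(path, values):
    """Write ints to disk as raw native C ints (a toy column file)."""
    with open(path, "wb") as f:
        array("i", values).tofile(f)


def sum_column_zero_copy(path):
    """Sum the column through a memory map instead of reading it into a list."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # memoryview exposes the mapped bytes without copying them;
            # cast("i") reinterprets those same bytes as native C ints.
            with memoryview(mm) as raw, raw.cast("i") as ints:
                return sum(ints)


with tempfile.TemporaryDirectory() as tmp:
    column_path = os.path.join(tmp, "age.col")  # hypothetical file name
    write_int_column(column_path, [30, 30, 30, 41, 41])
    total = sum_column_zero_copy(column_path)
```

Another process mapping the same file would see the same pages, which is roughly the kind of local sharing described above. A real format also has to pin down byte order, types, and null handling, all of which this sketch ignores.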

Protocol buffers is simple and has a lot of language bindings, but its performance is poor on Java. The official PB Java libraries generate a lot of temporary objects, which is a big problem in projects like HDFS. It works better on C++, but it's still not super-efficient or anything. It's... better than ASN.1, I guess? It has optional fields, and XDR didn't...


What is Thrift? Is it a service?


Thrift is a software library, not a BaaS...


In bay area there are quite a handful of such firms which fake person resume and help them find out the job. What surprises me is that employer does catch the difference in experience while interviewing and bunch of positions are filled. These firms file h1b shown more experience than what a person does actually have and make good money out of such schemes :(


Please rephrase your comment in complete, comprehensible English sentences.


"In the bay area there are quite a handful of such firms which fake a person's resume and help them get a job. What surprises me is that the employer does catch the difference in experience while interviewing and a bunch of positions are filled (anyway?). These firms file h1b applications showing more experience than the person actually has and make good money out of such schemes :("

It seemed mostly comprehensible to begin with. I think a discussion of immigrant visas is actually a really poor place to insist on only hearing from people who speak perfect English.


Congratulations to the WifiDabba team. This is great progress for the Aam Aadmi. To the founders: why did you decide to join YC?


Get into data science/engineering, reduce body fat, and be big by 10%.


I have been in the Bay Area for a little over 3 years with a fairly stable company. The opportunities are many in all technical fields (pro), but they come with cut-throat competition (con). One has to compete with people from FB/Google or new grads who have memorized all the DS questions. My reason for staying in the Bay Area is the hope that I will be able to join one of the companies that will be the Uber/Airbnb of tomorrow and gain great experience and $$$.

