SEEKING FREELANCERS - MoonlightWork.com is seeking talented freelance developers around the world. Set your own rates, work with amazing companies that value moonlighting.


I'm very excited to see this -- data portability and management is a primary struggle we're trying to map out. Would love to see an engineering post on what you did for ElasticSearch.


Can do. A lot of the magic happens in the es/indexer and search lambdas here: https://github.com/quiltdata/quilt/tree/master/lambdas.

The short of what we do: we listen for bucket notifications in Lambda, open the object metadata and send it, along with a snippet of the file contents, to ElasticSearch for indexing. ElasticSearch mappings are a bit of a bear and we had to lock those down to get them to behave well.
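For anyone curious before a fuller write-up, here's a minimal sketch of that flow (placeholder endpoint, snippet size, and index names; the real handlers and mapping work live in the lambdas repo linked above):

    import json
    import urllib.parse

    import boto3
    from elasticsearch import Elasticsearch  # elasticsearch-py client

    s3 = boto3.client("s3")
    es = Elasticsearch("https://search-domain.example.com")  # placeholder endpoint

    SNIPPET_BYTES = 4096  # only index a small preview of the object body


    def handler(event, context):
        """Triggered by an S3 bucket notification; indexes object metadata plus a snippet."""
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

            head = s3.head_object(Bucket=bucket, Key=key)
            body = s3.get_object(
                Bucket=bucket, Key=key, Range=f"bytes=0-{SNIPPET_BYTES - 1}"
            )["Body"].read()

            doc = {
                "key": key,
                "size": head["ContentLength"],
                "last_modified": head["LastModified"].isoformat(),
                "content_type": head.get("ContentType"),
                "user_meta": head.get("Metadata", {}),
                "snippet": body.decode("utf-8", errors="replace"),
            }
            es.index(index=bucket, id=f"{bucket}/{key}", body=doc)
        return {"statusCode": 200}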

What are the big barriers you're bumping into on the data management and portability side of things?


Seems like it'd be more elegant (and probably cost effective) if you stored the Lucene indexes inside the buckets themselves.


That is an interesting idea. What kind of performance could we expect, especially in the federated case of searching multiple buckets? Elastic has sub-second latency (at the cost of running dedicated containers).


That's a bit of an open question right now, unfortunately. Using S3 to store Lucene indexes was a roll-your-own thing the last time I checked, and the implementation I wrote currently deals with smaller indexes, where files can be pulled fully to disk as needed. S3 does support range requests, which I'd think would mimic random access well enough.

Assuming whatever ElasticSearch deployment you're using is backed by SSDs, there'd likely be more latency with S3, but I'd expect it to scale pretty well. Internally, a Lucene index is an array of immutable, self-contained segment files, each storing the index for a subset of documents. Searching multiple indices is pretty much just searching through all of their segments, which can be as parallel as you want it to be.
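To make that concrete, here's a rough sketch (Python/boto3, with made-up segment names and a stand-in for real segment parsing) of range-reading segment files out of S3 and fanning out across them in parallel:

    from concurrent.futures import ThreadPoolExecutor

    import boto3

    s3 = boto3.client("s3")


    def read_range(bucket, key, start, length):
        """Fetch a byte range of an object -- the S3 analogue of a random-access read."""
        end = start + length - 1
        resp = s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
        return resp["Body"].read()


    def search_segment(bucket, segment_key, query):
        """Stand-in for 'search one immutable Lucene segment'; a real reader would
        parse the segment's term dictionary and postings via read_range()."""
        header = read_range(bucket, segment_key, 0, 1024)
        return (segment_key, query.encode() in header)  # placeholder matching


    def search_buckets(buckets_to_segments, query):
        """Fan out across every segment in every bucket; segments are self-contained,
        so each lookup is independent and trivially parallel."""
        with ThreadPoolExecutor(max_workers=16) as pool:
            futures = [
                pool.submit(search_segment, bucket, seg, query)
                for bucket, segments in buckets_to_segments.items()
                for seg in segments
            ]
            return [f.result() for f in futures]

A real implementation would of course parse the Lucene segment format instead of scanning raw bytes; the point is just that each segment is independent, so the lookups parallelize cleanly across segments and buckets.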

To be honest, I'm actually surprised the Elasticsearch company doesn't offer this as an option. Maybe because they sell hardware at markup?


No.


Yes; unfortunately QA was used here, when QA doesn't capture what we do. Probably my fault in the interview.


Thanks much -- this is a company, product, and project where we have the liberty and passion to say that quality is what we care about most.

Also: I'm lame, I went with "We're the Hotmail of Food Delivery." Still not sure how that deck worked out ;-) /s


Definitely a concern within the business -- we're trying to solve this through a combination of tooling and consistently assigning reviewers to the same projects. Within tooling, we use analysis to pair a project with developers who have relevant expertise (if you've built a middleware stack in Python ten times, it becomes easier to gain context and identify issues).
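As a toy illustration of that kind of pairing (purely hypothetical scoring, nothing like our production logic), think of it as ranking reviewers by how much their history overlaps a project's stack:

    from collections import Counter


    def expertise_score(reviewer_history, project_tags):
        """Toy scoring: count how often the reviewer has worked with each of the
        project's technologies; repetition raises the score."""
        seen = Counter(reviewer_history)
        return sum(seen[tag] for tag in project_tags)


    def rank_reviewers(reviewers, project_tags):
        """Return reviewers ordered by overlap with the project's stack."""
        return sorted(
            reviewers,
            key=lambda r: expertise_score(r["history"], project_tags),
            reverse=True,
        )


    # Hypothetical data, for illustration only.
    reviewers = [
        {"name": "a", "history": ["python", "django", "python", "flask"]},
        {"name": "b", "history": ["erlang", "elixir", "otp"]},
    ]
    print(rank_reviewers(reviewers, ["python", "middleware"]))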

Totally true. In places like mobile/React, or projects where teams are unlikely to have significant internal expertise despite having the technology in production (think a midsized game studio running an Erlang server for chat), providing value is easy. We believe our approach is working, and it will continue to get better.

(On a personal level -- huge fan of your work!)


Thanks for replying directly! Btw, what I said is definitely not a limit for your business IMHO, because even if you continue to have a high rate of false negatives, many businesses may find it valuable to catch some bugs using your service: lowering the number of imperfections can already be good enough. After all, no code review is able to spot every bug. It's just that in complex software with many moving parts, one should understand that probably two kinds of reviews are needed: more "local" reviews that can still spot certain bugs, and other conceptual reviews made by people who are very expert in the code base. Moreover, I think that for large customers, you may even offer a product that involves developers becoming experts in a single large customer's code base to provide in-depth reviews. Good luck!


Yes; one of the reasons I created the company was to give flexibility to people I've seen drop out of the industry for one reason or another (be it having kids, illness, etc.). It's a real shame when life circumstances mean that years of incredible experience go away.


We rate reviews both internally and with customers -- we also establish corpus details with customers during the onboarding process. On the whole, we believe it's valuable to stay within the structure a team has already established and nudge things forward rather than making wholesale, disruptive changes.


Ah! "Bot" in this case is the label applied to any application within GitHub.


Okay, I didn't realise that, thanks. So it's humans. Well, yes, in that case I fully agree with others' comments about sustainability and scalability.


Disclosure: I'm Lyal Avery, founder of PullRequest.

In functional terms, we have a couple of thousand reviewers who have signed up to review. Our tooling helps them work faster; long term, we'll also provide this tooling to internal teams.

So far, we have about a dozen of the Fortune 500 signed up; larger teams, for a variety of reasons, are more open in some cases (one of the many learnings of this startup).

Edit: signed not signing.


First off, I wish you all the best.

"we have about a dozen of the Fortune 500 signing up"

Let's be honest. What this means is that someone in the Fortune 500 clicked on your email advertisement. Maybe installed your software.

Do you have anyone in Fortune 500 running critical business on your platform?

> we have a couple of thousand reviewers that have signed up to review.

Anyone can sign up a couple thousand "Mechanical Turk" reviewers in no time. Or "RentACoder" or whatever. What is the value provided? Where I work, the value of "review" provided by someone not deeply involved in the code is "0". Which is why you ask some relevant people to review the code.


Yes; we most definitely have teams from Fortune 500s running mission-critical code through our platform. We will be releasing a case study from one in the next quarter.

I would say the difference on the reviewer side of our platform is in quality. We are working to surface this more.

