I had never thought about this before but from this repo I can see all these smart engineers at the big tech firms live within their own walled garden (or prison?).
If it weren’t for such a vibrant open source world then they’d really be stuck if they tried to apply their knowledge outside of their companies.
It is almost the other way around. Many of these open-source projects have roots in engineers leaving these big companies and wanting to use the same tools, so they create them. Others come from these big companies releasing papers and folks then creating open-source versions. Here are some examples of the latter: Dapper (w/ OpenTracing) [1], Spanner (w/ CockroachDB) [2], Borg (w/ Mesos) [3], Dremel (w/ Drill) [4]. I'm sure there are tons more examples, but these are just off the top of my head.
Prometheus is borgmon. Which is a bit funny as it's used by k8s now. It's also funny as the author left Google, wrote it in 2 years for SoundCloud, then went back to Google.
Why is the part that k8s uses it funny? That would have been one of the goals, right? K8s is like a lot of these projects except that it was created as an open source version of borg from within Google.
It’s funny because inside Google borgmon is deprecated for new projects and has been for at least nine years. Of course Google has stretched “deprecated” to its limits.
The replacement (Monarch) is similar to borgmon except:
* All metrics have an associated type. Eg. Response time (milliseconds). That's great because units for derived metrics can be dynamically computed. Eg. Bytes/second.
* The query language can fairly efficiently compute metrics at query time rather than needing everything precomputed (eg. 95 percent latency across 1000 tasks can be calculated in real-time).
* The config system is a mess and nobody likes it. Borgmon uses a DSL which is obscure, but almost identical to Prometheus. Monarch has various different config frontends (mostly around the idea of running code to produce an expanded protobuffer config) which all suck. Luckily because there isn't a strong requirement for rules to aggregate data, you don't need much config for most services - just say "scrape everything and keep it for a year".
* There are "levels" of storage at different speeds. In memory, on disk, etc. You have to configure where to put what data. You can also downsample (eg. Change scrape interval to 5 mins after a week).
* Metric names follow a directory-like hierarchy. Since tasks can easily have 10k exported metrics, that's pretty important. No need to scrape the ones that aren't relevant.
* It has a shinier UI.
* It has support for exemplars. So to answer the question "Give me an example of a request which saw this high request latency". With not much added code to the monitored service, a small number of exemplars are captured and aggregated in a way that median and outlier exemplars are available. They're super useful for finding out the cause of random slow performance.
* It is run as a service. Rather than code that every team has to run, the new thing is a single instance for all teams in Google. That in turn means it can be more complex, have more dependencies, etc, without being a burden on the user.
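To make the query-time point in the list above concrete, here is a minimal Python sketch. It is not Monarch's or Borgmon's query language; the function names and the sample data are hypothetical. It only illustrates why a percentile across tasks needs the raw samples at query time: per-task percentiles computed in advance cannot simply be averaged into a fleet-wide one.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def query_latency_percentile(samples_by_task, pct):
    """Merge the raw samples from every task, then compute the percentile once."""
    merged = [v for samples in samples_by_task.values() for v in samples]
    return percentile(merged, pct)


# Hypothetical latency samples in milliseconds, one list per task.
samples_by_task = {
    "task-0": [10, 12, 11, 13],
    "task-1": [9, 14, 200, 12],
    "task-2": [11, 10, 15, 12],
}

# Computed at query time over all raw samples.
p95 = query_latency_percentile(samples_by_task, 95)

# Averaging precomputed per-task percentiles gives a different number.
avg_of_task_p95 = sum(percentile(s, 95) for s in samples_by_task.values()) / len(samples_by_task)
```

With this data the merged p95 is the 200 ms outlier, while the average of the per-task values is 76 ms, which is the kind of gap that precomputed aggregation rules have to be designed around.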
> The query language can fairly efficiently compute metrics at query time rather than needing everything precomputed
I'm having a hard time figuring out how a query language might affect execution time. Granted, I've never seen borgmon in action, so the first question should probably be: how different is its query language from what Prometheus offers? If it's more on the imperative side than the declarative one, then your point might actually make sense to me as it is.
> You can also downsample (eg. Change scrape interval to 5 mins after a week).
Does it allow different aggregation methods (avg/sum/max/whatever) for different metrics, like Carbon does?
I think what he is saying is that it is easier to do something ad hoc in Monarch since it is a global service with unified storage and retrieval. A Googler can join the data of Gmail and Ads easily if they want, whereas combining the borgmon data of those services is either not possible or it involves downloading CSVs and performing the join externally.
> Does it allow different aggregation methods (avg/sum/max/whatever) for different metrics, like Carbon does?
All of these methods are available[1]. Stackdriver Monitoring is the public face of Monarch.
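For readers unfamiliar with the idea, per-metric aggregation during downsampling can be sketched roughly as follows. This is a hypothetical Python illustration, not the configuration format of Monarch, Stackdriver, or Carbon; the `AGGREGATORS` table and the `downsample` function are made up for the example.

```python
from collections import defaultdict

# Hypothetical table of aggregation methods a metric could be configured with.
AGGREGATORS = {
    "avg": lambda vs: sum(vs) / len(vs),
    "sum": sum,
    "max": max,
    "min": min,
}


def downsample(points, bucket_seconds, method):
    """Collapse (timestamp, value) points into one point per time bucket."""
    aggregate = AGGREGATORS[method]
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - ts % bucket_seconds
        buckets[bucket_start].append(value)
    return [(start, aggregate(vals)) for start, vals in sorted(buckets.items())]


# Hypothetical one-minute samples, downsampled to five-minute buckets.
points = [(0, 1.0), (60, 3.0), (120, 2.0), (300, 10.0), (360, 20.0)]
as_gauge = downsample(points, 300, "avg")    # e.g. CPU usage: average is meaningful
as_counter = downsample(points, 300, "sum")  # e.g. request counts: sum is meaningful
```

The point of choosing the method per metric is that the same raw points mean different things: averaging a request count or summing a gauge would both produce misleading downsampled series.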
No, I think it's strictly about where aggregations and other computations occur. You hint at that when you mention CSV files. Borgmon keeps X hours of data in memory and looks up anything older from TSDB. I never understood how to aggregate data from the same service, after the fact, in a reliable fashion. Or if it was possible at all. I believe it was only working when looking at data in memory. That's why people had rules to aggregate data at multiple levels as it was being ingested, then persisting that into TSDB.
I don't remember the latter being much more sophisticated than a bunch of files in GFS with the timeseries and some metadata, either. Monarch, I think, is based on Bigtable. It has a richer API. It might even use BT coprocessors to perform computations closer to where the raw data is stored, rather than in the Monarch frontends. (I haven't watched the public talk, so I'm just going from vague recollection.)
I am pretty sure that was developed for Monarch. Perhaps later it was adopted by BM to consolidate storage, but still through a dumb API? I know that originally it had a bunch of files in GFS, with a layer of indirection, because there was a tool to fix them.
Importantly, Monarch is push-based and centralized. Previously, product teams would have to run their own borgmen, and those in turn would get scraped by the upstream borgmen of their orga for aggregation, archiving etc. Monarch is more of an As A Service offering.
The fact that Monarch configs can be written in Python instead of Borgmon is a huge win for our team; being able to write and debug our own alerting rather than have to bug SREs every time has been worth the switch alone.
Monarch, like Borg, is configured by RPC. You can use python if that suits you but you can also use C++ or Java or Go or any language capable of putting an encoded protobuf on the wire. This is a mistake people also make about Borg: there is borgcfg but borgcfg is not Borg's API. You can use the Borg RPC interface from any language and borgcfg is optional.
Compare to borgmon where use of the DSL is obligatory.
A close source told me, only half joking, that at Google, when you start any new service, you’ve got the choice between about a dozen backing services, of which half are deprecated and the other half are not officially supported yet...
The actual proverb, by an engineer (Paul C), is "there are two ways of doing things at Google: the deprecated one, and the one that doesn't work yet". Usually the deprecation occurred when a service outlived the original requirements and was no longer a great fit or not very usable. Good examples were Babysitter and GWQ, which were eventually obsoleted by Borg.
Fwiw, I like Borgmon, but I also like APL derivatives :). My sadness is that Borgmon’s type system is quite poor, but like many “product of necessity” systems it actually worked incredibly well.
A lot more than just ex-FAANG folks would be stuck without the OSS technologies listed in the right column. I daresay, considering much of this software was authored by Xooglers or inspired by Google papers, that it's actually everyone else who would be stuck. The folks who have seen how these problems can be solved at scale would just reimplement the solutions, as indeed they have done.
With Microsoft, this has changed substantially in the past decade, and there are many teams running entirely on public online services (GitHub, Azure DevOps, Travis etc) and with end-to-end open source tooling and dependencies. So it's much less of a technological bubble than it used to be. Probably less so than Google at this point. Can't say about Amazon.
They don’t, and if you’ve got the G-factor, pun intended, to get an offer at Google you can get an offer anywhere. Which, honestly, makes this repository all the more confusing to me.
I’m sure there are some people that have lived only within this ecosystem, but that’s possibly the height of privilege.
Eh you'd be surprised, I couldn't even get an interview with some places, got denied from others, but now work at Google doing full stack. Every technology I use though is Google internal, and I don't have much knowledge of full stack development outside of Google. Some companies just test your algo / ds / system design knowledge, and sure I do fine there, but on any specific knowledge I'm useless. TBH I think I'm not very good at any one language either because I often have to switch.
This is pretty misleading: a lot of the presented open source equivalents are nothing like the original. For example, comparing HDFS to Colossus is... optimistic? Having used / worked on/with both, the similarity is roughly that they both store bytes.
This would be a lot less misleading if there were annotations and if the "open source/real world" didn't mix together other large companies' products with open source. For example protobuf is open source, but this makes it look like it isn't.
If you needed something Colossus-like in the open source world, what would you use? Is there anything available that's better than HDFS for the "general-purpose, big, distributed file system" use-case?
Ceph. Although it's slightly upside down with its block / objects semantics (and as such, you don't exactly get the namespaced append-only semantics of Colossus), and doesn't depend on an excellent separate BigTable-like and Chubby-like layer underneath, instead rolling their own.
Sourcegraph CEO here. Cool to see we are listed as a CodeSearch alternative. We assembled some links to Google internal docs and studies about how they use their Google internal CodeSearch tool. See the links at the top of https://docs.sourcegraph.com/user/search if interested.
What does Google (or others) use for an intranet or team documentation? We're using Quip and it is just a black hole where documents go to die, never to be seen again no matter how badly you want to find them.
For many years there was also a wiki that just refused to die. I think that when I left there was yet another attempt going on to migrate the content away from it. Surprisingly, one of the main appeals was that its availability was independent of Google's production environment (Borg, GFS and co.). If Docs or Sites were down, which was slightly more likely inside Google because most employees were on dogfood builds, it was going to be a fun day...
Amazon had a giant internal wiki that they used for team information, playbooks, design docs, etc. Formatting was kind of ugly, but the barrier to making an edit was super low, and having everything in one searchable place was super useful. From experience, I think any system that requires more than a few minutes to make an edit ends up not being updated and going stale pretty quickly.
The challenge with Confluence - and all documentation systems for that matter - is that it needs a manager, someone who makes sure it's maintained, old / outdated pages get removed, the landing page is neat, and authors are encouraged to keep it updated.
I can imagine that can quickly become a fulltime job though. Still, a formal role of documenter / archivist in a bigger software endeavour should be considered.
Challenge with Confluence for me in particular is that it overrides common keybindings, e.g. C-b in a textfield inserts a pair of asterisks instead of moving the caret backwards one character.
It used to be Google Sites (then part of Google Apps for Business which is now GSuite), Docs and Slides which were indexed by an internal Google search engine called MOMA (which I believe was powered by the Google Search Appliance, but my recollection on this last point is faint)
Dory is available now! If you go into presenter view in slides there is an option to launch it. It will put a short URL at the top of your presentation.
It’s not the same thing. Slides Q&A is a very poor implementation, and isn’t persistent. Doesn’t work while someone is not presenting. Not to mention, you don’t always need Slides to have people ask questions.
An equivalent of this for Microsoft is pretty much impossible because there isn't one single way of doing any of these things at the company. Different departments/orgs/teams use different tools, some self-built and some off-the-shelf (including most of the stuff on this list).
I can't speak to Amazon, but at Microsoft the equivalents to the Google offering are going to have the same OSS competitors. Having worked on services there, the only thing I really miss is Kusto; it's now available as an Azure service, but I'm at a company all in on AWS :(
Not sure why you think free food is traded for free time. Free food helps me use my time at work more effectively. I usually have breakfast and lunch with teammates, and it's a mix of socialization and work-related conversations. Half an hour in the free cafe > 45-90 minutes going someplace to buy food.
Personal preference obviously but I like getting away from work for lunch. After eating if it's nice out I'll actually just sit myself on a bench and chill. 15, 30 minutes doing that is worth so much more to me than leaving work that much earlier.
If your income as a Google engineer is so tight that you have to make sure you get the most out of breakfast and dinner, go for it (but also maybe have a long hard think). The culture may well vary between sites, but this idea that Google (and others) exclusively employ fresh college grads and try to embrace them 24x7 just isn't true. We have families just like anyone else, and they matter to us every bit as much as everyone else's families matter to them.
I for my part am happy that I get a healthy, hot meal for lunch so dinner can be a quick snack when I get home, so I have more time with the kids. The alternative is more time spent on chores.
The fact that I don't have to exert ANY energy to decide on where to go for lunch vs. working downtown and having to decide between the same 5 places or spending 20 minutes pulling out and back into parking garages is very nice, and I'm nowhere near a recent college grad. Dinner's also nice for those of us without anyone to go home to eat with.
I absolutely hate the "What's for lunch today?" shuffle.
If you can eat faster, you get back to work faster. An early breakfast or tasty late dinner at work is a temptation to spend more time there.
I had a quick google and I’m not alone in my opinion. I’m not criticising the practice, it makes business sense and employees can choose not to use it.
That is mostly the food part though. I don't think anyone is questioning whether having a quality cafeteria at work is a good idea. It is the incentive to stay at work, especially dinner which can make working late become not working late enough. I am not sure what you mean by the cost? I can't imagine people would care about that.
Free food was originally introduced to Google in part because there were so few food options near the Mountain View campus that people were driving a half hour each way to spend twenty minutes eating. So yes, the goal was to get people to spend more time at work, but the alternative was “time spent sitting in traffic” rather than “time spent at home”.
(The other major driver behind food was supposedly the fact that the founders were PhD students and had found that some of their best ideas had arisen from informal discussions with other researchers over lunch, so they wanted to create an environment where employees had that same opportunity)
I don’t work for Google but my employer serves food. This is not true in my experience. Folks finish dinner at 6 and gtfo. Or they pack the food in containers on their way out and eat at home. Having the option of eating lunch within the office means I can spend less time on it and go home sooner. Same productivity, but less time spent at work.
But it's not free time at home. It's time cooking and meal planning and grocery shopping. And maybe that's with family and friends, but for many people it's probably done alone.
I feel like this might depend on the culture, and how late you usually eat dinner, etc.
Because well, kids usually don't tend to cook for themselves so the parents end up cooking for them and eating with them.
That might be my bubble though, my friends / family have jobs that allow them to leave work around 5-ish and with school ending somewhere between 4-5, it's not that much after the kids are home. (or they pick up the kids after school).
Kids definitely can and do cook for themselves? I've also known families who would make and freeze dinners for the week so the kids just had to reheat them on weekdays. Or one parent gets home earlier and has dinner with the kids, but the other one will be later.
Most software companies in the Bay Area have flexible time. Just like everywhere else, Google SWEs don't punch a clock. You go home when you feel you've done as much as you can for the day.
Many places allow you to set your own hours when things are good. The real question is something like: if you get in early to beat traffic and something happens in the afternoon, can you still leave without being considered "not a team player" for not "just fixing that thing we need for the meeting tomorrow"? Is the same true if you didn't get in early?
Not many things are so urgent they can't wait until the following day. But if something like that did come up, you could just leave at your regular time and continue working at home to fix the problem.
I'm jealous; my current job is billed by the hour, so I have to be able to describe every hour I've worked. Mind you that's usually a fixed 8 hour / day logged on project X, so having to write them down isn't a problem. But I can't go home early without feeling guilt; I mean I can, but that also means I have to do overtime sometime else in the month, to make sure the hours all check out at the end of the month.
It is nice to know you're being taken care of in the (hopefully rare) cases you do have to work late. And there's probably people working in ops that have an evening shift.
Googler here, can confirm. I work around 40 hours a week, sometimes more (as in 42 or 43, not 60) but more often less. Work-life balance is part of the culture.
We (Sourcegraph) are interested in building an alternative to Critique at some point in the future, or at least enhancing existing code review tools to offer many of the favorite features of Critique.
What are your favorite/must-have Critique features?
(Also would love to jump on a video chat if you or anyone else is interested, and we could live-stream/share it on YouTube for others interested. Email me at sqs@sourcegraph.com if interested.)
GitHub actually does some of these things already. Do these solutions help/suffice?
- Batching all comments in a single send: This is a core GitHub code review feature. You can send immediately by pressing the "Add single comment" button, but if you "Start a review", your comments are batched.
- Built-in fixes and lints on the PR "Files changed" tab: GitHub Checks offer this, although not many teams have adopted it (that I'm aware of).
- Changing indentation: The "Diff settings" menu has a "Hide whitespace changes" option, although it's not (afaik) possible to persist this.
- Auto-submit if tests pass: The Refined GitHub browser extension offers this (https://github.com/sindresorhus/refined-github), kind of. You need to keep the page open in your browser while it finishes.
Also, have you seen https://reviewable.io/ (made by an Xoogler) and does that appeal to you?
I'm totally biased, but I think Gitpod might be the closest thing to Cider here, being a hosted IDE + container solution that continuously prebuilds your workspaces along with your code.
The other options are just IDEs in a browser where you still have to hammer together your own setup + build manually every time.
Hmm, I was looking at Gitpod's pricing page and I do not at all like that clicking on the "Gitpod for teams" link takes me immediately to github to auth my account, uh no thanks.
Hi Josh, very sorry about the unfortunate surprise, we're aware of this and agree it's not nice.
We couldn't address it in time for the launch, so basically all the website's pricing options currently point directly to where you can buy them in the app, which implies a GitHub sign-in (where only a valid email is requested, NB).
We're planning to add un-authed pages with more info about each plan, but we also prefer spending our time on improving the developer experience rather than on payment options, so it may take a bit of time.
I've tried Eclipse Che (listed in the article), and it's pretty good. I'm looking forward to Che switching its editor to Eclipse Theia, which promises to be "the VS Code experience in the browser".
I recently discovered Coder's open source version of their web-based IDE (https://github.com/codercom/code-server). It's pretty much VS Code in the browser, which is almost exactly what I was looking for. You just download the binary, point it to a folder and it works exactly the way you would expect (including the terminal running under the account the server runs as so you get to keep your shell of choice and all that fun stuff).
Now with that AWS lock-in, no thanks. It only serves as a reminder of how early adopters get burned, similar though not as bad as the whole FoundationDB->Apple fiasco; that one affected me personally and drove home this lesson.
I don't enjoy being the insect who claimed a tall blade of grass only to be mowed down first. If you value your time and value stability, don't use closed-core tools.
Where AWS/GitHub is mentioned, there should be the equivalent Azure stuff, especially for bug tracking, CI/CD, repo subfolder "owners", and code review. There's a ton of stuff equivalent to or better than buganizer, the OWNERS file, etc.
In services there should be vitess since it's both a YouTube and an open source project.
It seems to have a new transactional messaging feature with message sending and Ack: https://vitess.io/docs/advanced/messaging/, which seems to fill a slightly different niche than rabbitmq/pubsub/etc.
There is no listed equivalent of RecordIO. What do people use for high-reliability journals?
When I needed something like RecordIO to store market data, I couldn't find anything. So I implemented https://github.com/romkatv/ChunkIO. I later learned of https://github.com/google/riegeli (work in progress), which could've saved me a lot of time if only I had found it earlier. I think my ChunkIO is better though.
I suppose you mean "exactly" in a figurative way. Riegeli is definitely inspired by RecordIO and is meant as a successor to it but it's not RecordIO.
> Is there a reason that doesn't meet your requirements?
I need to store timeseries with fast lookup by timestamp. Riegeli doesn't support this out of the box. If I had discovered it before I built ChunkIO, I probably would've pulled the low-level code out of it and added timeseries support on top. Or maybe not. Reliability is very important to me and it's risky to use work-in-progress software that may or may not have any production footprint (I'm no longer with Google so I don't know if they use it internally.)
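As an illustration of what "fast lookup by timestamp" can mean on top of an otherwise linear file, here is a small Python sketch. It is not how ChunkIO or Riegeli work; the `TimestampIndex` class and the block offsets are hypothetical. The assumption is that records are written in timestamp order and grouped into blocks, with a small sorted index of each block's first timestamp, so a reader can binary-search to the right block instead of scanning from the start.

```python
import bisect


class TimestampIndex:
    """Hypothetical sparse index: first timestamp of each block -> byte offset of that block."""

    def __init__(self):
        self._timestamps = []
        self._offsets = []

    def add_block(self, first_timestamp, offset):
        # Blocks must arrive in timestamp order so the list stays sorted.
        if self._timestamps and first_timestamp < self._timestamps[-1]:
            raise ValueError("blocks must be added in timestamp order")
        self._timestamps.append(first_timestamp)
        self._offsets.append(offset)

    def seek(self, timestamp):
        """Return the offset of the last block that starts at or before timestamp."""
        if not self._offsets:
            return None
        i = bisect.bisect_right(self._timestamps, timestamp) - 1
        # A timestamp earlier than every block falls back to the first block.
        return self._offsets[max(i, 0)]


index = TimestampIndex()
index.add_block(100, 0)
index.add_block(200, 4096)
index.add_block(300, 8192)
offset = index.seek(250)
```

A reader would then open the file at `offset` and scan forward within that block, so the cost is a binary search plus one block rather than the whole file.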
I don't understand. RecordIO doesn't support lookup of any kind; it is a linear format. The interface of Riegeli looks to me exactly like the interface to RecordIO. All they've done is removed support for Google's abstract File* storage interface so it can be used by the public.
What you are describing sounds like SSTable. Perhaps you could benefit from LevelDB.
This format looks somewhat underpowered. If one record is corrupted, there is no way to read anything after it. For the same reason there is no lookup/sharding support, such as finding the first record that starts in the second half of the file. If a writer crashes, a new instance of writer cannot append to an existing file without reading its whole content and truncating on the last readable record.
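The failure mode described above can be reproduced with a deliberately naive record format. The Python sketch below is a hypothetical length-prefixed layout with a CRC32 per record; it is not the actual RecordIO, Riegeli, or ChunkIO format. Because the only way to find record N+1 is to trust the length field of record N, a single damaged header makes everything after it unreachable.

```python
import io
import struct
import zlib

# Hypothetical header: payload length, then CRC32 of the payload (little-endian).
HEADER = struct.Struct("<II")


def write_records(stream, records):
    """Append each payload as header + bytes, with no sync markers or block boundaries."""
    for payload in records:
        stream.write(HEADER.pack(len(payload), zlib.crc32(payload)))
        stream.write(payload)


def read_records(stream):
    """Yield payloads until end of stream or the first damaged record."""
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            return
        length, checksum = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) < length or zlib.crc32(payload) != checksum:
            return  # the next record boundary is unknown, so stop here
        yield payload


buf = io.BytesIO()
write_records(buf, [b"alpha", b"beta", b"gamma"])

# Flip one byte in the length field of the second record.
data = bytearray(buf.getvalue())
data[HEADER.size + len(b"alpha")] ^= 0xFF
recovered = list(read_records(io.BytesIO(bytes(data))))
```

Here `recovered` contains only the first record even though the third one is intact on disk. Formats that want to survive this usually add something the reader can resynchronize on, such as fixed-size block boundaries or marker bytes, which is also what makes seeking to the middle of a file possible.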
Well I guess I’m “that guy” who finds it highly interesting as an outsider with a very high interest in search. Never knew what tools Google’s using (never bothered to look it up either) and this list is a feast of “Aha!”’s for me :)
Hopefully that answers your question ;-) Same as with a discussion about patents the other day. It’s fascinating to learn / read / understand how big businesses work as someone who’s never worked at one nor aspires to work at one, but does want to build one ;-)
Just winging it here, but maybe because these are some of the most influential companies on the face of the Earth, they employ tens of thousands of HN users, they are the envy of maybe half of all employees on the planet, their programming systems affect everyone reading HN, and many people here are genuinely interested in what happens at ginormous corporations that employ other hackers.
How is the age relevant? I'd guess that the number of engineers and the size of the infrastructure used are far more relevant to guessing how interesting things are.
I still have used an editor from a company far older than Google, and would love to learn what life was like on a Symbolics system or the dirty tricks of obsolete networks.
Thanks for sharing the repo!