Hacker News | dons's comments

I think the point in the space Glean hits well is efficiency/latency (enough to power real time editing, like in IDE autocomplete or navigation), while having a schema and query language generic enough to do multiple languages and code-like things. You can accurately query JavaScript or Rust or PHP or Python or C++ with a common interface, which is a bit nuts :D
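
To make "common interface" concrete, here's a toy Haskell sketch of the shape of the idea (made-up types, not Glean's actual schema or API): facts from per-language indexers are normalized into one schema, and a single query then works across all of them.

    -- Made-up fact types, standing in for a real schema.
    data Language = Cpp | Python | Rust | JavaScript | Hack
      deriving (Eq, Show)

    data Fact
      = Declares   { lang :: Language, symbol :: String, file :: FilePath }
      | References { lang :: Language, symbol :: String, file :: FilePath }
      deriving (Eq, Show)

    -- One "find references" query, regardless of source language.
    referencesTo :: String -> [Fact] -> [(Language, FilePath)]
    referencesTo name facts =
      [ (lang f, file f) | f@References{} <- facts, symbol f == name ]

In Glean the facts live in a database with a real schema, and the query language (Angle) is much richer, but "one schema, many languages" is the point.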


We use this to power things like find-references or jump-to-def, "symbol search" and autocomplete, or more complicated code queries and analysis (even across languages). Imagine rich LSPs without a local checkout, web-based code queries, or seeding fuzzers and static analyzers with entry points in code.

Our focus has been on very large scale, multi-language code indexing, and then low latency (e.g. hundreds of micros) query times, to drive highly interactive developer workflows.


I'm really struggling to understand what Glean does, and why I would use it. Most important: your landing page should quickly show what Glean does beyond what a typical IDE (Visual Studio, Visual Studio Code, Eclipse, etc.) already does.

Specifically, things like "Go to definition," and tab completion have been in industry-leading IDEs for at least 20 years.

What's novel about Glean? It seems like a lot of hoops to jump through when Visual Studio (and Visual Studio Code) can index a very large codebase in a few seconds. (And don't require a server and database to do it.)

Perhaps a 20-second video (no sound) showing what Glean does that other IDEs don't will help get the message across?


> It seems like a lot of hoops to jump through when Visual Studio (and Visual Studio Code) can index a very large codebase in a few seconds.

I think you are not thinking large enough. An IDE absolutely cannot index a very large codebase and allow users to make complex queries on it. Think multiple millions of lines of code here. The use case is closer to "find me all the variables of this type or a type derived from it in all the projects at Facebook" than "go to this definition in the project I'm currently editing".


> Think multiple millions of lines of code here. The use case is closer to "find me all the variables of this type or a type derived from it in all the projects at Facebook" than "go to this definition in the project I'm currently editing".

That wasn't clear, at all.

Maybe they could just say exactly what you said?


There's large, and there's scope. I use VSCode to dabble in dozens of projects across a dozen languages at a time, often coming back to fix things after years. VSCode is great at telling me what I did in the current project, but I can't remember library calls or even syntax without looking at something I wrote before. My efficiency is perhaps 50% at recalling where to look; a tool that kept my entire corpus at my fingertips would be extremely welcome. But I'm failing to see how this is that.


If you've not had to deal with a codebase that takes VSCode longer than a few minutes to index, then you're probably outside their initial target market. If you've not had to set up a hosted code search tool (e.g. livegrep https://github.com/livegrep/livegrep ) because there's just too much code, you've been lucky. If your projects can be scoped, and don't pull in code from dozens of libraries, across dozens of teams, many of which are on different continents, you're doing a better job of organizing code than I've been able to manage.


This makes a lot of sense to me through an efficiency lens.

Facebook could spend a lot of money to get engineers beefy workstations, and then have each of these workstations clone the same repository and build the same index locally.

Or, they could leverage the custom built servers in their data centers (which are already more energy-efficient than the laptops), build a single index of the repo, and serve queries on-demand from IDEs throughout the company.

I could also see an analytics angle to this if it could incorporate history and track engineering trends over time. In my experience, decision making in engineering around codebase maintenance is usually rooted in "experience" or "educated guessing" rather than in identifying areas of high churn in the codebase or whatnot.


100% same take.

I'd add that I didn't want to click "get started" because I didn't know if it was a thing I wanted, and then "get started" actually took me to documentation, which is not what I expect from a "get started" button. The documentation presumed that I wanted to use it, and thus implied that I knew wtf "it" was.

I don't care about its efficiency, or declarative language, or any of that when I still don't know what we're talking about.


I don't know what Glean is used for, but here are some guesses for this kind of technology:

- find references / go to definition for web tools, like when reviewing pull requests

- multi-language refactoring, e.g. modifying C bindings

- building structural static analysis tools like coccinelle, or semgrep, but better


Imagine that you pulled in all your dependencies in different languages as source, plus the Windows source and the Visual Studio source. Now you want to click around in that source. That's what this tool is for.


What size codebases do you have where Visual Studio fully indexes them in a few seconds? My experience with VS on large projects is that it takes however long the project takes to compile before it's usable, and many functions (go to definition) can occasionally hit a file that needs to be reparsed and stall for minutes on end. I use VS2019 on a 32-core workstation with 128GB RAM, fwiw.


“Go to definition” has been around even longer, since at least the early 90s


I don't recall which version of Emacs first had "go to definition", but it was well before the 90s.


How easy was it in the 90s to have go-to-definition over, say, 10% of today's Maven Central?


Indexing speed probably wouldn't be a problem; the index size, though, would probably exceed the available disk space of a typical workstation of the time.


ctags came about in 1992. etags was a little bit later.

was there something before etags in emacs?


The wiki page says ctags was part of the 3BSD release, which would put it about 1979. My memory isn’t good enough to recall which release of emacs picked up support.

I think there was a find-function key binding for lisp code before ctags, but these are very rusty bits I’m recalling.


I see you support Thrift and Buck. Would you also be interested in adding Proto and Bazel support? Being able to query the code based on the build graph (sort of) would be very cool.


I briefly skimmed the docs, and they note that it doesn't store expressions from the parsed AST. Does that mean it's mostly a symbol lookup system?

When doing large-system refactoring, searching by code patterns is the number one thing I'd like to have a tool for. For example, being able to query for all the for loops in a codebase that have a call to function X within their body.
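
Something in this spirit, as a toy sketch over a made-up mini-AST (not any real tool's API), is the kind of query I mean:

    -- A toy AST, just enough to express "for loops whose body calls f".
    data Expr = Call String [Expr] | Var String | Lit Int
      deriving Show

    data Stmt
      = For Expr [Stmt]          -- condition, body
      | If Expr [Stmt] [Stmt]
      | ExprStmt Expr
      deriving Show

    -- Every statement in a tree, including nested ones.
    allStmts :: [Stmt] -> [Stmt]
    allStmts = concatMap go
      where
        go s@(For _ body) = s : allStmts body
        go s@(If _ t e)   = s : allStmts (t ++ e)
        go s              = [s]

    -- Names called as statements anywhere inside a statement.
    callsIn :: Stmt -> [String]
    callsIn s = [g | ExprStmt (Call g _) <- allStmts [s]]

    -- The query: all for loops whose body contains a call to the given function.
    forLoopsCalling :: String -> [Stmt] -> [Stmt]
    forLoopsCalling f prog =
      [ loop | loop@(For _ body) <- allStmts prog
             , f `elem` concatMap callsIn body ]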


How would it perform for, say, 500TB of source code?

And what would be the disk and memory requirements for this? Could they be distributed across a handful of servers?


I'd be surprised if this question could have an offhand answer. It doesn't sound like something whose scalability is predictable enough for back-of-the-envelope calculations.


What on earth has this much source code? Every open source project ever?


Yes, good guess! That's the size we have after deduplication across projects at https://www.softwareheritage.org/ . We archive all the source code we can find, and we'd like to support some sort of full-text search over it at some point, so Glean looks interesting.


I mean, yeah. Imagine being able to do more rich queries against GitHub.


Since this is HN, could you please share more technical/implementation details, e.g. what makes it more scalable and faster in general, and also compared to other similar engines?


Does that mean you are using the shell, or how is it used to enable these features?


Most clients hit the Glean server via the network (thrift/JSON) and then mostly via language bindings to the Glean query language, Angle. The shell is more for debugging/exploration.

Imagine an IDE plugin that queries Glean over the network for symbol information about the current file, then shows that on hover. That sort of thing.
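
From the client's point of view it's roughly this shape, sketched in Haskell with hypothetical names and a stand-in transport (not our real Thrift bindings):

    -- What a hover handler in an IDE plugin might do, with stand-in types.
    data SymbolInfo = SymbolInfo
      { symName :: String
      , defFile :: FilePath
      , defLine :: Int
      } deriving Show

    -- Stand-in for the real network transport (Thrift/JSON in Glean's case).
    type QueryServer = String -> IO [SymbolInfo]

    hoverInfo :: QueryServer -> FilePath -> Int -> IO String
    hoverInfo runQuery srcFile line = do
      -- The query string is hypothetical; real clients go through language
      -- bindings to Angle rather than raw strings.
      results <- runQuery ("symbol at " ++ srcFile ++ ":" ++ show line)
      pure $ case results of
        []      -> "no symbol here"
        (s : _) -> symName s ++ " defined at " ++ defFile s ++ ":" ++ show (defLine s)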


Alright gotcha. Thanks for the clarification.


This is really cool.

Seems like there are only indexers for Flow and Hack though.

Will there be more indexers built by Facebook, or will it rely on community contributions?


There will be more indexers: we have Python, C++/Objective-C, Rust, Java and Haskell. It's just a case of getting them ready to open source. You can see the schemas for most of these already in the repo: https://github.com/facebookincubator/Glean/tree/main/glean/s...


A bit of both I think.


Been away from Fb for a few years. How does this relate to tbgs?


Jump-to-def is nice when biggrepping a piece of code, a la what you can do with codesearch (cs.android.com).


Solving NP-hard problems efficiently.


I know that every NP problem can be written as an SMT problem, theoretically speaking. But is that actually practical?


The answer to that question is, "Yes, if your SMT solver has strategies that provide good results for the problem in question."

Unless you're implementing solvers or caught up on the particulars of a solver's implementation, the best way to gauge this is to try it.
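
To give a feel for how small an encoding can be, here's subset sum (NP-complete) in Haskell via the SBV bindings, which hand the constraints to Z3 or another SMT solver. Whether it's fast enough on your instances is exactly the "try it" part. (A minimal sketch, assuming SBV's standard sat/sBools API.)

    import Data.SBV

    -- Subset sum: is there a subset of xs that sums to target?
    -- One symbolic Bool per element decides whether it is picked.
    subsetSum :: [Integer] -> Integer -> IO SatResult
    subsetSum xs target = sat $ do
      picks <- sBools ["pick" ++ show i | i <- [1 .. length xs]]
      let total = sum [ite p (literal x) 0 | (p, x) <- zip picks xs]
      pure (total .== literal target)

    main :: IO ()
    main = subsetSum [3, 7, 12, 5, 28] 22 >>= print  -- e.g. 3 + 7 + 12 == 22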


And "eradicate" for Java... http://fbinfer.com/docs/eradicate.html

We love types. They help us ship stuff faster.


> Lumping state with some functions that implicitly have R/W access to that state is not possible in Haskell

s/not possible/possible, but highly discouraged/
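
For example, nothing stops you from bundling mutable state with functions that close over it; it's just not the idiomatic thing to reach for:

    import Data.IORef

    -- An "object": hidden mutable state plus functions with R/W access to it.
    data Counter = Counter
      { increment :: IO ()
      , current   :: IO Int
      }

    newCounter :: IO Counter
    newCounter = do
      ref <- newIORef 0
      pure (Counter { increment = modifyIORef' ref (+ 1)
                    , current   = readIORef ref })

Most Haskell code keeps the state explicit instead (plain arguments, or State/StateT), which is why it's discouraged rather than impossible.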


Facebook | C++/OCaml static analysis | | London, UK | Onsite | Full time

Work on extending the open source Infer static analysis suite's support for C++. We use Clang for the front-end and have an open role to work on this in general. The code is all OCaml. Expertise in C++ is highly desirable. The requirements are roughly C++ AND (OCaml OR Haskell OR Static Analysis OR Formal Methods).

The work will be mostly open source.

https://www.facebook.com/careers/jobs/a0I1200000LT8aAEAT/


I know this isn't relevant to your post, but do you know anything about Facebook's London Software Engineering Co-op, or could you put me in touch with someone who does?

I'm trying to clarify whether the role is full-time, and if so, whether it might be possible to create a part-time equivalent position for a student enrolled in an MRes program.


Standard Chartered | London and Singapore | Full-time and Contracting

10+ open positions for Haskell developers in the Strats team at Standard Chartered. 1 open position in the developer efficiency team working on our build system and tooling.

https://donsbot.wordpress.com/2016/06/03/multiple-haskell-de...


Standard Chartered | London and Singapore | Full-time and Contracting

10+ open positions for Haskell developers in the Strats team at Standard Chartered. 2 open positions in the developer efficiency team working on our build system and tooling.

https://donsbot.wordpress.com/2016/06/03/multiple-haskell-de...


We have a typed version now - it's awesome


Yeah, I've heard the praise. I tried to rally Gergo into porting the idea to GHC. (Using the new pluggable constraint solvers in GHC seems like the best bet.)

I still remember when I didn't have a clue about functional programming and was moving from stuff like QBasic and C to Python. I thought dicts were awesome. And they are, by comparison to only having arrays.

Relations do everything dicts do, but you don't have to decide on the structure beforehand. Exactly the same argument that Codd had.
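
A toy way to see it in Haskell: a dict commits you to one access path up front, while a relation is just a set of tuples you can filter on any column.

    import qualified Data.Map.Strict as Map
    import qualified Data.Set as Set

    -- The dict commits to name -> age; asking "who is 41?" needs another index.
    ages :: Map.Map String Int
    ages = Map.fromList [("ada", 36), ("alan", 41)]

    ageOf :: String -> Maybe Int
    ageOf name = Map.lookup name ages

    -- The relation is just tuples; query by whichever column you like.
    people :: Set.Set (String, Int)
    people = Set.fromList [("ada", 36), ("alan", 41)]

    peopleAged :: Int -> [String]
    peopleAged n = [name | (name, age) <- Set.toList people, age == n]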


Standard Chartered | ONSITE | Software Engineers (full time)

https://donsbot.wordpress.com/2016/02/25/haskell-developer-r...

7+ roles for Haskell developers in London and Singapore working on a large Haskell code base for trading and risk management. Join a growing team.


Three questions about the role:

May developers work in suitably private conditions, or is it an open-plan / trading-floor environment?

Will these roles involve working with a non-standard Haskell compiler?

What level of Haskell experience are you looking for ... what are some milestone Haskell skills that could help someone quickly identify whether they are plausibly a good fit?


>Will these roles involve working with a non-standard Haskell compiler?

They have a pretty well known internal compiler for a strict dialect of Haskell.

>What level of Haskell experience are you looking for ... what are some milestone Haskell skills that could help someone quickly identify whether they are plausibly a good fit?

They're unlikely to answer this and I don't work there, but you should probably be able to reason about things like GHC's RTS as a baseline.


It's an open plan trading floor.

It involves working with Mu, Standard Chartered's in-house compiler. Working with Mu feels very, very similar to just using GHC Haskell though.

I can't give any specific details about milestone skills, but if you've published libraries to Hackage that are used by people you've never heard of that's probably a good sign that it's worth applying.


> It's an open plan trading floor.

Do you find it challenging to "get in the zone"? I.e., there is all that fuss and noise, and you just sit quietly somewhere in the middle and enjoy programming? Do you have to wear headphones all day long?


I worked in an environment like this before and found it unsustainable. The number of defects and mistakes from everyone in the open-plan area was significant and was a major cause for re-work. Even though the company was making money, it was clear that the noisy workspace was causing them to leave a ton of money on the table, because it was simply not possible for any engineer to create things of minimally acceptable quality in that environment.

There are also concerns for engineers with misophonia (extreme physiological aversion to ambient sounds). For these engineers, though they may be able to do the job exceptionally well, the physical workspace is needlessly prohibitive, bordering on discriminatory, and the idea of using headphones would not address the underlying problem.

I feel this is one of the biggest health issues facing software engineering as a profession (whether it is applied to quantitative trading or banking or making a WordPress site). Hopefully more developers will continue to express their uncompromising need for adequate privacy and quiet in their workspace, and companies will respond by restructuring workspaces to respect these unavoidable human needs.


Don, any "similar" openings for C++ engineers? and/i.e. if there is a way to apply by directly contacting someone in a similar position like you but for "C++ team". To avoid std HR hole...

