The way a language server should work, imo, is it holds all the code and serves projections of the code on the fly to the editor. The user of the editor makes changes to the code and commits it back, at which point the language server parses the code and integrates it into the codebase, pretty-printing it to files as necessary for source control. But, the file layout should be an implementation detail that the editor knows nothing about.
Exactly! A paradigm shift needs to happen here, and the only project I know of that is moving in this direction is Unison.
I don't want to edit a "file", I want to edit these two functions that exist in some module(s), why can't I just see those two?
I constantly jump between many different languages and the cognitive load is noticeable: was it "!=", "/=" or "~="? Why am I writing/viewing ASCII art when I'm coding?
If I am most comfortable/fluent viewing python, why can't I view a javascript source as python?
I think the remaining challenge is working out which projections are the most useful and how we manipulate them. I have played around with these ideas and made an AST viewer for the browser where you could configure exactly how a node was represented (using CSS), with navigation done in block mode (node traversal), but I found it really hard to build an editing experience that felt smooth.
> If I am most comfortable/fluent viewing python, why can't I view a javascript source as python?
Because they are not isomorphic. At all. Even if you just consider the languages themselves, and ignore their ecosystems, which, in practice, you cannot.
I think this is a bit too simplistic of a take: there’s no reason you couldn’t have multiple syntaxes for the same AST so that people could work in the syntax they prefer (e.g. C-style, Pascal style, Indentation-sensitive, S-expressions). Textual syntax can be implemented a layer up, if the input to the interpreter/compiler is not text but a data structure. (The true meaning of what people think is the benefit of “homoiconicity”: eval consumes and produces the same types of data)
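To make the "multiple syntaxes for the same AST" idea concrete, here is a minimal sketch in Python. The node types (`Num`, `Var`, `BinOp`) and the two printers are made up for illustration and are not taken from any real language. The point is only that `evaluate` consumes the tree, so either rendering can sit on top of it.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical toy AST; the names are illustrative only.
@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, BinOp]


def to_sexpr(e: Expr) -> str:
    """Render the tree as an S-expression."""
    if isinstance(e, Num):
        return str(e.value)
    if isinstance(e, Var):
        return e.name
    return f"({e.op} {to_sexpr(e.left)} {to_sexpr(e.right)})"


def to_infix(e: Expr) -> str:
    """Render the same tree as fully parenthesized C-style infix."""
    if isinstance(e, Num):
        return str(e.value)
    if isinstance(e, Var):
        return e.name
    return f"({to_infix(e.left)} {e.op} {to_infix(e.right)})"


def evaluate(e: Expr, env: dict) -> int:
    """Semantics are defined on the tree, not on either rendering."""
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    ops = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
    return ops[e.op](evaluate(e.left, env), evaluate(e.right, env))


tree = BinOp("+", Var("x"), BinOp("*", Num(2), Var("y")))
print(to_sexpr(tree))   # (+ x (* 2 y))
print(to_infix(tree))   # (x + (2 * y))
```

A parser for each surface syntax would be the inverse of each printer; that part is left out here.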
Sure, you could do that. But how does it help you understand the code better than reading it in the original source form? Programming languages are more than syntax. If you don't understand the semantics, you don't understand the code.
Different syntaxes are better for different purposes: s-expressions are easy to manipulate structurally; significant indentation is often easier to read; etc. Semantics is important, but syntactic noise is too.
You mean different purposes by humans or different purposes by machines?
For machine manipulation, I think it makes more sense to directly manipulate the AST.
For human manipulation, I think the cognitive overhead of mentally converting between the display syntax and the canonical syntax would far outweigh any gains in readability. But maybe your workflow is different than mine. If you have a lot of custom macros in your editor, I could see s-exps being useful (although, again, I think exposing and directly manipulating the AST would be less error prone).
A programming language doesn’t need a canonical syntax if its semantics are specified in terms of the data-structures the parser produces and not in terms of the textual representation of those data structures.
My point isn’t that existing languages are designed this way, but that it wouldn’t be hard to retrofit this onto an existing language, especially one like JavaScript that already has relatively widely-used transpilers.
Meta's "Transcoders" or whatever they changed the name to demonstrate that. However, if we want perfectly semantically equivalent functions, as soon as you add two numbers, then Python -> JS is impossible. The best we can do is approximately translate the behavior.
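A small illustration of the "as soon as you add two numbers" point, written in Python only. `js_like_add` is a stand-in I made up: it assumes the translated code would use JavaScript's Number type, which is an IEEE 754 double, and emulates that with Python's `float`.

```python
def py_add(a: int, b: int) -> int:
    # Python ints are arbitrary precision, so this is exact.
    return a + b


def js_like_add(a: int, b: int) -> float:
    # Stand-in for JavaScript's Number `+`: both operands become doubles.
    return float(a) + float(b)


big = 2**53  # above this magnitude, doubles cannot represent every integer
exact = py_add(big, 1)
approx = js_like_add(big, 1)

print(exact)        # 9007199254740993
print(int(approx))  # 9007199254740992
```

A translator could target BigInt instead, but then it would need to know which values are integers, which Python source does not always tell you.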
> I don't want to edit a "file", I want to edit these two functions that exist in some module(s)
That shift happened like 20 (?) years ago. That's how Eclipse displays your Java stuff. It goes to great lengths to pretend that there aren't files. Instead there are packages.
Seeing noobies and experienced programmers struggle with it for years, my conclusion is that this is a bad idea. Most problematically, it creates "programmers" who have no idea how their project is actually organized, or how to open files that nobody from the ops department put into their editor in such a way that they can be discovered. The amount of dumb questions I had to deal with is on par with those IT stories about outrageously incompetent users pushing mouse buttons with their foot or forgetting to plug their appliance into the power supply.
In practice, the more programmers are removed from the actual thing they are programming, the worse the results, the lower the competence and the more resources are wasted. I would rather live with the downsides of poor synchronization between the language server and the files I'm editing than let the language server be in the datapath. Too much headache for very little gain.
I'd only add that well before Eclipse and its ilk, Java started down this path with the deep filesystem paths that made it painful to work with from the filesystem without the kind of multi-level collapsing GitHub does. It was a choice that pushed people towards seeing the filesystem hierarchy as a nuisance, and laid the groundwork for encouraging people to obscure it in IDEs.
The problem with the filesystem is that it privileges one organization scheme, which isn’t the best one for every editing task. This makes, for example, implementation inheritance hard because your class has a bunch of invisible code in it. But if you could expand all the superclass methods into a single view and then have edits automatically integrated into the appropriate places, this wouldn’t be as much of a problem.
Java’s filesystem hierarchy is a great example of a “fileout” format for the sort of environment I’m talking about. Another example here is Smalltalk repositories generated by Iceberg: https://github.com/pharo-vcs/iceberg
The thing is, nothing stops you from having alternative views as well, but the moment you make that expected and de facto privileged by making filesystem navigation painful, and people stop thinking about how to present the project as a whole in a narrative as a result, you tend to lose structural information that matters when trying to navigate unfamiliar code.
It’s actually the opposite: if we moved to storing source code in, say, sqlite and built tooling to make querying these databases easy, then it would become a lot easier to get a high-level understanding of a project. Especially if, in addition to the code, you stored links (e.g. from a function to the functions it calls; from a class to what it references).
I personally find Common Lisp and Clojure much easier to navigate because I can just ignore the filesystem layout and use the in-image database of code relationships to navigate.
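As a rough sketch of the sqlite idea above, here is what storing definitions plus call links could look like using only Python's standard library (`ast.unparse` needs Python 3.9+). The table names, the `SOURCE` sample and the helper functions are all hypothetical, and only direct calls to plain names are recorded.

```python
import ast
import sqlite3

SOURCE = '''
def parse(text):
    return text.split()

def check(tokens):
    return len(tokens) > 0

def run(text):
    tokens = parse(text)
    return check(tokens)
'''


def build_index(source: str) -> sqlite3.Connection:
    """Store each top-level function and its direct calls in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE definitions (name TEXT PRIMARY KEY, body TEXT)")
    db.execute("CREATE TABLE calls (caller TEXT, callee TEXT)")
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        db.execute("INSERT INTO definitions VALUES (?, ?)",
                   (node.name, ast.unparse(node)))
        for inner in ast.walk(node):
            # Method calls such as text.split() are ignored in this sketch.
            if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                db.execute("INSERT INTO calls VALUES (?, ?)",
                           (node.name, inner.func.id))
    return db


def callers_of(db: sqlite3.Connection, name: str) -> list:
    """Which stored functions call `name` directly?"""
    rows = db.execute(
        "SELECT DISTINCT caller FROM calls WHERE callee = ? ORDER BY caller",
        (name,))
    return [row[0] for row in rows]


def callees_of(db: sqlite3.Connection, name: str) -> list:
    """Which stored functions does `name` call directly?"""
    rows = db.execute(
        "SELECT DISTINCT c.callee FROM calls c "
        "JOIN definitions d ON d.name = c.callee "
        "WHERE c.caller = ? ORDER BY c.callee",
        (name,))
    return [row[0] for row in rows]


db = build_index(SOURCE)
print(callers_of(db, "parse"))  # ['run']
print(callees_of(db, "run"))    # ['check', 'parse']
```

A real system would need scoping, methods and cross-module references; this only shows that the links become plain queries once they are stored.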
I strongly disagree with this, given we have real examples of image based systems to compare with. You lose a significant amount of structural information that way.
Again, note that nothing stops you from ignoring the filesystem when navigating relationships. Nothing stops your IDE from indexing the data. Even ctags is decades old.
What the filesystem structure provides is additional context: "these things belong together for some other reason than the relationships directly expressed in code."
In a codebase where nobody bothered with that, or they've just dumped code together for superficial reasons, sure, you will gain nothing, but you also lose nothing because you can fall back to querying your IDE or whatever.
In a well written codebase, on the other hand, the structure lets you follow a narrative.
Put another way: If you need to query a database to get a high level understanding, it's a strong signal that the person who wrote the code thought nothing about communicating the architecture to you, and to me that's a warning that the code base is going to be a massive pain to work with because that tends to extend to other areas.
> note that nothing stops you from ignoring the filesystem when navigating relationships. Nothing stops your IDE from indexing the data. Even ctags is decades old
Sure, but all these systems do significantly more work than necessary (or have subtle caching issues and race conditions) because they have to be continuously reindexing an anemic model of the code base.
As far as image-based systems go, give me one of those any day: Common Lisp and Smalltalk have tooling and introspection capabilities from the future. My own experience is that I’m significantly more productive getting up to speed on a new Lisp (Common Lisp, elisp, Clojure) codebase than on any of the alternatives because the system stores so much metadata about the entities.
Also, I think you're underestimating the capabilities for forming narratives that my proposed system gives you: views, stored procedures, various tools built on things like graphviz for visualizing the structure of the code.
> Seeing noobies and experienced programmers struggle with it for years, my conclusion is that this is a bad idea. Most problematically it creates "programmers" who have no idea how their project is actually organized
The layout of files on a filesystem is not how a project is organized. The organization of a typical project is a graph that’s lossily represented by filesystem trees.
What I'm trying to say is that this approach prevents developers from effectively working with the tools their projects rely on to function.
I.e. be it Ant, Maven or Gradle, in order to carry out project-related tasks they will rely on files. They feed files to various tools, create new ones, delete or move old files, and then the deployed project needs to discover those files somewhere and so on.
When a programmer doesn't understand how what they are presented with in their editor maps to whatever any of those tools do, you get questions like: "Where is my Java home?" or "I want to debug in the testing environment, can you tell me where it is?" or "I think I've built my program, and I want to patch the existing deployment with the program I've built -- how do I find the program I've built and where is it deployed?". Not to mention more trivial stuff, like developers arguing about having / not having access to e.g. Protobuf files in their project because someone's editor doesn't have a plugin to open them and they simply don't know how to find their project directory on their computer... or trying to run a poorly written Maven build, which has some relative paths in it, from the wrong directory.
Even operators that look the same (e.g. “+”) often have different semantics between programming languages (type promotion, rounding, modular arithmetic). Translating between programming languages while maintaining the original semantics is exceedingly complex, and you might not like how the result looks. Those differences are why we have so many programming languages in the first place.
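The modulo case is easy to show. Python's `%` takes the sign of the divisor (floored division), while JavaScript's `%` on Numbers takes the sign of the dividend, like C's `fmod`. In this sketch `js_like_mod` is a made-up stand-in that uses `math.fmod` to emulate the JavaScript behaviour from Python.

```python
import math


def py_mod(a: int, b: int) -> int:
    # Python: result has the sign of the divisor.
    return a % b


def js_like_mod(a: float, b: float) -> float:
    # Stand-in for JavaScript's `%`: result has the sign of the dividend.
    return math.fmod(a, b)


print(py_mod(-7, 3))       # 2
print(js_like_mod(-7, 3))  # -1.0
```

So a faithful translation of `a % b` cannot just reuse the target language's `%`; it has to emit extra code, which is part of why translated output tends to look unpleasant.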
That's exactly how LSP is designed. The editor e.g. requests 'the user is hovering over code <here>, what should I show?' and the server responds with some Markdown text. Or 'the user wants to navigate to the thing under their cursor <here>, where should I go?' and the server responds with a file/line.
Once you try to use LSP for anything not in that form, it's not so fun.
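For readers who haven't looked at the wire format, the hover exchange described above looks roughly like this. The method name and the shape of the params follow the LSP specification as I understand it; the URI, position and Markdown text are invented for the example.

```python
import json


def frame(message: dict) -> bytes:
    """Wrap a JSON-RPC message in the Content-Length header LSP uses on the wire."""
    body = json.dumps(message).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


# Request: "the user is hovering here, what should I show?"
hover_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/hover",
    "params": {
        "textDocument": {"uri": "file:///project/src/app.py"},  # hypothetical path
        "position": {"line": 10, "character": 4},  # zero-based
    },
}

# A response a server might send back: Markdown for the editor to display.
hover_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "contents": {"kind": "markdown", "value": "`def run(text: str) -> bool`"}
    },
}

wire = frame(hover_request)
print(wire.decode("utf-8"))
```

Note that the request identifies the location by a document URI plus a line/character position.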
No, this is the opposite of how LSP is designed: LSP assumes the editor is looking at a file and this results in a complicated sync dance between editor state and file state. I want the editor to look at temporary buffers served up from the language server and have the code be “written” by sending the temporary buffer back to the language server which handles writing it out to files itself.
As far as I know, the only languages with something even vaguely like this are Pharo Smalltalk (with its git integration) and Unison.
> I want the editor to look at temporary buffers served up from the language server and have the code be “written” by sending the temporary buffer back to the language server which handles writing it out to files itself.
I don’t think any editor would want to accept this workflow. The biggest issue is that writing is reputation-critical: if the editor writes to the wrong place, or fails to write when it should have written, then users lose their work, which makes them very unhappy. So the editor has to take responsibility for doing the writing, and that is the user’s mental model when they save their work.
I want a system more like Bank Python[1] or Unison[2] or Pharo Smalltalk[3] that gets us beyond the idea of "code in files". And I have plans to build this on top of Common Lisp + SLIME
Loads of people absolutely hate having to use a weird custom IDE just to try a language.
That's why all the Smalltalk-like things are doomed to failure unless they are enforced by a platform.
The advantage here is this enables some interesting workflows, like “show me this method and every definition that overrides it in a single view” or “show me this method and all the methods it calls directly”. So you can project the interesting part of your code base into a temporary editor buffer and then edit it and the language server takes care of persistence.
Also, it ends the formatting wars because the in-repository format is disconnected from the user’s preferred format.
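Here is a toy version of the projection workflow, assuming Python source and the standard `ast` module (`ast.unparse` needs Python 3.9+). `CodeStore`, `project` and `commit` are names I made up; a real server would also handle persistence, classes, errors and so on.

```python
import ast


class CodeStore:
    """Hypothetical server-side store: one entry per top-level function."""

    def __init__(self, source: str = ""):
        self.defs = {}
        self.commit(source)

    def commit(self, buffer: str) -> None:
        """Parse an edited buffer and integrate each function it defines."""
        for node in ast.parse(buffer).body:
            if isinstance(node, ast.FunctionDef):
                self.defs[node.name] = node

    def callees(self, name: str) -> list:
        """Names of stored functions that `name` calls directly."""
        found = []
        for node in ast.walk(self.defs[name]):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in self.defs
                    and node.func.id != name
                    and node.func.id not in found):
                found.append(node.func.id)
        return found

    def project(self, name: str) -> str:
        """Serve `name` plus its direct callees as one temporary buffer."""
        names = [name] + self.callees(name)
        return "\n\n".join(ast.unparse(self.defs[n]) for n in names)


SOURCE = '''
def helper(x):
    return x+1

def other():
    return 0

def main(v):
    return helper(v) * 2
'''

store = CodeStore(SOURCE)
view = store.project("main")          # main and helper, but not other
store.commit(view.replace("x + 1", "x + 10"))  # the "save" goes back to the store
print(store.project("helper"))
```

Because the buffer is regenerated from the tree, the `x+1` in the stored source comes back as `x + 1`. Swapping in a per-user pretty-printer at that point is where the formatting argument would be settled.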
I don't use LSP for all the real-time stuff, but I do use it for other things like rename symbol, lookup documentation, jump to, etc. and just "the file you have open" isn't really enough for that in most cases. You need to know about the "file layout" or "complete project" for that sort of thing.
Last time I read the LSP specs/repos, there were 2 nuances regarding your statement:
(1) The editor reads the files, displays them, and then sends changes to the language server, which mirrors the file in memory so it can compile it, analyze it, etc. The file is not persisted to the file system at that stage (it has to work this way, because otherwise there would be no syntax highlighting while editing).
(2) There are conversations in both the LSP and editor spaces about virtual file systems. Google "language server virtual file system". The core author of the LSP spec has written an issue closely related to it.
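Point (1) can be sketched as a server-side mirror that applies `textDocument/didOpen` and `textDocument/didChange` payloads to an in-memory copy. The payload shapes follow the LSP spec as I understand it; the `Mirror` class is hypothetical, and it treats `character` as a Python string index, whereas LSP counts UTF-16 code units by default, so this is only right for simple text.

```python
def offset_of(text: str, line: int, character: int) -> int:
    """Convert a zero-based (line, character) position to an index into text."""
    lines = text.split("\n")
    return sum(len(l) + 1 for l in lines[:line]) + character


def apply_change(text: str, change: dict) -> str:
    """Apply one entry of contentChanges to the mirrored text."""
    if "range" not in change:
        return change["text"]  # full-document replacement
    start = change["range"]["start"]
    end = change["range"]["end"]
    a = offset_of(text, start["line"], start["character"])
    b = offset_of(text, end["line"], end["character"])
    return text[:a] + change["text"] + text[b:]


class Mirror:
    """In-memory copies of open documents, keyed by URI; nothing touches disk."""

    def __init__(self):
        self.docs = {}

    def did_open(self, params: dict) -> None:
        doc = params["textDocument"]
        self.docs[doc["uri"]] = doc["text"]

    def did_change(self, params: dict) -> None:
        uri = params["textDocument"]["uri"]
        for change in params["contentChanges"]:
            self.docs[uri] = apply_change(self.docs[uri], change)


URI = "file:///demo.py"  # hypothetical document
mirror = Mirror()
mirror.did_open({"textDocument": {"uri": URI, "languageId": "python",
                                  "version": 1, "text": "x = 1\ny = 2\n"}})
mirror.did_change({
    "textDocument": {"uri": URI, "version": 2},
    "contentChanges": [{
        "range": {"start": {"line": 1, "character": 4},
                  "end": {"line": 1, "character": 5}},
        "text": "42",
    }],
})
print(mirror.docs[URI])
```

The server analyzes `mirror.docs[URI]`, not whatever is on disk, which is why unsaved edits still get diagnostics.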
Imho it should be the other way around: the editor knows about file layout and everything, and the language server queries the editor when it needs contents for a specific file. Unfortunately LSP doesn't have calls to query contents.
The rationale is that only the editor knows what files are open and modified. Also, we can imagine scenarios where the editor and the language server are on different machines (e.g. GitHub Codespaces or the browser version of VS Code: the language server could be a wasm build running where the editor is running, while the editor accesses files on a remote server, or vice versa).
Well, a lot of this is that I think files are a bad way to represent code. You really want something more shaped like a database that allows for multiple views into the same code. Git would work for serialization, but the source of truth should be inside the language server, not in either the files or the repository.
This is something I have been wanting for nearly a decade. A lot of writing software isn't just implementing your logic and abstractions but actively thinking about how to organize code to fit the constraints of the filesystem. You have to actively model your modules around file paths; Rust, for example, tightly binds the use of `mod` to your layout. Refactoring is the same: a non-trivial amount of time on large projects goes into realising you need to reorganise some module hierarchy, which involves modifying the file system too.
I really dislike this. Instead of a fuzzy file finder I want a fuzzy function finder, where all functions are just kept in a database that I can pull into buffers at will, where hierarchy is based only on the logical structure of your program and the filesystem ceases to exist. "New Function" over "New File". You can get the "Fuzzy Function" finder part somewhat with LSP Symbols, but it doesn't get rid of having to think about files.
Unfortunately I don't think you can get this without first-class support by the language itself, and new languages getting critical adoption isn't a regular thing.
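The finder half on its own is not much code. Below is a sketch for Python sources using the standard `ast` module (Python 3.9+ for `ast.unparse`); the module names, sample functions and the subsequence-matching rule are all assumptions for the example. It still starts from per-module source text, so it does nothing about the "first-class support" problem.

```python
import ast


def index_functions(sources: dict) -> dict:
    """Map 'module.function' names to source text, ignoring file paths afterwards."""
    index = {}
    for module, text in sources.items():
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                index[f"{module}.{node.name}"] = ast.unparse(node)
    return index


def is_subsequence(query: str, candidate: str) -> bool:
    """True if the query's characters appear in order inside the candidate."""
    remaining = iter(candidate.lower())
    return all(ch in remaining for ch in query.lower())


def fuzzy_find(index: dict, query: str) -> list:
    """Return matching function names, shortest first."""
    matches = [name for name in index if is_subsequence(query, name)]
    return sorted(matches, key=lambda n: (len(n), n))


# Hypothetical modules standing in for a project.
sources = {
    "billing": "def compute_total(items):\n    return sum(items)\n",
    "report": ("def render_total(t):\n    return str(t)\n\n"
               "def render_header():\n    return 'h'\n"),
}

index = index_functions(sources)
print(fuzzy_find(index, "rtot"))   # ['report.render_total']
print(fuzzy_find(index, "total"))  # ['report.render_total', 'billing.compute_total']
```

Pulling a hit into a buffer is then just `index[name]`; writing it back is the hard part.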
That sounds like it would make for bad latency while editing. And what about other files in the project, other than the files of the specific programming language? The IDE needs to understand their file layout anyway, and often there are dependencies on the layout and naming of the programming-language source files. And you want to do stuff like textual search across all project files. Effectively your LSP server would have to become a full-scale remote IDE.