I'm quite sure I've read your article before, and I've thought about this one a lot. Not so much from a Git perspective, but about textual representation still being the "golden source" for what the program is when interpreted or compiled.
Of course, text is so universal and allows for so many ways of editing that it's hard to give up. On the other hand, while text is great for input, it comes with overhead and some core issues (most are already in the article, but I'm writing them down anyway):
1. Substitutions, such as renaming a symbol, where ensuring the correctness of the operation pretty much requires having parsed the text into a graph representation first, or letting go of the guarantee of correctness and performing a plain-text search/replace (see the sketch after this list).
2. Alternative representations requiring a full and correct re-parse, such as:
- an overview of flow across functions
- viewing graph-based data structures, of which there tend to be many in a larger application
- the imports graph, and so on...
3. Querying for structurally equivalent patterns when they have multiple equivalent textual representations, with search in general being somewhat limited.
4. Merging changes and diffing come with fewer guarantees than merging graphs or trees.
5. Correctness checks that ensure the validity of the program itself, such as detecting cyclic imports, all happen at build time, unless the IDE maintains what is effectively a duplicate program graph, continuously parsed from the changes, which is still not equivalent to the eventual execution model.
6. Execution and build speed are also a permanent overhead as applications grow when using text as the source. Yes, parsing is quite fast these days and the hardware is far better, but having a correct program graph at hand is always faster than parsing, creating & verifying a new one.
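To make point 1 concrete, here's a minimal TypeScript sketch (all names are made up for illustration) of how a plain-text rename goes wrong without a parsed representation:

```ts
// Suppose we rename the module-level `count` to `total` with a plain
// text search/replace. Without scope information, the replace also
// hits the unrelated shadowing `count` below and the substring inside
// `countries`, silently changing the program's meaning.
let count = 0;

function addVisit(countries: string[]): number {
  let count = countries.length; // a different variable, shadowing the outer one
  return count;
}

// A naive rename replaces every textual occurrence:
const source = "let count = 0; let countries = [];";
const naive = source.replace(/count/g, "total");
// -> "let total = 0; let totalries = [];"  (corrupted identifier)

// Word boundaries help, but still cannot tell scopes apart;
// only a parsed graph of bindings can.
const better = source.replace(/\bcount\b/g, "total");
console.log(naive, better);
```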
I think input as text is a must-have to start with no matter what, but what if the parsing step was performed immediately, on stop symbols, rather than later, and the result merged into the program graph right away rather than during a separate build step?
Or what if there was a "staging" step? E.g. write a separate function that gets parsed into the program model immediately, then try executing it, and merge it into the main program graph later, which can perform all the necessary checks to ensure it remains valid. I think it'd be more difficult to learn, but having these operations, and a program graph as a database, would give so much when it comes to editing, verifying and maintaining more complex programs.
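A rough TypeScript sketch of that staging idea, with all types and functions hypothetical, just to pin down the shape of the workflow:

```ts
// Hypothetical shapes for a "program graph as a database" workflow.
interface FunctionNode { name: string; ast: unknown; dependencies: string[]; }
interface ProgramGraph { nodes: Map<string, FunctionNode>; }

// 1. Parse a single edited function immediately (e.g. on a stop symbol).
declare function parseFunction(source: string): FunctionNode;
// 2. Try it out in isolation against the current graph.
declare function execute(graph: ProgramGraph, fn: FunctionNode, args: unknown[]): unknown;
// 3. Merge only once validity checks (types, cyclic imports, ...) pass.
declare function checkAndMerge(graph: ProgramGraph, fn: FunctionNode): ProgramGraph | Error;

function stage(graph: ProgramGraph, source: string): ProgramGraph {
  const fn = parseFunction(source);        // staged: parsed, but not merged yet
  execute(graph, fn, []);                  // smoke-test before committing
  const merged = checkAndMerge(graph, fn); // the main graph stays valid throughout
  if (merged instanceof Error) throw merged;
  return merged;
}
```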
> what if the parsing step was performed immediately, on stop symbols, rather than later, and the result merged into the program graph right away rather than during a separate build step?
I think this is the way to go, kind of like on GitHub, where you write Markdown in the comments, but that is only used for input; after that it's merged into the system, all code-like constructs (links, references, images) are resolved, and from then on you interact with the higher-level concept (the rendered comment with links and images).
For programming languages, Unison does this - you write one function at a time in something like a REPL, and functions are saved in a content-addressed database.
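To illustrate just the content-addressed part (this is the concept, not Unison's actual implementation), a small runnable TypeScript sketch:

```ts
import { createHash } from "node:crypto";

// A function's identity is a hash of its (normalized) definition, not its
// name; names are merely human-readable aliases pointing at hashes.
const codebase = new Map<string, string>(); // hash -> definition
const names = new Map<string, string>();    // name -> hash

function save(name: string, definition: string): string {
  const hash = createHash("sha256").update(definition).digest("hex");
  codebase.set(hash, definition);
  names.set(name, hash);
  return hash;
}

// Renaming is free: only the alias changes. Callers already refer to the
// hash, so nothing needs to be re-parsed or re-checked.
save("increment", "(x) => x + 1");
names.set("inc", names.get("increment")!);
```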
> Or what if there was a "staging" step?
Yes, and I guess it'd have to go even deeper. The system should be able to represent a broken program (in an edited state), so conceptually it has to be something like a structured database for code which separates the user input from the stored semantic representation and the final program.
IDEs like IntelliJ already build a program model like this and incrementally update it as you edit; they just have to work very hard to do it, and that model is imperfect.
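A sketch of what that separation could look like, an error-tolerant node type similar in spirit to what IDE parsers keep internally (the types here are made up):

```ts
// The stored model must admit holes: a file mid-edit still maps to a tree,
// with broken regions represented explicitly instead of failing the parse
// of the whole file.
type SyntaxNode =
  | { kind: "function"; name: string; body: SyntaxNode[] }
  | { kind: "call"; target: string; args: SyntaxNode[] }
  | { kind: "error"; rawText: string }; // unparseable user input, kept verbatim

// Only fully resolved subtrees feed the final program; error nodes stay in
// the editing layer until the user fixes them.
function isExecutable(node: SyntaxNode): boolean {
  if (node.kind === "error") return false;
  if (node.kind === "call") return node.args.every(isExecutable);
  return node.body.every(isExecutable);
}
```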
There are a million issues to solve with this, though. It's a hard problem.
I think mostly because an LLM is not a "mind". I'm sure there'll be an algorithm that could be considered a "mind" in the future, but a present-day LLM is not it. Not yet.
This is, in my opinion, the greatest weakness of everything LLM-related. If I care about the application I'm writing, and I believe I should if I bother doing it at all, it seems to me that I should want to be precise and concise in describing it. In a way, the code itself serves as a verification mechanism for my thoughts and for whether I understand the domain sufficiently.
English or any other natural language can of course be concise enough, but when being brief, it leaves much to the imagination. Adding verbosity allows for greater precision, but I think that is exactly what formal languages are for, just as you said.
Although, I think it's worth contemplating whether modern programming languages/environments have been insufficient in other ways: whether they are too verbose at times, whether IDEs should be more like databases first and language parsers second, and whether we could add recommendations using far simpler but stricter patterns, given a strongly typed language.
My current gripes are auto-imports STILL not working properly in the most popular IDEs, or an IDE not finding a referenced entity in a file if it's not currently open... LLMs sometimes help with that, but they are extremely slow compared to local cache resolution.
Long term, I think more value will come from directly improving the above, but we shall see. AI will stick around too, of course, but how much relevance it'll have in 10 years' time is anybody's guess. I think it'll become a commodity, the bubble will burst, and after a while we'll only use it where sensible. At least until the next generation of AI architectures arrives.
I do like the build of MacBooks, and especially the solid casing. Unfortunately, I could never get used to macOS, even after 2.5 years, and it was not quite as reliable for me as it is for many others.
Xcode installations failing; a Docker installation failing after an OS update, never to work again without completely reinstalling the OS; plugging in headphones crashing the MacBook (until an OS update 6 months after I got it); video calls slowing to a halt when sharing the screen, etc.
There were also some things I just never got used to on the Mac, like window tabbing & minimize working the Mac way. Maybe if I hadn't been using a personal Linux laptop at the same time, I would have gotten used to it a little better, but I just plain hated the way it worked.
To be fair, I think it was still more reliable than the varieties of Windows, especially the later ones! If tabbing worked more like it does under Windows and allowed a bit more configuration, I might be using a Mac these days.
That leaves Linux. Although it's not flawless either, after configuring Debian + i3 it works exactly like I want, and the same installation has been working reliably for 5+ years. However, getting to a setup that just works certainly took several tries and depends on laptop compatibility, so... No ideal choices exist right now, I think. In the end it comes down to luck and what someone is most used to.
I’ve used Macs nearly exclusively for 13 years and have not gotten used to the window tabbing. I just fundamentally don’t think windows of the same application should be grouped together.
I gave it a try on my current codebase out of curiosity. Definitely useful. It worked well and fast, but it shows a lot of duplicates, rendered as exports, in the Node.js-modules-based codebase. I think it can sometimes be caused by me just being haphazard about re-exporting them, but other times I'm not sure.
E.g. authenticatedMenu() appears 4 times for authenticatedMenu.js; only one of them is imported by 2 different files, and 3 are just there alone. There's a single export in the file, and a number of other files import it through an index.js that re-exports several other files too.
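For context, the layout looks roughly like this (everything beyond authenticatedMenu.js and index.js is a made-up stand-in):

```ts
// authenticatedMenu.js - the single real definition
export function authenticatedMenu() { /* ... */ }

// index.js - a barrel file re-exporting several modules
export { authenticatedMenu } from "./authenticatedMenu.js";
export * from "./someOtherModule.js"; // hypothetical sibling

// consumer.js - imports via the barrel, not the defining file
import { authenticatedMenu } from "./index.js";
```

So my guess is that a graph tool treating each export site as its own node sees the same symbol at the definition, at the barrel re-export and via the `export *`, which might explain the standalone duplicates.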
In my case I think it'd help if I could disable the duplicates, as they don't really provide any useful information when exploring the codebase.
Also, if there were optionally a way to ignore the files that re-export functions/classes and collapse those paths, it'd make the graph a lot smaller and easier to understand. Maybe that's something depgraph already does, but the duplicates confuse things, so I'm not sure.
> I think it can sometimes be caused by me just being haphazard about re-exporting them, but other times I'm not sure.
I think so too. I guess that's just how your project is structured, and duplicates may be inevitable.
The graph shows exactly how the project is organized. Right - "duplicates confuse things" - this suggests that eliminating the files that re-export functions/classes, or passing an option (-i) to ignore specific paths, would help. Otherwise, this issue is noted for further analysis.
My favourite approach to documentation is the "4 kinds of documentation" framework - whether it's for an API, a library or anything else. I think it's a very clean way of explaining "good/poor" documentation.
In a nutshell, which type of documentation we need depends on the goal we have: tutorials, how-to guides, reference and explanation each serve a different one. Any API missing one of these kinds will feel like it is missing something. Ever since I read about it, I've been noticing how the documentation I like tends to have all these aspects covered.
Over a (rotary) phone with a classmate of mine, who had gotten really, really into programming, explaining how to do things in Turbo C. Turbo C had a great help system, so I mostly followed that and my classmate's instructions to make a tiny drawing program and a small RPG that rendered characters straight from a FILE* to the screen, a pixel at a time.
I didn't know how arrays or linking/including worked (or that they existed), so it was one long file, with each creature having its own function to determine its behaviour, and its own health_creature_1, health_creature_2 and so on. After a while, I really started wondering if there was a better way.