So, a lot of what people think of when they think of an OS is UI-based (GUI/TUI/CLI), but kernel APIs are the real bread and butter, and this is where the UNIX philosophy really shines.
Passing a string as a file name in C++ on macOS or Linux has, in my experience, been simple. The permitted path length, in ASCII characters, is also about 4 times as long as on Windows (may God have mercy on your soul).
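For contrast, this is more or less the whole story on the POSIX side (a minimal sketch; the path is made up):

```cpp
#include <fstream>
#include <string>

int main() {
    // On Linux and macOS a path is just a byte string to the kernel, so any
    // std::string (UTF-8 or otherwise) passes straight through.
    std::string path = "/tmp/example.txt";  // hypothetical path
    std::ofstream out(path);
    out << "hello\n";
}
```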
I am not here to shit on Windows but the Windows devs clearly have a very different set of priorities (e.g. backwards compatibility) than the other breeds of modern OS devs.
I guess to a large extent we all expect to be in the browser (gross) in some number of years but Windows seems so much harder from the perspective of someone that has programmed for Unixen and studied Windows as an OS.
Macs are simple enough too if you ignore the quirks. HFS (which was never seen on a modern macOS) usually stores no information about what encoding was used for filenames. It's entirely dependent upon how the OS was configured when the file was named (although some code I've seen suggests that something in System 7 would save encoding info in the Finder info blobs). So non-Latin stuff gets mangled pretty easily if you're not careful. Filenames are pretty short: 32 bytes, minus one byte for the length, because (except for the volume name) they're Pascal strings with the length at the front.
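A rough sketch of what reading one of those names looks like (the helper is my own invention, not an Apple API):

```cpp
#include <cstdint>
#include <iostream>
#include <string>

// HFS-style Pascal string: one length byte followed by up to 31 bytes of
// name data (Str31), no encoding information anywhere.
std::string read_pstring(const uint8_t* p) {
    return std::string(reinterpret_cast<const char*>(p + 1), p[0]);
}

int main() {
    const uint8_t raw[] = {5, 'h', 'e', 'l', 'l', 'o'};  // length byte first
    std::cout << read_pstring(raw) << "\n";              // prints "hello"
}
```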
HFS+ (which is what you'll find on OSX volumes) uses UTF-16 but then mandates its own quirky normalization and either Unicode 2.1 or 3.2 decomposition depending… which can create headaches because most HFS+ volumes are case-insensitive. It's been so long since I've touched anything Cocoa, but I assume the file APIs will do the UTF-16 dance for you and the POSIX stuff is obviously OK with ASCII.
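A quick illustration of the normalization headache, assuming a UTF-8 execution charset (the default on clang/gcc):

```cpp
#include <iostream>
#include <string>

int main() {
    // "é" precomposed (U+00E9) vs decomposed ("e" + combining U+0301).
    // HFS+ stores names decomposed, so a naive byte-for-byte comparison of
    // a precomposed user string against an on-disk name fails.
    std::string precomposed = "\u00E9";   // bytes: C3 A9
    std::string decomposed  = "e\u0301";  // bytes: 65 CC 81
    std::cout << std::boolalpha << (precomposed == decomposed) << "\n";  // false
}
```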
And, of course, let's not forget the heavily leveraged resource forks. NTFS has forks too (alternate data streams), but nobody seems to use them.
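For anyone curious, a tiny sketch of the NTFS stream syntax (file and stream names here are hypothetical):

```cpp
#include <windows.h>

int main() {
    // NTFS alternate data streams use "file:stream" syntax; this attaches a
    // hidden "notes" stream to an ordinary file.
    HANDLE h = CreateFileW(L"C:\\tmp\\file.txt:notes", GENERIC_WRITE, 0,
                           nullptr, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL,
                           nullptr);
    if (h != INVALID_HANDLE_VALUE) {
        DWORD written = 0;
        WriteFile(h, "hidden", 6, &written, nullptr);
        CloseHandle(h);
    }
}
```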
APFS standardized on Unicode 9 w/ UTF-8.
CDs? Microsoft's long filenames (Joliet) use big endian UTF-16 (via ISO escape sequences that theoretically could be used to offer UTF-8 support). Which sounds crazy until you realize their relative simplicity (a duplicate directory structure) compared to the alternative Rockridge extensions which store LFNs in the file's metadata with no defined or enforced encoding. UDF? Yeah that's more or less UTF-16 as well.
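Roughly what serializing a Joliet name amounts to (my own helper, not from any mastering library):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Joliet records names as big-endian UCS-2/UTF-16, so writing a name to
// disc means emitting the high byte of each code unit first.
std::vector<uint8_t> to_utf16be(const std::u16string& name) {
    std::vector<uint8_t> out;
    out.reserve(name.size() * 2);
    for (char16_t c : name) {
        out.push_back(static_cast<uint8_t>(c >> 8));    // high byte first
        out.push_back(static_cast<uint8_t>(c & 0xFF));  // then low byte
    }
    return out;  // e.g. to_utf16be(u"AB") yields {0x00,'A',0x00,'B'}
}
```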
I think we're perhaps forgetting just how young UTF-8 is.
Thanks for the comment. HFS/HFS+ is a fascinating bit of history.
It strikes me how developer ergonomics have improved as computers have become cheaper/increased in power.
As to UTF-8, we may say it's young, but in 14 months it will be old enough to purchase and consume alcohol in the United States. From other comments it seems like Microsoft don't think the tech debt is too great so long as they have good libraries in C#.
Microsoft does write native C++ apps for Windows all the time.
First of all, games are apps; second, even if the apps unit keeps mostly ignoring WinUI/UWP (written in C++), whatever they do with Web widgets is mostly backed by C++ code, not C#.
One of the reasons VSCode is mostly usable despite being Electron is exactly the number of external processes written in C++.
Applications being written in .NET is mostly on the Azure side.
“Applications being written in .NET is mostly on the Azure side.”
You are, of course, wrong about this. Most .NET/C# code is not Azure-related (yet, anyway); it is the billions of lines of enterprise application code across businesses around the world (for me, since 2001)…
Maybe for file handling in C++, but DirectX/HLSL is the best graphics API I've worked with, and C# is easily my favorite language to develop in. It's easy for us to talk shit about Win32 today, 30 years after it was initially developed, but there are myriad historical reasons why UTF-16 is used by Java, Windows, and other languages/runtime environments, and why it's not simple to just break compatibility with decades of software running at hospitals and financial trading firms because the 32-year-old armchair experts at HN said so.
> The UCS has over 1.1 million possible code points available for use/allocation, but only the first 65,536, which is the Basic Multilingual Plane (BMP), had entered into common use before 2000. This situation began changing when the People's Republic of China (PRC) ruled in 2006 that all software sold in its jurisdiction would have to support GB 18030. This required software intended for sale in the PRC to move beyond the BMP.
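A small demonstration of why that mattered, using a non-BMP character (assumes a UTF-8 execution charset for the narrow literal):

```cpp
#include <iostream>
#include <string>

int main() {
    // U+20000 is a CJK Extension B ideograph outside the BMP, exactly the
    // kind of character GB 18030 support forces software to handle.
    std::u16string utf16 = u"\U00020000";
    std::cout << "UTF-16 code units: " << utf16.size() << "\n";  // 2 (a surrogate pair)

    std::string utf8 = "\U00020000";
    std::cout << "UTF-8 bytes: " << utf8.size() << "\n";         // 4
}
```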
True. They broke the basic Windows search functionality some time in 2007 and broke Outlook search around 2013, and neither has been fixed since.
It's not all backwards compatibility. I'm willing to bet that some (a large part?) is just sloppy software development.
SQL Server (2017?) breaks if you update it on a UTF-8 Windows because it runs a T-SQL script that doesn't work with that code page. That script is a mess: some of it is indented with tabs, some with spaces, and there's trailing whitespace. Yuck.
My hot take: Code quality is not measured by formatting issues, but by error resilience and number of actual bugs.
Much of modern linting and commit hooking is dedicated to checking whitespace placement, variable naming, and function lengths, but the well-formatted, newly rewritten code is still buggy as hell - it just looks pretty.
Formatting doesn’t remove bugs, but it’ll help you detect them. Linted code helps you scan the code faster and provides valuable pattern recognition, allowing us to detect common mistakes.
Another reason for formatting is the "minimal diff" paradigm: if a formatting rule isn't followed, the next commit touching that code will also change the formatting, causing a larger diff than necessary.
There are other reasons for simple format linting, but the reasons above are the most profound.
Lastly, formatting is part of a range of static code analysis tools. Generally, formatting inconsistencies are the easiest to detect and resolve, as opposed to more sophisticated tools.
I never understood what backward compatibility is served by the Windows API not supporting file paths longer than 260 chars. It would work just the same if you passed a short path, and no old application expects a long path anyway.
Your example isn't problematic API-wise, because CreateFileW doesn't need to care if you pass in 16 characters or 1600 - if it does, that is mostly a matter of refactoring and not inherent to how the function works. The real problem is APIs that inherently assume you pass in a buffer of at most MAX_PATH characters, because you provide a pointer but no size, and the API is expected to write to that pointer. This affects most shell32 getter functions, e.g. SHGetFolderPath (its replacement, SHGetKnownFolderPath, sidesteps this by allocating the string itself).
But for functions outside of Windows itself, this is the exact reason why the long path feature is hidden behind an opt-in flag.
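A minimal sketch of the contrast (the long path here is hypothetical):

```cpp
#include <windows.h>
#include <shlobj.h>  // SHGetFolderPathW

int main() {
    // Older shell getters take a bare pointer with no size argument; the
    // contract is simply "the buffer holds MAX_PATH characters".
    wchar_t docs[MAX_PATH];
    SHGetFolderPathW(nullptr, CSIDL_PERSONAL, nullptr, 0, docs);

    // CreateFileW itself doesn't care how long the path is: the \\?\ prefix
    // opts this one call out of the 260-character limit (to ~32K characters),
    // independent of the system-wide long-path opt-in.
    HANDLE h = CreateFileW(L"\\\\?\\C:\\some\\very\\long\\path\\file.txt",
                           GENERIC_READ, FILE_SHARE_READ, nullptr,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h != INVALID_HANDLE_VALUE) CloseHandle(h);
}
```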