
The opinion in the article misses something fundamental.

The complexity is not artificial, it is completely organic and natural.

It is incidental complexity born of decades of history, backwards compatibility, lip-service to openness, and regulatory compliance checkbox ticking. It wasn't purposefully added, it just happened.

Every large document-based application's file format is like this, no exceptions.

As a random example, Adobe Photoshop PSD files are famously horrific to parse, let alone interpret in any useful way. There are many, many other examples; I don't aim to single out any particular vendor.

All of this boils down to the simple fact that these file formats have no independent existence apart from their editor programs.

They're simply serialised application state, little better than memory-dumps. They encode every single feature the application has, directly. They must! Otherwise the feature states couldn't be saved. It's tautological. If it's in Word, Excel, PowerPoint, or any other Office app somewhere, it has to go into the files too.
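To make the "serialised application state" point concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `DocumentState`, its fields, and the JSON encoding are invented for illustration and have nothing to do with how Office actually stores anything. It only shows that when the file is a direct dump of editor state, every feature ends up as a field in the file.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DocumentState:
    # Hypothetical editor state; every field name is made up for illustration.
    text: str = ""
    compat_flags: list = field(default_factory=list)
    review_notes: list = field(default_factory=list)
    macros: dict = field(default_factory=dict)


def save(state: DocumentState) -> str:
    # The "file format" is simply whatever the application state contains.
    return json.dumps(asdict(state), sort_keys=True)


def load(blob: str) -> DocumentState:
    # Loading rebuilds the same state, so no feature can be left out of the file.
    return DocumentState(**json.loads(blob))


doc = DocumentState(text="Hello", review_notes=["check figures"])
saved = save(doc)
assert load(saved) == doc

# Even a document that uses no optional feature still carries a slot for each one.
print(sorted(json.loads(save(DocumentState()))))
```

Adding a new feature to this toy editor means adding a field to `DocumentState`, and the saved file grows with it whether or not anyone designed it to.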

There are layers and layers of this history and complex internal state that have to be represented in the file: everything from compatibility flags, OLE embedding, macros, external data sources, incremental saves, support for the quirks of legacy printers that no longer exist, CMYK, document signing, document review notes, and on and on.

No extra complexity had to be added to the OOXML file formats, that's just a reflection of the complexity of Microsoft Office applications.

Simplicity was never engineered into these file formats. If it had been, it would have been a tremendous extra effort for zero gain to Microsoft.

Don't blame Microsoft for this either, because other vendors did the exact same thing, for the exact same pragmatic reasons.





100% agree... I think most people don't get this. People whine that a program doesn't use a "standardized" (read: popularized FOSS) format, but then dismiss logical rebuttals, such as that format not supporting everything they need.

What do they expect people to do, remove features in order to support other formats? Users won't like that.


You might start with something simple and aim for simplicity. Then you need to add more features. After enough years you will have lost the simplicity, because you have that many features to support.

You might not add features, but that is most likely a losing proposition against competitors that do have them, since normal users generally want some tiny subset of features: images, tables, internal links, comments, versions.


Everyone uses 10% of the features of complex software... it's just not the same 10%, which is why the other 90% needs to be in there and included in the file formats.

It's also not sufficient to find that "perfect" lean and mean application that happens to cover precisely the 10% that you need for yourself, because now you can't interchange content with other people that need different features!

I regularly open and edit Office documents created by others that utilise features I had never even heard of. I didn't know until very recently that PowerPoint has extensive animation support, or that Excel embeds Python, or that both it and Power BI can reach out to OData API endpoints to refresh data tables or even ingest Parquet directly.

You might not need that, but the guy that prepared the report for you needed it.



