It’s normal for UIs to need some sort of nesting: menus, tabs (including tab panels and ribbon UIs), flexible split panes. It’s possible to do all of these without a tree, but you’ll end up doing by hand what nesting does for you, only far more painfully. Switch from tab A to tab B → hide all the A widgets, show all the B widgets. Adjust the split position → recalculate the position, width and height of every element that’s “within” it. Scroll a scrollable panel → adjust the y position of every element that’s “within” it, and probably also refresh the clip masks that every widget now needs, since scrolling makes partial occlusion possible.
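A minimal sketch of what the tree buys you, assuming a hypothetical retained-mode model (`Rect`, `Widget` and `resolve` are illustrative names, not any real toolkit's API). Each widget stores its rectangle relative to its parent, so scrolling a panel or switching tabs changes one node, and absolute positions and clip rectangles are derived by walking the tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rect:
    x: int
    y: int
    w: int
    h: int

    def intersect(self, other: "Rect") -> Optional["Rect"]:
        """Overlap of two rectangles, or None if they do not overlap."""
        x1, y1 = max(self.x, other.x), max(self.y, other.y)
        x2 = min(self.x + self.w, other.x + other.w)
        y2 = min(self.y + self.h, other.y + other.h)
        if x2 <= x1 or y2 <= y1:
            return None
        return Rect(x1, y1, x2 - x1, y2 - y1)


@dataclass
class Widget:
    name: str
    rect: Rect                      # relative to the parent, not the window
    visible: bool = True
    scroll_y: int = 0               # offset applied to this widget's children
    children: List["Widget"] = field(default_factory=list)

    def add(self, child: "Widget") -> "Widget":
        self.children.append(child)
        return child


def resolve(widget, origin_x=0, origin_y=0, clip=None, out=None):
    """Map each visible widget name to (absolute rect, clip rect)."""
    if out is None:
        out = {}
    if not widget.visible:
        return out                  # hiding a parent hides its whole subtree
    r = widget.rect
    abs_rect = Rect(origin_x + r.x, origin_y + r.y, r.w, r.h)
    own_clip = abs_rect if clip is None else abs_rect.intersect(clip)
    if own_clip is None:
        return out                  # fully clipped away by an ancestor
    out[widget.name] = (abs_rect, own_clip)
    for child in widget.children:
        # Children are placed relative to this widget, shifted by its scroll.
        resolve(child, abs_rect.x, abs_rect.y - widget.scroll_y, own_clip, out)
    return out


window = Widget("window", Rect(0, 0, 400, 300))
tab_a = window.add(Widget("tab_a", Rect(0, 30, 400, 270)))
tab_b = window.add(Widget("tab_b", Rect(0, 30, 400, 270), visible=False))
panel = tab_a.add(Widget("panel", Rect(10, 10, 200, 100)))
row = panel.add(Widget("row", Rect(0, 80, 200, 40)))

before = resolve(window)            # row's bottom 20px fall outside the panel
panel.scroll_y = 50                 # scrolling: one field changes
after_scroll = resolve(window)      # row's position and clip follow automatically
tab_a.visible, tab_b.visible = False, True   # switching tabs: two flags
after_switch = resolve(window)
```

The design choice being illustrated is that positions and clips are computed, not stored per widget, so none of the "adjust every element within it" bookkeeping from the paragraph above has to be written by hand.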
If you had to reason about your code as a tree when writing it, I think it would quickly become cumbersome.
Remember that we're talking about the API here, not the internals of the implementation.
If I had to explicitly say "tree", I agree. But if what I was writing was in fact not a tree, I think it would be substantially more difficult.
Sound theoretical underpinnings are usually necessary for a coherent system that allows for good abstractions.
Note that in code, I don't say the word "tree" while writing it, yet I create a tree with my braces (C) or indentation levels (Python) and file structure. The language designers created a tree-shaped syntax. And so on.
I guess what I'm saying is that while the tutorial should not say "tree", and I should not be thinking about tree nodes and edges while writing code, I should be writing a tree (and in code, I am).
I dare you to make a website using only CSS position: absolute. Make it work on different screen sizes. Add independently scrolling regions. It'll become hell very quickly. Nesting is used in every layout of even moderate complexity.
Win32/MFC dialog layouts are based on absolute positioning of all widgets, with a few concessions to resizability (like the ability to select whether a widget moves down or stays still when a window gets taller). This is why many dialogs, like Run or the Windows Explorer file properties dialog, cannot be resized.
Any hierarchy (group boxes holding widgets) is purely visual, and the rectangle doesn't "own" its child widgets in a programmatic sense.
Text widgets have a fixed size. If your text gets wider (from DPI scaling or longer translations), it can be cut off, or wrap onto a next line that falls outside the widget and disappears.
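A rough sketch of the flat, absolutely positioned model described above, assuming a hypothetical `Control` record and helpers (`on_resize`, `visible_text`) rather than the real Win32 API. The only resize behavior is a per-control bottom anchor, the group box is just another entry in a flat list, and a fixed-width label truncates text that no longer fits (monospaced glyphs assumed; the wrap-and-disappear case is not modeled).

```python
from dataclasses import dataclass


@dataclass
class Control:
    """A control at absolute window coordinates (hypothetical, not Win32)."""
    name: str
    x: int
    y: int
    w: int
    h: int
    # The only resize concession: keep a fixed distance to the bottom edge.
    anchor_bottom: bool = False


def on_resize(controls, old_height, new_height):
    """Move bottom-anchored controls; everything else stays where it was placed."""
    dy = new_height - old_height
    for c in controls:
        if c.anchor_bottom:
            c.y += dy


def visible_text(text, width_px, char_px, dpi_scale=1.0):
    """Characters of `text` that fit in a fixed-width label (monospace assumed)."""
    fits = int(width_px // (char_px * dpi_scale))
    return text[:fits]


# The group box is just another rectangle in a flat list; it does not own
# the controls that happen to be drawn inside it.
group = Control("group", 10, 10, 200, 80)
checkbox = Control("checkbox", 20, 30, 180, 20)
ok_button = Control("ok", 130, 250, 75, 25, anchor_bottom=True)
controls = [group, checkbox, ok_button]

on_resize(controls, old_height=300, new_height=400)

# Moving the group box leaves its visual "contents" behind.
group.x += 50
```

Every widget that should follow a container has to be moved explicitly, which is the bookkeeping a tree would otherwise do.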