Because the "display" server is also responsible for window management and input. Does it have to be this way? Perhaps not, but separating window management and input from the display server is also not trivial.
Note that more complex input methods do partly bypass the display server and communicate over D-Bus instead.