WASM is too high-level for you to implement your own tail calls. The WASM virtual machine handles the call stack & function calling convention, instead of leaving them to the code itself. This means that the compiler can't implement tail calls; WASM doesn't allow a jump instruction in one function to jump into a different function.
There are a bunch of reasons why WASM was designed to be higher level than native assembly. One is that they want it to be more compact, to save bandwidth when downloading over the internet. Another is that they want it to be faster to compile down to native code; one way they do this is by making the control flow more structured, for example by forbidding irreducible control flow graphs. There is also the matter of safety. WASM defines several validation constraints that are checked before the bytecode is run, e.g. the target of a jump instruction must be listed in the labels list. If WASM were too low level, it wouldn't be possible to do the safety checks they want.
The compiler absolutely can implement tail calls, I don't know why this keeps getting thrown around. Adding a high-level directive in the spec doesn't enable the compiler to do anything, it just enforces it. The only thing preventing it is browser vendors wanting the .stack property to stay well behaved, but that isn't required by the spec and certainly isn't relevant for non-browser targets.
On most hardware architectures (and software ones based on them), the compiler controls the calling convention: how the stack is managed, what gets pushed there vs passed in registers, and so forth. The architecture may or may not have helpers like specific "return from subroutine" instructions that help manage the stack... but a lot of things are under the compiler's control.
WASM is not like that. It doesn't support jumps or that kind of manipulation. It does not expose enough control over the stack and calling conventions.
In theory, could the compiler emit a virtual machine that simulates another machine where it _can_ control these parameters? Sure. It can rewrite it all into a huge loop-and-switch statement. But that is not going to result in anything close to efficient native code. By expressing tail calls directly in WASM, the JIT can generate much more efficient code that takes advantage of the platform's native calling convention, and tail calls can be _actual_ tail calls on the hardware.
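For anyone who hasn't seen the loop-and-switch rewrite, here is a rough sketch in Python, which has no tail call elimination either and so stands in for the situation described. Two mutually recursive functions get merged into one dispatch loop, so a "tail call" becomes a state change rather than a real call. The names `run`, `EVEN`, and `ODD` are made up for illustration; this is not what any particular compiler emits.

```python
# Hypothetical example: mutually recursive is_even/is_odd merged into a single
# dispatch loop. EVEN and ODD are labels standing in for the two functions.
EVEN, ODD = 0, 1


def run(state: int, n: int) -> bool:
    """Simulate the two functions inside one loop; the stack never grows."""
    while True:
        if state == EVEN:
            if n == 0:
                return True
            state, n = ODD, n - 1   # was: return is_odd(n - 1)
        elif state == ODD:
            if n == 0:
                return False
            state, n = EVEN, n - 1  # was: return is_even(n - 1)
        else:
            raise ValueError(f"unknown state {state}")


def is_even(n: int) -> bool:
    # Entry point: start the dispatch loop in the EVEN state.
    return run(EVEN, n)
```

With this, `is_even(100000)` returns without hitting the recursion limit, but every "call" now goes through the dispatch at the top of the loop, and all the functions involved have to be fused into one body. That is roughly the cost being described.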
The compiler cannot implement tail calls correctly as it stands. You do not have access to modify the WASM call stack, and it does not live in addressable (linear) memory the way the stack does for native programs.
No compiler tricks can enable tail calls in WASM at the moment (with the exception of trampolines which always work and are absurdly slow).
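To make the trampoline point concrete, here is a minimal sketch, again in Python as a stand-in. Instead of making a tail call, each function returns a zero-argument closure describing the call, and a driver loop keeps invoking closures until it gets a plain value. The names `trampoline`, `Thunk`, and `Step` are illustrative assumptions, not taken from any real toolchain.

```python
from typing import Callable, Union

# A thunk is a zero-argument callable standing in for a pending tail call.
Thunk = Callable[[], "Step"]
Step = Union[bool, Thunk]


def is_even(n: int) -> Step:
    if n == 0:
        return True
    return lambda: is_odd(n - 1)   # defer the tail call instead of making it


def is_odd(n: int) -> Step:
    if n == 0:
        return False
    return lambda: is_even(n - 1)  # defer the tail call instead of making it


def trampoline(step: Step) -> bool:
    """Keep bouncing until a non-callable (final) value comes back."""
    while callable(step):
        step = step()
    return step
```

`trampoline(is_even(100000))` runs in constant stack depth, which is why this always works. The catch is that every hop allocates a closure, returns to the driver, and goes through an indirect call, which is where the slowness comes from.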