IIRC, the stated (or assumed? I forget) rationale is that there's no future-compatible mechanism for functions to save/restore the full vector register width (see footnote 1). And the ABI designers were unwilling to declare that only the baseline 128 bits, or just the low scalar element (64 bits), of a few registers were call-preserved, with any future upper parts call-clobbered.
You're right that AVX-512 was an opportunity to improve the situation, e.g. by defining XMM28..31 as call-preserved. (Scalar code often benefits from having one or two FP variables stay in registers, especially across calls to functions, including math-library functions. For example, see the slowdown in a case where a hand-written asm version of a function can't be inlined, but a plain-C function using sqrt can.)
Yes, this is fairly poor design, and it causes spill/reload slowdowns in loops that contain function calls and (often scalar) FP math, sometimes even putting store-forwarding latency on the critical path. For example, a loop involving log(), or even worse a cheap function like sqrt() if you don't compile with -fno-math-errno, in which case GCC can only inline it speculatively (with a fallback call to the library function so errno can be set).
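A minimal sketch of the problem, assuming GCC or Clang targeting x86-64 System V (Linux/macOS). `scale()` and `sum_scaled()` are hypothetical names; `scale()` just stands in for an out-of-line function like `log()`. Compiling with `gcc -O2 -S` and reading the loop should typically show `sum` being kept in a stack slot and reloaded/stored around each `call`, because no XMM register survives the call. Under Windows x64 the compiler could instead keep `sum` in one of XMM6..XMM15.

```c
/* Hypothetical stand-in for an out-of-line library function such as log(). */
__attribute__((noinline)) static double scale(double x)
{
    return x * 0.5;
}

/* Calling through a volatile function pointer stops the compiler from
 * inlining scale() or using interprocedural register allocation (-fipa-ra),
 * so the caller must assume the plain x86-64 System V convention:
 * every XMM register is call-clobbered. */
static double (*volatile scale_fn)(double) = scale;

/* 'sum' is live across every call. With no call-preserved XMM registers,
 * the compiler typically keeps it in memory and reloads/stores it around
 * each call, putting store-forwarding latency on the loop-carried chain. */
double sum_scaled(int n)
{
    double sum = 0.0;
    for (int i = 1; i <= n; i++)
        sum += scale_fn((double)i);
    return sum;
}
```

The integer loop counter `i` doesn't have this problem: SysV does have call-preserved integer registers (e.g. RBX), so it can stay in a register across the call.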
Footnote 1: xsave/xrstor and friends are usable from user-space, but that's not efficient or practical for ordinary functions. And you pass a mask (in EDX:EAX) selecting which state components to save, while the OS has to enable each new component in XCR0, so the OS needs to know about every extension to the architectural state it saves. So even that doesn't solve the problem of old libraries or other binaries saving/restoring wider registers.
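To illustrate why "just save the whole thing" doesn't future-proof anything, here is a sketch that asks CPUID leaf 0xD (sub-leaf 0) how big the XSAVE area is. It assumes GCC or Clang's `<cpuid.h>` (`__get_cpuid_count`) on an x86 target; `query_xsave_sizes` and `struct xsave_sizes` are hypothetical names. The size depends on which components the OS enabled in XCR0 and grows as new extensions appear (AVX-512 alone adds several components), so code compiled today can't hard-code it.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

struct xsave_sizes {
    uint32_t enabled_bytes;   /* CPUID.(0xD,0):EBX, for components enabled in XCR0 */
    uint32_t supported_bytes; /* CPUID.(0xD,0):ECX, if every supported component were enabled */
};

/* Query CPUID leaf 0xD, sub-leaf 0. Both sizes stay 0 if the leaf is
 * unavailable, or on non-x86 targets where <cpuid.h> doesn't exist. */
struct xsave_sizes query_xsave_sizes(void)
{
    struct xsave_sizes s = {0, 0};
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid_count(0xD, 0, &eax, &ebx, &ecx, &edx)) {
        s.enabled_bytes = ebx;
        s.supported_bytes = ecx;
    }
#endif
    return s;
}
```

On a CPU with XSAVE, `enabled_bytes` is at least 576 (the 512-byte legacy FXSAVE region plus the 64-byte XSAVE header), and the exact number varies between machines and OS configurations.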
What's the advantage of having nonvolatile registers in a calling convention? - Windows x64 has 10 call-preserved XMM registers (XMM6..XMM15), which is probably too many: it leaves only 6 call-clobbered ones (XMM0..XMM5) for leaf functions to use without spending extra instructions saving/restoring.
Why do SSE instructions preserve the upper 128-bit of the YMM registers? - explains Intel's AVX design decision to have legacy-SSE instructions leave upper halves unmodified, mostly because of binary-only Windows kernel drivers that manually save/restore a few XMM regs.
When x86-64 (and SSE2) were new, nobody knew how future SIMD extensions would work, and some code was written to work now without an eye to the future. Also, x87 registers were always treated as call-clobbered, because the register-stack design makes it hard for a function to know how many elements (if any) need saving/restoring if it wants to use all eight st0..st7 registers. So historically, x86 calling conventions didn't have any call-preserved FP registers; perhaps that's why the GCC devs unfortunately didn't consider the value of having a couple.