I don't know much about the internal workings of the CPU, and my understanding of SSE is equally basic: it provides additional wide registers that pack several values of the same data type, so you can perform one operation on all of them in parallel with a single instruction.
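To make that description concrete, here is a minimal sketch in C, assuming an x86/x86-64 compiler with SSE2 (SSE2 is part of the x86-64 baseline). The function names add4_scalar and add4_sse2 are made up for this illustration. _mm_loadu_si128, _mm_add_epi32 and _mm_storeu_si128 are the standard intrinsics from <emmintrin.h>. Both functions add four pairs of 32-bit integers. The SSE2 version does all four additions with one packed add.

```c
#include <emmintrin.h> /* SSE2 intrinsics; assumes an x86/x86-64 target */

/* Scalar version: written as one add per element.
   (An optimizing compiler may still auto-vectorize this loop.) */
void add4_scalar(const int *a, const int *b, int *out) {
    for (int i = 0; i < 4; i++) {
        out[i] = a[i] + b[i];
    }
}

/* SSE2 version: load four 32-bit ints into each 128-bit register,
   add all four lanes with a single packed add, then store the result. */
void add4_sse2(const int *a, const int *b, int *out) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);   /* unaligned load */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vsum = _mm_add_epi32(va, vb);               /* 4 adds, 1 instruction */
    _mm_storeu_si128((__m128i *)out, vsum);             /* unaligned store */
}
```

Calling both with a = {1, 2, 3, 4} and b = {10, 20, 30, 40} gives {11, 22, 33, 44}. The SSE2 version only pays off when there are several independent values to process at once.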
Great, but why isn't every register and every operation like that by default? If I want to add two integers, why would I need to place them in two separate registers and do the operation through multiple instructions when I could just do it with SSE? Does it interfere with concurrency somehow? Is it a hardware limitation?
Thanks! If there are any reasonably easy-to-follow sources on this, I would gladly appreciate those as well.