Register banking is something else; I assume you are simply asking about whether an instruction uses a register directly or not. Well, a memory access takes an eternity, even if cached: several to hundreds of clock cycles for each of the operands. In RISC, if you assume a pure register-based scheme (which not all are; the lines are getting fuzzy), the operation works on registers. With CISC, if the instruction is microcoded, the operands go into registers anyway and then the operation happens; if it is not microcoded, they still get latched into internal temporary storage (registers) before the operation can begin. With RISC you have a couple or three extra, simpler instructions, and the latching to registers takes the same amount of time as it does in CISC. Now, if the algorithm never uses that result, or does not use it for a while, it might be a win for CISC (if not microcoded), but if the value is an intermediate value in an algorithm, it is a clear win for RISC. Even if everything is cached, it is half a dozen to a dozen clock cycles to fetch each parameter and write the result back, and any cache miss costs an eternity. The same is true for RISC, but with more registers and significantly faster access to those registers (zero or one clock for each value and for storing back) for some percentage of the algorithm, if not the whole thing.
As with any benchmarking, it is trivial to construct a case where RISC wins and a case where CISC wins.
The major difference between RISC and CISC is that CISC instructions are complicated and time consuming, whereas RISC instructions are much simpler: you arrange the tasks you need to do, you have tighter control over those tasks, and you don't waste much per step. One could argue caches were created to deal with the inefficiencies of CISC, or at least of one popular CISC. Both benefit, sure, but one relies on caches and the other doesn't as much. Again, it is trivial to write code on which CISC wins and code on which RISC wins. The same goes for VLIW and others.
RISC designs are simpler and smaller, the pipelines can be shorter, the compiler has more control over performance, etc. So with microcontrollers you can have a very nice processor core with a 3-stage pipeline that is really low power and still quite efficient. The 6502, Z80, 8051, etc. have mostly died off. You still see a lot of 8051s if you look (the desktop or laptop you are reading this on probably has one), but that is due to royalties, not because of its size or performance. You probably have several to dozens of ARM cores for every x86, within the same box or certainly around the house. A CISC is going to be relatively massive and inefficient. It might be possible to get its power consumption down to RISC levels, and that may just be a matter of design rather than CISC vs RISC, but the RISC implementations are doing a much better job at watts per MHz than the CISC implementations.