CPU hardware doesn't find registers by name; it's up to the assembler to translate names like rax into 3- or 4-bit register numbers in machine code. (The operand size implied by the register name is also encoded, via the opcode and the presence or absence of prefixes.)
e.g. add ecx, edx assembles to
01 d1. Opcode 01 is add r/m32, r32. The 2nd byte, the ModRM byte 0xd1 = 0b11010001, encodes the operands: the high 2 bits (11) are the addressing mode, here plain register rather than memory (for the destination in this case, because it's 01 add r/m32, r32, not 03 add r32, r/m32).
The middle 3 bits are the /r field, and 010 = 2 is the register number for edx.
The low 3 bits are the r/m field, and 001 is the register number of ECX.
(The numbering goes EAX, ECX, EDX, EBX, ..., probably because 8086 was designed for asm source compatibility with 8080 - i.e. "porting" on a per-instruction basis simple enough for a machine to do automatically.)
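As a rough illustration of that field layout, here is a minimal Python sketch that splits a ModRM byte and decodes the register-direct form of opcode 01. The function names (split_modrm, decode_add_rm32_r32) are made up for this example, and it deliberately ignores prefixes, memory operands, and every other opcode.

```python
# Register numbers 0..7 in encoding order (no REX prefix, 32-bit operand size).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def split_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) bit-fields."""
    mod = (byte >> 6) & 0b11    # high 2 bits: addressing mode
    reg = (byte >> 3) & 0b111   # middle 3 bits: the /r field
    rm = byte & 0b111           # low 3 bits: the r/m field
    return mod, reg, rm

def decode_add_rm32_r32(code):
    """Decode `01 /r` (add r/m32, r32), register-direct form only."""
    opcode, modrm = code[0], code[1]
    if opcode != 0x01:
        raise ValueError("only opcode 01 is handled in this sketch")
    mod, reg, rm = split_modrm(modrm)
    if mod != 0b11:
        raise ValueError("memory operands are not handled in this sketch")
    # For opcode 01 the r/m field is the destination, the /r field the source.
    return f"add {REG32[rm]}, {REG32[reg]}"

print(split_modrm(0xD1))                          # (3, 2, 1)
print(decode_add_rm32_r32(bytes([0x01, 0xD1])))   # add ecx, edx
```

The decoder never sees a register name: the strings in REG32 exist only to print something readable at the end.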
This is what the CPU is actually decoding, and what it uses to "address" its internal registers. A simple in-order CPU without register renaming could literally use these numbers directly as addresses in an SRAM that implemented the register file. (Especially if it was a RISC like MIPS or ARM. x86 is complicated because you can use the same register numbers with different widths, and you have partial registers like AH and AL mapping onto halves of AX. But still, it's just a matter of mapping register numbers to locations in SRAM, if you didn't do register renaming.)
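To make the "register numbers as SRAM addresses" idea concrete, here is a toy Python model, not a description of any real CPU. The list regfile stands in for the SRAM array, and the helper names are invented for this sketch. It covers only 32-bit registers and the legacy 8-bit encodings without a REX prefix, where numbers 0..3 select AL, CL, DL, BL and numbers 4..7 select AH, CH, DH, BH.

```python
# Toy register file: 8 entries, indexed directly by the register number
# from the machine code (0 = EAX, 1 = ECX, 2 = EDX, 3 = EBX, ...).
regfile = [0] * 8

def write32(num, value):
    """Write a 32-bit register selected by its 3-bit number."""
    regfile[num] = value & 0xFFFFFFFF

def read32(num):
    """Read a 32-bit register selected by its 3-bit number."""
    return regfile[num]

def read8(num):
    """Read an 8-bit register (legacy encoding, no REX prefix).

    Numbers 0..3 are the low bytes AL, CL, DL, BL.
    Numbers 4..7 are AH, CH, DH, BH: bits 8..15 of entries 0..3.
    """
    if num < 4:
        return regfile[num] & 0xFF
    return (regfile[num - 4] >> 8) & 0xFF

write32(0, 0x12345678)     # mov eax, 0x12345678
print(hex(read8(0)))       # AL -> 0x78
print(hex(read8(4)))       # AH -> 0x56
```

The partial-register case is the only place where the number is not used as a plain index, which is the extra complication x86 has compared to a RISC with fixed-width registers.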
For x86-64, register numbers are always 4-bit, but sometimes the leading zero is implicit, e.g. in an instruction without a REX prefix like mov eax, 60. The register number is in the low 3 bits of the opcode for that special encoding.
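The implicit leading zero can be sketched in Python as follows. This is a simplified decoder for the B8+rd form of mov r32, imm32 only, with a hypothetical function name. It assumes 64-bit mode, treats a byte in the range 0x40..0x4F as a REX prefix, and refuses REX.W because that changes the instruction to a 64-bit immediate form.

```python
import struct

# 4-bit register numbers 0..15 for 32-bit operand size.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"] + \
        [f"r{n}d" for n in range(8, 16)]

def decode_mov_r32_imm32(code):
    """Decode `B8+rd imm32`, with an optional REX prefix supplying bit 3."""
    i = 0
    rex_b = 0                          # implicit leading zero without REX
    if 0x40 <= code[0] <= 0x4F:        # REX prefix (64-bit mode only)
        if code[0] & 0b1000:
            raise ValueError("REX.W form is not handled in this sketch")
        rex_b = code[0] & 1            # REX.B extends the opcode's reg field
        i = 1
    opcode = code[i]
    if opcode & 0xF8 != 0xB8:
        raise ValueError("not a B8+rd opcode")
    regnum = (rex_b << 3) | (opcode & 0b111)   # full 4-bit register number
    (imm,) = struct.unpack_from("<I", code, i + 1)
    return f"mov {REG32[regnum]}, {imm}"

print(decode_mov_r32_imm32(bytes.fromhex("b83c000000")))    # mov eax, 60
print(decode_mov_r32_imm32(bytes.fromhex("41b93c000000")))  # mov r9d, 60
```

Without the prefix only registers 0..7 are reachable, which is why the legacy registers get the shorter encodings.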
Physically, modern CPUs use a physical register file and a register-renaming table (RAT) to implement the architectural registers, so they can keep track of the value of RAX at multiple points in time. e.g. mov eax, 60 / push rax / mov eax, 12345 / push rax can run both mov instructions in parallel, writing to separate physical registers, while still sorting out which one each push should read from.
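A heavily simplified Python sketch of that renaming step, applied to the same four instructions, might look like this. Everything here is an assumption for illustration: the rename function, the p0, p1, ... naming, and the hand-written source/destination lists are invented. Real hardware also recycles physical registers through a free list and typically handles the RSP updates from push specially, neither of which is modelled.

```python
from itertools import count

def rename(instructions, arch_regs=("rax", "rsp")):
    """Toy register renamer.

    Each instruction is (text, source_regs, dest_regs). Returns a list of
    (text, {src: phys}, {dst: phys}) with a new physical register allocated
    for every write, so later writes never overwrite earlier values.
    """
    fresh = count()
    rat = {r: f"p{next(fresh)}" for r in arch_regs}   # initial mappings
    renamed = []
    for text, srcs, dsts in instructions:
        reads = {r: rat[r] for r in srcs}   # look up sources before updating
        writes = {}
        for r in dsts:
            rat[r] = f"p{next(fresh)}"      # allocate a new physical register
            writes[r] = rat[r]
        renamed.append((text, reads, writes))
    return renamed

program = [
    # Writing EAX zero-extends into RAX, so there is no read of the old RAX.
    ("mov eax, 60",    [],             ["rax"]),
    ("push rax",       ["rax", "rsp"], ["rsp"]),
    ("mov eax, 12345", [],             ["rax"]),
    ("push rax",       ["rax", "rsp"], ["rsp"]),
]

for text, reads, writes in rename(program):
    print(f"{text:16} reads {reads} writes {writes}")
```

The two mov instructions end up with different destination registers and no inputs in common, so nothing forces one to wait for the other, while each push is tied to the physical register written by the mov just before it.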
if thats the case, i am wondering why there are only 16 registers in x86_64 architecture ...
A new ISA being designed for the high-performance use-cases where x86 competes would very likely have 32 integer registers. But shoehorning that into x86 machine code (like AVX-512 did for vector regs) wouldn't be worth the code-size cost.
x86-64 evolved out of the 16-bit 8086, released in 1978. Many of the design choices made then are not what you'd make if starting fresh now, with modern transistor budgets (and without aiming for asm source-level compatibility with the 8-bit 8080).
More architectural registers cost more bits in the machine code for each operand. More physical registers just mean more out-of-order exec capability to handle more register renaming. (The physical register numbering is an internal detail.) This article measures practical out-of-order window size for hiding cache miss latency and compares it to known ROB and PRF sizes. In some cases the CPU runs out of physical registers to rename onto before it fills the ROB, for that chosen mix of filler instructions.
, doesn't more registers means more performance ?
More architectural registers do generally help performance, but there are diminishing returns. 16 avoids a lot of store/reload work vs. 8, but increasing to 32 only saves a bit more store/reload work; 16 is often enough for compilers to keep everything they want in registers.
The fact that AMD managed to extend it to 16 registers (up from 8) is already a significant improvement. Yes, 32 integer regs would be somewhat better sometimes, but that couldn't be done without redesigning the machine-code format, or without much longer prefixes (like AVX-512's 4-byte EVEX prefix, which allows 32 SIMD registers, x/y/zmm0..31, for AVX-512 instructions).
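To put rough numbers on the encoding cost, the arithmetic is just the base-2 logarithm of the register count. The helper below is a hypothetical one-liner written for this illustration, and it counts only the register-number bits, not the rest of the instruction.

```python
def bits_per_register_operand(num_regs):
    """Bits needed to select one of num_regs architectural registers."""
    return (num_regs - 1).bit_length()

for n in (8, 16, 32):
    bits = bits_per_register_operand(n)
    print(f"{n:2} registers: {bits} bits per operand, "
          f"{2 * bits} bits for a two-register instruction")
```

A ModRM byte has room for two 3-bit register fields plus the 2-bit mode, so the 4th bit of each register number in x86-64 has to come from a prefix, and a 5th bit would need still more prefix space.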