I need to use 32 registers as general-purpose registers. Is that possible? If so, how?
No. The architecture defines only 16, and they are not completely general purpose: some instructions only work with certain registers. What you probably want is to keep your state in an activation record (a data structure) on the stack (where C local variables go) and load those values into registers as needed. I could elaborate if I understood what you are trying to do, but I suggest looking at the ABI for your OS (or for some OS, if you're not using one) to see what is expected to happen to registers when a procedure call is made. Using an ABI to guide your register usage will also help with interop with a higher-level language such as C or C++.
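As a rough illustration of the activation-record idea, here is a minimal C sketch. The names `State`, `sum_state` and `activation_record_demo` are made up for this example, and the 32 values simply stand in for whatever state you wanted to keep in 32 registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state block: 32 "virtual registers" kept in memory. */
typedef struct {
    uint64_t v[32];
} State;

/* Sum all 32 values. The compiler is free to keep only the accumulator
   and the loop index in hardware registers, loading each v[i] from
   memory when it is needed. */
static uint64_t sum_state(const State *s) {
    assert(s != NULL);
    uint64_t total = 0;
    for (size_t i = 0; i < 32; i++) {
        total += s->v[i];
    }
    return total;
}

/* The State object is a local variable, so it lives in this function's
   activation record (its stack frame). */
uint64_t activation_record_demo(void) {
    State s;
    for (size_t i = 0; i < 32; i++) {
        s.v[i] = i + 1;   /* fill with 1..32 */
    }
    return sum_state(&s);
}
```

In hand-written assembly the same layout would be a block of 32 slots addressed relative to the stack pointer, with only the values currently in use sitting in real registers.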
I have heard that x86-64 processors have more general-purpose registers, but that they are unnamed and only 16 are named. Is that true? And is it possible to use them?
Those other general-purpose registers are used by the out-of-order execution system, which schedules upcoming instructions in the instruction stream without changing the stream's serial semantics. Mapping the named registers onto this larger set is called "register renaming". On some chips the extra registers are not present at all, because those chips do not perform out-of-order execution. The extra registers are an implementation detail of the CPU, and they are not accessible from the x86_64 instruction set. Other architectures have avoided out-of-order execution by providing a VLIW (Very Long Instruction Word) instruction set, which relies on the compiler to schedule the instructions instead of letting the hardware schedule them. The Itanium is such an architecture.
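To make the register renaming idea above a bit more concrete, here is a toy model in C. It is only a sketch: the physical register count of 64 is an arbitrary assumption, the names are invented, and real hardware recycles physical registers through a free list instead of this never-reuse allocator.

```c
#include <assert.h>

enum { ARCH_REGS = 16, PHYS_REGS = 64 };  /* PHYS_REGS is an assumed value */

typedef struct {
    int map[ARCH_REGS];  /* architectural register -> physical register */
    int next_free;       /* naive allocator: never recycles registers */
} RenameTable;

void rename_init(RenameTable *t) {
    for (int i = 0; i < ARCH_REGS; i++) {
        t->map[i] = i;   /* start with an identity mapping */
    }
    t->next_free = ARCH_REGS;
}

/* Source operand: read whichever physical register currently holds
   the architectural register's value. */
int rename_src(const RenameTable *t, int arch) {
    assert(arch >= 0 && arch < ARCH_REGS);
    return t->map[arch];
}

/* Destination operand: every write gets a fresh physical register, so
   two writes to the same named register no longer conflict. */
int rename_dst(RenameTable *t, int arch) {
    assert(arch >= 0 && arch < ARCH_REGS);
    assert(t->next_free < PHYS_REGS);
    t->map[arch] = t->next_free++;
    return t->map[arch];
}
```

The program only ever names the 16 architectural registers; the table is internal to the model, which mirrors why the extra registers cannot be addressed from the instruction set.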
When Itanium was produced, the VLIW architecture had fallen out of favor, so they called it EPIC (Explicitly Parallel Instruction Computing) instead of VLIW, but it was still VLIW. The Itanium has 128 general-purpose registers because it expects you (in practice, a C or C++ compiler) to schedule a large number of semantically simultaneous operations. Each instruction packet holds 3 instructions (with a predicate indicator for each of the 3) and an indicator of whether the following packet is expected to be (semantically) executed simultaneously. It does not actually have to be executed simultaneously. You could chain 27 instructions to execute simultaneously, and a lower-end Itanium might execute them 3 at a time while a higher-end Itanium executes them 9 at a time. The results would be the same on either processor; it would just take more or fewer cycles until the following instruction is executed.
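For reference, the instruction packet described above (a "bundle" in Itanium terminology) is 128 bits wide: a 5-bit template followed by three 41-bit instruction slots. The template selects the execution unit types and where the "stops" between groups of simultaneous instructions fall, and the low 6 bits of each slot name the predicate register guarding that instruction. The sketch below only pulls those fields apart; the `Bundle` struct and function names are invented for this example, and it does not decode the instructions themselves.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit bundle held as two 64-bit halves (lo = bits 0..63). */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} Bundle;

#define SLOT_MASK ((1ULL << 41) - 1)   /* each slot is 41 bits wide */

/* Bits 0..4: the template field. */
unsigned bundle_template(const Bundle *b) {
    return (unsigned)(b->lo & 0x1F);
}

/* Slot 0 = bits 5..45, slot 1 = bits 46..86, slot 2 = bits 87..127. */
uint64_t bundle_slot(const Bundle *b, int n) {
    assert(n >= 0 && n <= 2);
    uint64_t raw;
    if (n == 0) {
        raw = b->lo >> 5;
    } else if (n == 1) {
        raw = (b->lo >> 46) | (b->hi << 18);  /* straddles both halves */
    } else {
        raw = b->hi >> 23;
    }
    return raw & SLOT_MASK;
}

/* Low 6 bits of a slot: the qualifying predicate register number. */
unsigned slot_predicate(uint64_t slot) {
    return (unsigned)(slot & 0x3F);
}
```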
As I said, VLIW has fallen out of favor because C and C++ compilers can order instructions so that the out-of-order execution system can determine the data dependencies in the instruction stream and do a similar job of scheduling. That also allows future processors to have a wider execution pipeline stage without capping the register count at 128. That's the theory, anyway.
You might get a better answer if you give more details about what you're trying to do. If you're trying to emulate a processor with 32 registers on an x86_64, then you don't need a 1-to-1 mapping of registers. The ABI of the platform you're emulating will tell you what is statistically likely to happen, because procedures are most likely used and every platform has a well-defined calling convention (though it differs per CPU and OS). Also, please consider C or C++ for most of such a project. You will not gain anything by writing it all in assembly, except for difficulty in porting.
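If emulation is the goal, a minimal sketch of what "no 1-to-1 mapping" can look like in C follows. The guest CPU here is hypothetical (32 registers of 32 bits each, 4-byte instructions, a three-operand add); none of it comes from a specific real architecture. The guest registers live in an ordinary array, and the host compiler decides which host registers to use while executing each guest instruction.

```c
#include <assert.h>
#include <stdint.h>

enum { GUEST_REG_COUNT = 32 };   /* assumed guest register count */

typedef struct {
    uint32_t reg[GUEST_REG_COUNT];  /* guest register file, kept in memory */
    uint32_t pc;                    /* guest program counter */
} GuestCpu;

static uint32_t guest_read(const GuestCpu *cpu, unsigned r) {
    assert(r < GUEST_REG_COUNT);
    return cpu->reg[r];
}

static void guest_write(GuestCpu *cpu, unsigned r, uint32_t value) {
    assert(r < GUEST_REG_COUNT);
    cpu->reg[r] = value;
}

/* Hypothetical guest instruction "add rd, rs1, rs2": load both sources
   from the array, add them in host registers, store the result back. */
void guest_add(GuestCpu *cpu, unsigned rd, unsigned rs1, unsigned rs2) {
    guest_write(cpu, rd, guest_read(cpu, rs1) + guest_read(cpu, rs2));
    cpu->pc += 4;   /* assumes fixed 4-byte guest instructions */
}
```

A faster emulator might pin the guest registers that the guest ABI uses most often (stack pointer, return value, first arguments) to host registers, which is where reading that ABI pays off.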