First part: all library functions follow the standard calling convention. On all x86-64 platforms other than Windows, that's the x86-64 System V ABI.
You can make up your own conventions when writing your own asm functions, like returning multiple different values in multiple registers instead of limiting yourself to only what you could get a C compiler to do.
(For example, you could write a memcmp that returns the position of the first difference in RDI and the actual < = or > result in FLAGS, e.g. from doing a cmp on the mismatching bytes.)
But compiler-generated functions you can call from asm (including C standard library functions) will always follow the ABI.
Second part: implicit usage of registers by some instructions: check the ISA manual for the relevant instructions. If you don't know what an instruction does, don't just assume from its name.
You can single-step in a debugger that highlights register-value changes to help you notice any case where a register changes that you weren't expecting at all.
Look instructions up in Intel's vol.2 manual (or AMD's equivalent), e.g. the HTML extract of Intel's PDF at https://www.felixcloutier.com/x86/, specifically the entry for loop. Also, "How exactly does the x86 LOOP instruction work?" explains that it's like a dec rcx / jnz except without setting FLAGS.
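To make the implicit RCX use concrete, here is a rough C model of a loop-based loop. It's only a sketch: the function name sum_down_with_loop is made up for illustration, and C has no FLAGS, so the "doesn't touch FLAGS" part only shows up as a comment.

```c
#include <stdint.h>

/* Hypothetical helper modelling this asm:
 *
 *       xor  eax, eax
 *   top:
 *       add  rax, rcx
 *       loop top        ; rcx -= 1 (FLAGS untouched), jump if rcx != 0
 */
uint64_t sum_down_with_loop(uint64_t rcx)
{
    uint64_t rax = 0;
    do {
        rax += rcx;      /* loop body */
        rcx -= 1;        /* implicit decrement of RCX by loop */
    } while (rcx != 0);  /* implicit test: branch taken while RCX != 0 */
    return rax;
}
```

Note that the body runs before the decrement, so entering with RCX = 0 wraps around instead of skipping the loop. That's the case jrcxz exists for.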
There aren't that many instructions with implicit operands. The most commonly used ones are stack instructions like push/pop implicitly using RSP in the obvious way.
The other notable ones include E/RAX and E/RDX being used by one-operand [i]mul and [i]div. (And cdq to sign-extend EAX into EDX:EAX to set up for idiv, or cqo for RAX into RDX:RAX; cdqe sign-extends EAX into RAX.)
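As a sketch of where the results end up, the following C models one-operand mul and a cdq / idiv pair. The struct and function names are invented for this example, and unsigned __int128 is a GCC/Clang extension rather than standard C.

```c
#include <stdint.h>

/* Hypothetical register pairs, just for illustration. */
typedef struct { uint64_t rax, rdx; } regs_t;
typedef struct { int32_t eax, edx; } regs32_t;

/* mul rsi : RDX:RAX = RAX * RSI (unsigned, full 128-bit product).
 * RAX is an implicit input; RAX and RDX are implicit outputs. */
regs_t model_mul(uint64_t rax, uint64_t src)
{
    unsigned __int128 prod = (unsigned __int128)rax * src;
    regs_t r = { (uint64_t)prod, (uint64_t)(prod >> 64) };
    return r;
}

/* cdq ; idiv ecx : cdq sign-extends EAX into EDX:EAX, then idiv
 * divides that 64-bit value by ECX. Quotient -> EAX, remainder -> EDX.
 * The real instruction faults (#DE) on ecx == 0 or on quotient
 * overflow (INT32_MIN / -1); this model assumes the caller avoids both. */
regs32_t model_cdq_idiv(int32_t eax, int32_t ecx)
{
    int64_t dividend = (int64_t)eax;   /* what cdq sets up in EDX:EAX */
    regs32_t r = { (int32_t)(dividend / ecx), (int32_t)(dividend % ecx) };
    return r;
}
```

Both C's / and idiv truncate toward zero, so -7 / 2 gives a quotient of -3 and a remainder of -1 in either.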
CL for variable shift counts is implicit in the machine code, but explicit in asm source (like shr rdx, cl).
rep-"string" instructions implicitly use RCX, plus RSI and/or RDI.
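For example, rep movsb can be modelled roughly like this in C. This is a sketch assuming DF = 0 (the normal state under the calling conventions, so pointers increment); the string_regs_t struct and function name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bundle of the registers rep movsb uses implicitly. */
typedef struct {
    uint64_t rcx;        /* remaining byte count */
    const uint8_t *rsi;  /* source pointer */
    uint8_t *rdi;        /* destination pointer */
} string_regs_t;

/* rep movsb with DF = 0: copy RCX bytes from [RSI] to [RDI].
 * All three registers are modified, which is easy to forget
 * when the instruction has no explicit operands in the source. */
void model_rep_movsb(string_regs_t *r)
{
    while (r->rcx != 0) {
        *r->rdi = *r->rsi;  /* movsb: one byte from [rsi] to [rdi] */
        r->rsi += 1;
        r->rdi += 1;
        r->rcx -= 1;        /* rep: decrement count, stop at zero */
    }
}
```

Afterwards RCX is 0 and both pointers point one past the last byte copied.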
Most of these implicit uses come from old 8086 history. See "Why is there not a register that contains the higher bytes of EAX?". Instructions like loop and jrcxz aren't used by compilers because they're slow, and the 2-operand form of imul, like imul ecx, edx, is faster when you don't need the high-half result in EDX/RDX.
Further reading:
This is not an exhaustive list. cmpxchg / cmpxchg16b, xlat, cpuid, rdtsc, rdpmc, and many others have implicit operands, but only a few of the instructions that get used regularly by compilers do.
Note that FLAGS is an implicit input to many instructions, like adc and cmov.
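A rough C model of an add / adc pair shows FLAGS (specifically CF) acting as a hidden operand passed from one instruction to the next. The u128_t type and function name are made up for this sketch.

```c
#include <stdint.h>

/* Hypothetical 128-bit value split into two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128_t;

/* Models:
 *     add rax, rcx     ; low halves, writes CF
 *     adc rdx, rbx     ; high halves + CF (implicit input from FLAGS)
 */
u128_t model_add_adc(u128_t a, u128_t b)
{
    u128_t r;
    r.lo = a.lo + b.lo;
    unsigned cf = r.lo < a.lo;   /* CF: set if the low add wrapped */
    r.hi = a.hi + b.hi + cf;     /* adc consumes CF */
    return r;
}
```

Any flag-writing instruction placed between the add and the adc would clobber CF and break the chain.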
NASM has an appendix that lists all instructions, but generally assemblers leave documenting the instructions up to CPU vendors. All x86-64 assemblers produce machine code for the same instructions. This bugfixed fork of an older version of that doc keeps English descriptions of instructions. (Mainline NASM removed that for space after adding SSE instructions; there are just too many to do more than list in one flat page these days, with AVX2 and especially AVX512.)