0

I need to calculate the speed difference between performing a Montgomery Multiplication page 602-603 with a word-size/register of size 32 vs. 64.

So far, this is what I understand:

  • x and y are represented by multiple-word arrays of length n where n = m/w and w is the register size (either 32 or 64).
  • The total number of single-digit multiplications in Montgomery multiplication is n*(2 + 2*n), where n represents the number length of the word-arrays.
  • I will assume that the multiplication of two single-digit takes 1 clock cycle on each of the computers.

How can I put all this together to represent the number of clock cycles needed in Montgomery multiplication on a computer with a 32-bit register or 64-bit register?

em0605
  • 197
  • 2
  • 9

1 Answers1

1

The number of cycles for a multiple-precision Montgomery multiplication would indeed be n(2+2*n) if all the intermediate single-precision multiplication operands and results were available in registers. For cryptographic operations this is hardly possible since m is usually 1024 or larger. Assuming 32-bit registers (xyR^-1 mod m) would require 192 registers only to store the operands (3*(1024/32)). In fact you need to take into account memory accesses to answer this question.

A rewrite of the algorithm with memory accesses (assuming multiplications can be done in parallel with loads/stores):

  1. For i from 0 to n: a_i <- 0
  2. For i from 0 to (n − 1) do the following:
    1. Fetch a_0
    2. Fetch y_0
    3. Fetch x_i
    4. Compute u_i <- (a_0 + x_i*y_0)m' mod b. Store u_i in a register
    5. c = 0 (Computing A <- (A + x_i*y + u_i*m)/b)
    6. for j from 0 to (n-1):
      1. Fetch a_j
      2. Fetch y_j
      3. Compute (cv) = a_j + x_i*y_j + c, Fetch m_j
      4. Compute (cv) = (cv) + u_i*m_j, if j>0 Store a_{j-1} <- v
    7. Store a_n <- c and a_{n-1} <- v
  3. If A >= m then A <- A − m.
  4. Return(A).

Hope this helps.

Claudio
  • 927
  • 7
  • 15