Article Open Access

Stochastic methods defeat regular RSA exponentiation algorithms with combined blinding methods

  • Margaux Dugardin , Werner Schindler and Sylvain Guilley
Published/Copyright: April 20, 2021

Abstract

Extra-reductions occurring in Montgomery multiplications disclose side-channel information which can be exploited even in stringent contexts. In this article, we derive stochastic attacks to defeat Rivest-Shamir-Adleman (RSA) implementations that combine Montgomery ladder regular exponentiation with base blinding. Namely, we leverage precharacterized multivariate probability mass functions of extra-reductions between pairs of (multiplication, square) operations in one iteration of the RSA algorithm and the next one(s) to build a maximum likelihood distinguisher. The efficiency of our attack (in terms of required traces) is more than double that of the state-of-the-art. In addition, we apply our method to the case of regular exponentiation combined with base blinding and modulus blinding. Quite surprisingly, modulus blinding does not make our attack impossible, even for large sizes of the modulus randomizing element. At the cost of larger sample sizes, our attacks tolerate noisy measurements. Fortunately, effective countermeasures exist.

MSC 2010: 94A60; 60G99

1 Introduction

It has been noted by Kocher [13] as early as 1996 that asymmetric cryptographic algorithms are prone to side-channel attacks. Countermeasures have been developed with a view to making these attacks either impossible or at least much harder to perform. There are several countermeasure principles. A first class consists in balancing the control flow so that execution traces superimpose perfectly whatever the value of the secrets. A second important class of countermeasures consists in deceiving correlation attempts by attackers using side-channel traces. The strategy consists in randomizing algorithm inputs or internal parameters, so that the computation is carried out on unpredictable data. Obviously, the randomization is restricted, since it must be possible to unravel the injected randomness at the end of the computation.

In this article, we focus on the Rivest-Shamir-Adleman (RSA) cryptosystem while it uses its secret exponent k. Despite the balancing and randomization countermeasures, attackers will persist in trying to recover k. But in order to bypass protections, the attacker needs to resort to more evolved strategies. We make a distinction between attacks which can be carried out with one single trace and those which require multiple traces (since there is not enough information in a single trace). An attack which succeeds with one single trace can overcome any algorithmic countermeasure:[1] basically, against randomizing countermeasures, it will recover the randomized version of some sensitive value, but this randomized value is still sufficient for the adversary to behave as if he knew the secret. As an example, in the case of exponent blinding, instead of computing m^k mod N (where m is the base, k is the secret exponent, and N is the modulus), the side-channel protected RSA computes m^{k'} mod N with k' = k + r·φ(N) for some random r (where φ is the Euler totient function). Those two quantities are equal, owing to Euler's theorem; hence, it does not matter if the attacker recovers k' = k + r·φ(N) in lieu of k: in both cases he can forge valid signatures or decrypt messages correctly. Indeed, k' is equivalent to k for the purpose of signature generation or decryption. When attacks require some kind of averaging, randomization countermeasures do conceal the secret, at least if the randomness is refreshed at each new computation. However, the balancing countermeasures do not deceive an attacker who averages traces, because averaging repeated executions of the same computation allows the attacker to increase the signal-to-noise ratio (SNR).

In practice, the attacks which succeed with a single trace are the most dangerous, and implementers defend against them in the first place. The so-called simple power analysis (SPA [14, §2]), introduced in 1999, allows an attacker to read out the exponent from one trace. Therefore, the usual countermeasure consists in the implementation of a regular exponentiation algorithm. In RSA, a so-called "regular algorithm" is a method to compute the modular exponentiation using a key-independent sequence of squaring and multiplication operations. Examples of regular exponentiation algorithms are the Montgomery ladder (treated in this paper), the square and multiply always algorithm, or fixed-window exponentiation with explicit multiplication even if the exponent bits in the current window are all equal to zero [15, Algorithm 14.82].

Thus, it is a protection against simple trace analysis, where the attacker attempts to derive the exponent by observing one (or several identical) computations. The regular exponentiation countermeasure against SPA plugs the leak, but in the meantime keeps traces corresponding to various executions properly aligned. This works to the advantage of the adversary, in that such unfortunate alignment opens the door to differential power analyses, as discussed in [14, §5], to template attacks [5], or to machine learning attacks [19]. Those attacks, since they require collecting several traces from the same inputs (for averaging in order to increase the SNR), are combated by randomizing countermeasures. For instance, the input of the RSA (its base) can be randomized at the input, while being consistently derandomized at the output. Another option to randomize the intermediate computations is to randomize the modulus (so-called "modular extension"). This second option also allows a sanity check of the computation, which is incidentally a countermeasure against fault injection attacks [7]. We stress that all three countermeasures may well be stacked on top of each other, so as to thwart simple power attacks, differential power attacks, and perturbation attacks altogether. As an alternative to a regular exponentiation algorithm, or even as a complement to it, the secret exponent can be protected by blinding, as explained earlier.

2 Previous work and our contributions

2.1 State-of-the-art

We analyze in this article possible remaining biases, namely, extra-reductions inherent to the modular multiplication algorithm.

Given two integers a and b , the classical modular multiplication a × b mod p computes the multiplication a × b followed by the modular reduction by p . Montgomery Modular Multiplication (MMM) transforms a and b into special representations known as their Montgomery forms.

Definition 2.1

(Montgomery Transformation [16]) For any modulus p, the Montgomery form of a ∈ F_p is ϕ(a) = a·R mod p for some constant R greater than and coprime with p.

In order to ease the computation, R is usually chosen as the smallest power of two greater than p, that is, R = 2^⌈log₂(p)⌉. Using the Montgomery form of integers, modular multiplications used in modular exponentiation algorithms can be carried out using the MMM:

Definition 2.2

(MMM [16]) Let ϕ(a) and ϕ(b) be two elements of F_p in Montgomery form. The MMM of ϕ(a) and ϕ(b) is ϕ(a)·ϕ(b)·R^{-1} mod p.

Proposition 2.3

(MMM correctness [15, §14.36]) The output of the MMM of ϕ(a) and ϕ(b) is ϕ(ab).
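Definitions 2.1 and 2.2 and Proposition 2.3 can be illustrated with a small Python sketch (ours, not from the paper; the toy modulus and operands are arbitrary):

```python
# Illustration (our own sketch): Montgomery form and the MMM property
# phi(a) * phi(b) * R^{-1} mod p = phi(a*b), on a toy modulus.
p = 101                       # toy odd modulus (in RSA, a large prime factor)
R = 1 << p.bit_length()       # smallest power of two greater than p

def to_mont(a, p=p, R=R):
    """Montgomery form phi(a) = a*R mod p (Definition 2.1)."""
    return (a * R) % p

a, b = 17, 42
phi_a, phi_b = to_mont(a), to_mont(b)
R_inv = pow(R, -1, p)         # R^{-1} mod p (Python 3.8+)
# Proposition 2.3: the MMM of phi(a) and phi(b) equals phi(a*b).
assert (phi_a * phi_b * R_inv) % p == to_mont((a * b) % p)
print(R, phi_a, phi_b)
```

The assertion checks Proposition 2.3 exhaustively for this pair; the same identity holds for any a, b in F_p.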

Algorithm 1 shows that the MMM can be implemented in two steps:

  1. compute D = ϕ ( a ) × ϕ ( b ) , then

  2. reduce D using Montgomery reduction which returns ϕ ( c ) .

In Algorithm 1, the pair (R^{-1}, v) is such that R·R^{-1} - v·p = 1.

Algorithm 1

   Input: D = ϕ(a) × ϕ(b)
   Output: ϕ(c) = ϕ(a) × ϕ(b) × R^{-1} mod p
1: m ← (D mod R) × v mod R
2: U ← (D + m × p)/R        // Invariant: 0 ≤ U < 2p
3: if U ≥ p then
4:   C ← U - p              // Extra-reduction
5: else C ← U
6: return C

Montgomery reduction (Algorithm 14.32 of [15])

Definition 2.4

(Extra-reduction) In Algorithm 1, when the intermediate value U is greater than or equal to p, a subtraction named eXtra-reduction occurs so that the result C of the Montgomery multiplication (MM) lies between 0 and p - 1. We set X = 1 in the presence of the extra-reduction, and X = 0 in its absence.
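A minimal Python model of Algorithm 1 (our own sketch, not the authors' code) makes Definition 2.4 concrete by returning the extra-reduction bit X together with the reduced value; the exhaustive check over a toy modulus is for illustration only:

```python
# Montgomery reduction (Algorithm 1) instrumented with the extra-reduction
# bit X of Definition 2.4. The pair (R^{-1}, v) satisfies R*R^{-1} - v*p = 1,
# hence v = -p^{-1} mod R.
def montgomery_reduce(D, p, R):
    """Return (C, X) with C = D * R^{-1} mod p, X = 1 iff an ER occurred."""
    v = (-pow(p, -1, R)) % R          # v = -p^{-1} mod R
    m = ((D % R) * v) % R             # line 1
    U = (D + m * p) // R              # line 2; D + m*p is divisible by R
    assert (D + m * p) % R == 0 and 0 <= U < 2 * p
    if U >= p:                        # line 3
        return U - p, 1               # line 4: extra-reduction
    return U, 0                       # line 5

# Exhaustive sanity check on a toy modulus.
p, R = 101, 128
R_inv = pow(R, -1, p)
for a in range(p):
    for b in range(p):
        C, X = montgomery_reduce(a * b, p, R)
        assert C == (a * b * R_inv) % p
print("ok")
```

Whether X equals 1 depends on the operands; it is precisely this data dependency that the attacks below exploit.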

As we shall explain, this side channel is induced by the choice of moduli whose bitwidth is exactly a multiple of the computer word size, which is typically a power of two such as 16, 32, or 64 bits. This bias has given rise to the so-called extra-reduction analysis (ERA). An overview of known ERAs is provided in Table 1. Specifically, this table shows which countermeasure can be bypassed by which attack. The classification criteria in Table 1 are listed as follows:

  • the implementation uses the Chinese Remainder Theorem (CRT), i.e., the moduli p and q are unknown to the attacker,

  • the protection against differential power analysis named the base blinding,

  • the protection against SPA, named the regular exponentiation algorithm,

  • the compensation of the extra-reduction by a fake operation, which is named constant time nonstraight line algorithm (N-SLA), i.e., constant operations have their fixed values identified by software.[2] In principle (at least with a reasonable probability), these countermeasures might be detected and nullified by a suitable side-channel attack. In Table 1, we assume that such side-channel attacks exist,

  • identical execution times are ensured by avoiding extra-reductions at all, which is named constant time straight line algorithm (SLA). Obviously, the attacks listed in Table 1 cannot work in this case, see also Section 5,

  • the protection against differential power analysis named the exponent blinding, and

  • the fault and differential protection named modular extension.

Table 1

Summary of capability of extra-reduction analyses published before December 2020

                      | With RSA-CRT | Base blinding | Regular algorithm | Constant time N-SLA | Constant time SLA | Exponent blinding | Modular extension
ERA-1a [9,13,22,25]   | No           | No            | No                | No                  | No                | No                | No
ERA-1b [3,6,8,20]     | Yes          | No            | Yes               | No                  | No                | No                | No
ERA-2 [23,24]         | Yes          | No            | No                | No                  | No                | Yes               | No
ERA-L1 [1,2,21,26,30] | Yes/No       | Yes           | Yes               | No                  | No                | No                | No
ERA-L2 [10,11]        | Yes/No       | Yes           | Yes               | Yes                 | No                | No                | No
This work             | Yes/No       | Yes           | Yes               | Yes                 | No                | No                | Yes

The algorithms from ERA-1a, ERA-1b, and ERA-2 are pure (global) timing attacks. Of course, by definition, pure timing attacks cannot overcome constant-time implementations. While the pure timing attacks are very different for CRT implementations and for non-CRT implementations, the local timing attacks from ERA-L1 and ERA-L2 work for CRT and non-CRT implementations alike. More precisely, these local attacks are a little easier to perform on non-CRT implementations because the ratio p/R (and sometimes also the value (R^2 (mod p))/p) does not have to be estimated there. For these reasons, we did not distinguish between CRT and non-CRT there. The pioneering papers [9,30] are significantly less efficient than their successors in the respective ERA class (up to a factor of 50) and less general [30]. The difference between ERA-L1 and ERA-L2 is that with ERA-L2 the attacker is capable of probing the cache to distinguish between two different execution paths of otherwise identical duration and power leakage, whereas with ERA-L1 the attacker is restricted to observing the duration or the power leakage. Arguably, this difference resides more in the side-channel collection than in its analysis.

Remark

The terminology in Table 1 deserves careful attention. Indeed, historically, ERA-1a, ERA-1b, and ERA-2 are pure timing attacks, discovered in this order. Similarly, ERA-L1 and ERA-L2 are local timing attacks, discovered in this order. But some papers about ERA-1b were published after papers from ERA-L1 and vice versa.

In [10,11], side-channel attacks on RSA, with CRT and without CRT, were investigated using leakage of the presence or absence of the extra-reductions in MMM. The side-channel information was used to identify which MMs require extra-reductions. Two exponentiation algorithms were considered, namely the square and multiply always exponentiation and the Montgomery ladder. The overall attacks split into many individual decisions whether (k_i = k_{i-1}) or (k_i ≠ k_{i-1}), where k_i and k_{i-1} denote subsequent key bits. The presented attacks were successful, but for these decisions only two (one squaring and one multiplication) out of four Montgomery operations (squarings or multiplications) were exploited. However, this approach does not scale: the derivation of the probability mass function (PMF) of extra-reduction values for multiple operations becomes mathematically intractable when the number of operations analyzed jointly is strictly greater than two.

2.2 Novel contributions

For these reasons, in this article, we resort to another way to estimate the distribution of the extra-reductions, which does not need the estimation of PMF values. We leverage a previous work of Schindler [21], which simplifies the characterization of the extra-reduction distribution using two elegant properties of the MMM.

Using sophisticated stochastic methods, we solve the problem and improve the efficiency of [10,11] in the presence of regular exponentiation and base blinding.

Moreover, we extend the results to the case where the modulus is itself randomized. We show that ERA remains a powerful side-channel despite the stacking of three protections, namely, regular exponentiation and base and modulus blinding. We performed our experiments on 1024-bit RSA moduli as this allows a fair comparison of the attack efficiency with the experimental results in [10,11].

This manuscript contains joint research work from the years 2016-2018. We mention that parts of an intermediate version of this paper have been incorporated into the PhD thesis of the lead author.

2.3 Outline

The rest of this paper is organized as follows. We start by giving our optimized attack in Section 3. Namely, we recapitulate in Section 3.1 the background to optimize the state-of-the-art when RSA uses a regular algorithm (we focus on the so-called Montgomery ladder) and base blinding. The core of our attack is presented in Section 3.2. Evaluation with both perfect and noisy measurements is conducted in Section 4, where we also consider the “modulus extension” as a third countermeasure on top of regular exponentiation and base blinding. Eventually, countermeasures are addressed in Section 5, and conclusions are derived in Section 6. Some formal computation results are given in Appendix 7.

3 The optimized attack: the stochastic background

In this section, we optimize the attack from [10,11]. We begin with definitions and we formulate the target of our attack in Section 3.1. In Section 3.2, we analyze the stochastic properties of the MM, and in Lemma 3.4 we develop a formula for the joint probability of several extra-reductions. The following subsections treat the estimation of two parameters, which are usually unknown, and the maximum likelihood estimator is derived.

3.1 Definitions and target of the attack

In this paper, we only consider the (left-to-right) Montgomery ladder, which is described in Algorithm 2. Unlike [10,11], we do not consider the square and multiply always algorithm (cf. Algorithm 1.1 in [11]). It is obvious how the applied mathematical methods can be transferred to the square and multiply always exponentiation algorithm.

We assume that the message m has been blinded (message blinding, a.k.a. base blinding). The attack applies to both RSA with CRT and RSA without CRT. We further assume that the arithmetic operations apply Montgomery's multiplication algorithm [17]. As in [10,11], we assume that a side-channel attack yields the (possibly noisy) information of which MMs need extra-reductions. The applied mathematical techniques are similar to those in [1,2,21], where attacks on different variants of fixed-window exponentiation algorithms [2,21] and the sliding-window exponentiation algorithm [1] were analyzed thoroughly.

To avoid clumsy formulations we always target RSA with CRT in the following, where p denotes one prime factor of the RSA modulus n . We note that the attack on RSA without CRT works identically and is even simpler since there is no need to estimate the ratio n / R (which is the ratio of two public parameters).

Definition 3.1 introduces the notation necessary to understand this paper.

Definition 3.1

For i = l-1, l-2, …, 0 and j = 0, 1, the term r_{i,j} denotes the value of register R_j after the key bit k_i has been processed. Furthermore, s_{i,j} ≔ r_{i,j}/p ∈ [0,1) stands for the normalized register values. For i = l-2, …, 0, we set w_i^(M) = 1 if the first Montgomery operation for key bit k_i ("multiplication") needs an extra-reduction (ER) and w_i^(M) = 0 otherwise. Analogously, w_i^(Q) = 1 if the second Montgomery operation for key bit k_i ("squaring," or "Quadrierung" in German; we use "Q" in place of "S" to prevent confusion with the stochastic process S_{i,j} defined below) needs an ER and w_i^(Q) = 0 otherwise. We recall that in the context of random variables the abbreviation "iid" stands for "independent and identically distributed." The indicator function 1_A(x) assumes the value 1 if x ∈ A and 0 otherwise. For b ∈ Z, the term b (mod p) denotes the unique element in Z_p = {0, 1, …, p-1} which is congruent to b modulo p. The letter R denotes the Montgomery constant R = 2^x for some integer x ≥ log₂ p. (Usually, x = ⌈log₂ p⌉.) When b is a real number, the term b (mod p) denotes the real number b - ⌊b/p⌋·p. Finally, for a, b ∈ Z_p we define MM(a, b; p) ≔ a·b·R^{-1} (mod p) (MM, as per Definition 2.2).

Algorithm 2

   Input: m, k = (k_{l-1} k_{l-2} … k_0)_2, p (with k_{l-1} = 1 and k_0 = 1)
   Output: m^k mod p
1: R_0 ← MM(m, R^2; p)
2: R_1 ← MM(R_0, R_0; p)                 // First square
3: for i = l-2 down to 0 do
4:   R_{¬k_i} ← MM(R_0, R_1; p)          // M_i
5:   R_{k_i} ← MM(R_{k_i}, R_{k_i}; p)   // Q_i
6: return MM(R_0, 1; p)

Left-to-right Montgomery ladder with MM algorithm

We note that MM(m, R^2; p) ≡ m·R (mod p) and MM(R_0, 1; p) ≡ R_0·R^{-1} (mod p) (cf. lines 1 and 6 of Algorithm 2). Besides, the key k is chosen of full length (hence k_{l-1} = 1) and must be coprime with p - 1, which is even (as p is an odd prime); therefore, k is odd (hence k_0 = 1). This gives the attacker two bits of information for free. The index l may be determined by an SPA. Moreover, it suffices to recover the exponent k for the exponentiation modulo p: if d denotes the secret RSA key and if y = x^d (mod n), then gcd(x^k - y (mod n), n) = p, which factorizes the modulus n (see, e.g., [21], Section 6).
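To make the target of the attack concrete, here is a small Python model of Algorithm 2 (our own sketch, not the authors' code) instrumented to record, for every processed key bit, the extra-reduction pair (w_i^(M), w_i^(Q)); the helper names are ours:

```python
# Left-to-right Montgomery ladder (Algorithm 2) recording the ER pairs.
def montgomery_reduce(D, p, R):
    """Montgomery reduction (Algorithm 1): (C, X) with X the ER bit."""
    v = (-pow(p, -1, R)) % R
    m = ((D % R) * v) % R
    U = (D + m * p) // R
    return (U - p, 1) if U >= p else (U, 0)

def mm(a, b, p, R):
    """MM(a, b; p) = a*b*R^{-1} mod p, plus the ER bit."""
    return montgomery_reduce(a * b, p, R)

def ladder_with_er_trace(m, k, p, R):
    """Compute m^k mod p; also return [(w_M, w_Q) per key bit k_{l-2}..k_0]."""
    bits = [int(c) for c in bin(k)[2:]]        # k_{l-1} = 1 comes first
    Reg = [mm(m, R * R % p, p, R)[0], 0]       # line 1: R_0 = phi(m)
    Reg[1], _ = mm(Reg[0], Reg[0], p, R)       # line 2: first square
    trace = []
    for ki in bits[1:]:                        # lines 3-5: i = l-2 .. 0
        Reg[1 - ki], wM = mm(Reg[0], Reg[1], p, R)   # M_i
        Reg[ki], wQ = mm(Reg[ki], Reg[ki], p, R)     # Q_i
        trace.append((wM, wQ))
    return mm(Reg[0], 1, p, R)[0], trace       # line 6

p, R = 101, 128
res, trace = ladder_with_er_trace(5, 0b1011, p, R)
assert res == pow(5, 0b1011, p)
print(trace)                                   # one ER pair per bit k_{l-2..0}
```

The sequence `trace` is exactly the side-channel observable assumed in this section (before noise is added in Section 4).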

3.2 The core of our attack

We interpret the s_{i,j} as realizations of random variables S_{i,j}, i.e., as values taken on by S_{i,j}, which assume values in [0,1). Analogously, we view w_i^(M) and w_i^(Q) as realizations of {0,1}-valued random variables W_i^(M) and W_i^(Q). Lemmas 3.2(i) and (ii) collect known stochastic properties of Montgomery's multiplication algorithm, while Assertions (iii) and (iv) follow the strategy that has proven successful for fixed-window exponentiation in [2,21].

Lemma 3.2

(MM)

  1. MM(a, b; p) requires an extra-reduction iff

    (3.1) (a/p)·(b/p)·(p/R) + (a·b·v (mod R))/R ≥ 1, or equivalently, iff MM(a, b; p)/p < (a/p)·(b/p)·(p/R).

  2. Assume that a ∈ Z_p and that the random variable B is uniformly distributed on Z_p. Furthermore, U and V denote independent random variables which are uniformly distributed on [0,1). Then, approximately,

    (3.2) P((a/p)·(B/p)·(p/R) + (a·B·v (mod R))/R ≥ 1) = P((a/p)·(p/R)·U + V ≥ 1) = (p/(2R))·(a/p),

    (3.3) P((B/p)·(B/p)·(p/R) + (B^2·v (mod R))/R ≥ 1) = P((p/R)·U^2 + V ≥ 1) = p/(3R).

  3. The random variables S_{l,0}, S_{l,1}, S_{l-1,0}, …, S_{0,0}, S_{0,1} may be viewed as iid uniformly distributed on [0,1).

  4. For i = l-1, …, 0, we have

    (3.4) W_i^(M) = 1{S_{i,1} < S_{i+1,0}·S_{i+1,1}·(p/R)} if k_i = 0, and W_i^(M) = 1{S_{i,0} < S_{i+1,0}·S_{i+1,1}·(p/R)} if k_i = 1,

    (3.5) W_i^(Q) = 1{S_{i,0} < S_{i+1,0}^2·(p/R)} if k_i = 0, and W_i^(Q) = 1{S_{i,1} < S_{i+1,1}^2·(p/R)} if k_i = 1.

  5. For the indicator functions, we obtain

    (3.6) 1{W_i^(M) = 1} = W_i^(M), 1{W_i^(M) = 0} = 1 - W_i^(M),

    (3.7) 1{W_i^(Q) = 1} = W_i^(Q), 1{W_i^(Q) = 0} = 1 - W_i^(Q).

Proof

Assertions (i) and (ii) are shown in [22] (see Lemma A.3 and its proof on page 209). The core idea of the approximate representations (3.2) and (3.3) is that a small deviation of the random variable B (resp. of B/p) causes only a small deviation of the first summand but implies an "uncontrolled large" deviation of the second summand over the unit interval. We note that if U and V are independent, then U and (a/R)·U + V (mod 1) are independent, too. Since the base m (Algorithm 2) has been base-blinded, we may assume that s_{l,0} = r_{l,0}/p = m/p is a realization of a random variable S_{l,0} which is uniformly distributed on the unit interval [0,1). Following (3.3) we further assume that S_{l,1} is also uniformly distributed on [0,1) and that S_{l,0} and S_{l,1} are independent (see also Remark 3.3). Now let us assume that the random variables S_{l,0}, S_{l,1}, S_{l-1,0}, …, S_{i+1,1} are iid uniformly distributed on [0,1). If (k_i, k_{i-1}) = (0, 0) we may replace (a/p), U (approximation of B/p), and V in (3.2) by S_{i+1,0}, S_{i+1,1}, and V_{i,0}, and analogously U and V in (3.3) by S_{i+1,0} and V_{i,1}, where V_{i,0} and V_{i,1} are uniformly distributed on [0,1) and independent of S_{l,0}, …, S_{i+1,1}. Furthermore, the assumption that V_{i,0} and V_{i,1} are independent seems reasonable since S_{i+1,1} and S_{i,1} are independent. This assumption finally implies that the random variables S_{l,0}, …, S_{i,1} are independent. Formula (3.4) follows from (3.1) if we replace the terms (a/p) and (B/p) by S_{i+1,0} and S_{i+1,1} (cf. (3.2)), and further MM(a, b; p)/p by S_{i,1}. Analogously, to verify (3.5) one replaces in (3.1) the terms (B/p) and MM(a, b; p)/p by S_{i+1,0} and S_{i,0}, respectively. The cases (k_i, k_{i-1}) ∈ {(1,0), (0,1), (1,1)} are similar. Assertion (v) follows immediately from the definition of indicator functions.
This completes the proof of Lemma 3.2.□
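The approximations (3.2) and (3.3) are easy to check empirically. The following Monte Carlo experiment (our own sanity check on a toy Mersenne prime, not part of the paper) estimates the extra-reduction rates of squarings and of multiplications by a fixed Montgomery-form operand a:

```python
# Monte Carlo check of Lemma 3.2(ii): for B uniform on Z_p (Montgomery
# domain), squaring triggers an ER with probability ~ p/(3R), and
# multiplying by a fixed operand a with probability ~ (p/(2R))*(a/p) = a/(2R).
import random
random.seed(1)

p = (1 << 31) - 1            # toy prime modulus 2^31 - 1
R = 1 << 31                  # smallest power of two greater than p
v = (-pow(p, -1, R)) % R     # v = -p^{-1} mod R, as in Algorithm 1

def er_bit(a, b):
    """1 iff Montgomery reduction of D = a*b needs an extra-reduction
    (a, b are interpreted as operands already in Montgomery form)."""
    D = a * b
    m = ((D % R) * v) % R
    return 1 if (D + m * p) // R >= p else 0

N = 200_000
sq = sum(er_bit(b, b) for b in (random.randrange(p) for _ in range(N)))
a = p // 2
mu = sum(er_bit(a, random.randrange(p)) for _ in range(N))
print(sq / N, p / (3 * R))   # squaring: both close to 1/3
print(mu / N, a / (2 * R))   # multiplication: both close to 1/4
```

With p/R ≈ 1 the squaring rate is close to 1/3, clearly different from the multiplication rate; it is this gap that makes the ER bits informative.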

Remark 3.3

(The independence assumption) A central assertion of Lemma 3.2, which is used in Lemma 3.4, is that the random variables S_{i,j} may be viewed as iid uniformly distributed on [0,1). This property has been deduced from the (approximate) stochastic representations (3.2) and (3.3). In a strict sense, this claim is certainly not correct, e.g., because the normalized register values r_{i,j}/p only assume values in the finite set Z_p/p ⊆ [0,1), and, to mention just one missing number-theoretic property, the r_{i,j} cannot assume quadratic non-residues in Z_p. However, this is not relevant for our purposes since we are only interested in the (joint) probabilities of extra-reductions. These events can be characterized by "metric" conditions in R (cf. (3.1), (3.2), (3.3)). It should be noted that the iid assumption on the normalized intermediate random variables of the exponentiation algorithm (here: the S_{i,j}) has proven successful, e.g., in [2,3,20,21,22], and it will turn out to be successful in the following, too.

The overall attack consists of many individual decisions (which nevertheless influence each other). Each of these attack steps (decisions) simultaneously considers all MMs which are carried out when u consecutive key bits (k_i, …, k_{i-u+1}) are processed. Lemma 3.4 is the core of our attack. It provides the probabilities which are needed later in Lemma 4.6 (maximum likelihood decision strategy).

Lemma 3.4

Let u ≥ 2 and θ = (θ_1, …, θ_u) ∈ {0,1}^u.

  1. The term (3.8) quantifies the probability that the extra-reduction vector (w_i^(M), w_i^(Q), …, w_{i-u+1}^(M), w_{i-u+1}^(Q)) occurs if (k_i, …, k_{i-u+1}) = (θ_1, …, θ_u). The probabilities are expressed by integrals over [0,1]^{2u+2}. The index θ shows the dependency on θ.

    (3.8) P_θ(W_i^(M) = w_i^(M), W_i^(Q) = w_i^(Q), …, W_{i-u+1}^(M) = w_{i-u+1}^(M), W_{i-u+1}^(Q) = w_{i-u+1}^(Q)) = ∫_0^1 ∫_0^1 ∫_{a_1}^{b_1} ∫_{a_2}^{b_2} ⋯ ∫_{a_{2u-1}}^{b_{2u-1}} ∫_{a_{2u}}^{b_{2u}} 1 ds_{i-u+1,1} ds_{i-u+1,0} ⋯ ds_{i,1} ds_{i,0} ds_{i+1,1} ds_{i+1,0}.

    Note: When the key bit k_j (for j ∈ {i, i-1, …, i-u+1}) is processed, the register value R_v (v ∈ {0,1}) corresponds to the integration variable s_{j,v}. The integration boundaries (a_{2j}, b_{2j}) and (a_{2j-1}, b_{2j-1}) correspond to the integration with regard to the variables s_{i-j+1,1} and s_{i-j+1,0}, respectively (j = 1, …, u). The integration boundaries depend on the hypothesis θ = (θ_1, …, θ_u) and the observed extra-reduction vector (w_i^(M), w_i^(Q), …, w_{i-u+1}^(M), w_{i-u+1}^(Q)). More precisely, for j ∈ {1, …, u} we have:

    If θ_j = 0, then

    (3.9) (a_{2j}, b_{2j}) = (0, s_{i-j+2,0}·s_{i-j+2,1}·p/R) if w_{i-j+1}^(M) = 1, and (s_{i-j+2,0}·s_{i-j+2,1}·p/R, 1) if w_{i-j+1}^(M) = 0,

    (3.10) (a_{2j-1}, b_{2j-1}) = (0, s_{i-j+2,0}^2·p/R) if w_{i-j+1}^(Q) = 1, and (s_{i-j+2,0}^2·p/R, 1) if w_{i-j+1}^(Q) = 0.

    If θ_j = 1, then

    (3.11) (a_{2j}, b_{2j}) = (0, s_{i-j+2,1}^2·p/R) if w_{i-j+1}^(Q) = 1, and (s_{i-j+2,1}^2·p/R, 1) if w_{i-j+1}^(Q) = 0,

    and

    (3.12) (a_{2j-1}, b_{2j-1}) = (0, s_{i-j+2,0}·s_{i-j+2,1}·p/R) if w_{i-j+1}^(M) = 1, and (s_{i-j+2,0}·s_{i-j+2,1}·p/R, 1) if w_{i-j+1}^(M) = 0.

  2. Let 1 ≔ (1, …, 1) (with u components). For each hypothesis θ ∈ {0,1}^u and each extra-reduction vector (w_i^(M), w_i^(Q), …, w_{i-u+1}^(M), w_{i-u+1}^(Q)), we have

    (3.13) P_θ(W_i^(M) = w_i^(M), …, W_{i-u+1}^(Q) = w_{i-u+1}^(Q))

    (3.14) = P_{1-θ}(W_i^(M) = w_i^(M), …, W_{i-u+1}^(Q) = w_{i-u+1}^(Q)).

Proof

By Lemma 3.2(iv), the random variables W_i^(M), W_i^(Q), …, W_{i-u+1}^(M), W_{i-u+1}^(Q) can be expressed by indicator functions which depend on the random variables S_{i+1,1}, S_{i+1,0}, …, S_{i-u+1,1}, S_{i-u+1,0}. This allows us to express the probability (3.8) as an integral over [0,1]^{2u+2} of a product of indicator functions. Furthermore, for j < u the indicator functions 1{W_{i-j+1}^(M) = w_{i-j+1}^(M)} and 1{W_{i-j+1}^(Q) = w_{i-j+1}^(Q)} actually only depend on s_{i+1,1}, s_{i+1,0}, …, s_{i-u+2,1}, s_{i-u+2,0}, while 1{W_{i-u+1}^(M) = w_{i-u+1}^(M)} and 1{W_{i-u+1}^(Q) = w_{i-u+1}^(Q)} merely depend on s_{i-u+2,1}, s_{i-u+2,0}, s_{i-u+1,1}, s_{i-u+1,0}. This allows us to express (3.8) in the form

(3.15) ∫_{[0,1]^{2u}} ∏_{j=1}^{u-1} 1{W_{i-j+1}^(M) = w_{i-j+1}^(M)}·1{W_{i-j+1}^(Q) = w_{i-j+1}^(Q)} (∫_{a_{2u-1}}^{b_{2u-1}} ∫_{a_{2u}}^{b_{2u}} 1 ds_{i-u+1,1} ds_{i-u+1,0}) ds_{i-u+2,1} ⋯ ds_{i+1,0}

with suitable integration boundaries a_{2u-1}, b_{2u-1}, a_{2u}, b_{2u}. These integration boundaries follow immediately from Lemma 3.2(iv) and (ii) with i-u+1 in place of i. This verifies formulae (3.9) to (3.12) for j = u. The integral over [0,1]^{2u} can be transformed in the same way into a sequence of one-dimensional integrals. Since the integration boundaries a_1, b_1, …, a_{2u-2}, b_{2u-2} depend only on the left-hand indicator functions, i.e., on the observations w_i^(M), w_i^(Q), …, w_{i-u+2}^(M), w_{i-u+2}^(Q), Lemma 3.4(i) can be verified by induction on u.

We first note that

ϕ: [0,1]^{2u+2} → [0,1]^{2u+2}, ϕ(s_{i+1,1}, s_{i+1,0}, …, s_{i-u+1,1}, s_{i-u+1,0}) ≔ (s_{i+1,0}, s_{i+1,1}, …, s_{i-u+1,0}, s_{i-u+1,1})

(swapping the right-hand indices from 0 to 1 and vice versa) defines a volume-preserving diffeomorphism on [0,1]^{2u+2}. As already pointed out above, the probabilities (3.13) and (3.14) can be expressed by integrals over [0,1]^{2u+2} of the indicator functions

∏_{j=1}^{u} 1^{[θ]}{W_{i-j+1}^(M) = w_{i-j+1}^(M)}·1^{[θ]}{W_{i-j+1}^(Q) = w_{i-j+1}^(Q)}

and

∏_{j=1}^{u} 1^{[1-θ]}{W_{i-j+1}^(M) = w_{i-j+1}^(M)}·1^{[1-θ]}{W_{i-j+1}^(Q) = w_{i-j+1}^(Q)},

respectively. The superscripts [θ] and [1-θ] indicate the hypotheses. From Lemma 3.2(iv), we conclude that 1^{[1-θ]}{W_{i-j+1}^(M) = w_{i-j+1}^(M)} = 1^{[θ]}{W_{i-j+1}^(M) = w_{i-j+1}^(M)} ∘ ϕ and 1^{[1-θ]}{W_{i-j+1}^(Q) = w_{i-j+1}^(Q)} = 1^{[θ]}{W_{i-j+1}^(Q) = w_{i-j+1}^(Q)} ∘ ϕ for all j ≤ u, which completes the proof of Assertion (ii).□

Lemma 3.4(ii) says that the information contained in the extra-reduction vectors (w_i^(M), …, w_{i-u+1}^(Q)) does not allow us to distinguish between the hypotheses θ and 1-θ. This means that we can only determine the set {θ, 1-θ}, as depicted in Figure 1.

Figure 1

Information collected during the presented attack on u pairs of extra-reductions.

In particular, it would be pointless to consider the case u = 1. For u = 2 one can distinguish between the cases (k_i, k_{i-1}) ∈ {(0,0), (1,1)} and (k_i, k_{i-1}) ∈ {(0,1), (1,0)}, or equivalently, between k_i = k_{i-1} and k_i ≠ k_{i-1}. For u ≥ 2, the parameter θ ∈ {(θ_1, …, θ_u), (1-θ_1, …, 1-θ_u)} corresponds to

(3.16) (k_i ⊕ k_{i-1} = θ_1 ⊕ θ_2, …, k_{i-u+2} ⊕ k_{i-u+1} = θ_{u-1} ⊕ θ_u),

where "⊕" denotes addition modulo 2. For the sake of clarity, we point out that the components of the vector 1-θ can also be written as (1-θ)_i = ¬θ_i for 1 ≤ i ≤ u.

Remark 3.5

  1. Lemma 3.4 can be applied to all u-tuples (k_i, …, k_{i-u+1}) for i = l-1, …, u-1. Combining the information from all u-tuples only provides the vector (k_{l-1} ⊕ k_{l-2}, …, k_1 ⊕ k_0). This information determines the whole key k = (k_{l-1}, k_{l-2}, …, k_0) since k is odd due to gcd(k, φ(p)) = 1 (where we recall that φ is the Euler totient function).

  2. The probabilities in Lemma 3.4 do not depend on the index i. By Lemma 3.4(ii), it suffices to compute at most 2^{3u-1} probabilities of type (3.8). (Note that 2^{2u} different extra-reduction vectors exist and one has to distinguish between 2^{u-1} hypotheses.) Example 3.6 illustrates the calculation of one particular probability, and the appendix contains two tables with all probabilities for u = 2.

  3. For u = 2, our attack targets pairs of consecutive key bits (k_i, k_{i-1}). This is like the original attack in [10,11], but the original attack only exploits the extra-reductions (w_i^(Q), w_{i-1}^(M)) while our attack considers (w_i^(M), w_i^(Q), w_{i-1}^(M), w_{i-1}^(Q)). The probabilities applied in the original attack are the marginal probabilities of (3.8) with regard to (w_i^(Q), w_{i-1}^(M)). Obviously, the original attack exploits less information than the new attack for u = 2, and experiments confirm that for u = 2 our new attack reduces the number of queries by a factor greater than 2 (cf. Figure 3).

Example 3.6

Let (θ_1, θ_2) = (0, 1) and (w_i^(M), w_i^(Q), w_{i-1}^(M), w_{i-1}^(Q)) = (1, 1, 0, 1). By Lemma 3.4(i),

(3.17) P_θ(W_i^(M) = 1, W_i^(Q) = 1, W_{i-1}^(M) = 0, W_{i-1}^(Q) = 1)
= ∫_0^1 ∫_0^1 ∫_0^{s_{i+1,0}^2 p/R} ∫_0^{s_{i+1,0} s_{i+1,1} p/R} ∫_{s_{i,0} s_{i,1} p/R}^1 ∫_0^{s_{i,1}^2 p/R} 1 ds_{i-1,1} ds_{i-1,0} ds_{i,1} ds_{i,0} ds_{i+1,1} ds_{i+1,0}
= ∫_0^1 ∫_0^1 ∫_0^{s_{i+1,0}^2 p/R} ∫_0^{s_{i+1,0} s_{i+1,1} p/R} (s_{i,1}^2 (p/R) - s_{i,0} s_{i,1}^3 (p/R)^2) ds_{i,1} ds_{i,0} ds_{i+1,1} ds_{i+1,0}
= ∫_0^1 ∫_0^1 ((1/3) s_{i+1,0}^5 s_{i+1,1}^3 (p/R)^5 - (1/8) s_{i+1,0}^8 s_{i+1,1}^4 (p/R)^8) ds_{i+1,1} ds_{i+1,0}
= (1/(3·4·6)) (p/R)^5 - (1/(8·5·9)) (p/R)^8 = (1/72)(p/R)^5 - (1/360)(p/R)^8.
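Under the iid model of Lemma 3.2, the closed form (1/72)(p/R)^5 - (1/360)(p/R)^8 can be cross-checked numerically. The following Monte Carlo sketch (our own check, with an arbitrary ratio p/R = 0.75) draws the six normalized registers as iid uniforms and tests the four extra-reduction conditions for θ = (0, 1):

```python
# Monte Carlo cross-check of Example 3.6: estimate the probability (3.17)
# directly from the indicator conditions of Lemma 3.2(iv).
import random
random.seed(7)

c = 0.75                 # the ratio p/R, anywhere in (1/2, 1)
N = 1_000_000
hits = 0
for _ in range(N):
    s_ip1_0, s_ip1_1 = random.random(), random.random()  # s_{i+1,0}, s_{i+1,1}
    s_i_0, s_i_1 = random.random(), random.random()      # s_{i,0},   s_{i,1}
    s_im1_0, s_im1_1 = random.random(), random.random()  # s_{i-1,0}, s_{i-1,1}
    wM_i = s_i_1 < s_ip1_0 * s_ip1_1 * c     # k_i = 0: M writes R_1
    wQ_i = s_i_0 < s_ip1_0 * s_ip1_0 * c     # k_i = 0: Q squares R_0
    wM_i1 = s_im1_0 < s_i_0 * s_i_1 * c      # k_{i-1} = 1: M writes R_0
    wQ_i1 = s_im1_1 < s_i_1 * s_i_1 * c      # k_{i-1} = 1: Q squares R_1
    if wM_i and wQ_i and not wM_i1 and wQ_i1:
        hits += 1

closed = c**5 / 72 - c**8 / 360
print(hits / N, closed)  # empirical estimate vs closed form, close agreement
```

The empirical frequency agrees with the closed form within Monte Carlo error, supporting the integration boundaries (3.9)-(3.12).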

Corollary 3.7

For u = 2, applying the law of total probability to P_θ in (3.8) recovers the joint probability for the maximum likelihood distinguisher described in [10,11, Theorem 2].

Remark 3.8

The two approaches, that of previous work [10,11] and that of this work, are independent, and both allow us to derive a maximum likelihood key distinguisher. Here, we are not interested in the values manipulated by the multiplication and square operations, but only in the necessary and sufficient conditions for the existence of extra-reductions, which allows an analysis of larger dimensions.

4 Perfect and noisy measurements

The attacker gets access to side-channel information about each bit k_i (l−2 ≥ i > 0) of the exponent k through the noisy distribution of the pair of extra-reductions (W_i^(M), W_i^(Q)). The noise consists of two binary random variables (N_i^(M), N_i^(Q)). Additionally, the random variables N_i^(M) and N_i^(Q) are assumed independent and identically distributed (iid), as is usually the case for the measurement noise of different operations in a side-channel trace. Namely, we denote by p_noise the probability

p noise = P ( N i ( M ) = 1 ) = P ( N i ( Q ) = 1 ) for all i .

Thus, the attacker garners an iid sequence (y_i^(M), y_i^(Q)) = (y_i^{(M);n}, y_i^{(Q);n})_{n=1,…,N}, where for each query n and exponent index i ∈ {l−1, …, 0}, y_i^{(M);n} = w_i^{(M);n} ⊕ n_i^{(M);n} and y_i^{(Q);n} = w_i^{(Q);n} ⊕ n_i^{(Q);n}. This means that W_i^(M) and Y_i^(M) are, respectively, the input and the output of a binary symmetric channel (BSC) of parameter p_noise. Similarly, W_i^(Q) and Y_i^(Q) are the input and output of an independent, identical BSC parallel to the first one.

In practical cases, detecting an extra-reduction using only one acquisition can lead to errors. Let us model the attack setup, taking into account that the detection of the presence/absence of extra-reductions is a random variable, due to some noise. The Markov chain of random variables for index i is given as follows:

Secret → Bias → Observable
K = k_i → (W_i^(M), W_i^(Q)) → (Y_i^(M), Y_i^(Q)) = (W_i^(M) ⊕ N_i^(M), W_i^(Q) ⊕ N_i^(Q)).

The probabilities (3.8) depend on the unknown ratio p/R. The crucial observation is that the attacker knows the position of all squarings and all multiplications. Lemma 4.2 provides a concrete formula, which allows us to estimate p/R. Of course, this estimation step is only necessary for RSA with CRT but not for RSA without CRT. We begin with a lemma that will be needed later.

Lemma 4.1

We have

(4.1) E(W_i^(M)) = P(W_i^(M) = 1) = (1/4)(p/R),

(4.2) E(W_i^(Q)) = P(W_i^(Q) = 1) = (1/3)(p/R),

(4.3) p/R = 3 E(W_i^(Q)) = 2 E(W_i^(M)) + 1.5 E(W_i^(Q)).

Proof

Since W_i^(M) and W_i^(Q) assume values in {0, 1}, the left-hand equations in (4.1) and (4.2) are obvious, while the right-hand equations follow immediately from (3.4) and (3.5), respectively. For k_i = 0, for instance,

P(W_i^(M) = 1) = ∫_0^1 ∫_0^1 ∫_0^{s_{i+1,0} s_{i+1,1} p/R} 1 ds_{i,0} ds_{i+1,1} ds_{i+1,0} = (1/4)(p/R).

We note that the probability (4.2) was already verified in [20] and, for instance, in [11], the latter by other mathematical methods. Formula (4.3) follows directly from (4.1) and (4.2).□
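Under the same uniform model, (4.1)–(4.3) can be checked with a few lines of simulation. The following sketch is our own illustration (β = 0.6 is an arbitrary test value): a Montgomery multiplication of normalized inputs a, b triggers an extra-reduction when its normalized output falls below a·b·β, a squaring when the output falls below a²·β.

```python
import random

random.seed(7)
beta = 0.6           # beta = p/R, arbitrary test value
N = 500_000
er_mult = er_square = 0
for _ in range(N):
    a, b, out = random.random(), random.random(), random.random()
    er_mult += out < a * b * beta     # extra-reduction in a multiplication
    er_square += out < a * a * beta   # extra-reduction in a squaring
e_m, e_q = er_mult / N, er_square / N
print(e_m, beta / 4)         # E(W^(M)) = beta/4, cf. (4.1)
print(e_q, beta / 3)         # E(W^(Q)) = beta/3, cf. (4.2)
print(2 * e_m + 1.5 * e_q)   # recovers beta = p/R, cf. (4.3)
```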

The ER-values w i ( M ) and w i ( Q ) are determined (or more precisely: guessed) on the basis of single-trace template attacks. In particular, their guesses w ˜ i ( M ) and w ˜ i ( Q ) might be incorrect with some probability. We denote the corresponding random variables (referring to the guessed ER values) by W ˜ i ( M ) and W ˜ i ( Q ) . In the following, we assume that

(4.4) P(W̃_i^(M) = v | W_i^(M) = 1 − v) = P(W̃_i^(Q) = v | W_i^(Q) = 1 − v) = p_noise for i ∈ {0, …, l−1} and v ∈ {0, 1},

and similarly for the initialization of the registers R_0 and R_1 in Algorithm 2. In other words, the probability of guessing an ER value incorrectly is p_noise ≥ 0, independently of the true value. Of course, p_noise = 0 characterizes a perfect side-channel measurement. Lemma 4.2 is the generalization of (4.3) to noisy measurements. As noted in Lemma 4.4, this allows the estimation of p/R and p_noise.

Lemma 4.2

(4.5) p/R = (12 E(W̃_i^(Q)) − 12 E(W̃_i^(M))) / (1 + 6 E(W̃_i^(Q)) − 8 E(W̃_i^(M))),

(4.6) p_noise = 4 E(W̃_i^(M)) − 3 E(W̃_i^(Q)).

Proof

Since W ˜ i ( Q ) is { 0 , 1 } -valued, we obtain

E(W̃_i^(Q)) = P(W̃_i^(Q) = 1) = P(W̃_i^(Q) = 1 | W_i^(Q) = 1) P(W_i^(Q) = 1) + P(W̃_i^(Q) = 1 | W_i^(Q) = 0) P(W_i^(Q) = 0) = (1 − p_noise)·p/(3R) + p_noise·(1 − p/(3R)),

and similarly

E(W̃_i^(M)) = P(W̃_i^(M) = 1) = P(W̃_i^(M) = 1 | W_i^(M) = 1) P(W_i^(M) = 1) + P(W̃_i^(M) = 1 | W_i^(M) = 0) P(W_i^(M) = 0) = (1 − p_noise)·p/(4R) + p_noise·(1 − p/(4R)).

Solving these equations for ( p / R ) and p noise yields (4.5) and (4.6).□
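The inversion performed in this proof is easy to verify mechanically. In the sketch below (our own illustration; the test values β = 0.7 and p_noise = 0.1 are arbitrary), the forward model of the proof produces the two expectations, and (4.5)–(4.6) recover the parameters exactly.

```python
def forward(beta, p_noise):
    """Expectations of the guessed ER indicators (proof of Lemma 4.2)."""
    e_q = (1 - p_noise) * beta / 3 + p_noise * (1 - beta / 3)
    e_m = (1 - p_noise) * beta / 4 + p_noise * (1 - beta / 4)
    return e_m, e_q

def invert(e_m, e_q):
    """Equations (4.5) and (4.6)."""
    beta = (12 * e_q - 12 * e_m) / (1 + 6 * e_q - 8 * e_m)
    p_noise = 4 * e_m - 3 * e_q
    return beta, p_noise

e_m, e_q = forward(0.7, 0.1)     # arbitrary test values
beta, p_noise = invert(e_m, e_q)
print(beta, p_noise)             # 0.7 and 0.1, up to floating-point rounding
```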

In Lemma 4.3, (e_1^(M), e_1^(Q), …, e_u^(M), e_u^(Q)) ∈ {0, 1}^{2u} represents the "error vector" and ham(·) denotes the Hamming weight. The nonzero entries give the positions at which the guessed extra-reduction vector (w̃_i^(M), w̃_i^(Q), …, w̃_{i−u+1}^(M), w̃_{i−u+1}^(Q)) is incorrect.

Lemma 4.3

  1. (4.7) P_θ(W̃_{i−j+1}^(M) = w̃_{i−j+1}^(M), W̃_{i−j+1}^(Q) = w̃_{i−j+1}^(Q) for j = 1, …, u) = Σ_{0 ≤ e_j^(M), e_j^(Q) ≤ 1; 1 ≤ j ≤ u} P_θ(W_{i−j+1}^(M) = w̃_{i−j+1}^(M) ⊕ e_j^(M), W_{i−j+1}^(Q) = w̃_{i−j+1}^(Q) ⊕ e_j^(Q) for j = 1, …, u) × p_noise^{ham(e_1^(M),…,e_u^(Q))} (1 − p_noise)^{2u − ham(e_1^(M),…,e_u^(Q))}.

  2. For each hypothesis θ ∈ {0, 1}^u and each (guessed) extra-reduction vector (w̃_i^(M), w̃_i^(Q), …, w̃_{i−u+1}^(M), w̃_{i−u+1}^(Q)), we have

    (4.8) P_θ(W̃_i^(M) = w̃_i^(M), …, W̃_{i−u+1}^(Q) = w̃_{i−u+1}^(Q)) = P_{1⊕θ}(W̃_i^(M) = w̃_i^(M), …, W̃_{i−u+1}^(Q) = w̃_{i−u+1}^(Q)).

Proof

The term p_noise^{ham(e_1^(M),…,e_u^(Q))} (1 − p_noise)^{2u − ham(e_1^(M),…,e_u^(Q))} quantifies the probability of the error vector (e_1^(M), e_1^(Q), …, e_u^(M), e_u^(Q)). This fact and the definition of conditional probability imply (4.7). Assertion (ii) follows immediately from (i) and Lemma 3.4(ii), applied to the particular right-hand probabilities in (4.7).□
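Formula (4.7) is a convolution of the noise-free law with 2u independent bit flips. The sketch below (our own illustration) implements it for an arbitrary placeholder noise-free distribution over {0,1}^{2u} (any normalized table would do) and checks two sanity properties: the result is again a probability distribution, and p_noise = 0 leaves the law unchanged.

```python
from itertools import product

def noisy_law(p_clean, p_noise):
    """Apply (4.7): convolve a law over {0,1}^(2u) with iid bit flips."""
    n = len(next(iter(p_clean)))   # vector length 2u
    p_tilde = {}
    for w in product((0, 1), repeat=n):
        total = 0.0
        for e in product((0, 1), repeat=n):    # error vector
            ham = sum(e)
            clean = tuple(wi ^ ei for wi, ei in zip(w, e))
            total += p_clean[clean] * p_noise ** ham * (1 - p_noise) ** (n - ham)
        p_tilde[w] = total
    return p_tilde

# Placeholder noise-free law for u = 2 (16 cells, arbitrary weights).
vectors = list(product((0, 1), repeat=4))
weights = [i + 1 for i in range(16)]
p_clean = {v: w / sum(weights) for v, w in zip(vectors, weights)}

p_tilde = noisy_law(p_clean, 0.1)
print(abs(sum(p_tilde.values()) - 1.0) < 1e-9)   # still a distribution
print(noisy_law(p_clean, 0.0) == p_clean)        # p_noise = 0 is the identity
```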

The last lemma of this section explains how to estimate the ratio p / R and the probability p noise .

Lemma 4.4

Assume that the attacker has observed N side-channel traces. Then

(4.9) μ̃_M ≔ (1/(N·l)) Σ_{n=1}^{N} Σ_{i=0}^{l−1} w̃_i^{(M);n}

provides an estimator for E ( W ˜ i ( M ) ; n ) and analogously

(4.10) μ̃_Q ≔ (1/(N·l)) Σ_{n=1}^{N} Σ_{i=0}^{l−1} w̃_i^{(Q);n}

for E(W̃_i^{(Q);n}). The index n refers to the numbering of the side-channel traces. (ii) Substituting μ̃_M and μ̃_Q for E(W̃_i^{(M);n}) and E(W̃_i^{(Q);n}) in (4.5) and (4.6) yields the estimates (p/R)~ and p̃_noise.

(iii) For perfect measurements, (4.3) might alternatively be used to estimate p/R. Compared to the middle term, the right-hand term takes twice as many MMs into account and thus should provide a more precise estimate.

Proof

Straightforward.□

Example 4.5

(Estimation of p/R and p_noise) For different exponents of 512-bit length, we estimate (p/R)~ and p̃_noise for two moduli (RSA-1024-p and RSA-1024-q, defined in [11, Section 2.2]) and different values of p_noise, depending on the number of side-channel traces N. For each value of N between 0 and 500, we compute (p/R)~ using (4.5) and p̃_noise using (4.6) for the different exponents, and the resulting values are represented using a box plot (decile/quartile/median values) in Figure 2.
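The estimation procedure of Lemma 4.4 can be mimicked in simulation. The sketch below is our own illustration (all parameter values are arbitrary): it draws the true ER bits of N traces of an l-bit exponent from the marginals (4.1)–(4.2), flips each bit with probability p_noise as in the BSC model, and recovers (p/R)~ and p̃_noise from the empirical means via (4.5)–(4.6).

```python
import random

random.seed(123)
beta_true, p_noise_true = 0.75, 0.10   # simulated ground truth (arbitrary)
N, l = 500, 512                         # number of traces, exponent length

def noisy_bit(p_one, p_noise):
    w = random.random() < p_one              # true ER bit
    return w ^ (random.random() < p_noise)   # observed through the BSC

total = N * l
mu_m = sum(noisy_bit(beta_true / 4, p_noise_true) for _ in range(total)) / total
mu_q = sum(noisy_bit(beta_true / 3, p_noise_true) for _ in range(total)) / total

beta_hat = (12 * mu_q - 12 * mu_m) / (1 + 6 * mu_q - 8 * mu_m)   # (4.5)
p_noise_hat = 4 * mu_m - 3 * mu_q                                # (4.6)
print(beta_hat, p_noise_hat)   # close to 0.75 and 0.10
```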

Figure 2

Statistical box plot to estimate the ratio p/R and the probability p_noise as a function of the number of side-channel traces N, using 1,000 randomly selected exponent values.

Figure 3

Success rate for an entire exponent using 1,000 randomly selected exponent values, depending on the number of side-channel traces N, with different noise probabilities p_noise: (a) p_noise = 0.00, (b) p_noise = 0.10, (c) p_noise = 0.20.

4.1 The optimal decision strategy

Lemma 4.6 provides the optimal decision strategy for the individual decisions, i.e., for guessing the parameter set {θ, 1⊕θ} for the particular u-tuples (k_i, …, k_{i−u+1}). The decision strategy exploits the information from the observed (guessed) ER-vectors of N side-channel traces. For p_noise = 0, Lemma 4.6 describes the situation of perfect measurements.

Lemma 4.6

(Maximum likelihood estimator) Assume that the key k has been selected randomly and that the attacker has no information on the subkey (k_i, …, k_{i−u+1}). Let

(4.11) θ̂ ∈ argmax_{θ ∈ {0,1}^u} Π_{n=1}^{N} P_θ(W̃_{i−j+1}^{(M);n} = w̃_{i−j+1}^{(M);n}, W̃_{i−j+1}^{(Q);n} = w̃_{i−j+1}^{(Q);n} for j = 1, …, u).

  1. θ̂ maximizes the right-hand side of (4.11) iff 1⊕θ̂ maximizes the right-hand side of (4.11). It thus suffices to compute the right-hand term of (4.11) for all θ ∈ {θ ∈ {0,1}^u | θ_u = 0} or, without loss of generality, for any subset obtained by fixing one arbitrary bit of θ.

  2. The attacker decides for

    (4.12) (k_i ⊕ k_{i−1} = θ̂_1 ⊕ θ̂_2, …, k_{i−u+2} ⊕ k_{i−u+1} = θ̂_{u−1} ⊕ θ̂_u).

    This is the optimal decision strategy.

Proof

The first assertion of (i) follows from Lemma 3.4(ii), and the second is an immediate consequence of the first. With regard to the assumptions on k and on the subkey (k_i, …, k_{i−u+1}), we interpret the unknown subkey (k_i, …, k_{i−u+1}) as a realization of a random variable that is uniformly distributed on {0,1}^u. Then (k_i ⊕ k_{i−1}, …, k_{i−u+2} ⊕ k_{i−u+1}) may be viewed as a realization of a random variable that is uniformly distributed on {0,1}^{u−1}. Furthermore, (k_i, …, k_{i−u+1}) ∈ {θ, 1⊕θ} iff (k_i ⊕ k_{i−1} = θ_i ⊕ θ_{i−1}, …, k_{i−u+2} ⊕ k_{i−u+1} = θ_{i−u+2} ⊕ θ_{i−u+1}). Hence, (4.11) yields the maximum likelihood estimator for the transformed subkey (k_i ⊕ k_{i−1}, …, k_{i−u+2} ⊕ k_{i−u+1}). If we assume that each false decision is equally bad, the optimal decision strategy (Bayes strategy against the uniform distribution on {0,1}^{u−1}, with identical loss for all types of errors) is given by the maximum likelihood estimator, which completes the proof of Lemma 4.6.□
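Lemma 4.6 can be exercised end-to-end for one window with u = 2. The sketch below is our own illustrative simulation, not the authors' code: lacking the appendix's closed-form tables, it builds the laws P_θ empirically, under the model assumption that every MM output is a fresh uniform value on [0, 1] and that an extra-reduction occurs when the output falls below (product of normalized inputs)·β; β = 0.75, the sample sizes, and the true window (0, 1) are arbitrary choices. By (4.8), only the class {θ, 1⊕θ} can be decided.

```python
import math
import random
from collections import Counter
from itertools import product

random.seed(2021)
BETA = 0.75   # beta = p/R, arbitrary test value

def er_tuple(bits):
    """ER 4-tuple (w_i^M, w_i^Q, w_{i-1}^M, w_{i-1}^Q) for a 2-bit window.

    Uniform-output model: registers hold normalized values; MM yields a
    fresh uniform output, with an extra-reduction iff out < inputs * BETA.
    """
    r = [random.random(), random.random()]
    out = []
    for b in bits:
        m = random.random()
        w_m = m < r[0] * r[1] * BETA      # multiplication R_0 * R_1
        q = random.random()
        w_q = q < r[b] * r[b] * BETA      # squaring of R_{k_i}
        r[1 - b], r[b] = m, q             # Montgomery ladder register update
        out += [int(w_m), int(w_q)]
    return tuple(out)

def estimate_law(bits, n=150_000):
    cnt = Counter(er_tuple(bits) for _ in range(n))
    return {v: (cnt[v] + 1) / (n + 16)    # Laplace smoothing
            for v in product((0, 1), repeat=4)}

laws = {theta: estimate_law(list(theta)) for theta in [(0, 0), (1, 0)]}

true_window = (0, 1)   # same class as theta = (1, 0), cf. (4.8)
obs = Counter(er_tuple(list(true_window)) for _ in range(3000))
scores = {theta: sum(c * math.log(law[v]) for v, c in obs.items())
          for theta, law in laws.items()}
best = max(scores, key=scores.get)
print(best)   # the class {(1, 0), (0, 1)} should win
```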

Remark 4.7

  1. Lemma 4.6 assumes that p/R and p_noise are known. Substituting (p/R)~ and p̃_noise into (4.11) yields estimates for the probabilities

    P_θ(W̃_i^(M) = w̃_i^(M), …, W̃_{i−u+1}^(Q) = w̃_{i−u+1}^(Q)).

  2. In the proof of Lemma 4.6, we assume that (k_i, …, k_{i−u+1}) is a realization of a uniformly distributed random variable on {0,1}^u. This assumption may not be justified for i = l−2, and in particular not for i = u−1 since k_0 = 1. However, since we are only interested in the distribution on {0,1}^{u−1} (cf. equation (4.12)), the uniformity condition is relaxed accordingly.

  3. In the proof of Lemma 4.6, we assume that each false decision is equally bad. This assumption is reasonable if all transformed subkeys

    (k_i ⊕ k_{i−1}, …, k_{i−u+2} ⊕ k_{i−u+1})

    are treated independently.

4.2 Attack summary and success rate

The decision strategy in Lemma 4.6 is based on the observed extra-reductions for each multiply and square operation for N calls of the cryptographic operation with a static key k of l -bit length ( k l 1 = 1 and k 0 = 1 , as described in Algorithm 2). For each 2 u -tuple of (noisily) observed extra reductions

( y ˜ i ( M ) , y ˜ i ( Q ) , , y ˜ i u + 1 ( M ) , y ˜ i u + 1 ( Q ) ) { 0 , 1 } 2 u ,

the attacker estimates the θ_i value using the maximum likelihood estimator as described in Lemma 4.6, using only the probabilities P_θ (for u = 2 and p_noise = 0, the probabilities are given as polynomials in the ratio p/R in the informative appendix, Section 7). Algorithm 3 permits us to retrieve the key bit values. It is a windowed algorithm, which recovers an estimate k̂ of the secret key k in tuples of u bits. In Algorithm 3, i takes the values (u−1), 2(u−1), 3(u−1), etc. The first u-bit window considers the 2u Montgomery operations which depend on the key bits k_{u−1}, …, k_0. Due to Lemma 4.3, subsequent windows overlap in one bit position. Note that at lines 4 and 16 of Algorithm 3, the final value of i must be (l−2), which might not be a multiple of (u−1), depending on the values of l and u. Hence, the final value of i is adjusted to be equal to (l−2). In this case, the last window consists of the bits of indices {l−2, …, l−u−1}, which overlaps the penultimate window in more than one bit position. Alternatively, the final maximum likelihood could be computed for a smaller window (of length < u). Our first proposal saves the computation of additional probabilities (step 3 of Algorithm 3); hence, it is adopted in Algorithm 3 and put in force at lines 5 and 17.

The last steps of Algorithm 3 consist of putting together pieces of (u−1) bits of the key guess. Simple error correction can be applied at this stage, to easily fix one or two errors while rebuilding the full l bits of the secret exponent. For each trial, only the loop from line 16 of Algorithm 3 has to be executed (with modified guesses θ̂_i for one index i = i_1 or for two indices i ∈ {i_1, i_2}), together with the Euclidean algorithm, which is not costly. We point out that in Definition 4.8 we do not allow any false decision for the particular u-bit windows, for the sake of a fair comparison with the attacks in [10,11]. Allowing such corrections would increase our success rate to some extent (and those in [10,11] as well).

Algorithm 3

Input: (w̃_{l−2}^(M), w̃_{l−2}^(Q), …, w̃_0^(M), w̃_0^(Q)), a set of N(l−1) pairs of noisy bits (extra-reductions)
Output: A guessed key value k̂ ∈ {0,1}^l
Attack phase
1  Estimate the ratio p/R and the probability p_noise (by their estimated values (p/R)~ and p̃_noise, using Lemma 4.4)
2  for each θ = (θ_1, …, θ_u) ∈ {0,1}^u with θ_u = 0 do
3      Compute the probability law P_θ(W̃_i^(M), W̃_i^(Q), …, W̃_{i−u+1}^(M), W̃_{i−u+1}^(Q)) from (p/R)~ and p̃_noise by (4.7)
4  for i = u−1 up to ⌈(l−1)/(u−1)⌉·(u−1) by step (u−1) do
5      if i > l−2 then i ← l−2   // the last window is left-justified on [l−2, …, l−2−(u−1)] = [l−2, …, l−u−1]
6      for (v_1, …, v_{2u}) ∈ {0,1}^{2u} do
7          Accum(v_1, …, v_{2u}) ← 0
8      for n = 1 to N do
9          Accum(w̃_i^{(M);n}, w̃_i^{(Q);n}, …, w̃_{i−u+1}^{(M);n}, w̃_{i−u+1}^{(Q);n})++   // increment the observed entry of Accum by 1
10     for each θ ∈ {0,1}^u with θ_u = 0 do
11         T_θ ← 0   // (non-normalized) log-likelihood
12         for each tuple (v_1, v_2, …, v_{2u−1}, v_{2u}) ∈ {0,1}^{2u} do
13             T_θ ← T_θ + Accum(v_1, …, v_{2u}) × ln(P_θ(v_1, v_2, …, v_{2u−1}, v_{2u}))   // see Lemma 4.6
14     θ̂_i = (θ̂_{i,1}, …, θ̂_{i,u}) ← argmax_θ (T_θ)   // see Lemma 4.6
Computation of the estimated key value
15  k̂_0 ← 1, k̂_{l−1} ← 1   // by definition of the key (see Alg. 2)
16  for i = u−1 up to ⌈(l−1)/(u−1)⌉·(u−1) by step (u−1) do
17      if i > l−2 then i ← l−2   // see line 5 of this Alg. 3
18      for j = 1 to u do k̂_{i−j+1} ← θ̂_{i,j} ⊕ k̂_{i−u+1}
19  return k̂ = (k̂_{l−1} k̂_{l−2} … k̂_0)_2

Optimal extra-reduction attack using maximum likelihood estimator
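The back-end of Algorithm 3 (lines 15–18), which stitches the per-window decisions θ̂_i back into a full key, can be checked in isolation. In the sketch below (our own illustration), each θ̂_i is the error-free class representative with last component 0 of the true window, exactly what a perfect distinguisher would output; the reconstruction must then return the key for every window size u, including the left-justified last window.

```python
import random

random.seed(5)

def windows(l, u):
    """Window positions i = u-1, 2(u-1), ..., ending at l-2
    (mirrors the index clamping of lines 4-5 and 16-17 of Algorithm 3)."""
    pos = list(range(u - 1, l - 1, u - 1))
    if pos[-1] != l - 2:
        pos.append(l - 2)          # left-justified last window
    return pos

def reconstruct(theta_hat, l, u):
    """Lines 15-18 of Algorithm 3: rebuild k-hat from window decisions."""
    k_hat = [None] * l
    k_hat[0], k_hat[l - 1] = 1, 1  # line 15: k_0 = k_{l-1} = 1
    for i in windows(l, u):
        for j in range(1, u + 1):  # line 18: k_{i-j+1} = theta_{i,j} xor k_{i-u+1}
            k_hat[i - j + 1] = theta_hat[i][j - 1] ^ k_hat[i - u + 1]
    return k_hat

l = 64
key = [1] + [random.randint(0, 1) for _ in range(l - 2)] + [1]   # k_0, ..., k_{l-1}

results = {}
for u in (2, 3, 4, 5):
    theta_hat = {}
    for i in windows(l, u):
        window = [key[i - j + 1] for j in range(1, u + 1)]   # (k_i, ..., k_{i-u+1})
        # class representative with theta_u = 0 (line 2 of Algorithm 3)
        theta_hat[i] = [b ^ window[-1] for b in window]
    results[u] = (reconstruct(theta_hat, l, u) == key)
print(results)   # True for every u
```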

In order to compare previous work and this optimized method, we compute the success rates of those attacks. In this article, we define the success rate over the whole exponent value.

Definition 4.8

(Success rate of an attack) The success rate of an attack is the ratio of succeeded attacks to the number of experiments. An attack succeeds when all key bits of the entire exponent are found. As a corollary, if even one bit is badly guessed, the attack is considered to have failed.

For different exponents of 512-bit length, we estimate the success rate of the attack for the modulus RSA-1024-q (defined in [11, Section 2.2]), for different probabilities p_noise and different values of u, depending on the number of side-channel traces N. Figure 3 shows a comparison between the attack described in [10,11] and our method for u between 2 and 5. One can observe that our method, for the different u values, significantly increases the success rate compared to the state-of-the-art method described in [10,11]. The number of side-channel traces needed for the attack to succeed is divided by a factor greater than 2. More precisely, our new method recovers the key with probability 80% using only 40% of the traces needed in [10,11]. This advantage does not depend on the size of the modulus p.

The additional gain obtained by further increasing u is not significant.

4.3 The attack in the presence of several blinding techniques

We already know that base blinding (a.k.a. message blinding) does not prevent our attack. The reason is that our attack neither requires knowledge of the register values R_0 and R_1 nor needs chosen input values. In this section, we analyze the situation when, in addition to base blinding, either modulus blinding or exponent blinding is applied.

4.3.1 The combination of base blinding with modulus blinding

In the first step, an odd modulus blinding factor r ∈ Z_{R_F;odd} ≔ {z ∈ Z_{R_F} | z is odd} is selected randomly, where R_F = 2^F for a suitable exponent F, e.g., F = 64. The modular exponentiation is calculated modulo (p·r) (instead of modulo p), and the new Montgomery constant is the product R* ≔ R·R_F in place of R. The input value (base) y is reduced modulo (p·r), yielding y_{pr}, and then the product m ≔ y_{pr}·r_B (mod p·r) is computed for some random value r_B ∈ Z_p (base blinding). The result of the modular exponentiation, m^k (mod p·r), is reduced modulo p, which yields m^k (mod p). Finally, the effect of the base blinding is annihilated by the multiplication with r_B^{−k} (mod p), providing the desired output y^k (mod p) = (y (mod p))^k (mod p).
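The arithmetic of this combined blinding can be traced with small numbers. The sketch below is our own illustration, using plain modular arithmetic and ignoring the Montgomery representation; p, r, r_B, k, and y are arbitrary toy values (p must be prime here so that r_B is invertible modulo p). It follows the steps of the text: exponentiate modulo p·r, reduce modulo p, then annihilate the base blinding with r_B^(−k) mod p.

```python
# Toy parameters (illustrative assumptions, not real RSA sizes).
p = 1009      # small prime standing in for the secret RSA prime
r = 54321     # odd modulus blinding factor in [1, 2**16)
r_B = 123     # nonzero base blinding factor in Z_p
k = 605       # secret exponent (odd)
y = 777       # input base

pr = p * r
y_pr = y % pr                          # reduce the base modulo p*r
m = (y_pr * r_B) % pr                  # base blinding
t = pow(m, k, pr)                      # exponentiation modulo p*r
t = t % p                              # reduction modulo p: (y*r_B)**k mod p
result = (t * pow(r_B, -k, p)) % p     # annihilate the base blinding
print(result == pow(y, k, p))          # True
```

The negative exponent in `pow` (modular inverse) requires Python 3.8 or later.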

Remark 4.9

  1. The modulus blinding factor r needs to be odd because Montgomery’s multiplication algorithm requires that the modulus is coprime to R .

  2. Of course, the annihilating term r_B^{−k} (mod p) is not computed straightforwardly. First of all, this would be extremely inefficient, and furthermore, k is a sensitive variable; hence, it is better not to touch it more than necessary in computations. We therefore recommend resorting to a similar albeit less harmful strategy (cf. [13, §10]). If e denotes the public RSA exponent, then e·k ≡ 1 mod φ(p), and thus for r_B ≔ (r_B')^{−e} (mod p) (with randomly selected r_B') we have r_B^{−k} ≡ (r_B')^{(−e)(−k)} ≡ r_B' (mod p). Such blinding, applied to Montgomery ladder regular exponentiation using MM (i.e., Algorithm 2), is illustrated in Algorithm 4. (The notation "←$" stands for uniformly random assignment.) Moreover, once a pair (r_B, r_B^{−k}) has been found, it can easily be updated by squaring both components modulo p [13, §10].

  3. In this paper, we consider the case “first modulus blinding then base blinding.” This countermeasure is represented in Algorithm 5. We point out that reversed order, “first base blinding then modulus blinding,” can be attacked in the same way.
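The precomputation trick of item 2 above can also be checked with toy numbers. In the sketch below (our own illustration; p, e, and r_B' are arbitrary choices satisfying e·k ≡ 1 mod φ(p)), blinding with r_B ≔ (r_B')^(−e) makes the annihilating term r_B^(−k) equal to r_B', and squaring both components of the pair yields a fresh valid pair.

```python
p = 1009                  # toy prime, phi(p) = 1008
e = 5                     # stand-in for the public exponent, gcd(e, 1008) = 1
k = pow(e, -1, p - 1)     # secret exponent with e*k = 1 mod phi(p)
r_Bp = 321                # randomly selected r_B'
r_B = pow(r_Bp, -e, p)    # blinding factor r_B = (r_B')^(-e) mod p

# The annihilating term r_B^(-k) mod p equals r_B': no k-dependent
# exponentiation is needed at unblinding time.
print(pow(r_B, -k, p) == r_Bp)     # True

# Updating the pair by squaring both components keeps it consistent.
r_B2, ann2 = (r_B * r_B) % p, (r_Bp * r_Bp) % p
print(pow(r_B2, -k, p) == ann2)    # True
```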

Algorithm 4

   Input: m, k = (k_{l−1} k_{l−2} … k_0)_2, p (k_{l−1} = 1 and k_0 = 1)
   Output: m^k mod p
1  r_B ←$ {1, 2, …, p−1}   // nonzero base blinding factor
2  R_0 ← MM(m·r_B, R^2; p)   // base blinding
3  R_1 ← MM(R_0, R_0; p)   // first square
4  for i = l−2 downto 0 do
5      R_{¬k_i} ← MM(R_0, R_1; p)   // M_i
6      R_{k_i} ← MM(R_{k_i}, R_{k_i}; p)   // Q_i
7  return MM(R_0, r_B^{−k} mod p; p)   // annihilating factor precomputed using the public exponent e = k^{−1} mod φ(p), cf. Remark 4.9(2)

Left-to-right Montgomery ladder exponentiation built on top of MM algorithm, with base blinding (attacked in this paper, in Section 4.1)

Algorithm 5

   Input: m, k = (k_{l−1} k_{l−2} … k_0)_2, p (k_{l−1} = 1 and k_0 = 1)
   Output: m^k mod p
1  r ←$ {1, 3, …, R_F − 1}   // odd modulus blinding factor
2  r_B ←$ {1, 2, …, p−1}   // nonzero base blinding factor
3  R_0 ← MM(m·r_B, (R·R_F)^2; p·r)   // base & modulus blinding
4  R_1 ← MM(R_0, R_0; p·r)   // first square
5  for i = l−2 downto 0 do
6      R_{¬k_i} ← MM(R_0, R_1; p·r)   // M_i
7      R_{k_i} ← MM(R_{k_i}, R_{k_i}; p·r)   // Q_i
8  return MM(R_0, r_B^{−k} mod p; p·r) mod p   // annihilating factor precomputed using e = k^{−1} mod φ(p), cf. Remark 4.9(2)

Left-to-right Montgomery ladder exponentiation built on top of MM algorithm, with base and modulus blinding (attacked in this paper, in Section 4.3). (Throughout this algorithm, the MM algorithm uses R* ≔ R·R_F as the Montgomery constant.)

For the case that only base blinding (or even no blinding technique at all) is applied, Lemma 3.4 provides concrete formulae for the probability that the extra-reduction vector (w_i^(M), w_i^(Q), …, w_{i−u+1}^(M), w_{i−u+1}^(Q)) occurs if the relevant part of the secret exponent, (k_i, …, k_{i−u+1}), equals θ. These probabilities are polynomials in the ratio β ≔ p/R. So far, the parameter β remained constant during the attack, so there was no need to mention it explicitly.

In this subsection, the ratio between the modulus and the Montgomery constant is no longer constant but depends on the selected modulus blinding value r. Hence, we extend the notation and write P_{θ;β′}(W_i^(M) = w_i^(M), …, W_{i−u+1}^(Q) = w_{i−u+1}^(Q)) in place of P_θ(W_i^(M) = w_i^(M), …, W_{i−u+1}^(Q) = w_{i−u+1}^(Q)) if β′ ≔ p·r/R*. For a given modulus blinding factor r, one has β′ = α·β with α ≔ r/R_F and β ≔ p/R.

However, the applied modulus blinding factor r is unknown. Relevant to our formulae is the normalized modulus blinding factor α ≔ r/R_F. We interpret α as a realization of a random variable A, which assumes values in the finite set M_A ≔ Z_{R_F;odd}/R_F ⊂ (0, 1). Then

(4.13) P_θ(W_i^(M) = w_i^(M), …, W_{i−u+1}^(Q) = w_{i−u+1}^(Q)) = Σ_{α ∈ M_A} P(A = α) P_{θ;αβ}(W_i^(M) = w_i^(M), …, W_{i−u+1}^(Q) = w_{i−u+1}^(Q))

quantifies the probability of the extra-reduction vector (w_i^(M), …, w_{i−u+1}^(Q)) under θ with a randomly selected (normalized) modulus blinding factor α (selected according to the distribution of the random variable A). The probabilities P_{θ;αβ}(·) are given by Lemma 3.4(i), and Assertion (ii) of Lemma 3.4 remains valid, too.

Usually, the normalized blinding factors should be uniformly distributed on M_A, i.e., each value in M_A should occur with probability 1/|M_A| = 2/R_F. For typical parameters F (e.g., F = 64), the right-hand side of (4.13) can be replaced by

(4.14) P_θ(W_i^(M) = w_i^(M), …, W_{i−u+1}^(Q) = w_{i−u+1}^(Q)) = ∫_0^1 P_{θ;αβ}(W_i^(M) = w_i^(M), …, W_{i−u+1}^(Q) = w_{i−u+1}^(Q)) dα.

For reasonable parameters F, the deviation of the right-hand term from the exact probability (4.13) is negligible, which justifies the "=" sign. The evaluation of the integral is fairly easy since the integrand is a polynomial in (αβ). In fact, for the integrand Σ_j γ_j (αβ)^j the integral equals Σ_j γ_j β^j/(j+1). Another protection strategy would be to select modulus blinding factors uniformly in Z_{R_F;odd} ∩ [2^{F−1}, 2^F), so that all blinding factors have identical (maximal) length. In this case, A assumes each value in M_A ∩ [0.5, 1) with probability 4/R_F, and (4.13) can be expressed by

(4.15) P_θ(W_i^(M) = w_i^(M), …, W_{i−u+1}^(Q) = w_{i−u+1}^(Q)) = 2 ∫_{0.5}^{1} P_{θ;αβ}(W_i^(M) = w_i^(M), …, W_{i−u+1}^(Q) = w_{i−u+1}^(Q)) dα.
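The coefficient-wise averaging noted above is easy to exercise. Using the polynomial of Example 3.6, P_{θ;αβ} = (αβ)^5/72 − (αβ)^8/360, the sketch below (our own illustration; β = 0.75 is an arbitrary test value) evaluates the averages of (4.14) and (4.15) via the rule Σ_j γ_j β^j/(j+1), respectively its analogue 2 Σ_j γ_j β^j (1 − 0.5^{j+1})/(j+1), and compares against a brute-force Riemann sum.

```python
beta = 0.75
# Example 3.6 as a polynomial in (alpha * beta): coefficients gamma_j.
coeffs = {5: 1 / 72, 8: -1 / 360}

def integrand(alpha):
    return sum(g * (alpha * beta) ** j for j, g in coeffs.items())

def riemann(lo, hi, n=100_000):
    """Midpoint Riemann sum of the integrand over [lo, hi]."""
    h = (hi - lo) / n
    return sum(integrand(lo + (i + 0.5) * h) for i in range(n)) * h

# (4.14): alpha uniform on (0, 1)
closed_414 = sum(g * beta ** j / (j + 1) for j, g in coeffs.items())
# (4.15): alpha uniform on [0.5, 1)
closed_415 = sum(2 * g * beta ** j * (1 - 0.5 ** (j + 1)) / (j + 1)
                 for j, g in coeffs.items())

print(abs(closed_414 - riemann(0, 1)) < 1e-9)        # True
print(abs(closed_415 - 2 * riemann(0.5, 1)) < 1e-9)  # True
```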

In analogy to Section 4, the next step is to estimate β and p noise . The equivalents to (4.1) and (4.2) are

(4.16) E(W_i^(M)) = P(W_i^(M) = 1) = Σ_{α ∈ M_A} P(A = α)·(1/4)·αβ = E(A)·β/4,

(4.17) E(W_i^(Q)) = P(W_i^(Q) = 1) = Σ_{α ∈ M_A} P(A = α)·(1/3)·αβ = E(A)·β/3.

Substituting P(W_i^(M) = 1) and P(W_i^(Q) = 1) in the proof of Lemma 4.2 by the right-hand terms of (4.16) and (4.17) (in place of (4.1) and (4.2)) yields the equivalents of formulae (4.5) and (4.6) for the modulus blinding case. Note that the conditional probabilities P(W̃_i^(Q) = v | W_i^(Q) = w) and P(W̃_i^(M) = v | W_i^(M) = w) depend only on p_noise but not on α or β.

More precisely, a careful computation yields

(4.18) p/R = (1/E(A)) · (12 E(W̃_i^(Q)) − 12 E(W̃_i^(M))) / (1 + 6 E(W̃_i^(Q)) − 8 E(W̃_i^(M))),

(4.19) p_noise = 4 E(W̃_i^(M)) − 3 E(W̃_i^(Q)).

The right-hand side of (4.18) differs from (4.5) by the factor 1/E(A), while (4.19) coincides with (4.6).

Above, we have identified two strategies for the selection of modulus blinding factors which are of particular interest. If A is uniformly distributed on M_A, then E(A) = ∫_0^1 α dα = 0.5. Similarly, if A is uniformly distributed on M_A ∩ [0.5, 1), then E(A) = 2∫_{0.5}^1 α dα = 0.75.

Substituting (4.13) (resp., (4.14) or (4.15)) into Lemma 4.3(i) yields analogous assertions for the modulus blinding case. The estimation of μ̃_M and μ̃_Q is done as in Lemma 4.4. For different power traces, the blinding factors are selected independently according to the same distribution, so that the normalized blinding factors α_1, α_2, … for the power traces 1, 2, … may be interpreted as realizations of iid random variables A_1, A_2, …, where each A_j is distributed as A. With the aforementioned considerations, Lemma 4.6 also applies to the modulus blinding scenario when P_θ(W̃_i^(M) = w̃_i^(M), …, W̃_{i−u+1}^(Q) = w̃_{i−u+1}^(Q)) is calculated as in (4.7), combined with (4.13). Usually, the latter should coincide with (4.14) or (4.15).

Altogether, modulus blinding does not prevent our attack. For power trace j, it yet reduces the attack's efficiency, since (p·r_j)/(R·R_F) = α_j β < β, which lowers the probability of extra-reductions. Moreover, the applied blinding factor r_j is unknown, which results in the averaged probabilities (4.13). Both effects can be compensated by increasing the sample size.

Remark 4.10

As an alternative to the attack just analyzed, one might estimate the product α_j β separately for each power trace with formula (4.5), whereas (as above) p_noise is estimated only once at the beginning of the attack on the basis of all N power traces. The intention is to reduce the loss of efficiency caused by the use of the averaged probabilities (4.13). Lemma 4.6 could then be applied as in Sections 3.2–4.1 with individual parameters α_j β for each power trace. On the negative side, the estimates of the products α_j β are less precise than the estimate of β in the scenario without modulus blinding, since μ̃_M and μ̃_Q then depend only on the MMs of a single power trace, which undermines the intention of this attack variant.

4.3.2 Experimental results with modulus blinding

Figure 4 compares the success rate evolution of our attack, using (4.14), for the same three noise levels as in Figure 3, for F ∈ {8, 16, 32, 64}, with the modulus randomization factor uniformly distributed in the interval [0, 2^F).

Figure 4

Success rate for an entire exponent depending on the number of side-channel traces N, for different values of the probability p_noise and for modulus randomization on F bits, for F ∈ {8, 16, 32, 64} and modulus randomization uniform in [0, 2^F).

It can be seen that the value of F has no real impact on the success rate of the attack, which is in line with (4.14) and (4.15). This is corroborated by the fact that the success rate does not change significantly when the modulus randomization factor is instead uniformly distributed in [2^{F−1}, 2^F) and the attack is adapted with (4.15). These success rates are shown in Figure 5. Note that the ratio between the modulus and the Montgomery constant is the same in the results of Figures 4 and 5, because the modulus p (on 512 bits) is the same and the Montgomery constant is also the same, namely, R_F = R · 2^F = 2^{512+F}.

Figure 5: Success rate for an entire exponent depending on the number of side-channel traces N, for different values of the probability p_noise and for modulus randomization on F bits, for F ∈ {8, 16, 32, 64} and the modulus randomization factor uniform in [2^{F−1}, 2^F): (a) p_noise = 0.00, (b) p_noise = 0.10, (c) p_noise = 0.20.

The success rate for modulus randomization factors with F < 8 could be derived from the exact formula (4.13). However, one should keep in mind that such small blinding factors are of no practical relevance. For instance,

  • when F = 2, there exist only two eligible random numbers, namely 1 and 3;

  • when F = 3, the only four eligible random numbers are {1, 3, 5, 7};

  • when F = 4, the only eight eligible random numbers are {1, 3, 5, 7, 9, 11, 13, 15}.

If, furthermore, we demand that the blinding factors have full bit length (which corresponds to (4.15)), the situation is even worse: the sets then reduce to {3}, {5, 7}, and {9, 11, 13, 15}, respectively. Moreover, such small sets of admissible modulus blinding factors might allow other, even stronger attacks. Interestingly, against such small factors the attacks achieve about the same success rate as the original attacks [10,11] (before our improvement) achieved in the absence of modulus blinding.
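The eligible sets listed above (odd F-bit factors, optionally restricted to full bit length as in (4.15)) can be enumerated in a few lines; this is a small sketch of ours, not code from the paper:

```python
def eligible_factors(F, full_bit_length=False):
    """Odd modulus-blinding factors below 2**F (the eligible sets above).

    With full_bit_length=True, keep only factors in [2**(F-1), 2**F),
    which corresponds to condition (4.15).
    """
    lo = 2 ** (F - 1) if full_bit_length else 1
    return [r for r in range(lo, 2 ** F) if r % 2 == 1]

print(eligible_factors(2))        # 1 and 3
print(eligible_factors(3))        # {1, 3, 5, 7}
print(eligible_factors(4))        # {1, 3, 5, 7, 9, 11, 13, 15}
print(eligible_factors(4, True))  # full bit length: {9, 11, 13, 15}
```

For F = 64 the set already contains 2^63 admissible factors, which illustrates why only such larger parameters are of practical interest.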

4.3.3 The combination of base blinding with exponent blinding

Assume that base blinding is combined with (additive) exponent blinding, which means that the exponent k is replaced by k + r_E φ(p) for some randomly selected exponent blinding factor r_E ∈ {0, 1, …, 2^E − 1}. Our attack cannot be transferred to this situation since (4.11) assumes that θ is the same for all N power traces.
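The effect of additive exponent blinding can be illustrated with a toy example (small numbers and variable names of our own choosing): by Fermat's little theorem the blinded exponent yields the same result, while the exponent bits processed by the ladder change from trace to trace.

```python
import random

p = 1009          # small prime modulus, so phi(p) = p - 1
k = 91            # secret exponent (0b1011011)
x = 123           # (blinded) base, coprime to p
E = 8             # bit length of the exponent blinding factor

reference = pow(x, k, p)
for _ in range(5):
    r_E = random.randrange(2 ** E)
    k_blinded = k + r_E * (p - 1)            # k + r_E * phi(p)
    # Same modular result for every blinding factor ...
    assert pow(x, k_blinded, p) == reference
    # ... but bin(k_blinded) differs per trace, so the parameter theta
    # assumed constant in (4.11) now varies across the N power traces.
```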

It should be noted, however, that if (e.g.) single-trace template attacks provide a significant advantage over blind guessing of the exponent bits k_i, a successful attack may be possible anyway; see [27,28], for example, for details. The techniques developed in [27] obviously apply to the Montgomery ladder as well. The knowledge of the extra-reductions alone does not yet give a sufficient advantage over blind guessing for single power traces. A sufficient advantage might be achieved by exploiting further features of the power traces, but this is not within the scope of this paper.

5 Countermeasures

In Table 1 and in Section 4.3, several countermeasures were addressed and analyzed. In particular, even the combination of base blinding and modulus blinding does not prevent our attack. An option is to add exponent blinding, resulting in the combination (base blinding and exponent blinding) or in (base blinding, modulus blinding, and exponent blinding). In the absence of additional leakage, to the best of our knowledge no attack against these combinations is known (Section 4.3).

The most solid solution, of course, is to avoid extra-reductions altogether. Following an idea of C. Walter, one can dispense with extra-reductions entirely if the Montgomery constant R is not only larger than p but satisfies R > 4p [29, Theorems 3 and 6]. In this case, the intermediate values of the Montgomery operations within the exponentiation algorithm always remain in [0, 2p) and thus do not "explode." Currently, the OpenSSL library uses another strategy. Indeed, most security standards prescribe that p be chosen with a size which is a multiple of the machine word size (typically 1024, 2048, 3072, or 4096 bits, which are all multiples of 32 and even of 64). Therefore, the abovementioned strategy of C. Walter requires that an extra limb (machine word in the radix representation of a big number) be allocated for each intermediate variable, which is considered too high an overhead. For this reason, OpenSSL disguises the extra-reduction in a constant-time SLA, a technique already mentioned in Section 2.1. Namely, a mask m of l bits (where l = ⌈log₂ p⌉ is the size of the modulus) is computed to be equal to (1…1)₂ (i.e., 0xFF...FF in hexadecimal) when an extra-reduction is required, or to (0…0)₂ (i.e., 0x00...00 in hexadecimal) when no extra-reduction is needed. Subsequently, the quantity m ∧ p (the word obtained by the bitwise logical AND of the bits of m and p) is subtracted from the result of the MM. This quantity is either p or 0, depending on whether an extra-reduction is needed or not. This strategy implements an SLA. Such a coding style is, as of today, believed to be secure against cache-timing attacks, because the control flow is data independent. However, the authors warn that the strategy of OpenSSL might not hide the extra-reduction perfectly if the attacker is able to partition power or electromagnetic side-channel traces based on the value of m ∧ p, since the absence of an extra-reduction involves a conspicuous subtraction of a big number equal to zero.
Such a bias has already been exploited in the past by attacks such as the Refined Power Analysis [12] or the Zero Power Analysis [4]. Note that OpenSSL is nowadays used in embedded systems (microcontrollers, internet of things devices, smartphones [5,18], etc.), which are indeed attackable with power and electromagnetic side-channel analyses.
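The masked final subtraction can be sketched in a few lines. This is a single-word toy model of ours (OpenSSL's actual code operates on multi-limb big numbers and derives the mask from a borrow, not from a comparison):

```python
MASK64 = (1 << 64) - 1  # model one 64-bit machine word

def masked_final_subtraction(t, p):
    """Constant-flow final step of a Montgomery multiplication (toy model).

    t is the pre-reduction result in [0, 2p). Instead of branching on
    t >= p, build an all-ones/all-zeros mask m and always subtract m & p.
    """
    need = t >= p                      # in real code: borrow bit of t - p
    m = (-int(need)) & MASK64          # 0xFF...FF if needed, else 0x00...00
    return (t - (m & p)) & MASK64      # subtracts either p or 0

p = 0xC5                               # toy modulus
assert masked_final_subtraction(p + 7, p) == 7    # extra-reduction case
assert masked_final_subtraction(42, p) == 42      # no extra-reduction
```

The caveat from the paragraph above is visible even in this sketch: the control flow is fixed, but the subtracted operand m ∧ p is either 0 or p, and that data-dependent difference may still leak through power or electromagnetic side channels.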

6 Conclusion

In [10,11], ERA exploiting the dependency between two consecutive MMs was applied to attack RSA implementations which use the Montgomery ladder or the always-square-and-multiply exponentiation algorithm. Base blinding does not prevent this attack. Although both attacks were successful, they did not exploit all the available information. In this paper, following the strategy of [1,2,21], we formulated and analyzed a stochastic process which was tailored to the stochastic behavior of the extra-reductions in the Montgomery ladder. This sophisticated strategy allowed us to exploit all the given information in an optimal way. Practical experiments confirmed that the new method reduces the sample size by a factor greater than 2 (to 40% of the original sample size). Our new attack can be directly transferred to the always-square-and-multiply algorithm. Moreover, we presented a generalization of our attack which cannot be prevented even by the combination of base blinding with modulus blinding. This generalization of our attack is efficient, too.

Acknowledgments

This work has benefited from partial funding via TeamPlay (https://teamplay-h2020.eu/), a project of the European Union’s Horizon 2020 research and innovation program, under grant agreement no. 779882. The analysis methods have been integrated into the Secure-IC Laboryzr tools.



Appendix A

Tables A.1 and A.2 contain all probabilities for u = 2 (with or without base blinding; no modulus blinding); cf. Lemma 3.4(i). These values have been computed with the SageMath computer algebra system (http://www.sagemath.org/), using formal computations and simplifications.

Table A.1

P_θ(W_i^{(M)} = w_i^{(M)}, W_i^{(Q)} = w_i^{(Q)}, W_{i−1}^{(M)} = w_{i−1}^{(M)}, W_{i−1}^{(Q)} = w_{i−1}^{(Q)}) for θ ∈ {(0,0), (1,1)} (corresponds to the case k_i = k_{i−1}) (with or without base blinding; no modulus blinding). Rows are indexed by (w_i^{(M)}, w_i^{(Q)}), columns by (w_{i−1}^{(M)}, w_{i−1}^{(Q)}); we abbreviate ρ = p/R.

Row (0,0):
  (0,0): 1 − (7/6)ρ + (1/3)ρ^2 + (7/90)ρ^3 + (17/504)ρ^4 − (11/336)ρ^5 − (1/72)ρ^6 + (1/264)ρ^8
  (0,1): (1/3)ρ − (5/24)ρ^2 − (17/504)ρ^4 + (1/48)ρ^5 + (1/72)ρ^6 − (1/264)ρ^8
  (1,0): (1/4)ρ − (1/8)ρ^2 − (7/90)ρ^3 + (1/72)ρ^4 + (1/84)ρ^5 + (1/72)ρ^6 − (1/264)ρ^8
  (1,1): (1/8)ρ^2 − (1/72)ρ^4 − (1/72)ρ^6 + (1/264)ρ^8

Row (0,1):
  (0,0): (1/3)ρ − (1/8)ρ^2 − (1/20)ρ^3 − (1/21)ρ^4 + (11/336)ρ^5 + (1/72)ρ^6 − (1/264)ρ^8
  (0,1): (1/21)ρ^4 − (1/48)ρ^5 − (1/72)ρ^6 + (1/264)ρ^8
  (1,0): (1/20)ρ^3 − (1/84)ρ^5 − (1/72)ρ^6 + (1/264)ρ^8
  (1,1): (1/72)ρ^6 − (1/264)ρ^8

Row (1,0):
  (0,0): (1/4)ρ − (5/24)ρ^2 − (1/36)ρ^3 + (1/72)ρ^4 + (11/336)ρ^5 − (1/264)ρ^8
  (0,1): (1/12)ρ^2 − (1/72)ρ^4 − (1/48)ρ^5 + (1/264)ρ^8
  (1,0): (1/36)ρ^3 − (1/72)ρ^4 − (1/84)ρ^5 + (1/264)ρ^8
  (1,1): (1/72)ρ^4 − (1/264)ρ^8

Row (1,1):
  (0,0): (1/8)ρ^2 − (11/336)ρ^5 + (1/264)ρ^8
  (0,1): (1/48)ρ^5 − (1/264)ρ^8
  (1,0): (1/84)ρ^5 − (1/264)ρ^8
  (1,1): (1/264)ρ^8
Table A.2

P_θ(W_i^{(M)} = w_i^{(M)}, W_i^{(Q)} = w_i^{(Q)}, W_{i−1}^{(M)} = w_{i−1}^{(M)}, W_{i−1}^{(Q)} = w_{i−1}^{(Q)}) for θ ∈ {(0,1), (1,0)} (corresponds to the case k_i ≠ k_{i−1}) (with or without base blinding; no modulus blinding). Rows are indexed by (w_i^{(M)}, w_i^{(Q)}), columns by (w_{i−1}^{(M)}, w_{i−1}^{(Q)}); we abbreviate ρ = p/R.

Row (0,0):
  (0,0): 1 − (7/6)ρ + (13/36)ρ^2 + (7/90)ρ^3 − (1/240)ρ^4 − (13/504)ρ^5 − (1/200)ρ^6 + (1/360)ρ^8
  (0,1): (1/3)ρ − (17/72)ρ^2 + (1/240)ρ^4 + (1/72)ρ^5 + (1/200)ρ^6 − (1/360)ρ^8
  (1,0): (1/4)ρ − (1/8)ρ^2 − (7/90)ρ^3 + (1/40)ρ^4 + (1/84)ρ^5 + (1/200)ρ^6 − (1/360)ρ^8
  (1,1): (1/8)ρ^2 − (1/40)ρ^4 − (1/200)ρ^6 + (1/360)ρ^8

Row (0,1):
  (0,0): (1/3)ρ − (17/72)ρ^2 − (1/20)ρ^3 + (1/40)ρ^4 + (13/504)ρ^5 − (1/360)ρ^8
  (0,1): (1/9)ρ^2 − (1/40)ρ^4 − (1/72)ρ^5 + (1/360)ρ^8
  (1,0): (1/20)ρ^3 − (1/40)ρ^4 − (1/84)ρ^5 + (1/360)ρ^8
  (1,1): (1/40)ρ^4 − (1/360)ρ^8

Row (1,0):
  (0,0): (1/4)ρ − (1/8)ρ^2 − (1/36)ρ^3 − (1/48)ρ^4 + (13/504)ρ^5 + (1/200)ρ^6 − (1/360)ρ^8
  (0,1): (1/48)ρ^4 − (1/72)ρ^5 − (1/200)ρ^6 + (1/360)ρ^8
  (1,0): (1/36)ρ^3 − (1/84)ρ^5 − (1/200)ρ^6 + (1/360)ρ^8
  (1,1): (1/200)ρ^6 − (1/360)ρ^8

Row (1,1):
  (0,0): (1/8)ρ^2 − (13/504)ρ^5 + (1/360)ρ^8
  (0,1): (1/72)ρ^5 − (1/360)ρ^8
  (1,0): (1/84)ρ^5 − (1/360)ρ^8
  (1,1): (1/360)ρ^8
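Since the 16 entries of each table form a complete probability distribution over the four extra-reduction bits, they must sum to 1 for every value of ρ = p/R. A short exact-arithmetic check of this invariant (entries of Table A.1, transcribed by us) looks as follows:

```python
from fractions import Fraction as Fr

# The 16 entries of Table A.1 as {degree: coefficient} maps of
# polynomials in rho = p/R, listed row by row (transcription ours).
cells = [
    {0: 1, 1: Fr(-7, 6), 2: Fr(1, 3), 3: Fr(7, 90), 4: Fr(17, 504),
     5: Fr(-11, 336), 6: Fr(-1, 72), 8: Fr(1, 264)},
    {1: Fr(1, 3), 2: Fr(-5, 24), 4: Fr(-17, 504), 5: Fr(1, 48),
     6: Fr(1, 72), 8: Fr(-1, 264)},
    {1: Fr(1, 4), 2: Fr(-1, 8), 3: Fr(-7, 90), 4: Fr(1, 72),
     5: Fr(1, 84), 6: Fr(1, 72), 8: Fr(-1, 264)},
    {2: Fr(1, 8), 4: Fr(-1, 72), 6: Fr(-1, 72), 8: Fr(1, 264)},
    {1: Fr(1, 3), 2: Fr(-1, 8), 3: Fr(-1, 20), 4: Fr(-1, 21),
     5: Fr(11, 336), 6: Fr(1, 72), 8: Fr(-1, 264)},
    {4: Fr(1, 21), 5: Fr(-1, 48), 6: Fr(-1, 72), 8: Fr(1, 264)},
    {3: Fr(1, 20), 5: Fr(-1, 84), 6: Fr(-1, 72), 8: Fr(1, 264)},
    {6: Fr(1, 72), 8: Fr(-1, 264)},
    {1: Fr(1, 4), 2: Fr(-5, 24), 3: Fr(-1, 36), 4: Fr(1, 72),
     5: Fr(11, 336), 8: Fr(-1, 264)},
    {2: Fr(1, 12), 4: Fr(-1, 72), 5: Fr(-1, 48), 8: Fr(1, 264)},
    {3: Fr(1, 36), 4: Fr(-1, 72), 5: Fr(-1, 84), 8: Fr(1, 264)},
    {4: Fr(1, 72), 8: Fr(-1, 264)},
    {2: Fr(1, 8), 5: Fr(-11, 336), 8: Fr(1, 264)},
    {5: Fr(1, 48), 8: Fr(-1, 264)},
    {5: Fr(1, 84), 8: Fr(-1, 264)},
    {8: Fr(1, 264)},
]

def total(rho):
    """Sum of the 16 probabilities, evaluated exactly at rho = p/R."""
    return sum(c * rho ** d for cell in cells for d, c in cell.items())

# The coefficients of every power of rho cancel, so the sum is
# identically 1 for any admissible ratio p/R.
assert all(total(Fr(1, q)) == 1 for q in (2, 3, 7, 10))
```

The same check passes for Table A.2 with its coefficients; per power of ρ the contributions across the 16 cells cancel, leaving only the constant term 1.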

References

[1] Acıiçmez O, Schindler W. A major vulnerability in RSA implementations due to microarchitectural analysis threat. IACR Cryptology ePrint Archive 2007;2007:336.

[2] Acıiçmez O, Schindler W. A vulnerability in RSA implementations due to instruction cache analysis and its demonstration on OpenSSL. In: Malkin T, editor. Topics in Cryptology – CT-RSA 2008, Proceedings of the Cryptographers’ Track at the RSA Conference 2008, San Francisco, CA, USA, April 8–11, 2008. Lecture Notes in Computer Science, vol. 4964. Springer; 2008. p. 256–73. doi: 10.1007/978-3-540-79263-5_16.

[3] Acıiçmez O, Schindler W, Koç ÇK. Improving Brumley and Boneh timing attack on unprotected SSL implementations. In: Atluri V, Meadows C, Juels A, editors. Proceedings of the 12th ACM Conference on Computer and Communications Security, CCS 2005, Alexandria, VA, USA, November 7–11, 2005. ACM; 2005. p. 139–46. doi: 10.1145/1102120.1102140.

[4] Akishita T, Takagi T. Zero-value point attacks on elliptic curve cryptosystem. In: Boyd C, Mao W, editors. ISC 2003, Lecture Notes in Computer Science, vol. 2851. Springer; 2003. p. 218–33. doi: 10.1007/10958513_17.

[5] Alam M, Khan HA, Dey M, Sinha N, Callan RL, Zajic AG, et al. One&Done: a single-decryption EM-based attack on OpenSSL’s constant-time blinded RSA. In: Enck W, Porter Felt A, editors. 27th USENIX Security Symposium, USENIX Security 2018, Baltimore, MD, USA, August 15–17, 2018. USENIX Association; 2018. p. 585–602.

[6] Arnaud C, Fouque PA. Timing attack against protected RSA-CRT implementation used in PolarSSL. In: Dawson E, editor. Topics in Cryptology – CT-RSA 2013, Proceedings of the Cryptographers’ Track at the RSA Conference 2013, San Francisco, CA, USA, February 25 – March 1, 2013. Lecture Notes in Computer Science, vol. 7779. Springer; 2013. p. 18–33. doi: 10.1007/978-3-642-36095-4_2.

[7] Baek YJ, Vasyltsov I. How to prevent DPA and fault attack in a unified way for ECC scalar multiplication – ring extension method. In: Dawson E, Wong DS, editors. Information Security Practice and Experience, Lecture Notes in Computer Science, vol. 4464. Berlin, Heidelberg: Springer; 2007. p. 225–37. doi: 10.1007/978-3-540-72163-5_18.

[8] Brumley D, Boneh D. Remote timing attacks are practical. In: Proceedings of the 12th USENIX Security Symposium, Washington, D.C., USA, August 4–8, 2003. USENIX Association; 2003. doi: 10.1016/j.comnet.2005.01.010.

[9] Dhem J-F, Koeune F, Leroux P-A, Mestré P, Quisquater J-J, Willems J-L. A practical implementation of the timing attack. In: Quisquater J-J, Schneier B, editors. Smart Card Research and Applications, Proceedings of the International Conference, CARDIS ’98, Louvain-la-Neuve, Belgium, September 14–16, 1998. Lecture Notes in Computer Science, vol. 1820. Springer; 1998. p. 167–82. doi: 10.1007/10721064_15.

[10] Dugardin M, Guilley S, Danger J-L, Najm Z, Rioul O. Correlated extra-reductions defeat blinded regular exponentiation. In: Gierlichs B, Poschmann AY, editors. Cryptographic Hardware and Embedded Systems – CHES 2016, Proceedings of the 18th International Conference, Santa Barbara, CA, USA, August 17–19, 2016. Lecture Notes in Computer Science, vol. 9813. Springer; 2016. p. 3–22. doi: 10.1007/978-3-662-53140-2_1.

[11] Dugardin M, Guilley S, Danger J-L, Najm Z, Rioul O. Correlated extra-reductions defeat blinded regular exponentiation – extended version. Cryptology ePrint Archive, Report 2016/597, June 6, 2016. http://eprint.iacr.org/2016/597.

[12] Goubin L. A refined power-analysis attack on elliptic curve cryptosystems. In: Proceedings of the 6th International Workshop on Theory and Practice in Public Key Cryptography, PKC ’03. London, UK: Springer-Verlag; 2003. p. 199–210. doi: 10.1007/3-540-36288-6_15.

[13] Kocher PC. Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In: Koblitz N, editor. Advances in Cryptology – CRYPTO ’96, Proceedings of the 16th Annual International Cryptology Conference, Santa Barbara, California, USA, August 18–22, 1996. Lecture Notes in Computer Science, vol. 1109. Springer; 1996. p. 104–13. doi: 10.1007/3-540-68697-5_9.

[14] Kocher PC, Jaffe J, Jun B. Differential power analysis. In: Wiener MJ, editor. CRYPTO ’99, Lecture Notes in Computer Science, vol. 1666. Springer; 1999. p. 388–97. doi: 10.1007/3-540-48405-1_25.

[15] Menezes AJ, van Oorschot PC, Vanstone SA. Handbook of Applied Cryptography. CRC Press; October 1996. http://www.cacr.math.uwaterloo.ca/hac/.

[16] Montgomery PL. Modular multiplication without trial division. Math Comput. April 1985;44(170):519–21. doi: 10.1090/S0025-5718-1985-0777282-X.

[17] Montgomery PL. Modular multiplication without trial division. Math Comput. 1985;44(170):519–21. doi: 10.1090/S0025-5718-1985-0777282-X.

[18] Nakano Y, Souissi Y, Nguyen R, Sauvage L, Danger J-L, Guilley S, Kiyomoto S, et al. A pre-processing composition for secret key recovery on Android smartphone. In: Naccache D, Sauveron D, editors. Information Security Theory and Practice. Securing the Internet of Things – Proceedings of the 8th IFIP WG 11.2 International Workshop, WISTP 2014, Heraklion, Crete, Greece, June 30 – July 2, 2014. Lecture Notes in Computer Science, vol. 8501. Springer; 2014. p. 76–91. doi: 10.1007/978-3-662-43826-8_6.

[19] Perin G, Imbert L, Torres L, Maurine P. Attacking randomized exponentiations using unsupervised learning. In: Prouff E, editor. Constructive Side-Channel Analysis and Secure Design – 5th International Workshop, COSADE 2014, Paris, France, April 13–15, 2014, Revised Selected Papers. Lecture Notes in Computer Science, vol. 8622. Springer; 2014. p. 144–60. doi: 10.1007/978-3-319-10175-0_11.

[20] Schindler W. A timing attack against RSA with the Chinese remainder theorem. In: Koç ÇK, Paar C, editors. Cryptographic Hardware and Embedded Systems – CHES 2000, Proceedings of the Second International Workshop, Worcester, MA, USA, August 17–18, 2000. Lecture Notes in Computer Science, vol. 1965. Springer; 2000. p. 109–24. doi: 10.1007/3-540-44499-8_8.

[21] Schindler W. A combined timing and power attack. In: Naccache D, Paillier P, editors. Public Key Cryptography, Proceedings of the 5th International Workshop on Practice and Theory in Public Key Cryptosystems, PKC 2002, Paris, France, February 12–14, 2002. Lecture Notes in Computer Science, vol. 2274. Springer; 2002. p. 263–79. doi: 10.1007/3-540-45664-3_19.

[22] Schindler W. Optimized timing attacks against public key cryptosystems. Statistics and Decisions 2002;20(2):191–210. doi: 10.1524/strm.2002.20.14.191.

[23] Schindler W. Exclusive exponent blinding may not suffice to prevent timing attacks on RSA. In: Güneysu T, Handschuh H, editors. Cryptographic Hardware and Embedded Systems – CHES 2015, Proceedings of the 17th International Workshop, Saint-Malo, France, September 13–16, 2015. Lecture Notes in Computer Science, vol. 9293. Springer; 2015. p. 229–47. doi: 10.1007/978-3-662-48324-4_12.

[24] Schindler W. Exclusive exponent blinding is not enough to prevent any timing attack on RSA. J Cryptographic Eng 2016;6(2):101–19. doi: 10.1007/s13389-016-0124-7.

[25] Schindler W, Koeune F, Quisquater J-J. Improving divide and conquer attacks against cryptosystems by better error detection/correction strategies. In: Honary B, editor. Cryptography and Coding, Proceedings of the 8th IMA International Conference, Cirencester, UK, December 17–19, 2001. Lecture Notes in Computer Science, vol. 2260. Springer; 2001. p. 245–67. doi: 10.1007/3-540-45325-3_22.

[26] Schindler W, Walter CD. More detail for a combined timing and power attack against implementations of RSA. In: Paterson KG, editor. Cryptography and Coding, Proceedings of the 9th IMA International Conference, Cirencester, UK, December 16–18, 2003. Lecture Notes in Computer Science, vol. 2898. Springer; 2003. p. 245–63. doi: 10.1007/978-3-540-40974-8_20.

[27] Schindler W, Wiemers A. Power attacks in the presence of exponent blinding. J Cryptographic Eng 2014;4(4):213–36. doi: 10.1007/s13389-014-0081-y.

[28] Schindler W, Wiemers A. Generic power attacks on RSA with CRT and exponent blinding: new results. J Cryptographic Eng 2017;7(4):255–72. doi: 10.1007/s13389-016-0146-1.

[29] Walter CD. Precise bounds for Montgomery modular multiplication and some potentially insecure RSA moduli. In: Preneel B, editor. Topics in Cryptology – CT-RSA 2002, Proceedings of the Cryptographer’s Track at the RSA Conference 2002, San Jose, CA, USA, February 18–22, 2002. Lecture Notes in Computer Science, vol. 2271. Springer; 2002. p. 30–9. doi: 10.1007/3-540-45760-7_3.

[30] Walter CD, Thompson S. Distinguishing exponent digits by observing modular subtractions. In: Naccache D, editor. Topics in Cryptology – CT-RSA 2001, Proceedings of the Cryptographer’s Track at the RSA Conference 2001, San Francisco, CA, USA, April 8–12, 2001. Lecture Notes in Computer Science, vol. 2020. Springer; 2001. p. 192–207. doi: 10.1007/3-540-45353-9_15.

Received: 2020-03-01
Revised: 2020-12-19
Accepted: 2021-02-26
Published Online: 2021-04-20

© 2021 Margaux Dugardin et al., published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
