Article, Open Access

On the independence heuristic in the dual attack

  • Kaveh Bashiri and Andreas Wiemers
Published/Copyright: July 3, 2025

Abstract

Post-quantum cryptography deals with the development and analysis of cryptographic schemes that are assumed to be secure even against attackers with access to a powerful quantum computer. Among the main candidates for quantum-safe solutions are cryptographic schemes whose security is based on classic lattice problems such as the bounded-distance decoding (BDD) problem or the learning with errors (LWE) problem. In this work, we contribute to the analysis of an attack category against these problems called the dual attack, a topic in which a lot of notable progress has been achieved in recent years. Our first contribution is to provide theoretical counterarguments against a so-called independence assumption, which was used in earlier works on this attack and which a previous work showed to contradict practical experiments. Then, we provide estimates on the success probability and the cost of the dual attack against the decisional version of the BDD problem. These estimates are derived both rigorously and heuristically. Finally, we also provide experimental evidence that confirms these results.

MSC 2010: 06B99; 94A60

1 Introduction

In recent years, much research has been done on the lattice problems called the decisional bounded-distance decoding (BDD) problem and the closely related learning with errors (LWE) problem. This is due to the fact that many post-quantum crypto schemes rely on their security. As a result, many attacks against the BDD and LWE problems have been established. For instance, there are the algebraic attacks [1], the combinatoric attacks [2], or the primal lattice attacks [3]. The latter are based on sampling short vectors in the primal lattice $\Lambda$, i.e., the lattice on which the BDD or LWE problem is defined. Another notable way to attack these problems is via the so-called dual attack, on which we focus in this work, and which is based on sampling short vectors in the dual lattice. The main idea here is as follows. We are given a sample $t$, which we know

  • either originated from a BDD-sample

  • or is a uniformly distributed random-sample.

Then, short vectors from the dual lattice are sampled in order to compute a statistical quantity, the so-called score function. Depending on the result of the score function, we decide which case is true, i.e., whether $t$ came from a BDD-sample or a random-sample. If the guess is correct, this in turn breaks the decisional version of the BDD (or LWE) problem. A very important ingredient in this attack is to exploit the fact that lattice sieving algorithms output not only the shortest vector but exponentially many short vectors.

The main idea of the dual attack originated in coding theory. The first time these ideas appeared in lattice theory seems to be in the classic paper [4]. In recent years, many seminal results on the dual attack have been established in which this idea is exploited (see, e.g., [5–9]). However, many recent advances in this direction rely on an independence assumption. In particular, Ducas and Pulles [7] reported on experiments they made comparing the distributions of scores for random-samples and BDD-samples. They discovered that the distribution of scores for BDD-samples deviates from the predictions made under this independence assumption.

1.1 Main results

In this paper we provide the following contributions.

  • We first theoretically show that the independence assumption cannot be true.

  • Then, under certain assumptions on the distribution of the dual vectors, we provide rigorous estimates for the success probability of the strategy described above. Moreover, as a byproduct, we also obtain a cost estimate in terms of the number of dual vectors that are needed for such a successful distinction. In this part of the paper we make use of conditional expectations and rely on techniques that are inspired by the work of Pouly and Shen [9].

  • We then additionally provide a more intuitive approach to derive these results by relying on a central limit theorem heuristic. That is, here we again derive estimates for the success probability and the number of needed dual vectors for the dual attack; however, this time using an intuitive, heuristic approach. We believe that both approaches, the rigorous one and the intuitive one, are essential for a full understanding of the subject; the beginning of Section 5 gives a more detailed explanation on why we believe that both approaches are important.

  • Finally, we provide experimental evidence that the just mentioned central limit theorem heuristics indeed hold true. In this way, we verify experimentally our heuristically derived results, which in turn verify the rigorously derived results.

1.2 Outline

This work is organized as follows. In Section 2, we introduce some preliminary notions that we will need. In Section 3, we compute certain covariances that reveal the incompleteness of the independence assumption. In Section 4, we provide rigorous estimates for the success probability and the cost of the dual attack. In Section 5, we show that these estimates also hold true under certain intuitively justified heuristics. Finally, in Section 6, we provide experimental evidence for our results.

2 Preliminaries

This section is organized as follows. First we provide a glimpse into lattice theory in Section 2.1. Then, we introduce the BDD problem in Section 2.2 and describe the main framework of the dual attack in Section 2.3.

2.1 Lattices

2.1.1 Main definitions

A lattice $\Lambda \subseteq \mathbb{R}^n$ is defined by

$$\Lambda = \mathbb{Z} b_1 + \cdots + \mathbb{Z} b_k \subseteq \mathbb{R}^n,$$

where $b_1, \dots, b_k \in \mathbb{R}^n$. We say that the lattice $\Lambda$ has full rank if $k = n$ and $b_1, \dots, b_n$ are linearly independent. The volume of a lattice is defined by

$$\det(\Lambda) \coloneqq \operatorname{Vol}(\mathbb{R}^n / \Lambda),$$

and for full-rank lattices, we have that

$$\det(\Lambda) = |\det(B)|,$$

where $B = (b_1 \mid \cdots \mid b_n) \in \mathbb{R}^{n \times n}$. The dual lattice is defined by

$$\hat{\Lambda} \coloneqq \{ w \in \mathbb{R}^n \mid \langle w, \Lambda \rangle \subseteq \mathbb{Z} \}.$$

2.1.2 Shortest vectors of a lattice

A very important object in lattice-based cryptography is the set of shortest nonzero vectors of a lattice, i.e., the elements of the lattice with length

$$\lambda_1(\Lambda) \coloneqq \min_{v \in \Lambda \setminus \{0\}} \|v\|.$$

It is a common strategy in lattice-based cryptography to rely on the so-called Gaussian Heuristic, which is given as follows.

Heuristic 1

The length of the shortest vector of a randomly generated lattice $\Lambda \subseteq \mathbb{R}^n$ is approximately given by

$$\lambda_1(\Lambda) \approx \sqrt{\frac{n}{2\pi e}} \det(\Lambda)^{\frac{1}{n}}.$$
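As a small illustration, the Gaussian Heuristic can be evaluated directly; the function name below is our own choice and not from the paper.

```python
import math

def gaussian_heuristic(n: int, det: float) -> float:
    """Predicted length of the shortest vector of a random n-dimensional
    lattice with the given determinant, following Heuristic 1:
    sqrt(n / (2*pi*e)) * det^(1/n)."""
    return math.sqrt(n / (2 * math.pi * math.e)) * det ** (1.0 / n)

# Example: a random 128-dimensional lattice of determinant 1.
print(gaussian_heuristic(128, 1.0))
```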

2.1.3 Discrete Gaussian distribution

The discrete Gaussian distribution is an important probability distribution on a lattice $\Lambda$. This distribution is defined via the Gaussian density function $\rho_s : \mathbb{R}^n \to (0, \infty)$, which, for $s > 0$, is given by

$$\rho_s(x) = e^{-\pi \|x\|^2 / s^2},$$

where we denote by $\|\cdot\|$ the standard Euclidean norm in $\mathbb{R}^n$. For convenience, for a discrete subset $A \subseteq \mathbb{R}^n$, we write

$$\rho_s(A) \coloneqq \sum_{x \in A} \rho_s(x).$$

Then, the Gaussian distribution $D_{\Lambda, s}$ over $\Lambda$ of width $s$ is defined by the following probability mass function:

$$D_{\Lambda, s}(v) = \frac{\rho_s(v)}{\rho_s(\Lambda)} \quad \text{for all } v \in \Lambda.$$
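A minimal sketch of this definition for the one-dimensional lattice $\Lambda = \mathbb{Z}$: the normalization $\rho_s(\mathbb{Z})$ is approximated by truncating the (rapidly decaying) sum to a finite window. The window size and the width $s = 2$ are illustrative choices of ours.

```python
import math

def discrete_gaussian_pmf(s: float, support: range) -> dict:
    """Probability mass function of D_{Z,s} restricted to a finite window.

    rho_s(x) = exp(-pi * x^2 / s^2); the normalization rho_s(Z) is
    approximated by truncating the sum to the same window (the tail
    outside the window is negligible for a wide enough window)."""
    rho = {v: math.exp(-math.pi * v * v / (s * s)) for v in support}
    norm = sum(rho.values())
    return {v: r / norm for v, r in rho.items()}

pmf = discrete_gaussian_pmf(s=2.0, support=range(-30, 31))
assert abs(sum(pmf.values()) - 1.0) < 1e-12   # a probability distribution
assert pmf[3] == pmf[-3]                      # symmetric around 0
assert pmf[0] > pmf[1] > pmf[5]               # concentrated around 0
```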

2.1.4 Poisson summation formula

A very useful tool to switch from a lattice to its dual lattice is given by the Poisson summation formula, which is already used in the classic and seminal papers in lattice-based cryptography by Regev [10].

Lemma 2.1

Let $\Lambda$ be a full-rank lattice, $t \in \mathbb{R}^n$, and $f : \mathbb{R}^n \to \mathbb{R}$ be such that some growth and integrability conditions are fulfilled. Then,

$$\sum_{v \in \Lambda} f(v + t) = \frac{1}{\det(\Lambda)} \sum_{w \in \hat{\Lambda}} \hat{f}(w) e^{2\pi i \langle t, w \rangle},$$

where $\hat{f} : \mathbb{R}^n \to \mathbb{C}$ is the Fourier transform of $f$ defined by

$$\hat{f}(y) = \int_{\mathbb{R}^n} e^{-2\pi i \langle x, y \rangle} f(x) \, dx.$$

In this work, we are particularly interested in applying this result to the function $x \mapsto \rho_s(x)$, for which it has the following form.

Corollary 2.2

Let $\Lambda$ be a full-rank lattice, $t \in \mathbb{R}^n$, and $s > 0$. Then,

$$\sum_{v \in \Lambda} \rho_s(v - t) = \frac{s^n}{\det(\Lambda)} \sum_{w \in \hat{\Lambda}} \rho_{1/s}(w) e^{2\pi i \langle t, w \rangle}.$$
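Corollary 2.2 can be checked numerically in the simplest full-rank case $\Lambda = \mathbb{Z}$ (so $n = 1$, $\det(\Lambda) = 1$, and $\hat{\Lambda} = \mathbb{Z}$); both lattice sums decay so fast that a modest truncation already gives agreement to machine precision. The truncation bound and the values of $s$ and $t$ are illustrative.

```python
import cmath
import math

def rho(s: float, x: float) -> float:
    """Gaussian density rho_s(x) = exp(-pi * x^2 / s^2)."""
    return math.exp(-math.pi * x * x / (s * s))

def lhs(s: float, t: float, K: int = 30) -> float:
    """Truncated primal sum: sum over v in Z of rho_s(v - t)."""
    return sum(rho(s, v - t) for v in range(-K, K + 1))

def rhs(s: float, t: float, K: int = 30) -> float:
    """Truncated dual sum: s * sum over w in Z of rho_{1/s}(w) e^{2 pi i t w}.
    The imaginary parts cancel by the w <-> -w symmetry."""
    total = sum(rho(1.0 / s, w) * cmath.exp(2j * math.pi * t * w)
                for w in range(-K, K + 1))
    return (s * total).real

s, t = 1.3, 0.37
assert abs(lhs(s, t) - rhs(s, t)) < 1e-9
```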

2.2 BDD problem

The main object of investigation in this paper is the BDD problem on a lattice Λ . It is defined as follows.

Definition 2.3

Let $\Lambda$ be a full-rank lattice, and let $\chi$, the error distribution, be a probability measure on $\mathbb{R}^n$ with

$$\mathbb{E}[\chi] = 0 \quad \text{and small variance } \operatorname{Var}(\chi),$$

where the notion of "small" depends on the context.

  • Suppose that we are given

    $$t = v + e, \quad \text{where } v \in \Lambda \text{ and } e \sim \chi \text{ (i.e., } e \text{ is sampled according to } \chi).$$

    In the BDD-search-problem, we are asked to find $v$.

  • Suppose that we are given a sample $t \pmod{\Lambda}$, for which we know that

    1. it is either a BDD-sample, i.e.,

      $$t = v + e, \quad \text{where } v \in \Lambda \text{ and } e \sim \chi,$$

    2. or a random-sample, i.e.,

      $$t \pmod{\Lambda} \text{ is distributed according to } U(\mathbb{R}^n / \Lambda),$$

      where $U(\mathbb{R}^n / \Lambda)$ denotes the uniform distribution on $\mathbb{R}^n / \Lambda$.

    In the BDD-decision-problem, we are asked to decide which is the case.

The BDD problem is closely related to the LWE problem, which is the notion more commonly used in post-quantum cryptography. For a survey on the LWE problem and its relation to the BDD problem, we refer to the excellent notes by Peikert [11]. Moreover, we refer to these notes for further references concerning the equivalence of the search and decision versions.

2.3 Dual attack

In this paper, out of the attacks we listed in the introduction, we focus on the dual attack. This attack has attracted a lot of interest in recent years (see, for instance, [5–9]) and is still the object of ongoing research. In this section, we present the main ideas of this approach.

The dual attack is aimed at the BDD-decision-problem, so we are given a sample $t \pmod{\Lambda}$, for which we know that it is either a BDD-sample or a random-sample. The idea is based on first finding a $\Lambda$-periodic function $f : \mathbb{R}^n \to \mathbb{R}$, the score/distinguisher, such that

  • f ( t ) is large in the BDD-sample-case,

  • f ( t ) is small in the random-sample-case.

In practice, we then need to find some $\alpha \in \mathbb{R}$ such that

  • if $f(t) \ge \alpha$, then with high probability we know that $t$ must be a BDD-sample,

  • if $f(t) < \alpha$, then with high probability we know that $t$ must be a random-sample.

A reasonable candidate for such a score is given by

(1) $$f(t) = \sum_{v \in \Lambda} \rho_s(v - t)$$

as, for $s > 0$ and $\operatorname{Var}(\chi)$ small enough, this function becomes large in the BDD-sample-case. However, this function is not easy to compute as it requires the sum over the whole lattice. This is where the important tool of the Poisson summation formula comes into play. Indeed, for a large enough subset $W \subseteq \hat{\Lambda}$, we obtain by the Poisson summation formula that

(2) $$\sum_{v \in \Lambda} \rho_s(v - t) = \frac{s^n}{\det(\Lambda)} \sum_{w \in \hat{\Lambda}} \rho_{1/s}(w) e^{2\pi i \langle t, w \rangle} = \frac{s^n}{\det(\Lambda)} \sum_{w \in \hat{\Lambda}} \rho_{1/s}(w) \cos(2\pi \langle t, w \rangle) \approx \frac{s^n}{\det(\Lambda)} \sum_{w \in W} \rho_{1/s}(w) \cos(2\pi \langle t, w \rangle).$$

The set $W \subseteq \hat{\Lambda}$ is usually generated through a lattice-sieve algorithm (in combination with the Blockwise-Korkine-Zolotarev [BKZ] lattice reduction algorithm). The generation of these dual vectors is treated as a black box in this paper. Later on, we will have to make some concrete assumptions on its output.

In this subsection we are, however, interested in a heuristic way to find a reasonable score. In order to do this, we heuristically assume that the output of the black box yields short dual vectors, which are all roughly of similar norm. This implies in particular that $w \mapsto \rho_{1/s}(w)$ is almost a constant function on $W$. Let $c$ denote this constant, i.e., $\rho_{1/s}(w) \approx c$ for all $w \in W$. Then,

(3) $$\sum_{v \in \Lambda} \rho_s(v - t) \approx \frac{s^n c}{\det(\Lambda)} \sum_{w \in W} \cos(2\pi \langle t, w \rangle).$$

As the prefactor $\frac{s^n c}{\det(\Lambda)}$ does not depend on $t$, it is reasonable to simply discard this factor for the score function.

Combining (1), (2), and (3) intuitively justifies choosing the following function, which depends on a large enough subset $W \subseteq \hat{\Lambda}$, as the score.

$$f_W(t) = \sum_{w \in W} f_w(t), \quad \text{where } f_w(t) = \cos(2\pi \langle t, w \rangle).$$

To sum up, the strategy for the dual attack against the BDD-decision-problem is to compute $f_W(t)$, to vote for the BDD-sample-case if $f_W(t) \ge \alpha$, and to vote for the random-sample-case otherwise.
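The strategy above can be sketched in a few lines. The toy setup below works on $\Lambda = \mathbb{Z}^n$ (whose dual is again $\mathbb{Z}^n$); the stand-in set $W$ of small integer vectors, the error width, and the dimensions are illustrative choices of ours and do not come from an actual lattice sieve.

```python
import numpy as np

def score(t: np.ndarray, W: np.ndarray) -> float:
    """Score f_W(t) = sum over w in W of cos(2*pi*<t, w>)."""
    return float(np.cos(2.0 * np.pi * (W @ t)).sum())

def vote_bdd(t: np.ndarray, W: np.ndarray, alpha: float) -> bool:
    """Vote for the BDD-sample-case iff f_W(t) >= alpha."""
    return score(t, W) >= alpha

# Toy illustration on the lattice Z^n.
rng = np.random.default_rng(0)
n, m0, alpha = 8, 4000, 1000.0
W = rng.integers(-2, 3, size=(m0, n)).astype(float)  # stand-in "short dual vectors"
t_bdd = 0.02 * rng.standard_normal(n)                # BDD-sample: v = 0 plus small error
t_rnd = rng.uniform(-0.5, 0.5, size=n)               # random-sample mod Z^n

assert vote_bdd(t_bdd, W, alpha)        # large score: votes BDD-sample
assert not vote_bdd(t_rnd, W, alpha)    # score fluctuates around 0: votes random-sample
```

The separation is large here because the BDD error is tiny compared to the dual vectors' norms; choosing $\alpha$ rigorously is exactly the topic of Section 4.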

The important remaining tasks are now

  • to compute the probabilities with which our strategy is successful,

  • and to determine how many dual vectors are needed so that the function $f_W$ is indeed a good choice, i.e., so that the computed probabilities are high enough.

We target these problems later in this paper under certain assumptions on the abovementioned black box.

3 Computing the covariances

The score $f_W$ can be seen as a sum of random variables, where these random variables are given by $\{\cos(2\pi \langle t, w \rangle)\}_{w \in W}$, and where the randomness comes from both the sample $t$ and the dual vectors in the set $W$. Whenever one wishes to compute such a sum $f_W$ of random variables, it is very handy to find and exploit any kind of independence between the random variables. Only under such independence assumptions is one allowed to apply standard results from probability theory such as the law of large numbers or the central limit theorem. This is why in many works [5,6] on the dual attack against BDD (or LWE), it is heuristically assumed that $\{\cos(2\pi \langle t, w \rangle)\}_{w \in W}$ is a family of independent random variables. We refer to this assumption in the following as the independence assumption.

However, for $w, \tilde{w} \in W$ with $w \neq \tilde{w}$, both random variables $\cos(2\pi \langle t, w \rangle)$ and $\cos(2\pi \langle t, \tilde{w} \rangle)$ depend on the same sample $t$. Therefore, it is difficult to justify that these two random variables are independent of each other. Under certain circumstances, it may be possible that $\cos(2\pi \langle t, w \rangle)$ and $\cos(2\pi \langle t, \tilde{w} \rangle)$ are uncorrelated. For instance, this may happen when $w$ and $\tilde{w}$ are orthogonal, which is indeed a consequence of Proposition 3.3. In any case, however, in order to apply standard results from probability theory such as the law of large numbers or the central limit theorem, we need independent, and not merely uncorrelated, random variables.

Of course, this lack of independence remains true if we assume that $w, \tilde{w}$ are drawn independently of each other (which is, however, a reasonable assumption, and one we will make in Sections 4 and 5). These theoretical doubts on the independence assumption are strengthened in the paper by Ducas and Pulles [7], where (among other results that they achieve) they show that this assumption also leads to contradictions with practical experiments.

In this section, we aim to fully refute the independence assumption by explicitly computing the covariance between the random variables and showing that these are not equal to zero.

We proceed as follows. We first provide some preparatory steps such as the precise definitions of the distributions in the two cases. Then we compute the covariances in the random-sample-case in Section 3.2 and in the BDD-sample-case in Section 3.3. Finally in this section, we provide some heuristic estimates, which underline that covariances in the BDD-sample-case are indeed nonzero.

Remark 3.1

As we change the perspective at some point in this paper, it is useful to emphasize, when and where we consider the sample t and the dual vectors in W as random or as fixed.

Formally, we always consider the family $\{\cos(2\pi \langle t, w \rangle)\}_{w \in W}$ as a family of random variables, where the randomness comes from both the sample $t$ and the dual vectors in $W$. However, in Sections 4 and 5, we will use the so-called probability measure $\mathbb{P}[\,\cdot \mid t\,]$ conditioned on $t$. As a consequence of the properties of $\mathbb{P}[\,\cdot \mid t\,]$, we implicitly see $t$ as already sampled a priori. One can now intuitively say that $t$ is consequently seen as fixed and not random anymore. However, in reality it is still a random variable, but its randomness is not visible under $\mathbb{P}[\,\cdot \mid t\,]$.

This is in contrast to this section. In this section, we (implicitly) consider the probability measure conditioned on $W$. That is, we informally consider the dual vectors in $W$ as fixed. What we will obtain in Proposition 3.3 is an expression for the covariances in terms of the dual vectors in $W$. This expression shows that the covariance between $\cos(2\pi \langle t, w \rangle)$ and $\cos(2\pi \langle t, \tilde{w} \rangle)$ is clearly nonzero in the usual case, when $w$ and $\tilde{w}$ are not orthogonal. In order to obtain a more comprehensible expression for this covariance, we perform some heuristic computations with it in Appendix A.

3.1 Some preparation

We adopt the notation given in the paper by Ducas and Pulles [7] and repeat the approach described by them [7, Section 2.3]. Recall the definitions of $W$, $f_W$, and $f_w$ from above. In the following, we abbreviate $m_0 = |W|$, which is an important quantity that we need to estimate (and which in turn yields an indication of the complexity of the dual attack).

3.1.1 BDD-sample-case

In the BDD-sample-case, we are given a sample $t = v + e_0$ with $v \in \Lambda$ and $e_0 \sim \chi$. Here we explicitly assume that $\chi$ is the $n$-dimensional, continuous Gaussian distribution with covariance matrix $\sigma_0^2 \mathbb{1}_n$ for some $\sigma_0 > 0$. Note that for any dual vector $w \in \hat{\Lambda}$, one has

$$\langle t, w \rangle \equiv \langle e_0, w \rangle \pmod{1}.$$

3.1.2 Random-sample-case

We now provide a more precise definition of a random-sample, which is taken from the paper by Laarhoven and Walter [5, Definition 1]. Let $\Lambda$ be a full-rank $n$-dimensional lattice and let $B$ be a basis of $\Lambda$. The random-sample distribution for $\Lambda$ corresponds to the distribution obtained by sampling target vectors $t$ uniformly at random from the fundamental parallelepiped generated by the basis $B$. We can write $t$ as $t = B\psi$, where the components of $\psi$ are uniformly distributed on $[-\frac{1}{2}, \frac{1}{2}]$.

For two fixed dual vectors $w, \tilde{w} \in W$, $w \neq \tilde{w}$, we write explicitly

$$w = (B^{-1})^T \mu, \qquad \tilde{w} = (B^{-1})^T \tilde{\mu},$$

where the components of $\mu$ and $\tilde{\mu}$ are integers.

3.1.3 Covariances

In the paper by Ducas and Pulles [7, Lemma 4], approximations of the expectation values and variances of a single $f_w(t)$ are given for the two cases "random-sample vs BDD-sample." In general, we have for the variance of the score

$$V(f_W(t)) = \sum_{w \in W} V(f_w(t)) + \sum_{\substack{w, \tilde{w} \in W, \\ w \neq \tilde{w}}} \operatorname{Cov}(f_w(t), f_{\tilde{w}}(t)).$$

If the independence assumption (cf. [7, Heuristic 2]) were valid, the second sum over the single covariances would be equal to 0. However, in the following, we derive approximations of this second sum in the BDD-sample-case that contradict this claim. In the end, this might explain why, in the experiments in the paper by Ducas and Pulles [7, Table 1], the measured variance is much larger than predicted.

3.2 Computing the covariances for random-samples

We begin with the random-sample-case, where we will see that the independence assumption holds at least pairwise. We obtain the following result.

Proposition 3.2

Let $t$ be a random-sample and $w, \tilde{w} \in W$ be such that $w$ and $\tilde{w}$ are linearly independent. Let $s, \tilde{s} \in [-\frac{1}{2}, \frac{1}{2}]$. Then,

$$\mathbb{P}\big(\langle t, w \rangle \bmod 1 \le s, \; \langle t, \tilde{w} \rangle \bmod 1 \le \tilde{s}\big) = \Big(s + \frac{1}{2}\Big)\Big(\tilde{s} + \frac{1}{2}\Big).$$

In particular, we have that

$$\operatorname{Cov}(f_w(t), f_{\tilde{w}}(t)) = 0.$$

Proof

Recall the definitions of $\psi$, $\mu$, and $\tilde{\mu}$ from above. In particular, recall that the components of $\mu$ and $\tilde{\mu}$ are integers. We consider the two-dimensional distribution of

$$\begin{pmatrix} \langle t, w \rangle \\ \langle t, \tilde{w} \rangle \end{pmatrix} = \begin{pmatrix} \langle \psi, \mu \rangle \\ \langle \psi, \tilde{\mu} \rangle \end{pmatrix}$$

and its reduction

$$\begin{pmatrix} \langle \psi, \mu \rangle \bmod 1 \\ \langle \psi, \tilde{\mu} \rangle \bmod 1 \end{pmatrix}$$

as a random variable in $\psi$. We want to compute, for $-\frac{1}{2} \le s, \tilde{s} \le \frac{1}{2}$, the probabilities

$$\mathbb{P}\big(\langle \psi, \mu \rangle \bmod 1 \le s, \; \langle \psi, \tilde{\mu} \rangle \bmod 1 \le \tilde{s}\big) = \operatorname{Vol}\big(\langle \psi, \mu \rangle \bmod 1 \le s, \; \langle \psi, \tilde{\mu} \rangle \bmod 1 \le \tilde{s}\big).$$

We can compute this volume as a sub-volume of the $n$-dimensional cube by counting the points $(k_1/p, \dots, k_n/p)$, with integers $k_j$ satisfying $-\frac{p}{2} \le k_j \le \frac{p}{2}$, for a very large prime $p$, and passing to the limit. As an approximation, we obtain the sum

$$\sum_{\substack{r, \tilde{r} \text{ with} \\ r \le ps, \; \tilde{r} \le p\tilde{s}}} \; \sum_{\substack{k_j \text{ with} \\ \sum_j \mu_j k_j / p \bmod 1 = r/p, \\ \sum_j \tilde{\mu}_j k_j / p \bmod 1 = \tilde{r}/p}} \frac{1}{p^n} \;=\; \sum_{\substack{r, \tilde{r} \text{ with} \\ r \le sp, \; \tilde{r} \le \tilde{s}p}} \; \sum_{\substack{k_j \text{ with} \\ \sum_j \mu_j k_j \bmod p = r, \\ \sum_j \tilde{\mu}_j k_j \bmod p = \tilde{r}}} \frac{1}{p^n},$$

where $r, \tilde{r}$ are integers in $[-\frac{p}{2}, \frac{p}{2}]$. Since $\mu$ and $\tilde{\mu}$ are linearly independent (over the rational numbers or the real numbers), the second sum has exactly $p^{n-2}$ solutions. In the end, we derive

$$\sum_{\substack{r, \tilde{r} \text{ with} \\ r \le sp, \; \tilde{r} \le \tilde{s}p}} \frac{1}{p^2} = \frac{1}{p^2} \, \#\big([-\tfrac{p}{2}, sp] \cap \mathbb{Z}\big) \, \#\big([-\tfrac{p}{2}, \tilde{s}p] \cap \mathbb{Z}\big) \xrightarrow{p \to \infty} \Big(s + \frac{1}{2}\Big)\Big(\tilde{s} + \frac{1}{2}\Big).$$

This shows that the random variables $\langle \psi, \mu \rangle \bmod 1$ and $\langle \psi, \tilde{\mu} \rangle \bmod 1$ are independent and the covariances vanish.□
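Proposition 3.2 can be checked by Monte Carlo simulation: for a uniform $\psi$ on $[-\frac{1}{2}, \frac{1}{2})^n$ and two linearly independent integer vectors $\mu, \tilde{\mu}$, the empirical covariance of $\cos(2\pi\langle\psi,\mu\rangle)$ and $\cos(2\pi\langle\psi,\tilde{\mu}\rangle)$ should vanish. The dimension, sample size, and the particular vectors below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 6, 200_000
mu = np.array([1, 0, 2, -1, 0, 3], dtype=float)
mu_t = np.array([0, 1, -1, 2, 1, 0], dtype=float)   # linearly independent of mu

# t = B psi, so <t, w> = <psi, mu> and <t, w~> = <psi, mu~>.
psi = rng.uniform(-0.5, 0.5, size=(N, n))
x = np.cos(2 * np.pi * (psi @ mu))
y = np.cos(2 * np.pi * (psi @ mu_t))
cov = np.mean(x * y) - np.mean(x) * np.mean(y)

# Empirical covariance vanishes up to Monte Carlo noise (~ 1/sqrt(N)).
assert abs(cov) < 0.01
```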

3.3 Computing the covariances for BDD-samples

We now assume that $t$ is chosen as a BDD-sample. Recall that $e_0$ is sampled from an $n$-dimensional, continuous Gaussian distribution with covariance matrix $\sigma_0^2 \mathbb{1}_n$. We obtain the following result.

Proposition 3.3

Let $t$ be a BDD-sample and $w, \tilde{w} \in W$ be such that $w$ and $\tilde{w}$ are linearly independent. Then,

(4) $$\operatorname{Cov}(f_w(t), f_{\tilde{w}}(t)) = \frac{1}{2}\Delta_a + \frac{1}{2}\Delta_b - \Delta_c \Delta_d,$$

where we set

$$\Delta_a = e^{-2\pi^2 \|w + \tilde{w}\|^2 \sigma_0^2}, \quad \Delta_b = e^{-2\pi^2 \|w - \tilde{w}\|^2 \sigma_0^2}, \quad \Delta_c = e^{-2\pi^2 \|w\|^2 \sigma_0^2}, \quad \Delta_d = e^{-2\pi^2 \|\tilde{w}\|^2 \sigma_0^2}.$$

Proof

We fix two dual vectors $w, \tilde{w} \in W$, $w \neq \tilde{w}$, and consider the two-dimensional distribution of

$$\begin{pmatrix} \langle e_0, w \rangle \\ \langle e_0, \tilde{w} \rangle \end{pmatrix}$$

as a random variable in $e_0$. This random variable is again Gaussian distributed with covariance matrix

$$\Sigma = \sigma_0^2 \begin{pmatrix} \|w\|^2 & \langle w, \tilde{w} \rangle \\ \langle w, \tilde{w} \rangle & \|\tilde{w}\|^2 \end{pmatrix}.$$

Since we assume that $w$ and $\tilde{w}$ are linearly independent, they span a two-dimensional subspace of $\mathbb{R}^n$, so that $\Sigma$ is positive definite and, in particular, invertible. We set

$$\tilde{P}(z) = \frac{1}{2\pi \sqrt{\det(\Sigma)}} e^{-\frac{1}{2} z^T \Sigma^{-1} z}.$$

The distribution of the reduced random variable

$$c = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} \langle e_0, w \rangle \bmod 1 \\ \langle e_0, \tilde{w} \rangle \bmod 1 \end{pmatrix}$$

is equal to

$$P(c) = \sum_{\mu \in \mathbb{Z}^2} \tilde{P}(c + \mu).$$

We use the Poisson summation formula and obtain

$$P(c) = \sum_{\mu \in \mathbb{Z}^2} \tilde{P}(c + \mu) = \sum_{v \in \mathbb{Z}^2} e^{2\pi i \langle v, c \rangle} e^{-2\pi^2 v^T \Sigma v}.$$

We now start the computation by

$$\mathbb{E}(f_w(t) f_{\tilde{w}}(t)) = \int_{c_1, c_2} \cos(2\pi c_1) \cos(2\pi c_2) P(c_1, c_2) \, dc_1 \, dc_2 = \sum_{v \in \mathbb{Z}^2} e^{-2\pi^2 v^T \Sigma v} \int_{c_1} \cos(2\pi c_1) e^{2\pi i v_1 c_1} \, dc_1 \int_{c_2} \cos(2\pi c_2) e^{2\pi i v_2 c_2} \, dc_2.$$

It is easily seen that each univariate integral (in $c_1$ or $c_2$, respectively) vanishes for all $v_1$ except for $v_1 = \pm 1$, and for all $v_2$ except for $v_2 = \pm 1$, respectively. Namely, we have

$$2 \int_0^1 \cos(2\pi n t) e^{2\pi i m t} \, dt = \int_0^1 e^{2\pi i (n + m) t} \, dt + \int_0^1 e^{-2\pi i (n - m) t} \, dt.$$

The first integral on the right-hand side vanishes except for $n = -m$ and the second integral vanishes except for $n = m$. Both integrals are equal to 1 if they do not vanish, and the claim follows.

Therefore, we obtain

$$\mathbb{E}(f_w(t) f_{\tilde{w}}(t)) = \frac{1}{4} \sum_{v_1 = \pm 1, \, v_2 = \pm 1} e^{-2\pi^2 v^T \Sigma v} = \frac{1}{2}\Delta_a + \frac{1}{2}\Delta_b.$$

Lemma 4 in the paper by Ducas and Pulles [7] states the equality for the expectation value

$$\mathbb{E}(f_w(t)) = e^{-2\pi^2 \sigma_0^2 \|w\|^2}.$$

In the end, we derive for the covariance

$$\operatorname{Cov}(f_w(t), f_{\tilde{w}}(t)) = \frac{1}{2}\Delta_a + \frac{1}{2}\Delta_b - \Delta_c \Delta_d.□$$
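Formula (4) can be verified by Monte Carlo simulation: since $\langle t, w \rangle \equiv \langle e_0, w \rangle \pmod{1}$, it suffices to sample the error $e_0$ directly. The parameters $\sigma_0$, $w$, and $\tilde{w}$ below are illustrative, chosen so that the covariance is visibly nonzero.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma0 = 0.1
w = np.array([1.0, 0.5])
wt = np.array([0.8, -0.3])      # not orthogonal to w

def delta(v: np.ndarray) -> float:
    """e^{-2 pi^2 ||v||^2 sigma0^2}, as in Proposition 3.3."""
    return float(np.exp(-2 * np.pi**2 * (v @ v) * sigma0**2))

# Covariance predicted by (4): (1/2) Delta_a + (1/2) Delta_b - Delta_c Delta_d.
theory = 0.5 * delta(w + wt) + 0.5 * delta(w - wt) - delta(w) * delta(wt)

# Empirical covariance of f_w(t) and f_w~(t) over BDD errors e_0 ~ N(0, sigma0^2 I).
e0 = sigma0 * rng.standard_normal(size=(400_000, 2))
fw = np.cos(2 * np.pi * (e0 @ w))
fwt = np.cos(2 * np.pi * (e0 @ wt))
empirical = np.mean(fw * fwt) - np.mean(fw) * np.mean(fwt)

assert theory > 0.01                      # clearly nonzero, refuting independence
assert abs(empirical - theory) < 0.01     # simulation matches formula (4)
```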

Remark 3.4

Of course, the just computed covariances are non-vanishing in general (when $w$ and $\tilde{w}$ are not orthogonal). However, as the formulas are quite hard to comprehend intuitively, we provide in Appendix A some heuristic estimates, which underline that the covariances are indeed nonzero and imply the bias of the variance that was observed experimentally in the paper by Ducas and Pulles [7].

4 Dual attack

In this section, we estimate the success probability of the dual attack described above and provide a quantification of the number of dual vectors that are needed in theory for a successful attack. The strategy is based on applying Hoeffding's inequality to $f_W(t)$ with respect to the probability measure conditioned on $t$.

In order to do this, we proceed as follows. We first introduce some technical ingredients that we need in the proof. We then recall some known results on the (conditional) expectation of $f_W(t)$ that we will need. Afterwards, we state and prove the main results, given by an estimate of the success probability and the cost of the attack.

4.1 Technical ingredients

4.1.1 Distribution of the dual vectors

In order to be able to quantify the distribution of the score, we need to make assumptions on the output of the lattice sieve algorithm, from which we obtain the dual vectors.

Heuristic 2

We make the following assumptions on the distribution of the family $\{w\}_{w \in W}$ of random dual vectors.

  • $\{w\}_{w \in W}$ is a family of independent random variables.

  • $\{w\}_{w \in W}$ is independent of the sample $t$.

  • For all $w \in W$, we have that $w$ is distributed according to $D_{\hat{\Lambda}, \, \tau_0 \sqrt{2\pi}}$, where $\tau_0 > 0$.

4.1.1.1 Justification of Heuristic 2

The first two bullet points are intuitively justified as the dual vectors are usually generated, via lattice reduction algorithms such as BKZ, independently of each other and also independently of the sample $t$. However, the justification of the last bullet point, where we make an assumption on the distribution of the dual vectors, is not obvious. While it is clear that some assumption on this distribution has to be made for a rigorous analysis, additional arguments are required to justify why we assume that the dual vectors are distributed according to a discrete Gaussian distribution.

We note that our assumption in Heuristic 2 aligns with the paper by Pouly and Shen [9], whereas Ducas and Pulles [8] made the assumption that the dual vectors are uniformly distributed on a centered ball $B_r^n \subseteq \mathbb{R}^n$ of a certain radius $r$. We believe that the most realistic assumption, which most suitably resembles the behavior of the output of the lattice reduction algorithms, would be a combination of both assumptions, where the dual vectors are uniformly distributed on $(B_r^n \cap \hat{\Lambda}) \setminus B_{r'}^n$ for some $r > r' > 0$. However, under this assumption, the rigorous analysis becomes much more complicated, as it would involve hypergeometric and Bessel functions (instead of the Gaussian kernel as in our case). Therefore, both assumptions, i.e., the one from Heuristic 2 and the paper by Pouly and Shen [9] and the one from the paper by Ducas and Pulles [8], provide a first, meaningful step toward this most realistic assumption, the latter being left for future research.

4.1.2 Conditional expectation and conditional probability measure

In our proofs, we use the concept of conditional expectation, which is widely used in probability theory. For an introduction to this topic, we refer to any standard textbook in probability theory [12, Sections 33 and 34].

Here we only provide the intuitive description that the conditional expectation given $t$, denoted by $\mathbb{E}[\,\cdot \mid t\,]$, is the best prediction under the condition that we know $t$. In particular, it is again a random variable. Intuitively, this object suits our setting, as we are also given a sample $t$ and afterwards try to estimate the score depending on this $t$.

We recall some important facts of the conditional expectation that we will use:

  • Many properties of the conditional expectation and the associated conditional probability measure are inherited from the usual expectation and probability measure. Examples are the linearity and monotonicity of the expectation; see [12, Sections 33 and 34] for more information.

  • The conditional expectation is only defined almost surely, i.e., everywhere except for a null set (a set of measure 0 for the underlying probability space). In our case, this is the underlying probability space on which the random variables $t$ and $\{w\}_{w \in W}$ are formally defined. In this paper, we abuse notation and do not mention the fact that some of the equalities only hold almost surely. This has no effect on the final results as, for their practical relevance, it suffices that the results hold almost everywhere.

  • The conditional probability measure is, as for the usual probability measure, defined as the conditional expectation of the indicator function $x \mapsto \mathbb{1}_A(x)$ with respect to a Borel subset $A$. That is, in our case, we have that $\mathbb{P}[A \mid t] = \mathbb{E}[\mathbb{1}_A \mid t]$.

  • In the special case that a random variable $X$ is independent of $t$, we have that (for any function $G$ that satisfies some integrability condition)

    (5) $$\mathbb{E}[G(X, t) \mid t](\omega) = \mathbb{E}_X[G(X, t(\omega))],$$

    where the notation indicates that the left-hand side is a random variable (as per the definition of the conditional expectation) and the right-hand side is an expectation over $X$, with $t(\omega)$ taken as a given input parameter for the expectation.

A very important consequence of the above is given in the following lemma.

Lemma 4.1

Recall that $m_0 = |W|$ and

$$f_w(t) = \cos(2\pi \langle t, w \rangle) \quad \text{for all } w \in W.$$

Write $W = \{w_1, \dots, w_{m_0}\}$. Let $\{B_i\}_{i = 1, \dots, m_0}$ be an arbitrary family of Borel subsets of $\mathbb{R}$. Then, under Heuristic 2,

(6) $$\mathbb{P}\Big[\bigcap_{i=1}^{m_0} \{f_{w_i}(t) \in B_i\} \, \Big| \, t\Big] = \prod_{i=1}^{m_0} \mathbb{P}[f_{w_i}(t) \in B_i \mid t].$$

In particular, we have that the sequence $\{f_{w_i}(t)\}_{i = 1, \dots, m_0} = \{f_w(t)\}_{w \in W}$ is independent with respect to the conditional probability measure $\mathbb{P}[\,\cdot \mid t\,]$.

Proof

We have that

$$\mathbb{P}\Big[\bigcap_{i=1}^{m_0} \{f_{w_i}(t) \in B_i\} \, \Big| \, t\Big](\omega) = \mathbb{E}\Big[\prod_{i=1}^{m_0} \mathbb{1}_{B_i}(f_{w_i}(t)) \, \Big| \, t\Big](\omega) = \mathbb{E}[G(t, w_1, \dots, w_{m_0}) \mid t](\omega),$$

where we defined the function $G : \mathbb{R}^n \times (\mathbb{R}^n)^{m_0} \to \mathbb{R}$ as $G(t, x_1, \dots, x_{m_0}) = \prod_{i=1}^{m_0} \mathbb{1}_{B_i}(f_{x_i}(t))$. Then, by applying (5),

$$\mathbb{E}[G(t, w_1, \dots, w_{m_0}) \mid t](\omega) = \mathbb{E}[G(t(\omega), w_1, \dots, w_{m_0})] = \mathbb{E}\Big[\prod_{i=1}^{m_0} \mathbb{1}_{B_i}(f_{w_i}(t(\omega)))\Big].$$

Now we use that, inside the last expectation, $t(\omega)$ is fixed. In particular, it acts as an additional parameter in the expectation and not as a running variable for the integration corresponding to the expectation. Hence, we can use the independence assumption from Heuristic 2 to obtain that

$$\mathbb{E}\Big[\prod_{i=1}^{m_0} \mathbb{1}_{B_i}(f_{w_i}(t(\omega)))\Big] = \prod_{i=1}^{m_0} \mathbb{E}[\mathbb{1}_{B_i}(f_{w_i}(t(\omega)))] = \prod_{i=1}^{m_0} \mathbb{E}[\mathbb{1}_{B_i}(f_{w_i}(t)) \mid t](\omega).$$

This concludes the proof.□

Remark 4.2

We emphasize here that the independence claim from Lemma 4.1 is with respect to P [ t ] . In particular, this is a different statement than the non-independence mentioned at the beginning of Section 3.

4.1.3 Hoeffding’s inequality

Inspired by the impressive work of Pouly and Shen [9], we will make use of the classic Hoeffding’s inequality, which is given as follows.

Lemma 4.3

Let $X_1, \dots, X_m$ be independent random variables such that $a_i \le X_i \le b_i$ almost surely. Let $S_m = X_1 + \cdots + X_m$. Then, for all $r > 0$,

$$\mathbb{P}[S_m - \mathbb{E}[S_m] \ge r] \le e^{-2r^2 \left(\sum_{i=1}^m (b_i - a_i)^2\right)^{-1}}.$$

However, in this work we apply this inequality not for $\mathbb{P}[\,\cdot\,]$ but for the conditional probability measure $\mathbb{P}[\,\cdot \mid t\,]$. It is straightforward to see that this inequality also holds true for the conditional probability measure, since the original proof only relies on elementary properties of the expectation and the probability measure, and these properties also hold true for their conditional versions.
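Hoeffding's inequality can be illustrated numerically for variables of exactly the type appearing in the score: independent terms $\cos(2\pi U_i)$ bounded in $[-1, 1]$. The parameters $m$, $r$, and the number of trials are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
m, r, trials = 100, 20.0, 20_000

# X_i = cos(2*pi*U_i) with U_i uniform on [0,1): independent, bounded in [-1, 1].
S = np.cos(2 * np.pi * rng.random((trials, m))).sum(axis=1)
empirical_tail = np.mean(S - S.mean() >= r)

# Hoeffding bound with a_i = -1, b_i = 1: exp(-2 r^2 / (m * (b-a)^2)) = exp(-2 r^2 / (4m)).
bound = float(np.exp(-2 * r**2 / (4 * m)))

assert empirical_tail <= bound + 0.01   # the tail is dominated by the bound
```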

4.1.4 Threshold

Our threshold for distinguishing between the random-sample- and the BDD-sample-case is given as follows:

(7) $$\alpha = c \, e^{-2\pi^2 \left(n + 2n^{\frac{1}{2} + \xi} + 2n^{\xi}\right) \sigma_0^2 \tau_0^2}$$

for some arbitrarily chosen $c \in (0, 1)$ and $\xi \in (0, \frac{1}{2})$.

4.1.5 Concentration of chi-squared-distribution

Another important result that we use is the following concentration inequality for the chi-squared distribution.

Lemma 4.4

Let $e$ be distributed according to an $n$-dimensional, continuous Gaussian distribution with covariance matrix $\sigma_0^2 \mathbb{1}_n$ for some $\sigma_0 > 0$. Then, for all $\xi \in (0, \frac{1}{2})$,

(8) $$\mathbb{P}\left[\frac{\|e\|^2}{\sigma_0^2} \in \left[n - 2n^{\frac{1}{2} + \xi}, \; n + 2n^{\frac{1}{2} + \xi} + 2n^{\xi}\right]\right] \ge 1 - e^{-n^{\xi}}.$$

Proof

A proof can be found in the paper by Laurent and Massart [13, Lemma 1].□
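The concentration bound (8) is easy to check by simulation; the values $n = 100$, $\xi = 0.3$, and the sample size below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma0, xi, N = 100, 1.0, 0.3, 20_000

# ||e||^2 / sigma0^2 is chi-squared distributed with n degrees of freedom.
e = sigma0 * rng.standard_normal((N, n))
q = (e * e).sum(axis=1) / sigma0**2

lo = n - 2 * n ** (0.5 + xi)
hi = n + 2 * n ** (0.5 + xi) + 2 * n ** xi
inside = np.mean((q >= lo) & (q <= hi))

# Empirical probability of landing in the interval respects the bound in (8).
assert inside >= 1 - np.exp(-n ** xi)
```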

4.2 Estimating the conditional expectation of f W ( t )

A crucial step to successfully apply Hoeffding’s inequality for f W ( t ) is to find good estimates on the conditional expectation of f W ( t ) given t . This is done in the following. We first provide the estimates for the BDD-sample-case. Then, we consider two different approaches for the random-sample-case.

4.2.1 BDD-sample-case

Lemma 4.5

Let $t = v + e_0$ with $v \in \Lambda$ and $e_0 \sim \chi$. Then,

$$\mathbb{E}\left[\frac{1}{m_0} f_W(t) \, \middle| \, t\right] \ge e^{-2\pi^2 \tau_0^2 \|e_0\|^2}.$$

Proof

Using the linearity of the conditional expectation, the fact that all dual vectors are identically distributed as some $w \sim D_{\hat{\Lambda}, \, \tau_0 \sqrt{2\pi}}$, and identity (5), we have that

(9) $$\mathbb{E}\left[\frac{1}{m_0} f_W(t) \, \middle| \, t\right] = \frac{1}{m_0} \sum_{w \in W} \mathbb{E}[\cos(2\pi \langle t, w \rangle) \mid t] = \frac{1}{m_0} \sum_{w \in W} \mathbb{E}_{w \sim D_{\hat{\Lambda}, \tau_0 \sqrt{2\pi}}}[\cos(2\pi \langle t, w \rangle)] = \mathbb{E}_{w \sim D_{\hat{\Lambda}, \tau_0 \sqrt{2\pi}}}[\cos(2\pi \langle t, w \rangle)].$$

Applying now the Poisson summation formula, we obtain that

E w D Λ ˆ , τ 0 2 π [ cos ( 2 π t , w ) ] = w Λ ˆ cos ( 2 π t , w ) ρ τ 0 2 π ( w ) w Λ ˆ ρ τ 0 2 π ( w ) = w Λ ˆ e 2 π i t , w ρ τ 0 2 π ( w ) w Λ ˆ ρ τ 0 2 π ( w ) = ρ ( τ 0 2 π ) 1 ( Λ t ) ρ ( τ 0 2 π ) 1 ( Λ ) .

Moreover, using that t = v + e 0 and that v Λ , we obtain that

ρ ( τ 0 2 π ) 1 ( Λ t ) = ρ ( τ 0 2 π ) 1 ( Λ ( v + e 0 ) ) = ρ ( τ 0 2 π ) 1 ( Λ e 0 ) .

At this point, we can apply the following standard lower bound from the thesis by Stephens-Davidowitz [14, Lemma 1.3.10]

ρ ( τ 0 2 π ) 1 ( Λ e 0 ) = x Λ ρ ( τ 0 2 π ) 1 ( x e 0 ) = 1 2 x Λ ( ρ ( τ 0 2 π ) 1 ( x e 0 ) + ρ ( τ 0 2 π ) 1 ( x e 0 ) ) = ρ ( τ 0 2 π ) 1 ( e 0 ) x Λ ρ ( τ 0 2 π ) 1 ( x ) cosh ( 4 π 2 τ 0 2 x , e 0 ) ρ ( τ 0 2 π ) 1 ( e 0 ) x Λ ρ ( τ 0 2 π ) 1 ( x ) = e 2 π 2 τ 0 2 e 0 2 ρ ( τ 0 2 π ) 1 ( Λ ) .

We finally obtain that

E w D ( Λ ˆ , τ 0 2 π ) [ cos ( 2 π e 0 , w ) ] = ρ ( τ 0 2 π ) 1 ( Λ e 0 ) ρ ( τ 0 2 π ) 1 ( Λ ) e 2 π 2 τ 0 2 e 0 2 .

This concludes the proof.□
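For the toy lattice $\Lambda = \mathbb{Z}^n$ (whose dual is again $\mathbb{Z}^n$), the conditional expectation in Lemma 4.5 factors over the coordinates and can be evaluated exactly, so the lower bound can be checked directly; the parameters below are arbitrary illustrative choices:

```python
import numpy as np

# Check of Lemma 4.5 on Lambda = Z^n: for w with i.i.d. coordinates from
# the discrete Gaussian on Z with weight exp(-k^2 / (2 tau0^2)), the
# conditional expectation E[(1/m0) f_W(t) | t] = prod_i E[cos(2 pi e0_i w_i)]
# should dominate exp(-2 pi^2 tau0^2 ||e0||^2).
rng = np.random.default_rng(2)
n, tau0, sigma0 = 16, 0.3, 0.05
e0 = rng.normal(0.0, sigma0, size=n)

k = np.arange(-8, 9)                          # support cutoff (negligible tail)
p = np.exp(-k**2 / (2.0 * tau0**2))
p /= p.sum()                                  # discrete Gaussian on Z

# exact conditional expectation, coordinate-wise (product distribution)
score = np.prod([np.sum(p * np.cos(2.0 * np.pi * t * k)) for t in e0])
lower = np.exp(-2.0 * np.pi**2 * tau0**2 * np.dot(e0, e0))

assert score >= lower
```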

4.2.2 Random-sample-case – Approach 1

Lemma 4.6

Let $t$ be a random-sample. Then, whenever $\operatorname{dist}(t,\Lambda) \ge r \coloneqq \frac{\sqrt{n}}{2\pi\tau_0}$,

$$\mathbb{E}\left[\frac{1}{m_0} f_W(t) \,\Big|\, t\right] \le e^{-2\pi^2\tau_0^2(\operatorname{dist}(t,\Lambda) - r)^2}.$$

Proof

Using the same arguments as above, we have that

E 1 m 0 f W ( t ) t = E w D Λ ˆ , τ 0 2 π [ cos ( 2 π t , w ) ] = ρ ( τ 0 2 π ) 1 ( Λ t ) ρ ( τ 0 2 π ) 1 ( Λ ) .

Now, we can proceed as in the paper by Pouly and Shen [9, Lemma 8] to bound the right-hand side and to conclude the proof.□

Remark 4.7

Note that the paper by Pouly and Shen [9, Lemma 8], which is the core of the proof of Lemma 4.6, relies heavily on the classic estimates from the paper by Banaszczyk [15]. In Section 5, where we heuristically reprove the statements of this section, we will directly apply the estimates from the paper by Banaszczyk [15]. In order to fully understand the parallels between the approach in this section and the one from Section 5, it is important to have this in mind.

4.2.3 Random-sample-case – Approach 2

Lemma 4.8

Let $t$ be a random-sample. Let $\zeta > 1$. Then,

$$(10)\quad \mathbb{E}\left[\frac{1}{m_0} f_W(t) \,\Big|\, t\right] \le (2(1+\zeta))^{\frac{n}{2}}\, e^{-\pi^2\tau_0^2\left(1-\frac{1}{\zeta}\right)\operatorname{dist}(t,\Lambda)^2}.$$

Proof

Using the same arguments as above, we have that

(11) E 1 m 0 f W ( t ) t = ρ ( τ 0 2 π ) 1 ( Λ t ) ρ ( τ 0 2 π ) 1 ( Λ ) .

We now estimate the numerator on the right-hand side. It consists of a sum of exponentials of the form e 2 π 2 τ 0 2 v t 2 , where v Λ . We focus on estimating the factor v t 2 in the exponent. Here we are inspired by the paper by Chen et al. [16, Appendix A].

Let $K(t)$ be a closest lattice vector to $t$ and let $\delta = \frac{1}{2(1+\zeta)}$. Then, by the triangle inequality, we have that

v t 2 1 2 v t 2 + 1 2 K ( t ) t 2 = 1 2 v t 2 + 1 2 K ( t ) t 2 δ ( v t + K ( t ) t ) 2 + δ ( v t + K ( t ) t ) 2 1 2 v t 2 + 1 2 K ( t ) t 2 δ ( v t + K ( t ) t ) 2 + δ ( v t + t K ( t ) ) 2 = 1 2 δ v t 2 + 1 2 δ K ( t ) t 2 2 δ v t K ( t ) t + δ v K ( t ) 2 .

Applying now the Peter–Paul inequality ( 2 a b ζ a 2 + ζ 1 b 2 for a , b R ), we obtain that

v t 2 1 2 δ v t 2 + 1 2 δ K ( t ) t 2 δ ζ v t 2 δ 1 ζ K ( t ) t 2 + δ v K ( t ) 2 = 1 2 δ δ ζ v t 2 + 1 2 δ δ 1 ζ K ( t ) t 2 + δ v K ( t ) 2 .

Due to our choice of δ , we have that

1 2 δ δ ζ = 0 , 1 2 δ δ 1 ζ = 1 2 1 1 ζ .

This yields that

v t 2 1 2 1 1 ζ K ( t ) t 2 + δ v K ( t ) 2 .

Via this estimate, we obtain that

(12) ρ ( τ 0 2 π ) 1 ( Λ t ) = v Λ e 2 π 2 τ 0 2 v t 2 e 2 π 2 τ 0 2 1 2 1 1 ζ t K ( t ) 2 v Λ e 2 π 2 τ 0 2 δ K ( t ) v 2 = e π 2 τ 0 2 1 1 ζ t K ( t ) 2 v Λ e 2 π 2 τ 0 2 δ v 2 = e π 2 τ 0 2 1 1 ζ t K ( t ) 2 ρ ( τ 0 2 π δ ) 1 ( Λ ) .

Via the Poisson summation formula and since δ 1 , we have that

(13) ρ ( τ 0 2 π δ ) 1 ( Λ ) = ( τ 0 2 π δ ) n det ( Λ ) w Λ ˆ ρ τ 0 2 π δ ( w ) ( τ 0 2 π δ ) n det ( Λ ) w Λ ˆ ρ τ 0 2 π ( w ) = δ n 2 ρ ( τ 0 2 π ) 1 ( Λ ) .

Combining (13), (12), and (11) yields (10).□

4.3 Estimating the success probabilities

We now have collected everything we need to formulate some of the main results of this paper in the following three theorems. We first consider the BDD-sample-case. Then, we consider two different approaches for the random-sample-case.

4.3.1 BDD-sample-case

Theorem 4.9

Let $t = v + e_0$ with $v \in \Lambda$ and $e_0 \sim \chi$. Let $\alpha, c, \xi$ be as in (7). Then,

$$(14)\quad \mathbb{P}[f_W(t) < m_0\alpha] \le e^{-\frac{1}{2} m_0 (1-c)^2 e^{-4\pi^2\left(n + 2n^{\frac{1}{2}+\xi} + 2n^{\xi}\right)\sigma_0^2\tau_0^2}} + e^{-n^{\xi}}.$$

Proof

We begin with some elementary reformulations of the desired probability.

(15) P [ f W ( t ) < m 0 α ] = P [ f W ( t ) > m 0 α ] = P [ f W ( t ) E [ f W ( t ) t ] > E [ f W ( t ) t ] m 0 α ] P [ f W ( t ) E [ f W ( t ) t ] > m 0 ( e 2 π 2 e 0 2 τ 0 2 α ) ] ,

where we have used Lemma 4.5 in the last step.

We then rewrite the right-hand side of (15) by using basic rules of probability theory to see that

P [ f W ( t ) E [ f W ( t ) t ] > m 0 ( e 2 π 2 e 0 2 τ 0 2 α ) ] = P f W ( t ) E [ f W ( t ) t ] > m 0 ( e 2 π 2 e 0 2 τ 0 2 α ) , e 0 2 σ 0 2 < n + 2 n 1 2 + ξ + 2 n ξ + P f W ( t ) E [ f W ( t ) t ] > m 0 ( e 2 π 2 e 0 2 τ 0 2 α ) , e 0 2 σ 0 2 n + 2 n 1 2 + ξ + 2 n ξ P [ f W ( t ) E [ f W ( t ) t ] > m 0 ( e 2 π 2 ( n + 2 n 1 2 + ξ + 2 n ξ ) σ 0 2 τ 0 2 α ) ] + P e 0 2 σ 0 2 n + 2 n 1 2 + ξ + 2 n ξ = E [ P [ f W ( t ) E [ f W ( t ) t ] > m 0 ( 1 c ) e 2 π 2 ( n + 2 n 1 2 + ξ + 2 n ξ ) σ 0 2 τ 0 2 t ] ] + P e 0 2 σ 0 2 n + 2 n 1 2 + ξ + 2 n ξ e 1 2 m 0 ( 1 c ) 2 e 4 π 2 ( n + 2 n 1 2 + ξ + 2 n ξ ) σ 0 2 τ 0 2 + e n ξ ,

where we applied Hoeffding’s inequality for conditional probability measures and the concentration inequality (8) in the last step. This concludes the proof.□

4.3.2 Random-sample-case – Approach 1

Theorem 4.10

Let $t$ be a random-sample and $\alpha, c, \xi$ be as in (7). Let $N$ be some arbitrary positive integer. Suppose that $\sigma_0, \tau_0$, and $n$ are such that

$$(16)\quad \frac{1}{2\pi\tau_0} + \sigma_0\sqrt{1 - \frac{\log(c/2)}{2\pi^2\tau_0^2\sigma_0^2 n} + 2n^{-\left(\frac{1}{2}-\xi\right)} + n^{-(1-\xi)}} \le \frac{1}{\sqrt{2\pi e}}\, N^{-\frac{1}{n}} \det(\Lambda)^{\frac{1}{n}}.$$

Then,

$$(17)\quad \mathbb{P}[f_W(t) > m_0\alpha] \le e^{-\frac{1}{2} m_0 \frac{c^2}{4} e^{-4\pi^2\left(n + 2n^{\frac{1}{2}+\xi} + 2n^{\xi}\right)\sigma_0^2\tau_0^2}} + \frac{1}{N}.$$

Proof

Let

δ n r + log ( c 2 ) ( 2 π 2 τ 0 2 ) 1 + ( n + 2 n 1 2 + ξ + n ξ ) σ 0 2 ,

where r is defined in Lemma 4.6. Via some elementary reformulations, we obtain that

P [ f W ( t ) > m 0 α ] = P [ f W ( t ) E [ f W ( t ) t ] > m 0 α E [ f W ( t ) t ] ] = P [ f W ( t ) E [ f W ( t ) t ] > m 0 α E [ f W ( t ) t ] , dist ( t , Λ ) δ n ] + P [ f W ( t ) E [ f W ( t ) t ] > m 0 α E [ f W ( t ) t ] , dist ( t , Λ ) > δ n ] P [ dist ( t , Λ ) δ n ] + P [ f W ( t ) E [ f W ( t ) t ] > m 0 α m 0 e 2 π 2 ( dist ( t , Λ ) r ) 2 τ 0 2 , dist ( t , Λ ) > δ n ] ,

where we have used Lemma 4.6 in the last step, which is applicable since dist ( t , Λ ) > δ n > r .

Estimation of the first term. In order to estimate the first term, note that

(18) P [ dist ( t , Λ ) δ n ] = 1 det ( Λ ) Vol ( V ( Λ ) B δ n n ) 1 det ( Λ ) Vol ( B δ n n ) ,

where V ( Λ ) is the Voronoi cell of Λ and B δ n n is the n -dimensional ball of radius δ n around the origin. We show in Appendix B that

(19) Vol ( B δ n n ) 2 π e n δ n n .

Moreover, by assumption (16),

(20) δ n = 1 τ 0 2 π n + σ 0 n 1 log ( c 2 ) 2 π 2 τ 0 2 σ 0 2 n + 2 n ( 1 2 ξ ) + n ( 1 ξ ) n 1 2 π e det ( Λ ) 1 n .

Combining (18), (19), and (20) yields that

$$\mathbb{P}[\operatorname{dist}(t,\Lambda) \le \delta_n] \le \frac{1}{N}.$$

Estimation of the second term. For the second term, we observe that

P [ f W ( t ) E [ f W ( t ) t ] > m 0 α m 0 e 2 π 2 ( dist ( t , Λ ) r ) 2 τ 0 2 , dist ( t , Λ ) > δ n ] P [ f W ( t ) E [ f W ( t ) t ] > m 0 α m 0 e 2 π 2 ( δ n r ) 2 τ 0 2 ] = P f W ( t ) E [ f W ( t ) t ] > m 0 α m 0 c 2 e 2 π 2 ( n + 2 n 1 2 + ξ + n ξ ) σ 0 2 τ 0 2 = P f W ( t ) E [ f W ( t ) t ] > m 0 α 2 .

As above, we can now apply Hoeffding’s inequality by invoking the probability measure conditioned on t . This concludes the proof.□

4.3.3 Random-sample-case – Approach 2

Theorem 4.11

Let $t$ be a random-sample and $\alpha, c, \xi$ be as in (7) and $\zeta$ as in Lemma 4.8. Let $N$ be some arbitrary positive integer. Suppose that $\sigma_0, \tau_0$, and $n$ are such that

$$(21)\quad \sqrt{\frac{\frac{1}{2}\log(2(1+\zeta)) - \frac{\log(c/2)}{n} + 2\pi^2\tau_0^2\sigma_0^2\left(1 + 2n^{-\frac{1}{2}+\xi} + 2n^{-1+\xi}\right)}{\pi^2\tau_0^2\left(1-\frac{1}{\zeta}\right)}} \le \frac{1}{\sqrt{2\pi e}}\, N^{-\frac{1}{n}} \det(\Lambda)^{\frac{1}{n}}.$$

Then,

$$(22)\quad \mathbb{P}[f_W(t) > m_0\alpha] \le e^{-\frac{1}{2} m_0 \frac{c^2}{4} e^{-4\pi^2\left(n + 2n^{\frac{1}{2}+\xi} + 2n^{\xi}\right)\sigma_0^2\tau_0^2}} + \frac{1}{N}.$$

Proof

Let

δ n 2 π 2 τ 0 2 1 1 ζ 1 ( log ( ( 2 ( 1 + ζ ) ) n 2 ) log ( α 2 ) ) .

Then, we know that δ n = O ( n ) , since by the definition of α (see (7)),

δ n 2 = π 2 τ 0 2 1 1 ζ 1 n 2 log ( 2 ( 1 + ζ ) ) log ( c 2 ) + 2 π 2 τ 0 2 σ 0 2 ( n + 2 n 1 2 + ξ + 2 n ξ ) = n π 2 τ 0 2 1 1 ζ 1 1 2 log ( 2 ( 1 + ζ ) ) log ( c 2 ) n 1 + 2 π 2 τ 0 2 σ 0 2 ( 1 + 2 n 1 2 + ξ + 2 n 1 + ξ ) .

We now use the same first step as in the proof of Theorem 4.10 but apply Lemma 4.8 instead of Lemma 4.6. Then, we obtain that

P [ f W ( t ) > m 0 α ] = P [ f W ( t ) E [ f W ( t ) t ] > m 0 α E [ f W ( t ) t ] , dist ( t , Λ ) δ n ] + P [ f W ( t ) E [ f W ( t ) t ] > m 0 α E [ f W ( t ) t ] , dist ( t , Λ ) > δ n ]

P [ dist ( t , Λ ) δ n ] + P f W ( t ) E [ f W ( t ) t ] > m 0 α m 0 ( 2 ( 1 + ζ ) ) n e π 2 τ 0 2 1 1 ζ dist ( t , Λ ) 2 , dist ( t , Λ ) > δ n ] P [ dist ( t , Λ ) δ n ] + P f W ( t ) E [ f W ( t ) t ] > m 0 α m 0 ( 2 ( 1 + ζ ) ) n 2 e π 2 τ 0 2 1 1 ζ δ n 2 = P [ dist ( t , Λ ) δ n ] + P [ f W ( t ) E [ f W ( t ) t ] > m 0 α 2 ] .

For the second term we can, as above, apply Hoeffding’s inequality. In order to estimate the first term, we again proceed similar to the proof of Theorem 4.10. That is, we have that

(23) P [ dist ( t , Λ ) δ n ] 1 det ( Λ ) Vol ( B δ n n ) 1 det ( Λ ) 2 π e n δ n n .

Moreover, by assumption (21),

(24) δ n = n π 2 τ 0 2 1 1 ζ 1 1 2 log ( 2 ( 1 + ζ ) ) log ( c 2 ) n 1 + 2 π 2 τ 0 2 σ 0 2 ( 1 + 2 n 1 2 + ξ + 2 n 1 + ξ ) n 1 2 π e det ( Λ ) 1 n .

Combining (23) and (24) yields that

$$\mathbb{P}[\operatorname{dist}(t,\Lambda) \le \delta_n] \le \frac{1}{N}.$$

This concludes the proof.□

4.4 Conclusion

In order for our strategy described in Section 2.3 to be successful, we need the probabilities on the left-hand sides of (14) and (17) (or equivalently (22)) to become small. This in turn holds if their respective upper bounds, i.e., the respective right-hand sides of (14) and (17) (or equivalently (22)), become small.

In order to achieve this, we choose $n$ and $N$ large enough so that the second terms on the right-hand sides of (14) and (17) (or equivalently (22)), respectively, become smaller than, say, $1/4$. It remains to investigate the first terms on the right-hand sides of (14) and (17) (or equivalently (22)).

More precisely, we now need to show that, for some suitable $\ell \ge 4$, the following two conditions hold true:

$$e^{-\frac{1}{2} m_0 \frac{c^2}{4} e^{-4\pi^2\left(n + 2n^{\frac{1}{2}+\xi} + 2n^{\xi}\right)\sigma_0^2\tau_0^2}} \le \frac{1}{\ell} \quad \text{and} \quad e^{-\frac{1}{2} m_0 (1-c)^2 e^{-4\pi^2\left(n + 2n^{\frac{1}{2}+\xi} + 2n^{\xi}\right)\sigma_0^2\tau_0^2}} \le \frac{1}{\ell}.$$

For that to be true, we derive the condition that

$$(25)\quad \frac{1}{2} m_0 \ge \log(\ell)\, \max\!\left((1-c)^{-2}, \frac{4}{c^2}\right) e^{4\pi^2\left(n + 2n^{\frac{1}{2}+\xi} + 2n^{\xi}\right)\sigma_0^2\tau_0^2}.$$

The maximum in (25) is minimized for $c = \frac{2}{3}$, where $\max((1-c)^{-2}, 4c^{-2}) = 9$. Hence, we have the condition that

$$(26)\quad m_0 \ge 18 \log(\ell)\, e^{4\pi^2\left(n + 2n^{\frac{1}{2}+\xi} + 2n^{\xi}\right)\sigma_0^2\tau_0^2}.$$

This yields a lower bound on the necessary number of dual vectors for a successful dual attack.
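The lower bound (26), read as $m_0 \ge 18\log(\ell)\,e^{4\pi^2(n + 2n^{1/2+\xi} + 2n^{\xi})\sigma_0^2\tau_0^2}$, is easy to evaluate numerically. The following sketch uses hypothetical toy parameters (not a concrete cryptographic instance) and a helper name `m0_lower_bound` introduced only for illustration:

```python
import math

# Illustrative evaluation of the lower bound (26) on the number m0 of
# dual vectors needed for the attack, for hypothetical toy parameters.
def m0_lower_bound(n, sigma0, tau0, xi, l):
    dim_term = n + 2.0 * n**(0.5 + xi) + 2.0 * n**xi
    return 18.0 * math.log(l) * math.exp(4.0 * math.pi**2 * dim_term
                                         * sigma0**2 * tau0**2)

m0 = m0_lower_bound(n=100, sigma0=0.05, tau0=0.5, xi=0.25, l=100)
print(f"required m0 >= {m0:.3e}")
```

Note the exponential dependence on $\sigma_0^2\tau_0^2$: mildly increasing $\tau_0$ raises the required $m_0$ sharply.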

Remark 4.12

Note that we can choose and such that the probabilities computed in Theorems 4.94.11 vanish as n . It even turns out that choosing and in such a way only has a mild effect on the main results of this paper.

Indeed, as for , its influence in (26) is of logarithmic order. Therefore, choosing increasing in n (e.g., as = n ) only has a mild effect on the total complexity of the attack determined by (26).

A similar observation is also true for . As shown in Section 4.5, its influence on the main results of this paper is given in the parameter constrains and is determined by a factor of the form 1 n . Now, due to the fact that this term is 1 + o ( 1 ) in both cases, if we choose = O ( 1 ) or if we, e.g., choose = n , we observe that choosing mildly increasing in n only has a mild effect on the parameter constraints of this paper; refer Section 4.5 for more details.

We omit the details concerning these aspects as they are dependent on the concrete scenario, where the results of this paper are used and can be adjusted easily to this scenario.

4.5 Parameter selection

It is important to ask for which sets of parameters $n, \sigma_0, \tau_0, \xi, \ell, N$ the results of this section hold true (recall that we already chose $c = \frac{2}{3}$). We first note that the BDD-sample-result (Theorem 4.9) does not impose any further restrictions on the parameters. This is not the case for the results of Theorems 4.10 and 4.11 in the random-sample-case, where we need to consider conditions (16) and (21), respectively.

In this section, we investigate when these restrictions hold true. In doing so, we restrict our attention to the parameters $\tau_0$ and $\sigma_0$. The reason is that the parameters $c, \xi, \ell, N$ do not have a significant influence, since (as $n$ becomes large) they only appear in lower-order terms in conditions (16) and (21) as well as in the complexity estimate (26). Therefore, it is easy to find suitable parameters $\xi, \ell, N$ by mildly adjusting the dimension $n$.

Finally, in this section, we investigate whether and how the lower bound on $m_0$ given in (26) has implications for how realistically Heuristic 2 resembles the behavior of lattice reduction algorithms.

4.5.1 Intuition for the parameter τ 0

A very prominent role in this investigation will be played by the parameter $\tau_0$. Recall that this parameter, according to Heuristic 2, determines the distribution of the dual vectors. In particular, due to standard properties of the discrete Gaussian distribution, which are comparable to Lemma 4.4, this leads to the heuristic that, on average,

$$(27)\quad \|w\| \approx \tau_0\sqrt{n}$$

for all $w \in W$. This intuition suggests that, for some $\vartheta_0 > 1$, $\tau_0$ is to be chosen as

$$(28)\quad \tau_0\sqrt{n} = \vartheta_0\,\lambda_1(\hat{\Lambda}) \approx \vartheta_0\,\frac{1}{\sqrt{2\pi e}}\det(\hat{\Lambda})^{\frac{1}{n}}\sqrt{n},$$

since we expect the output of the lattice reduction algorithms to have the length of a multiple of the shortest vector. This translates the search for a suitable $\tau_0$ into the search for an appropriate $\vartheta_0$.

From our computations, we will find out that, in order for (16) or (21) to be fulfilled, we need a lower bound on ϑ 0 . In fact, one can construct special cases of lattices, where our formulas are not valid for small ϑ 0 . For example, if there is just one very small dual vector w 0 , the score function does not depend mainly on the length of e 0 but on the projection of e 0 onto span ( w 0 ) . However, we believe that the score function should still allow us to distinguish between the BDD-sample-case and the random-sample-case, even if ϑ 0 is small. It is left for future research to quantitatively investigate these cases.

We interpret our results, which demand a lower bound on $\vartheta_0$, as follows: on the one hand, an attacker can (or even must) rely on a coarser lattice reduction algorithm, since $\vartheta_0$ need not be so small that only the shortest dual vectors are sampled. On the other hand, $\vartheta_0$ should not become too large, as the complexity estimate (26) increases exponentially in the parameter $\tau_0$ (and thus in $\vartheta_0$).

4.5.2 Intuition for the parameter σ 0

Another important task is to appropriately choose the parameter $\sigma_0$. Here, too, Lemma 4.4 suggests the heuristic that, on average,

$$\|e_0\| \approx \sigma_0\sqrt{n}.$$

Similarly as in (28), this suggests the choice

$$(29)\quad \sigma_0\sqrt{n} = \theta_0\,\lambda_1(\Lambda) \approx \theta_0\,\frac{1}{\sqrt{2\pi e}}\det(\Lambda)^{\frac{1}{n}}\sqrt{n}$$

for some appropriate $\theta_0 > 0$. However, since $\sigma_0$ determines the variance of the error distribution for a BDD-sample, we need to assume $0 < \theta_0 < 1$ here.

4.5.3 Parameter choices for (16)

We first consider condition (16). We abbreviate

$$\kappa_n = \sqrt{1 - \frac{\log(c/2)}{2\pi^2\tau_0^2\sigma_0^2 n} + 2n^{-\left(\frac{1}{2}-\xi\right)} + n^{-(1-\xi)}}.$$

Note that $\kappa_n = 1 + o(1)$ asymptotically in $n$. This transforms condition (16) into

$$\frac{1}{2\pi\tau_0} + \sigma_0\kappa_n \le \frac{1}{\sqrt{2\pi e}}\, N^{-\frac{1}{n}}\det(\Lambda)^{\frac{1}{n}}.$$

With our choices (28) and (29), this becomes (by using that $\det(\hat{\Lambda}) = \det(\Lambda)^{-1}$)

$$\det(\Lambda)^{\frac{1}{n}}\,\frac{\sqrt{e}}{\sqrt{2\pi}\,\vartheta_0} + \theta_0\,\frac{1}{\sqrt{2\pi e}}\det(\Lambda)^{\frac{1}{n}}\kappa_n \le \frac{1}{\sqrt{2\pi e}}\, N^{-\frac{1}{n}}\det(\Lambda)^{\frac{1}{n}},$$

or equivalently,

$$(30)\quad \frac{e}{\vartheta_0} + \theta_0\kappa_n \le N^{-\frac{1}{n}}.$$

We immediately see that if $\vartheta_0 \le e$, inequality (30) is violated (since $\theta_0 > 0$). One admissible choice that fulfills (30) (and thus also (16)) would be

$$(31)\quad \vartheta_0 \ge p\,e\,N^{\frac{1}{n}} \quad \text{and} \quad \theta_0 \le \frac{1}{q\kappa_n}\,N^{-\frac{1}{n}}$$

with $p, q > 1$ such that $\frac{1}{p} + \frac{1}{q} = 1$. For example, we could choose $p = q = 2$. We finally note that, asymptotically in $n$ (as also $N^{\frac{1}{n}} = 1 + o(1)$),

$$(32)\quad \vartheta_0 \ge p\,e\,(1 + o(1)) \quad \text{and} \quad \theta_0 \le \frac{1}{q}(1 + o(1)).$$

4.5.4 Parameter choices for (21)

We proceed similarly as for (16). We abbreviate

$$\varphi_n = -\frac{\log(c/2)}{n} + 2\pi^2\tau_0^2\sigma_0^2\left(2n^{-\frac{1}{2}+\xi} + 2n^{-1+\xi}\right) = -\frac{\log(c/2)}{n} + \frac{\vartheta_0^2\theta_0^2}{2e^2}\left(2n^{-\frac{1}{2}+\xi} + 2n^{-1+\xi}\right),$$

where we used (28), (29), and $\det(\hat{\Lambda}) = \det(\Lambda)^{-1}$ in the last step. Note that $\varphi_n = o(1)$ asymptotically in $n$. Then, again by using (28), (29), and $\det(\hat{\Lambda}) = \det(\Lambda)^{-1}$, condition (21) becomes

$$\det(\Lambda)^{\frac{1}{n}}\,\frac{\sqrt{2\pi e}}{\pi\vartheta_0}\sqrt{\left(1-\frac{1}{\zeta}\right)^{-1}\left(\frac{1}{2}\log(2(1+\zeta)) + \frac{\vartheta_0^2\theta_0^2}{2e^2} + \varphi_n\right)} \le \frac{1}{\sqrt{2\pi e}}\, N^{-\frac{1}{n}}\det(\Lambda)^{\frac{1}{n}},$$

or equivalently,

$$(33)\quad \frac{2e^2}{\vartheta_0^2}\left(1-\frac{1}{\zeta}\right)^{-1}\left(\log(2(1+\zeta)) + 2\varphi_n\right) + 2\theta_0^2\left(1-\frac{1}{\zeta}\right)^{-1} \le N^{-\frac{2}{n}}.$$

Then, one admissible choice that fulfills (33) (and thus also (21)) would be

$$\vartheta_0 \ge N^{\frac{1}{n}}\sqrt{2pe^2\left(1-\frac{1}{\zeta}\right)^{-1}\left(\log(2(1+\zeta)) + 2\varphi_n\right)} \quad \text{and} \quad \theta_0 \le N^{-\frac{1}{n}}\sqrt{\frac{1}{2q}\left(1-\frac{1}{\zeta}\right)},$$

with $p, q > 1$ such that $\frac{1}{p} + \frac{1}{q} = 1$. Asymptotically in $n$, we obtain the conditions

$$(34)\quad \vartheta_0 \ge \sqrt{2pe^2\left(1-\frac{1}{\zeta}\right)^{-1}\log(2(1+\zeta))}\,(1 + o(1)) \quad \text{and} \quad \theta_0 \le \sqrt{\frac{1}{2q}\left(1-\frac{1}{\zeta}\right)}\,(1 + o(1)).$$

A suitable choice of $\zeta$ for the most prominent case $p = q = 2$ would be, for example, $\zeta = 3$. In this case, we obtain that

$$\vartheta_0 \ge \sqrt{6e^2\log(8)}\,(1 + o(1)) \approx 3.532\,e \quad \text{and} \quad \theta_0 \le \sqrt{\tfrac{1}{6}}\,(1 + o(1)) \approx 0.408.$$

This shows that, for this particular choice of $\zeta$, $p$, and $q$, Approach 1 seems to yield better results. However, we decided to keep Approach 2 in this work, as it has potential for improvement through suitable replacements for rough inequalities like the Peter–Paul inequality.
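The asymptotic thresholds of the two approaches can be compared numerically; the sketch below takes $p = q = 2$ and $\zeta = 3$ and reproduces the constants $\approx 3.532e$ and $\approx 0.408$ quoted above:

```python
import math

# Comparison of the asymptotic admissibility conditions for p = q = 2:
# Approach 1 gives (32), i.e. vartheta0 >= p*e and theta0 <= 1/q, while
# Approach 2 with zeta = 3 gives the thresholds in (34).
p = q = 2.0
zeta = 3.0

v1_min, t1_max = p * math.e, 1.0 / q                      # Approach 1
v2_min = math.sqrt(2.0 * p * math.e**2 * math.log(2.0 * (1.0 + zeta))
                   / (1.0 - 1.0 / zeta))
t2_max = math.sqrt((1.0 / (2.0 * q)) * (1.0 - 1.0 / zeta))  # Approach 2

assert abs(v2_min - 3.532 * math.e) < 0.01 * math.e
assert abs(t2_max - 0.408) < 1e-3
assert v1_min < v2_min and t1_max > t2_max   # Approach 1 is less restrictive
```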

4.5.5 Consequences of (26)

Recall that, following our findings in this section,

  • on the one hand, we need to sample at least a certain number of dual vectors (given by the right-hand side of (26)), and

  • on the other hand, following the heuristic (27), Heuristic 2 tacitly implies that the sampled dual vectors belong to $B_R^n$, the $n$-dimensional ball of radius $R = \tau_0\sqrt{n}$.

Hence, if $|B_R^n \cap \hat{\Lambda}|$ is too small, it is very likely that, during the sampling process for our attack, some dual vectors are sampled more than once.

From a mathematical point of view, this implies no further issues as we modeled the output of the lattice reduction algorithm (via Heuristic 2) as a sequence of independent random variables and, as such, repetitions of the realizations of these random variables are admissible; see for instance coin tossing.

However, from a practical point of view, we have to make sure that Heuristic 2 indeed resembles the behavior of lattice reduction algorithms. Therefore, we have to take into account that, in practice, repeated sample outputs are usually discarded. This, of course, has the negative effect that in such scenarios the joint independence of the sample gets lost. This in turn implies that the independence assumption in Heuristic 2 does not resemble reality in situations, where many repetitions happen.

This is why we now investigate informally under which parameter selections repetitions can be avoided. In other words, we want to find out when the condition

$$m_0 \le |B_R^n \cap \hat{\Lambda}|$$

is fulfilled, while at the same time $m_0$ satisfies (26). That is, we need that

$$(35)\quad 18\log(\ell)\, e^{4\pi^2\left(n + 2n^{\frac{1}{2}+\xi} + 2n^{\xi}\right)\sigma_0^2\tau_0^2} \le |B_R^n \cap \hat{\Lambda}|.$$

In order to quantify the right-hand side of (35), note that, according to the Gaussian heuristic,

$$|B_R^n \cap \hat{\Lambda}| \approx \frac{\operatorname{Vol}(B_R^n)}{\det(\hat{\Lambda})}.$$

In particular, using (28) and similar arguments as in the proof of Theorem 4.10, this implies that

$$|B_R^n \cap \hat{\Lambda}| \approx \frac{\operatorname{Vol}(B_R^n)}{\det(\hat{\Lambda})} \le \vartheta_0^n.$$

As before, we now ignore the terms which are (asymptotically in $n$) of lower order, as they can be accounted for by mildly adjusting $n$. Consequently, condition (35) becomes

$$e^{4\pi^2 n\sigma_0^2\tau_0^2} \le \vartheta_0^n.$$

Using (28) and (29), this becomes

$$e^{n\frac{\theta_0^2\vartheta_0^2}{e^2}} \le \vartheta_0^n,$$

which in turn, for large $n$, is equivalent to the condition

$$(36)\quad \frac{\theta_0^2\vartheta_0^2}{e^2} < \log(\vartheta_0).$$

This imposes additional conditions on θ 0 and ϑ 0 .

4.5.6 Provable regime

To sum up, we found the following two sets of parameters where our attack works and is practically justified:

  1. Using Approach 1: the set of all $\theta_0$ and $\vartheta_0$ that fulfill (32) and (36).

  2. Using Approach 2: the set of all $\theta_0$ and $\vartheta_0$ that fulfill (34) and (36).

We can simplify these conditions by plugging (32) into (36) (respectively, (34) into (36)) and ignoring terms that are of lower order in $n$. We obtain the following conditions.
  1. For Approach 1: By (32) we choose $\vartheta_0 = pe$ and $\theta_0 = \frac{1}{q}$, so that (36) becomes

    $$(37)\quad \frac{p^2}{q^2} < \log(p) + 1.$$

    This inequality holds true for many possible choices of $p$ and $q$ (recall that, in our derivation of (32), we needed to restrict to those $p, q > 1$ that satisfy $\frac{1}{p} + \frac{1}{q} = 1$). For example, we can choose $p = q = 2$ or even $p = 2.363$, $q = \frac{2.363}{1.363}$. Note that in the latter case

    $$\theta_0 = \frac{1.363}{2.363} > \frac{1}{2},$$

    which, to the best of our knowledge, is the first time that the provable regime is extended to the case where errors with norm larger than $\frac{1}{2}\lambda_1(\Lambda)$ are allowed [8, p. 11].

  2. For Approach 2: Now, by (34), we choose

    $$\vartheta_0^2 = 2pe^2\left(1-\frac{1}{\zeta}\right)^{-1}\log(2(1+\zeta)) \quad \text{and} \quad \theta_0^2 = \frac{1}{2q}\left(1-\frac{1}{\zeta}\right),$$

    so that (36) becomes

    $$(38)\quad \frac{p}{q}\log(2(1+\zeta)) < \frac{1}{2}\left(\log(p) + \log(2) + 2 - \log\left(1-\frac{1}{\zeta}\right) + \log(\log(2(1+\zeta)))\right).$$

    Recall that, also here, we restrict to $p, q > 1$ such that $\frac{1}{p} + \frac{1}{q} = 1$. Condition (38) is, for example, fulfilled for $p = q = 2$, $\zeta = 3$.
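Both simplified conditions can be checked numerically; the following sketch evaluates (37) for the two choices of $p$ quoted in item 1 and (38) for $p = q = 2$, $\zeta = 3$:

```python
import math

# Numerical check of the simplified provable-regime conditions:
# (37) reads (p/q)^2 < log(p) + 1 for Approach 1, and (38) compares
# (p/q) log(2(1+zeta)) against log(vartheta0) for Approach 2, with
# 1/p + 1/q = 1 in both cases.
def cond37(p):
    q = p / (p - 1.0)                        # enforced by 1/p + 1/q = 1
    return (p / q)**2 < math.log(p) + 1.0

def cond38(p, zeta):
    q = p / (p - 1.0)
    lhs = (p / q) * math.log(2.0 * (1.0 + zeta))
    rhs = 0.5 * (math.log(p) + math.log(2.0) + 2.0
                 - math.log(1.0 - 1.0 / zeta)
                 + math.log(math.log(2.0 * (1.0 + zeta))))
    return lhs < rhs

assert cond37(2.0) and cond37(2.363)         # both examples from the text
assert not cond37(2.4)                       # slightly larger p already fails
assert cond38(2.0, 3.0)                      # p = q = 2, zeta = 3
theta0 = 1.0 - 1.0 / 2.363                   # theta0 = 1/q = 1.363/2.363
assert theta0 > 0.5
```

The choice $p = 2.363$ is seen to be close to extremal: condition (37) already fails for $p = 2.4$.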

5 Dual attack – approach based on the conditional central limit theorem

In Section 4, we provided rigorous results for a quantification of the success of the dual attack. However, such rigorous results are always accompanied by many technicalities. Moreover, in order to establish the rigorous results, conservative assumptions and estimates (such as (16), (21), or (26)) are required, which, in many cases, can be relaxed in practice.

We believe that, in order to fully understand the subject, to find new results, or even to discover interdisciplinary connections, it is necessary to also understand the behavior of the investigated mathematical objects intuitively and more directly. This motivates a simpler and more intuitive approach to the attack, in which statements that can be justified intuitively, but not (yet) rigorously, may be used without worrying about their mathematical provability. This is the goal of this section.

Hence, in this section, we aim to reprove the results from Section 4 via some intuitively justified heuristics and approximations. The strategy is based on a central limit theorem heuristic, which was also used, e.g., in [8]. We emphasize again that we believe that both approaches, the rigorous one from Section 4 and the intuitive one from this section, are essential for a full understanding of the subject.

We proceed as follows. We first provide some preparatory steps. Then, we estimate an important quantity that we need and then we compute the success probabilities both in the BDD-sample-case and in the random-sample-case. The mentioned heuristics are then backed in Section 6 by experiments.

5.1 Some preparation

5.1.1 Conditional central limit theorem

Our computations in this section depend on the following heuristic, which is justified later.

Heuristic 3

Suppose that we are given a sample $t$ (which is either a BDD-sample or a random-sample). We assume that the scaled score function $\frac{1}{m_0} f_W(t)$ can be treated as a realization of the sum $F(t) + \phi(t)\tilde{X}$, where $t$ and $\tilde{X}$ are independent random variables and

  • $F(t) = \mathbb{E}[\cos(2\pi\langle t, w\rangle) \mid t]$ for some $w \in W$ (recall that the family $\{w\}_{w \in W}$ is identically distributed),

  • $\tilde{X}$ is a Gaussian random variable with expectation value 0 and variance 1, and

  • $\phi(t)^2 = \frac{1}{m_0}\operatorname{Var}[\cos(2\pi\langle t, w\rangle) \mid t]$.

5.1.1.1 Representation of F ( t ) and the variance of X ˜

Before we justify Heuristic 3, we note that we can simplify the expressions in this heuristic as follows. Using property (5), we have that

$$(39)\quad F(t) = \mathbb{E}[\cos(2\pi\langle t, w\rangle) \mid t] = \mathbb{E}_{w \sim D_{\hat{\Lambda}, \tau_0\sqrt{2\pi}}}[\cos(2\pi\langle t, w\rangle)].$$

For the variance of $\tilde{X}$, we apply the identity $\cos^2(x) = \frac{1}{2}(1 + \cos(2x))$ (cf. [7, Proof of Lemma 4]) to obtain that

$$\frac{1}{m_0}\operatorname{Var}[\cos(2\pi\langle t, w\rangle) \mid t] = \frac{1}{m_0}\mathbb{E}_w[\cos(2\pi\langle t, w\rangle)^2 \mid t] - \frac{1}{m_0}\mathbb{E}_w[\cos(2\pi\langle t, w\rangle) \mid t]^2 = \frac{1}{2m_0}\left(1 + F(2t) - 2F(t)^2\right).$$

5.1.1.2 Justification of Heuristic 3

First note that by combining (9) and (39), we obtain that

$$\mathbb{E}\left[\frac{1}{m_0} f_W(t) \,\Big|\, t\right] = \mathbb{E}_{w \sim D_{\hat{\Lambda}, \tau_0\sqrt{2\pi}}}[\cos(2\pi\langle t, w\rangle)] = F(t).$$

Therefore, for all x R

P 1 m 0 f W ( t ) x = P 1 m 0 f W ( t ) E 1 m 0 f W ( t ) t + E 1 m 0 f W ( t ) t x = P 1 m 0 ( f W ( t ) E [ f W ( t ) t ] ) + F ( t ) x = E P 1 m 0 ( f W ( t ) E [ f W ( t ) t ] ) + F ( t ) x t .

Then, by using the conditional central limit theorem formulated in the paper by Yuan et al. [17, Theorem 3.1], we have that $\frac{1}{m_0}(f_W(t) - \mathbb{E}[f_W(t) \mid t])$ converges, as $m_0 \to \infty$, in distribution to a normal distribution. We therefore have that, in distribution and for large enough $m_0$,

$$(40)\quad \mathbb{P}\left[\frac{1}{m_0} f_W(t) \le x\right] \approx \mathbb{E}\left[\mathbb{P}[\phi(t)\tilde{X} + F(t) \le x \mid t]\right],$$

where X ˜ is a Gaussian random variable with expectation value 0, variance 1, and

ϕ ( t ) 2 = 1 m 0 Var [ cos ( 2 π t , w ) t ] .

Then, by integrating out the conditioning, we have that

$$(41)\quad \mathbb{E}\left[\mathbb{P}[\phi(t)\tilde{X} + F(t) \le x \mid t]\right] = \mathbb{P}[\phi(t)\tilde{X} + F(t) \le x].$$

Combining (40) and (41) yields that 1 m 0 f W ( t ) can be treated as a realization of the sum F ( t ) + ϕ ( t ) X ˜ .

Another, maybe more elementary, way to understand the step from (40) to (41) is to note that the left-hand side of (41) is nothing but a double integral as in the situation of Fubini’s theorem, i.e.,

(42) E [ P [ ϕ ( t ) X ˜ + F ( t ) x t ] ] = 1 { ϕ ( s ) x ˜ + F ( s ) x } ρ X ˜ ( x ˜ ) d x ˜ ρ t ( s ) d s = 1 { ϕ ( s ) x ˜ + F ( s ) x } ρ X ˜ ( x ˜ ) ρ t ( s ) d ( x ˜ , s ) = P [ ϕ ( t ) X ˜ + F ( t ) x ] ,

where ρ X ˜ and ρ t denote the probability density functions of the random variables X ˜ and t , respectively.

Moreover, from the representation in (42), it is visible that the right-hand side in (41) is equal to the same expression but by taking random variables F ( t ) and X ˜ that are, in addition, assumed to be independent. We can do this as X ˜ does not depend on the random variable t . This justifies Heuristic 3.

Another justification for Heuristic 3 is that the experiments in Section 6 confirm the soundness of this assumption. Moreover, we note that Heuristic 3 is also used in the paper by Ducas and Pulles [8], where it was also justified by experimental evidence.

5.2 Estimating F ( t )

The expectation on the right-hand side of (39) is given by

$$(43)\quad F(t) = \frac{\sum_{w \in \hat{\Lambda}} \cos(2\pi\langle t, w\rangle)\, e^{-\|w\|^2/(2\tau_0^2)}}{\sum_{w \in \hat{\Lambda}} e^{-\|w\|^2/(2\tau_0^2)}}.$$

The denominator and the numerator of the quotient (43) can be expressed as sums over the lattice $\Lambda$ by using the Poisson summation formula. Therefore, we further derive

$$(44)\quad F(t) = \frac{\sum_{z \in \Lambda} e^{-2\pi^2\tau_0^2\|z - t\|^2}}{\sum_{z \in \Lambda} e^{-2\pi^2\tau_0^2\|z\|^2}} = \frac{e^{-2\pi^2\tau_0^2\|t\|^2} + \sum_{0 \ne z \in \Lambda} e^{-2\pi^2\tau_0^2\|z - t\|^2}}{1 + \sum_{0 \ne z \in \Lambda} e^{-2\pi^2\tau_0^2\|z\|^2}}.$$

In typical cases, we can expect the value of $\sum_{0 \ne z \in \Lambda} e^{-2\pi^2\tau_0^2\|z\|^2}$ to be very small. The paper by Banaszczyk [15, Lemma (1.5, (i))] gives a concrete bound for this value.

Lemma 5.1

Let $c = \sqrt{\frac{2\pi}{n}}\,\tau_0\,\lambda_1(\Lambda)$ and assume that $c \ge \frac{1}{\sqrt{2\pi}}$. We set $C = c\sqrt{2\pi e}\, e^{-\pi c^2}$. Then, we have the bound

$$\sum_{0 \ne z \in \Lambda} e^{-2\pi^2\tau_0^2\|z\|^2} \le C^n (1 - C^n)^{-1}.$$

Proof

The shortest non-trivial vector of the stretched lattice $\tau_0\sqrt{2\pi}\,\Lambda$ has length $c\sqrt{n}$, and the lemma from the paper by Banaszczyk [15, Lemma (1.5, (i))] applies.□

In the following, we simplify our computations by assuming the following heuristic.

Heuristic 4

We have that

$$(45)\quad F(t) \approx e^{-2\pi^2\tau_0^2\|t\|^2} + \sum_{0 \ne z \in \Lambda} e^{-2\pi^2\tau_0^2\|z - t\|^2}.$$

That is, $\sum_{0 \ne z \in \Lambda} e^{-2\pi^2\tau_0^2\|z\|^2}$ can be neglected in equation (44) for computing $F(t)$.

5.2.1 Justification of Heuristic 4

Recall the Gaussian heuristic (Heuristic 1), and recall the definition of $\vartheta_0$ in (28), which is determined through the equation

$$(46)\quad \tau_0 = \vartheta_0\,\frac{1}{\sqrt{2\pi e}}\det(\hat{\Lambda})^{\frac{1}{n}}.$$

If $\vartheta_0 \ge e$, we can apply Lemma 5.1 with $c = \frac{\vartheta_0}{e\sqrt{2\pi}} \ge \frac{1}{\sqrt{2\pi}}$. As an example, for $\vartheta_0 \ge 4$ (resp. $\vartheta_0 \ge 5$), we have the concrete bound $C^n(1 - C^n)^{-1}$ with $C = 0.82$ (resp. $C = 0.56$). Note that in (44), we just need the approximation $1 + \sum_{0 \ne z \in \Lambda} e^{-2\pi^2\tau_0^2\|z\|^2} \approx 1$.
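The constants quoted above can be reproduced numerically from the definitions $c = \vartheta_0/(e\sqrt{2\pi})$ and $C = c\sqrt{2\pi e}\,e^{-\pi c^2}$:

```python
import math

# Reproducing the constants C = 0.82 (vartheta0 = 4) and C = 0.56
# (vartheta0 = 5) from Lemma 5.1 with c = vartheta0 / (e * sqrt(2 pi)).
def banaszczyk_C(vartheta0):
    c = vartheta0 / (math.e * math.sqrt(2.0 * math.pi))
    assert c >= 1.0 / math.sqrt(2.0 * math.pi)   # hypothesis of Lemma 5.1
    return c * math.sqrt(2.0 * math.pi * math.e) * math.exp(-math.pi * c**2)

assert round(banaszczyk_C(4.0), 2) == 0.82
assert round(banaszczyk_C(5.0), 2) == 0.56
```

Since $C$ is decreasing in $\vartheta_0$ on this range, any $\vartheta_0 \ge 4$ (resp. $\ge 5$) yields at most these values.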

We continue the estimation of F ( t ) by splitting into the two cases (the BDD-sample-case and the random-sample-case).

5.2.2 Estimating F ( t ) in the BDD-sample-case

We now rely on the paper by Banaszczyk [15, Lemma (1.5, (ii))] for a simplification of the numerator of F ( t ) .

Lemma 5.2

Let $c = \sqrt{\frac{2\pi}{n}}\,\tau_0\,(\lambda_1(\Lambda) - \|t\|)$ and assume that $c \ge \frac{1}{\sqrt{2\pi}}$. We set $C = c\sqrt{2\pi e}\, e^{-\pi c^2}$. Then, we have the bound

$$\sum_{0 \ne z \in \Lambda} e^{-2\pi^2\tau_0^2\|z - t\|^2} \le 2C^n \sum_{z \in \Lambda} e^{-2\pi^2\tau_0^2\|z\|^2}.$$

Proof

For every $z \in \Lambda$, $z \ne 0$, we certainly have $\|z - t\| \ge \lambda_1(\Lambda) - \|t\|$. We can apply the paper by Banaszczyk [15, Lemma (1.5, (ii))] to the stretched lattice $\tau_0\sqrt{2\pi}\,\Lambda$.□

This lemma leads us to the following heuristic.

Heuristic 5

Suppose that the BDD-sample $t = v + e_0$ is such that $\|e_0\|$ is much smaller than $\lambda_1(\Lambda)$, the length of the shortest vector in $\Lambda$. Then, approximately, for large enough $n$,

$$F(t) \approx e^{-2\pi^2\tau_0^2\|e_0\|^2}.$$

5.2.2.1 Justification of Heuristic 5

From Lemma 4.5, we already know that $F(t) \ge e^{-2\pi^2\tau_0^2\|e_0\|^2}$. It remains to show the other bound. Recall that $F(t) = F(v + e_0) = F(e_0)$ for any $v \in \Lambda$ (by the same arguments as in the proof of Lemma 4.5). Then, by using (45),

$$(47)\quad F(t) = F(e_0) \approx e^{-2\pi^2\tau_0^2\|e_0\|^2} + \sum_{0 \ne z \in \Lambda} e^{-2\pi^2\tau_0^2\|z - e_0\|^2}.$$

We now show, by applying Lemma 5.2, that the second term on the right-hand side of (47) is negligible with respect to the first term. Let $c = \sqrt{\frac{2\pi}{n}}\,\tau_0(\lambda_1(\Lambda) - \|e_0\|)$. Then, by using the parameters $\theta_0$ and $\vartheta_0$ from Section 4.5 and the same arguments as there, we obtain that

$$c = \frac{\vartheta_0}{e\sqrt{2\pi}}\cdot\frac{\lambda_1(\Lambda) - \|e_0\|}{\lambda_1(\Lambda)} \approx \frac{\vartheta_0}{e\sqrt{2\pi}}\cdot\frac{\lambda_1(\Lambda) - \theta_0\lambda_1(\Lambda)}{\lambda_1(\Lambda)} = \frac{\vartheta_0}{e\sqrt{2\pi}}(1 - \theta_0).$$

For the corresponding parameter $C$ from Lemma 5.2, we obtain that

$$C = c\sqrt{2\pi e}\, e^{-\pi c^2} \approx \frac{\vartheta_0(1-\theta_0)}{\sqrt{e}}\, e^{-\vartheta_0^2(1-\theta_0)^2/(2e^2)}.$$

Assuming that $c \ge \frac{1}{\sqrt{2\pi}}$, we conclude (via Lemma 5.2) that the second term on the right-hand side of (47) is bounded from above by

$$(48)\quad 2\,\vartheta_0^n(1-\theta_0)^n e^{-\frac{n}{2}}\, e^{-n\vartheta_0^2(1-\theta_0)^2/(2e^2)} \sum_{z \in \Lambda} e^{-2\pi^2\tau_0^2\|z\|^2} \approx 2\,\vartheta_0^n(1-\theta_0)^n e^{-\frac{n}{2}}\, e^{-n\vartheta_0^2(1-\theta_0)^2/(2e^2)},$$

where we used in the last step that, according to Heuristic 4, the term $\sum_{0 \ne z \in \Lambda} e^{-2\pi^2\tau_0^2\|z\|^2}$ can be neglected. In particular, we implicitly assumed here that $\vartheta_0 \ge e$ in order to apply Heuristic 4.

We need to compare bound (48) to the first term on the right-hand side of (47). Using similar arguments as the ones that led to (48), we can show for the first term on the right-hand side of (47) that

$$e^{-2\pi^2\tau_0^2\|e_0\|^2} \approx e^{-n\vartheta_0^2\theta_0^2/(2e^2)}.$$

To sum up, we need the following conditions to hold true in order to justify Heuristic 5:

$$(49)\quad 2\,\vartheta_0^n(1-\theta_0)^n e^{-\frac{n}{2}}\, e^{-n\vartheta_0^2(1-\theta_0)^2/(2e^2)} \ll e^{-n\vartheta_0^2\theta_0^2/(2e^2)} \quad \text{and} \quad \frac{\vartheta_0}{e\sqrt{2\pi}}(1-\theta_0) \ge \frac{1}{\sqrt{2\pi}}.$$

Note that the second condition in (49) is equivalent to

$$(50)\quad \vartheta_0 \ge \frac{e}{1-\theta_0},$$

which implicitly contains the condition that $\vartheta_0 \ge e$. As an example, for $\vartheta_0 = 4$ (resp. $\vartheta_0 = 5$) and $\theta_0 \le \frac{1}{6}$ (resp. $\theta_0 = \frac{1}{4}$), condition (49) is fulfilled for $n$ large enough. This justifies Heuristic 5.

We conclude that $Z(e_0) \coloneqq e^{-2\pi^2\tau_0^2\|e_0\|^2}$ is a valid approximation of $F(e_0)$. The density function of $Z$ can be explicitly computed as

$t \mapsto c_0\,(-\ln(t))^{\frac{n}{2}-1}\, t^{\frac{1}{2\gamma_0}-1},$

with a certain real number $c_0$ and $t\in[0,1]$, and where

(51) $\gamma_0 = 2\pi^2\sigma_0^2\tau_0^2.$

If $\gamma_0$ is small, then the density looks roughly like a Gaussian function. If $\gamma_0 > \frac{1}{2}$, the exponent of $t$ is negative and the density looks completely different.

Furthermore, we can compute the expectation value and the variance of $Z(e_0)$. Similar computations as in Appendix A yield that its expectation is given by $(1+2\gamma_0)^{-\frac{n}{2}}$ and its variance by

$(1+4\gamma_0)^{-\frac{n}{2}} - (1+2\gamma_0)^{-n}.$
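Since $Z(e_0) = e^{-\gamma_0\|e_0\|^2/\sigma_0^2}$ with $\|e_0\|^2/\sigma_0^2$ being $\chi^2$-distributed with $n$ degrees of freedom, these moment formulas can be confirmed by simulation. The following Monte Carlo sketch (our own check, with illustrative parameters $n=20$, $\gamma_0=0.05$) compares the empirical mean and variance of $Z$ with the closed forms above:

```python
import math
import random

random.seed(1)
n, gamma0, N = 20, 0.05, 50_000

# Z = exp(-gamma0 * Y), where Y is chi-square distributed with n degrees of freedom.
zs = []
for _ in range(N):
    y = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))
    zs.append(math.exp(-gamma0 * y))

mean = sum(zs) / N
var = sum((z - mean) ** 2 for z in zs) / N

mean_pred = (1 + 2 * gamma0) ** (-n / 2)                             # expectation of Z
var_pred = (1 + 4 * gamma0) ** (-n / 2) - (1 + 2 * gamma0) ** (-n)   # variance of Z
```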

5.2.3 Estimating $F(t)$ in the random-sample case

In the random-sample case, it is difficult to find a closed formula for $F(t)$. Therefore, we analyze its properties as a random variable. In particular, we now estimate its expectation and its variance. Note that here we compute the expectations with respect to $t$.

Lemma 5.3

Let t be a random-sample. Then,

$E[F(t)] = \frac{1}{(2\pi\tau_0^2)^{n/2}\det(\Lambda)\sum_{z\in\Lambda} e^{-2\pi^2\tau_0^2\|z\|^2}}$

and

$E[F(t)^2] = \frac{\sum_{z\in\Lambda} e^{-\pi^2\tau_0^2\|z\|^2}}{2^n(\pi\tau_0^2)^{n/2}\det(\Lambda)\left(\sum_{z\in\Lambda} e^{-2\pi^2\tau_0^2\|z\|^2}\right)^2}.$

Proof

Since $\langle t,w\rangle \bmod 1$ is uniformly distributed on $[-\frac12,\frac12]$ for fixed $w\neq 0$ (cf. Section 3.2), we can compute (via elementary properties of the conditional expectation) the expectation value of $F(t)$ as

$E[F(t)] = E[E[\cos(2\pi\langle t,w\rangle)\mid t]] = E[\cos(2\pi\langle t,w\rangle)] = E[E[\cos(2\pi\langle t,w\rangle)\mid w]]$
$= E[E[\cos(2\pi\langle t,w\rangle)\mid w]\,1_{w\neq 0}] + E[E[\cos(2\pi\langle t,w\rangle)\mid w]\,1_{w=0}]$
$= E\left[E_{u\leftarrow_\$[-\frac12,\frac12]}[\cos(2\pi u)]\,1_{w\neq 0}\right] + P[w=0]$

$= E_{u\leftarrow_\$[-\frac12,\frac12]}[\cos(2\pi u)]\,P[w\neq 0] + \left(\sum_{w\in\hat\Lambda} e^{-\|w\|^2/(2\tau_0^2)}\right)^{-1} = \left(\sum_{w\in\hat\Lambda} e^{-\|w\|^2/(2\tau_0^2)}\right)^{-1} = \frac{1}{(2\pi\tau_0^2)^{n/2}\det(\Lambda)\sum_{z\in\Lambda} e^{-2\pi^2\tau_0^2\|z\|^2}},$

where we used that $E_{u\leftarrow_\$[-\frac12,\frac12]}[\cos(2\pi u)] = 0$ and, in the last step, the Poisson summation formula.

Note that

$\sum_{w,w'\in\hat\Lambda} \cos(2\pi\langle t,w\rangle)\cos(2\pi\langle t,w'\rangle)\, e^{-\|w\|^2/(2\tau_0^2)}\, e^{-\|w'\|^2/(2\tau_0^2)} = \frac12 \sum_{w,w'\in\hat\Lambda} \left[\cos(2\pi\langle t,w+w'\rangle) + \cos(2\pi\langle t,w-w'\rangle)\right] e^{-\|w\|^2/(2\tau_0^2)}\, e^{-\|w'\|^2/(2\tau_0^2)}.$

This property allows us to compute the second moment of $F(t)$ as

$\frac{\sum_{w\in\hat\Lambda} e^{-2\|w\|^2/(2\tau_0^2)}}{\left(\sum_{w\in\hat\Lambda} e^{-\|w\|^2/(2\tau_0^2)}\right)^2}.$

Again, we use the Poisson summation formula for the numerator and the denominator of this quotient, which gives the claim:

$E[F(t)^2] = \frac{(\pi\tau_0^2)^{n/2}\det(\Lambda)\sum_{z\in\Lambda} e^{-\pi^2\tau_0^2\|z\|^2}}{\left((2\pi\tau_0^2)^{n/2}\det(\Lambda)\sum_{z\in\Lambda} e^{-2\pi^2\tau_0^2\|z\|^2}\right)^2} = \frac{\sum_{z\in\Lambda} e^{-\pi^2\tau_0^2\|z\|^2}}{2^n(\pi\tau_0^2)^{n/2}\det(\Lambda)\left(\sum_{z\in\Lambda} e^{-2\pi^2\tau_0^2\|z\|^2}\right)^2}.$□

This lemma leads us to the following heuristic.

Heuristic 6

Let $t$ be a random-sample. In typical cases, we can expect that the distribution of $F(t)$ is extremely concentrated around $0$, since both the expectation value and the standard deviation are negligible. Recall the definition of $\vartheta_0$ in (46). Then, approximately,

$E[F(t)] \approx \left(\frac{\vartheta_0}{\sqrt{e}}\right)^{-n} \quad\text{and}\quad E[F(t)^2] \approx \left(\vartheta_0\sqrt{\frac{2}{e}}\right)^{-n}.$

5.2.3.1 Justification of Heuristic 6

We assume, as in Heuristic 4, that we can neglect the terms $z\neq 0$ in the series in Lemma 5.3, which gives

$E[F(t)] \approx \frac{1}{(2\pi\tau_0^2)^{n/2}\det(\Lambda)}$

and

$E[F(t)^2] \approx \frac{1}{2^n(\pi\tau_0^2)^{n/2}\det(\Lambda)}.$

Again as above, we want to assume that the Gaussian Heuristic 1 is valid for both lattices $\Lambda$, $\hat\Lambda$ and that $\tau_0\sqrt{n}$ is a not too small multiple $\vartheta_0$ of the length of the shortest vector in $\hat\Lambda$, as in (46). Then, the expectation value of $F(t)$ is of size

$\frac{1}{(2\pi\tau_0^2)^{n/2}\det(\Lambda)} = \left(\frac{\vartheta_0}{\sqrt{e}}\right)^{-n},$

and the second moment is of size

$\frac{1}{(4\pi\tau_0^2)^{n/2}\det(\Lambda)} = \left(\vartheta_0\sqrt{\frac{2}{e}}\right)^{-n}.$

5.3 Parameter selection for assuming the Heuristics

We want to assume that the previous heuristics are valid, i.e., Heuristics 3, 4, and 5. Recall the definition of $\vartheta_0$ in (46). Moreover, following the reasoning of Section 4.5.5, we will also assume an upper bound for $m_0$. That is, we assume that the size $m_0$ of $W$ is much smaller than the number of all lattice vectors of length roughly $\tau_0\sqrt{n}$. Here, with respect to the conditions derived in Section 4.5.5, we even make the stronger assumption that

(52) $2m_0 \ll \left(\frac{\vartheta_0}{2}\right)^n.$

This stronger condition implies in particular that in the random-sample case (cf. Heuristic 6),

$\left(\vartheta_0\sqrt{\frac{2}{e}}\right)^{-n} \le \left(\frac{\vartheta_0}{2}\right)^{-n} \ll \frac{1}{2m_0}.$

Therefore, in the random-sample case, we can approximate $\frac{1}{m_0} f_W(t)$ by

$F(t) + \phi(t)\tilde X \approx \phi(t)\tilde X \approx \sqrt{\frac{1}{2m_0}}\,\tilde X.$

In the BDD-sample case, we expect (cf. Heuristic 5)

$F(t) \approx e^{-2\pi^2\tau_0^2\|e_0\|^2} \approx e^{-\frac{n\theta_0^2\vartheta_0^2}{2e^2}}.$

As an example, for $\theta_0 = \frac16$ and $\vartheta_0 = 4$ (resp. $\theta_0 = \frac14$ and $\vartheta_0 = 5$), we have $e^{-\theta_0^2\vartheta_0^2/(2e^2)} \approx 0.97$ (resp. $e^{-\theta_0^2\vartheta_0^2/(2e^2)} \approx 0.90$). For $n \ge 100$, we therefore expect that $F(t)$ is very small compared to $1$, so that in Heuristic 3, $\phi(t)^2$ can be approximated by $\frac{1}{2m_0}$.
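The two numerical values can be reproduced directly; the following short check (our own illustration) evaluates $e^{-\theta_0^2\vartheta_0^2/(2e^2)}$ for both parameter sets, together with the resulting size of $F(t)$ at $n=100$:

```python
import math

def per_coordinate_factor(theta0, vartheta0):
    # F(t) ~ exp(-n * theta0^2 * vartheta0^2 / (2 e^2)) = factor**n
    return math.exp(-theta0**2 * vartheta0**2 / (2 * math.e**2))

f1 = per_coordinate_factor(1 / 6, 4)  # expected to be close to 0.97
f2 = per_coordinate_factor(1 / 4, 5)  # expected to be close to 0.90
```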

5.4 Success probabilities

We want to distinguish the cases “random-samples vs BDD-samples” with good probability by checking whether the score is higher or lower than a certain value $\alpha$. If we have good approximations for the distribution of $F(e_0)$, we can compute numerically a condition on $m_0$ based on the heuristics above. In the following, we want to derive a simple formula that gives us a plausible condition on $m_0$. To this end, we assume that the approximations in the previous sections are valid. As argued in Section 5.3, we further approximate $\phi(t)^2$ by $\frac{1}{2m_0}$ in Heuristic 3. We set $X = \sqrt{\frac{1}{2m_0}}\,\tilde X$.

Following the arguments of Section 5.3, if $t$ is chosen uniformly in $\mathbb{R}^n/\Lambda$, we assume that $F(t)$ is extremely small compared to $X$. Therefore, we approximate

(53) $P\left(\frac{1}{m_0}\sum_{w\in W}\cos(2\pi\langle t,w\rangle) \ge \alpha \,\middle|\, \text{case “random-samples”}\right) = P\left(F(t)+X \ge \alpha \mid \text{case “random-samples”}\right) \approx P(X \ge \alpha).$

We therefore choose for simplicity

$\alpha = \mu_0\sqrt{\frac{1}{2m_0}},$

with a certain number $\mu_0 \ge 1$, say $\mu_0 \in \{2,3,4\}$.

On the other hand, in the BDD-sample case, we want that

$P\left(\frac{1}{m_0}\sum_{w\in W}\cos(2\pi\langle e_0,w\rangle) \ge \alpha \,\middle|\, \text{case “BDD-samples”}\right)$

is notably larger than $0.5$ (note that we tacitly used that $\cos(2\pi\langle t,w\rangle) = \cos(2\pi\langle v+e_0,w\rangle) = \cos(2\pi\langle e_0,w\rangle)$, since $v\in\Lambda$ and $w$ is a dual vector, so that $\langle v,w\rangle\in\mathbb{Z}$). We consider

$P\left(\frac{1}{m_0}\sum_{w\in W}\cos(2\pi\langle e_0,w\rangle) \ge \alpha \,\middle|\, \text{case “BDD-samples”}\right) = P\left(F(e_0)+X \ge \alpha \mid \text{case “BDD-samples”}\right) \approx P(Z(e_0)+X \ge \alpha).$

We know the distribution functions of $Z$ and $X$, so that the probability $P(Z+X \ge \alpha)$ can be computed numerically via a two-dimensional integral, since Heuristic 3 gives us independence of $F(t)$ and $X$. However, here we want to derive a rough estimate on $m_0$ that gives us a simple formula. Since the standard deviation of $X$ is equal to $\sqrt{\frac{1}{2m_0}} = \frac{\alpha}{\mu_0}$, we can approximate $P(Z+X \ge \alpha)$ by $P(Z \ge \alpha)$ for moderate $\mu_0$. We further compute

$P(Z+X \ge \alpha) \approx P(Z \ge \alpha) = P\left(e^{-\gamma_0\|e_0\|^2/\sigma_0^2} \ge \alpha\right) = P\left(\frac{\|e_0\|^2}{\sigma_0^2} \le \frac{-\ln(\alpha)}{\gamma_0}\right),$

where $\frac{\|e_0\|^2}{\sigma_0^2}$ is $\chi^2$-distributed with $n$ degrees of freedom. If we choose

$n + \sqrt{2n} \le \frac{-\ln(\alpha)}{\gamma_0}, \quad\text{i.e.,}\quad \alpha \le e^{-\gamma_0(n+\sqrt{2n})},$

we obtain a “good” probability for $P(Z \ge \alpha)$. (For $n \ge 50$, this probability of the $\chi^2$-distribution is very close to $0.84$.) In the end, we derive the condition

(54) $e^{-\gamma_0(n+\sqrt{2n})} \ge \frac{\mu_0}{\sqrt{2m_0}} \iff 2m_0 \ge \mu_0^2\, e^{2\gamma_0(n+\sqrt{2n})} = \mu_0^2\, e^{4\pi^2\sigma_0^2\tau_0^2(n+\sqrt{2n})}.$
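The probability $0.84$ used in this derivation is itself easy to reproduce. The following Monte Carlo sketch (our own check, with $n=50$) estimates $P(\chi^2_n \le n+\sqrt{2n})$, i.e., the probability that a $\chi^2_n$-variable stays at most one standard deviation above its mean:

```python
import math
import random

random.seed(2)
n, N = 50, 20_000
threshold = n + math.sqrt(2 * n)  # mean plus one standard deviation of chi^2_n

hits = sum(
    1
    for _ in range(N)
    if sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) <= threshold
)
prob = hits / N  # expected to be close to 0.84
```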

Remark 5.4

  • Note the resemblance of condition (54) to condition (26) identified in the rigorous analysis. More precisely, the leading-order term on the right-hand side of (54) is given by

    $e^{2\gamma_0(n+\sqrt{2n})} = e^{4\pi^2\sigma_0^2\tau_0^2(n+\sqrt{2n})},$

    while the leading-order term on the right-hand side of (26) is given by

    $e^{4\pi^2\left(n + \frac{\sqrt{2n}}{2} + \xi + \sqrt{2n}\,\xi\right)\sigma_0^2\tau_0^2},$

    which only differs in the lower-order terms in the exponent. These differences are due to the fact that, in our rigorous approach, we needed to apply quite coarse concentration inequalities.

  • The conditions (26) and (54) are also very similar to the usual condition on $m_0$ computed under the independence heuristic. The main difference, ignoring $\mu_0 \ge 1$, is a slightly larger number $m_0$ of vectors due to the new term $n+\sqrt{2n}$ instead of $n$, which results in an additional factor of the form $e^{2\gamma_0\sqrt{2n}}$.

  • Recall the condition formulated in (52), so that we have to consider

    $\left(\frac{\vartheta_0}{2}\right)^n \gg 2m_0 \ge \mu_0^2\, e^{4\pi^2\sigma_0^2\tau_0^2(n+\sqrt{2n})} \approx \mu_0^2\, e^{\frac{2(n+\sqrt{2n})\theta_0^2\vartheta_0^2}{2e^2}}.$

    For typical parameter sets such as $\theta_0=\frac16$, $\vartheta_0=4$ (resp. $\theta_0=\frac14$, $\vartheta_0=5$), this condition can be fulfilled. For instance, in the case $\vartheta_0=4$, the upper bound on $m_0$ is given by $2^{n-1}$, which for typical values of $n$ becomes very large and hence is not restrictive in practice.
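As a numerical illustration of the last item (our own sketch, with the hypothetical choices $n=100$ and $\mu_0=3$), one can check that the window for $m_0$ between the lower bound (54) and the upper bound (52) is indeed non-empty:

```python
import math

n, mu0 = 100, 3
theta0, vartheta0 = 1 / 6, 4
gamma0 = theta0**2 * vartheta0**2 / (2 * math.e**2)  # = 2*pi^2*sigma0^2*tau0^2

# Lower bound on m_0 from condition (54).
m0_min = mu0**2 * math.exp(2 * gamma0 * (n + math.sqrt(2 * n))) / 2
# Upper bound on m_0 from condition (52): for vartheta0 = 4 this is 2**(n-1).
m0_max = (vartheta0 / 2) ** n / 2
```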

6 Experiments

In order to check the soundness of our heuristics, we conducted a few experiments. In our experiments, we chose the lattice $\Lambda$ to be generated by the columns of the $n\times n$ matrix $B$ with

$B = \begin{pmatrix} 1_{n/2} & 0 \\ C & p\cdot 1_{n/2} \end{pmatrix},$

where $p$ is the prime number $2^{16}+1$ and $C$ is an $\frac n2\times\frac n2$ matrix with integer entries chosen uniformly between $0$ and $p-1$. The dual lattice $\hat\Lambda$ is generated by the columns of the matrix

$\frac{1}{p}\begin{pmatrix} p\cdot 1_{n/2} & -C^T \\ 0 & 1_{n/2} \end{pmatrix}.$

In our experiments, we did not consider concrete lattice reduction algorithms for generating the subset $W\subseteq\hat\Lambda$. Instead, we used a setting in which the distribution in $W$ should be close to a Gaussian distribution.

We emphasize that we do not aim at an efficient implementation of our attacks here. Instead, we aim for a simple implementation, which resembles the theoretical setting of our paper, with the goal of practically verifying the soundness of our heuristics.

6.1 Experiments for small n

For small $n$, we chose the distribution in $W$ by the following random process, repeated $m_0$ times:

  1. Choose $z_1$ as a realization of a Gaussian variable with expectation value $0$ and covariance matrix $\tau_1^2\cdot 1_n$.

  2. Compute $w$ as the closest vector in $\hat\Lambda$ to $z_1$.

Note that we had to restrict ourselves to small $n$ in order to be able to compute step 2 in reasonable time. Using the above notation, we further chose

$n = 30,\quad m_0 = 5000,\quad \sigma_0 = 18,\quad \vartheta_0 = 4,\quad \tau_1 = \frac{\vartheta_0}{\sqrt{2\pi e}}\det(\Lambda)^{-1/n} = \frac{\vartheta_0}{\sqrt{2\pi e\, p}} \approx 0.0038.$

We can expect that the average length of the vectors $w$ generated by this process is slightly larger than $\tau_1\sqrt{n}$, and we indeed obtained in our experiments

$\frac{1}{m_0}\sum_{w\in W}\|w\|^2 = \tau_0^2\, n, \quad\text{where } \tau_0 = 0.0039.$

In Figure 1(a), we computed the score function $\frac{1}{m_0}\sum_{w\in W}\cos(2\pi\langle e_0,w\rangle)$ for 800 repetitions in $e_0$, where $e_0$ is the realization of a Gaussian random variable with expectation value $0$ and covariance matrix $\sigma_0^2\cdot 1_n$. For the sake of comparison, we also include in Figure 1(a) a red colored graph, which represents the distribution of the sum $F(e_0)+X$ with two independent random variables $e_0$ and $X$ as in Heuristic 3 (and where $X$ was introduced in Section 5.4).

Figure 1: Experiments for small $n$. (a) Score function, $e_0$ variable and (b) score function, random-sample.

In Figure 1(b), we computed the score function $\frac{1}{m_0}\sum_{w\in W}\cos(2\pi\langle t,w\rangle)$ for 800 repetitions in $t$, where $t$ is chosen as a random-sample. As expected, this distribution is close to a Gaussian distribution with standard deviation $\sqrt{\frac{1}{2m_0}} = 0.01$, which is represented by the red colored graph in Figure 1(b).
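A stripped-down version of this experiment fits in a few lines. Since the exact closest-vector computation is what limits us to small $n$, the sketch below (our own simplification, not the implementation used for Figure 1) replaces the discrete Gaussian on the dual lattice by a continuous Gaussian, as in the comparison set $\hat W$ of Section 6.2, and replaces a uniform sample modulo $\Lambda$ by a uniform point of comparable size; the BDD-sample scores then concentrate near $E[Z] = (1+2\gamma_0)^{-n/2}$, while the random-sample scores concentrate near $0$:

```python
import math
import random

random.seed(3)
n, m0, reps = 30, 2000, 40
sigma0, tau0 = 18.0, 0.0039
p = 2**16 + 1
gamma0 = 2 * math.pi**2 * sigma0**2 * tau0**2  # roughly 0.097

# Continuous-Gaussian stand-in for the discrete Gaussian on the dual lattice.
W = [[random.gauss(0.0, tau0) for _ in range(n)] for _ in range(m0)]

def score(t):
    return sum(
        math.cos(2 * math.pi * sum(ti * wi for ti, wi in zip(t, w))) for w in W
    ) / m0

half = math.sqrt(p) / 2  # coordinate range mimicking a uniform sample mod Lambda
bdd = [score([random.gauss(0.0, sigma0) for _ in range(n)]) for _ in range(reps)]
rnd = [score([random.uniform(-half, half) for _ in range(n)]) for _ in range(reps)]

mean_bdd = sum(bdd) / reps  # near E[Z] = (1 + 2*gamma0)**(-n/2), about 0.07
mean_rnd = sum(rnd) / reps  # near 0; each score has std about sqrt(1/(2*m0))
```

The two score populations separate clearly even in this crude stand-in, which is the qualitative content of Figure 1.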

6.2 Experiments for moderate n

For moderate $n$, we generated a set $\tilde W$ of many vectors in $\hat\Lambda$ whose length is near $\vartheta_0$ times the length of the shortest vector expected by the Gaussian heuristic. Then, we chose the distribution in $W$ by the following ad hoc random process, repeated $m_0$ times:

  1. Choose $z_1$ as a realization of a Gaussian variable with expectation value $0$ and covariance matrix $\tau_0^2\cdot 1_n$.

  2. Compute $w$ as the closest vector from $\tilde W$ to $z_1$.

Let $W$ be the collection of all these output vectors $w$.

We were able to conduct experiments in this way for $n = 70$. Using the above notation, we further chose

$n = 70,\quad \#\tilde W = 100\, m_0,\quad m_0 = 500,\quad \sigma_0 = 10,\quad \vartheta_0 = 4,\quad \tau_0 = \frac{\vartheta_0}{\sqrt{2\pi e}}\det(\Lambda)^{-1/n} = \frac{\vartheta_0}{\sqrt{2\pi e\, p}} \approx 0.0038.$

In our experiments, we observed that the average length of the vectors $w$ generated by this process was very close to $\tau_0\sqrt{n}$.

We computed the score function $\frac{1}{m_0}\sum_{w\in W}\cos(2\pi\langle e_0,w\rangle)$ for 800 repetitions in $e_0$, where $e_0$ is the realization of a Gaussian random variable with expectation value $0$ and covariance matrix $\sigma_0^2\cdot 1_n$. Furthermore, we compared this with the score function $\frac{1}{m_0}\sum_{w\in\hat W}\cos(2\pi\langle e_0,w\rangle)$ for 800 repetitions in $e_0$, where

  • $e_0$ is again the realization of a Gaussian random variable with expectation value $0$ and covariance matrix $\sigma_0^2\cdot 1_n$, and

  • we chose $w\in\hat W$ not in the dual lattice, but as a realization of a continuous Gaussian random variable with expectation value $0$ and covariance matrix $\tau_0^2\cdot 1_n$.

The reason why we also computed this additional comparison is the observation that our theoretical results never use the geometry of the lattice. Consequently, intuition tells us that the results should be similar, and we would like to see whether this intuition is sound.

Figure 2(a) shows both score functions in one figure. The distributions are similar, but we note a slight difference in the probabilities for larger values of $F(e_0)$. As above, we also include in Figure 2(a) a red colored graph, which represents the distribution of the sum $F(e_0)+X$. As expected, the distributions in Figure 2(a) are close to this theoretical distribution.

Figure 2: Experiments for moderate $n$. (a) Score function, $e_0$ variable; yellow corresponds to $W$ and blue to $\hat W$. (b) Score function, random-sample.

In Figure 2(b), we computed the score function $\frac{1}{m_0}\sum_{w\in W}\cos(2\pi\langle t,w\rangle)$ for 800 repetitions in $t$, where $t$ is chosen as a random-sample. As expected, this distribution is close to a Gaussian distribution with standard deviation $\sqrt{\frac{1}{2m_0}} \approx 0.03$, which is represented by the red colored graph in this figure.

7 Discussion

7.1 Comparison with other recent works

As pointed out in Section 1 of this paper, there has been a lot of recent progress and research on the dual attack for BDD or LWE. Most notable with respect to our results are the two interesting and impressive works by Pouly and Shen [9] and by Ducas and Pulles [8]. We now briefly compare their results with ours.

7.1.1 Results from the paper by Pouly and Shen [9]

The first obvious difference between our work and the work of [9] is that they studied LWE, while we studied BDD. However, it is possible to translate the results into the other respective framework by using the arguments from the paper by Pouly and Shen [9, Section 6]. The obvious similarity to our work is that they also assumed that the dual vectors are distributed according to the discrete Gaussian distribution.

Our approach in Section 4, which is based on a direct application of Hoeffding’s inequality, is inspired by the paper by Pouly and Shen [9]. However, we use their technique for the conditional probability measure, while they use it for the original probability measure. This is why they apply this strategy only in their “basic dual attack” where dual vectors are sampled afresh for many different samples (in order to obtain independence, which is required to apply Hoeffding’s inequality), while we use it for the case when there is only one sample and one family of dual vectors (i.e., the usual dual attack).

The main result in the paper by Pouly and Shen [9] is their “modern dual attack”. First, note that via this attack, the authors solve the search version of LWE (where the secret is to be extracted directly) and not the decision version (as in our case). The main difference to our approach is that they do not decide on the secret via a single evaluation of the score function and a check whether it lies above or below a threshold. Instead, they evaluate many score functions and choose the candidate with the maximum score as the secret. We believe that this approach is more precise than ours, since Pouly and Shen [9] need neither to choose a threshold nor to rely on concentration inequalities. However, we did not rigorously translate our results into their setting in order to make a precise comparison. They also relied on geometric properties of the lattice in order to estimate the success probabilities of their attack, which is more involved than the strategy based on Hoeffding's inequality.

7.1.2 Results from the paper by Ducas and Pulles [8]

In the other, independent work by Ducas and Pulles [8], results similar to those of the present paper were obtained. The work by Ducas and Pulles [8] is a follow-up of their previous paper [7] and builds upon the observations made there. Since our work can also be seen as initiated by the observations from the paper by Ducas and Pulles [7], our setting and notation are very similar to those in the paper by Ducas and Pulles [8]. However, a key difference between their work and the results of this paper is the assumption on the output of the dual lattice sieve algorithm. While we assume that the obtained dual vectors are distributed according to a discrete Gaussian distribution on the dual lattice, Ducas and Pulles [8] assume that the dual vectors are uniformly distributed on a ball of a certain radius.

The strategy in the paper by Ducas and Pulles [8] is, as in our Section 5, based on a central limit theorem-type argument. In this way, they derive estimates on the cumulative distribution function of the score function. Moreover, in the justification of their central limit heuristic [8, Heuristic 3], they also implicitly apply conditional independence in their computations [8, Section 3.3] by changing from a spherical error distribution to a radial error distribution.

Their theoretic results are accompanied by impressive experimental results, which underline and substantiate the former.

7.2 Future work

There are many directions for improvement and future research with respect to the results of this paper. In the following, we would like to emphasize some of them.

  • What happens in the asymptotic case, when $m_0$ is extremely large? In (53), we assume that $F(t)$ is extremely small compared to $X$ if $t$ is chosen uniformly in $\mathbb{R}^n/\Lambda$. This is certainly not true when $m_0$ is extremely large. Instead, we consider

    $P\left(\frac{1}{m_0}\sum_{w\in W}\cos(2\pi\langle t,w\rangle) \ge \alpha \,\middle|\, \text{case “random-samples”}\right) = P\left(F(t)+X \ge \alpha \mid \text{case “random-samples”}\right) \approx P\left(F(t) \ge \alpha \mid \text{case “random-samples”}\right).$

    If we assume that the approximations in Section 5 are valid, we have different distributions for $F(t)$ in the two cases. This allows us to distinguish the cases “random-samples vs BDD-samples” with certain fixed probabilities that do not depend on $m_0$.

  • A natural question arises: what are good weights in the formula for the total score, taking into account the approach above? We therefore consider weights $\beta_w$ in

    $f_\beta(t) = \sum_{w\in W}\beta_w\cos(2\pi\langle t,w\rangle).$

    We can adapt the formulas from Section 5 if we restrict ourselves to choosing

    $\beta_w = e^{-\gamma_0\|w\|^2}.$

    In this way, one can find an optimal $\gamma_0$ that gives a less restrictive condition on $m_0$ compared to (54). Also, the results from Section 4 should be easily transferable to this new score function. This is left for future research.

  • It is very interesting to see whether Heuristic 2 can be modified or generalized in order to be more suitable for modern lattice sieve algorithms. For instance, as we pointed out after Heuristic 2, we believe that the most realistic assumption, which resembles the behavior of the output of the lattice reduction algorithms most closely, would be the case where the dual vectors are uniformly distributed on $(B_r^n\cap\hat\Lambda)\setminus B_{r'}^n$ for some $r > r' > 0$.

  • As outlined in Section 4.5, our theoretic results demand a lower bound on ϑ 0 . It would be interesting to also find quantitative results on the success probabilities and the cost estimates for our attacks in the cases, where ϑ 0 is small.

  • The main purpose of our experiments from Section 6 is the verification of the soundness of Heuristic 3, which in turn (via the computations from Section 5) verifies the soundness of the results from Section 4. In that sense, the main focus of this work is the theoretical part. Hence, there are many possible ways to further conduct the experimental aspects related to this paper. In particular, it would be interesting

    1. to see how well Heuristic 2 resembles the behavior of concrete lattice reduction algorithms, or

    2. to check whether the choice where the dual vectors are uniformly distributed on $(B_r^n\cap\hat\Lambda)\setminus B_{r'}^n$ is indeed the most suitable one.
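Returning to the re-weighted score $f_\beta$ from the second item above, a minimal sketch could look as follows (our own illustration; the decaying convention $\beta_w = e^{-\gamma\|w\|^2}$ with a free tuning parameter $\gamma$ is an assumption, and $\gamma = 0$ recovers the unweighted score):

```python
import math

def weighted_score(t, W, gamma):
    # f_beta(t) = sum_{w in W} beta_w * cos(2*pi*<t, w>)
    # with the (assumed) decaying weights beta_w = exp(-gamma * ||w||^2).
    total = 0.0
    for w in W:
        beta = math.exp(-gamma * sum(x * x for x in w))
        total += beta * math.cos(2 * math.pi * sum(a * b for a, b in zip(t, w)))
    return total
```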

Acknowledgment

We gratefully acknowledge helpful discussions with Léo Ducas, Stephan Ehlen, and Ludo N. Pulles. We moreover would like to express our gratitude toward the anonymous referees for numerous helpful suggestions and remarks.

  1. Funding information: Authors state no funding involved.

  2. Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  3. Conflict of interest: The authors state no conflict of interest.

Appendix A Some approximations of the covariances in the BDD-sample-case

In this section, we provide some heuristic estimates, which underline that the covariances computed in Section 3.3 are indeed nonzero.

Our goal is to intuitively estimate the sum over all single covariances

$\sum_{w,\tilde w\in W,\, w\neq\tilde w} \operatorname{Cov}(f_w(t), f_{\tilde w}(t)).$

Instead of computing this sum via (4) directly, we aim to find plausible approximations that give simple formulas. Recall that $m_0 = \#W$, and note that

(A1) $\frac{1}{m_0^2}\sum_{w,\tilde w\in W,\, w\neq\tilde w} \operatorname{Cov}(f_w(t), f_{\tilde w}(t))$,

can be interpreted as a computation of a mean value (by using that $m_0^2 - m_0 \approx m_0^2$ is the approximate number of summands in the sum). In order to find good approximations of this mean value, we suppose (similarly as in Sections 4 and 5) that $w,\tilde w\in W$, $w\neq\tilde w$, are random variables. In the simplest approximation, these random variables are Gaussian distributed with covariance matrix $\tau_0^2\cdot 1_n$. Note that here we make a more simplified assumption than in Heuristic 2, since our only goal here is to intuitively comprehend why the covariances do not vanish.

By using again the techniques of conditional independence and other simple elementary techniques from probability theory, we can assume that (A1) is a sum of uncorrelated, square-integrable random variables so that we are in the setting of (a generalized form of) the law of large numbers. We omit the details here as they are similar to the detailed computations in Sections 4 and 5.

Consequently, we can expect that the expression (A1) is close to the expectation value of $\operatorname{Cov}(f_w(t), f_{\tilde w}(t))$. Recall from Proposition 3.3 that

$\operatorname{Cov}(f_w(t), f_{\tilde w}(t)) = \frac12\Delta_a + \frac12\Delta_b - \Delta_c\Delta_d,$

where

$\Delta_a = e^{-2\pi^2\|w+\tilde w\|^2\sigma_0^2}, \quad \Delta_b = e^{-2\pi^2\|w-\tilde w\|^2\sigma_0^2}, \quad \Delta_c = e^{-2\pi^2\|w\|^2\sigma_0^2}, \quad \Delta_d = e^{-2\pi^2\|\tilde w\|^2\sigma_0^2}.$

Hence, the expectation value of $\operatorname{Cov}(f_w(t), f_{\tilde w}(t))$ (by considering $w$ and $\tilde w$ as the random variables) is a sum of terms of the form

$E(e^{\gamma Y}),$

where $Y$ is $\chi^2$-distributed. For $\gamma < 0.5$, this is identical to

$E(e^{\gamma Y}) = (1-2\gamma)^{-\frac k2},$

where $k$ denotes the number of degrees of freedom of $Y$. $\Delta_a$ (resp. $\Delta_b$) depends on

$\|w+\tilde w\|^2, \quad\text{resp.}\quad \|w-\tilde w\|^2,$

which has $n$ degrees of freedom, whereas $\Delta_c\Delta_d$ depends on

$\|w\|^2 + \|\tilde w\|^2,$

which has 2 n degrees of freedom. In the end, we derive as an approximation of (A1)

$(1+4\gamma_0)^{-\frac n2} - (1+2\gamma_0)^{-n},$

where γ 0 was defined in (51). For the total variance, we therefore expect as approximation

$V(f_W(t)) = \sum_{w\in W} V(f_w(t)) + \sum_{w,\tilde w\in W,\, w\neq\tilde w} \operatorname{Cov}(f_w(t), f_{\tilde w}(t)) \approx \frac{m_0}{2} + m_0^2\left[(1+4\gamma_0)^{-\frac n2} - (1+2\gamma_0)^{-n}\right].$

This yields heuristically the bias of the variance, which was observed experimentally in [7].
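This heuristic value of the mean covariance is easy to probe numerically. The following Monte Carlo sketch (our own check, with illustrative parameters $n=10$ and $\gamma_0=0.05$) draws Gaussian pairs $w,\tilde w$ and compares the empirical mean of $\frac12\Delta_a+\frac12\Delta_b-\Delta_c\Delta_d$ with the predicted value $(1+4\gamma_0)^{-n/2}-(1+2\gamma_0)^{-n}$:

```python
import math
import random

random.seed(4)
n, N = 10, 100_000
sigma0, gamma0 = 1.0, 0.05
tau0 = math.sqrt(gamma0 / (2 * math.pi**2 * sigma0**2))  # so gamma0 = 2*pi^2*sigma0^2*tau0^2

def sq_norm(v):
    return sum(x * x for x in v)

acc = 0.0
for _ in range(N):
    w = [random.gauss(0.0, tau0) for _ in range(n)]
    wt = [random.gauss(0.0, tau0) for _ in range(n)]
    d_a = math.exp(-2 * math.pi**2 * sq_norm([a + b for a, b in zip(w, wt)]) * sigma0**2)
    d_b = math.exp(-2 * math.pi**2 * sq_norm([a - b for a, b in zip(w, wt)]) * sigma0**2)
    d_c = math.exp(-2 * math.pi**2 * sq_norm(w) * sigma0**2)
    d_d = math.exp(-2 * math.pi**2 * sq_norm(wt) * sigma0**2)
    acc += 0.5 * d_a + 0.5 * d_b - d_c * d_d

estimate = acc / N
predicted = (1 + 4 * gamma0) ** (-n / 2) - (1 + 2 * gamma0) ** (-n)
```

In particular, the predicted mean covariance is strictly positive, which is the bias referred to above.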

B Bound on the volume of a ball

Lemma B.1

For all R > 0 , we have that

$\operatorname{Vol}(B_R^n) \le \left(\sqrt{\frac{2\pi e}{n}}\right)^n R^n.$

Proof

It is known that

(A2) $\operatorname{Vol}(B_R^n) = \frac{(\sqrt{\pi}\, R)^n}{\Gamma\left(1+\frac n2\right)},$

where Γ is the gamma function. Recall the estimate on the gamma function given by

(A3) $\Gamma\left(1+\frac n2\right) > \left(\frac{n}{2e}\right)^{\frac n2}\cdot\sqrt{\frac n2+\frac12}\; e^{\frac{1}{6\cdot\frac n2+\frac38}}\,\sqrt{2}\, e^{-\frac49},$

which is proven, e.g., in [18, Corollary 1.2]. Note that, for all $n$,

$\sqrt{\frac n2+\frac12}\; e^{\frac{1}{6\cdot\frac n2+\frac38}}\,\sqrt{2}\, e^{-\frac49} \ge 1,$

which, together with (A2) and (A3), concludes the proof.□
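Lemma B.1 is easy to verify numerically against the exact formula (A2); the following sketch (our own check) compares both sides for a range of dimensions and radii:

```python
import math

def ball_volume(n, radius):
    # Exact formula (A2): Vol(B_R^n) = (sqrt(pi) * R)^n / Gamma(1 + n/2).
    return (math.sqrt(math.pi) * radius) ** n / math.gamma(1 + n / 2)

def lemma_b1_bound(n, radius):
    # Lemma B.1: Vol(B_R^n) <= (sqrt(2*pi*e/n) * R)^n.
    return (math.sqrt(2 * math.pi * math.e / n) * radius) ** n

checks = [(n, r) for n in range(1, 60) for r in (0.5, 1.0, 3.0)]
```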

References

[1] Arora S, Ge R. New algorithms for learning in presence of errors. In: International Colloquium on Automata, Languages, and Programming. Berlin, Heidelberg: Springer; 2011. p. 403–15. doi:10.1007/978-3-642-22006-7_34.

[2] Kirchner P, Fouque PA. An improved BKW algorithm for LWE with applications to cryptography and lattices. In: Advances in Cryptology – CRYPTO 2015: 35th Annual Cryptology Conference, Santa Barbara, CA, USA, August 16–20, 2015, Proceedings, Part I. Berlin, Heidelberg: Springer; 2015. p. 43–62. doi:10.1007/978-3-662-47989-6_3.

[3] Gama N, Nguyen PQ. Predicting lattice reduction. In: Advances in Cryptology – EUROCRYPT 2008: 27th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Istanbul, Turkey, April 13–17, 2008, Proceedings. Berlin, Heidelberg: Springer; 2008. p. 31–51.

[4] Aharonov D, Regev O. Lattice problems in NP ∩ coNP. JACM. 2005;52(5):749–65. doi:10.1145/1089023.1089025.

[5] Laarhoven T, Walter M. Dual lattice attacks for closest vector problems (with preprocessing). In: Cryptographers' Track at the RSA Conference. Cham: Springer; 2021. p. 478–502. doi:10.1007/978-3-030-75539-3_20.

[6] Guo Q, Johansson T. Faster dual lattice attacks for solving LWE with applications to CRYSTALS. In: Advances in Cryptology – ASIACRYPT 2021: 27th International Conference on the Theory and Application of Cryptology and Information Security, Singapore, December 6–10, 2021, Proceedings, Part IV. Cham: Springer; 2021. p. 33–62. doi:10.1007/978-3-030-92068-5_2.

[7] Ducas L, Pulles LN. Does the dual-sieve attack on learning with errors even work? In: Annual International Cryptology Conference. Cham: Springer; 2023. p. 37–69. doi:10.1007/978-3-031-38548-3_2.

[8] Ducas L, Pulles LN. Accurate score prediction for dual-sieve attacks. Cryptology ePrint Archive, Paper 2023/1850. https://eprint.iacr.org/2023/1850.

[9] Pouly A, Shen Y. Provable dual attacks on learning with errors. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer; 2024. p. 256–85. doi:10.1007/978-3-031-58754-2_10.

[10] Regev O. New lattice-based cryptographic constructions. JACM. 2004;51(6):899–942. doi:10.1145/1039488.1039490.

[11] Peikert C. A decade of lattice cryptography. Foundations and Trends in Theoretical Computer Science. 2016;10(4):283–424. doi:10.1561/0400000074.

[12] Billingsley P. Probability and Measure. New York: John Wiley & Sons; 2012.

[13] Laurent B, Massart P. Adaptive estimation of a quadratic functional by model selection. Ann Statist. 2000;28(5):1302–38. doi:10.1214/aos/1015957395.

[14] Stephens-Davidowitz N. On the Gaussian measure over lattices. New York University; 2017.

[15] Banaszczyk W. New bounds in some transference theorems in the geometry of numbers. Mathematische Annalen. 1993;296:625–35. doi:10.1007/BF01445125.

[16] Chen Y, Hu Z, Liu Q, Luo H, Tu Y. LWE with quantum amplitudes: algorithm, hardness, and oblivious sampling. Cryptology ePrint Archive, Paper 2023/1498; 2023. https://eprint.iacr.org/2023/1498.

[17] Yuan DM, Wei LR, Lei L. Conditional central limit theorems for a sequence of conditional independent random variables. J Korean Math Soc. 2014;51(1):1–15. doi:10.4134/JKMS.2014.51.1.001.

[18] Batir N. Inequalities for the gamma function. Archiv der Mathematik. 2008;91(6):554–63. doi:10.1007/s00013-008-2856-9.

Received: 2024-04-11
Revised: 2025-03-26
Accepted: 2025-05-05
Published Online: 2025-07-03

© 2025 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
