Revisiting linearly extended discrete functions

Claude Gravel; Daniel Panario

doi:10.1515/jmc-2024-0010

Article Open Access

Revisiting linearly extended discrete functions

Claude Gravel and Daniel Panario

Published/Copyright: December 10, 2024

Published by

Become an author with De Gruyter Brill

Submit Manuscript Author Information

From the journal Journal of Mathematical Cryptology Volume 18 Issue 1

Abstract

The authors introduced a new family of cryptographic schemes in a previous research article, which includes many practical encryption schemes, such as the Feistel family. Given a finite field of order q , any n > m ≥ 0 , the authors described a new way to extend discrete functions with domain size q m and range size q n − m to a permutation over q n elements using theory from linear error correcting codes. The authors previously showed that the knowledge about the differentials and correlations of the resulting permutation reduces solely to those of the extended discrete function. We show how the perfect secrecy of extended nonlinear functions transfers to the family of bijective linear extensions. We investigate how the concrete security of the family of nonlinear functions relates to the family of permutations obtained by such a type of linear extension. We also explore how the interplay between the entropy and the total variation distance (near-perfect secrecy with unbounded adversary) affects the mixing rate (number of iterations of the feedback linear extensions) with respect to the uniform distribution of the permutations over q n elements. We give a new proof that a distribution close to the uniform distribution has a large entropy.

Keywords: pseudo-random objects; finite field; linear algebra; mixing rate; entropy; total variation distance

MSC 2010: 94A60; 37A25

1 Introduction

In this article, we analyze the construction from the study by Gravel and Panario [1] in further detail and prove new results. In this section, we recall the construction; readers can consult Section 2 for more background.

Let q be a power of a prime and let n > m ≥ 0 be integers. Consider a finite field F q and the vector spaces F q n , F q m , and F q n − m denoted by V , V 1 , and V 2 , respectively. Let T : V → V , A : V → V 1 , and B : V → V 2 be full-rank linear transformations such that ker A ⊊ colsp B t . As usual, B t denotes the transpose of B , ker A denotes the kernel of A , and colsp B t denotes the column space of B t . Hence, we have A B t = 0 , where the latter is the null linear transformation. For example, if the matrices A and B are the generator and parity-check matrices of a linear error correction code over F q n , respectively, then we have A B t = 0 . Let K be a finite set in one-to-one correspondence with a subset of the functions from V 1 to V 2 . For a function f : K × V 1 → V 2 , we may write f ( s , x ) ∈ F q n − m for 0 ≤ s < ∣ K ∣ and x ∈ F q m or write f s ( x ) . We write f i ( s , x ) for the coordinates of f ( s , x ) , that is, f i ( s , x ) for 0 ≤ i ≤ n − m − 1 such that f = ( f 0 , … , f n − m − 1 ) . The case of interest is when f is not linear. The mathematical objects of interest here are permutations over V and, more precisely, a particular set of keyed permutations with keys from K inspired by Feistel networks. For all f : K × V 1 → V 2 , the function F : K × V → V defined by

(1) ( s , x ) ↦ T ( x + B t f ( s , A x ) )

is a permutation over V as shown in [1]. Again, we may refer to F ( s , x ) ∈ F q n for 0 ≤ s < ∣ K ∣ and x ∈ F q n or write F s ( x ) . In some intuitive sense, the function f is “squeezed” between B t and A . The map F is a permutation extending the smaller function f linearly. The message space ℳ of F is F q n . The keyspace of F is the set K mentioned earlier from which f s is chosen uniformly at random. In the following, we write F = ( F 0 , … , F i , … , F n − 1 ) , where F ( s , x ) = T ( x + B t f ( s , A x ) ) for all ( s , x ) ∈ K × ℳ . Section 2 gives some background, recalls some notions, or refers the readers to textbooks that might be useful to be acquainted with. We briefly state the results of this article.

In Section 3, we prove the perfect secrecy of our family of random permutations, which is Theorem 1.

Theorem 1

Let A , B , and T be full rank linear transformations as before. Let X be a random variable with some distribution over the message space ℳ , and let Z = A X . Let f : K × F q m → F q n − m be a function chosen uniformly at random with respect to K . For 0 ≤ i ≤ n − m − 1 , let Y i be a random variable such that Y i = f i ( S , Z ) where S is uniformly distributed over K . For 0 ≤ i ≤ n − 1 , and let X i ′ be a random variable such that X i ′ = F i ( S , X ) . If, for 0 ≤ i ≤ n − m − 1 , the ith coordinate f i satisfies

P { Y i = f i ( S , z 1 ) ∣ Z = z 1 } = P { Y i = f i ( S , z 2 ) ∣ Z = z 2 } for a l l z 1 a n d z 2 ,

then, for 0 ≤ i ≤ n − 1 , the ith coordinate F i satisfies

P { X i ′ = F i ( S , x 1 ) ∣ X = x 1 } = P { X i ′ = F i ( S , x 2 ) ∣ X = x 2 } for a l l x 1 a n d x 2 .

In Section 4, we prove Theorem 2, which, states that if at least one coordinate of an extension is not secure, then the extension cannot be secure.

Theorem 2

Let τ = ⌈ log q ( n − m ) ⌉ , and let { g s } s ∈ K be a ( t , ε ) -secure family of functions with domain F q m + τ and codomain F q . Let f : K × F q m → F q n − m be defined as follows:

f ( s , z ) = ( f 0 ( s , z ) , … , f n − m − 1 ( s , z ) ) and f i ( s , z ) = g s ( z , i ) for 0 ≤ i < n − m .

Let the matrices A , B , and T be as before such that F ( s , x ) = T ( x + B t f ( s , A x ) ) for ( s , x ) ∈ K × F q n . Then, for 0 ≤ i ≤ n − 1 , the coordinate F i of F is ( t ⁄ τ , q n − m ⁄ ∣ K ∣ + ε ) -secure.

In Section 5, we prove that a probability distribution close to uniform has a high entropy. More precisely, we prove Theorem 3 where the symbol d represents the total variation distance and H represents the entropy, concepts reviewed in Section 2. The use of the total variation distance d to analyze the mixing rate with respect to the uniform distribution of several compositions of equation (1) is explained at the end of Section 5.

Theorem 3

Let Ω be a sample space such that ∣ Ω ∣ ≥ q 2 for some q ≥ 2 . Let U be uniform on Ω and V be distributed on Ω with some distribution. If

d ( X , Y ) < 1 log ( ∣ Ω ∣ ) , then H ( Y ) ≥ log ( ∣ Ω ∣ ) − 2 .

In Section 6.1, we recall a few practical instances of symmetric cryptographic schemes that fall within our family of extensions; it includes a discussion of the theoretical results of other sections, as well as from [1]. Section 6.2 details a possible efficient choice for the matrices A , B , and T . The focus of this article and [1], is on the linear transformations represented by A , B , and T ; thus, we do not deal with the constructing the nonlinear part, that is, f in equation (1). Section 6.3 provides an implementation in C++ of our scheme.

2 Background

We briefly introduce the concepts of total variation distance, entropy, and cryptographic schemes. We require standard notions from probability theory; see [2,3]. We refer the reader to [4,5] for the theory and concepts in cryptography including the concept of probabilistic algorithms. To maintain this article as compact as possible, the reader can consult [6] about finite fields and [7] for linear algebra.

We write P { X = x } for the probability that X is equal to x whenever X is discrete. Uppercase letters denote random variables, and lowercase letters denote their corresponding realizations, except random functions, which might be denoted by lowercase letters. Random variables are often random vectors of functions here, but we do not use bold letters for random vectors. We reserve bold letters for matrices except for the symbol P , a generic symbol for probability distributions. In addition, the number of outcomes of random variables involved in this article are finite or countably infinite.

Let us consider two random variables, X and Y , defined on some space Ω , with probability distributions μ and ν , respectively. The total variation distance between μ and ν is denoted d ( μ , ν ) and defined as follows:

d ( μ , ν ) = sup A ⊆ Ω { ∣ μ ( A ) − ν ( A ) ∣ } .

Recall that Ω is countable for us. Therefore, it can be proven that

d ( μ , ν ) = 1 2 ∑ ω ∈ Ω ∣ μ ( ω ) − ν ( ω ) ∣ = 1 2 ∑ i ∈ Ω ∣ P { X = i } − P { Y = i } ∣ = def d ( X , Y ) .

The factor of one-half makes the quantity normalized. The space Ω is finite here, and the probabilities of the singletons are the only quantities that matter to define a distribution; hence, it is understood that { X = i } means the singleton { ω ∈ Ω } for which X ( ω ) = i , and hence, μ ( ω ) = P { X = i } .

The entropy of a discrete random variable X is the quantity denoted H ( X ) defined by

H ( X ) = ∑ i ∈ Ω P { X = i } log 1 P { X = i } .

We now introduce the concept of an encryption scheme. Let K and ℳ be two nonempty sets, the keyspace and message space, respectively. For our purpose, both the message and key spaces are vector spaces. The letter S denotes a random key, which is always uniformly distributed over K . The letter X denotes a random message following some distribution over ℳ . The letter Y denotes a random ciphertext or a random encrypted message.

Definition 1

(Encryption scheme) An encryption scheme with message space ℳ and keyspace K is a triple ( G , E , D ) of probabilistic algorithms such that

G : N → K , E : K × ℳ → ℳ , D : K × ℳ → ℳ ,

which satisfies the following requirement

(2) P { D ( s , E ( s , x ) ) = x } = 1 for all ( s , x ) ∈ K × ℳ .

The requirement from equation (2) is the correctness property. The symbols G , E , and D stand for “Generation,” “Encryption,” and “Decryption,” respectively. We could include in our definition a ciphertext space C such that E : K × ℳ → C , and D : K × C → ℳ . However, it is common to have C = ℳ for simplicity. We do not focus on the key generation algorithm G , but assume it is efficient.

Definition 2

(Perfect secrecy) An encryption scheme ( G , E , D ) is perfectly secret if, for any probability distribution on ℳ , we have that

(3) P { X = x ∣ Y = y } = P { X = x } .

Equality (3) means that the distribution of messages is independent of the distribution of ciphertexts; equivalently, it means that no information about a message can be obtained from a ciphertext because the a priori distribution P { X = x } coincides with the a posteriori distribution P { X = x ∣ Y = y } computed by an attacker. Suppose we see the encryption as a stochastic process. In that case, the equality of distribution in Definition 2 implies that the stochastic matrix representing the transitions from { X = x } to { Y = y } after encryption is a doubly stochastic matrix. Appendix A explains a relaxation of Definition 2 that incorporates the total variation distance. Where is the random key in the previous equality? Without loss of generality, we may assume P { X = x } > 0 for all x ∈ ℳ . We can easily show that a scheme is perfectly secret if and only if, for any messages x 0 , x 1 , and uniform random key S ,

P { Y = E S ( x 0 ) ∣ X = x 0 } = P { Y = E S ( x 1 ) ∣ X = x 1 } .

A necessary condition for perfect secrecy to hold is that ∣ K ∣ ≥ ∣ ℳ ∣ ; Shannon [8] shows that this inequality can be an equality, that is, ∣ K ∣ = ∣ ℳ ∣ .

Example 1

The one-time pad for which ℳ = K = R d for some d > 0 and ring R is perfectly secret. The message space and keyspace are modules over R of rank d or vector spaces of dimension d if R is a field. Given a uniformly random key s ∈ K , the one-time pad is defined as E s ( x ) = x + s = y and D s ( y ) = y − s , where addition and subtraction are performed element-wise in R over x , y , and s .

Example 2

The one-time pad is a particular rank-degenerate case of the construction defined by equation (1) if we admit the possibility of m = 0 for the dimension of the vector spaces. Then, we take A as the zero map and B as the identity map. The message space and keyspace are such that ℳ = K = F q n . We recall that { 0 } is a trivial vector space, and f s : { 0 } → F q n − m is defined by f s ( 0 ) = s for all s ∈ K . Thus, f s can be seen as the random key of a one-time pad over F q n , with addition being carried out in F q .

Definition 3 further relaxes Definition 2 by adding a time constraint (computational framework) and a distinguishing gap. In the following definition, A stands for an “Attack,” “Attacker,” or “Adversary.” A is a probabilistic algorithm that takes a pair in ℳ × C as input and outputs a boolean value. The attacker A is a boolean random variable defined on the finite space ℳ × C .

Definition 3

(Concrete ( t , ε ) -security) Let X be a random message distributed according to some distribution on the message space ℳ . An encryption scheme ( G , E , D ) is ( t , ε ) -secure if there is a random variable Y independent of X such that for any probabilistic algorithm A that runs in time at most t ,

(4) ∣ P { A ( X , E S ( X ) ) = 1 } − P { A ( X , Y ) = 1 } ∣ ≤ ε .

The left side of inequality (4) is a discrepancy. Although not explicitly part of the symbolism defining the attacker A just before Definition 4, the attacker can sample uniformly the keyspace K . Definition 3 is a concrete definition because no asymptotic quantity is involved. Definition 3 came up initially in the work on pseudo-random functions [9]. See Appendix A for why Definition 3 is equivalent to the total variation distance when t = ∞ .

3 Perfect secrecy of the linear extension

In this section, we prove Theorem 1 by combining different results. The implication of this theorem provides a necessary condition for the perfect secrecy of a cryptographic scheme that uses as a family of pseudo-random permutations the family of bijective functions described in the second paragraph of Section 1; we recall the description of this family now. Let q be a power of a prime, and let n > m ≥ 0 be integers. Consider a finite field F q and the vector spaces F q n , F q m , and F q n − m denoted by V , V 1 , and V 2 , respectively. Let T : V → V , A : V → V 1 , and B : V → V 2 be full-rank linear transformations such that ker A ⊊ colsp B t so that A B t = 0 . Let f : K × V 1 → V 2 be a function chosen uniformly at random with respect to K . We recall that K is a finite set in one-to-one correspondence with a subset of the functions from V 1 to V 2 . We emphasize that choosing a key s specifies a function’s description from F q m to F q n ; hence for x ∈ F q m , we have f ( s , x ) = ( f 1 ( s , x ) , … , f i ( s , x ) , … , f n − m − 1 ( s , x ) ) . The function F ( s , x ) = T ( x + B t f ( s , A x ) ) is a permutation over V .

Theorem 1

Let A , B , and T be full rank linear transformations as before. Let X be a random variable with some distribution over the message space ℳ = F q n , and let Z = A X . Let f : K × F q m → F q n − m be a function chosen uniformly at random with respect to K . For 0 ≤ i ≤ n − m − 1 , and let Y i be a random variable such that Y i = f i ( S , Z ) where S is uniformly distributed over K . For 0 ≤ i ≤ n − 1 , and let X i ′ be a random variable such that X i ′ = F i ( S , X ) . If, for 0 ≤ i ≤ n − m − 1 , the ith coordinate f i satisfies

P { Y i = f i ( S , z 1 ) ∣ Z = z 1 } = P { Y i = f i ( S , z 2 ) ∣ Z = z 2 } for all z 1 a n d z 2 ,

then, for 0 ≤ i ≤ n − 1 , the ith coordinate F i satisfies

P { X i ′ = F i ( S , x 1 ) ∣ X = x 1 } = P { X i ′ = F i ( S , x 2 ) ∣ X = x 2 } for a l l x 1 a n d x 2 .

In light of Definition 2 from Section 2, the implication of this theorem is a necessary condition for perfect secrecy of a cryptographic scheme that would use the family { F s } s ∈ K as a pseudo-random family of permutations. If the coordinates F i of F for 0 ≤ i ≤ n − 1 do not satisfy the conclusion of Theorem 1, then F cannot be used in schemes that would require perfect secrecy.

We recall simple results that are Lemma 1 and 2. The proof of Lemma 1 is simple and not given here.

Lemma 1

Let X 1 and X 2 be uniformly distributed on F q . Let a 1 , a 2 ∈ F q so that a 1 a 2 ≠ 0 . Then we have

X 1 + X 2 is uniformly distributed;
a X 1 for a ≠ 0 is uniformly distributed;
a 1 X 1 + a 2 X 2 is uniformly distributed.

Lemma 2

Let X be a random variable with some distribution over the message space ℳ = F q n , and let Z = A X . Let c ∈ F n − m be such that c ≠ 0 . Let S be uniform over K . Let f = ( f 0 , f 1 , … f n − m − 1 ) with f i : K × F q m → F q for 0 ≤ i ≤ n − m − 1 be such that

P { Y = f i ( S , z 1 ) ∣ Z = z 1 } = P { Y = f i ( S , z 2 ) ∣ Z = z 2 } for a l l z 1 a n d z 2 ,

Then, for any x 1 , x 2 ∈ F q n such that z 1 = A x 1 and z 2 = A x 2 ,

P ∑ i = 0 n − m − 1 c i f i ( S , A x 1 ) ∣ X = x 1 = P ∑ i = 0 n − m − 1 c i f i ( S , A x 2 ) ∣ X = x 2 .

Proof

For any realization x of X , we can assume P { X = x } > 0 . Let Z denote the random vector given by Z = A X . Because A is full rank, P { Z = z } > 0 for all z ∈ F q m .

Let x 1 , x 2 ∈ F q n be any messages, and let z 1 = A x 1 , z 2 = A x 2 . Let c ∈ F q n − m be nonzero. Let Y i = f i ( S , Z ) , and consider the conditional distributions of Y i given Z = z for z ∈ F q m . We have by hypothesis that for all pairs of outcomes ( z 1 , z 2 ) ∈ F q m × F q m and for 0 ≤ i ≤ n − m − 1

(5) P { Y i = f i ( S , z 1 ) ∣ Z = z 1 } = P { Y i = f i ( S , z 2 ) ∣ Z = z 2 } .

Because equation (5) holds for all i , the conditional distributions of Y i given Z = z are uniform for all z . From now on, we can use Lemma 1 to combine the random values of the Y i ’s. By induction on n − m , if we combine linearly Y 0 and Y 1 as a basis case, then, by using Lemma 1, the distribution of { c 0 Y 0 + c 1 Y 1 ∣ Z = z } is uniform. Write W = c 0 Y 0 + … + c n − m − 2 Y n − m − 2 and assume { W ∣ Z = z } is uniform. Then { W + c n − m − 1 Y n − m − 1 ∣ Z = z } is uniform for all z ∈ F q m by Lemma 1. Hence, for all pairs ( z 1 , z 2 ) ∈ F q m × m ,

P { c 0 Y 0 + c 1 Y 1 + … + c n − m − 1 Y n − m − 1 ∣ Z = z 1 } = P { c 0 Y 0 + c 1 Y 1 + … + c n − m − 1 Y n − m − 1 ∣ Z = z 2 } ,

or, equivalently,

P { c 0 f 0 ( S , z 1 ) + c 1 f 1 ( S , z 1 ) + … + c n − m − 1 f n − m − 1 ( S , z 1 ) ∣ Z = z 1 } = P { c 0 f 0 ( S , z 2 ) + c 1 f 1 ( S , z 2 ) + … + c n − m − 1 f n − m − 1 ( S , z 2 ) ∣ Z = z 2 } .

The proof is complete.□

Proof of Theorem 1

We write the i th row B t as ( b i , 0 , … , b i , n − m − 1 ) , which plays the role of c in Lemma 2. Applications of Lemma 2 a number n of times completes the proof because we assumed that the transformation T is full-rank.□

4 Concrete security of the linear extension

In this section, we show how the concrete security of the family of nonlinear functions relates to the coordinates of their bijective linear extensions. The extension cannot be secure if one or more coordinates are not secure. Given a family of keyed nonlinear functions, the following theorem helps assert the security of the coordinates of the extension. We recall this family of extensions from equation (1), where n > m ≥ 0 , A ∈ F q m × n , B ∈ F q ( n − m ) × n , and T ∈ F q n × n are full rank matrices. Let f : K × F q m → F q n − m , and consider the permutation

F ( s , x ) = T ( x + B t f ( s , A x ) ) ,

where f ( s , x ) = ( f 1 ( s , x ) , … , f i ( s , x ) , … , f n − m − 1 ( s , x ) ) . We emphasize that choosing a key s ∈ K specifies a function’s description from F q m to F q n . The message space ℳ is F q n . The function F ( s , x ) = T ( x + B t f ( s , A x ) ) is a permutation over F q n , and F has n coordinates, say F i for 0 ≤ i ≤ n − 1 , that are functions from F q n to F q .

Theorem 2

f ( s , z ) = ( f 0 ( s , z ) , … , f n − m − 1 ( s , z ) ) and f i ( s , z ) = g s ( z , i ) for 0 ≤ i < n − m .

Proof

We observe first that m + τ ≤ m + ( n − m ) = n . Let us denote by Θ and Φ the following two families of functions defined by

Θ = { g : K × F q m + τ → F q ; ( s , z ) ↦ g ( s , z ) = def g s ( z ) } , Φ = { f : K × F q m → F q n − m ; ( s , z ) ↦ f ( s , z ) = ( g s ( z , 0 ) , … , g s ( z , n − m − 1 ) ) } .

If Θ is ( t , ε ) -secure, then Φ is ( t ⁄ τ , ε ) -secure; see Section 3.2.3 of [5]. Using Definition 3, if Φ is ( t ⁄ τ , ε ) -secure, then there exists a distribution on the set of functions from F q m to F q n − m that is distance at most ε from the distribution of uniformly chosen random functions from F q m to F q n − m , for any probabilistic algorithm with running time at most t ⁄ τ ; let this discrete finite distribution be represented by the random variable R . Here, the realizations of R are functions, and R must be independent of g S . Let A be a probabilistic algorithm that runs in time at most t ⁄ τ . From Definition 3,

∣ P { A ( z , R ) = 1 ∣ Z = z } − P { A ( z , f ( S , z ) ) = 1 ∣ Z = z } ∣ ≤ ε ,

and, as a direct consequence, for all 0 ≤ i ≤ n − m − 1 , we have

∣ P { A ( z , R ) = 1 ∣ Z = z } − P { A ( z , f i ( S , z ) ) = 1 ∣ Z = z } ∣ ≤ ε .

We consider linear combinations of the coordinates of a function f ( s , z ) ∈ Φ denoted by f i ( s , z ) = g s ( z , i ) . Let a = ( a 0 , … , a n − m − 1 ) ∈ F q n − m be nonzero and consider

ℓ ( s , z ) = def ∑ i = 0 n − m − 1 a i f i ( s , z ) for ( s , z ) ∈ K × F q m .

For example, the vector a can be a row of B t . For notational simplicity, we let

δ 0 = def P { A ( z , R ) = 1 ∣ Z = z } − P { A ( Z , ℓ ( S , z ) ) = 1 ∣ Z = z } , δ 1 = def P { A ( z , ℓ ( S , z ) ) = 1 ∣ Z = z } − P { A ( z , f i ( S , z ) ) = 1 ∣ Z = z } .

We have that

(6) ∣ δ 0 ∣ − ∣ δ 1 ∣ ≤ ∣ ∣ δ 0 ∣ − ∣ δ 1 ∣ ∣ ≤ ∣ δ 0 + δ 1 ∣ ≤ ε implying ∣ δ 0 ∣ ≤ ∣ δ 1 ∣ + ε .

The two distributions in δ 1 are both in terms of the uniform random variable S over K . The denominator is ∣ K ∣ for both quantities in ∣ δ 1 ∣ . If algorithm A uses sources of randomness other than what is provided by S , its choices over those additional sources of randomness can always be hardcoded as explained in [5]. The set of all linear combinations over { f 0 , … , f n − m − 1 } has cardinality q n − m , which bounds P { A ( z , ℓ ( S , z ) ) = 1 ∣ Z = z } . We have that P { A ( z , ℓ ( S , z ) ) = 1 ∣ Z = z } ≤ q n − m ⁄ ∣ K ∣ and, for 0 ≤ i < n − m − 1 , we have that P { A ( z , f i ( S , z ) ) = 1 ∣ Z = z } ≤ 1 ⁄ ∣ K ∣ . Thus, δ 1 is bounded by ∣ q n − m ⁄ ∣ K ∣ − 1 ⁄ ∣ K ∣ ∣ that is less than q n − m ⁄ ∣ K ∣ , which combined with inequality (6) yields

(7) ∣ P { A ( z , R ) = 1 ∣ Z = z } − P { A ( z , ℓ ( S , z ) ) = 1 ∣ Z = z } ∣ ≤ q n − m ∣ K ∣ + ε .

De-conditioning expression (7) with respect to Z gives

(8) ∣ P { A ( z , R ) = 1 } − P { A ( z , ℓ ( S , z ) ) = 1 } ∣ = ∑ z ∈ F q m ∣ P { A ( z , R ) = 1 ∣ Z = z } − P { A ( z , ℓ ( S , z ) ) = 1 ∣ Z = z } ∣ P { Z = z } ≤ q n − m ∣ K ∣ + ε ∑ z ∈ F q m P { Z = z } .

Substituting z = A x in inequality (8) gives

(9) ∣ P { A ( A x , R ) = 1 } − P { A ( A x , ℓ ( S , A x ) ) = 1 } ∣ ≤ q n − m ∣ K ∣ + ε ∑ x ∈ F q n z = A x P { X = x } ≤ q n − m ∣ K ∣ + ε ,

where X follows some distribution over F q n . We observe that inequality (9) holds no matter the distribution on the original space F q n . As mentioned previously, instances of the vector a = ( a 0 , … , a n − m − 1 ) involved in the expression of ℓ ( S , A x ) with ( S , x ) ∈ K × F q n can be any row of B t . Regardless of x , the random coordinates of T ( x + B t f ( S , A x ) ) satisfy the bound (9).□

We end this section by pointing out that the result proved in Section 3.2.3 of [5] can be easily modified to match also the case when n − m = 1 for which τ = max { 1 , ⌈ log q ( n − m ) ⌉ } = 1 .

5 Total variation distance, entropy, mixing rate, and composing several bijective extensions

We recall equation (1), where n > m ≥ 0 , A ∈ F q m × n , B ∈ F q ( n − m ) × n , and T ∈ F q n × n are full rank matrices. Let f : K × F q m → F q n − m , and consider the permutation

F ( s , x ) = T ( x + B t f ( s , A x ) ) .

We remind the readers that this article as well as Gravel and Panario [1] do not deal with the choice or design of a secure, efficient-to-sample family of functions f : K × F q m → F q n − m . We prove facts that connect the statistical distance and the entropy. We also discuss the number of iterations to reach a desired distance from truly random permutations over q n elements, assuming that keys are uniform and independent.

5.1 Connecting the total variation distance and the entropy

Often, we say that if a probability distribution is close to the uniform probability distribution, then its entropy is large; we prove that statement here. We state and prove Lemma 3 and Theorem 3. The algebraic field is F q with q some power of a prime. All the vector spaces and probability spaces are finite. ∣ ⋅ ∣ can be used either for the cardinality of a set or for the absolute value of a real number. The finiteness of the probability spaces implies, for instance, that the entropy is maximal when the distribution under consideration is uniform. Given two discrete finite probability distributions p 1 and p 2 , we recall that d ( p 1 , p 2 ) represents the total variation distance between the distributions p 1 and p 2 , and 0 ≤ d ( p 1 , p 2 ) ≤ 1 . Also, H ( p ) represents the entropy of some distribution p . For instance, if U and V are discrete finite random variables with distributions p 1 and p 2 , respectively, we may abuse the notation slightly and write d ( U , V ) in place of d ( p 1 , p 2 ) and similarly for H .

Lemma 3

Let U be the uniform discrete random variable over Ω , and let V be any discrete random variable over Ω . Let 0 < α < 1 be a point that splits Ω into two disjoint sets, that is, Ω = Ω α ∪ ( Ω \ Ω α ) with

Ω α = { i ∈ Ω : P { V = i } ≤ α } .

For notational convenience, let

τ = 1 2 1 − 1 α ∣ Ω ∣ > 0 .

For 0 < β < 1 , if d ( U , V ) ≤ β , then we have

H ( V ) ≥ log 1 α max 0 , 1 − β τ .

Proof of Lemma 3

First, we observe that if i ∉ Ω α , then P { V = i } > α = α ∣ Ω ∣ P { U = i } from which we have that

(10) β ≥ d ( U , V ) = 1 2 ∑ i ∈ Ω ∣ P { U = i } − P { V = i } ∣ ≥ 1 2 ∑ i ∉ Ω α ∣ P { U = i } − P { V = i } ∣ ≥ 1 2 ∑ i ∉ Ω α P { V = i } − P { V = i } α ∣ Ω ∣ = 1 2 P { V ∉ Ω α } 1 − 1 α ∣ Ω ∣ .

From the bound (10), we obtain that P { V ∉ Ω α } ≤ min { 1 , β ⁄ τ } , where we recall that τ = ( 1 ⁄ 2 ) ∣ 1 − 1 ⁄ ( α ∣ Ω ∣ ) ∣ from which we deduce that

P { V ∈ Ω α } ≥ 1 − min 1 , β τ = max 0 , 1 − β τ .

Second, we split H ( V ) into two sums as follows:

H ( V ) = ∑ i ∈ Ω α P { V = i } log 1 P { V = i } + ∑ i ∉ Ω α P { V = i } log 1 P { V = i } ≥ ∑ i ∈ Ω α P { V = i } log 1 P { V = i } .

For i ∈ Ω α , we have that log ( P { V = i } ) ≤ log ( α ) , and therefore,

H ( V ) ≥ P { V ∈ Ω α } log 1 α ≥ log 1 α max 0 , 1 − β τ .

The proof is complete.□

Theorem 3

Let Ω be a sample space such that ∣ Ω ∣ ≥ q 2 for some q ≥ 2 . Let U be uniform on Ω and V be distributed on Ω with some distribution. If d ( U , V ) < 1 log ( ∣ Ω ∣ ) , then

H ( V ) ≥ log ( ∣ Ω ∣ ) − 2 .

Proof of Theorem 3

First, we take α = q ⁄ ∣ Ω ∣ in Lemma 3, and then bound τ . Since ∣ Ω ∣ ≥ q 2 and q ≥ 2 , we first observe that

α = q ∣ Ω ∣ ≥ 2 ∣ Ω ∣ implying 1 2 = 1 − 1 2 ≤ 1 − 1 α ∣ Ω ∣ .

Therefore, we have

1 2 ≤ 1 − 1 α ∣ Ω ∣ = 1 − 1 q < 1 if and only if 1 4 ≤ τ < 1 2 .

Second, we have some freedom for the inequality β < τ to hold, implying that max { 0 , 1 − β ⁄ τ } = 1 − β ⁄ τ ; for that, we can take β = 1 ⁄ ( ∣ Ω ∣ log ( ∣ Ω ∣ ) ) . Hence, we have

β = 1 ∣ Ω ∣ log ( ∣ Ω ∣ ) < 1 4 ≤ τ implying max 0 , 1 − β τ = 1 − β τ .

Substitute α , β , and τ into the conclusion of Lemma 3 to obtain that

(11) H ( V ) ≥ ( log ( ∣ Ω ∣ ) − 1 ) 1 − β τ .

For bounding the quantity (11), we use the previous fact that 2 ≤ 1 ⁄ τ ≤ 4 , and therefore, 1 − β τ ≥ 1 − 4 β , which yields

H ( V ) ≥ ( log ( ∣ Ω ∣ ) − 1 ) 1 − β τ = ( log ( ∣ Ω ∣ ) − 1 ) ( 1 − 4 β ) .

By using ∣ Ω ∣ ≥ q 2 ≥ 4 , and β > 0 so that, respectively, − 4 ⁄ ∣ Ω ∣ ≥ − 1 and 4 β > 0 , we obtain

H ( V ) ≥ log ( ∣ Ω ∣ ) − 1 − 4 ∣ Ω ∣ + 4 β ≥ log ( ∣ Ω ∣ ) − 2 .

The proof is complete.□

Let U : Ω → Ω be a uniform random variable, and suppose Ω is the symmetric group of order ( q n ) ! . Consider Ω 0 ⊊ Ω and a uniform random variable V : Ω 0 → Ω 0 . We observe that if ω ∈ Ω \ Ω 0 then P { V ( ω ) } = 0 . The random variable V is defined by

P { V ( ω ) } = 1 ∣ Ω 0 ∣ if ω ∈ Ω 0 , 0 if ω ∉ Ω 0 .

We evaluate the total variation distance from the uniform random variable U :

d ( U , V ) = 1 2 ∑ ω ∈ Ω ∣ P { U ( ω ) } − P { V ( ω ) } ∣

= 1 2 ∑ ω ∈ Ω 0 ∣ P { U ( ω ) } − P { V ( ω ) } ∣ + 1 2 ∑ ω ∉ Ω 0 ∣ P { U ( ω ) } − P { V ( ω ) } ∣ = 1 2 ∑ ω ∈ Ω 0 1 ∣ Ω 0 ∣ − 1 ∣ Ω ∣ + 1 2 ∑ ω ∉ Ω 0 1 ∣ Ω ∣ = 1 2 ∣ Ω 0 ∣ ∣ ∣ Ω ∣ − ∣ Ω 0 ∣ ∣ ∣ Ω 0 ∣ ∣ Ω ∣ + 1 2 ∣ Ω ∣ − ∣ Ω 0 ∣ ∣ Ω ∣ = 1 2 ∣ Ω ∣ − ∣ Ω 0 ∣ ∣ Ω ∣ + 1 2 ∣ Ω ∣ − ∣ Ω 0 ∣ ∣ Ω ∣ = 1 − ∣ Ω 0 ∣ ∣ Ω ∣ .

5.2 A note on the mixing rate when composing several bijective extensions

From now on until the end of this section, we wish to instill in readers a few ideas about the mixing rate, which are the ingredients of further research. The inspiration came to us by reading [10–13]; the recent book [14] contains an exhaustive list of references about the aforementioned topics. Let ℓ > 0 and let s 1 , … , s ℓ be different indexing keys, that is,

F ( s i , x ) = T ( x + B t f ( s i , A x ) ) for 1 ≤ i ≤ ℓ .

The quantity ℓ is often called the number of rounds or iterations in cryptography. In the context of shuffling theory, the quantity ℓ would play the role of a stopping time, which is the number of times we must repeat a shuffling algorithm for the final output to be nearly uniformly distributed. Write s = ( s ℓ , … , s 1 ) for notational convenience. With a slight abuse of notation, the ℓ -tuple s means a sequence ( f ( s ℓ , x ) , … , f ( s 1 , x ) ) of functional keys for all x ∈ F q m . It is time to specify Ω 0 from before. We have

Ω 0 = { F ( s ℓ , F ( s ℓ − 1 , … , F ( s 1 , x ) … ) ) ∣ F : K × F q n → F q n } .

We have a one-to-one correspondence between K and Ω 0 , that is, we have

∣ Ω 0 ∣ = ∣ K ℓ ∣ = ∣ K ∣ ℓ .

How far is Ω 0 given ℓ from the set of permutations over q n , that is, from Ω ? The previous question pertains to the mixing rate, and a few details might be necessary to give this properly. For a desired distance 0 ≤ d 0 ≤ 1 , let ℓ 0 ∈ [ 0 , ∞ ) such that

d 0 = 1 − ∣ K ∣ ℓ 0 ∣ Ω ∣ if and only if ℓ 0 = log ( ∣ Ω ∣ ( 1 − d 0 ) ) log ( ∣ K ∣ ) .

The maximum number of rounds is at most ℓ 0 unless one can store a random structure larger than or equal to the storage size of a permutation over q n elements. From the definition of ℓ 0 , we have

ℓ 0 = log ( ∣ Ω ∣ ( 1 − d 0 ) ) log ( ∣ K ∣ ) if and only if ℓ 0 log ( ∣ K ∣ ) = log ( ∣ Ω ∣ ( 1 − d 0 ) ) .

Using the definition of entropy, the latter equality, and the assumption that U and V are uniform in their respective spaces, we obtain that

(12) H ( V ) = H ( U ) + log ( 1 − d 0 ) ≤ H ( U ) .

The bound (12) could be improved. The importance of the quantity (12) is better understood in the context of mixing rate. We write λ for the uniform distribution over Ω = S q n , and ρ for the probability distribution of V . For two elements σ , π ∈ S q n , the probability of obtaining π after ℓ > 0 passes is denoted by ρ * ℓ ( π ) and given by

(13) ρ * ℓ ( π ) = ∑ τ ∈ S q n ρ ( τ ) ρ * ( ℓ − 1 ) ( π τ − 1 ) .

To end up in the permutation represented by π after ℓ rounds, and if the system is at τ after ( ℓ − 1 ) rounds, then, because π = ( π τ − 1 ) τ , we obtain equation (13), which defines a probability distribution with ρ = def ρ * 1 . We assume that the distribution of k consists of ℓ independently and identically uniform keys.

An important question is about to quantify

d ( ρ * ℓ , λ ) = max A ∈ S q n ∣ ρ * ℓ ( A ) − λ ( A ) ∣ ,

which is still open for even simpler permutations like those arising from Thorp shuffles from [13] as detailed in [11,12]. Quantifying this for our family of permutations given by equation (1) might be a complex problem that we leave for further research.

6 Examples

Section 6.1 recalls some concrete constructions from the literature that fall into the bijective extensions given by (1). In Section 6.2, we show one way to build the linear transformations A , B , and T that are likely to be efficient and scalable in practice, assuming that elements from the keyspace are sampleable efficiently. We mention briefly the consequences of applying Theorem 2 to the examples in this section; the resulting consequences do not indicate any practical weaknesses and are purely for illustrative purposes. We also use the letter k instead of s to denote keys. In addition, we write f k ( x ) or F k ( x ) instead of f ( k , x ) or F ( k , x ) , respectively.

6.1 Known instances of our extension

Example 3 shows how our scheme encapsulates the original family of Feistel block ciphers. Example 4 shows a more specific instance, the FOX block cipher. Figure 1 is a visual representation of a generalized Type I Feistel network.

Figure 1

Generalized type I Feistel network.

Before recalling the examples, it would be necessary to study and obtain exact knowledge of the statistical or security properties of interest (concrete security, first-order differentials, linear correlations or Fourier analysis, entropic bounds, mixing rate, etc.) of the nonlinear functions involved in the examples that follow and then rework the results of Gravel and Panario [1] as well as the previous results herein to obtain sharper results. Those results should be a general guideline to derive properties for our family of bijective extensions with large parameters q , m , n , and ℓ rather than being applied to specific implementations. We leave for future work to consider these specific implementations.

Example 3

(Feistel block cipher family) Feistel block ciphers form a family of symmetric block ciphers, as detailed in the study by Hoang and Rogaway [15]. This example shows that they are a subfamily of our scheme. The nonlinear part is chosen from a certain family of functions. The working field is F 2 that we denote by F for this example. Each index k of a nonlinear function is an index of the linear extension. With the values m = 1 and n = 2 m , the nonlinear f k can be specified to design particular instances of a Feistel network such as Data Encryption Standard (DES). An input x is a two-block column vector given by

x = x 0 x 1

with x 0 , x 1 ∈ F d for some d > 1 , and similarly for an output vector. The nonlinear function f k is defined over F d . If I = I d and 0 = 0 d , d , and then the linear transformations A , B , and T are, respectively, given by

A = I 0 , B = 0 I , and T = 0 I I 0 .

Finally, we have that

F k ( x ) = F k x 0 x 1 = T ( x + B t f k ( A x ) ) = 0 I I 0 x 0 x 1 + 0 I f k I 0 x 0 x 1 = x 1 + f k ( x 0 ) x 0 .

To use Theorem 2 on Example 3, we have n = 2 , m = 1 , q = 2 d and ∣ K ∣ = 2 2 d ; hence, τ = 1 and q n − m ⁄ ∣ K ∣ = 2 − d so that coordinates are ( t , 2 − d + ε ) -secure if the underlying nonlinear transformations are ( t , ε ) -secure.

The matrix T from Example 3 is a permutation. Another example is IDEA NXT, which uses the Lai-Massey scheme from [16] as a building primitive.

Example 4

(IDEA NXT–FOX) As in the previous example, the working field is F 2 . Here, d = 16 or d = 32 , m = 2 , and n = 2 m . An input x is written columnwise as four consecutive blocks L 0 , L 1 , R 0 , R 1 ∈ F d = F d × 1 , that is,

x = L 0 L 1 R 0 R 1 .

We denote I d and 0 d , d by I and 0 , respectively. The matrices A , B , and T are given by

A = I 0 I 0 0 I 0 I , B = A , T = 0 I 0 0 I I 0 0 0 0 I 0 0 0 0 I .

The nonlinear keyed function f k : F d × F d ↦ F d × F d has k ∈ F d × F d × F d × F d ; see [17]. Let ( z 0 , z 1 ) t = f k ( y 0 , y 1 ) t for z 0 , z 1 , y 0 , and y 1 ∈ F d . Given an input x = ( L 0 , L 1 , R 0 , R 1 ) t and a round key k ∈ F d × F d × F d × F d , one round of FOX is given by

F k ( x ) = T ( x + B t f k ( A x ) ) = T x + B t f k L 0 + R 0 L 1 + R 1

= T L 0 L 1 R 0 R 1 + I 0 0 I I 0 0 I z 0 z 1 = z 1 + L 1 z 0 + z 1 + L 0 + L 1 z 0 + R 0 z 1 + R 1 .

To use Theorem 2 on Example 4, we have n = 4 , m = 2 , q = 2 d and ∣ K ∣ = 2 4 d ; hence, τ = 1 and q n − m ⁄ ∣ K ∣ = 2 − 2 d so that coordinates are ( t , 2 − 2 d + ε ) -secure if the underlying nonlinear transformations are ( t , ε ) -secure.

Example 5

(Type-1 generalized Feistel network) In this example, the working field is F 2 d for some d > 1 that we denote by F . We have m = 1 , n = 4 m , I ∈ F 2 d × d , 0 ∈ F 2 d × d . An input x is written columnwise as four consecutive blocks x i ∈ F for 1 ≤ i ≤ 4 . The nonlinear part f is defined over F . (In the study by Hoang and Rogaway [15], F denotes a smaller nonlinear function, which is denoted f by us. Also, blocks are denoted by B i and their length by n . The length of a block here is d .) The template matrices A ∈ F 1 × 4 , B ∈ F 1 × 4 , and T ∈ F 4 × 4 are given by

A = I 0 0 0 , B = 0 I 0 0 , T = 0 I 0 0 0 0 I 0 0 0 0 I I 0 0 0 .

To use Theorem 2 on Example 5, we have n = 4 , m = 1 , q = 2 d , and ∣ K ∣ = 2 4 d ; hence, τ = 1 and q n − m ⁄ ∣ K ∣ = 2 − d so that coordinates are ( t , 2 − d + ε ) -secure if the underlying nonlinear transformations are ( t , ε ) -secure.

Alternating Feistel and unbalanced Feistel, as shown in the study by Hoang and Rogaway [15] can also be embedded in our scheme with minor changes. We use two instances of A , B , T , and f for alternating Feistel. Our scheme allows f to be noninvertible and hence include unbalanced Feistel.

Interestingly, Type-2 and Type-3 generalized Feistel networks from Hoang and Rogaway [15] do not fit our model as presented so far. However, we can expand our scheme to include such networks. For conciseness, we show this only for Type-3 generalized Feistel networks; Type-2 can be easily derived by simplifying the Type-3 model.

Example 6

(Type-3 generalized Feistel network) The working field is as in the previous example. We have m = 1 , n = 4 m , I ∈ F 2 d × d , 0 ∈ F 2 d × d . An input x is written columnwise as four consecutive blocks x i ∈ F for 1 ≤ i ≤ 4 . There are three nonlinear functions f j : F → F for j = 1 , 2 , 3 . The template matrices A j ∈ F 1 × 4 , and B j ∈ F 1 × 4 are given by

A 1 = ( I 0 0 0 ) , B 1 = ( 0 I 0 0 ) , A 2 = ( 0 I 0 0 ) , B 2 = ( 0 0 I 0 ) , A 3 = ( 0 0 I 0 ) , B 3 = ( 0 0 0 I ) .

The matrix T is as in Example 5. Finally, we have

F k x 1 x 2 x 3 x 4 = T x 1 x 2 x 3 x 4 + ∑ j = 1 3 B j t f j A j x 1 x 2 x 3 x 4 .

To use Theorem 2 on Example 5, we have n = 4 , m = 1 , q = 2 d and ∣ K ∣ = ( 2 4 d ) 3 ; hence, τ = 1 and q n − m ⁄ ∣ K ∣ = 2 − 9 d so that coordinates are ( t , 2 − 9 d + ε ) -secure if the underlying nonlinear transformations are ( t , ε ) -secure.

6.2 Examples of linear extensions

As an efficient linear extension, we seek to compute the matrix-vector products A x and B t y for x ∈ F q n and y ∈ F q n − m as efficiently as possible. The matrix-vector multiplication of a Fourier matrix is performed efficiently using the discrete Fourier transform, as explained by von zur Gathen and Gerhard [18]. For that purpose, let g ∈ F q with g ≠ 0 , 1 and let n = ord ( g ) , where ord ( g ) denotes the order of g . By Lagrange theorem, ord ( g ) ∣ ( q − 1 ) . For some integer 0 ≤ x < ( q − 1 ) ⁄ ord ( g ) , the entry located at i th row and j th column of the matrices A : F q n → F q m and B : F q n → F q n − m are denoted a i j and b i j , respectively, and given by

a i j = g ( i + x ) j for 0 ≤ i < m and 0 ≤ j < n , b i j = g ( i + m + x ) j for 0 ≤ i < n − m and 0 ≤ j < n .

For m = n − 1 and m = 1 , we have always that x = 1 and x = 0 , respectively.

Clearly, we have A B t = 0 whenever ord ( g ) = n . For 0 ≤ i < m and 0 ≤ j < n − m , if row i ( A ) and col j ( B t ) denotes the i th row of A and the j th column of B t , then

(14) row i ( A ) col j ( B t ) = ∑ k = 0 n − 1 g ( i + j + m + 2 x ) k = g ( i + j + m + 2 x ) n − 1 g i + j + m + 2 x − 1 = 0 ,

provided that i + j + m + 2 x is not a multiple of ord ( g ) for all i and j . As explained by Hill [19], A , B , and the more general Vandermonde matrices, are well known in linear error correction. The Fourier matrices are indeed special cases of Vandermonde matrices. The value of x solely depends on m , ord ( g ) = n , the degree of the field representation, but not upon the field representation itself.

Example 7

In the following examples, p ( X ) is a monic irreducible polynomial in F 2 [ X ] , which serves as a representation for F q . The degree of p ( X ) is d , and hence, q = 2 d . The definitions of n , m , and x are as in (14). We do in-place computations for given input vectors to the matrices A and B t .

Case 1:

p ( X ) = 1 + X + X 2 + X 3 + X 4 + X 6 + X 10 + X 11 + X 12 ( d = 12 )
g as 1 + X 3 + X 6 + X 8 + X 10
m = 195
n = ord ( g ) = q − 1 = 4,095
x = 3,998

Case 2:

p ( X ) = 1 + X + X 2 + X 3 + X 5 + X 6 + X 8 + X 15 + X 16 ( d = 16 )
g as 1 + X 3 + X 7 + X 10 + X 11 + X 12 + X 13 + X 14
m = 51
n = ord ( g ) = 255
x = 230

To use Theorem 2 on case 2 of Example 7, we have n = 255 , m = 51 , q = 2 16 ; hence, τ = 1 and q n − m ⁄ ∣ K ∣ = 2 3264 ⁄ ∣ K ∣ so that coordinates are ( t , 2 3264 ⁄ ∣ K ∣ + ε ) -secure if the underlying nonlinear transformations are ( t , ε ) -secure.

Example 8

We take F = Z p with p = 2 64 − 2 32 + 1 , and hence,

ϕ ( p ) = p − 1 = 2 64 − 2 32 = 2 32 ( 2 32 − 1 ) = 2 32 ( 2 16 + 1 ) ( 2 8 + 1 ) ( 2 4 + 1 ) ( 2 2 + 1 ) ( 2 1 + 1 ) .

An element g of order 2 32 can be, for instance, g = 14 , 97 , 96 , 60 , 05 , 97 , 01 , 85 , 128 , which yields a large dimension n . From the latter value of g , it is easy to find generators with smaller orders. The quantity x is 0 for all cases.

To use Theorem 2 on Example 8, if we choose g = 2305843008676823040 so that n = 64 , and then choose m = 32 , for instance, q = p ; hence, τ = 1 and q n − m ⁄ ∣ K ∣ ≈ 2 2048 ⁄ ∣ K ∣ so that coordinates are approximately ( t , 2 2048 ⁄ ∣ K ∣ + ε ) -secure if the underlying nonlinear transformations are ( t , ε ) -secure.

To not sacrifice security for speed, a reasonable choice of T is a circulant permutation. This choice is likely to yield a safe system from the point of view of differential and linear correlation analysis. By the circulant permutation, we mean the matrix T defined as follows:

T = 0 0 … 0 1 1 0 … 0 0 0 1 … 0 0 ⋮ ⋱ ⋮ 0 0 … 1 0 .

We observe that the matrix T involves only moving values in memory.

6.3 Implementation

We made a C++ implementation of our scheme using the 2023 dialect. The goal of simply implementing our scheme is to analyze the feasibility of our scheme in practice for some parameters given in Section 6.2. Our implementation uses only the standard library and does not rely on any external or third-party library. Our implementation works with any field with characteristic two and the prime field F p with p = 2 64 − 2 32 + 1 . The field representation is hardcoded for fields with characteristic 2 as a std::bitset object. No advanced matrix multiplication algorithms are implemented at this time. Some careful programming is done to obtain fast algebraic operations over the fields mentioned above, including inversions. Secret keys, which are functions, are simply the sum of inversion with their input and output masked by random vectors reminiscent of truncated harmonic sums adapted for finite fields. The matrices A , B , and T are as in Section 6.2.

7 Conclusion

We continue the exploration of the properties of the family of permutations introduced by Gravel and Panario [1]. We have shown how to obtain bounds about the concrete security for linear bijective extensions from the family of nonlinear discrete extended functions. We have also analyzed the relation between the total variation distance, the entropy, and the number of rounds of the family of resulting permutations and discussed how to use the latter result to study the mixing rate of several compositions.

Acknowledgements

The authors thank Luc Devroye and Gilles Brassard for several helpful discussions. The authors are grateful for the reviewer’s valuable comments that improved the manuscript.

Funding information: The work of D. Panario is supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC), reference number RPGIN-2018-05328.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Conflict of interest: Prof. Daniel Panario is a member of the Editorial Board of the Journal of Mathematical Cryptology but was not involved in the review process of this article.
Informed consent: Not applicable.
Ethical approval: The conducted research is not related to either human or animals use.
Data availability statement: Not applicable.

Appendix A relaxation on perfect security

We recall that an encryption scheme ( G , E , D ) is perfectly secret if, for any probability distribution on ℳ , we have that

(A1) P { X = x ∣ Y = y } = P { X = x } .

The previous equality means that the distribution of messages is independent of the distribution of ciphertexts; equivalently, it means that no information about a message can be obtained from a ciphertext because the a priori distribution P { X = x } coincides with the a posteriori distribution P { X = x ∣ Y = y } computed by an attacker. Suppose we see the encryption as a stochastic process. In that case, the equality of distribution in Definition 2 implies that the stochastic matrix representing the transitions from { X = x } to { Y = y } after encryption is a doubly stochastic matrix. Appendix A explains a relaxation of Definition 2 that incorporates the total variation distance. Where is the random key in the previous equality? Without loss of generality, we may assume P { X = x } > 0 for all x ∈ ℳ . We can easily show equation (A1) is equivalent to assert that for any messages x 0 , x 1 , and uniform random key S ,

(A2) P { Y = E S ( x 0 ) ∣ X = x 0 } = P { Y = E S ( x 1 ) ∣ X = x 1 } .

A necessary condition for perfect secrecy to hold is that ∣ K ∣ ≥ ∣ ℳ ∣ ; Shannon [8] shows that this inequality can be an equality, that is, ∣ K ∣ = ∣ ℳ ∣ .

We rewrite equation (A2) more succinctly as follows:

(A3) P { C = c ∣ M = m 0 } = P { C = c ∣ M = m 1 } for all m 0 and m 1 .

The significance of equation (A3) is that for all m 0 and m 1 , the two conditional events { C = c ∣ M = m 0 } and { C = c ∣ M = m 1 } are equivalent under the counting measure, and their distributions are at distance 0. Hence, equation (A3) can be written as follows:

d ( P { C = c ∣ M = m 0 } , P { C = c ∣ M = m 1 } ) = 0 ,

or using the usual abuse of language as follows:

d ( C = c ∣ M = m 0 , C = c ∣ M = m 1 ) = 0 .

For notational convenience, let μ 0 and μ 1 denote the conditional distributions of { C = c ∣ M = m 0 } and { C = c ∣ M = m 1 } so that

μ 0 ( c ) = P { C = c ∣ M = m 0 } and μ 1 ( c ) = P { C = c ∣ M = m 1 } .

Hence, a natural relaxation on perfect secrecy is by requiring that

d ( μ 0 , μ 1 ) ≤ ε for some ε > 0 .

By the definition of the total variation distance, we have

(A4) d ( μ 0 , μ 1 ) = sup A ⊆ Ω { ∣ μ 0 ( A ) − μ 1 ( A ) ∣ } ≤ ε .

The supremum in equation (A4) can be replaced by max because of the finiteness of the underlying sample space Ω upon which the conditional random variable C given M = m is defined.

The event A from equation (A4) models an attacker with an unbounded time resource in a cryptographic setting. In statistical terms, A is a statistical test that distinguishes between μ 0 and μ 1 within a distance of at most ε , regardless of the time. Such statistical tests in discrete spaces can be viewed as a boolean function A : ℱ → { 0 , 1 } , where ℱ is the underlying σ -algebra of the measurable space. Because of the finiteness of Ω , the σ -algebra ℱ can be as large as the power set of Ω . Therefore, for all m 0 and m 1 the events { C = E S ( m 0 ) ∣ M = m 0 } and { C = E S ( m 1 ) ∣ M = m 1 } in equation (A1) can be written, perhaps more conveniently, as { A ( m b , E S ( m b ) ) = 1 } for b = 0 , 1 for all event A ; it must be true for all A because of the supremum in the definition of the metric. In cryptography, the event A is an attacker that we previously denoted by A . Hence, ( G , E , D ) is ( ∞ , ε ) -secure or simply ε -secret if and only if for all m 0 and m 1 it holds that

d ( μ 0 , μ 1 ) = max A ⊆ Ω { ∣ P { A ( m 0 , E S ( m 0 ) ) = 1 } − P { A ( m 1 , E S ( m 1 ) ) = 1 } ∣ } ≤ ε .

References

[1] Gravel C, Panario D. Feedback linearly extended discrete functions. J Algebra Appl. 2023;22(2):2350051. 10.1142/S0219498823500512. Search in Google Scholar

[2] Feller W. An introduction to probability theory and its applications. vol. 1 of probability and mathematical statistics. 3rd ed. New York: John Wiley and Sons Inc.; 1968. Search in Google Scholar

[3] Feller W. An introduction to probability theory and its applications. vol. 2 of probability and mathematical statistics. 2nd ed. New York: John Wiley and Sons Inc.; 1971. Search in Google Scholar

[4] Katz J, Lindell Y. Introduction to modern cryptography. Chapman & Hall/CRC Cryptography and Network Security Series. New York (US): CRC Press; 2020. Search in Google Scholar

[5] Lindell Y, editor. Tutorials on the foundations of cryptography. Cham (Switzerland): Springer International Publishing; 2017. 10.1007/978-3-319-57048-8. Search in Google Scholar

[6] Mullen GL, Panario D. Handbook of finite fields. New York (US): Chapman & Hall/CRC; 2013. 10.1201/b15006Search in Google Scholar

[7] Godement R. Cours d’algèbre. 3rd ed. Paris (France): Hermann; 1997. Search in Google Scholar

[8] Shannon CE. Communication theory of secrecy systems. Bell Syst Tech J. 1949;28(4):656–715. 10.1002/j.1538-7305.1949.tb00928.xSearch in Google Scholar

[9] Goldreich O, Goldwasser S, Micali S. How to construct random functions. J ACM. 1986;33(4):792–807. 10.1145/6490.6503Search in Google Scholar

[10] Dodis Y. Shannon impossibility, Revisited. In: Smith A, editor. Information theoretic security. Berlin, Heidelberg: Springer Berlin Heidelberg; 2012. p. 100–10. 10.1007/978-3-642-32284-6_6Search in Google Scholar

[11] Morris B. Improved mixing time bounds for the Thorp shuffle. Combinatorics Probability Comput. 2013;22(1):118–32. 10.1017/S0963548312000478Search in Google Scholar

[12] Morris B, Rogaway P, Stegers T. Deterministic Encryption with the Thorp Shuffle. J Cryptol. 2018;31(2):521–36. 10.1007/s00145-017-9262-zSearch in Google Scholar

[13] Thorp EO. Nonrandom shuffling with applications to the game of Faro. J Amer Stat Assoc. 1973;68(344):842–7. 10.1080/01621459.1973.10481434Search in Google Scholar

[14] Diaconis P, Fulman J. The mathematics of shuffling cards. Providence (Rhode Island state, US): American Mathematical Society; 2023. 10.1090/mbk/146Search in Google Scholar

[15] Hoang VT, Rogaway P. On generalized Feistel networks. Santa Barbara: IACR Cryptol ePrint Arch. 2010. p. 301. 10.1007/978-3-642-14623-7_33Search in Google Scholar

[16] Lai X, Massey JL. A proposal for a new block encryption standard. In: Proceedings of the Workshop on the Theory and Application of Cryptographic Techniques on Advances in Cryptology. EUROCRYPT ’90. Berlin, Heidelberg: Springer-Verlag; 1991. p. 389–404. 10.1007/3-540-46877-3_35Search in Google Scholar

[17] Junod P, Vaudenay S. FOX Specifications Version 1.2. Switzerland: École Polytechnique Fédérale de Lausanne; 2005. Search in Google Scholar

[18] von zur Gathen J, Gerhard J. Modern computer algebra. 3rd ed. USA: Cambridge University Press; 2013. 10.1017/CBO9781139856065Search in Google Scholar

[19] Hill R. A first course in coding theory. Oxford applied mathematics and computing science series. Oxford (UK): Clarendon Press; 1991. Search in Google Scholar

Received: 2024-02-20

Revised: 2024-08-19

Accepted: 2024-09-13

Published Online: 2024-12-10

This work is licensed under the Creative Commons Attribution 4.0 International License.

Articles in the same Issue

https://doi.org/10.1515/jmc-2024-0010

Keywords for this article

pseudo-random objects; finite field; linear algebra; mixing rate; entropy; total variation distance

Creative Commons

BY 4.0