
Reproducible families of codes and cryptographic applications

  • Paolo Santini, Edoardo Persichetti and Marco Baldi
Published/Copyright: September 17, 2021

Abstract

Structured linear block codes such as cyclic, quasi-cyclic and quasi-dyadic codes have gained an increasing role in recent years both in the context of error control and in that of code-based cryptography. Some well known families of structured linear block codes have been separately and intensively studied, without searching for possible bridges between them. In this article, we start from well known examples of this type and generalize them into a wider class of codes that we call ℱ-reproducible codes. Some families of ℱ-reproducible codes have the property that they can be entirely generated from a small number of signature vectors, and consequently admit matrices that can be described in a very compact way. We denote these codes as compactly reproducible codes and show that they encompass known families of compactly describable codes such as quasi-cyclic and quasi-dyadic codes. We then consider some cryptographic applications of codes of this type and show that their use can be advantageous for hindering some current attacks against cryptosystems relying on structured codes. This suggests that the general framework we introduce may enable future developments of code-based cryptography.

MSC 2010: 11T71; 94A60

1 Introduction

Defining linear block codes that possess a certain inner structure and satisfy some regularity properties is a natural process in coding theory. Arguably, the most relevant example is the class of cyclic codes, which includes several families of codes that have proved important throughout the history of communications, such as BCH and Hamming codes, as well as the binary Golay codes, Reed–Solomon codes, and many others. This class is defined by the property that its set of codewords is invariant under the action of a specific permutation, namely the cyclic (circular) shift, which consists of cyclically rotating a vector by one position to the right (equivalently, to the left). Other well-known examples in the literature include constacyclic codes, negacyclic codes, quasi-cyclic codes, and many others.

Recently, this research direction has been investigated further: Misoczki and Barreto in 2009 introduced quasi-dyadic codes [1], which contain codewords invariant under a different type of permutation. The work was motivated by its implications for the McEliece cryptosystem [2], and in particular by the necessity of having a family of codes whose generator and parity-check matrices can be represented in a compact way. This is because, in code-based cryptography, the public key of an encryption (or signature) scheme usually consists precisely of a generator or parity-check matrix of a linear block code. With the size of the codes used in code-based cryptography (typical code lengths are in the order of $10^3$ to $10^4$), describing a whole matrix results in a public key of several kilobytes, and this size increases quadratically in the code length. This has historically prevented the use of the original McEliece cryptosystem, which exploits random-looking public codes, in many applications. On the other hand, structured codes admit generator and parity-check matrices which can be entirely described by one or a few rows; this allows for a substantial reduction in public key size, and it is arguably a fundamental step toward making code-based cryptography truly practical. Previous efforts to reduce key size were centered on quasi-cyclic algebraic codes [3] and have since been extended to codes of a different nature, namely the Low-Density Parity-Check (LDPC) codes [4] and their recent generalization known as Moderate-Density Parity-Check (MDPC) codes [5]. These codes are characterized by sparse parity-check matrices and admit matrices in quasi-cyclic form, formed by circulant square blocks. Due to their efficient decoding algorithms and the lack of additional algebraic structure that could lead to structural attacks, schemes based on Quasi-Cyclic Low-Density Parity-Check (QC-LDPC) codes [6] and Quasi-Cyclic Moderate-Density Parity-Check (QC-MDPC) codes [5] are among the most promising solutions in this area.

The importance of code-based cryptography has risen dramatically in modern times due to the work of Shor [7], who showed how it will be possible to effectively break cryptography based on “classical” number theory problems by introducing polynomial-time algorithms for factoring large integers and computing discrete logarithms on a quantum computer. This calls for cryptographic primitives that rely on different hard problems, which will not be affected once quantum computers of an appropriate size become available. Code-based cryptography is one of the most important areas in this scenario, and ever since McEliece’s seminal work in 1978 [2], it has shown no vulnerabilities against quantum attackers. Moreover, generic decoding attacks, which have exponential complexity, have improved only marginally over nearly 40 years of cryptanalysis. Together with lattice-based schemes, code-based cryptography is at the basis of many candidates for the post-quantum standardization call recently launched by NIST [8].

In this article, we provide a general framework for the definition of structured codes, which are of increasing interest in several McEliece and Niederreiter cryptosystem variants. First, we introduce the notion of ℱ-reproducible codes as a general framework for describing both structured and unstructured codes. Then, we introduce some special families of ℱ-reproducible codes, which we denote as compactly reproducible (CR) codes, which require a smaller-than-maximum number of degrees of freedom for the representation of each code belonging to the same family. This generalizes existing families of structured codes used in code-based cryptosystems. We also propose a framework for constructing ℱ-reproducible codes of any kind and present concrete families of non-trivial CR codes which have not appeared in the literature before. Our goal is to provide a generic framework to serve as a basis for future constructions, as indeed was the case in ref. [9], which references a preprint version of this work.

To highlight the importance of these codes in cryptography, we mention that among the 26 candidates that were admitted to the second round of NIST’s standardization effort [10], 5 are based on structured random and pseudo-random codes, which are the focus of this article. In particular, BIKE and LEDAcrypt are two public-key encryption schemes based on, respectively, QC-MDPC and QC-LDPC codes, which naturally fit into the general framework we describe in this article. The same occurs for the system named HQC, in which part of the public key consists of a random QC code. Although we focus on the Hamming metric case, the framework we describe could also be applied to the generation of structured codes in the rank metric (with the proper modifications). ROLLO and RQC are two other candidates that could be encompassed by such a framework in the rank metric domain.

The article is organized as follows. In Section 2, we recall some basic concepts and introduce the notation we use throughout the article. In Section 3, we introduce ℱ-reproducible matrices, and we use them to define the new class of codes in Section 4. Section 5 is devoted to the study of their possible use in code-based cryptosystems and provides some practical constructions for this purpose. In Section 6, we draw some conclusions.

2 Preliminaries and notation

We denote with $\mathbb{F}_q$ the finite field with $q$ elements, where $q$ is a prime power. For two sets $X$ and $Y$, $X^Y$ denotes the set of all maps from $Y$ to $X$. For a set $S$ we then denote by $2^S$ its power set, i.e., the set containing all possible subsets of $S$, exploiting the well-known bijection with the set of functions from $S$ to $\{0, 1\}$. We use bold letters to denote vectors and matrices. Given a vector $\mathbf{a}$, we refer to its element in position $i$ as $a_i$. The size-$k$ identity matrix is denoted as $\mathbf{I}_k$, while the $k \times n$ null matrix is denoted as $\mathbf{0}_{k \times n}$. Finally, we use the term pseudo-ring to denote a structure that satisfies all the ring axioms, apart from the existence of the multiplicative identity. Such a structure is also typically known as rng.

2.1 Coding theory background

A linear code $\mathcal{C}$ is a $k$-dimensional subspace of the $n$-dimensional vector space over the finite field $\mathbb{F}_q$. The parameters $n$ (length) and $k$ (dimension) are positive integers with $k \le n$. The value $r = n - k$ is known as the codimension of the code.

Definition 2.1

(Hamming metric) The Hamming weight $\mathrm{wt}(\mathbf{x})$ of a vector $\mathbf{x} \in \mathbb{F}_q^n$ is the number of its non-zero entries. The Hamming distance $d(\mathbf{x}, \mathbf{y})$ between two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{F}_q^n$ is defined as the weight of their difference, i.e., $d(\mathbf{x}, \mathbf{y}) = \mathrm{wt}(\mathbf{x} - \mathbf{y})$. The minimum distance $d$ of a code $\mathcal{C}$ is defined as the minimum distance between any two different codewords of $\mathcal{C}$, or equivalently as the minimum weight over all non-zero codewords.

A linear code of length $n$, dimension $k$, and minimum distance $d$ is called an $[n, k, d]$-code.

The error-correcting capability of a linear code is connected to its minimum distance, and in particular it corresponds to $\lfloor (d - 1)/2 \rfloor$ under bounded distance decoding. When soft-decision decoding is used, a linear block code with distance $d$ may correct up to $d - 1$ symbol errors.
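As an illustration of Definition 2.1 and of the bounded-distance radius, the following Python sketch computes weights, distances and a brute-force minimum distance. It assumes that $q$ is prime, so that integer arithmetic modulo $q$ models $\mathbb{F}_q$; the function names and the example matrix `G_HAMMING` are illustrative choices and do not come from the article.

```python
from itertools import product

def hamming_weight(x):
    """Number of non-zero entries of x."""
    return sum(1 for xi in x if xi != 0)

def hamming_distance(x, y, q=2):
    """Weight of the difference x - y over F_q (q assumed prime)."""
    return hamming_weight([(xi - yi) % q for xi, yi in zip(x, y)])

def codewords(G, q=2):
    """Enumerate all codewords xG for a k x n generator matrix G over F_q."""
    k, n = len(G), len(G[0])
    for x in product(range(q), repeat=k):
        yield [sum(x[i] * G[i][j] for i in range(k)) % q for j in range(n)]

def minimum_distance(G, q=2):
    """Minimum weight over all non-zero codewords (brute force, small codes only)."""
    return min(hamming_weight(c) for c in codewords(G, q) if any(c))

# Illustrative example: a systematic generator matrix of the binary [7,4,3] Hamming code.
G_HAMMING = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

d = minimum_distance(G_HAMMING)
t = (d - 1) // 2  # bounded-distance decoding radius
```

The exhaustive enumeration visits $q^k$ codewords, so it is only meant for toy parameters.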

Definition 2.2

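(Generator and parity-check matrices) Let $\mathcal{C}$ be a linear code over $\mathbb{F}_q$. We call generator matrix of $\mathcal{C}$ a $k \times n$ matrix $\mathbf{G}$ whose rows form a basis for the vector space defined by $\mathcal{C}$, i.e.:

$$\mathcal{C} = \{\mathbf{x}\mathbf{G} : \mathbf{x} \in \mathbb{F}_q^k\}.$$

For any matrix $\mathbf{H}$ and any vector $\mathbf{x}$, the vector $\mathbf{H}\mathbf{x}^T$ is called syndrome of $\mathbf{x}$. We then call parity-check matrix of $\mathcal{C}$ a full rank $r \times n$ matrix $\mathbf{H}$ such that every codeword belonging to $\mathcal{C}$ has syndrome $\mathbf{0}$ with respect to $\mathbf{H}$, i.e.,

$$\mathcal{C} = \{\mathbf{x} \in \mathbb{F}_q^n : \mathbf{H}\mathbf{x}^T = \mathbf{0}\}.$$

Note that the parity-check matrix of a code $\mathcal{C}$ is also a generator matrix of the dual code $\mathcal{C}^\perp$, i.e., the linear code formed by all the words of $\mathbb{F}_q^n$ that are orthogonal to $\mathcal{C}$. It follows that for any generator matrix $\mathbf{G}$ and parity-check matrix $\mathbf{H}$ of a code, we have $\mathbf{H}\mathbf{G}^T = \mathbf{0}_{r \times k}$.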

Both matrices are required to have full rank. Moreover, neither matrix is unique: for instance, given a generator matrix $\mathbf{G}$ it is always possible to obtain another generator matrix for the same code by a linear transformation, that is, the left multiplication by an invertible $k \times k$ matrix $\mathbf{S}$, so that $\mathbf{G}' = \mathbf{S}\mathbf{G}$. This corresponds to a change of basis for the vector space. A similar property holds for the parity-check matrix. Finally, two generator matrices generate equivalent codes if one is obtained from the other by a permutation of columns. These two facts are at the basis of the McEliece cryptosystem.

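Joining the two properties above, we can write any generator matrix $\mathbf{G}$ in systematic form as $\mathbf{G} = [\mathbf{I}_k \mid \mathbf{A}]$, where $\mid$ denotes concatenation. If $\mathcal{C}$ is generated by $\mathbf{G} = [\mathbf{I}_k \mid \mathbf{A}]$, then a (systematic) parity-check matrix for $\mathcal{C}$ is $\mathbf{H} = [-\mathbf{A}^T \mid \mathbf{I}_r]$.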
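The relation between the two systematic forms can be checked numerically. The sketch below builds $[\mathbf{I}_k \mid \mathbf{A}]$ and $[-\mathbf{A}^T \mid \mathbf{I}_r]$ from a block $\mathbf{A}$ and verifies that the parity-check matrix annihilates the generator matrix. It assumes a prime $q$ (so that arithmetic modulo $q$ models $\mathbb{F}_q$); the helper names and the small matrix `A` are hypothetical.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def hconcat(A, B):
    """Concatenate two matrices with the same number of rows: [A | B]."""
    return [ra + rb for ra, rb in zip(A, B)]

def matmul(A, B, q):
    """Matrix product over F_q (q assumed prime)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) % q for col in cols] for row in A]

def systematic_pair(A, q):
    """Given the k x r block A, return G = [I_k | A] and H = [-A^T | I_r]."""
    k, r = len(A), len(A[0])
    G = hconcat(identity(k), A)
    minus_At = [[(-a) % q for a in row] for row in transpose(A)]
    H = hconcat(minus_At, identity(r))
    return G, H

# Illustrative example over F_3 with k = 2 and r = 2.
A = [[1, 2],
     [0, 1]]
G, H = systematic_pair(A, q=3)
check = matmul(H, transpose(G), q=3)  # expected to be the r x k zero matrix
```

Over $\mathbb{F}_2$ the minus sign has no effect, which is why binary constructions often write the parity-check matrix simply as $[\mathbf{A}^T \mid \mathbf{I}_r]$.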

2.2 The McEliece cryptosystem

The McEliece public-key encryption scheme [2] was introduced by R.J. McEliece in 1978. The original scheme uses binary Goppa codes, with which it remains unbroken (with a proper choice of parameters), but the scheme can be used with any class of codes for which an efficient decoding algorithm is known.

2.2.1 Key generation

Let $\mathbf{G}$ be a generator matrix of a linear $[n, k, d]$-code over $\mathbb{F}_q$ with an efficient decoding algorithm $\mathcal{D}$ which can correct up to $t = \lfloor (d - 1)/2 \rfloor$ errors under bounded-distance decoding. Let $\mathbf{S}$ be an invertible $k \times k$ matrix and $\mathbf{P}$ be a random $n \times n$ permutation matrix over $\mathbb{F}_q$. The private key is $(\mathbf{S}, \mathbf{G}, \mathbf{P})$ and the public key is $\mathbf{G}' = \mathbf{S}\mathbf{G}\mathbf{P}$.

2.2.2 Encryption

To be able to encrypt a plaintext, it has to be represented as a vector $\mathbf{m}$ of length $k$ over $\mathbb{F}_q$. The encryption algorithm chooses a random error vector $\mathbf{e}$ of weight $t$ in $\mathbb{F}_q^n$ and computes the ciphertext $\mathbf{c} = \mathbf{m}\mathbf{G}' + \mathbf{e}$.

2.2.3 Decryption

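The decryption algorithm first computes $\hat{\mathbf{c}} = \mathbf{c}\mathbf{P}^{-1} = \mathbf{m}\mathbf{S}\mathbf{G} + \mathbf{e}\mathbf{P}^{-1}$. As $\mathbf{P}$ is a permutation matrix, $\mathbf{e}\mathbf{P}^{-1}$ has the same weight as $\mathbf{e}$. Therefore, $\mathcal{D}$ can be used to decode the errors and obtain $\hat{\mathbf{m}} = \mathbf{m}\mathbf{S} = \mathcal{D}(\hat{\mathbf{c}})$. Finally, the plaintext is retrieved as $\mathbf{m} = \hat{\mathbf{m}}\mathbf{S}^{-1}$.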
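The three steps above can be sketched on a toy instance. The Python code below is a minimal illustration and not a secure implementation: it assumes the binary $[7,4,3]$ Hamming code as the private code (so $t = 1$), a syndrome decoder in the role of $\mathcal{D}$, and a permutation stored as an index list rather than as a matrix. All function names and the seed are hypothetical.

```python
import random

def vecmat(v, M):
    """Row vector times matrix over F_2."""
    return [sum(vi * mi for vi, mi in zip(v, col)) % 2 for col in zip(*M)]

def gf2_inverse(M):
    """Invert a square binary matrix by Gauss-Jordan elimination."""
    n = len(M)
    aug = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular over F_2")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col]:
                aug[r] = [x ^ y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Private code: systematic [7,4,3] Hamming code, G = [I_4 | A], H = [A^T | I_3].
A = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
G = [[int(i == j) for j in range(4)] + A[i] for i in range(4)]
H = [[A[i][j] for i in range(4)] + [int(j == l) for l in range(3)] for j in range(3)]

def decode(word):
    """Syndrome decoder playing the role of D: corrects up to t = 1 error."""
    s = [sum(h * w for h, w in zip(row, word)) % 2 for row in H]
    fixed = word[:]
    if any(s):
        pos = [list(col) for col in zip(*H)].index(s)  # column of H matching the syndrome
        fixed[pos] ^= 1
    return fixed

def keygen(rng):
    while True:  # sample S until it is invertible
        S = [[rng.randrange(2) for _ in range(4)] for _ in range(4)]
        try:
            S_inv = gf2_inverse(S)
            break
        except ValueError:
            pass  # singular draw, try again
    perm = list(range(7))
    rng.shuffle(perm)
    SG = [vecmat(row, G) for row in S]
    G_pub = [[row[perm[j]] for j in range(7)] for row in SG]  # columns permuted: G' = S G P
    return (S_inv, perm), G_pub

def encrypt(m, G_pub, rng):
    e = [0] * 7
    e[rng.randrange(7)] = 1  # random error vector of weight t = 1
    return [(ci + ei) % 2 for ci, ei in zip(vecmat(m, G_pub), e)]

def decrypt(c, private_key):
    S_inv, perm = private_key
    c_hat = [0] * 7
    for j, pj in enumerate(perm):
        c_hat[pj] = c[j]  # undo the column permutation: c_hat = c P^{-1}
    mS = decode(c_hat)[:4]  # G is systematic, so the first k symbols equal m S
    return vecmat(mS, S_inv)

rng = random.Random(2021)  # fixed seed for repeatability only; not a secure RNG
private_key, G_pub = keygen(rng)
message = [1, 0, 1, 1]
recovered = decrypt(encrypt(message, G_pub, rng), private_key)
```

Reading the message off the first $k$ positions of the decoded word works here only because the private $\mathbf{G}$ is in systematic form; a general $\mathbf{G}$ would require solving a linear system instead.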

In successive papers, the original McEliece cryptosystem was refined and tweaked many times; for example, it is now common practice to replace the scrambling method given by $\mathbf{S}$ and $\mathbf{P}$ with the computation of the systematic form, i.e., $\mathbf{G}'$ is the systematic form of $\mathbf{G}$. This is possible when the McEliece cryptosystem is embedded into a larger framework to convert it into an IND-CCA2[1] secure Public Key Encryption (PKE) scheme or Key Encapsulation Mechanism (KEM), and has the additional advantage (beyond the obvious simpler formulation) of a smaller public key (since only the non-identity submatrix needs to be stored).

The (one-way) security of McEliece is based on the following hard problem.

Problem 2.3

(Syndrome decoding problem) Given an $r \times n$ full-rank matrix $\mathbf{H}$ and a vector $\mathbf{s}$, both with entries in $\mathbb{F}_q$, and a non-negative integer $t$, find a vector $\mathbf{e} \in \mathbb{F}_q^n$ of weight $t$ such that $\mathbf{H}\mathbf{e}^T = \mathbf{s}^T$.

The Syndrome Decoding Problem (SDP) is a well-known problem in complexity theory, and it has been shown to be NP-complete [11]. Note that, since the McEliece cryptosystem uses an $[n, k, d]$ code, the number of error vectors of weight $t$ is $\binom{n}{t}(q - 1)^t$, while the number of possible syndromes is $q^r$. Therefore,

$$\binom{n}{t}(q - 1)^t < q^r$$

is a necessary condition for the existence of at most one solution to the problem, i.e., for the decoding process to have a unique solution.
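The counting condition above is easy to evaluate for concrete parameters. The following Python sketch compares the number of weight-$t$ error vectors with the number of syndromes; the function names are illustrative, and the condition is only necessary, not sufficient, for unique decoding.

```python
from math import comb

def num_error_vectors(n, t, q=2):
    """Number of vectors of Hamming weight exactly t in F_q^n."""
    return comb(n, t) * (q - 1) ** t

def num_syndromes(n, k, q=2):
    """Number of possible syndromes, q^r with r = n - k."""
    return q ** (n - k)

def counting_condition_holds(n, k, t, q=2):
    """True when C(n, t) (q - 1)^t < q^r."""
    return num_error_vectors(n, t, q) < num_syndromes(n, k, q)

# Toy check with the binary [7,4,3] Hamming code: 7 error patterns vs 8 syndromes.
ok_t1 = counting_condition_holds(7, 4, 1)
# With t = 2 there are 21 error patterns for only 8 syndromes, so uniqueness is impossible.
ok_t2 = counting_condition_holds(7, 4, 2)
```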

2.3 Sparse-matrix codes

One of the most delicate points about the McEliece cryptosystem is that, in order for the security to reduce to the SDP, it is assumed that the matrix used as the public key is indistinguishable from a uniformly random matrix of the same size. This is a plausible assumption, which however has been shown to be false in several cases. For many variants of McEliece (e.g., ref. [12]), in fact, this opened up avenues of attack which simply ruled out the variant altogether. Even the long-standing binary Goppa codes have been shown to be distinguishable from random codes [13] when the code rate is chosen carelessly (too high). This is arguably one of the main reasons that pushed researchers away from algebraic codes and toward codes of a different nature.

LDPC codes are defined by parity-check matrices whose main requirement is to be sparse, with a very low row and column weight. These codes are easy to generate and moreover admit a variety of choices for the decoding algorithm $\mathcal{D}$, inspired by the Bit Flipping (BF) decoder of Gallager [14], which is very efficient in practice. For these reasons, this class of codes is a natural candidate for the McEliece cryptosystem. A first instantiation was studied in ref. [4], where a private LDPC matrix was considered, along with a linearly transformed version of the same matrix used as the public key. As highlighted in ref. [4], security of the private LDPC code is not preserved unless the public matrix is dense. Thus, in such a framework, the private LDPC code $\mathcal{C}$ is represented through its sparse parity-check matrix $\mathbf{H}$, while the public key corresponds to a dense generator matrix $\mathbf{G}$ for $\mathcal{C}$. It is important to note that, from the knowledge of $\mathbf{G}$, the opponent can compute several parity-check matrices $\mathbf{H}'$ for $\mathcal{C}$, but they will not lead to an efficient decoding, unless they are sparse. As explained in Section 2.2, typically having $\mathbf{G}$ in systematic form is enough to guarantee such a property. Indeed, we can always write $\mathbf{H} = [\mathbf{H}_0 \mid \mathbf{H}_1]$, where $\mathbf{H}_0$ and $\mathbf{H}_1$ have size $r \times k$ and $r \times r$, respectively, and $\mathbf{H}_1$ is full rank. Then, the corresponding generator matrix in systematic form is obtained as $\mathbf{G} = [\mathbf{I}_k \mid -\mathbf{H}_0^T \mathbf{H}_1^{-T}]$ (the minus sign is immaterial in the binary case). Typically (except for particular choices of $\mathbf{H}$), the inverse of a sparse matrix is dense, and so $\mathbf{H}_1^{-T}$ is dense: in such a case, the multiplication of $\mathbf{H}_0^T$ by $\mathbf{H}_1^{-T}$ is enough to hide the structure of $\mathbf{H}$ within that of $\mathbf{G}$.
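The passage from a sparse $\mathbf{H} = [\mathbf{H}_0 \mid \mathbf{H}_1]$ to a systematic generator matrix can be illustrated on a tiny binary example. The sketch below is an assumption-laden toy: the two $7 \times 7$ blocks are built by stacking cyclic shifts of a weight-3 row purely for convenience, and the helper names are hypothetical. It checks that $\mathbf{H}\mathbf{G}^T = \mathbf{0}$ and compares row weights before and after inversion.

```python
def transpose(M):
    return [list(c) for c in zip(*M)]

def matmul(A, B):
    """Product of two binary matrices over F_2."""
    cols = list(zip(*B))
    return [[sum(a & b for a, b in zip(row, col)) % 2 for col in cols] for row in A]

def gf2_inverse(M):
    """Invert a square binary matrix by Gauss-Jordan elimination."""
    n = len(M)
    aug = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular over F_2")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col]:
                aug[r] = [x ^ y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def cyclic_rows(first_row):
    """Hypothetical helper: stack all cyclic right-shifts of first_row."""
    p = len(first_row)
    return [first_row[-i:] + first_row[:-i] for i in range(p)]

H0 = cyclic_rows([1, 1, 0, 1, 0, 0, 0])  # sparse r x k block (here r = k = 7)
H1 = cyclic_rows([1, 1, 1, 0, 0, 0, 0])  # sparse, invertible r x r block
H = [a + b for a, b in zip(H0, H1)]      # H = [H0 | H1]

H1_inv_T = transpose(gf2_inverse(H1))
A = matmul(transpose(H0), H1_inv_T)      # A = H0^T H1^{-T} (signs vanish over F_2)
G = [[int(i == j) for j in range(7)] + A[i] for i in range(7)]  # G = [I_k | A]

check = matmul(H, transpose(G))          # expected to be all zeros
row_weight_H1 = sum(H1[0])
row_weight_H1_inv = sum(H1_inv_T[0])
```

At this size the effect is modest, but the inverse block already has a larger row weight than the sparse block it comes from; for cryptographic sizes the density gap is what hides $\mathbf{H}$.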

It is important to note that, due to their probabilistic nature, decoding algorithms for LDPC codes are characterized by a non-trivial Decoding Failure Rate (DFR). This means that, in the case of a decoding failure, Bob must ask Alice for a retransmission of the plaintext, encrypted with a different error vector. In order to avoid frequent retransmissions, which would obviously increase the latency of the system, the DFR must be kept sufficiently low; typically, values are in the range of $10^{-6}$ to $10^{-9}$. As we will discuss later, this fact represents a crucial difference with respect to the case of algebraic codes, since it leads to a new family of attacks aimed at recovering the secret key by observing Bob’s reactions. This also has implications on the security model against a Chosen Ciphertext Attack (CCA) for these systems [15]. Therefore, finding reliable models for their DFR is necessary to ensure that its value is negligible for those instances designed to achieve indistinguishability under chosen ciphertext attack (IND-CCA) [16].

2.4 Main attacks

We briefly recall the two main types of attacks that can be mounted against the McEliece cryptosystem and its variants when using sparse-matrix codes.

2.4.1 Decoding attacks

Decoding attacks are aimed at recovering the plaintext from the ciphertext by performing decoding through the public code. In fact, being unable to retrieve the private code representation that enables efficient decoding, an attacker can still try to perform decoding through the public code, which looks like a general random code.

At the current state of the art, the best procedure for this task is the Information-Set Decoding (ISD) algorithm, which was first introduced by Prange in 1962 [17] and has received many improvements over the years [18,19,20,21]. However, ISD and all its variants are characterized by exponential complexity: the search for a weight-$w$ codeword has asymptotic complexity equal to $2^{\alpha w}$, where the value of the constant $\alpha$ depends on the code parameters and on the particular algorithm being analyzed. Even in a quantum setting, ISD algorithms are still characterized by exponential complexity: indeed, the only known application of a quantum algorithm to ISD, which consists in using Grover’s algorithm [22] to speed up the search, reduces the exponent $\alpha$ by at most one half with respect to the classical case [23].

2.4.2 Key-recovery attacks

When LDPC codes are used, key recovery attacks boil down to recovering low-weight codewords from the dual of the public code, which is again a decoding problem. Let us denote by $\mathcal{C}^\perp$ the dual code of $\mathcal{C}$, having generator matrix $\mathbf{H}$. Since the rows of $\mathbf{H}$ are sparse, and of maximum weight $w \ll n$, they are minimum-weight codewords in $\mathcal{C}^\perp$ with overwhelming probability, and so can be searched with a generic algorithm for finding low-weight words, for which ISD algorithms can be used as well.

Since the difficulty of such a task increases with the weight of the searched codewords, it makes sense to relax the notion of “low-density”: the authors in ref. [5] introduce the notion of “moderate-density” by increasing the allowed row weight in the parity-check matrix from $O(\log(n))$ to $O(\sqrt{n})$, thus defining moderate-density parity-check (MDPC) codes. It is still possible to decode MDPC codes with the previously mentioned algorithms; the error-correction capacity obviously gets worse, but the gain in security makes this tradeoff worthwhile. In the end, the adoption of LDPC and MDPC codes in modern variants of the McEliece cryptosystem does not reduce the security against key recovery attacks, since attacks deriving from the structure of the secret code can be easily avoided by fixing the minimum weight of the rows of $\mathbf{H}$.

2.5 Structured sparse-matrix codes

Using generic LDPC and MDPC codes without any structure in the McEliece cryptosystem is not a practical choice, as pointed out in ref. [4]. This is because the need to avoid sparse public matrices makes the resulting public key sizes significantly larger than the ones we can obtain with other families of codes, like Goppa codes. In fact, even if the private sparse parity-check matrix can be compactly represented through the positions of its non-null entries (and so, a row with Hamming weight equal to $w$ can be stored just with $w \log_2 n \log_2 q$ bits), applying this technique to the public key is not possible, since a sparse $\mathbf{G}$ might compromise the security of the system. One way to avoid this issue is to add some structure to the code family. This idea was first introduced by considering Quasi-Cyclic (QC) codes [3] and was then extended to LDPC codes [24] and algebraic codes [25]. In all cases, the authors propose to use QC codes to reduce the public key size. A QC code can be simply seen as a code which admits parity-check and generator matrices made of circulant blocks. A circulant matrix is a matrix in which every row is obtained as the cyclic shift of the previous one; an example of a circulant matrix is

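$$\mathbf{A} = \begin{bmatrix} a_0 & a_1 & \cdots & a_{p-1} \\ a_{p-1} & a_0 & \cdots & a_{p-2} \\ \vdots & \vdots & \ddots & \vdots \\ a_1 & a_2 & \cdots & a_0 \end{bmatrix}.$$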

Any circulant matrix is fully described by one of its rows, conventionally the first one. This means that, in the McEliece cryptosystem, we can describe the public key completely using just the first row of each one of its circulant blocks; it is clear that this results in a significant reduction in the public key size with respect to instances using non-structured public matrices. However, this additional structure presents some drawbacks, since it exposes the system to structural weaknesses. In particular, the QC structure combined with the algebraic structure of the underlying codes provides a lot of information to the attacker and opens up the possibility of structural attacks aimed at recovering the private code. The most famous structural attack of this type is known as FOPT [26] and works by solving a multivariate algebraic system with Gröbner basis techniques together with the QC property, which greatly reduces the number of unknowns of the system. As a result, it seems very hard to provide secure schemes which involve QC algebraic codes (Goppa, GRS, etc.), while still obtaining an effective key reduction: the recent NIST proposal BIG QUAKE [27] shows a reduction of about 1/4 in the key size compared to what would be obtained in a “classical” McEliece using unstructured binary Goppa codes.
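The compact description of circulant and quasi-cyclic matrices can be sketched in a few lines of Python. The code below expands first rows into circulant blocks, using the right-shift convention of the example matrix above, and counts how many symbols need to be stored. The function names and the small example are illustrative only.

```python
def circulant(first_row):
    """p x p circulant matrix: row i is first_row cyclically shifted right by i."""
    p = len(first_row)
    return [first_row[-i:] + first_row[:-i] for i in range(p)]

def quasi_cyclic(first_rows):
    """Block matrix whose (i, j) block is the circulant defined by first_rows[i][j]."""
    rows = []
    for block_row in first_rows:
        blocks = [circulant(fr) for fr in block_row]
        p = len(blocks[0])
        for i in range(p):
            rows.append([x for b in blocks for x in b[i]])
    return rows

def stored_symbols(first_rows):
    """Symbols needed for the compact description (one row per block)."""
    return sum(len(fr) for block_row in first_rows for fr in block_row)

# Illustrative 3 x 6 quasi-cyclic matrix made of two 3 x 3 circulant blocks.
first_rows = [[[1, 0, 1], [0, 1, 1]]]
M = quasi_cyclic(first_rows)
compact = stored_symbols(first_rows)        # 6 symbols
full = len(M) * len(M[0])                   # 18 symbols
```

For a block size $p$, the compact description stores $p$ symbols per block instead of $p^2$, which is the source of the key-size reduction discussed above.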

Therefore, once again, it seems safer to deploy code-based schemes using sparse-matrix codes, since in this case there is no additional algebraic structure, and the QC property alone is not enough to provide a structural attack. However, some care is still necessary when using sparse-matrix codes. In particular, two main aspects have to be considered:

  • ISD algorithms might obtain a speedup from the QC structure. This results in a complexity reduction for the relevant attacks. Such a speedup is achieved for both key recovery attacks and decoding attacks (following from the Decoding One Out of Many [DOOM] approach [28]). The attack complexity remains exponential in the key length, but the speedup must be compensated by increasing the row weight of $\mathbf{H}$ and the number of errors to be used during encryption, which in turn results in an increase in the key length.

  • It has been recently shown that the probability of a decoding failure depends on the number of overlapping ones between the error vector and rows of $\mathbf{H}$ [29]. In addition, in a circulant matrix, all the rows are characterized by the same set of cyclic distances between set symbols (given two ones at positions $i$ and $j$, the corresponding cyclic distance is computed as $\min\{\pm(i - j) \bmod p\}$, with $p$ being the circulant size). Based on these considerations, it has been shown in ref. [29] that an adversary can mount a key recovery attack by impersonating Alice, producing many ciphertexts and requesting Bob to decrypt them. The adversary can then exploit Bob’s reactions concerning decoding failures, which are public knowledge, in order to gather information about the secret key structure. The set of all distances of the rows of $\mathbf{H}$ is called distance spectrum and can be used to reconstruct $\mathbf{H}$. This problem can be related to a graph problem, in which a row of $\mathbf{H}$ corresponds to a clique with maximum size. For a sparse QC matrix, such a graph is sparse as well, which gives a small number of cliques. This means that, once the distance spectrum is known, recovering the corresponding parity-check matrix is not a hard task in most cases.
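The cyclic distance and the distance spectrum mentioned in the second item can be computed as follows. This Python sketch treats the spectrum as a plain set of distances, as in the description above; the function names and the example row are hypothetical. It also checks that every row of a circulant block yields the same spectrum.

```python
from itertools import combinations

def cyclic_distance(i, j, p):
    """min{ +(i - j) mod p, -(i - j) mod p } for positions i, j in a length-p row."""
    d = (i - j) % p
    return min(d, p - d)

def distance_spectrum(row):
    """Set of cyclic distances between all pairs of non-zero positions of row."""
    p = len(row)
    support = [i for i, x in enumerate(row) if x]
    return {cyclic_distance(i, j, p) for i, j in combinations(support, 2)}

def circulant(first_row):
    p = len(first_row)
    return [first_row[-i:] + first_row[:-i] for i in range(p)]

# Illustrative sparse row of a circulant block with p = 7 and weight 3.
row = [1, 1, 0, 1, 0, 0, 0]
spectrum = distance_spectrum(row)
same_for_all_rows = all(distance_spectrum(r) == spectrum for r in circulant(row))
```

Because cyclic shifts preserve differences between positions modulo $p$, a single row already reveals the spectrum of the whole block.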

Currently, the countermeasures that have been devised against the aforementioned reaction attacks exploit the use of ephemeral keys [30,31], of special iterative decoders that allow theoretical modeling of their failure rate [32,33], or of particular families of codes that make the reconstruction of the secret key unfeasible [34]. However, all these solutions come at a price: a new key pair must be generated for each encryption (in the first case) or the size of the public key must be increased (in the second and third cases).

As we will see in the rest of this article, the idea of using some structure to reduce the public key size can be strongly generalized. In particular, we will show that existing solutions are just very special cases of a wider framework, characterized by a large variety of options. This generalization comes with no increase in public key size, while potentially allowing us to avoid DOOM and/or reaction attacks, or at least to reduce their efficiency.

3 Reproducibility

We now introduce the main notions we use to provide a generalized approach to the design of structured codes.

Definition 3.1

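Let $n, k \in \mathbb{N}$, with $k = \ell m$ where also $\ell, m \in \mathbb{N}$. Let $\mathcal{F} = \{\sigma_0, \ldots, \sigma_{\ell-1}\}$ be a family of linear maps, with $\sigma_i : \mathbb{F}_q^n \to \mathbb{F}_q^n$ (thus, we can think of each $\sigma_i$ as a square matrix of size $n$ with entries in $\mathbb{F}_q$). We say that a $k \times n$ matrix $\mathbf{A}$ is an ℱ-reproducible matrix if there exists an $m \times n$ matrix $\mathbf{a}$ such that

$$\mathbf{A} = \begin{bmatrix} \mathbf{a}\sigma_0 \\ \mathbf{a}\sigma_1 \\ \vdots \\ \mathbf{a}\sigma_{\ell-1} \end{bmatrix}. \tag{3.1}$$

We call $m$ the reproducible order and $\mathbf{a}$ the signature set and write $\mathbf{A} = \mathcal{F}(\mathbf{a})$. We say that a code $\mathcal{C} \subseteq \mathbb{F}_q^n$ is an ℱ-reproducible code if it admits a generator matrix and/or a parity-check matrix which are ℱ-reproducible.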
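A direct transcription of Definition 3.1 into Python may help fix ideas. The sketch below stacks the products of a signature with each map of a family, as in (3.1), and recovers a circulant matrix as the special case in which the maps are powers of the cyclic shift. It assumes a prime $q$, represents each map as an $n \times n$ matrix acting on row vectors from the right, and uses hypothetical function names.

```python
def vecmat(v, M, q):
    """Row vector times matrix over F_q (q assumed prime)."""
    return [sum(vi * mi for vi, mi in zip(v, col)) % q for col in zip(*M)]

def reproduce(signature, family, q):
    """Stack a*sigma_0, a*sigma_1, ...; signature is m x n, each sigma is n x n."""
    return [vecmat(row, sigma, q) for sigma in family for row in signature]

def cyclic_shift_power(n, i):
    """n x n permutation matrix mapping a row vector to its right cyclic shift by i."""
    return [[1 if (c - r) % n == i % n else 0 for c in range(n)] for r in range(n)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

# Special case: m = 1 and the family of all shift powers gives a circulant matrix.
q, n = 3, 4
shift_family = [cyclic_shift_power(n, i) for i in range(n)]
A = reproduce([[1, 2, 0, 0]], shift_family, q)

# Trivial case: a family containing only the identity, with the whole matrix as signature.
G = [[1, 0, 2], [0, 1, 1]]
G_again = reproduce(G, [identity(3)], q)
```

The second call shows that any matrix is reproducible with respect to the one-element family containing the identity, at the cost of a signature as large as the matrix itself.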

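Let us consider an ℱ-reproducible code described by an ℱ-reproducible generator matrix $\mathbf{G} \in \mathbb{F}_q^{k \times n}$ such that, for $\mathcal{F} = \{\sigma_0, \ldots, \sigma_{\ell-1}\}$, we have

$$\mathbf{G} = \begin{bmatrix} \mathbf{g}\sigma_0 \\ \mathbf{g}\sigma_1 \\ \vdots \\ \mathbf{g}\sigma_{\ell-1} \end{bmatrix}, \tag{3.2}$$

where $\mathbf{g}$ is the $m \times n$ signature set of $\mathbf{G}$. Then, for the fixed family ℱ of linear maps, the code is completely represented through $\mathbf{g}$. The same reasoning applies to an ℱ-reproducible code described by an ℱ-reproducible parity-check matrix $\mathbf{H} \in \mathbb{F}_q^{r \times n}$ with signature set $\mathbf{h}$.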

Proposition 3.2

Any $[n, k, d]$-code over $\mathbb{F}_q$ is an ℱ-reproducible code for at least one choice of ℱ and the corresponding signature set. Such a choice corresponds to $\ell = 1$, $m = k$, $\mathbf{g} = \mathbf{G}$, and $\mathcal{F} = \{\mathbf{I}_n\}$, where $\mathbf{I}_n$ is the $n \times n$ identity matrix. Equivalently, the code can be described through the parity-check matrix $\mathbf{H}$ considering $\ell = 1$, $m = r$, $\mathbf{h} = \mathbf{H}$, and $\mathcal{F} = \{\mathbf{I}_n\}$.

Once the family ℱ is defined, an ℱ-reproducible matrix can be described just by its signature set. Consequently, when the family of maps ℱ is fixed and universally known, having an ℱ-reproducible generator matrix (or equivalently parity-check matrix) with $\ell > 1$ leads to a more compact representation of the code with respect to storing its full generator or parity-check matrix. This happens because ℱ is universally known, and it does not need to be included in the code representation, thus the signature set alone is sufficient for representing the code.

If we consider a single code, then it is always possible to find some family ℱ according to which such a code has an ℱ-reproducible generator matrix (or equivalently parity-check matrix) with $\ell > 1$. This is detailed in the following two propositions.

Proposition 3.3

Any single $[n, k, d]$-code over $\mathbb{F}_q$ admits multiple generator and parity-check matrices, thus it can be an ℱ-reproducible code for several choices of ℱ and the corresponding signature set.

Proof

The proof is straightforward and is omitted to save space. □

Proposition 3.4

For any single $[n, k, d]$-code $\mathcal{C}$ over $\mathbb{F}_q$, a family ℱ with $\ell = k$ entries can be defined according to which such a code admits an ℱ-reproducible generator matrix with reproducible order $m = 1$. Similarly, a family ℱ with $\ell = r$ entries can be defined according to which $\mathcal{C}$ admits an ℱ-reproducible parity-check matrix with reproducible order $m = 1$.

Proof

Let $\mathbf{G} \in \mathbb{F}_q^{k \times n}$ be a valid generator matrix for the code $\mathcal{C}$. Let us consider the $i$th row $\mathbf{g}_i$ of $\mathbf{G}$ and define $\sigma_i$, $i \in [1; k]$, as the $n \times n$ matrix in $\mathbb{F}_q^{n \times n}$ having its first row equal to $\mathbf{g}_i$, and all the other rows filled with arbitrary entries. Then, $\mathbf{G}$ is easily obtained as $\mathbf{G} = \mathcal{F}(\mathbf{a})$, with $\mathbf{a} = [1, 0, 0, \ldots, 0]$. The fact that $\mathcal{C}$ admits an ℱ-reproducible parity-check matrix with reproducible order $m = 1$ can be proved with a similar reasoning. □
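The construction used in this proof can be replayed on a small example. In the Python sketch below, each map has the corresponding row of the generator matrix as its first row, and the "arbitrary" remaining rows are filled with zeros (an assumption made only for definiteness). The function names and the example matrix are hypothetical, and $q$ is assumed prime.

```python
def vecmat(v, M, q):
    """Row vector times matrix over F_q (q assumed prime)."""
    return [sum(vi * mi for vi, mi in zip(v, col)) % q for col in zip(*M)]

def reproduce(signature, family, q):
    """Stack a*sigma for every sigma in the family (signature is m x n)."""
    return [vecmat(row, sigma, q) for sigma in family for row in signature]

def family_from_generator(G):
    """One n x n map per row g_i of G: first row g_i, other rows zero (arbitrary choice)."""
    n = len(G[0])
    return [[list(g)] + [[0] * n for _ in range(n - 1)] for g in G]

# Illustrative generator matrix of the binary [7,4,3] Hamming code.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
family = family_from_generator(G)      # k maps
a = [[1] + [0] * (len(G[0]) - 1)]      # signature of reproducible order m = 1
G_rebuilt = reproduce(a, family, q=2)
```

The family built this way depends on the specific code, so it only yields a compact description if it can be shared across many codes, which is the point of the discussion that follows.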

From Proposition 3.4, we know that any single code is ℱ-reproducible for some family ℱ yielding $\ell > 1$ and $m < k$ (considering the generator matrix) or $m < r$ (considering the parity-check matrix). However, if instead of a single code we consider a group of codes and aim at representing all of them as ℱ-reproducible codes for the same, universally known family of maps ℱ, then it is not always possible to find a solution with $\ell > 1$ and $m < k$ (considering the generator matrix) or $m < r$ (considering the parity-check matrix). The only trivial solutions that always exist are those of the type considered in Proposition 3.2, yielding $\ell = 1$ and $m = k$ (considering the generator matrix) or $m = r$ (considering the parity-check matrix), and thus not enabling more compact code representations than those corresponding to storing the full generator or parity-check matrix. We are instead interested in groups of codes that, besides these trivial solutions, also admit ℱ-reproducible generator and parity-check matrices for a fixed ℱ with $\ell > 1$ and $m < k$ or $m < r$, as detailed in the next definition.

Definition 3.5

We say that a group of $[n, k, d]$-codes over $\mathbb{F}_q$ are Compactly Reproducible (CR) codes if, for a fixed ℱ with $\ell > 1$, each of them admits at least one ℱ-reproducible generator matrix with $m < k$, or at least one ℱ-reproducible parity-check matrix with $m < r$, thus enabling a more compact code representation with respect to storing the full generator or parity-check matrix.

The condition for a code to be CR can be generalized, in order to take into account other structures that enable a compact representation.

Definition 3.6

Let $\mathbf{A}_{i,j} \in \mathbb{F}_q^{k_{i,j} \times n_{i,j}}$ be ℱ-reproducible matrices, each with its own dimensions, signature set $\mathbf{a}_{i,j} \in \mathbb{F}_q^{m_{i,j} \times n_{i,j}}$, and family of linear functions $\mathcal{F}_{i,j}$. Let $\mathbf{A}$ be a matrix obtained using as building blocks the matrices $\mathbf{A}_{i,j}$; then, we say that $\mathbf{A}$ is ℱ-quasi-reproducible.

Definition 3.7

Let us consider a group of linear codes over $\mathbb{F}_q$. If, for a fixed ℱ with $\ell > 1$, any code $\mathcal{C}$ in such a group can be described by an ℱ-quasi-reproducible generator matrix $\mathbf{G} \in \mathbb{F}_q^{k \times n}$ such that $m < k$, and/or an ℱ-quasi-reproducible parity-check matrix $\mathbf{H} \in \mathbb{F}_q^{r \times n}$ such that $m < r$, then we say that $\mathcal{C}$ is a quasi-compactly reproducible (QCR) code.

It is clear that, in order to describe an ℱ-quasi-reproducible matrix, we just need the ensemble of the signature sets of its building blocks, together with the corresponding families of linear functions. Quasi-reproducibility generalizes the concept of reproducibility, since each reproducible code can be seen as a particular quasi-reproducible code, with a generator matrix described just by one signature set. A particular type of quasi-reproducible code is the one in which the blocks $\mathbf{A}_{i,j}$ are square matrices, defined by the same family ℱ.

We are now ready to introduce a very important notion regarding the set of -reproducible matrices obtained via a given family of transformations. Specifically, consider a family of linear functions = σ 0 , σ 1 , , σ p m 1 , where each σ i is a p × p matrix over F q . We denote by q , m the set of all -reproducible matrices over F q obtained via signatures of size m × p and , equipped with the usual operations of matrix sum and multiplication. Then the following results[2] hold.

Theorem 3.8

The set $\mathcal{M}_q^{\mathcal{F},m}$ is an abelian group with respect to the sum.

Proof

Showing that $\mathcal{M}_q^{\mathcal{F},m}$ is an additive abelian group is quite straightforward. In fact, the signature of the sum of two matrices corresponds to the sum of the original signatures. Commutativity and associativity follow from the element-wise sum between two matrices. The identity is given by the null signature (i.e., the signature made of all zeros), while the inverse of a matrix with signature $a$ is the matrix with signature $-a$.□

On the other hand, it is possible to show that the set, with respect to the multiplication, is a semigroup; in this case, the only requirements are closure and associativity. While associativity easily follows from the properties of the multiplication between two matrices, in order to guarantee closure, we must make an additional assumption.

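Theorem 3.9

$\mathcal{M}_q^{\mathcal{F},m}$ is a semigroup with respect to the multiplication if and only if, for every matrix $M \in \mathcal{M}_q^{\mathcal{F},m}$, we have

$$\sigma_i M = M \sigma_i, \quad \forall i \in \mathbb{N},\ 0 \le i \le \frac{p}{m} - 1.$$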

Proof

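We first show that commutativity is sufficient. By the discussion above, we only need to prove closure. Let $A$ and $B$ be two matrices of $\mathcal{M}_q^{\mathcal{F},m}$, with respective signatures $a_0$ and $b_0$, that is,

$$A = \begin{bmatrix} a_0 \\ a_0\sigma_1 \\ \vdots \\ a_0\sigma_{\frac{p}{m}-1} \end{bmatrix} = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{\frac{p}{m}-1} \end{bmatrix}, \quad B = \begin{bmatrix} b_0 \\ b_0\sigma_1 \\ \vdots \\ b_0\sigma_{\frac{p}{m}-1} \end{bmatrix} = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{\frac{p}{m}-1} \end{bmatrix}.$$

Multiplying these two matrices we get

$$C = AB = \begin{bmatrix} a_0 B \\ a_1 B \\ \vdots \\ a_{\frac{p}{m}-1} B \end{bmatrix} = \begin{bmatrix} a_0 B \\ a_0\sigma_1 B \\ \vdots \\ a_0\sigma_{\frac{p}{m}-1} B \end{bmatrix} = \begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_{\frac{p}{m}-1} \end{bmatrix}. \tag{3.3}$$

Now, by hypothesis,

$$c_i = a_0\sigma_i B = a_0 B\sigma_i = c_0\sigma_i, \tag{3.4}$$

for all $i \le \frac{p}{m} - 1$. It follows that $C$ is $\mathcal{F}$-reproducible, with signature $c_0$ and rows defined by $\mathcal{F}$.

Conversely, suppose $\mathcal{M}_q^{\mathcal{F},m}$ is a semigroup, and in particular that it is closed with respect to multiplication. Consider again two matrices $A$ and $B$ and their product, defined as in equation (3.3). Since by hypothesis $C \in \mathcal{M}_q^{\mathcal{F},m}$, and therefore is $\mathcal{F}$-reproducible, we have that $c_i = c_0\sigma_i$ for all $i \le \frac{p}{m} - 1$. It follows that

$$a_0\sigma_i B = c_i = c_0\sigma_i = a_0 B\sigma_i. \tag{3.5}$$

Now, since equation (3.5) holds in general for every signature $a_0$, it must be that $\sigma_i B = B\sigma_i$, which concludes the proof.□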
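The closure argument of Theorem 3.9 can be checked numerically on a small case. The following is only a sketch under stated assumptions: reproducibility order $m = 1$, the cyclic-shift family $\sigma_i = \pi^i$ over $\mathbb{F}_3$ with $p = 5$, and illustrative signatures; the helper names (`vec_mat`, `mat_mul`, `shift_family`, `reproducible`) are hypothetical and not taken from the article.

```python
def vec_mat(v, M, q):
    """Row vector times matrix over F_q (q prime)."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) % q
            for j in range(len(M[0]))]

def mat_mul(A, B, q):
    """Matrix product over F_q."""
    return [vec_mat(row, B, q) for row in A]

def shift_family(p):
    """sigma_i = pi^i, where pi has a 1 in position (l, l + 1 mod p)."""
    return [[[1 if (l + i) % p == j else 0 for j in range(p)]
             for l in range(p)] for i in range(p)]

def reproducible(signature, family, q):
    """Stack the rows signature * sigma_i (reproducibility order m = 1)."""
    return [vec_mat(signature, sigma, q) for sigma in family]

q, p = 3, 5                      # illustrative parameters
family = shift_family(p)
A = reproducible([1, 2, 0, 0, 1], family, q)
B = reproducible([0, 1, 1, 2, 0], family, q)
C = mat_mul(A, B, q)

# Hypothesis of Theorem 3.9: every sigma_i commutes with B.
commutes = all(mat_mul(s, B, q) == mat_mul(B, s, q) for s in family)
# Conclusion: C is rebuilt from its first row c_0 and the same family.
closed = C == reproducible(C[0], family, q)
print(commutes, closed)
```

Both flags are expected to be true for this family, in agreement with equations (3.3) and (3.4).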

Finally, note that multiplication distributes over addition, as usual. This means that, if the condition of Theorem 3.9 holds, $\mathcal{M}_q^{\mathcal{F},m}$ satisfies all the requirements of a pseudo-ring, i.e., a ring without multiplicative identity, as defined in Section 2. We call this the $\mathcal{F}$-reproducible pseudo-ring induced by $\mathcal{F}$ over $\mathbb{F}_q$.

3.1 Pseudo-rings induced by families of permutations

In the particular case of signatures made of just one row (i.e., reproducible order $m = 1$) and the functions $\sigma_i$ being permutations, we have a further result, which is described in Theorem 3.10. We point out that all the results we present in this section can be generalized to the case $m > 1$, but we will not go into further details here. Since a $p \times p$ permutation corresponds to a matrix in which every row and column has weight equal to 1, it can equivalently be described as a bijection over $[0, p-1] \subset \mathbb{N}$. Given a permutation matrix $\sigma_i$, we denote the corresponding bijection as $f_{\sigma_i}$: if the element of $\sigma_i$ in position $(v, z)$ is equal to 1, then $f_{\sigma_i}(v) = z$. The inverse of $f_{\sigma_i}$ is denoted as $f_{\sigma_i}^{-1}$, which is the bijection associated with the permutation matrix $\sigma_i^{-1} = \sigma_i^T$; if $f_{\sigma_i}(v) = j$, then $f_{\sigma_i}^{-1}(j) = v$. Let $a$ and $a'$ be two row vectors with entries $\{a_0, a_1, a_2, \ldots\}$ and $\{a'_0, a'_1, a'_2, \ldots\}$, respectively, such that $a' = a\sigma_i$. Then, $a'_j = a_{f_{\sigma_i}^{-1}(j)}$. If instead $a'^T = \sigma_i a^T$, then $a'_j = a_{f_{\sigma_i}(j)}$. We use $f_{\sigma_i} \circ f_{\sigma_j}$ to denote the bijection defined by the application of $f_{\sigma_i}$ after $f_{\sigma_j}$, that is, $f_{\sigma_i} \circ f_{\sigma_j}(v) = f_{\sigma_i}(f_{\sigma_j}(v))$. With the convention above, this bijection corresponds to the permutation matrix $\sigma_j\sigma_i$, which coincides with $\sigma_i\sigma_j$ whenever the two permutations commute. The identity $I_p$ can be seen as the particular permutation that does not change the order of the elements; the corresponding bijection, which will be denoted as $f_{I_p}$, is such that each element is mapped into itself (in other words, $f_{I_p}(v) = v$).
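The correspondence between permutation matrices and bijections is easy to get wrong by a transpose, so a small sanity check may help. The sketch below assumes only the convention stated above (a 1 in position $(v, z)$ means $f(v) = z$); the bijection `f`, the vector `a`, and all helper names are illustrative choices, not objects from the article.

```python
def perm_matrix(f):
    """Permutation matrix with a 1 in position (v, f[v])."""
    p = len(f)
    return [[1 if f[v] == z else 0 for z in range(p)] for v in range(p)]

def inverse(f):
    """Inverse bijection: if f[v] == j then inverse(f)[j] == v."""
    g = [0] * len(f)
    for v, j in enumerate(f):
        g[j] = v
    return g

def transpose(M):
    return [list(col) for col in zip(*M)]

def row_times_matrix(a, M):
    """Row vector a times matrix M."""
    return [sum(a[v] * M[v][j] for v in range(len(a)))
            for j in range(len(M[0]))]

def matrix_times_column(M, a):
    """Matrix M times the column vector a^T, returned as a flat list."""
    return [sum(M[j][z] * a[z] for z in range(len(a)))
            for j in range(len(M))]

f = [2, 0, 3, 1]          # an arbitrary bijection over [0, 3]
sigma = perm_matrix(f)
f_inv = inverse(f)
a = [10, 11, 12, 13]      # distinct entries make the permutation visible

right = row_times_matrix(a, sigma)     # a' = a * sigma
left = matrix_times_column(sigma, a)   # a'^T = sigma * a^T
print(right, left)
```

Under these assumptions, `right[j]` equals `a[f_inv[j]]` and `left[j]` equals `a[f[j]]`, and the matrix of the inverse bijection is the transpose of `sigma`.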

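Theorem 3.10

Let $\mathcal{F} = \{\sigma_0 = I_p, \sigma_1, \ldots, \sigma_{p-1}\}$ be a family of linear transformations, with each $\sigma_i$ being a permutation, and suppose that $\mathcal{F}$ induces the $\mathcal{F}$-reproducible pseudo-ring $\mathcal{M}_q^{\mathcal{F},1}$ over $\mathbb{F}_q$. Then, the following relation must be satisfied:

$$\sigma_j\sigma_i = \sigma_{f_{\sigma_i}(j)}, \quad \forall i, j \in \mathbb{N},\ 0 \le i \le p-1,\ 0 \le j \le p-1.$$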

Proof

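Since $\mathcal{M}_q^{\mathcal{F},1}$ is a pseudo-ring, we know from Theorem 3.9 that, for every matrix $B \in \mathcal{M}_q^{\mathcal{F},1}$ and every function $\sigma_i$, it must be $\sigma_i B = B\sigma_i$. In particular, left multiplication of $B$ by $\sigma_i$ corresponds to a row permutation, such that

$$\sigma_i B = \begin{bmatrix} b_{f_{\sigma_i}(0)} \\ b_{f_{\sigma_i}(1)} \\ \vdots \\ b_{f_{\sigma_i}(p-1)} \end{bmatrix} = \begin{bmatrix} b_0\sigma_{f_{\sigma_i}(0)} \\ b_0\sigma_{f_{\sigma_i}(1)} \\ \vdots \\ b_0\sigma_{f_{\sigma_i}(p-1)} \end{bmatrix}, \tag{3.6}$$

where $b_i$ denotes the $i$th row of $B$. The product $B\sigma_i$ instead defines a column permutation of $B$, and can be expressed as

$$B\sigma_i = \begin{bmatrix} b_0\sigma_0 \\ b_0\sigma_1 \\ \vdots \\ b_0\sigma_{p-1} \end{bmatrix}\sigma_i = \begin{bmatrix} b_0\sigma_0\sigma_i \\ b_0\sigma_1\sigma_i \\ \vdots \\ b_0\sigma_{p-1}\sigma_i \end{bmatrix}. \tag{3.7}$$

Putting together equations (3.6) and (3.7), and comparing the $j$th rows for an arbitrary signature $b_0$, we obtain

$$\sigma_j\sigma_i = \sigma_{f_{\sigma_i}(j)}, \tag{3.8}$$

which must be satisfied for every pair of indexes $(i, j)$.□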
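As a quick illustration of the relation in Theorem 3.10, the sketch below tests it on three families of $8 \times 8$ permutations: cyclic shifts, dyadic permutations, and the same cyclic shifts listed in a different order. The third family is a hypothetical counterexample chosen here for illustration; each family is stored as a list whose $i$th entry is the bijection $f_{\sigma_i}$.

```python
def perm_matrix(f):
    """Permutation matrix with a 1 in position (v, f[v])."""
    p = len(f)
    return [[1 if f[v] == z else 0 for z in range(p)] for v in range(p)]

def mat_mul(A, B):
    """Integer matrix product (enough for permutation matrices)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)]
            for row in A]

def satisfies_theorem_3_10(bijections):
    """Check sigma_j * sigma_i == sigma_{f_{sigma_i}(j)} for all i, j."""
    mats = [perm_matrix(f) for f in bijections]
    p = len(bijections)
    return all(mat_mul(mats[j], mats[i]) == mats[bijections[i][j]]
               for i in range(p) for j in range(p))

p = 8
circulant = [[(v + i) % p for v in range(p)] for i in range(p)]
dyadic = [[v ^ i for v in range(p)] for i in range(p)]
# Same set of shifts, but sigma_i = pi^(3i): the indexing breaks the relation.
reordered = [[(v + 3 * i) % p for v in range(p)] for i in range(p)]

print(satisfies_theorem_3_10(circulant),
      satisfies_theorem_3_10(dyadic),
      satisfies_theorem_3_10(reordered))
```

The reordered family still has $\sigma_0 = I_p$, which suggests that the way the transformations are indexed matters, not only which permutations the family contains.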

Starting from the result of Theorem 3.10, we can easily derive some other properties that $\mathcal{F}$ must satisfy.

Corollary 3.11

Let $\mathcal{F}$ be a family of permutations such that the induced $\mathcal{M}_q^{\mathcal{F},1}$ is a pseudo-ring. Then, $\mathcal{F}$ has the following properties:

  (a) $f_{\sigma_i}(0) = i$, $\forall i$;

  (b) $\forall i\ \exists j$ s.t. $f_{\sigma_i} \circ f_{\sigma_j} = f_{I_p}$.

Proof

Since $\mathcal{F}$ satisfies the hypothesis of Theorem 3.10, we have

$$\sigma_{f_{\sigma_i}(0)} = \sigma_0\sigma_i = I_p\sigma_i = \sigma_i, \tag{3.9}$$

which can be satisfied only if $f_{\sigma_i}(0) = i$, and this proves property (a).

Since each $f_{\sigma_i}$ is a bijection of the integers in $[0, p-1]$, we know that, for a fixed value of $i$, there is a value $j \in [0, p-1]$ such that $f_{\sigma_i}(j) = 0$. Then, we have

$$\sigma_j\sigma_i = \sigma_{f_{\sigma_i}(j)} = \sigma_0 = I_p. \tag{3.10}$$

In other words, the bijections $f_{\sigma_i}$ and $f_{\sigma_j}$ are one the inverse of the other, and this proves property (b).□

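Corollary 3.12

Let $\mathcal{F}$ be a family of permutations such that the induced $\mathcal{M}_q^{\mathcal{F},1}$ is a pseudo-ring. Then, $\mathcal{M}_q^{\mathcal{F},1}$ is a ring, which we call, by analogy, the $\mathcal{F}$-reproducible ring induced by $\mathcal{F}$.

Proof

Let us show that $\mathcal{M}_q^{\mathcal{F},1}$ contains the multiplicative identity, i.e., the $p \times p$ identity matrix. Because of Corollary 3.11, $\mathcal{F}$ is formed by $p \times p$ permutations such that $f_{\sigma_i}(0) = i$, $\forall i$. Hence, the element of $\mathcal{M}_q^{\mathcal{F},1}$ generated by the signature $u = [1, 0, \ldots, 0]$ has its $i$th row equal to $u\sigma_i$, whose only non-zero entry is in position $f_{\sigma_i}(0) = i$: this is the $p \times p$ identity matrix $I_p$.□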

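Theorem 3.13

Let $\mathcal{F}$ be a family of permutations such that the induced $\mathcal{M}_q^{\mathcal{F},1}$ is a pseudo-ring. Then, $\mathcal{M}_q^{\mathcal{F},1}$ is an $\mathcal{F}$-reproducible ring and the invertible elements of $\mathcal{M}_q^{\mathcal{F},1}$ form a multiplicative group.

Proof

Based on Corollary 3.12, $\mathcal{M}_q^{\mathcal{F},1}$ is an $\mathcal{F}$-reproducible ring provided with multiplicative identity. Now, we need to prove that any non-singular matrix in $\mathcal{M}_q^{\mathcal{F},1}$ admits an inverse in $\mathcal{M}_q^{\mathcal{F},1}$. Let us consider a non-singular matrix $A \in \mathcal{M}_q^{\mathcal{F},1}$, with signature $a$, and let $B$ be its inverse. Since $AB = I_p$, we have

$$AB = \begin{bmatrix} a \\ a\sigma_1 \\ \vdots \\ a\sigma_{p-1} \end{bmatrix} B = I_p = \begin{bmatrix} u \\ u\sigma_1 \\ \vdots \\ u\sigma_{p-1} \end{bmatrix},$$

with $u = [1, 0, \ldots, 0]$ as in Corollary 3.12. Then we have $a\sigma_i B = u\sigma_i$. For $i = 0$, we have $u = aB$. Hence, for any value of $i$, we get

$$a\sigma_i B = u\sigma_i = aB\sigma_i,$$

which can be satisfied for any $a$ only if $\sigma_i$ and $B$ commute. Because of Theorem 3.9, this means that $B \in \mathcal{M}_q^{\mathcal{F},1}$.□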
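To make the statement of Theorem 3.13 concrete, the sketch below inverts a circulant matrix over $\mathbb{F}_5$ by Gauss-Jordan elimination and checks that the inverse is again reproducible with the same family. The parameters ($q = 5$, $p = 4$, signature $[1, 1, 0, 1]$) and all function names are illustrative assumptions; `pow(x, -1, q)` requires Python 3.8 or later.

```python
def mat_mul(A, B, q):
    """Matrix product over F_q (q prime)."""
    return [[sum(x * y for x, y in zip(row, col)) % q for col in zip(*B)]
            for row in A]

def shift_family(p):
    """sigma_i = pi^i, where pi has a 1 in position (l, l + 1 mod p)."""
    return [[[1 if (l + i) % p == j else 0 for j in range(p)]
             for l in range(p)] for i in range(p)]

def reproducible(signature, family, q):
    """Stack the rows signature * sigma_i (reproducibility order m = 1)."""
    return [mat_mul([signature], sigma, q)[0] for sigma in family]

def inverse_mod(A, q):
    """Gauss-Jordan inversion over F_q; raises ValueError if A is singular."""
    n = len(A)
    M = [list(row) + [1 if i == j else 0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] % q != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular over F_q")
        M[c], M[pivot] = M[pivot], M[c]
        inv = pow(M[c][c], -1, q)            # modular inverse of the pivot
        M[c] = [(x * inv) % q for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] % q != 0:
                factor = M[r][c]
                M[r] = [(x - factor * y) % q for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

q, p = 5, 4                                   # illustrative parameters
family = shift_family(p)
A = reproducible([1, 1, 0, 1], family, q)
A_inv = inverse_mod(A, q)
identity = [[1 if i == j else 0 for j in range(p)] for i in range(p)]

is_inverse = mat_mul(A, A_inv, q) == identity
stays_in_ring = A_inv == reproducible(A_inv[0], family, q)
print(is_inverse, stays_in_ring)
```

The same check could be repeated with a dyadic family; only `shift_family` would need to be replaced.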

Sum and multiplication are not the only matrix operations we consider. In Theorem 3.14, we analyze how transposition acts on the matrices belonging to an $\mathcal{F}$-reproducible pseudo-ring $\mathcal{M}_q^{\mathcal{F},1}$.

Theorem 3.14

Let $\mathcal{M}_q^{\mathcal{F},1}$ be an $\mathcal{F}$-reproducible pseudo-ring; if

$$f_{\sigma_j}^{-1}(i) = f_{\sigma_v}^{-1}(0), \quad v = f_{\sigma_i}^{-1}(j), \quad \forall i, j \text{ s.t. } 0 \le i \le p-1,\ 0 \le j \le p-1,$$

then $\mathcal{M}_q^{\mathcal{F},1}$ is closed under the transposition operation.

Proof

Let $A \in \mathcal{M}_q^{\mathcal{F},1}$, with signature $a$, and denote as $B = A^T$ its transpose. The $i$th row of $B$ corresponds to the $i$th column of $A$, which is defined as

$$\begin{bmatrix} a_i \\ a_{f_{\sigma_1}^{-1}(i)} \\ a_{f_{\sigma_2}^{-1}(i)} \\ \vdots \\ a_{f_{\sigma_{p-1}}^{-1}(i)} \end{bmatrix}.$$

Let us denote as $b_0$ the first row of $B$, that is,

$$b_0 = [a_0, a_{f_{\sigma_1}^{-1}(0)}, \ldots, a_{f_{\sigma_{p-1}}^{-1}(0)}] = [a_{f_{\sigma_0}^{-1}(0)}, a_{f_{\sigma_1}^{-1}(0)}, \ldots, a_{f_{\sigma_{p-1}}^{-1}(0)}]. \tag{3.11}$$

Let us consider the $i$th row of $B$, and denote it as $b_i$; if $\mathcal{M}_q^{\mathcal{F},1}$ is closed under transposition, then it must be

$$b_i = [a_i, a_{f_{\sigma_1}^{-1}(i)}, \ldots, a_{f_{\sigma_{p-1}}^{-1}(i)}] = [a_{f_{\sigma_0}^{-1}(i)}, a_{f_{\sigma_1}^{-1}(i)}, \ldots, a_{f_{\sigma_{p-1}}^{-1}(i)}] = b_0\sigma_i. \tag{3.12}$$

Now suppose that $f_{\sigma_i}(v) = j$; then, the $j$th entry of $b_i$ corresponds to the $v$th entry of $b_0$, that is, $a_{f_{\sigma_v}^{-1}(0)}$. In other words, we have $b_{i,j} = a_z$, with

$$z = f_{\sigma_v}^{-1}(0), \quad v = f_{\sigma_i}^{-1}(j). \tag{3.13}$$

In order to satisfy equation (3.12), $a_z$ must be equal to the $j$th entry of the $i$th column of $A$, that is, $a_{f_{\sigma_j}^{-1}(i)}$. Then, it must be $f_{\sigma_j}^{-1}(i) = z$, that is,

$$f_{\sigma_j}^{-1}(i) = f_{\sigma_v}^{-1}(0), \quad v = f_{\sigma_i}^{-1}(j), \tag{3.14}$$

which concludes the proof.□
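The index condition of Theorem 3.14 can be evaluated directly on the bijections, without building any matrix. The following sketch does so for the cyclic-shift and dyadic families and, as a cross-check, verifies that the transpose of a reproducible matrix is again reproducible. Sizes, signatures, and helper names are illustrative assumptions; a family is stored as a list whose $i$th entry is $f_{\sigma_i}$.

```python
def inverse(f):
    """Inverse bijection: if f[v] == j then inverse(f)[j] == v."""
    g = [0] * len(f)
    for v, j in enumerate(f):
        g[j] = v
    return g

def transposition_condition(bijections):
    """Condition (3.14): f_j^{-1}(i) == f_v^{-1}(0) with v = f_i^{-1}(j)."""
    p = len(bijections)
    inv = [inverse(f) for f in bijections]
    return all(inv[j][i] == inv[inv[i][j]][0]
               for i in range(p) for j in range(p))

def reproducible(signature, bijections):
    """Row i is a * sigma_i, whose j-th entry is a[f_{sigma_i}^{-1}(j)]."""
    p = len(bijections)
    inv = [inverse(f) for f in bijections]
    return [[signature[inv[i][j]] for j in range(p)] for i in range(p)]

def transpose(M):
    return [list(col) for col in zip(*M)]

def transpose_is_reproducible(signature, bijections):
    """Since sigma_0 is the identity, the first row of A^T is its signature."""
    T = transpose(reproducible(signature, bijections))
    return T == reproducible(T[0], bijections)

circulant = [[(v + i) % 5 for v in range(5)] for i in range(5)]
dyadic = [[v ^ i for v in range(4)] for i in range(4)]

print(transposition_condition(circulant), transposition_condition(dyadic))
```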

Depending on the properties stated in the previous theorems, the family $\mathcal{F}$ might induce different algebraic structures over $\mathbb{F}_q^{p \times p}$. In particular, let us consider the case of $\mathcal{F}$ corresponding to $\mathcal{M}_q^{\mathcal{F},1}$ satisfying both Theorems 3.13 and 3.14. Let $A$ be a square matrix whose elements are picked from $\mathcal{M}_q^{\mathcal{F},1}$. By definition, we have $A^{-1} = \det(A)^{-1}\,\mathrm{adj}(A)$, where $\det(A)$ is the determinant of $A$ and $\mathrm{adj}(A)$ is the adjugate of $A$. Computing $\det(A)$ involves only sums and multiplications: this means that $\det(A) \in \mathcal{M}_q^{\mathcal{F},1}$; because of Theorem 3.13, $\det(A)^{-1} \in \mathcal{M}_q^{\mathcal{F},1}$. Computing $\mathrm{adj}(A)$ involves sums, multiplications and transpositions: because of Theorem 3.14, the entries of $\mathrm{adj}(A)$ are again elements of $\mathcal{M}_q^{\mathcal{F},1}$. This means that $A^{-1}$ is a matrix whose elements belong to $\mathcal{M}_q^{\mathcal{F},1}$, and so it has the same $\mathcal{F}$-quasi-reproducible structure as $A$.

3.2 Known examples of $\mathcal{F}$-reproducible pseudo-rings

In Section 3.1, we have described some properties that a family of permutations $\mathcal{F}$ must have to guarantee that it induces algebraic structures on $\mathbb{F}_q^{p \times p}$. Well-known cases of such objects, commonly used in cryptography, are circulant matrices and dyadic matrices.

3.2.1 Circulant matrices

As we have seen before, a circulant matrix is a $p \times p$ matrix for which each row is obtained as the cyclic shift of the previous one. In particular, a circulant matrix can be seen as a square $\mathcal{F}$-reproducible matrix, whose signature corresponds to the first row and whose functions $\sigma_i$ defining $\mathcal{F}$ correspond to $\pi^i$, where $\pi$ is the unitary circulant permutation matrix with entries

$$\pi_{l,j} = \begin{cases} 1 & \text{if } l + 1 \equiv j \bmod p, \\ 0 & \text{otherwise}. \end{cases} \tag{3.15}$$

Basically, the bijection representing $\pi$ is defined as

$$f_\pi(v) = v + 1 \bmod p. \tag{3.16}$$

It can be easily shown that

$$f_{\sigma_i}(v) = f_{\pi^i}(v) = \underbrace{f_\pi \circ f_\pi \circ \cdots \circ f_\pi}_{i \text{ times}}(v) = v + i \bmod p, \tag{3.17}$$

which leads to $\pi^p = I_p$ and $\pi^i\pi^j = \pi^{i+j \bmod p}$. Since permutation matrices are orthogonal, their inverses correspond to their transposes, and thus $(\pi^i)^T = \pi^{p-i}$. With these properties, we have

$$\sigma_i\sigma_j = \pi^{i+j \bmod p} = \sigma_{i+j \bmod p}, \tag{3.18}$$

which is compliant with Theorem 3.10, since $f_{\sigma_i}(j) = i + j \bmod p$. With some simple computations, it can be easily shown that circulant matrices satisfy Theorem 3.14 and that the multiplication between two circulant matrices is commutative.

3.2.2 Dyadic matrices

A dyadic matrix is a $p \times p$ matrix, with $p$ being a power of 2, whose signature is again its first row. The rows of a dyadic matrix are obtained by permuting the elements of the signature, such that the element at position $(i, j)$ is the one in the signature at position $i \oplus j$, where $\oplus$ denotes the bitwise XOR between $i$ and $j$. Then, a dyadic matrix can be written as an $\mathcal{F}$-reproducible matrix, for which each function $\sigma_i$ is the dyadic matrix whose signature has all-zero entries, except that at position $i$. This means that $\sigma_i$ can be described by the following bijection:

$$f_{\sigma_i}(v) = v \oplus i \bmod p. \tag{3.19}$$

If we combine two transformations, we obtain

$$f_{\sigma_i} \circ f_{\sigma_j}(v) = (v \oplus j) \oplus i = v \oplus (i \oplus j) = f_{\sigma_{i \oplus j}}(v). \tag{3.20}$$

Since $f_{\sigma_i}(j) = i \oplus j$, this proves that the family of dyadic matrices is compliant with Theorem 3.10. It can be straightforwardly proven that dyadic matrices are symmetric (and so satisfy Theorem 3.14), and that the multiplication between two dyadic matrices is commutative.
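The properties of dyadic matrices stated above can be checked on a toy case. This is a minimal sketch, assuming $p = 4$ and arithmetic over $\mathbb{F}_5$ with two arbitrary signatures; the function names are hypothetical.

```python
def dyadic(signature):
    """Dyadic matrix: entry (i, j) is signature[i XOR j]; len must be 2^t."""
    p = len(signature)
    return [[signature[i ^ j] for j in range(p)] for i in range(p)]

def mat_mul(A, B, q):
    """Matrix product over F_q (q prime)."""
    return [[sum(x * y for x, y in zip(row, col)) % q for col in zip(*B)]
            for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

q = 5                                   # illustrative field size
A = dyadic([1, 2, 0, 3])
B = dyadic([4, 0, 1, 1])
AB = mat_mul(A, B, q)

symmetric = A == transpose(A)
commutative = AB == mat_mul(B, A, q)
product_is_dyadic = AB == dyadic(AB[0])

# sigma_i is the dyadic matrix whose signature is the i-th unit vector;
# it is the permutation matrix of the bijection v -> v XOR i.
sigma_2 = dyadic([0, 0, 1, 0])
is_xor_permutation = all(sigma_2[v][v ^ 2] == 1 for v in range(4))
print(symmetric, commutative, product_is_dyadic, is_xor_permutation)
```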

Circulant and dyadic matrices are just two particular cases of $\mathcal{F}$-reproducible pseudo-rings and can obviously be further generalized by considering signatures that are composed of more than one row. In addition, several more constructions can be obtained. For instance, for every permutation matrix $\psi$ and every $\mathcal{F}$-reproducible pseudo-ring $\mathcal{M}_q^{\mathcal{F},m}$, induced by $\mathcal{F} = \{\sigma_0 = I_p, \sigma_1, \ldots, \sigma_{\frac{p}{m}-1}\}$, we can obtain a new $\mathcal{F}'$-reproducible pseudo-ring as

$$\mathcal{M}_q^{\mathcal{F}',m} = \{ M' \mid M' = \psi M \psi^T,\ M \in \mathcal{M}_q^{\mathcal{F},m} \}. \tag{3.21}$$

The corresponding family of transformations is $\mathcal{F}' = \{\sigma'_0, \sigma'_1, \ldots, \sigma'_{\frac{p}{m}-1}\}$, with $\sigma'_i = \sigma_{f_\psi(i)}\psi^T$. Proving that $\mathcal{F}'$ actually induces a pseudo-ring is quite simple; indeed, for any two matrices $A' = \psi M_A \psi^T$ and $B' = \psi M_B \psi^T$, with $M_A, M_B \in \mathcal{M}_q^{\mathcal{F},m}$, we have

$$A' + B' = \psi M_A \psi^T + \psi M_B \psi^T = \psi (M_A + M_B) \psi^T, \tag{3.22}$$

$$A'B' = \psi M_A \psi^T \psi M_B \psi^T = \psi M_A M_B \psi^T, \tag{3.23}$$

which return matrices belonging to $\mathcal{M}_q^{\mathcal{F}',m}$, since $M_A + M_B \in \mathcal{M}_q^{\mathcal{F},m}$ and $M_A M_B \in \mathcal{M}_q^{\mathcal{F},m}$. In addition, if multiplication is commutative in $\mathcal{M}_q^{\mathcal{F},m}$, then it will be commutative in $\mathcal{M}_q^{\mathcal{F}',m}$ too. To prove this fact, let us consider two matrices $M_A, M_B \in \mathcal{M}_q^{\mathcal{F},m}$, such that $M_A M_B = M_B M_A$. Then, for $A' = \psi M_A \psi^T$ and $B' = \psi M_B \psi^T$, we have

$$A'B' = \psi M_A \psi^T \psi M_B \psi^T = \psi M_A M_B \psi^T = \psi M_B M_A \psi^T = \psi M_B \psi^T \psi M_A \psi^T = B'A'.$$

It is easy to prove that, if $\mathcal{M}_q^{\mathcal{F},m}$ is closed under transposition, $\mathcal{M}_q^{\mathcal{F}',m}$ is too.
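The conjugation construction can be illustrated for $m = 1$ with a short sketch. It assumes the circulant family over $\mathbb{F}_3$ with $p = 5$ as the starting pseudo-ring, an arbitrary permutation $\psi$ (given through its bijection `f_psi`), and the reading $\sigma'_i = \sigma_{f_\psi(i)}\psi^T$ of the new family; all names and values are illustrative. Note that, in this reading, the conjugated matrix keeps the original signature with respect to the new family, even though $\sigma'_0$ is in general not the identity.

```python
def mat_mul(A, B, q):
    """Matrix product over F_q (q prime)."""
    return [[sum(x * y for x, y in zip(row, col)) % q for col in zip(*B)]
            for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def perm_matrix(f):
    """Permutation matrix with a 1 in position (v, f[v])."""
    p = len(f)
    return [[1 if f[v] == z else 0 for z in range(p)] for v in range(p)]

def reproducible(signature, family, q):
    """Stack the rows signature * sigma_i (reproducibility order m = 1)."""
    return [mat_mul([signature], sigma, q)[0] for sigma in family]

q, p = 3, 5
family = [perm_matrix([(v + i) % p for v in range(p)]) for i in range(p)]
f_psi = [2, 0, 4, 1, 3]                 # arbitrary bijection defining psi
psi = perm_matrix(f_psi)
psi_T = transpose(psi)

# New family: sigma'_i = sigma_{f_psi(i)} * psi^T.
family_new = [mat_mul(family[f_psi[i]], psi_T, q) for i in range(p)]

def conjugate(M):
    """M' = psi * M * psi^T."""
    return mat_mul(mat_mul(psi, M, q), psi_T, q)

a, b = [1, 2, 0, 0, 1], [0, 1, 1, 2, 0]
A, B = reproducible(a, family, q), reproducible(b, family, q)
A_new, B_new = conjugate(A), conjugate(B)
C = mat_mul(A, B, q)                    # circulant, with signature C[0]

same_signature = A_new == reproducible(a, family_new, q)
product_closed = mat_mul(A_new, B_new, q) == reproducible(C[0], family_new, q)
print(same_signature, product_closed)
```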

4 Compactly reproducible codes

In the previous section, we have described the properties that a family of functions $\mathcal{F}$ must have in order to generate $\mathcal{F}$-reproducible matrices. This opens a wide range of possibilities for obtaining codes with compact representations, that is, CR codes according to Definition 3.5. In fact, $\mathcal{F}$-reproducible pseudo-rings allow us to design codes that can be described in a very compact manner. Codes of this type are of interest in code-based cryptography, where small public keys are important.

In this section, we describe how to design CR codes, and the properties that characterize them. In particular, we study how to achieve an $\mathcal{F}$-reproducible representation for the parity-check matrix $H$ starting from an $\mathcal{F}$-reproducible generator matrix $G$. In addition, we provide intuitive methods to obtain random-looking CR codes, starting from their parity-check matrix.

Let $\mathcal{C}$ be a CR code over $\mathbb{F}_q$, with length $n$, dimension $k$, and codimension $r = n - k$, with an $\mathcal{F}$-reproducible generator matrix $G \in \mathbb{F}_q^{k \times n}$ defined by the signature $g_0 \in \mathbb{F}_q^{m \times n}$ and the fixed and universally known family of transformations $\mathcal{F}$. In particular, according to Definition 3.5 we have $\ell = \frac{k}{m} > 1$ and we write $\mathcal{F} = \{\sigma_0, \sigma_1, \ldots, \sigma_{\ell-1}\}$. Without loss of generality, we can suppose that $\sigma_0 = \mathrm{id} = I_n$. The matrix $G$ can thus be expressed as

$$G = \begin{bmatrix} g_0 \\ g_1 \\ \vdots \\ g_{\ell-1} \end{bmatrix} = \begin{bmatrix} g_0 \\ g_0\sigma_1 \\ \vdots \\ g_0\sigma_{\ell-1} \end{bmatrix}. \tag{4.1}$$

Let $H \in \mathbb{F}_q^{r \times n}$ be a parity-check matrix for $\mathcal{C}$ and $s$ be one of the factors of $r$; if $r$ is a prime, necessarily $s = 1$. Then, $H$ can be expressed as

$$H = \begin{bmatrix} h_0 \\ h_1 \\ \vdots \\ h_{\frac{r}{s}-1} \end{bmatrix}, \tag{4.2}$$

where each $h_i$ is a matrix with dimensions $s \times n$. Since by definition $GH^T = 0_{k \times r}$, it must be

$$g_i h_j^T = g_0\sigma_i h_j^T = 0_{m \times s}, \quad \forall i, j \in \mathbb{N} \text{ s.t. } 0 \le i \le \ell - 1,\ 0 \le j \le \frac{r}{s} - 1. \tag{4.3}$$

Let us assume that $g_0 H^T = 0_{m \times r}$: as we explain later, in the practical case of a cryptographic scheme, this condition can be easily satisfied. The following theorem considers a particular construction for a CR code and states some properties that its parity-check matrix must satisfy.

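Theorem 4.1

Let $G \in \mathbb{F}_q^{k \times n}$ be an $\mathcal{F}$-reproducible matrix, with signature $g_0 \in \mathbb{F}_q^{m \times n}$ (hence, $m$ divides $k$) and family $\mathcal{F} = \{\sigma_0, \sigma_1, \ldots, \sigma_{\frac{k}{m}-1}\}$. For simplicity, we suppose $\sigma_0 = I_n$. Let $r = n - k$, and $H \in \mathbb{F}_q^{r \times n}$ such that $g_0 H^T = 0_{m \times r}$. Let $s$ be a factor of $r$, and denote by $h_j$ the subset of rows of $H$ at positions $\{js, js+1, \ldots, (j+1)s - 1\}$. If we can define a function $f(x_0, x_1) : \left[0, \frac{k}{m}-1\right] \times \left[0, \frac{r}{s}-1\right] \subset \mathbb{N}^2 \rightarrow \left[0, \frac{r}{s}-1\right] \subset \mathbb{N}$, such that

$$h_j\sigma_i^T = h_{f(i,j)}, \quad \forall i, j \in \mathbb{N},\ 0 \le i \le \frac{k}{m}-1,\ 0 \le j \le \frac{r}{s}-1, \tag{4.4}$$

then $G$ and $H^T$ are orthogonal, i.e., $GH^T = 0_{k \times r}$.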

Proof

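Since the generator matrix $G$ is $\mathcal{F}$-reproducible, with signature $g_0$, we have

$$G = \begin{bmatrix} g_0 \\ g_1 \\ \vdots \\ g_{\frac{k}{m}-1} \end{bmatrix} = \begin{bmatrix} g_0 \\ g_0\sigma_1 \\ \vdots \\ g_0\sigma_{\frac{k}{m}-1} \end{bmatrix}, \quad H = \begin{bmatrix} h_0 \\ h_1 \\ \vdots \\ h_{\frac{r}{s}-1} \end{bmatrix}. \tag{4.5}$$

In order for $G$ to be a valid generator matrix, it must be $GH^T = 0_{k \times r}$, that is,

$$g_i h_j^T = g_0\sigma_i h_j^T = 0_{m \times s}, \quad \forall i, j \in \mathbb{N} \text{ s.t. } 0 \le i \le \frac{k}{m}-1,\ 0 \le j \le \frac{r}{s}-1. \tag{4.6}$$

By hypothesis, $g_0$ is an $m \times n$ matrix such that $g_0 H^T = 0_{m \times r}$, which means

$$g_0 h_j^T = 0_{m \times s}, \quad \forall j \in \mathbb{N} \text{ s.t. } 0 \le j \le \frac{r}{s}-1. \tag{4.7}$$

Consider now the product $g_i h_j^T = g_0\sigma_i h_j^T$, for $i \ge 1$. If we can define a function $f(x_0, x_1) : \left[0, \frac{k}{m}-1\right] \times \left[0, \frac{r}{s}-1\right] \subset \mathbb{N}^2 \rightarrow \left[0, \frac{r}{s}-1\right] \subset \mathbb{N}$ with the property described by (4.4), then for all pairs of indexes $i, j$ we have

$$\sigma_i h_j^T = h_{f(i,j)}^T, \tag{4.8}$$

and (4.6) is surely satisfied, since

$$g_i h_j^T = g_0\sigma_i h_j^T = g_0 h_{f(i,j)}^T = 0_{m \times s}, \tag{4.9}$$

where $g_0 h_{f(i,j)}^T = 0_{m \times s}$ because of (4.7).□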

Remark 4.2

Note that if $r$ is a prime, then we either have $s = r$ or $s = 1$. The first case leads to somewhat trivial constructions: the function $f$ is constant, since it maps any pair $(x_0, 0)$ with $x_0 \in \left[0, \frac{k}{m}-1\right]$ to 0. This implies that the matrix $H$ is such that $H\sigma_i^T = H$, for any $\sigma_i$: if the functions $\sigma_i$ all have full rank (for instance, they are permutations), then $H$ cannot have maximum rank $r$. Hence, when $r$ is a prime, the only case of practical interest is that of $s = 1$ (i.e., the one in which each $h_j$ is actually a row vector).

For $G$ and $H$ to be, respectively, the generator and parity-check matrix of a code $\mathcal{C}$, the conditions given in Corollary 4.3 have to be verified.

Corollary 4.3

Let $G \in \mathbb{F}_q^{k \times n}$ be an $\mathcal{F}$-reproducible matrix, with signature $g_0 \in \mathbb{F}_q^{m \times n}$ (hence, $m$ is among the factors of $k$) and family $\mathcal{F} = \{\sigma_0, \sigma_1, \ldots, \sigma_{\frac{k}{m}-1}\}$. Let $H \in \mathbb{F}_q^{r \times n}$ be a matrix such that $GH^T = 0_{k \times r}$, and suppose that it satisfies the hypothesis of Theorem 4.1. For $H$ and $G$ to be, respectively, the parity-check and generator matrices of a code $\mathcal{C}$ with length $n$, dimension $k$ and redundancy $r$, the following conditions are necessary:

  (a) $\mathcal{F}$ contains $\frac{k}{m}$ distinct linear transformations;

  (b) $\frac{k}{m} \le \frac{r}{s}$;

  (c) For any three integers $i \in \left[0, \frac{k}{m}-1\right]$ and $j, j' \in \left[0, \frac{r}{s}-1\right]$, with $j \ne j'$, it must be $f(i, j) \ne f(i, j')$.

Proof

We want the $\mathcal{F}$-reproducible $k \times n$ matrix $G$ to be the generator matrix of a code with dimension $k$: then, $G$ must have rank equal to $k$. If $\mathcal{F}$ contains two transformations $\sigma_i = \sigma_j$, with $i \ne j$, then the rows of $G$ obtained as $g_0\sigma_i$ are identical to the ones obtained as $g_0\sigma_j$. If $G$ has some identical rows, then its rank cannot be maximum, and this proves condition (a). It is straightforward to show that this condition can also be expressed as follows: there cannot exist three integers $i, i' \in \left[0, \frac{k}{m}-1\right]$, with $i \ne i'$, and $j \in \left[0, \frac{r}{s}-1\right]$, such that $f(i, j) = f(i', j)$. Indeed, if we can determine such integers, then

$$h_j\sigma_i^T = h_{f(i,j)} = h_{f(i',j)} = h_j\sigma_{i'}^T,$$

which results in $\sigma_i = \sigma_{i'}$.

We can then easily prove condition (b). Indeed, fix an integer $j \in \left[0, \frac{r}{s}-1\right]$ and consider, for all $i \in \left[0, \frac{k}{m}-1\right]$, all the images $f(i, j)$: because of condition (a), these images must be distinct. However, the cardinality of the codomain of $f(i, j)$ is equal to $\frac{r}{s}$: if $\frac{k}{m} > \frac{r}{s}$, then (a) cannot be satisfied. This proves (b).

If $H$ is the parity-check matrix of a code with redundancy $r$, then it must have rank equal to $r$. If we suppose that there exist three integers $i \in \left[0, \frac{k}{m}-1\right]$, $j, j' \in \left[0, \frac{r}{s}-1\right]$, with $j \ne j'$, such that $f(i, j) = f(i, j')$ then, because of Theorem 4.1, we also have $h_j\sigma_i^T = h_{j'}\sigma_i^T$, which implies $h_j = h_{j'}$. If $H$ has some identical rows, then its rank must be $< r$, and this proves condition (c).□

Theorem 4.1 and Corollary 4.3 allow us to generate a CR code in a very simple way. Given a family of transformations $\mathcal{F}$, first obtain a matrix $H$ with the characteristics required by the theorem. Then, for the code $\mathcal{C}$ having $H$ as parity-check matrix, a variety of $\mathcal{F}$-reproducible generator matrices can be found. Indeed, let $G$ be a generator matrix for $\mathcal{C}$: by definition, since $GH^T = 0_{k \times r}$, we know that any subset $g_0$ formed by $m$ rows of $G$ is such that $g_0 H^T = 0_{m \times r}$. Then, $g_0$ is a valid signature for an $\mathcal{F}$-reproducible generator matrix, defined by the family $\mathcal{F}$. On condition that both $H$ and $G$ have full rank, and that $m < k$ (that is, $\ell > 1$), they can be used to represent the CR code $\mathcal{C}$ with length $n$, dimension $k$, and redundancy $r$.

We point out that the properties defined by Theorem 4.1 can be described in a graphical way, considering the fact that the linear functions $\sigma_i$ define a mapping acting on the ensemble of matrices $h_j$. We can consider a directed graph $\mathcal{G}$, with $\frac{r}{s}$ nodes, labeled from 0 to $\frac{r}{s}-1$. In such a graph, we have an edge from a node $j_0$ to a node $j_1$ if there exists an integer $i$ such that $h_{j_0}\sigma_i^T = h_{j_1}$. In addition, every edge is labeled with the corresponding function $\sigma_i^T$. With this construction, the graph $\mathcal{G}$ contains all the information about the mapping defined by $\mathcal{F}$. The meaning of the graph is the following: if there exists a length-$l$ path from a node $j_0$ to a node $j_1$, whose edges have labels $\mathcal{L} = \{i_0, i_1, \ldots, i_{l-1}\}$, then it must be

$$h_{j_1} = h_{j_0}\prod_{i \in \mathcal{L}} \sigma_i^T. \tag{4.10}$$

We can now consider two different paths having the same starting and final nodes, with the corresponding sets of edges labeled as $\mathcal{L}_a$ and $\mathcal{L}_b$. Then, it must be

$$\prod_{i \in \mathcal{L}_a} \sigma_i^T = \prod_{i \in \mathcal{L}_b} \sigma_i^T. \tag{4.11}$$

The definitions we have introduced in the previous section describe codes whose generator matrices can be efficiently described by just a subset of their entries; for this reason, they are natural candidates for being used in a McEliece cryptosystem. Actually, some variants of this type have already been proposed over the years, with the aim of reducing the public-key size by exploiting such a property. We show that these already existing variants are encompassed by our general framework and that the possibilities for obtaining such features are actually many more than those already exploited.

In some cases, a QCR code can be seen as a particular case of a CR code (and vice versa). Let us consider a code $\mathcal{C}$ with length $n = n_0 p$, dimension $k = p$, and codimension $r = (n_0 - 1)p$, for some integer $n_0 \in \mathbb{N}$. Let us suppose that $G$ is obtained as a row of $n_0$ blocks with size $p \times p$, that is,

$$G = [G_0 \mid G_1 \mid \cdots \mid G_{n_0-1}]. \tag{4.12}$$

This form of the generator matrix is commonly used in sparse-matrix code-based cryptosystems [5,35]. Suppose that $G$ in (4.12) is an $\mathcal{F}$-quasi-reproducible matrix, i.e., each $G_i$ is an element of the pseudo-ring $\mathcal{M}_q^{\mathcal{F}_i, m_i}$ and has signature $V_i$. If the signatures all have the same number of rows (that is, $m_i = m$), then such a $G$ can be seen as a particular $\mathcal{F}$-reproducible matrix. Let us write the $i$th family of transformations as $\mathcal{F}_i = \{\sigma_0^{(i)}, \sigma_1^{(i)}, \ldots, \sigma_{\frac{p}{m}-1}^{(i)}\}$ and define an overall family of transformations $\mathcal{F} = \{\sigma_0, \sigma_1, \ldots, \sigma_{\frac{p}{m}-1}\}$, such that

$$\sigma_i = \begin{bmatrix} \sigma_i^{(0)} & 0_{p \times p} & 0_{p \times p} & \cdots & 0_{p \times p} \\ 0_{p \times p} & \sigma_i^{(1)} & 0_{p \times p} & \cdots & 0_{p \times p} \\ 0_{p \times p} & 0_{p \times p} & \sigma_i^{(2)} & \cdots & 0_{p \times p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0_{p \times p} & 0_{p \times p} & 0_{p \times p} & \cdots & \sigma_i^{(n_0-1)} \end{bmatrix}. \tag{4.13}$$

Then, it is easy to see that a matrix in the form (4.12) is also an $\mathcal{F}$-reproducible matrix obtained through $\mathcal{F}$ in (4.13), with signature

$$g_0 = [g_0^{(0)} \mid g_0^{(1)} \mid \cdots \mid g_0^{(n_0-1)}], \tag{4.14}$$

where $g_0^{(i)} = V_i$ is the signature of the block $G_i$.
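The block-diagonal family of (4.13) can be sketched on a toy example with $n_0 = 2$ blocks of size $p = 4$ and $m = 1$. To stress that the blocks may use different families, the first block below is circulant and the second is dyadic; this mixed choice, the field $\mathbb{F}_3$, the signatures, and the helper names are illustrative assumptions only.

```python
def circulant(a):
    """Circulant block: entry (i, j) is a[(j - i) mod p]."""
    p = len(a)
    return [[a[(j - i) % p] for j in range(p)] for i in range(p)]

def dyadic(b):
    """Dyadic block: entry (i, j) is b[i XOR j]."""
    p = len(b)
    return [[b[i ^ j] for j in range(p)] for i in range(p)]

def perm_matrix(f):
    """Permutation matrix with a 1 in position (v, f[v])."""
    p = len(f)
    return [[1 if f[v] == z else 0 for z in range(p)] for v in range(p)]

def block_diag(X, Y):
    """Block-diagonal matrix diag(X, Y) for two p x p blocks, as in (4.13)."""
    p = len(X)
    return [row + [0] * p for row in X] + [[0] * p + row for row in Y]

def row_times_matrix(v, M, q):
    return [sum(v[i] * M[i][j] for i in range(len(v))) % q
            for j in range(len(M[0]))]

q, p = 3, 4
a, b = [1, 2, 0, 1], [2, 0, 1, 1]      # block signatures
family = [block_diag(perm_matrix([(v + i) % p for v in range(p)]),
                     perm_matrix([v ^ i for v in range(p)]))
          for i in range(p)]

g0 = a + b                              # overall signature, as in (4.14)
G_reproducible = [row_times_matrix(g0, sigma, q) for sigma in family]
G_blocks = [ra + rb for ra, rb in zip(circulant(a), dyadic(b))]   # (4.12)
print(G_reproducible == G_blocks)
```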

4.1 CR codes from Householder matrices

A Householder matrix [36] is a matrix that is at the same time orthogonal and symmetric. Let us consider a set of distinct Householder matrices $\psi_0, \ldots, \psi_{v-1}$. We have that, for all $j = 0, \ldots, v-1$, it must be $\psi_j^{-1} = \psi_j^T = \psi_j$. In order to fulfill the conditions of Theorem 4.1, these matrices must commute with each other, that is,

$$\psi_i\psi_j = \psi_j\psi_i, \quad \forall\, 0 \le i, j \le v-1. \tag{4.15}$$

Let us consider two sets containing all the $2^v$ distinct binary $v$-tuples, i.e.,

$$\left\{ a^{(i)} \,\middle|\, 0 \le i \le 2^v - 1,\ a^{(i)} \in \mathbb{F}_2^v,\ \text{s.t. } a^{(i)} \ne a^{(j)},\ \forall i \ne j \right\}, \quad \left\{ b^{(i)} \,\middle|\, 0 \le i \le 2^v - 1,\ b^{(i)} \in \mathbb{F}_2^v,\ \text{s.t. } b^{(i)} \ne b^{(j)},\ \forall i \ne j \right\}. \tag{4.16}$$

For the sake of simplicity, let us fix $a^{(0)} = 0_{1 \times v}$. It is clear that these two sets are identical, except for the order of their elements. We can now define a family of transformations $\mathcal{F}$, containing $2^v$ linear functions $\sigma_i$, defined as

$$\sigma_i = \prod_{l=0}^{v-1} \psi_l^{a_l^{(i)}}, \tag{4.17}$$

where $a_l^{(i)}$ is the $l$th entry of $a^{(i)}$. Since we are considering Householder matrices with the property (4.15), it is easy to verify that $\sigma_i^2 = I_n$, and it follows that each function is an involution.

The family $\mathcal{F}$ can be used to define an $\mathcal{F}$-reproducible generator matrix $G$ for a code $\mathcal{C}$; a parity-check matrix for $\mathcal{C}$ can then be the $\mathcal{F}$-reproducible matrix $H$, with signature $h_0 \in \mathbb{F}_q^{s \times n}$, whose rows are obtained as

$$h_j = h_0\left[\prod_{l=0}^{v-1} \psi_l^{b_l^{(j)}}\right]^T. \tag{4.18}$$

If $H$ has full rank, the corresponding code has redundancy $r = s2^v$, and

$$h_j\sigma_i^T = h_j\left[\prod_{l=0}^{v-1} \psi_l^{a_l^{(i)}}\right]^T = h_0\left[\prod_{l=0}^{v-1} \psi_l^{b_l^{(j)}}\right]^T\left[\prod_{l=0}^{v-1} \psi_l^{a_l^{(i)}}\right]^T = h_0\left[\prod_{l=0}^{v-1} \psi_l^{a_l^{(i)} \oplus b_l^{(j)}}\right]^T = h_{f(i,j)},$$

where $\oplus$ denotes the modulo 2 sum and

$$f(i, j) = u, \quad \text{s.t. } b^{(u)} = a^{(i)} \oplus b^{(j)}. \tag{4.19}$$

It is straightforward to show that such a function satisfies the properties required by Theorem 4.1 and Corollary 4.3. The corresponding code has length $n$, dimension $k = m2^v$, and redundancy $r = s2^v$, thus the code rate corresponds to $\frac{m}{m+s}$. In addition, we point out that it might be possible to tune the code parameters, by selecting only proper subsets of all the binary $v$-tuples, in order to form the rows of both $G$ and $H$.

4.2 CR codes from powers of a single function

In this section, we present another construction of reproducible codes satisfying Theorem 4.1. Let us consider an $n \times n$ matrix $\pi$ such that $\pi^b = I_n$, for some integer $b$. Let $v$ be a divisor of $b$; obviously, if $b$ is a prime, then $v = 1$. We can use $\pi$ to build a family $\mathcal{F}$ of $\frac{k}{m} \le \frac{b}{v}$ linear transformations, where $k$ is the desired code dimension and $m$ is the number of rows in a signature. Indeed, the functions in $\mathcal{F}$ can be defined as $\sigma_i = \pi^{v z_i}$, where the values $z_i$ are distinct non-negative integers $< \frac{b}{v}$. For simplicity, we assume $z_0 = 0$, i.e., $\sigma_0 = I_n$. Then, given an $m \times n$ signature $g_0$, we can use the family $\mathcal{F}$ to obtain a generator matrix $G$ for a code $\mathcal{C}$ as

$$G = \begin{bmatrix} g_0 \\ g_1 \\ g_2 \\ \vdots \\ g_{\frac{k}{m}-1} \end{bmatrix} = \begin{bmatrix} g_0 \\ g_0\pi^{v z_1} \\ g_0\pi^{v z_2} \\ \vdots \\ g_0\pi^{v z_{\frac{k}{m}-1}} \end{bmatrix}. \tag{4.20}$$

An $\mathcal{F}$-reproducible parity-check matrix for $\mathcal{C}$ can be obtained by taking an $s \times n$ matrix $h_0$, and using it to generate the parity-check matrix $H$ as

$$H = \begin{bmatrix} h_0 \\ h_1 \\ h_2 \\ \vdots \\ h_{\frac{b}{v}-1} \end{bmatrix} = \begin{bmatrix} h_0 \\ h_0(\pi^{b-v})^T \\ h_0(\pi^{b-2v})^T \\ \vdots \\ h_0(\pi^{v})^T \end{bmatrix}. \tag{4.21}$$

If $H$ is full rank, then $\mathcal{C}$ has redundancy $r = s\frac{b}{v}$; the code dimension and redundancy must be linked to the code length according to $k + s\frac{b}{v} = n$.

It is quite easy to show that such a parity-check matrix is compliant with Theorem 4.1. In fact, we have

$$h_j\sigma_i^T = h_0(\pi^{b-jv})^T(\pi^{v z_i})^T = h_0\left[\pi^{b+(z_i-j)v}\right]^T. \tag{4.22}$$

If $z_i \ge j$, we have

$$\left[\pi^{b+(z_i-j)v}\right]^T = \left[\pi^{2b-b+(z_i-j)v}\right]^T = \left[\pi^{b-\left(\frac{b}{v}+j-z_i\right)v}\right]^T\left[\pi^b\right]^T = \left[\pi^{b-\left(\frac{b}{v}+j-z_i\right)v}\right]^T = \left[\pi^{b-\left(j-z_i \bmod \frac{b}{v}\right)v}\right]^T.$$

In the case of $z_i < j$, we can write

$$\left[\pi^{b+(z_i-j)v}\right]^T = \left[\pi^{b-(j-z_i)v}\right]^T = \left[\pi^{b-\left(j-z_i \bmod \frac{b}{v}\right)v}\right]^T. \tag{4.23}$$

Thus, we have proven that

$$h_j\sigma_i^T = h_0\left[\pi^{b-\left(j-z_i \bmod \frac{b}{v}\right)v}\right]^T = h_{j-z_i \bmod \frac{b}{v}}, \tag{4.24}$$

such that the function $f(x_0, x_1)$ required by Theorem 4.1 is defined as

$$f(x_0, x_1) = x_1 - z_{x_0} \bmod \frac{b}{v}. \tag{4.25}$$

For instance, a simple construction can be obtained by choosing $m = s = 1$ and $k = r = n/2$: the matrices $G$ and $H$ are two $\mathcal{F}$-reproducible matrices, with signatures that are row vectors of length $n$, and are characterized by the same number of rows (thus, $\mathcal{C}$ has rate 1/2).
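A toy instance of this rate-1/2 construction can be sketched as follows. It assumes a binary code with $n = b = 6$, $v = 2$, $m = s = 1$, $z = (0, 1, 2)$, and $\pi$ equal to the cyclic shift by one position; the signatures `g0` and `h0` are hand-picked illustrative values satisfying $g_0 H^T = 0$, not parameters from the article.

```python
def mat_mul(A, B, q=2):
    """Matrix product over F_q (q prime)."""
    return [[sum(x * y for x, y in zip(row, col)) % q for col in zip(*B)]
            for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_pow(M, e, q=2):
    """M^e by repeated multiplication (e >= 0)."""
    n = len(M)
    R = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(e):
        R = mat_mul(R, M, q)
    return R

n = b = 6                      # pi has order b = n
v = 2                          # divisor of b, so b / v = 3
z = [0, 1, 2]                  # distinct exponents, z_0 = 0
pi = [[1 if (l + 1) % n == j else 0 for j in range(n)] for l in range(n)]

g0 = [1, 1, 0, 0, 0, 1]        # illustrative generator signature (m = 1)
h0 = [1, 1, 1, 0, 0, 0]        # illustrative parity-check signature (s = 1)

sigma = [mat_pow(pi, v * zi) for zi in z]
G = [mat_mul([g0], s)[0] for s in sigma]                        # (4.20)
H = [mat_mul([h0], transpose(mat_pow(pi, b - j * v)))[0]
     for j in range(b // v)]                                    # (4.21)

orthogonal = mat_mul(G, transpose(H)) == [[0] * len(H) for _ in G]

# (4.24)-(4.25): h_j * sigma_i^T equals h_{(j - z_i) mod (b / v)}.
f_holds = all(mat_mul([H[j]], transpose(sigma[i]))[0] == H[(j - z[i]) % (b // v)]
              for i in range(len(z)) for j in range(b // v))
print(orthogonal, f_holds)
```

Here both $G$ and $H$ have three rows of length six, matching $k = r = n/2$.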

For what concerns property (b), we can consider the following equivalence:

$$x_0 - x_1 \equiv x_0 - x'_1 \bmod \frac{r}{s}, \tag{4.26}$$

which turns into

$$x_1 - x'_1 \equiv 0 \bmod \frac{r}{s}. \tag{4.27}$$

Then, it is clear that it must be $x_1, x'_1 < \frac{r}{s}$: however, this condition is quite straightforward, since $j$ denotes the row index of the matrix blocks in $H$. In the same way, when considering the index of the transformation $\sigma_i$, we have

$$x_0 - x_1 \equiv x'_0 - x_1 \bmod \frac{r}{s}, \tag{4.28}$$

which turns into

$$x_0 - x'_0 \equiv 0 \bmod \frac{r}{s}. \tag{4.29}$$

Again, in order to guarantee that the previous equivalence has no solution, it must be $x_0, x'_0 < \frac{r}{s}$. This basically means that we must have $\frac{k}{m} \le \frac{r}{s}$.

Remark

There is a clear analogy between the concept of reproducibility and that of automorphism group of a code. Remember that, by automorphism group, we refer to the set of functions that map a code into itself. For instance, consider codes obtained from generator matrices as in (4.20) and assume that $\pi$ is a permutation. Let us further assume, for simplicity, that $v = 1$ and choose $k = b$, i.e., suppose the code has dimension equal to the order of the considered permutation $\pi$. We then have $\mathcal{F} = \{I_n, \pi, \pi^2, \ldots, \pi^{k-1}\}$, and for each $g_0 \in \mathbb{F}_q^n$ we obtain an $\mathcal{F}$-reproducible generator matrix as

$$G = \begin{bmatrix} g_0 \\ g_0\pi \\ g_0\pi^2 \\ \vdots \\ g_0\pi^{k-1} \end{bmatrix}.$$

It is trivial to show that $\mathcal{F}$ is contained in the automorphism group of the code $\mathcal{C}$ having $G$ as a generator matrix. Indeed, each codeword is obtained as

$$c = uG = \sum_{j=0}^{k-1} u_j g_0\pi^j, \quad u_j \in \mathbb{F}_q.$$

If we permute $c$ according to a permutation $\pi^i$, we obtain

$$c\pi^i = \sum_{j=0}^{k-1} u_j g_0\pi^{i+j} = \sum_{j=0}^{k-1} u'_j g_0\pi^j = u'G, \quad \text{with } u'_j = u_{j-i \bmod k}.$$

Thus, $u'$ is a cyclic permutation of $u$: this proves that $c\pi^i \in \mathcal{C}$. Hence, the automorphism group of $\mathcal{C}$ contains all permutations of the form $\pi^i$, for $i \in [1; k-1]$. With similar arguments, one can prove that analogous results hold for other families of transformations that we consider in this article.
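The argument of this remark can be verified exhaustively on a small binary code. The sketch assumes $n = 6$, $k = 3$, and $\pi$ equal to the cyclic shift by two positions (a permutation of order $k$); the signature `g0` and the function names are illustrative.

```python
from itertools import product

n, k, q = 6, 3, 2
g0 = [1, 1, 0, 0, 0, 1]        # illustrative signature

def permute(c, i):
    """Apply pi^i, where pi shifts a vector cyclically by two positions."""
    return [c[(j - 2 * i) % n] for j in range(n)]

# Generator matrix with rows g0 * pi^j, j = 0, ..., k - 1.
G = [permute(g0, j) for j in range(k)]

def encode(u):
    """Codeword c = u * G over F_q."""
    return [sum(u[j] * G[j][t] for j in range(k)) % q for t in range(n)]

code = {tuple(encode(u)) for u in product(range(q), repeat=k)}

# Every pi^i maps the code into itself.
closed = all(tuple(permute(list(c), i)) in code
             for c in code for i in range(1, k))

def rotated(u, i):
    """u'_j = u_{(j - i) mod k}, the cyclic permutation of u in the remark."""
    return [u[(j - i) % k] for j in range(k)]

# c * pi^i is the codeword associated with the rotated information vector.
matches = all(permute(encode(u), i) == encode(rotated(list(u), i))
              for u in product(range(q), repeat=k) for i in range(k))
print(closed, matches)
```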

4.3 Code-based schemes from QCR codes

The algebraic structures we have introduced in the previous sections can be used to generate key pairs in code-based cryptosystems. For instance, let us consider a parity-check matrix H made of r 0 × n 0 matrices belonging to a pseudo-ring q , m . In order to use H as the private key of a sparse-matrix code-based instance of the Niederreiter cryptosystem, we must guarantee that H is sufficiently sparse: this property can be easily achieved by choosing a family of sparse matrices σ i , which guarantee that an -reproducible matrix defined by a sparse signature will be sparse as well. In such a case, we can obtain the public key as H = S H , where S is a random dense matrix, whose elements are picked over q , m . Because of Theorem 3.9, the entries of H belong to q , m , thus they maintain the same structure defined by .

If m = 1 and is a family of permutations satisfying Theorem 3.10, then q , 1 is actually a ring (see Corollary 3.12). Then, the secret key can be chosen as H = [ H 0 , H 1 , , H n 0 1 ] , with H i q , 1 , while the public key can correspond to the systematic form of H , that is, H = H 0 1 H . Indeed, because of Theorem 3.13, we have H 0 1 q , 1 , and so H is a matrix constituted of blocks over q , 1 . This is the approach followed in previous instances of the McEliece and Niederreiter cryptosystems based on QC-LDPC and QC-MDPC codes [5,35], which, however, only considered the special case of circulant matrices as H i .

Suppose we have a family satisfying Theorem 3.14, for which multiplication in q , 1 is commutative (see Section 3.2 for some examples). Then, we can use the -reproducible pseudo-ring induced by to obtain key pairs for a McEliece cryptosystem. For instance, we can choose H = [ H 0 , H 1 ] , with H i q , 1 , and obtain a generator matrix as G = S [ H 1 T , H 0 T ] , with S q , 1 . The matrices H and G can be used as the private and public key, respectively, for a McEliece cryptosystem. Even if this case might seem quite specific, it is of significant interest since it is exactly the structure appearing in the first of the three variants (BIKE-1) of the BIKE proposal to the NIST competition [37].

When both Theorems 3.13 and 3.14 are satisfied, we can obtain a generator matrix in systematic form, which is still an $\mathcal{F}$-reproducible matrix. In fact, starting from an $r \times n$ parity-check matrix $H$, where the elements are picked randomly from $\mathcal{M}^{\mathcal{F}}_{q,1}$, we can use the corresponding parity-check matrix in systematic form as the public key for a Niederreiter cryptosystem instance. In the same way, we can compute the systematic generator matrix, and use it as the public key in a McEliece cryptosystem instance.

The idea of using codes that are completely reproducible, and not formed by reproducible pseudo-rings, opens up the possibility of a new way of generating key pairs in the McEliece cryptosystem. Indeed, once we have generated a sparse parity-check matrix $H$, we can use it as the secret key. Then, a possible public key can be obtained by taking a set of linearly independent codewords and using them as the signature of the public generator matrix. If such codewords correspond to rows of the generator matrix in systematic form, then we obtain a further reduction in the public key size, since there is no need to publish the first $k$ entries of each of the selected codewords.

It is clear that having a CR public code may lead to a significant reduction in the public-key size. Indeed, once the structure of the matrix is fixed by the protocol (i.e., dimensions, family $\mathcal{F}$), the whole public key can be efficiently represented using just the signatures of each building block.
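To give a rough feeling for the saving, the sketch below compares the number of bits needed to store a block matrix explicitly with the number needed to store only the signatures of its blocks. The function `key_bits` and its parameters are hypothetical; it assumes an $r_0 \times n_0$ array of $p \times p$ blocks, each described by an $m \times p$ signature over $\mathbb{F}_q$, with every symbol stored on $\lceil \log_2 q \rceil$ bits.

```python
from math import ceil, log2

def key_bits(r0, n0, p, m=1, q=2):
    """Return (explicit_bits, compact_bits) for an r0 x n0 array of p x p blocks.

    explicit_bits: every entry of the (r0*p) x (n0*p) matrix is stored.
    compact_bits:  only the m x p signature of each block is stored.
    """
    bits_per_symbol = ceil(log2(q))
    explicit_bits = (r0 * p) * (n0 * p) * bits_per_symbol
    compact_bits = r0 * n0 * m * p * bits_per_symbol
    return explicit_bits, compact_bits

explicit_bits, compact_bits = key_bits(r0=1, n0=2, p=4801)   # toy values (assumption)
print("explicit:", explicit_bits, "bits | compact:", compact_bits, "bits")
print("reduction factor:", explicit_bits // compact_bits)
```

With $m = 1$ the reduction factor equals $p$, which is the familiar gain of quasi-cyclic representations; a systematic form would save further by omitting the identity blocks.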

5 Cryptographic properties and attacks

In the previous sections, we have introduced the notion of reproducibility and have described some properties of reproducible codes. Our analysis has shown that there is a wide variety of methods which allow us to obtain reproducible codes. As we have seen in Section 4.3, these codes can be used to generate key pairs in code-based cryptosystems. The main advantage is the possibility of reducing the information needed to represent the matrix used as the public key. In particular, following the considerations in Section 2.3, this framework is well suited for sparse-matrix code-based cryptosystems. Let $\mathcal{C}$ be a secret code with parity-check matrix $H$, and suppose that the public key is constituted by a general generator matrix (for the McEliece case) or parity-check matrix (for the Niederreiter case) of $\mathcal{C}$. Then, the following properties must be satisfied:

  (a) $H$ is sufficiently sparse to perform efficient decoding;

  (b) the knowledge of the public key does not admit efficient techniques for obtaining $H$ or any other valid sparse parity-check matrix of $\mathcal{C}$.

When property (a) is satisfied, $\mathcal{C}$ is an LDPC code and so admits an efficient decoding algorithm $\mathcal{D}$. We point out that this property can be easily satisfied if we choose $\mathcal{F}$ as a family of sparse matrices: this way, choosing a sparse signature for $H$ guarantees that $H$ will be sparse as well. Satisfying property (b) may be the most delicate part, since it depends on the particular reproducible structure we consider. However, as the case of circulant matrices clearly shows, this property might not be hard to satisfy. For instance, let us consider the systematic form of $H = [H_0 \mid H_1]$ obtained as $H' = H_1^{-1} H$. For a generic sparse matrix, there is no constraint regarding the density of its inverse. This means that, except for particular structures (like orthogonal matrices), $H_1^{-1}$ is dense with overwhelming probability, and this is enough to hide the structure of $H$ into that of $H'$. For the systematic generator matrix, we have $G = [I_k \mid -(H_1^{-1} H_0)^T]$ (the sign being immaterial over $\mathbb{F}_2$), and so we can make analogous considerations.

Regardless of the particular choice of $\mathcal{F}$, it is important to note that this additional structure does not expose the secret key to the risk of enumeration. For instance, let us consider the construction described in Section 4.2, in which the matrix $H$ is defined by a signature of size $m \times n$, with all the rows having weight $w$. If we assume that the rows are picked in such a way as to be linearly independent, the cardinality of the secret key space is then approximately equal to $\binom{n}{w}^m$. It is easy to see that, for practical choices of the parameters, this number is sufficiently large to make attacks based on the enumeration of the secret key infeasible. In the next sections, we provide some considerations on attacks that work for QC codes and that may be hindered by proper families of reproducible codes. We only provide some qualitative arguments and leave detailed and thorough considerations about these attacks for future works.
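The size of the key space can be evaluated directly. The following sketch computes $\log_2 \binom{n}{w}^m$, assuming binary signatures as in the estimate above; the function name `log2_key_space` and the parameter values are chosen only for illustration and do not correspond to any proposed parameter set.

```python
from math import comb, log2

def log2_key_space(n, w, m):
    """log2 of binom(n, w)**m: m signature rows of length n and weight w each."""
    return m * log2(comb(n, w))

# Illustrative values (assumption): one signature row of length 20000, weight 70.
bits = log2_key_space(n=20000, w=70, m=1)
print(f"log2 of key-space size: {bits:.1f}")
```

Even a single sparse signature row of moderate weight yields a key space far beyond the reach of exhaustive search.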

5.1 Reaction attacks

Reaction attacks [29,38,39,40] are a recent class of attacks aimed at recovering the private key by exploiting events of decoding failure. In this section, we briefly describe the attack proposed in ref. [29], and then we make some considerations about reproducible codes. In particular, we consider a binary QC code with parity-check matrix $H = [H_0 \mid H_1]$, where each $H_i$ is a sparse $p \times p$ circulant with row and column weight equal to $w$. Then, the resulting code has length $n = 2p$, and dimension and redundancy both equal to $p$.

In a reaction attack, the opponent impersonates Alice, producing ciphertexts and sending them to Bob. Events of decoding failure can be detected since, in the case of a decoding failure, Bob must ask for a retransmission. A crucial role in a reaction attack is played by the distance spectrum, that is, the set of all distances between pairs of entries of value 1 in a vector [29]. If a distance $d$ appears $\mu$ times in the spectrum, we say that it has multiplicity equal to $\mu$; if a distance is not in the spectrum, we say that it has zero multiplicity. In the case of QC codes, these distances are computed cyclically: given two ones at positions $x_0$ and $x_1$, the corresponding distance is obtained as $d = \min\{\pm(x_0 - x_1) \bmod p\}$. In a circulant matrix, all the rows are characterized by the same distance spectrum; in particular, an opponent performing a reaction attack aims to obtain the distance spectrum of the rows of $H_0$. For this purpose, the opponent collects the produced ciphertexts into subsets $\Sigma_d$, such that each error vector used for the encryption of a ciphertext in $\Sigma_d$ has $d$ in the distance spectrum of its first circulant block. The opponent then observes a sufficiently large number of Bob's reactions and assigns a decoding failure probability to each set. As observed in ref. [29], the decoding failure probability of $\Sigma_d$ depends on the presence of pairs of ones in the rows of $H_0$ at the same distance $d$. Indeed, suppose that the first length-$p$ block of $e$ has a pair of ones forming the distance $d$; then, the following properties hold:

  • if the distance spectrum of $H_0$ contains $d$ with multiplicity $\mu$, then the pair of ones overlaps with $\mu$ rows of $H$;

  • if the distance spectrum of $H_0$ does not contain $d$, then the pair of ones does not overlap with any row of $H$.

These properties justify the fact that the average syndrome weight of the ciphertexts belonging to the same set $\Sigma_d$ depends on the multiplicity of $d$ in the spectrum of $H_0$, as observed in ref. [40]. In particular, the syndrome weight slightly decreases as $\mu$ increases, and this causes the difference in the corresponding decoding failure probabilities [40]. This allows an opponent to obtain the distance spectrum of $H_0$, since the multiplicity of each distance $d$ can be guessed by looking at the decoding failure probability of the corresponding set $\Sigma_d$. Since $H_0$ is sparse, its distance spectrum is not dense, which means that it contains a small number of distances, with multiplicities that are generically rather low. It is then possible to recover $H_0$ from the knowledge of its distance spectrum, with a procedure that can be related to that of finding cliques of prefixed size in a given graph. In principle, clique-finding algorithms run with a time complexity that grows exponentially with the clique size; however, for sparse graphs (i.e., graphs that contain a small number of edges), the problem becomes significantly easier [29,38].
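The notion of distance spectrum, and its invariance across the rows of a circulant block, can be made concrete with a few lines of code. The sketch below uses the cyclic distance $d = \min\{\pm(x_0 - x_1) \bmod p\}$; the function names and the toy row of length 13 are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def distance_spectrum(row):
    """Multiset of cyclic distances between all pairs of ones in `row`.

    Keys are distances d, values are multiplicities mu.
    """
    p = len(row)
    ones = [i for i, v in enumerate(row) if v]
    spectrum = Counter()
    for x0, x1 in combinations(ones, 2):
        d = (x0 - x1) % p
        spectrum[min(d, p - d)] += 1
    return spectrum

def circulant(first_row):
    """p x p circulant: row i is first_row cyclically shifted right by i."""
    p = len(first_row)
    return [[first_row[(j - i) % p] for j in range(p)] for i in range(p)]

# Toy circulant block with p = 13 and ones at positions 0, 1 and 4.
first_row = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
H0 = circulant(first_row)
spectra = [distance_spectrum(row) for row in H0]

print("spectrum of the first row:", dict(spectra[0]))
print("same spectrum in every row:", all(s == spectra[0] for s in spectra))
```

Since a cyclic shift preserves all cyclic distances, every row of the block leaks the same spectrum, which is what the key reconstruction phase of the attack relies on.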

In summary, reaction attacks against QC codes are possible because of two factors:

  (i) A sufficiently high DFR;

  (ii) The invariance of the set of distances between pairs of ones in a row of the secret key with respect to the row index. This guarantees feasibility of the key reconstruction phase, since the resulting graph (in which rows of the secret key are represented by cliques of fixed size) is sparse.

In particular, one can try to counter reaction attacks by choosing codes for which condition (ii) is not met. For instance, in ref. [34] the authors propose to use a specific family of QC monomial codes with the property that the distances between pairs of ones in the secret key fill the distance spectrum. In this way, the density of the obtained graph becomes maximal and, as a consequence, reconstructing the secret key becomes infeasible. We argue that families of reproducible codes may, in general, be characterized by analogous properties.

For simplicity, consider the example of a reproducible code with $k = r = p$ and $n = 2p$, with a signature made of just one row, and a family $\mathcal{F}$ of functions $\sigma_i$ that are obtained as consecutive powers of a permutation $\psi$. In addition, suppose that $\psi$ is obtained as the product of two disjoint $p$-cycles. In other words, $\psi$ is such that we can find two disjoint sets $\{a_0^{(0)}, a_1^{(0)}, \ldots, a_{p-1}^{(0)}\}$ and $\{a_0^{(1)}, a_1^{(1)}, \ldots, a_{p-1}^{(1)}\}$, for which

$$ f_{\psi}\big(a_j^{(b)}\big) = a_{j+1 \bmod p}^{(b)}, \quad b \in \{0, 1\}. \tag{5.1} $$

It is clear that

$$ f_{\sigma_i}\big(a_j^{(b)}\big) = a_{j+i \bmod p}^{(b)}, \quad b \in \{0, 1\}, \ \forall i. \tag{5.2} $$

Suppose now that the signature of $H$ has two ones at positions $a_v^{(0)}$ and $a_l^{(0)}$, with $a_l^{(0)} - a_v^{(0)} = d$. Then, in the $i$th row of $H$ these ones correspond to the positions $a_{v+i \bmod p}^{(0)}$ and $a_{l+i \bmod p}^{(0)}$. The corresponding distance is $d' = a_{l+i \bmod p}^{(0)} - a_{v+i \bmod p}^{(0)}$ which, in general, is different from $d$.

As a toy example, set $p = 7$ and suppose $\psi$ is formed by the cycles $\{1, 8, 5, 3, 7, 0, 13\}$ and $\{4, 12, 10, 6, 15, 11, 2\}$. For simplicity, suppose that in the secret signature there are two ones in positions 0 and 1. These correspond to the ones at positions 13 and 8 in the second row of $H$, at positions 1 and 5 in the third row, etc. The distances between these ones change from row to row and, therefore, are not an invariant of the row index. Thus, differently from the case of QC codes, the distances that are produced between ones in the first row of the secret key are not maintained in the other rows.
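The toy example can be traced mechanically. The sketch below follows the two ones starting at positions 0 and 1 under repeated application of $\psi$, using only the first cycle (the one containing both positions). Measuring the distance as the plain absolute difference between positions is an assumption made for illustration.

```python
# First cycle of psi from the toy example: psi maps each element to the next one
# in the list, and the last element back to the first.
cycle = [1, 8, 5, 3, 7, 0, 13]
psi = {a: cycle[(j + 1) % len(cycle)] for j, a in enumerate(cycle)}

positions = (0, 1)            # ones in the secret signature (first row)
rows = []
for _ in range(len(cycle)):   # one entry per row generated by a power of psi
    rows.append(positions)
    positions = tuple(psi[x] for x in positions)

distances = [abs(a - b) for a, b in rows]
for i, (pair, d) in enumerate(zip(rows, distances)):
    print(f"row {i + 1}: ones at {pair}, distance {d}")
```

The distance between the two tracked ones is not preserved from one row to the next, in contrast with the circulant case, where it would be the same in every row.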

With this simple example we have shown that, unlike in the QC case, the distance spectrum of generic reproducible codes becomes richer and, as a consequence, the graph which is used to discover the secret key becomes denser. Thus, the secret key reconstruction phase, which is the final step of a reaction attack, may be hindered, and this may be enough to remove the basis upon which reaction attacks are built. Asserting the resistance of general families of transformations requires a deeper investigation, although some conclusions can already be drawn.

5.2 DOOM

In ref. [28], Sendrier introduced a technique, called DOOM, which is able to speed up the execution of ISD algorithms for certain families of codes, including QC codes. In general, this technique can be applied whenever there are multiple instances of SDP with just one solution. When ISD is used to perform a decoding attack, the gain obtained from DOOM can be explained as follows. Consider the public parity-check matrix $H$ and a set of $N$ different syndromes $\mathcal{S} = \{s^{(0)}, s^{(1)}, \ldots, s^{(N-1)}\}$ to be decoded. Suppose that, for every $e^{(i)}$ such that $H e^{(i)T} = s^{(i)}$, there exists a bijective function that allows us to obtain $e^{(i)}$ from $e^{(0)}$ and vice versa. We denote such a function by $\mathcal{B}$, so that $e^{(i)} = \mathcal{B}(e^{(0)})$ and $e^{(0)} = \mathcal{B}^{-1}(e^{(i)})$. Then each pair $\{s^{(i)}, H\}$ can be considered as the input of an ISD algorithm aimed at finding $e^{(0)}$ with weight $w$ such that $H\big(\mathcal{B}(e^{(0)})\big)^T = H e^{(i)T} = s^{(i)}$. According to DOOM, we consider $N_i$ independent calls to an ISD algorithm. As soon as one of these runs successfully comes to an end, the whole algorithm ends as well, since $e^{(0)}$ has been found. The corresponding gain is equal to $|\mathcal{S}|/\sqrt{N_i} = N/\sqrt{N_i}$, which becomes $\sqrt{N}$ when $N_i = N$. Obviously, exploiting DOOM is beneficial when the $N_i$ independent decoding instances have comparable complexity. This only occurs on the condition that $e^{(i)} = \mathcal{B}(e^{(0)})$ has the same Hamming weight as $e^{(0)}$, or almost the same.

The rationale of exploiting DOOM for a decoding attack is to intercept one ciphertext and then try to obtain other valid ciphertexts from it, corresponding to transformed versions of the same error vector. Let us consider the case in which the opponent intercepts a ciphertext corresponding to an initial syndrome $s^{(0)}$ and wants to recover the vector $e^{(0)}$ used during encryption. Then, in order to apply DOOM, the opponent must produce other syndromes corresponding to as many error vectors being deterministic functions of $e^{(0)}$. In other words, suppose that ISD returns the solution $e^{(i)}$ for $s^{(i)}$; then it must be $e^{(i)} = A e^{(0)}$, with $A$ being a full-rank matrix. For instance, in the QC case, the opponent can obtain a set $\mathcal{S}$ of $p$ syndromes just by cyclically shifting the initial syndrome $s^{(0)}$ and the corresponding error vector $e^{(0)}$.

In general terms, the applicability of DOOM can be modeled as follows. Starting from a syndrome $s^{(0)} = H e^{(0)T}$, we want to determine a transformation $\Phi$ of the syndrome that corresponds to a transformation $\Psi$ of the error vector, that is,

$$ \Phi s^{(0)} = \Phi H e^{(0)T} = H \big(e^{(0)} \Psi\big)^T = H \Psi^T e^{(0)T}, \tag{5.3} $$

where $\Phi$ and $\Psi$ are two matrices over $\mathbb{F}_q$, with size $r \times r$ and $n \times n$, respectively. The previous equation must be satisfied for every vector $e^{(0)}$; this can happen only if

$$ \exists\, \Phi \in \mathbb{F}_q^{r \times r},\ \Psi \in \mathbb{F}_q^{n \times n} \quad \text{s.t.} \quad \Phi H = H \Psi^T. \tag{5.4} $$

For the general class of reproducible codes, the applicability of DOOM must be carefully analyzed. For instance, consider a code obtained with the procedure described in Section 4.2, using a family $\mathcal{F}$ of functions consisting of powers of a single function. If this is a permutation, due to Theorem 4.1, we have that $H\sigma_i$ with $\sigma_i \in \mathcal{F}$ always results in a permutation of the rows of $H$. So, the opponent can build the set $\mathcal{S}$, which is used as input for the DOOM algorithm, by multiplying the initial syndrome by the matrices $\sigma_i$.

However, as we have described in the previous sections, reproducible families of codes can be obtained in many different ways. For instance, we can use functions $\sigma_i$ that are powers of a matrix $\theta$ that is not a permutation. In this case, the opponent can still produce a set $\mathcal{S}$, since equation (5.3) can be satisfied by choosing $\Psi = \sigma_i$; the corresponding reordering of the rows of $H$ is a cyclic shift by $i$ positions. However, this yields $e^{(i)} = e^{(0)} \sigma_i$. Unless $\theta$ is a permutation, powers of this matrix contain a rather large number of non-null entries: for instance, if $\theta$ is selected at random, then we expect that for any $\sigma_i$ the fraction of non-null entries is close to $\frac{q-1}{q}$. In such a case, any $e^{(i)}$ would have a rather large Hamming weight, with a fraction of non-null entries close to $\frac{q-1}{q}$, far larger than that of $e^{(0)}$. According to ref. [41], we can approximate the time complexity of an ISD algorithm searching for a vector with weight $t$ as $2^{ct}$, where $c = -\log_2\left(1 - \frac{k}{n}\right)$. If $t$ is the weight of $e^{(0)}$, then the ISD algorithm taking $s^{(0)}$ as input is expected to run in time $2^{ct}$. Since all the other syndromes $s^{(i)}$, with $i \geq 1$, are associated with error vectors with weights significantly larger than $t$, applying ISD to them requires a time complexity that is significantly larger than $2^{ct}$. Then, there is no gain in considering this set of multiple instances, since the additional instances (which are produced by the opponent) are associated with an ISD complexity that is significantly larger than that of the original one.
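The gap between the original instance and the transformed ones can be quantified with the approximation $2^{ct}$, $c = -\log_2(1 - k/n)$. The sketch below is only indicative: the approximation of ref. [41] is derived for sub-linear error weights, so applying it to a dense vector merely shows the order of magnitude of the gap. The function name and the parameter values are assumptions.

```python
from math import log2

def isd_exponent(k, n, t):
    """Exponent c*t of the approximate ISD cost 2**(c*t), with c = -log2(1 - k/n)."""
    c = -log2(1 - k / n)
    return c * t

# Illustrative binary code of rate 1/2 (assumption): n = 2p, k = p.
p = 4801
n, k = 2 * p, p
t = 84                      # weight of the original error vector e^(0)
t_dense = n // 2            # expected weight of e^(0) * sigma_i for a random theta, q = 2

print("exponent for the original instance:", isd_exponent(k, n, t))
print("exponent for a transformed instance:", isd_exponent(k, n, t_dense))
```

For a rate-1/2 code the constant is $c = 1$, so the exponent coincides with the weight: the transformed instances are hopelessly more expensive than the original one and contribute nothing to a DOOM-style attack.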

We note that codes of this type may be employed in cryptosystems where codes in compact form are not required to admit efficient decoding. This is the case, for instance, of the HQC KEM [42] and the AGS identification scheme [43]. In both schemes, a code in compact form is needed to obtain a syndrome decoding instance: while in HQC decoding is done with a public and fixed code, in AGS decoding is not involved at all. Hence, in this type of application, the adoption of reproducible families of codes may be convenient: defeating DOOM would make it possible to choose better parameters for a scheme.

5.3 Construction examples

We provide some explicit constructions of reproducible codes that can be advantageous for the use in code-based cryptographic schemes, with the aim of illustrating the potential of the introduced theoretical framework.

5.3.1 Quasi-dyadic MDPC codes

Dyadic matrices, which we have already mentioned in Section 3.2, have been used with some measure of success in cryptography, but always in the context of algebraic codes. The first proposal using quasi-dyadic (QD) Goppa codes [1] was cryptanalyzed [26] almost in its entirety. A later proposal based on generalized Srivastava (GS) codes [44] was designed to be more robust against the previous attack and led to one of the NIST submissions for the key exchange functionality, DAGS [45,46]. Nevertheless, the threat of structural attacks is always present, as shown by the recent results of Barelli and Couvreur [47]. On the other hand, using dyadic matrices has undeniable advantages, not only in terms of key reduction but also because it leads to fast and efficient arithmetic (as shown in ref. [48]) while at the same time featuring a reproducible structure which is less “obvious” than that provided by circulant matrices.

The reasons mentioned above are why we believe that designing MDPC codes with a QD structure, i.e., QD-MDPC codes, has potential in cryptography. Dyadic matrices have many good properties (e.g., they are symmetric and orthogonal) and satisfy Theorems 3.9–3.13, which means the ensemble $\mathcal{M}^{\mathcal{F}}_{q,1}$ of dyadic matrices forms a fully-fledged ring (which is also commutative). A formal definition of reproducible codes having such a structure is given below.

Definition 5.1

(QD-MDPC codes) Let $\mathcal{M}^{\mathcal{F}}_{q,1}$ be the ring of dyadic matrices. We call Quasi-Dyadic MDPC (QD-MDPC) code of type $(r_0, n_0)$ a linear code of length $n = n_0 p$ and redundancy $r \leq r_0 p$ that admits a parity-check matrix in the form $H = \{Z_{ij}\}$, where $Z_{ij} \in \mathcal{M}^{\mathcal{F}}_{q,1}$ for all $0 \leq i \leq r_0 - 1$, $0 \leq j \leq n_0 - 1$, such that $H$ has row weight $O(\sqrt{n})$.

Constructing a code-based cryptosystem from QD-MDPC codes is actually rather intuitive, since we can follow the guidelines detailed in Section 4.3. However, due to the very same properties we just mentioned, building QD-MDPC codes for cryptographic purposes requires some caution. For example, in the simplest instantiation, one could form a parity-check matrix by selecting just two blocks, i.e., $H = [H_0, H_1]$, with $H_i \in \mathcal{M}^{\mathcal{F}}_{q,1}$ of size $p \times p$. However, this would not be secure. In fact, since dyadic matrices are orthogonal, the density of the inverse matrix is not guaranteed. This means that a Niederreiter instantiation would not be secure, since the non-systematic block is obtained as $H_0^{-1} H_1$. Similarly, to use the McEliece framework, one could compute a generator matrix as $G = [G_0, G_1] = S\,[H_1^T, -H_0^T]$, where $S \in \mathcal{M}^{\mathcal{F}}_{q,1}$ is dense, but then the product $G_0 G_1^{-1}$ may still reveal the private key, due to the sparsity of the inverse of a dyadic matrix.

As a consequence, to construct code-based schemes using this particular family of reproducible codes, it is recommended to choose $r_0 \geq 2$ and employ “true” block matrices, with blocks in $\mathcal{M}^{\mathcal{F}}_{q,1}$.
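The weakness of the two-block instantiation can be observed directly in the binary case. A binary dyadic matrix with signature $h$ has entries $\Delta_{ij} = h_{i \oplus j}$, and over $\mathbb{F}_2$ its square equals the identity whenever $h$ has odd weight, so the inverse of a sparse dyadic block is the block itself. The sketch below checks this on a toy example; the helper names and the chosen signature are assumptions.

```python
def dyadic(signature):
    """Binary dyadic matrix: entry (i, j) is signature[i XOR j]."""
    n = len(signature)
    assert (n & (n - 1)) == 0, "the signature length must be a power of two"
    return [[signature[i ^ j] for j in range(n)] for i in range(n)]

def matmul2(A, B):
    """Matrix product over GF(2)."""
    cols = list(zip(*B))
    return [[sum(a & b for a, b in zip(row, col)) % 2 for col in cols] for row in A]

n = 16
signature = [0] * n
for pos in (1, 6, 11):        # sparse signature of odd weight 3 (toy choice)
    signature[pos] = 1

D = dyadic(signature)
identity = [[int(i == j) for j in range(n)] for i in range(n)]
D_squared = matmul2(D, D)

print("D is symmetric:", all(D[i][j] == D[j][i] for i in range(n) for j in range(n)))
print("D * D = I over GF(2):", D_squared == identity)
```

Since $H_0^{-1} = H_0$ in this setting, the block $H_0^{-1} H_1$ is a product of two sparse matrices and does not hide the secret key.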

5.3.2 Block-wise circulant matrices

As shown in Section 3.2, circulant matrices are a classic special case of reproducible matrices and have already been used in cryptography for quite some time. For a traditional circulant matrix, the signature corresponds to its first row and the set of transformations is $\mathcal{F} = \{\sigma_0 = I_p, \sigma_1 = \pi, \sigma_2 = \pi^2, \ldots, \sigma_{p-1} = \pi^{p-1}\}$, where $\pi$ is the unitary circulant permutation matrix (3.15).

The concept of circulant matrix can be easily generalized into that of a block-wise circulant matrix, or a periodically circulant matrix as defined in ref. [49]. Such a generalization of circulant matrices can be described in the form of $\mathcal{F}$-reproducible matrices as follows. Let us consider $m > 1$, such that $m \mid p$, and an $m \times p$ signature $z$ formed by $m$ independent rows of $p$ elements each, with entries over $\mathbb{F}_q$. Then, let us consider a fixed family of linear maps formed by the set of permutations

$$ \mathcal{F} = \left\{ \sigma_0 = I_p,\ \sigma_1 = \pi^m,\ \sigma_2 = \pi^{2m},\ \ldots,\ \sigma_{\frac{p}{m}-1} = \pi^{p-m} \right\}, \tag{5.5} $$

which induces $\mathcal{M}^{\mathcal{F}}_{q,m}$ as the set of all $\mathcal{F}$-reproducible matrices of the type

$$ Z = \begin{bmatrix} z \\ z\pi^m \\ z\pi^{2m} \\ \vdots \\ z\pi^{p-m} \end{bmatrix}. \tag{5.6} $$

These matrices are indeed block-wise circulant, in the sense that any block of $m$ rows is obtained from the previous block of $m$ rows through a cyclic shift by $m$ positions. It is easy to verify that, for every matrix $Z \in \mathcal{M}^{\mathcal{F}}_{q,m}$, we have

$$ \sigma_i Z = \pi^{im} Z = Z \pi^{im} = Z \sigma_i, \quad \forall i \in \mathbb{N},\ 0 \leq i \leq \frac{p}{m} - 1. $$
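The construction (5.6) and the commutation property can be checked on a small instance. In the sketch below, $\pi$ is taken to act on a row vector as a cyclic shift by one position to the right, so that left multiplication by $\pi^s$ moves row $r + s$ of a matrix into row $r$; this convention, the helper names and the toy signature with $p = 6$, $m = 2$ are assumptions made for the example.

```python
def shift_right(row, s):
    """Cyclic shift of a row vector by s positions to the right (row * pi**s)."""
    p = len(row)
    return [row[(j - s) % p] for j in range(p)]

def blockwise_circulant(z, p):
    """Stack the blocks z, z*pi^m, z*pi^(2m), ..., z*pi^(p-m) as in (5.6)."""
    m = len(z)
    assert p % m == 0 and all(len(row) == p for row in z)
    return [shift_right(row, b * m) for b in range(p // m) for row in z]

def left_mult(Z, s):
    """pi**s * Z: row r of the product is row (r + s) mod p of Z."""
    return Z[s:] + Z[:s]

def right_mult(Z, s):
    """Z * pi**s: every row of Z is shifted right by s positions."""
    return [shift_right(row, s) for row in Z]

p, m = 6, 2
z = [[1, 0, 0, 1, 0, 0],      # toy m x p signature
     [0, 1, 1, 0, 0, 0]]
Z = blockwise_circulant(z, p)

commutes = all(left_mult(Z, i * m) == right_mult(Z, i * m) for i in range(p // m))
print("sigma_i Z == Z sigma_i for all i:", commutes)
```

Note that a shift by a single position (which is not a multiple of $m$) does not, in general, commute with $Z$; only the $p/m$ maps of the family (5.5) are guaranteed to do so.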

Based on Theorem 3.9, $\mathcal{M}^{\mathcal{F}}_{q,m}$ is a semigroup with respect to the multiplication, and therefore a pseudo-ring. With this in mind, we can define the following object.

Definition 5.2

(BC-MDPC codes) Let $\mathcal{M}^{\mathcal{F}}_{q,m}$ be the pseudo-ring formed by block-wise circulant matrices of the form (5.6). We call Block-wise Cyclic MDPC (BC-MDPC) code of type $(r_0, n_0)$ a linear code of length $n = n_0 p$ and redundancy $r \leq r_0 p$ that admits a parity-check matrix in the form $H = \{Z_{ij}\}$, where $Z_{ij} \in \mathcal{M}^{\mathcal{F}}_{q,m}$ for all $0 \leq i \leq r_0 - 1$, $0 \leq j \leq n_0 - 1$, such that $H$ has row weight $O(\sqrt{n})$.

Circulant matrices have the property that any distance between a pair of ones in their first row can be found, in any other position, in one of the other rows, due to the unitary cyclic shift between any row and the subsequent one. In this more general formulation, shifts by $m$ positions replace unitary shifts, so the aforementioned property no longer holds. We therefore expect that using BC-MDPC codes could hinder reaction attacks of the type introduced in ref. [29], which rely on such a property of circulant matrices.

Remark 5.3

Note that the above formulation of BC-MDPC codes could be made even more general. In fact, in Definition 5.2, these codes are described as made of blocks all coming from the same pseudo-ring $\mathcal{M}^{\mathcal{F}}_{q,m}$. However, this is not strictly necessary to preserve a reproducible structure. One could in fact select block-wise circulant components with different reproducible orders $m_i$, which would lead to a BC-MDPC code of reproducible order $m = \mathrm{lcm}(m_i)$. We believe that such a formulation could be an interesting avenue to investigate in future works.

6 Conclusion

We have introduced the notions of reproducibility and quasi-reproducibility. They capture the idea of matrices that can be compactly represented through a signature, i.e., a subset of rows, and a family of functions which generate all remaining rows. We have provided theoretical results about the existence and properties of these families of matrices, which only depend on the chosen family of transformations. Alongside, we have extended these notions to coding theory and have introduced the concept of reproducible and quasi-reproducible codes, which are codes described by a generator or a parity-check matrix yielding a compact representation. We have shown that existing and well known families of structured codes are encompassed within this framework, and have provided some concrete constructions of other families of reproducible codes.

A direct application of this work is in code-based cryptography, where the representation of a code is commonly used as the public key. As the recent NIST call for the standardization of post-quantum cryptosystems clearly emphasizes, random and pseudo-random codes are of interest for many code-based cryptosystems. In particular, at the current state of the art, many systems rely on the quasi-cyclic structure of codes in order to reduce the public key size. Essentially, all the schemes employing such structured codes can be generalized to the use of reproducible codes, via some of the constructions we have shown in this article. While the compactness of the public key is preserved, advantages come from the fact that attacks targeting the specific quasi-cyclic structure can be avoided when more general code constructions are considered. Although a complete cryptanalysis of these new families of codes requires a deeper investigation, and is out of the scope of this article, these potential benefits motivate the study of reproducible codes as a generalization of quasi-cyclic and other known structured codes.

Acknowledgements

Edoardo Persichetti and Paolo Santini were supported by National Science Foundation (NSF) grant CNS-1906360.

Conflict of interest: Authors state no conflict of interest.

References

[1] Misoczki R, Barreto PSLM. Compact McEliece keys from Goppa codes. In: Selected Areas in Cryptography, Lecture Notes in Computer Science 5867. Springer Verlag; 2009. p. 376–92. doi:10.1007/978-3-642-05445-7_24

[2] McEliece RJ. A public-key cryptosystem based on algebraic coding theory. DSN Progress Report. 1978;42-44:114–6.

[3] Gaborit P. Shorter keys for code based cryptography. In: Proceedings of the International Workshop on Coding and Cryptography (WCC 2005). Bergen, Norway; March 2005. p. 81–90.

[4] Monico C, Rosenthal J, Shokrollahi A. Using low density parity check codes in the McEliece cryptosystem. In: Proceedings of the IEEE International Symposium on Information Theory (ISIT 2000). Sorrento, Italy; June 2000. p. 215. doi:10.1109/ISIT.2000.866513

[5] Misoczki R, Tillich JP, Sendrier N, Barreto PSLM. MDPC-McEliece: New McEliece variants from moderate density parity-check codes. In: 2013 IEEE International Symposium on Information Theory; July 2013. p. 2069–73. doi:10.1109/ISIT.2013.6620590

[6] Baldi M. LDPC codes in the McEliece cryptosystem: attacks and countermeasures. NATO Science for Peace and Security Series - D: Information and Communication Security 23. IOS Press; 2009. p. 160–74.

[7] Shor PW. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM J Comput. 1997;26:1484–509. doi:10.1137/S0097539795293172

[8] https://csrc.nist.gov/Projects/Post-Quantum-Cryptography.

[9] Bootland C, Castryck W, Szepieniec A, Vercauteren F. A framework for cryptographic problems from linear algebra. J Math Cryptol. 2019;14:202–17. doi:10.1515/jmc-2019-0032

[10] Alagic C, Alperin-Sheriff J, Apon D, Cooper D, Dang Q, Liu YK, et al. Status report on the first round of the NIST post-quantum cryptography standardization process. Washington, DC: US Department of Commerce, National Institute of Standards and Technology; 2019. doi:10.6028/NIST.IR.8240

[11] Berlekamp E, McEliece R, van Tilborg H. On the inherent intractability of certain coding problems. IEEE Trans Inform Theory. 1978;24:384–86. doi:10.1109/TIT.1978.1055873

[12] Sidelnikov VM, Shestakov SO. On insecurity of cryptosystems based on generalized Reed–Solomon codes. Discr Math Appl. 1992;2:439–44. doi:10.1515/dma.1992.2.4.439

[13] Faugère J-C, Otmani A, Perret L, Tillich J-P. A distinguisher for high rate McEliece cryptosystems. In: Proceedings of IEEE Information Theory Workshop (ITW). Paraty, Brazil; October 2011. p. 282–6. doi:10.1109/ITW.2011.6089437

[14] Gallager RG. Low-density parity-check codes. IRE Transactions on Information Theory. IEEE; 1963;8(1). doi:10.7551/mitpress/4347.001.0001

[15] Hofheinz D, Hövelmanns K, Kiltz E. A modular analysis of the Fujisaki-Okamoto transformation. In: Kalai Y, Reyzin L, editors. Theory of Cryptography. Cham: Springer International Publishing; 2017. p. 341–71. doi:10.1007/978-3-319-70500-2_12

[16] Baldi M, Barenghi A, Chiaraluce F, Pelosi G, Santini P. Failure rate model of bit-flipping decoders for QC-LDPC and QC-MDPC code-based cryptosystems. In: Proceedings of the 17th International Joint Conference on e-Business and Telecommunications - Volume 3: SECRYPT, INSTICC. SciTePress; 2020. p. 238–49. doi:10.5220/0009891702380249

[17] Prange E. The use of information sets in decoding cyclic codes. IRE Trans Inform Theory. 1962;8:5–9. doi:10.1109/TIT.1962.1057777

[18] Leon JS. A probabilistic algorithm for computing minimum weights of large error-correcting codes. IEEE Trans Inform Theory. 1988;34:1354–9. doi:10.1109/18.21270

[19] Stern J. A method for finding codewords of small weight. In: Cohen G, Wolfmann J, editors. Coding Theory and Applications, Lecture Notes in Computer Science 388. Springer Verlag; 1989. p. 106–13. doi:10.1007/BFb0019850

[20] May A, Meurer A, Thomae E. Decoding random linear codes in $\tilde{\mathcal{O}}(2^{0.054n})$. In: ASIACRYPT, LNCS 7073. Springer; 2011. p. 107–24. doi:10.1007/978-3-642-25385-0_6

[21] Becker A, Joux A, May A, Meurer A. Decoding random binary linear codes in $2^{n/20}$: How 1+1=0 improves information set decoding. In: Pointcheval D, Johansson T, editors. Advances in Cryptology – EUROCRYPT 2012, Lecture Notes in Computer Science 7237. Springer Verlag; 2012. p. 520–36. doi:10.1007/978-3-642-29011-4_31

[22] Grover LK. A fast quantum mechanical algorithm for database search. In: Proceedings of the 28th Annual ACM Symposium on the Theory of Computing. Philadelphia, PA; May 1996. p. 212–9. doi:10.1145/237814.237866

[23] Bernstein DJ. Grover vs. McEliece. In: PQCrypto. 2010. doi:10.1007/978-3-642-12929-2_6

[24] Baldi M, Bodrato M, Chiaraluce F. A new analysis of the McEliece Cryptosystem based on QC-LDPC codes. In: Security and Cryptography for Networks, Lecture Notes in Computer Science 5229. Springer Verlag; 2008. p. 246–62. doi:10.1007/978-3-540-85855-3_17

[25] Berger TP, Cayrel P-L, Gaborit P, Otmani A. Reducing key length of the McEliece cryptosystem. In: Progress in Cryptology - AFRICACRYPT 2009, Lecture Notes in Computer Science 5580. Springer Verlag; 2009. p. 77–97. doi:10.1007/978-3-642-02384-2_6

[26] Faugère J-C, Otmani A, Perret L, Tillich J-P. Algebraic cryptanalysis of McEliece variants with compact keys. In: EUROCRYPT 2010, Lecture Notes in Computer Science 6110. Springer Verlag; 2010. p. 279–98. doi:10.1007/978-3-642-13190-5_14

[27] https://bigquake.inria.fr/.

[28] Sendrier N. Decoding one out of many. In: Yang BY, editor. Post-quantum cryptography, Lecture Notes in Computer Science 7071. Springer Verlag; 2011. p. 51–67. doi:10.1007/978-3-642-25405-5_4

[29] Guo Q, Johansson T, Stankovski P. A key recovery attack on MDPC with CCA security using decoding errors. In: ASIACRYPT, LNCS 10031. Springer; 2016. p. 789–815. doi:10.1007/978-3-662-53887-6_29

[30] Baldi M, Barenghi A, Chiaraluce F, Pelosi G, Santini P. LEDAkem: A Post-quantum Key Encapsulation Mechanism Based on QC-LDPC Codes. In: 9th International Conference on Post-Quantum Cryptography. Fort Lauderdale, FL, USA: PQCrypto; April 9–11 2018. p. 3–24. doi:10.1007/978-3-319-79063-3_1

[31] Barreto PSLM, Gueron S, Gueneysu T, Misoczki R, Persichetti E, Sendrier N, et al. CAKE: code-based algorithm for key encapsulation. In: IMA International Conference on Cryptography and Coding. Springer; 2017. p. 207–26. 10.1007/978-3-319-71045-7_11Search in Google Scholar

[32] Tillich J-P. The decoding failure probability of MDPC codes. In: 2018 IEEE International Symposium on Information Theory (ISIT), IEEE; 2018. p. 941–5. 10.1109/ISIT.2018.8437843

[33] Santini P, Battaglioni M, Baldi M, Chiaraluce F. Analysis of the error correction capability of LDPC and MDPC codes under parallel bit-flipping decoding and application to cryptography. IEEE Trans Commun. 2020;68:4648–60. 10.1109/TCOMM.2020.2987898

[34] Santini P, Baldi M, Cancellieri G, Chiaraluce F. Hindering reaction attacks by using monomial codes in the McEliece cryptosystem. In: 2018 IEEE International Symposium on Information Theory (ISIT), IEEE; 2018. p. 951–5. 10.1109/ISIT.2018.8437553

[35] Baldi M, Bianchi M, Chiaraluce F. Optimization of the parity-check matrix density in QC-LDPC code-based McEliece cryptosystems. In: Proc. IEEE ICC 2013 - Workshop on Information Security over Noisy and Lossy Communication Systems. Budapest, Hungary; June 2013. 10.1109/ICCW.2013.6649325

[36] Householder AS. Unitary triangularization of a nonsymmetric matrix. J ACM. 1958;5:339–42. 10.1145/320941.320947

[37] Aragon N, Barreto PSLM, Bettaieb S, Bidoux L, Blazy O, Deneuville Jc. BIKE: Bit flipping key encapsulation; 2017.

[38] Fabšič T, Hromada V, Stankovski P, Zajac P, Guo Q, Johansson T. A reaction attack on the QC-LDPC McEliece cryptosystem. In: Post-Quantum Cryptography, LNCS 10346. Cham: Springer; 2017. p. 51–68. 10.1007/978-3-319-59879-6_4

[39] Fabšič T, Hromada V, Zajac P. A reaction attack on LEDApkc. IACR Cryptol ePrint Archive. 2018;2018:140.

[40] Eaton E, Lequesne M, Parent A, Sendrier N. QC-MDPC: A timing attack and a CCA2 KEM. In: PQCrypto. Cham: Springer; 2018. p. 47–76. 10.1007/978-3-319-79063-3_3

[41] Canto Torres R, Sendrier N. Analysis of information set decoding for a sub-linear error weight. Cham: Springer International Publishing; 2016. p. 144–61. 10.1007/978-3-319-29360-8_10

[42] Melchor CA, Aragon N, Bettaieb S, Bidoux L, Blazy O, Deneuville Jc, et al. HQC: Hamming Quasi Cyclic. 2017.

[43] Aguilar C, Gaborit P, Schrek J. A new zero-knowledge code based identification scheme with reduced communication. In: 2011 IEEE Information Theory Workshop (ITW). Paraty, Brazil; Oct 2011. p. 648–52. 10.1109/ITW.2011.6089577

[44] Persichetti E. Compact McEliece keys based on quasi-dyadic Srivastava codes. J Math Cryptol. 2012;6:149–69. 10.1515/jmc-2011-0099

[45] Banegas G, Barreto PSLM, Boidje BO, Cayrel P-L, Dione K, Gaj GN, et al. DAGS: Key encapsulation using dyadic GS codes. J Math Cryptol. 2018;12:221–39. 10.1515/jmc-2018-0027

[46] Banegas G, Barreto PSLM, Boidje BO, Cayrel P-L, Dione K, Gaj GN, et al. DAGS: Reloaded revisiting dyadic key encapsulation. In: Code-Based Cryptography Workshop. Springer; 2019. p. 69–85. 10.1007/978-3-030-25922-8_4

[47] Barelli E, Couvreur A. An efficient structural attack on NIST submission DAGS. In: International Conference on the Theory and Application of Cryptology and Information Security. Springer; 2018. p. 93–118. 10.1007/978-3-030-03326-2_4

[48] Banegas G, Barreto PSLM, Persichetti E, Santini P. Designing efficient dyadic operations for cryptographic applications. J Math Cryptol. 2020;14:95–109. 10.1515/jmc-2015-0054

[49] Battaglioni M, Chiaraluce F, Baldi M, Lentmaier M. Girth analysis and design of periodically time-varying SC-LDPC codes. IEEE Trans Inform Theor. 2021;67(4):2217–35. 10.1109/TIT.2021.3059414

Received: 2020-01-24
Revised: 2021-07-30
Accepted: 2021-08-10
Published Online: 2021-09-17

© 2022 Paolo Santini et al., published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.

Downloaded on 16.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/jmc-2020-0003/html