3  Classical Error Correction Basics

Tip: Learning Objectives
  • Explain why redundancy is necessary for reliable communication over noisy channels
  • Define linear codes using generator and parity-check matrices
  • Calculate syndromes and use them to detect and locate errors
  • Construct and decode the \([7,4,3]\) Hamming code
  • Apply the Singleton bound to determine code limitations
Note: Prerequisites

This chapter assumes familiarity with:
  • Linear algebra basics (vector spaces, matrices, rank)
  • Modular arithmetic (operations mod 2)
  • Basic probability (independent events)

3.1 Why Error Correction?

Information transmitted through physical channels is inevitably subject to noise. Whether we are sending data over a wireless network, reading from a hard drive, or communicating with a spacecraft millions of kilometers away, errors will occur. The fundamental question of coding theory is: How can we reliably transmit information through an unreliable channel?

The answer lies in redundancy. By encoding our message with carefully designed redundant information, we can detect and correct errors introduced by the channel. This section introduces the basic concepts through two classical channel models and the simplest possible error-correcting code.

3.1.1 Channel Models

A channel is a mathematical abstraction of a physical medium that transforms input symbols into output symbols, possibly introducing errors. We focus on binary channels where inputs and outputs are bits (0 or 1).

Note

Definition: Binary Symmetric Channel (BSC). The Binary Symmetric Channel with crossover probability \(p\) is a channel where each input bit is flipped (0 → 1 or 1 → 0) independently with probability \(p\), and transmitted correctly with probability \(1 - p\).

The BSC models channels where errors occur as random bit flips, such as thermal noise in electronic systems. The channel is “symmetric” because the probability of flipping 0 to 1 equals the probability of flipping 1 to 0.

Input Output Probability Interpretation
0 0 \(1-p\) Correct transmission
0 1 \(p\) Bit flip error
1 0 \(p\) Bit flip error
1 1 \(1-p\) Correct transmission

Table: Transition probabilities for the Binary Symmetric Channel with crossover probability \(p\).

Note

Definition: Binary Erasure Channel (BEC). The Binary Erasure Channel with erasure probability \(\epsilon\) is a channel where each input bit is either transmitted correctly with probability \(1 - \epsilon\), or replaced by a special “erasure” symbol \(?\) with probability \(\epsilon\). The receiver knows which bits were erased, but not their values.

The BEC models situations where we can detect that data was lost but cannot determine what it was—for example, a packet that never arrived over a network. Unlike the BSC, there are no undetected errors; we always know when something went wrong.

Input Output Probability Interpretation
0 0 \(1-\epsilon\) Correct transmission
0 \(?\) \(\epsilon\) Erasure
1 1 \(1-\epsilon\) Correct transmission
1 \(?\) \(\epsilon\) Erasure

Table: Transition probabilities for the Binary Erasure Channel with erasure probability \(\epsilon\).

3.1.2 The Repetition Code

The simplest error-correcting code is the repetition code: simply repeat each bit multiple times. To send a single bit of information, we transmit \(n\) copies of it.

Example: 3-Repetition Code. In the 3-repetition code, we encode $ 0 \mapsto 000, \qquad 1 \mapsto 111 $

Suppose we transmit the codeword \(000\) through a BSC with \(p = 0.1\), and receive \(010\). Since two of the three bits are 0, we decode by majority voting and correctly recover the original bit 0.

This code can correct any single bit error. However, if two or more bits flip (which happens with probability \(3p^2(1-p) + p^3 \approx 0.028\)), we decode incorrectly.

More generally, an \(n\)-repetition code (with \(n\) odd) can correct up to \(\lfloor(n-1)/2\rfloor\) errors using majority voting. As \(n\) increases, the probability of decoding failure decreases, but at a cost: we must transmit \(n\) bits for each bit of actual information.
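To make the failure probability concrete, here is a minimal Monte Carlo sketch (illustrative only; the NumPy setup and variable names are our own choices) that simulates the 3-repetition code over a BSC with \(p = 0.1\) and compares the measured failure rate with the formula above:

import numpy as np

rng = np.random.default_rng(0)
p, n_trials = 0.1, 100_000

# Send a single bit as three copies; each copy is flipped independently with
# probability p. Majority voting fails exactly when 2 or 3 copies are flipped.
flips = rng.random((n_trials, 3)) < p
failures = flips.sum(axis=1) >= 2

print(f"Simulated failure rate: {failures.mean():.4f}")
print(f"Analytic 3p^2(1-p) + p^3: {3 * p**2 * (1 - p) + p**3:.4f}")  # ~0.028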

3.1.3 Code Rate: The Price of Reliability

The code rate quantifies the efficiency of an error-correcting code.

Note

Definition: Code Rate. For a code that encodes \(k\) information bits into \(n\) transmitted bits, the code rate is $ R = k / n $ The rate satisfies \(0 < R \leq 1\), where \(R = 1\) means no redundancy (and no error correction), while \(R\) close to 0 indicates heavy redundancy.

Tip

Intuition. Think of code rate like packing efficiency. Imagine shipping fragile items: you could wrap each item in lots of bubble wrap (low rate, high protection) or use minimal packaging (high rate, less protection). The code rate tells you what fraction of your “shipping capacity” carries actual goods versus protective padding. A rate of \(1/3\) means two-thirds of every transmission is redundancy—expensive but robust.

For the \(n\)-repetition code, we have \(k = 1\) and rate \(R = 1/n\). The 3-repetition code has rate \(R = 1/3 \approx 0.33\)—we sacrifice two-thirds of our transmission capacity for error protection.

This raises a fundamental question: Is there a fundamental limit to how much information we can reliably transmit through a noisy channel?

Shannon’s celebrated noisy-channel coding theorem [@shannon1948] answers this affirmatively. For any channel, there exists a channel capacity \(C\) such that:
  • For any rate \(R < C\), there exist codes achieving arbitrarily low error probability
  • For any rate \(R > C\), reliable communication is impossible

For the BSC with crossover probability \(p\), the capacity is $ C_{\text{BSC}} = 1 - H(p) $ where \(H(p) = -p \log_2 p - (1-p) \log_2(1-p)\) is the binary entropy function. For the BEC with erasure probability \(\epsilon\), the capacity is simply $ C_{\text{BEC}} = 1 - \epsilon $

Example: Capacity Comparison. Consider a BSC with \(p = 0.1\) and a BEC with \(\epsilon = 0.1\): - \(C_{\text{BSC}} = 1 - H(0.1) \approx 1 - 0.469 = 0.531\) - \(C_{\text{BEC}} = 1 - 0.1 = 0.9\)

The BEC has higher capacity because the receiver knows where errors occurred. The 3-repetition code (rate 0.33) is well below both capacities, suggesting there exist much more efficient codes.
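The two capacity formulas are easy to evaluate numerically. The short sketch below (assuming NumPy; the helper name binary_entropy is ours) reproduces the numbers in the example:

import numpy as np

def binary_entropy(p):
    """H(p) = -p log2(p) - (1-p) log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

p = eps = 0.1
print(f"C_BSC = 1 - H({p}) = {1 - binary_entropy(p):.3f}")  # ~0.531
print(f"C_BEC = 1 - {eps}  = {1 - eps:.3f}")                # 0.900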

Shannon’s theorem is existential: it proves good codes exist but doesn’t tell us how to construct them. The quest for practical codes approaching channel capacity has driven decades of research, culminating in the development of LDPC codes and turbo codes that come remarkably close to the Shannon limit.

Note

Remark. The repetition code is far from optimal. A \([7,4,3]\) Hamming code (which we will study in the next section) achieves rate \(R = 4/7 \approx 0.57\) while still correcting single errors—much better than the rate-\(1/3\) 3-repetition code with the same error-correcting capability.

3.2 Linear Codes and Matrix Representation

The repetition code, while simple, is highly inefficient. To construct better codes, we turn to linear algebra over finite fields. Linear codes form the most important class of error-correcting codes, combining mathematical elegance with practical efficiency.

3.2.1 Binary Linear Codes

All arithmetic in this section is performed over the binary field \(\mathbb{F}_2 = \{0, 1\}\), where addition is XOR (\(0 + 1 = 1\), \(1 + 1 = 0\)) and multiplication is AND.

Note

Definition: Linear Code. A binary linear code of length \(n\) and dimension \(k\) is a \(k\)-dimensional linear subspace \(\mathcal{C} \subseteq \mathbb{F}_2^n\). Such a code is denoted as an \([n, k]\) code and contains \(2^k\) codewords.

The linearity property has important consequences:
  • The sum of any two codewords is also a codeword
  • The all-zeros vector \(\mathbf{0}\) is always a codeword
  • Encoding and decoding can be performed using matrix operations

3.2.2 Generator Matrix

A linear code can be completely described by a generator matrix.

Note

Definition: Generator Matrix. A generator matrix \(G\) for an \([n, k]\) code is a \(k \times n\) matrix whose rows form a basis for the code. Every codeword \(\mathbf{c}\) can be written as $ \mathbf{c} = \mathbf{m} G $ where \(\mathbf{m} \in \mathbb{F}_2^k\) is the message (information) vector.

The encoding process is simply matrix multiplication: given a \(k\)-bit message \(\mathbf{m} = (m_1, m_2, ..., m_k)\), we compute the \(n\)-bit codeword \(\mathbf{c} = \mathbf{m} G\).

Note

Definition: Systematic Form. A generator matrix is in systematic form if \(G = [I_k | P]\), where \(I_k\) is the \(k \times k\) identity matrix and \(P\) is a \(k \times (n-k)\) matrix. In this form, the first \(k\) bits of every codeword equal the message bits.

Example: 3-Repetition Code Generator. The 3-repetition code has \(k = 1\) and \(n = 3\). Its generator matrix is simply $ G = \begin{pmatrix} 1 & 1 & 1 \end{pmatrix} $ Encoding the message \(m = 1\) gives \(\mathbf{c} = (1) \cdot \begin{pmatrix} 1 & 1 & 1 \end{pmatrix} = (1, 1, 1)\).

3.2.3 Parity-Check Matrix

An alternative description of a linear code uses the parity-check matrix.

Note

Definition: Parity-Check Matrix. A parity-check matrix \(H\) for an \([n, k]\) code is an \((n-k) \times n\) matrix such that a vector \(\mathbf{x} \in \mathbb{F}_2^n\) is a codeword if and only if $ H \mathbf{x}^T = \mathbf{0} $ The rows of \(H\) define the parity-check equations that all codewords must satisfy.

Important

Proposition: Duality Relation. If \(G\) is a generator matrix and \(H\) is a parity-check matrix for the same code, then $ G H^T = \mathbf{0} $ Equivalently, every row of \(G\) is orthogonal to every row of \(H\) (over \(\mathbb{F}_2\)).

When \(G = [I_k | P]\) is in systematic form, the corresponding parity-check matrix is \(H = [-P^T | I_{n-k}] = [P^T | I_{n-k}]\) (since \(-1 = 1\) in \(\mathbb{F}_2\)).
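The relation between the two matrices can be checked mechanically. The sketch below (a toy verification, not a general-purpose routine) builds \(H = [P^T | I_{n-k}]\) from a systematic \(G = [I_k | P]\) and confirms the duality relation; the particular \(P\) is the one used for the Hamming code generator in the next subsection.

import numpy as np

k, n = 4, 7
P = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]])

# Systematic generator G = [I_k | P] and parity-check matrix H = [P^T | I_{n-k}]
G = np.hstack([np.eye(k, dtype=int), P])
H = np.hstack([P.T, np.eye(n - k, dtype=int)])

print((G @ H.T) % 2)  # all-zero k x (n-k) matrix: G H^T = 0 over F_2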

3.2.4 Syndrome and Error Detection

The parity-check matrix provides an efficient method for error detection.

Note

Definition: Syndrome. For a received vector \(\mathbf{r} \in \mathbb{F}_2^n\), the syndrome is $ \mathbf{s} = H \mathbf{r}^T $ If \(\mathbf{s} = \mathbf{0}\), then \(\mathbf{r}\) is a valid codeword. If \(\mathbf{s} \neq \mathbf{0}\), an error has been detected.

Tip

Intuition. Think of the syndrome as a “fingerprint” of the error. Just as a doctor diagnoses illness from symptoms without knowing the patient’s entire medical history, the syndrome tells us about the error without revealing the original message. Different error patterns leave different fingerprints, allowing us to identify and correct them. The key insight: the syndrome depends only on what went wrong, not on what was sent.

Suppose the transmitted codeword is \(\mathbf{c}\) and the received vector is \(\mathbf{r} = \mathbf{c} + \mathbf{e}\), where \(\mathbf{e}\) is the error pattern. Then $ \mathbf{s} = H \mathbf{r}^T = H (\mathbf{c} + \mathbf{e})^T = H \mathbf{c}^T + H \mathbf{e}^T = \mathbf{0} + H \mathbf{e}^T = H \mathbf{e}^T $

The syndrome depends only on the error pattern, not on which codeword was sent. This is the key insight that makes syndrome decoding possible.

Warning

Common Misconception. A common mistake is thinking that a zero syndrome means “no errors occurred.” Actually, a zero syndrome only means the received vector is a valid codeword—it could be the correct codeword, or it could be a different codeword reached by an undetectable error pattern. If enough errors occur to transform one codeword into another, the syndrome will be zero but decoding fails silently. This is why minimum distance matters: in a code with minimum distance \(d\), every error pattern of weight between 1 and \(d-1\) produces a non-zero syndrome.
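A tiny numerical illustration of this pitfall, using the 3-repetition code (the parity-check matrix below is one valid choice for that code; this is a sketch, not part of the chapter's formal development):

import numpy as np

H = np.array([[1, 1, 0],
              [1, 0, 1]])        # checks bit1 = bit2 and bit1 = bit3

codeword = np.array([0, 0, 0])
single_error = (codeword + np.array([0, 1, 0])) % 2   # one flip: detected
triple_error = (codeword + np.array([1, 1, 1])) % 2   # all flipped: lands on 111

print(H @ single_error % 2)   # [1 0] -> non-zero syndrome, error detected
print(H @ triple_error % 2)   # [0 0] -> valid codeword, error goes undetected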

3.2.5 The Hamming Code: A Complete Example

The \([7, 4, 3]\) Hamming code is the smallest non-trivial perfect single-error-correcting code and serves as an excellent illustration of these concepts.

Note

Definition: Hamming Code. The \([7, 4, 3]\) Hamming code has parameters:
  • Length \(n = 7\) (transmitted bits)
  • Dimension \(k = 4\) (information bits)
  • Minimum distance \(d = 3\) (can correct 1 error)
  • Rate \(R = 4/7 \approx 0.571\)

The parity-check matrix of the Hamming code is constructed so that its columns are all non-zero 3-bit binary vectors:

$ H = \begin{pmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{pmatrix} $

Reading the columns from top to bottom: \(001, 010, 011, 100, 101, 110, 111\) — these are simply the binary representations of \(1, 2, 3, 4, 5, 6, 7\), with the top row holding the most significant bit. This \(H\) is not in the systematic form \([P^T | I_{n-k}]\), but it is row-equivalent to it and therefore defines the same code; the rows are ordered so that each column spells out its own position index.

The corresponding systematic generator matrix is:

$ G = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix} $

One can check directly that \(G H^T = \mathbf{0}\): every row of \(G\) satisfies all three parity checks defined by \(H\).

Example: Hamming Code Encoding. To encode the message \(\mathbf{m} = (1, 0, 1, 1)\):

$ \mathbf{c} = \mathbf{m} G = (1, 0, 1, 1) \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix} = (1, 0, 1, 1, 0, 1, 0) $

The first 4 bits \((1, 0, 1, 1)\) are the message, and the last 3 bits \((0, 1, 0)\) are the parity bits.

Example: Hamming Code Decoding. Suppose we receive \(\mathbf{r} = (1, 0, 1, 1, 0, 0, 0)\) — note the error in position 6 (the second-to-last bit should be 1).

Computing the syndrome: $ \mathbf{s} = H \mathbf{r}^T = \begin{pmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} $

The syndrome \((1, 1, 0)\) is the binary representation of 6, telling us the error is in position 6. Flipping bit 6 recovers the correct codeword \((1, 0, 1, 1, 0, 1, 0)\).

This elegant property — the syndrome directly indicating the error position — reflects the fact that the Hamming code is perfect: each of the 8 possible syndromes corresponds either to no error or to exactly one single-bit error.

Note

Remark. The Hamming code achieves rate \(4/7 \approx 0.571\), much better than the 3-repetition code’s rate of \(1/3 \approx 0.333\), while both correct single errors. This improvement comes from the algebraic structure of linear codes.

3.2.6 Worked Example: Complete Hamming Code Cycle

Let us walk through a complete encode-transmit-decode cycle for the \([7,4,3]\) Hamming code, showing each step in detail.

Step 1: Prepare the message. Suppose Alice wants to send the 4-bit message \(\mathbf{m} = (1, 0, 1, 1)\) to Bob.

Step 2: Encode using the generator matrix. Alice computes the codeword by multiplying the message with the generator matrix: $ \mathbf{c} = \mathbf{m} G = (1, 0, 1, 1) \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix} $

Computing each bit of the codeword:
  • \(c_1 = 1 \cdot 1 + 0 \cdot 0 + 1 \cdot 0 + 1 \cdot 0 = 1\)
  • \(c_2 = 1 \cdot 0 + 0 \cdot 1 + 1 \cdot 0 + 1 \cdot 0 = 0\)
  • \(c_3 = 1 \cdot 0 + 0 \cdot 0 + 1 \cdot 1 + 1 \cdot 0 = 1\)
  • \(c_4 = 1 \cdot 0 + 0 \cdot 0 + 1 \cdot 0 + 1 \cdot 1 = 1\)
  • \(c_5 = 1 \cdot 0 + 0 \cdot 1 + 1 \cdot 1 + 1 \cdot 1 = 0\) (mod 2)
  • \(c_6 = 1 \cdot 1 + 0 \cdot 0 + 1 \cdot 1 + 1 \cdot 1 = 1\) (mod 2)
  • \(c_7 = 1 \cdot 1 + 0 \cdot 1 + 1 \cdot 0 + 1 \cdot 1 = 0\) (mod 2)

Result: \(\mathbf{c} = (1, 0, 1, 1, 0, 1, 0)\)

Step 3: Transmit through noisy channel. Alice sends \(\mathbf{c} = (1, 0, 1, 1, 0, 1, 0)\) through a BSC. An error occurs in position 3, flipping bit 3 from 1 to 0.

Bob receives: \(\mathbf{r} = (1, 0, 0, 1, 0, 1, 0)\)

Step 4: Compute the syndrome. Bob computes the syndrome to detect and locate errors: $ \mathbf{s} = H \mathbf{r}^T = \begin{pmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 0 \\ 1 \\ 0 \end{pmatrix} $

Computing each syndrome bit:
  • \(s_1 = 0 \cdot 1 + 0 \cdot 0 + 0 \cdot 0 + 1 \cdot 1 + 1 \cdot 0 + 1 \cdot 1 + 1 \cdot 0 = 0\) (mod 2)
  • \(s_2 = 0 \cdot 1 + 1 \cdot 0 + 1 \cdot 0 + 0 \cdot 1 + 0 \cdot 0 + 1 \cdot 1 + 1 \cdot 0 = 1\)
  • \(s_3 = 1 \cdot 1 + 0 \cdot 0 + 1 \cdot 0 + 0 \cdot 1 + 1 \cdot 0 + 0 \cdot 1 + 1 \cdot 0 = 1\)

Result: \(\mathbf{s} = (0, 1, 1)^T\)

Step 5: Identify the error location. The syndrome \((0, 1, 1)\) in binary equals \(011_2 = 3\) in decimal. This indicates an error in position 3.

Step 6: Correct the error. Bob flips bit 3 of the received vector, obtaining $ \hat{\mathbf{c}} = (1, 0, 1, 1, 0, 1, 0) $ which is exactly the codeword Alice sent.

Step 7: Extract the message. Since the code is systematic, the first 4 bits are the original message: $ \hat{\mathbf{m}} = (1, 0, 1, 1) $

Bob successfully recovers Alice’s original message despite the channel error.

Note

Remark. Python Implementation. The Hamming code encoding and decoding can be implemented concisely in Python:

import numpy as np

# Hamming [7,4,3] generator and parity-check matrices
G = np.array([
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1]
])

H = np.array([
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1]
])

# Encode
message = np.array([1, 0, 1, 1])
codeword = message @ G % 2
print(f"Codeword: {codeword}")  # [1 0 1 1 0 0 1]

# Simulate error in position 6
received = codeword.copy()
received[5] = 1 - received[5]

# Decode via syndrome
syndrome = H @ received % 2
error_pos = int(''.join(map(str, syndrome)), 2)  # Convert binary to position
print(f"Syndrome: {syndrome}, Error at position: {error_pos}")  # Syndrome: [1 1 0], Error at position: 6

# Correct
if error_pos > 0:
    received[error_pos - 1] = 1 - received[error_pos - 1]

# Extract message (systematic code)
decoded = received[:4]
print(f"Decoded: {decoded}")  # [1 0 1 1]

This implementation mirrors the steps of the worked example above, except that the simulated error is placed in position 6 (as in the earlier decoding example) rather than in position 3.

Code \([n,k,d]\) Rate \(R\) Corrects Remarks
3-Repetition \([3,1,3]\) \(1/3\) 1 error Simple but inefficient
Hamming \([7,4,3]\) \(4/7\) 1 error Perfect code
Extended Hamming \([8,4,4]\) \(1/2\) 1 error Can detect 2 errors

Table: Comparison of classical error-correcting codes. The repetition code is simple but has low rate, while Hamming codes achieve optimal single-error correction with better efficiency.

3.3 Hamming Distance and Correction Capability

The error-correcting capability of a code is determined by its minimum distance, a fundamental parameter that quantifies how far apart any two distinct codewords are. This section introduces the key distance measures and shows how they relate to a code’s ability to detect and correct errors.

3.3.1 Hamming Weight and Distance

The Hamming weight counts the number of non-zero components in a vector, while the Hamming distance measures how many positions differ between two vectors.

Note

Definition: Hamming Weight. The Hamming weight \(wt(\mathbf{x})\) of a binary vector \(\mathbf{x} \in \mathbb{F}_2^n\) is the number of positions where \(\mathbf{x}\) has a 1: $ wt(\mathbf{x}) = \sum_{i=1}^n x_i $

Note

Definition: Hamming Distance. The Hamming distance \(d(\mathbf{x}, \mathbf{y})\) between two vectors \(\mathbf{x}, \mathbf{y} \in \mathbb{F}_2^n\) is the number of positions where they differ: $ d(\mathbf{x}, \mathbf{y}) = wt(\mathbf{x} + \mathbf{y}) $

The Hamming distance satisfies the metric properties:
  • Non-negativity: \(d(\mathbf{x}, \mathbf{y}) \geq 0\) with equality iff \(\mathbf{x} = \mathbf{y}\)
  • Symmetry: \(d(\mathbf{x}, \mathbf{y}) = d(\mathbf{y}, \mathbf{x})\)
  • Triangle inequality: \(d(\mathbf{x}, \mathbf{z}) \leq d(\mathbf{x}, \mathbf{y}) + d(\mathbf{y}, \mathbf{z})\)

Example: Hamming Distance Calculation. Consider \(\mathbf{x} = (1, 0, 1, 1, 0)\) and \(\mathbf{y} = (1, 1, 1, 0, 0)\). Their Hamming weight and distance are $ wt(\mathbf{x}) = 3, \qquad d(\mathbf{x}, \mathbf{y}) = 2 $ They differ only in positions 2 and 4.
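Both quantities are one-liners in code. A minimal sketch (the helper names are ours) reproducing the example:

import numpy as np

def hamming_weight(x):
    return int(np.sum(x))

def hamming_distance(x, y):
    # d(x, y) = wt(x + y), with the sum taken over F_2
    return hamming_weight((x + y) % 2)

x = np.array([1, 0, 1, 1, 0])
y = np.array([1, 1, 1, 0, 0])
print(hamming_weight(x), hamming_distance(x, y))  # 3 2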

3.3.2 Minimum Distance

For a linear code, the minimum distance is simply the minimum weight among all non-zero codewords.

Note

Definition: Minimum Distance. The minimum distance \(d\) of a code \(\mathcal{C}\) is the minimum Hamming distance between any two distinct codewords: $ d = \min_{\mathbf{c}_1 \neq \mathbf{c}_2 \in \mathcal{C}} d(\mathbf{c}_1, \mathbf{c}_2) $ For a linear code, this simplifies to \(d = \min_{\mathbf{c} \neq \mathbf{0}} wt(\mathbf{c})\).

Tip

Intuition. Imagine codewords as cities on a map, with Hamming distance as the road distance between them. The minimum distance is the shortest trip between any two cities. If cities are far apart (large \(d\)), a traveler (received word) who gets lost can still find the nearest city unambiguously. With \(d = 3\), even one wrong turn still leaves you closer to your origin than any other city—that’s why we can correct 1 error.

The notation \([n, k, d]\) completely specifies a binary linear code by its length, dimension, and minimum distance. The parameters satisfy several important bounds.

Example: Hamming Code Distance. The \([7, 4, 3]\) Hamming code has minimum distance \(d = 3\). The smallest non-zero weight codewords have weight 3, and there are no weight-1 or weight-2 codewords. This minimum distance of 3 gives the code its single-error-correcting capability.
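For small codes the minimum distance can be verified by brute force: enumerate all \(2^k\) messages, encode each, and take the smallest non-zero weight. The sketch below does this for the \([7,4]\) Hamming code using the generator matrix from this chapter (an exhaustive check, feasible only because \(2^4 = 16\)):

import numpy as np
from itertools import product

G = np.array([[1, 0, 0, 0, 0, 1, 1],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 0, 0, 1, 1, 1, 1]])

# Weight of every non-zero codeword
weights = [int(((np.array(m) @ G) % 2).sum())
           for m in product([0, 1], repeat=4) if any(m)]

print(min(weights))  # 3 -> the code has minimum distance d = 3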

3.3.3 Error Detection and Correction Capability

The minimum distance directly determines how many errors a code can detect and correct.

Important

Proposition: Detection and Correction Bounds. A code with minimum distance \(d\) can:
  • Detect up to \(d-1\) errors (any error pattern of weight between 1 and \(d-1\) changes the syndrome)
  • Correct up to \(t = \lfloor(d-1)/2\rfloor\) errors (error patterns within a ball of radius \(t\) around each codeword are uniquely decodable)

Figure: Hamming balls of radius \(t = 1\) around codewords \(\mathbf{c}_1\) and \(\mathbf{c}_2\). The balls are disjoint when \(d > 2t\).

The proof follows from the triangle inequality: if two codewords have distance \(d\), their Hamming balls of radius \(t\) are disjoint when \(2t < d\). For unique decoding, we require \(d > 2t\), giving \(t = \lfloor(d-1)/2\rfloor\).

Example: Correction Capability. The Hamming code with \(d = 3\) can correct \(t = \lfloor(3-1)/2\rfloor = 1\) error. It can also detect up to \(d-1 = 2\) errors. A received vector containing 2 errors produces a non-zero syndrome (so the corruption is detected), but attempting single-error correction on it will yield the wrong codeword.

3.3.4 The Singleton Bound

The Singleton bound relates the minimum distance to the code’s length and dimension.

Important

Proposition: Singleton Bound. For any \([n, k, d]\) code, the parameters satisfy $ d \leq n - k + 1 $

Proof. Consider any \([n, k, d]\) linear code with generator matrix \(G\), and delete the first \(n-k+1\) columns of \(G\). The resulting matrix \(G'\) has \(k\) rows of length \(k-1\), so its rows must be linearly dependent: there exists a non-zero vector \(\mathbf{m}\) such that \(\mathbf{m} G' = \mathbf{0}\). The corresponding codeword \(\mathbf{c} = \mathbf{m} G\) is non-zero (because \(G\) has rank \(k\)) and vanishes on the \(k-1\) remaining positions, so its weight is at most \(n - (k-1) = n - k + 1\). Since \(d\) is the minimum weight, \(d \leq n - k + 1\).

Codes achieving the Singleton bound are called Maximum Distance Separable (MDS) codes. The \([n, 1, n]\) repetition code meets the bound trivially, while the \([7, 4, 3]\) Hamming code does not (\(3 < 7 - 4 + 1 = 4\)); non-trivial MDS codes, such as Reed–Solomon codes over larger alphabets, do exist.

Note

Remark. The Singleton bound shows that we cannot simultaneously maximize rate \(R = k/n\) and minimum distance \(d/n\). High-rate codes have small minimum distance, while codes with large minimum distance must have low rate. This fundamental tension drives the design of practical error-correcting codes.

3.4 Decoding Problem

Given a received vector from a noisy channel, the decoder must determine the most likely transmitted codeword. This section formulates the decoding problem and introduces the concept of syndrome-based decoding.

3.4.1 Maximum Likelihood Decoding

The optimal decoding strategy in terms of minimizing error probability is Maximum Likelihood Decoding (MLD).

Note

Definition: Maximum Likelihood Decoding. Given a received vector \(\mathbf{r}\), MLD selects the codeword \(\mathbf{c}^*\) that maximizes the conditional probability: $ \mathbf{c}^* = \arg\max_{\mathbf{c} \in \mathcal{C}} \Pr(\mathbf{c} \mid \mathbf{r}) $ By Bayes’ rule, this is equivalent to maximizing \(\Pr(\mathbf{r} \mid \mathbf{c}) \Pr(\mathbf{c})\). For equally likely messages, this reduces to maximizing the likelihood \(\Pr(\mathbf{r} \mid \mathbf{c})\).

For the Binary Symmetric Channel with crossover probability \(p\), the likelihood of receiving \(\mathbf{r}\) given transmitted codeword \(\mathbf{c}\) is: $ \Pr(\mathbf{r} \mid \mathbf{c}) = p^{d(\mathbf{c}, \mathbf{r})} (1-p)^{n - d(\mathbf{c}, \mathbf{r})} $

Maximizing this is equivalent to minimizing the Hamming distance \(d(\mathbf{c}, \mathbf{r})\). Thus, for the BSC with \(p < 1/2\), MLD is simply minimum distance decoding: choose the codeword closest to the received vector.

Example: MLD on BSC. For the 3-repetition code on a BSC with \(p = 0.1\), suppose we receive \(\mathbf{r} = (0, 1, 1)\). The possible codewords are \(000\) and \(111\): - Distance to \(000\): \(d = 2\), likelihood \(= p^2(1-p) = 0.009\) - Distance to \(111\): \(d = 1\), likelihood \(= p(1-p)^2 = 0.081\)

MLD selects \(111\) since it has higher likelihood.
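Minimum-distance decoding can be written directly as a search over the codebook. The sketch below (illustrative only, and feasible only for tiny codes since the search runs over all \(2^k\) codewords) applies it to the repetition-code example above:

import numpy as np

# Brute-force minimum-distance decoding: pick the codeword closest to r in
# Hamming distance, which equals MLD on a BSC with p < 1/2.
codewords = [np.array([0, 0, 0]), np.array([1, 1, 1])]
r = np.array([0, 1, 1])

best = min(codewords, key=lambda c: int(np.sum((c + r) % 2)))
print(best)  # [1 1 1]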

3.4.2 Computational Complexity

Unfortunately, MLD is computationally intractable for most codes of practical interest.

Important

Proposition: MLD NP-Hardness. Maximum likelihood decoding for general linear codes is NP-hard [@garey1979].

The brute-force approach requires checking all \(2^k\) codewords, which is exponential in \(k\). For codes with \(k = 1000\) (modest by modern standards), this means checking \(2^{1000}\) codewords—completely infeasible.

Even exploiting the linear structure does not help in general: finding the closest codeword to a given vector (the nearest-codeword problem) remains computationally hard for arbitrary linear codes.

Note

Remark. This computational barrier motivates the development of efficient approximate decoding algorithms. Belief Propagation, which we study in Chapter 2, provides near-optimal performance for many codes at a fraction of MLD’s complexity.

3.4.3 Syndrome Decoding

An alternative perspective on decoding exploits the structure of linear codes. Instead of searching over all codewords, we search over error patterns.

Given the parity-check matrix \(H\), the syndrome of a received vector is \(\mathbf{s} = H \mathbf{r}^T\). If \(\mathbf{r} = \mathbf{c} + \mathbf{e}\) where \(\mathbf{c}\) is a codeword and \(\mathbf{e}\) is the error, then: $ \mathbf{s} = H (\mathbf{c} + \mathbf{e})^T = H \mathbf{c}^T + H \mathbf{e}^T = H \mathbf{e}^T $

The syndrome depends only on the error, not on the transmitted codeword. This is the key insight behind syndrome decoding.

Note

Definition: Syndrome Decoding. Syndrome decoding solves the equivalent problem: given syndrome \(\mathbf{s}\), find an error pattern \(\mathbf{e}\) with minimum weight such that \(H \mathbf{e}^T = \mathbf{s}\). The decoded codeword is then \(\mathbf{r} + \mathbf{e}\).

The set of error patterns sharing a given syndrome forms a coset of the code. The minimum-weight error in each coset is called the coset leader.

Example: Syndrome Decoding for Hamming Code. The Hamming code has parity-check matrix \(H\) with columns equal to all non-zero 3-bit vectors. For syndrome \(\mathbf{s} = (1, 1, 0)^T\), we find: - If \(\mathbf{s}\) equals a column of \(H\), say column 6, then \(\mathbf{e}\) is the unit vector with 1 in position 6 - If \(\mathbf{s} = \mathbf{0}\), then \(\mathbf{e} = \mathbf{0}\) (no error)

The syndrome directly gives the error location; this one-to-one correspondence between non-zero syndromes and single-bit errors is exactly the property that makes the Hamming code “perfect.”

The syndrome decoding approach reduces the search space from \(2^n\) possible error patterns to \(2^{n-k}\) possible syndromes. For the Hamming code, this means checking only 8 syndromes instead of 128 possible error patterns.
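One concrete way to organize this search is a lookup table from syndromes to coset leaders, built once in advance. The following sketch (our own illustration, using the Hamming matrices from this chapter) enumerates error patterns in order of increasing weight, keeps the first pattern seen for each syndrome, and then decodes a corrupted word with a single table lookup:

import numpy as np
from itertools import product

H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])
n = H.shape[1]

# Build the syndrome -> coset-leader table: scanning patterns by increasing
# weight guarantees the stored pattern is a minimum-weight representative.
table = {}
for bits in sorted(product([0, 1], repeat=n), key=sum):
    e = np.array(bits)
    s = tuple((H @ e) % 2)
    table.setdefault(s, e)

print(len(table))  # 8 = 2^(n-k) syndromes, each with a weight-0 or weight-1 leader

r = np.array([1, 0, 1, 1, 0, 0, 0])     # codeword (1,0,1,1,0,1,0) with bit 6 flipped
e_hat = table[tuple((H @ r) % 2)]
print((r + e_hat) % 2)                   # [1 0 1 1 0 1 0]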

3.4.4 Degeneracy: Different Errors, Same Syndrome

A crucial concept that will become important in quantum coding is degeneracy: different error patterns may produce the same syndrome.

Note

Definition: Degeneracy. Two error patterns \(\mathbf{e}_1\) and \(\mathbf{e}_2\) are degenerate if they have the same syndrome: \(H \mathbf{e}_1^T = H \mathbf{e}_2^T\). Their difference \(\mathbf{e}_1 + \mathbf{e}_2\) is a codeword.

Degenerate errors are indistinguishable from syndrome measurements alone. However, they may have different effects on the encoded information. In classical coding, this doesn’t help much—we still need to correct the error. But in quantum coding, degenerate errors can be exploited to reduce the effective error rate.

Example: Classical Degeneracy. Consider a code with minimum distance \(d = 4\). The error patterns \(\mathbf{e}_1\) with weight 1 and \(\mathbf{e}_2\) with weight 3 might have the same syndrome if \(\mathbf{e}_1 + \mathbf{e}_2\) is a codeword of weight 4. Both errors produce the same syndrome, but their effects on the decoded message differ.
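The same phenomenon already occurs in the \([7,4,3]\) Hamming code, where a weight-1 and a weight-2 error can share a syndrome because their sum is a weight-3 codeword. A small sketch (using the chapter's parity-check matrix) makes this concrete:

import numpy as np

H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

e1 = np.array([1, 0, 0, 0, 0, 0, 0])   # weight-1 error
e2 = np.array([0, 0, 0, 0, 0, 1, 1])   # weight-2 error

print((H @ e1) % 2, (H @ e2) % 2)       # same syndrome: [0 0 1] and [0 0 1]
print((H @ ((e1 + e2) % 2)) % 2)        # [0 0 0]: e1 + e2 = (1,0,0,0,0,1,1) is a codeword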

Note

Remark. Degeneracy is a purely classical concept, but it becomes far more important in quantum codes. Quantum codes can exploit degeneracy to achieve better performance than classical codes with equivalent parameters, because the syndrome doesn’t distinguish between equivalent quantum errors. This will be explored in detail in Chapter 3.

The syndrome decoding formulation leads naturally to efficient approximate algorithms. Instead of finding the global minimum weight error pattern, we can iteratively update beliefs about each bit’s value—the subject of the next chapter.

Tip: Key Takeaways
  • Linear codes encode \(k\) information bits into \(n\) codeword bits using generator matrix \(G\); the code rate \(R = k/n\) measures efficiency
  • Parity-check matrix \(H\) defines constraints that all codewords satisfy; \(H \mathbf{c}^T = \mathbf{0}\) for any codeword \(\mathbf{c}\)
  • Syndrome \(\mathbf{s} = H \mathbf{r}^T\) reveals error information without exposing the original message; zero syndrome means valid codeword (not necessarily correct)
  • Minimum distance \(d\) determines correction capability: can correct \(t = \lfloor(d-1)/2\rfloor\) errors and detect \(d-1\) errors
  • Hamming code \([7,4,3]\) is a perfect single-error-correcting code where syndrome directly indicates error position
  • Singleton bound \(d \leq n - k + 1\) limits achievable distance for given rate

3.5 Exercises

  1. Hamming Code Encoding. Consider the \([7,4,3]\) Hamming code with generator matrix \(G\) given in this chapter.

    1. Encode the message \(\mathbf{m} = (0, 1, 1, 0)\) using the generator matrix. Show your calculation.

    2. Verify that your codeword satisfies all parity-check equations by computing \(H \mathbf{c}^T\).

    3. What is the Hamming weight of your codeword? Is this the minimum possible weight for a non-zero codeword in this code?

  2. Syndrome Decoding. Suppose you receive the vector \(\mathbf{r} = (1, 1, 0, 1, 0, 1, 0)\) from a channel using the \([7,4,3]\) Hamming code.

    1. Compute the syndrome \(\mathbf{s} = H \mathbf{r}^T\).

    2. Determine which bit (if any) is in error based on the syndrome.

    3. Correct the error and extract the original 4-bit message.

    4. If two bits were flipped during transmission, would the Hamming code be able to correct the error? Explain why or why not.

  3. Code Parameters and Bounds. Consider designing a binary linear code.

    1. Using the Singleton bound, what is the maximum possible minimum distance for a \([15, 7]\) code?

    2. The \([15, 11, 3]\) Hamming code exists. Verify that it satisfies the Singleton bound.

    3. Explain intuitively why we cannot have a \([7, 4, 5]\) code (i.e., why the Singleton bound prevents this).

    4. For a code that can correct \(t = 2\) errors, what is the minimum required distance \(d\)?

Note: Bridge to Next Chapter

We have established the foundations of classical error correction: linear codes, parity-check matrices, syndromes, and the Hamming code. However, syndrome decoding by exhaustive search is computationally intractable for large codes. The next chapter introduces Low-Density Parity-Check (LDPC) codes and Belief Propagation, which enable efficient near-optimal decoding by exploiting the sparse structure of the parity-check matrix. The message-passing algorithms we develop there will later extend directly to quantum error correction.