250624 Compressed Sensing Chapter 1 note
Fundamental Concepts from Set Theory & Linear Algebra
Subset
A set $A$ is a subset of another set $B$ if all elements of $A$ are also elements of $B$. This is denoted as $A \subseteq B$.
- Example from the text: The chapter discusses selecting a portion of a signal’s coefficients. If you have a signal with $n$ coefficients, the set of all coefficient indices is $\{1, 2, \ldots, n\}$. An index set $\Lambda$ that points to the locations of the $k$ largest entries is a subset of the full set of indices, so $\Lambda \subseteq \{1, 2, \ldots, n\}$.
Vector Space
A vector space is a collection of objects called vectors, which can be added together and multiplied by scalars (numbers). The key idea is that if you take any two vectors from the space and add them, you get another vector that is still within that space. Similarly, scaling a vector keeps it within the space. The text models signals as vectors in a vector space.
Span
The span of a set of vectors is the set of all possible vectors you can create by taking linear combinations of them. In other words, if you have vectors $v_1, \ldots, v_k$, their span is the set of all vectors that can be written as $c_1 v_1 + c_2 v_2 + \cdots + c_k v_k$ for some scalar coefficients $c_1, \ldots, c_k$.
Linear Independence
A set of vectors is linearly independent if no vector in the set can be written as a linear combination of the others. A more formal way to say this is that the only way to get the zero vector from their linear combination ($c_1 v_1 + \cdots + c_k v_k = 0$) is if all the scalar coefficients ($c_1, \ldots, c_k$) are zero.
Subspace
A subspace is a vector space that is contained within another, larger vector space. To be a subspace, a set of vectors must satisfy three rules:
- It must contain the zero vector.
- It must be “closed under addition”: if you add any two vectors from the set, their sum is also in the set.
- It must be “closed under scalar multiplication”: if you multiply any vector in the set by a scalar, the result is also in the set.
- Example from the text: The set of all 2-sparse signals in $\mathbb{R}^3$ is not a single subspace, because adding two 2-sparse signals can result in a 3-sparse signal. Instead, it’s a union of subspaces, where each subspace is a plane defined by two coordinate axes (like the x-y plane, x-z plane, etc.).
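A tiny numpy check of this point (the specific vectors are just illustrative choices):

```python
import numpy as np

# Two 2-sparse vectors in R^3 living on different coordinate planes.
u = np.array([1.0, 2.0, 0.0])   # supported on the x-y plane
v = np.array([0.0, 3.0, 4.0])   # supported on the y-z plane
w = u + v                       # [1, 5, 4]: three non-zero entries, so not 2-sparse

print(np.count_nonzero(u), np.count_nonzero(v), np.count_nonzero(w))  # 2 2 3
```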
Null Space
The null space of a matrix $A$, denoted $\mathcal{N}(A)$, is the set of all vectors $z$ that give the zero vector when multiplied by $A$, i.e., $\mathcal{N}(A) = \{z : Az = 0\}$.
- Relevance in CS: In compressed sensing, if you have two different sparse signals, $x_1$ and $x_2$, that produce the same measurement $y$ (i.e., $Ax_1 = Ax_2 = y$), then their difference, $x_1 - x_2$, must be in the null space of $A$ because $A(x_1 - x_2) = 0$. To guarantee unique recovery, the null space of $A$ should not contain any sparse vectors (other than the zero vector).
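A minimal numpy sketch of this idea, using a small random matrix purely for illustration: two signals with identical measurements differ by a vector in $\mathcal{N}(A)$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))

# Null-space basis from the SVD: for a generic random A, rank(A) = m,
# so the last n - m right singular vectors span N(A).
_, _, Vt = np.linalg.svd(A)
null_basis = Vt[m:]

h = null_basis.T @ rng.standard_normal(n - m)   # an arbitrary null-space vector
print(np.allclose(A @ h, 0))                    # True

# Two different signals with the same measurements differ by a null-space vector.
x1 = rng.standard_normal(n)
x2 = x1 + h
print(np.allclose(A @ x1, A @ x2))              # True: A cannot distinguish x1 from x2
```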
$\mathbb{R}^n$ (n-dimensional Euclidean Space)
- Definition: The symbol $\mathbb{R}^n$ refers to the n-dimensional Euclidean space. It is the set of all vectors that consist of $n$ real-number elements.
- Breakdown of the Notation:
  - The $\mathbb{R}$ stands for the set of all real numbers.
  - The superscript $n$ indicates the dimension, meaning each vector in this set is an ordered list of $n$ real numbers, like $(x_1, x_2, \ldots, x_n)$.
- Context from the Text: The chapter states that signals can be viewed as vectors in an n-dimensional Euclidean space, which is denoted by $\mathbb{R}^n$. For example, $\mathbb{R}^2$ is the 2D plane, and $\mathbb{R}^3$ is 3D space.
Basis
- Definition: A set of vectors is called a basis for $\mathbb{R}^n$ if the vectors in the set span $\mathbb{R}^n$ and are linearly independent. This means two things:
- Span the space: The vectors in the set can be scaled and added in some combination to create any vector in the entire space.
- Linearly Independent: There are no redundant vectors in the set. No vector in the basis can be created from a combination of the others.
- Analogy: A basis is like a set of fundamental directions for a space. For the 2D plane ($\mathbb{R}^2$), the standard basis is the pair of vectors [1, 0] (the x-direction) and [0, 1] (the y-direction). You can get to any point on the plane by moving some amount in the x-direction and some amount in the y-direction.
Dimension
- Definition: The dimension of a vector space is the number of vectors in any of its bases.
- Key Property: For any given vector space, although there can be many different sets of basis vectors, every basis for that space will have the exact same number of vectors. This unique number defines the dimension of the space.
  - For example, any basis for the 2D plane ($\mathbb{R}^2$) must have exactly two vectors, so its dimension is 2.
  - A line passing through the origin is a 1-dimensional subspace because its basis consists of only one vector.
Rank
The rank of a matrix is the maximum number of linearly independent columns (or, equivalently, rows) in the matrix. It represents the dimension of the subspace spanned by its columns.
- Relevance in CS: The concept of rank is central to the low-rank matrix model. Just as a sparse vector has few non-zero entries, a low-rank matrix has “few” linearly independent columns/rows, meaning it contains less information than its ambient dimensions suggest. This structure is another type of simplicity that can be exploited for recovery from limited data.
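A short numpy illustration (sizes chosen arbitrarily): a rank-2 matrix built from two outer products has many entries but only two independent columns.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, r = 20, 30, 2

# A rank-r matrix: product of an n1 x r and an r x n2 factor, so its
# 600 entries are determined by only r * (n1 + n2) = 100 numbers.
M = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))

print(np.linalg.matrix_rank(M))                          # 2
print(np.linalg.svd(M, compute_uv=False)[:4].round(3))   # only 2 non-negligible singular values
```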
Affine space
An affine space is essentially a vector subspace that has been shifted so that it does not necessarily pass through the origin.
Here’s a breakdown:
- Vector Subspace: In $\mathbb{R}^2$, a one-dimensional vector subspace is a straight line that must pass through the origin (the point (0,0)).
- Affine Space: An affine space is a generalization. It can be a point, a line, or a plane that does not need to go through the origin. It’s created by taking a vector subspace and adding a fixed vector to every point in it, effectively “translating” it.
The “one-dimensional affine space A” is illustrated in Figure 1.2 as a straight line that does not pass through the origin. The goal is to find the point on this line that is closest to the signal vector $x$.
A bit more math used in Compressed Sensing
Vector Space Concepts
$\ell_p$ Norms
For a vector $x \in \mathbb{R}^n$ and $p \in [1, \infty)$, the $\ell_p$ norm is defined as:
$$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}$$
where:
- $x_i$ represents the i-th element of the vector $x$.
$\ell_0$ “Norm”
The $\ell_0$ “norm” counts the number of non-zero elements in a vector:
$$\|x\|_0 = |\mathrm{supp}(x)|$$
where:
- $\mathrm{supp}(x) = \{i : x_i \neq 0\}$ is the “support” of the vector $x$, which is the set of indices of its non-zero elements.
- $|\cdot|$ denotes the cardinality (the number of elements) of the set.
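A small numpy sketch of these definitions (the test vector is arbitrary):

```python
import numpy as np

x = np.array([3.0, 0.0, -4.0, 0.0, 1.0])

def lp_norm(x, p):
    """l_p norm for p >= 1: (sum_i |x_i|^p)^(1/p)."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

print(lp_norm(x, 1), lp_norm(x, 2))   # 8.0 and ~5.099; matches np.linalg.norm(x, 1) and norm(x, 2)
print(np.max(np.abs(x)))              # 4.0: the l_inf norm, the limit as p grows

# The l_0 "norm" is the size of the support (not a true norm: it ignores scaling).
support = np.nonzero(x)[0]
print(support, len(support))          # [0 2 4] 3
```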
Inner Product
The standard inner product in $\mathbb{R}^n$ is defined as:
$$\langle x, z \rangle = z^T x = \sum_{i=1}^{n} x_i z_i$$
where:
- $z^T$ denotes the transpose of the vector $z$.
Basis and Frames
- Basis: A set of vectors $\{\phi_i\}_{i=1}^{n}$ is a basis for $\mathbb{R}^n$ if they span the space and are linearly independent. Any vector $x \in \mathbb{R}^n$ has a unique representation $x = \sum_{i=1}^{n} c_i \phi_i$. In matrix form, this is $x = \Phi c$, where:
  - $\Phi$ is the $n \times n$ matrix whose columns are the basis vectors $\phi_i$.
  - $c$ is the vector containing the coefficients $c_i$.
- Orthonormal Basis: A basis is orthonormal if $\langle \phi_i, \phi_j \rangle = 1$ when $i = j$ and $0$ when $i \neq j$.
- Frame: A set of vectors $\{\phi_i\}_{i=1}^{n}$ in $\mathbb{R}^d$ ($d < n$), corresponding to a matrix $\Phi \in \mathbb{R}^{d \times n}$, is a frame if for any vector $x \in \mathbb{R}^d$, the following holds for constants $0 < A \le B < \infty$:
$$A \|x\|_2^2 \le \|\Phi^T x\|_2^2 \le B \|x\|_2^2$$
  - $A$ and $B$ are called the frame bounds. If $A = B$, the frame is tight.
- Dual Frame: A frame $\tilde{\Phi}$ is a dual frame of $\Phi$ if $\Phi \tilde{\Phi}^T = \tilde{\Phi} \Phi^T = I$, where $I$ is the identity matrix. The canonical dual frame is given by $\tilde{\Phi} = (\Phi \Phi^T)^{-1} \Phi$, where $(\cdot)^{-1}$ denotes the matrix inverse.
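A numpy sketch of the frame bounds and the canonical dual (the matrix here is just a random example): since $\|\Phi^T x\|_2^2 = x^T (\Phi\Phi^T) x$, the tightest bounds $A$ and $B$ are the extreme eigenvalues of $\Phi\Phi^T$.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 3, 7
Phi = rng.standard_normal((d, n))            # columns are the n frame vectors in R^d

# ||Phi^T x||^2 = x^T (Phi Phi^T) x, so the best frame bounds are the
# smallest and largest eigenvalues of Phi Phi^T.
eigvals = np.linalg.eigvalsh(Phi @ Phi.T)
print(eigvals.min(), eigvals.max())          # 0 < A <= B because the columns span R^d

# Canonical dual frame and the identity Phi * dual^T = I.
dual = np.linalg.inv(Phi @ Phi.T) @ Phi
print(np.allclose(Phi @ dual.T, np.eye(d)))  # True
```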
Signal Models
Sparse and Compressible Signals
- k-sparse signal: A signal $x$ is k-sparse if it has at most $k$ non-zero entries, i.e., $\|x\|_0 \le k$. The set of all k-sparse signals is denoted by $\Sigma_k = \{x : \|x\|_0 \le k\}$.
- Compressible signal: A signal is compressible if it can be well-approximated by a sparse signal. The approximation error is given by:
$$\sigma_k(x)_p = \min_{\hat{x} \in \Sigma_k} \|x - \hat{x}\|_p$$
  - Here, $\hat{x}$ is the sparse approximation of the signal from the set $\Sigma_k$.
- Power Law Decay: A signal is compressible if its sorted coefficients decay according to a power law:
$$|x_{(i)}| \le C_1 i^{-q}, \quad i = 1, 2, \ldots, n$$
  - $C_1$ and $q$ are positive constants, and $x_{(i)}$ denotes the i-th largest coefficient in magnitude.
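A numpy sketch of $\sigma_k(x)_p$ for a power-law signal (the decay exponent and sizes are arbitrary choices):

```python
import numpy as np

def best_k_term_error(x, k, p=2):
    """sigma_k(x)_p: distance from x to its best k-sparse approximation."""
    idx = np.argsort(np.abs(x))[::-1]   # indices ordered by decreasing magnitude
    xk = np.zeros_like(x)
    xk[idx[:k]] = x[idx[:k]]            # keep only the k largest-magnitude entries
    return np.linalg.norm(x - xk, ord=p)

n = 1000
x = np.arange(1, n + 1) ** -1.5         # power-law decay with q = 1.5: compressible

for k in (10, 50, 200):
    print(k, best_k_term_error(x, k))   # the error shrinks rapidly as k grows
```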
Union of Subspaces and Low-Rank Models
- Union of Subspaces: A signal $x$ lies in a union of subspaces $\mathcal{U}$ if:
$$x \in \mathcal{U} = \bigcup_{i=1}^{M} \mathcal{U}_i$$
  - $\mathcal{U}_i$ are the individual subspaces, and $M$ is the total number of subspaces in the union.
- Low-Rank Matrix: The set of low-rank matrices is defined as:
$$\mathcal{L}_r = \{ M \in \mathbb{R}^{n_1 \times n_2} : \mathrm{rank}(M) \le r \}$$
  - $M$ is a matrix of size $n_1 \times n_2$.
  - $r$ is the maximum rank of the matrices in the set.
Signal Recovery
$\ell_1$ Minimization
This involves solving one of the following optimization problems:
- Noise-free case (Basis Pursuit): $\hat{x} = \arg\min_z \|z\|_1$ subject to $Az = y$
- Bounded noise case: $\hat{x} = \arg\min_z \|z\|_1$ subject to $\|Az - y\|_2 \le \epsilon$
- Dantzig Selector: $\hat{x} = \arg\min_z \|z\|_1$ subject to $\|A^T (Az - y)\|_\infty \le \lambda$
- LASSO (unconstrained form): $\hat{x} = \arg\min_z \frac{1}{2} \|Az - y\|_2^2 + \lambda \|z\|_1$ (see the sketch after this list)
In these formulas:
- $\hat{x}$ is the estimated signal.
- $\arg\min_z$ finds the vector $z$ that minimizes the given objective function.
- $\epsilon$ is a constant representing the upper bound on the $\ell_2$ norm of the noise.
- $\lambda$ is a regularization parameter that balances the trade-off between the data fidelity term ($\|Az - y\|_2^2$) and the sparsity term ($\|z\|_1$).
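A minimal, self-contained sketch of the LASSO formulation solved with plain iterative soft-thresholding (ISTA). The problem sizes, the regularization value, and the helper names are illustrative assumptions, not anything prescribed by the chapter.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: entrywise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam, n_iter=500):
    """Minimize 0.5 * ||A z - y||_2^2 + lam * ||z||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(A, 2) ** 2                      # Lipschitz constant of the gradient
    z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = soft_threshold(z - A.T @ (A @ z - y) / L, lam / L)
    return z

# Toy problem: recover a 3-sparse signal from m = 40 noisy random measurements.
rng = np.random.default_rng(3)
m, n, k = 40, 128, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x + 0.01 * rng.standard_normal(m)

x_hat = ista(A, y, lam=0.02)
print(np.linalg.norm(x_hat - x))                       # small relative to ||x||_2
```

The constrained formulations are typically handled with general convex solvers; the unconstrained LASSO is shown here only because it admits this very short proximal-gradient loop.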
Recovery Guarantees
- Theorem 1.8 (Noise-Free Recovery): If $A$ satisfies the RIP of order 2k with $\delta_{2k} < \sqrt{2} - 1$, the solution $\hat{x}$ to the noise-free $\ell_1$ minimization problem obeys:
$$\|\hat{x} - x\|_2 \le C_0 \frac{\sigma_k(x)_1}{\sqrt{k}}$$
- Theorem 1.9 (Bounded Noise): Under similar RIP conditions and with noise $e$ where $\|e\|_2 \le \epsilon$, the solution obeys:
$$\|\hat{x} - x\|_2 \le C_0 \frac{\sigma_k(x)_1}{\sqrt{k}} + C_2 \epsilon$$
In these theorems:
- $C_0$ and $C_2$ are constants that depend on the RIP constant $\delta_{2k}$.
- $e$ represents a noise vector corrupting the measurements.
Multiple Measurement Vector (MMV) Problem
This problem involves recovering a set of sparse vectors $\{x_i\}$ that share a common support. These vectors form the columns of a matrix $X$ which is row-sparse.
- Uniqueness Condition (Theorem 1.15): A row-sparse matrix $X$ is uniquely determined by $Y = AX$ if:
$$|\mathrm{supp}(X)| < \frac{\mathrm{spark}(A) - 1 + \mathrm{rank}(X)}{2}$$
  - $Y = AX$ is the matrix of measurements, with columns $y_i = A x_i$ being the measurements for each signal $x_i$.
- $\ell_{p,q}$ Norms: Used for matrix recovery, defined as $\|X\|_{p,q} = \left( \sum_i \|x^i\|_p^q \right)^{1/q}$, where $x^i$ is the i-th row of the matrix $X$.
Understand matrix properties in compressed sensing
The chapter introduces several properties of the sensing matrix $A$, each with different strengths, guarantees, and requirements. Organizing them is the best way to see the relationships between them.
Here is an organization of the sensing matrix properties discussed in the chapter, from the most fundamental to the most powerful.
1. Spark
This is the most fundamental condition for guaranteeing a unique sparse solution.
Definition
The spark of a matrix $A$, denoted $\mathrm{spark}(A)$, is the smallest number of columns of $A$ that are linearly dependent.
I find the definition on Wikipedia more precise:
https://en.wikipedia.org/wiki/Spark_(mathematics)
The spark of an $m \times n$ matrix $A$ is the smallest integer $k$ such that there exists a set of $k$ columns of $A$ that are linearly dependent.
What it Guarantees
Theorem 1.1: For any vector $y \in \mathbb{R}^m$, there is at most one signal $x \in \Sigma_k$ such that $y = Ax$ if and only if $\mathrm{spark}(A) > 2k$.
It provides the definitive “if and only if” condition for the unique recovery of exactly k-sparse signals in a noise-free setting.
Key Requirement
- For unique recovery of any k-sparse signal, the condition is $\mathrm{spark}(A) > 2k$.
- Requirement on $m$: It’s easy to see that $\mathrm{spark}(A) \in [2, m+1]$. This condition immediately implies that the number of measurements must be at least twice the sparsity, i.e., $m \ge 2k$.
Practicality
Calculating the spark for a general matrix is NP-hard and computationally infeasible for large matrices. (https://en.wikipedia.org/wiki/Spark_(mathematics))
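Because of this NP-hardness, a brute-force check is only feasible for tiny matrices; the sketch below (helper name and example matrix are mine) does exactly that by testing every column subset.

```python
import numpy as np
from itertools import combinations

def spark(A, tol=1e-10):
    """Smallest number of linearly dependent columns (brute force, exponential in n)."""
    m, n = A.shape
    for size in range(1, n + 1):
        for cols in combinations(range(n), size):
            if np.linalg.matrix_rank(A[:, list(cols)], tol=tol) < size:  # dependent subset
                return size
    return n + 1   # no dependent subset exists (only possible when n <= m)

# Any 3 columns in R^2 are dependent, but every pair here is independent, so spark = 3.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
print(spark(A))   # 3 > 2k for k = 1, so 1-sparse signals are uniquely recoverable
```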
2. Null Space Property (NSP)
The NSP is a more refined condition on the null space of $A$ that is necessary for robust recovery.
Definition
A matrix $A$ has the Null Space Property of order $k$ if for a constant $C > 0$, the following holds for all $h \in \mathcal{N}(A)$ and all index sets $\Lambda$ with $|\Lambda| \le k$:
$$\|h_\Lambda\|_2 \le C \frac{\|h_{\Lambda^c}\|_1}{\sqrt{k}}$$
- $\mathcal{N}(A)$ is the null space of matrix $A$, which is the set of all vectors $h$ for which $Ah = 0$.
- $\Lambda$ is a subset of indices $\{1, 2, \ldots, n\}$.
- $\Lambda^c$ is the complement of $\Lambda$.
- $h_\Lambda$ is a vector formed by keeping the entries of $h$ indexed by $\Lambda$ and setting all other entries to zero.
- $h_{\Lambda^c}$ is a vector formed by keeping the entries of $h$ indexed by $\Lambda^c$ and setting all other entries to zero.
In other words, a matrix satisfying the NSP of order $k$ means that vectors in its null space are not overly concentrated on a small set of indices.
What it Guarantees
The NSP is presented as a necessary condition for any stable recovery algorithm.
Theorem 1.2: Let $A : \mathbb{R}^n \to \mathbb{R}^m$ denote a sensing matrix and $\Delta : \mathbb{R}^m \to \mathbb{R}^n$ denote an arbitrary recovery algorithm. If the pair $(A, \Delta)$ satisfies
$$\|\Delta(Ax) - x\|_2 \le C \frac{\sigma_k(x)_1}{\sqrt{k}}$$
then $A$ satisfies the NSP of order $2k$.
It shows that if any algorithm can robustly recover signals from compressive measurements, then the matrix $A$ must satisfy the NSP of order $2k$.
Practicality
Like the spark, it is generally hard to verify directly for a given matrix.
3. Restricted Isometry Property (RIP)
The RIP is the most powerful property discussed, forming the basis for many of the strongest theoretical guarantees in compressed sensing.
Definition
A matrix $A$ satisfies the RIP of order $k$ if there exists a $\delta_k \in (0, 1)$ such that for all k-sparse signals $x \in \Sigma_k$:
$$(1 - \delta_k) \|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_k) \|x\|_2^2$$
- $\delta_k$ is the “restricted isometry constant” of order $k$.
In other words, a matrix satisfies the RIP of order $k$ if it approximately preserves the Euclidean length of all k-sparse vectors.
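The RIP cannot be verified exactly for large matrices, but a quick Monte-Carlo sketch (all sizes arbitrary) shows the near-isometry behaviour of a random Gaussian matrix on sampled k-sparse vectors. This only estimates $\delta_k$ over the sampled supports; it does not certify the RIP.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, k = 64, 256, 4
A = rng.standard_normal((m, n)) / np.sqrt(m)   # scaling gives E[||Ax||_2^2] = ||x||_2^2

ratios = []
for _ in range(2000):
    x = np.zeros(n)
    x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    x /= np.linalg.norm(x)                     # unit-norm k-sparse test vector
    ratios.append(np.linalg.norm(A @ x) ** 2)

# min and max are empirical stand-ins for 1 - delta_k and 1 + delta_k.
print(min(ratios), max(ratios))
```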
What it Guarantees
The RIP is a sufficient condition for many different algorithms (like $\ell_1$ minimization) to provide stable and robust recovery of sparse signals, even in the presence of noise.
Key Requirements & Properties
1. Necessity for Stability
C-stable definition: Let $A$ denote a sensing matrix and $\Delta$ denote a recovery algorithm. We say that the pair $(A, \Delta)$ is C-stable if for any $x \in \Sigma_k$ and any $e \in \mathbb{R}^m$ we have that
$$\|\Delta(Ax + e) - x\|_2 \le C \|e\|_2$$
This definition simply says that if we add a small amount of noise to the measurements, then the impact of this on the recovered signal should not be arbitrarily large.
Theorem 1.3: If a pair $(A, \Delta)$ is C-stable, then
$$\frac{1}{C} \|x\|_2 \le \|Ax\|_2$$
for all $x \in \Sigma_{2k}$.
Theorem 1.3 shows that RIP’s lower bound is necessary for any recovery algorithm to be stable against measurement noise.
2. Lower Bound on $m$
Theorem 1.4: Let $A$ be an $m \times n$ matrix that satisfies the RIP of order $2k$ with constant $\delta_{2k} \in (0, 1/2]$. Then
$$m \ge C k \log\left(\frac{n}{k}\right)$$
where $C$ is a constant depending only on $\delta_{2k}$ (approximately 0.28).
The restriction to $\delta_{2k} \le 1/2$ is arbitrary and is made merely for convenience - minor modifications to the argument establish bounds for $\delta_{2k} \le \delta_{\max}$ for any $\delta_{\max} < 1$.
Theorem 1.4 provides a precise lower bound: $m \ge C k \log(n/k)$ (i.e., $m = \Omega(k \log(n/k))$). This is necessary to satisfy the RIP and tells us the fundamental minimum number of measurements needed.
3. Relationship to NSP
Theorem 1.5: Suppose that $A$ satisfies the RIP of order $2k$ with $\delta_{2k} < \sqrt{2} - 1$. Then $A$ satisfies the NSP of order $2k$ with constant
$$C = \frac{\sqrt{2}\, \delta_{2k}}{1 - (1 + \sqrt{2})\, \delta_{2k}}$$
RIP is a stronger condition than NSP. Theorem 1.5 shows that if a matrix satisfies the RIP (with a sufficiently small constant), it automatically satisfies the NSP.
Practicality
It is computationally infeasible to verify the RIP for large matrices. However, its main power comes from the fact that random matrices can be proven to satisfy the RIP with very high probability when $m$ is on the order of $k \log(n/k)$.
4. Coherence
Coherence is the most practical and easily computable property, though it often leads to stricter requirements.
Definition
The coherence $\mu(A)$ of a matrix $A$ is the largest absolute inner product between any two distinct normalized columns $a_i, a_j$:
$$\mu(A) = \max_{1 \le i < j \le n} \frac{|\langle a_i, a_j \rangle|}{\|a_i\|_2 \|a_j\|_2}$$
- $a_i$ and $a_j$ are columns of the matrix $A$.
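Coherence is the one property that is cheap to compute; a short numpy sketch (random matrix used only as an example), including the Welch lower bound mentioned below:

```python
import numpy as np

def coherence(A):
    """Largest absolute inner product between distinct normalized columns of A."""
    cols = A / np.linalg.norm(A, axis=0)   # normalize each column
    G = np.abs(cols.T @ cols)              # |<a_i, a_j>| for all pairs
    np.fill_diagonal(G, 0.0)               # ignore the diagonal <a_i, a_i> = 1
    return G.max()

rng = np.random.default_rng(5)
m, n = 32, 128
A = rng.standard_normal((m, n))
welch = np.sqrt((n - m) / (m * (n - 1)))   # Welch lower bound on mu(A)
print(coherence(A), welch)                 # the computed mu(A) always exceeds the bound
```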
What it Guarantees
It provides simple, checkable conditions for unique recovery and stability.
Key Requirements & Properties
1. Uniqueness Condition
Theorem 1.7: If
$$k < \frac{1}{2}\left(1 + \frac{1}{\mu(A)}\right)$$
then for each measurement vector $y \in \mathbb{R}^m$ there exists at most one signal $x \in \Sigma_k$ such that $y = Ax$.
This uses coherence to provide a sufficient condition for uniqueness. This often requires $k$ to be much smaller than what RIP would allow.
It is possible to show that the coherence of a matrix is always in the range $\mu(A) \in \left[\sqrt{\frac{n - m}{m(n - 1)}},\, 1\right]$; the lower bound is known as the Welch bound.
Theorem 1.7, together with the Welch bound, provides an upper bound on the level of sparsity that guarantees uniqueness using coherence: $k = O(\sqrt{m})$.
2. Relationship to Spark
Lemma 1.4: For any matrix $A$,
$$\mathrm{spark}(A) \ge 1 + \frac{1}{\mu(A)}$$
Coherence provides a lower bound on the spark.
3. Relationship to RIP
Lemma 1.5: If $A$ has unit-norm columns and coherence $\mu = \mu(A)$, then $A$ satisfies the RIP of order $k$ with $\delta_k = (k - 1)\mu$ for all $k < 1/\mu$.
Coherence can be used to establish a (sometimes loose) RIP constant for a matrix $A$.
Practicality
Coherence is the only property listed that is easy to compute for any given matrix.
Summary Table

| Property | Definition | What it Guarantees | Key Requirement(s) on k or m | Practicality |
|---|---|---|---|---|
| Spark | Smallest # of linearly dependent columns | Necessary and sufficient condition: uniqueness for k-sparse signals | $\mathrm{spark}(A) > 2k$ (implies $m \ge 2k$) | Hard to compute |
| NSP | Null space vectors are not “sparse” | Necessary for any robust/stable recovery | Must hold for order 2k | Hard to verify directly |
| RIP | Near-isometry for sparse vectors | Sufficient for stable/robust recovery for many algorithms | $m \ge C k \log(n/k)$ | Hard to verify, but some random matrices highly likely satisfy it |
| Coherence ($\mu$) | Max inner product between columns | Uniqueness and stability, but with stricter conditions | $k$ must be small, typically $k = O(\sqrt{m})$ | Easy to compute |
Understand different error bounds of signal recovery via l1 minimization
Here is an organization of the signal recovery theorems for $\ell_1$ minimization, grouped by the main assumption made about the sensing matrix $A$.
Group 1: Guarantees Based on the Restricted Isometry Property (RIP)
These theorems rely on the Restricted Isometry Property (RIP), a strong theoretical condition ensuring that the matrix $A$ nearly preserves the length of sparse vectors.
1.1 The Ideal Case: Noise-Free Recovery
This is the baseline scenario where measurements are perfect.
- Theorem: Theorem 1.8
- Conditions: $A$ satisfies the RIP of order 2k, and the measurements are exact ($y = Ax$).
- Algorithm: Basis Pursuit ($\min \|z\|_1$ subject to $Az = y$).
- Error Bound: The recovery error is bounded by how “compressible” the original signal $x$ is: $\|\hat{x} - x\|_2 \le C_0 \sigma_k(x)_1 / \sqrt{k}$.
  - If the signal $x$ is truly k-sparse, then its approximation error is zero, and the recovery is exact ($\hat{x} = x$).
1.2 The Realistic Case: Bounded Noise
This scenario assumes the measurement noise is contained within a known energy level.
- Theorem: Theorem 1.9
- Conditions: $A$ satisfies RIP. The measurement noise $e$ is bounded by $\|e\|_2 \le \epsilon$.
- Algorithm: A quadratically constrained version of Basis Pursuit ($\min \|z\|_1$ subject to $\|Az - y\|_2 \le \epsilon$).
- Error Bound: The error has two parts: one from the signal’s compressibility and one from the noise: $\|\hat{x} - x\|_2 \le C_0 \sigma_k(x)_1 / \sqrt{k} + C_2 \epsilon$.
1.3 A Specific Noise Model: Gaussian Noise
This is a very common scenario in practice. There are two main approaches.
- Theorem: Corollary 1.1 (derived from Theorem 1.9)
  - Conditions: $A$ satisfies RIP, and the noise is i.i.d. Gaussian.
  - Guarantee: The error grows with the square root of the number of measurements $m$ and with the noise standard deviation $\sigma$: $\|\hat{x} - x\|_2 \le C \sqrt{m}\, \sigma$.
- Theorem: Corollary 1.2 (derived from the Dantzig Selector, Theorem 1.10)
  - Conditions: $A$ has unit-norm columns, satisfies RIP, and the noise is i.i.d. Gaussian.
  - Guarantee: The error grows with the sparsity $k$, the logarithm of the dimension $n$, and the noise standard deviation $\sigma$: $\|\hat{x} - x\|_2 \le C \sqrt{k \log n}\, \sigma$. This is often a much tighter and more useful bound, as $k \log n$ is typically much smaller than $m$.
Group 2: Guarantees Based on Coherence
These theorems rely on coherence ($\mu(A)$), which measures the maximum correlation between any two columns of $A$. Coherence is easier to calculate than RIP but often leads to stricter conditions on the signal’s sparsity $k$.
- Theorem: Theorem 1.11
  - Conditions: $A$ has a certain coherence $\mu(A)$, and the sparsity $k$ must be small relative to it (roughly $k < \frac{1}{4}(1 + 1/\mu(A))$).
  - Error Bound: Provides a worst-case error bound for bounded noise.
- Theorem: Theorem 1.12 (using the LASSO algorithm)
  - Conditions: $A$ has coherence $\mu(A)$, the sparsity $k$ is small relative to $1/\mu(A)$, and the noise is Gaussian.
  - Error Bound: In addition to an error bound, this provides a powerful guarantee on recovering the correct sparse structure: with high probability, the recovered non-zero entries are a subset of the true non-zero entries ($\mathrm{supp}(\hat{x}) \subseteq \mathrm{supp}(x)$).
Group 3: Theorems on the Nature of the Error Bounds
This final group explains why the error bounds from the RIP-based theorems look the way they do.
- The “Pessimistic” Result:
  - Theorem: Theorem 1.13
  - What it says: It is impossible for any recovery algorithm to provide a “simple” error bound of the form $\|\hat{x} - x\|_2 \le C \sigma_k(x)_2$ unless you take a huge number of measurements ($m$ proportional to $n$). This explains why the theorems use the more complicated $\sigma_k(x)_1 / \sqrt{k}$ term.
- The “Optimistic” Result (with Randomness):
  - Theorem: Theorem 1.14
  - What it says: This result shows a way around the limitation of Theorem 1.13. If you use a randomized sensing matrix $A$, you can achieve the “simple” error bound with high probability, while still using very few measurements. This highlights a key advantage of using random matrices in compressed sensing.
Deterministic or Probabilistic Guarantees
Key Recovery Formulations
The following theorems provide guarantees for solutions to sparse recovery problems. These problems are typically formulated as optimizations, with the goal of finding a signal $\hat{x}$ that is both sparse and consistent with the measurements $y$. The chapter focuses on the following formulations.
- The Ideal $\ell_0$ Minimization (Computationally Intractable): $\hat{x} = \arg\min_z \|z\|_0$ subject to $z \in \mathcal{B}(y)$. This is the direct statement of finding the sparsest solution, but it is NP-hard and cannot be solved efficiently for large problems.
- The Constrained $\ell_1$ Minimization (Basis Pursuit-style, Eq. 1.12): $\hat{x} = \arg\min_z \|z\|_1$ subject to $z \in \mathcal{B}(y)$. This is the practical, convex relaxation of the problem and is the focus of most of the theorems. The constraint set $\mathcal{B}(y)$ changes based on the assumptions about noise.
- The Unconstrained $\ell_1$ Minimization (LASSO, Eq. 1.15): $\hat{x} = \arg\min_z \frac{1}{2}\|Az - y\|_2^2 + \lambda\|z\|_1$. This is an alternative, unconstrained formulation that is equivalent to the constrained version for a specific choice of the regularization parameter $\lambda$.
Note that for (1.15) some choice of the parameter $\lambda$ will yield the same result as the constrained version of the problem given by (1.12) with $\mathcal{B}(y) = \{z : \|Az - y\|_2 \le \epsilon\}$.
However, in general the value of $\lambda$ which makes these problems equivalent is unknown a priori.
Types of Guarantees
The chapter presents two main flavors of guarantees:
- Deterministic Guarantees: These are absolute “if-then” statements. If you have a matrix with a specific property (like RIP), then a certain error bound is guaranteed to hold. The challenge is that deterministically constructing a matrix with these properties for large systems can be very difficult.
- Probabilistic Guarantees: These state that an error bound will hold with high probability, where the probability comes from the random choice of the matrix or the random nature of the noise. The most common approach in compressed sensing is to choose the matrix A randomly. The guarantee then is that with very high probability, the chosen matrix will have the desired properties (like RIP) and the recovery will be successful.
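As a small illustration of the probabilistic approach, the sketch below draws a Gaussian sensing matrix with i.i.d. $\mathcal{N}(0, 1/m)$ entries (a common normalization, used here as an assumption) with $m$ on the order of $k\log(n/k)$, and checks that it roughly preserves the length of one fixed sparse vector.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 512, 8
m = int(4 * k * np.log(n / k))          # measurement budget on the order of k log(n/k)

# Gaussian sensing matrix with i.i.d. N(0, 1/m) entries: E[||Ax||_2^2] = ||x||_2^2.
A = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))

x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
print(m, np.linalg.norm(A @ x) / np.linalg.norm(x))   # ratio close to 1 with high probability
```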
Group 1: Guarantees Based on the Restricted Isometry Property (RIP)
These theorems rely on the RIP, a strong theoretical condition ensuring the matrix nearly preserves the length of sparse vectors.
1.1 The Ideal Case: Noise-Free Recovery
This is the baseline scenario where measurements are perfect.
Theorem 1.8
Suppose that $A$ satisfies the RIP of order $2k$ with $\delta_{2k} < \sqrt{2} - 1$ and we obtain measurements of the form $y = Ax$. Then when $\mathcal{B}(y) = \{z : Az = y\}$, the solution $\hat{x}$ to (1.12) obeys
$$\|\hat{x} - x\|_2 \le C_0 \frac{\sigma_k(x)_1}{\sqrt{k}}$$
where $C_0$ is a constant depending only on $\delta_{2k}$.
- Type of Guarantee: Deterministic.
- Conditions: $A$ satisfies the RIP of order $2k$, and the measurements are exact ($y = Ax$).
Error Bound
If the signal $x$ is truly k-sparse, then its approximation error $\sigma_k(x)_1$ is zero, and the recovery is exact ($\hat{x} = x$).
1.2 The Realistic Case: Bounded Noise
This scenario assumes the measurement noise is contained within a known energy level.
Theorem 1.9
Suppose that $A$ satisfies the RIP of order $2k$ with $\delta_{2k} < \sqrt{2} - 1$ and let $y = Ax + e$ where $\|e\|_2 \le \epsilon$. Then when $\mathcal{B}(y) = \{z : \|Az - y\|_2 \le \epsilon\}$, the solution $\hat{x}$ to (1.12) obeys
$$\|\hat{x} - x\|_2 \le C_0 \frac{\sigma_k(x)_1}{\sqrt{k}} + C_2 \epsilon$$
where $C_0$ and $C_2$ are constants depending only on $\delta_{2k}$.
- Type of Guarantee: Deterministic.
- Conditions: $A$ satisfies RIP. The measurement noise $e$ is bounded by $\|e\|_2 \le \epsilon$.
Error Bound
The error has two parts: one from the signal’s compressibility and one from the noise.
Theorem 1.10
Suppose that $A$ satisfies the RIP of order $2k$ with $\delta_{2k} < \sqrt{2} - 1$ and we obtain measurements of the form $y = Ax + e$ where $\|A^T e\|_\infty \le \lambda$. Then when $\mathcal{B}(y) = \{z : \|A^T (Az - y)\|_\infty \le \lambda\}$, the solution $\hat{x}$ to (1.12) obeys
$$\|\hat{x} - x\|_2 \le C_0 \frac{\sigma_k(x)_1}{\sqrt{k}} + C_3 \sqrt{k}\, \lambda$$
where $C_0$ and $C_3$ are constants depending only on $\delta_{2k}$.
- Type of Guarantee: Deterministic.
- Conditions: $A$ satisfies RIP. The noise is bounded such that $\|A^T e\|_\infty \le \lambda$.
- Algorithm: The Dantzig Selector (Eq. 1.12 with $\mathcal{B}(y) = \{z : \|A^T (Az - y)\|_\infty \le \lambda\}$).
- Error Bound: $\|\hat{x} - x\|_2 \le C_0 \sigma_k(x)_1 / \sqrt{k} + C_3 \sqrt{k}\, \lambda$.
1.3 A Specific Noise Model: Gaussian Noise
This is a very common scenario in practice. There are two main approaches.
Corollary 1.1 (derived from Theorem 1.9)
Suppose that $A$ satisfies the RIP of order $2k$ with $\delta_{2k} < \sqrt{2} - 1$. Furthermore, suppose that $x \in \Sigma_k$ and that we obtain measurements of the form $y = Ax + e$ where the entries of $e$ are i.i.d. $\mathcal{N}(0, \sigma^2)$. Then when $\mathcal{B}(y) = \{z : \|Az - y\|_2 \le 2\sqrt{m}\, \sigma\}$, the solution $\hat{x}$ to (1.12) obeys
$$\|\hat{x} - x\|_2 \le C \sqrt{m}\, \sigma$$
with probability at least $1 - \exp(-c_0 m)$, where $C$ is a constant depending only on $\delta_{2k}$ and $c_0$ is a constant.
- Type of Guarantee: Probabilistic (over the random draw of the noise).
- Conditions: $A$ satisfies RIP, noise is i.i.d. Gaussian, and $x$ is k-sparse.
Error Bound
The error is proportional to the square root of the number of measurements $m$ and to the noise standard deviation $\sigma$: $\|\hat{x} - x\|_2 \lesssim \sqrt{m}\, \sigma$.
Corollary 1.2 (derived from the Dantzig Selector, Theorem 1.10)
Suppose that $A$ has unit-norm columns and satisfies the RIP of order $2k$ with $\delta_{2k} < \sqrt{2} - 1$. Furthermore, suppose that $x \in \Sigma_k$ and that we obtain measurements of the form $y = Ax + e$ where the entries of $e$ are i.i.d. $\mathcal{N}(0, \sigma^2)$. Then when $\mathcal{B}(y) = \{z : \|A^T (Az - y)\|_\infty \le 2\sqrt{\log n}\, \sigma\}$, the solution $\hat{x}$ to (1.12) obeys
$$\|\hat{x} - x\|_2 \le C \sqrt{k \log n}\, \sigma$$
with probability at least $1 - \frac{1}{n}$.
- Type of Guarantee: Probabilistic (over the random draw of the noise).
- Conditions: $A$ satisfies RIP, has unit-norm columns, noise is i.i.d. Gaussian, and $x$ is k-sparse.
Error Bound
The error is proportional to the square root of the sparsity $k$, the square root of the log of the dimension $n$, and the noise standard deviation $\sigma$: $\|\hat{x} - x\|_2 \lesssim \sqrt{k \log n}\, \sigma$. This might be a more useful bound, as $k \log n$ may be much smaller than $m$.
Group 2: Guarantees Based on Coherence
These theorems rely on coherence ($\mu(A)$), which measures the maximum correlation between any two columns of $A$. Coherence is easier to calculate than RIP but often leads to stricter conditions on the signal’s sparsity (the signal should be sparser).
Theorem 1.11
Suppose that $A$ has coherence $\mu(A)$ and that $x \in \Sigma_k$ with $k < \frac{1}{4}(1 + 1/\mu(A))$. Furthermore, suppose that we obtain measurements of the form $y = Ax + e$ with $\|e\|_2 \le \epsilon$. Then when $\mathcal{B}(y) = \{z : \|Az - y\|_2 \le \epsilon\}$, the solution $\hat{x}$ to (1.12) obeys an error bound proportional to the noise level $\epsilon$.
- Type of Guarantee: Deterministic.
- Conditions: $k$ is small ($k < \frac{1}{4}(1 + 1/\mu(A))$), and $x$ is k-sparse.
Error Bound
Provides a worst-case error bound for bounded noise; it will typically overestimate the actual error.
Theorem 1.12
Suppose that $A$ has coherence $\mu(A)$ and that $x \in \Sigma_k$ with $k \le 1/(3\mu(A))$. Furthermore, suppose that we obtain measurements of the form $y = Ax + e$ where the entries of $e$ are i.i.d. $\mathcal{N}(0, \sigma^2)$. Set $\lambda$ proportional to $\sigma \sqrt{\log n}$, scaled by some fairly small value $\alpha > 0$. Then with probability exceeding roughly $1 - 1/n^{\alpha}$, the solution $\hat{x}$ to (1.15) is unique, and has the following error bound.
- Type of Guarantee: Probabilistic (over the random draw of the noise).
- Conditions: $k$ is small ($k \le 1/(3\mu(A))$), noise is i.i.d. Gaussian, and $x$ is k-sparse.
- Algorithm: LASSO (Eq. 1.15).
Error Bound
In addition to an error bound, this provides a powerful guarantee on recovering the correct sparse structure: with high probability, the recovered non-zero entries are a subset of the true non-zero entries ($\mathrm{supp}(\hat{x}) \subseteq \mathrm{supp}(x)$).
Group 3: Theorems on the Nature of the Error Bounds
This final group explains why the error bounds from the RIP-based theorems look the way they do.
Theorem 1.13 (The “Pessimistic” Result)
Suppose that $A$ is an $m \times n$ matrix and that $\Delta$ is a recovery algorithm that satisfies
$$\|\Delta(Ax) - x\|_2 \le C \sigma_k(x)_2$$
for some $k \ge 1$, then
$$m > \left(1 - \sqrt{1 - 1/C^2}\right) n.$$
- Type of Guarantee: Deterministic.
- What it says: It is impossible for any algorithm with a deterministic matrix to satisfy the “simple” error bound $\|\hat{x} - x\|_2 \le C \sigma_k(x)_2$ for all signals unless you take a huge number of measurements ($m$ proportional to $n$). This explains why the theorems use the $\sigma_k(x)_1 / \sqrt{k}$ term instead.
Theorem 1.14 (The “Optimistic” Result with Randomness)
Let $x \in \mathbb{R}^n$ be fixed. Set $\delta_{2k} < \sqrt{2} - 1$. Suppose that $A$ is an $m \times n$ sub-gaussian random matrix with $m$ on the order of $k \log(n/k) / \delta_{2k}^2$. Suppose we obtain measurements of the form $y = Ax$. Set $\epsilon = 2\sigma_k(x)_2$. Then with probability exceeding $1 - 2\exp(-c_1 \delta_{2k}^2 m)$, when $\mathcal{B}(y) = \{z : \|Az - y\|_2 \le \epsilon\}$, the solution $\hat{x}$ to (1.12) obeys the following bound.
- Type of Guarantee: Probabilistic (over the random choice of the matrix $A$).
- What it says: If you use a random sensing matrix $A$, you can achieve the “simple” error bound with high probability, using far fewer measurements.
- Error Bound: $\|\hat{x} - x\|_2 \le C \sigma_k(x)_2$.
Instance-Optimal Guarantees
An “instance-optimal” guarantee means the quality of the recovery error bound adapts to the specific signal instance $x$ that you are measuring. Instead of a single worst-case error for all signals, the error bound is a function of the properties of that particular signal, such as how compressible it is (e.g., measured by $\sigma_k(x)_1$ or $\sigma_k(x)_2$).
In deterministic cases, examples are Theorem 1.8, Theorem 1.9, Theorem 1.10.
This concept is then combined with probabilistic guarantees in two different ways.
The Two “Flavors” of Probabilistic Guarantees
The distinction comes down to whether one “good” random matrix works for all signals, or if the guarantee only applies to a specific signal-matrix pair.
1. The Stronger Guarantee: A “Universal” Random Matrix
This is the typical approach.
- The Process: You generate a random matrix once.
- The Guarantee: With very high probability, this single matrix you generated is “good.” A “good” matrix is one that satisfies a deterministic guarantee (like the RIP-based theorems) for all possible signals $x$.
2. The Weaker Guarantee: “Instance-Optimal in Probability”
This is what Theorem 1.14 provides.
- The Process: You are first given a specific signal instance $x$. Then, you generate a random matrix $A$ to measure it.
- The Guarantee: With high probability, the recovery of that specific signal using that specific matrix will be successful. The guarantee is probabilistic and only applies to the instance you started with. It doesn’t promise anything about how that same matrix would perform on a different signal.
In summary, “instance-optimal in probability” is the weaker type of probabilistic guarantee where the success probability is tied to a specific signal instance, and you might need to draw a new random matrix for each new signal.
Non Instance-Optimal Guarantees
Examples are Corollaries 1.1, 1.2 and Theorems 1.11, 1.12.
These guarantees provide a single, uniform error bound that applies to the entire class of k-sparse signals. As long as $x$ is k-sparse, the error bound is the same regardless of which specific k-sparse signal it is. The bound depends on parameters like $k$, $m$, $n$, and the noise level, but not on the structure of $x$ beyond its k-sparsity.