While processing a post containing many LaTeX math formulas, the following error appeared:
Error: [ERROR][hexo-renderer-pandoc] On D:\codedir\blog_source\source\_posts\250624-Compressed-Sensing-math.md
[ERROR][hexo-renderer-pandoc] pandoc exited with code 64: Error parsing YAML metadata at (line 407, column 1):
YAML parse exception at line 7, column 74: mapping values are not allowed in this context
    at Hexo.pandocRenderer (D:\codedir\blog_source\node_modules\hexo-renderer-pandoc\lib\renderer.js:47:9)
    at Hexo.tryCatcher (D:\codedir\blog_source\node_modules\bluebird\js\release\util.js:16:23)
    at Hexo.<anonymous> (D:\codedir\blog_source\node_modules\bluebird\js\release\method.js:15:34)
    at D:\codedir\blog_source\node_modules\hexo\dist\hexo\render.js:73:28
    at tryCatcher (D:\codedir\blog_source\node_modules\bluebird\js\release\util.js:16:23)
    at Promise._settlePromiseFromHandler (D:\codedir\blog_source\node_modules\bluebird\js\release\promise.js:547:31)
    at Promise._settlePromise (D:\codedir\blog_source\node_modules\bluebird\js\release\promise.js:604:18)
    at Promise._settlePromiseCtx (D:\codedir\blog_source\node_modules\bluebird\js\release\promise.js:641:10)
    at _drainQueueStep (D:\codedir\blog_source\node_modules\bluebird\js\release\async.js:97:12)
    at _drainQueue (D:\codedir\blog_source\node_modules\bluebird\js\release\async.js:86:9)
    at Async._drainQueues (D:\codedir\blog_source\node_modules\bluebird\js\release\async.js:102:5)
    at Async.drainQueues [as _onImmediate] (D:\codedir\blog_source\node_modules\bluebird\js\release\async.js:15:14)
    at process.processImmediate (node:internal/timers:485:21)
Fundamental Concepts from Set Theory & Linear Algebra
Subset
A set A is a subset of another set B if all elements of A are also elements of B. This is denoted as A⊂B.
Example from the text: The chapter discusses selecting a portion of a signal’s coefficients. If you have a signal with n coefficients, the set of all coefficient indices is {1,2,...,n}. An index set Λ that points to the locations of the k-largest entries is a subset of the full set of indices, so Λ⊂{1,2,...,n}.
Vector Space
A vector space is a collection of objects called vectors, which can be added together and multiplied by scalars (numbers). The key idea is that if you take any two vectors from the space and add them, you get another vector that is still within that space. Similarly, scaling a vector keeps it within the space. The text models signals as vectors in a vector space.
Span
The span of a set of vectors is the set of all possible vectors you can create by taking linear combinations of them. In other words, if you have vectors v1,v2,...,vk, their span is the set of all vectors w that can be written as w=c1v1+c2v2+...+ckvk for some scalar coefficients ci.
Linear Independence
A set of vectors is linearly independent if no vector in the set can be written as a linear combination of the others. A more formal way to say this is that the only way to get the zero vector from their linear combination (c1v1+c2v2+...+ckvk=0) is if all the scalar coefficients (c1,c2,...) are zero.
Subspace
A subspace is a vector space that is contained within another, larger vector space. To be a subspace, a set of vectors must satisfy three rules:
It must contain the zero vector.
It must be “closed under addition”: if you add any two vectors from the set, their sum is also in the set.
It must be “closed under scalar multiplication”: if you multiply any vector in the set by a scalar, the result is also in the set.
Example from the text: The set of all 2-sparse signals in R3 is not a single subspace, because adding two 2-sparse signals can result in a 3-sparse signal. Instead, it’s a union of subspaces, where each subspace is a plane defined by two coordinate axes (like the x-y plane, x-z plane, etc.).
Null Space
The null space of a matrix A, denoted N(A), is the set of all vectors z that give the zero vector when multiplied by A.
N(A)={z:Az=0}
Relevance in CS: In compressed sensing, if you have two different sparse signals, x and x′, that produce the same measurement y (i.e., Ax=Ax′=y), then their difference, h=x−x′, must be in the null space of A because A(x−x′)=0. To guarantee unique recovery, the null space of A should not contain any sparse vectors (other than the zero vector).
Rn (n-dimensional Euclidean Space)
Definition: The symbol Rn refers to the n-dimensional Euclidean space. It is the set of all vectors that consist of n real-number elements.
Breakdown of the Notation:
The R stands for the set of all real numbers.
The superscript n indicates the dimension, meaning each vector in this set is an ordered list of n real numbers, like [x1,x2,...,xn].
Context from the Text: The chapter states that signals can be viewed as vectors in an n-dimensional Euclidean space, which is denoted by Rn. For example, R2 is the 2D plane, and R3 is 3D space.
Basis
Definition: A set of vectors {ϕi}i=1n is called a basis for Rn if the vectors in the set span Rn and are linearly independent. This means two things:
Span the space: The vectors in the set can be scaled and added in some combination to create any vector in the entire space.
Linearly Independent: There are no redundant vectors in the set. No vector in the basis can be created from a combination of the others.
Analogy: A basis is like a set of fundamental directions for a space. For the 2D plane (R2), the standard basis is the pair of vectors [1, 0] (the x-direction) and [0, 1] (the y-direction). You can get to any point on the plane by moving some amount in the x-direction and some amount in the y-direction.
Dimension
Definition: The dimension of a vector space is the number of vectors in any of its bases.
Key Property: For any given vector space, although there can be many different sets of basis vectors, every basis for that space will have the exact same number of vectors. This unique number defines the dimension of the space.
For example, any basis for the 2D plane (R2) must have exactly two vectors, so its dimension is 2.
A line passing through the origin is a 1-dimensional subspace because its basis consists of only one vector.
Rank
The rank of a matrix is the maximum number of linearly independent columns (or, equivalently, rows) in the matrix. It represents the dimension of the subspace spanned by its columns.
Relevance in CS: The concept of rank is central to the low-rank matrix model. Just as a sparse vector has few non-zero entries, a low-rank matrix has “few” linearly independent columns/rows, meaning it contains less information than its ambient dimensions suggest. This structure is another type of simplicity that can be exploited for recovery from limited data.
Affine space
An affine space is essentially a vector subspace that has been shifted so that it does not necessarily pass through the origin.
Here’s a breakdown:
Vector Subspace: In R2, a one-dimensional vector subspace is a straight line that must pass through the origin (the point (0,0)).
Affine Space: An affine space is a generalization. It can be a point, a line, or a plane that does not need to go through the origin. It’s created by taking a vector subspace and adding a fixed vector to every point in it, effectively “translating” it.
The “one-dimensional affine space A” is illustrated in Figure 1.2 as a straight line that does not pass through the origin. The goal is to find the point on this line that is closest to the signal vector x.
The l0 “norm” counts the number of non-zero elements in a vector: ∣∣x∣∣0:=∣supp(x)∣
where:
supp(x) is the “support” of the vector x, which is the set of indices of its non-zero elements, i.e., $\{i : x_i \neq 0\}$.
∣⋅∣ denotes the cardinality (the number of elements) of the set.
Inner Product
The standard inner product in Rn is defined as:
$$\langle x, z \rangle = z^T x = \sum_{i=1}^{n} x_i z_i$$
where:
zT denotes the transpose of the vector z.
Basis and Frames
Basis: A set of vectors {ϕi}i=1n is a basis for Rn if they span the space and are linearly independent. Any vector x has a unique representation x=∑i=1nciϕi. In matrix form, this is x=Φc, where:
Φ is the n×n matrix whose columns are the basis vectors ϕi.
c is the vector containing the coefficients ci.
Orthonormal Basis: A basis is orthonormal if $\langle \phi_i, \phi_j \rangle = \begin{cases} 1, & i = j \\ 0, & i \neq j. \end{cases}$
Frame: A set of vectors {ϕi}i=1n in Rd (d<n) is a frame if for any vector x∈Rd, the following holds for constants 0<A≤B<∞:
$$A\|x\|_2^2 \leq \|\Phi^T x\|_2^2 \leq B\|x\|_2^2$$
A and B are called the frame bounds. If A=B, the frame is tight.
Dual Frame: A frame $\tilde{\Phi}$ is a dual frame of $\Phi$ if $\Phi \tilde{\Phi}^T = \tilde{\Phi} \Phi^T = I$, where $I$ is the identity matrix. The canonical dual frame is given by $\tilde{\Phi} = (\Phi \Phi^T)^{-1} \Phi$, where $(\cdot)^{-1}$ denotes the matrix inverse.
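As a quick sanity check of the canonical dual frame formula above, here is a minimal NumPy sketch; the random frame Phi and its size are arbitrary illustrative choices, not taken from the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 5                        # a frame of n vectors in R^d, with d < n
Phi = rng.standard_normal((d, n))  # columns are the frame vectors

# Canonical dual frame: Phi_tilde = (Phi Phi^T)^{-1} Phi
Phi_tilde = np.linalg.inv(Phi @ Phi.T) @ Phi

# Check the dual-frame identity Phi Phi_tilde^T = I
print(np.allclose(Phi @ Phi_tilde.T, np.eye(d)))  # True
```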
Signal Models
Sparse and Compressible Signals
k-sparse signal: A signal x is k-sparse if it has at most k non-zero entries, i.e., ∣∣x∣∣0≤k. The set of all k-sparse signals is denoted by Σk={x:∣∣x∣∣0≤k}.
Compressible signal: A signal is compressible if it can be well-approximated by a sparse signal. The approximation error is given by:
$$\sigma_k(x)_p = \min_{\hat{x} \in \Sigma_k} \|x - \hat{x}\|_p$$
Here, x^ is the sparse approximation of the signal x from the set Σk.
Power Law Decay: A signal is compressible if its sorted coefficients ∣ci∣ decay according to a power law:
$$|c_i| \leq C_1 i^{-q}$$
C1 and q are positive constants.
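A small NumPy sketch of these two ideas: build coefficients that obey a power-law decay and compute the best k-term approximation error $\sigma_k(x)_2$ by keeping the k largest-magnitude entries. The constants C1 = 1, q = 1 and the sizes below are arbitrary choices for illustration:

```python
import numpy as np

n, k = 1000, 20
i = np.arange(1, n + 1)
x = 1.0 * i ** (-1.0)            # |c_i| <= C_1 * i^{-q} with C_1 = 1, q = 1

# Best k-term approximation: keep the k largest-magnitude entries
x_hat = np.zeros(n)
idx = np.argsort(np.abs(x))[-k:]
x_hat[idx] = x[idx]

sigma_k = np.linalg.norm(x - x_hat)  # sigma_k(x)_2
print(sigma_k)                       # small: the signal is compressible
```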
Union of Subspaces and Low-Rank Models
Union of Subspaces: A signal x lies in a union of M subspaces if:
$$x \in U = \bigcup_{i=1}^{M} U_i$$
Ui are the individual subspaces, and M is the total number of subspaces in the union.
Low-Rank Matrix: The set of low-rank matrices is defined as:
$$L_r = \{M \in \mathbb{R}^{n_1 \times n_2} : \mathrm{rank}(M) \leq r\}$$
M is a matrix of size n1×n2.
r is the maximum rank of the matrices in the set.
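A minimal NumPy sketch of the low-rank model: any matrix of the form M = L R with L of size n1 × r and R of size r × n2 has rank at most r. The sizes below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r = 50, 40, 3
L = rng.standard_normal((n1, r))
R = rng.standard_normal((r, n2))
M = L @ R                          # an n1 x n2 matrix with rank at most r

print(np.linalg.matrix_rank(M))    # 3: far fewer degrees of freedom than n1 * n2
```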
Signal Recovery
l1 Minimization
This involves solving one of the following optimization problems:
Noise-free case (Basis Pursuit):
$$\hat{x} = \arg\min_{z} \|z\|_1 \quad \text{subject to} \quad Az = y$$
Bounded noise case:
$$\hat{x} = \arg\min_{z} \|z\|_1 \quad \text{subject to} \quad \|Az - y\|_2 \leq \epsilon$$
Dantzig Selector:
$$\hat{x} = \arg\min_{z} \|z\|_1 \quad \text{subject to} \quad \|A^T(Az - y)\|_\infty \leq \lambda$$
LASSO (unconstrained form):
$$\hat{x} = \arg\min_{z} \frac{1}{2}\|Az - y\|_2^2 + \lambda \|z\|_1$$
In these formulas:
x^ is the estimated signal.
argminz finds the vector z that minimizes the given objective function.
ϵ is a constant representing the upper bound on the norm of the noise.
λ is a regularization parameter that balances the trade-off between the data fidelity term ($\|Az - y\|_2^2$) and the sparsity term ($\|z\|_1$).
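To make the noise-free (Basis Pursuit) formulation above concrete, here is a hedged sketch that recasts it as a linear program and solves it with SciPy's linprog. The problem sizes, the random Gaussian sensing matrix, and the random sparse signal are arbitrary illustrative choices, not anything prescribed by the chapter:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n, k = 30, 100, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)                 # random Gaussian sensing matrix
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)  # k-sparse signal
y = A @ x                                                    # noise-free measurements

# Basis Pursuit as a linear program over v = [z, t]:
#   minimize sum(t)  subject to  A z = y,  -t <= z <= t,  t >= 0
c = np.concatenate([np.zeros(n), np.ones(n)])
A_eq = np.hstack([A, np.zeros((m, n))])
I = np.eye(n)
A_ub = np.vstack([np.hstack([I, -I]), np.hstack([-I, -I])])
b_ub = np.zeros(2 * n)
bounds = [(None, None)] * n + [(0, None)] * n
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y, bounds=bounds)

x_hat = res.x[:n]
print(np.linalg.norm(x_hat - x))   # typically close to 0: (near-)exact recovery
```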
Recovery Guarantees
Theorem 1.8 (Noise-Free Recovery): If A satisfies the RIP of order 2k with $\delta_{2k} < \sqrt{2} - 1$, the solution $\hat{x}$ to the noise-free l1 minimization problem obeys:
$$\|\hat{x} - x\|_2 \leq C_0 \frac{\sigma_k(x)_1}{\sqrt{k}}$$
Theorem 1.9 (Bounded Noise): Under similar RIP conditions and with noise $e$ where $\|e\|_2 \leq \epsilon$, the solution obeys:
$$\|\hat{x} - x\|_2 \leq C_0 \frac{\sigma_k(x)_1}{\sqrt{k}} + C_2 \epsilon$$
In these theorems:
C0 and C2 are constants that depend on the RIP constant δ2k.
e represents a noise vector corrupting the measurements.
Multiple Measurement Vector (MMV) Problem
This problem involves recovering a set of l sparse vectors {xi}i=1l that share a common support. These vectors form the columns of a matrix X which is row-sparse.
Uniqueness Condition (Theorem 1.15): A row-sparse matrix X is uniquely determined by Y=AX if:
$$|\mathrm{supp}(X)| < \frac{\mathrm{spark}(A) - 1 + \mathrm{rank}(X)}{2}$$
Y is the m×l matrix of measurements, with columns being the measurements yi for each signal xi.
lp,q Norms: Used for matrix recovery, defined as $\|X\|_{p,q} = \left(\sum_i \|x^i\|_p^q\right)^{1/q}$, where $x^i$ is the i-th row of the matrix X.
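A short NumPy sketch of the row-sparse MMV setup and the mixed norm as defined above; the matrix sizes and the choice p = 1, q = 2 are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
n, l, k = 50, 4, 3
X = np.zeros((n, l))
rows = rng.choice(n, k, replace=False)
X[rows] = rng.standard_normal((k, l))   # l signals sharing a common support: row-sparse X

def lpq_norm(X, p, q):
    """||X||_{p,q} = (sum_i ||x^i||_p^q)^(1/q), with x^i the i-th row of X."""
    row_norms = np.linalg.norm(X, ord=p, axis=1)
    return np.sum(row_norms ** q) ** (1.0 / q)

print(lpq_norm(X, p=1, q=2))
```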
Understand matrix properties in compressed sensing
The chapter introduces several properties of the sensing matrix A, each with different strengths, guarantees, and requirements. Organizing them side by side is the best way to see the relationships between them.
Here is an organization of the sensing matrix properties discussed in the chapter, from the most fundamental to the most powerful.
1. Spark
This is the most fundamental condition for guaranteeing a unique sparse solution.
Definition
The spark of a matrix A, denoted spark(A), is the smallest number of columns of A that are linearly dependent.
I find the definition on Wikipedia more precise: https://en.wikipedia.org/wiki/Spark_(mathematics)
The spark of an m×n matrix A is the smallest integer k, such that there exists a set of k columns of A that are linearly dependent
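Computing the spark exactly requires examining subsets of columns, which is combinatorial. Here is a brute-force sketch that only makes sense for small matrices; the example matrix is an arbitrary illustration:

```python
import numpy as np
from itertools import combinations

def spark(A, tol=1e-10):
    """Smallest number of columns of A that are linearly dependent (brute force)."""
    m, n = A.shape
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            if np.linalg.matrix_rank(A[:, cols], tol=tol) < k:  # dependent subset found
                return k
    return n + 1  # all n columns independent (only possible when n <= m)

A = np.array([[1.0, 0.0, 1.0, 2.0],
              [0.0, 1.0, 1.0, 1.0]])
print(spark(A))   # 3: any three columns of a 2 x 4 matrix are linearly dependent
```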
What it Guarantees
Theorem 1.1: For any vector y∈Rm, there is at most one signal x∈Σk such that y=Ax if and only if spark(A)>2k.
It provides the definitive “if and only if” condition for the unique recovery of exactly k-sparse signals in a noise-free setting.
Key Requirement
For unique recovery of any k-sparse signal, the condition is spark(A)>2k.
Requirement on m: It is easy to see that spark(A) ∈ [2, m+1]. Combined with spark(A) > 2k, this immediately implies a requirement on the number of measurements: m ≥ 2k.
2. Null Space Property (NSP)
The NSP is a more refined condition on the null space of A that is necessary for robust recovery.
Definition
A matrix A has the Null Space Property of order k if for a constant C>0, the following holds for all h∈N(A) and all index sets Λ with ∣Λ∣≤k:
$$\|h_\Lambda\|_2 \leq C \frac{\|h_{\Lambda^c}\|_1}{\sqrt{k}}$$
N(A) is the null space of matrix A, which is the set of all vectors z for which Az=0.
Λ is a subset of indices {1,2,...,n}.
Λc is the complement of Λ.
hΛ is a vector formed by keeping the entries of h indexed by Λ and setting all other entries to zero.
hΛc is a vector formed by keeping the entries of h indexed by Λc and setting all other entries to zero.
In other words, a matrix A satisfying the NSP of order k means that vectors in its null space are not overly concentrated on a small set of indices.
What it Guarantees
The NSP is presented as a necessary condition for any stable recovery algorithm.
Theorem 1.2: Let A:Rn→Rm denote a sensing matrix and Δ:Rm→Rn denote an arbitrary recovery algorithm. If the pair (A,Δ) satisfies
$$\|\Delta(Ax) - x\|_2 \leq C \frac{\sigma_k(x)_1}{\sqrt{k}}$$
then A satisfies the NSP of order 2k.
It shows that if any algorithm can robustly recover signals from compressive measurements, then the matrix A must satisfy the NSP of order 2k.
Practicality
Like the spark, it is generally hard to verify directly for a given matrix.
3. Restricted Isometry Property (RIP)
The RIP is the most powerful property discussed, forming the basis for many of the strongest theoretical guarantees in compressed sensing.
Definition
A matrix A satisfies the RIP of order k if there exists a δk∈(0,1) such that for all k-sparse signals x∈Σk:
$$(1 - \delta_k)\|x\|_2^2 \leq \|Ax\|_2^2 \leq (1 + \delta_k)\|x\|_2^2$$
δk is the “restricted isometry constant” of order k.
In other words, a matrix A satisfies the RIP of order k if it approximately preserves the Euclidean length of all k-sparse vectors.
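The RIP cannot be verified efficiently, but it is easy to illustrate empirically. The sketch below draws a random Gaussian matrix and checks how much it distorts the norms of many random k-sparse vectors; all sizes are arbitrary illustrative choices, and sampling random sparse vectors is only a sanity check, not a proof that the RIP holds:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 60, 200, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)   # random Gaussian sensing matrix

ratios = []
for _ in range(2000):
    x = np.zeros(n)
    idx = rng.choice(n, k, replace=False)      # random support of size k
    x[idx] = rng.standard_normal(k)
    ratios.append(np.linalg.norm(A @ x) ** 2 / np.linalg.norm(x) ** 2)

# ||Ax||_2^2 / ||x||_2^2 should stay within [1 - delta_k, 1 + delta_k]
print(min(ratios), max(ratios))                # empirically concentrated around 1
```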
What it Guarantees
The RIP is a sufficient condition for many different algorithms (like l1 minimization) to provide stable and robust recovery of sparse signals, even in the presence of noise.
Key Requirements & Properties
1. Necessity for Stability
C-stable definition: Let A:Rn→Rm denote a sensing matrix and Δ:Rm→Rn denote a recovery algorithm. We say that the pair (A,Δ) is C-stable if for any x∈Σk and any e∈Rm we have that
∥Δ(Ax+e)−x∥2≤C∥e∥2
This definition simply says that if we add a small amount of noise to the measurements, then the impact of this on the recovered signal should not be arbitrarily large.
Theorem 1.3: If a pair (A,Δ) is C-stable, then
$$\frac{1}{C}\|x\|_2 \leq \|Ax\|_2$$
for all x∈Σ2k.
Theorem 1.3 shows that RIP’s lower bound is necessary for any recovery algorithm to be stable against measurement noise.
2. Lower Bound on m
Theorem 1.4: Let A be an m×n matrix that satisfies the RIP of order 2k with constant $\delta_{2k} \in (0, \frac{1}{2}]$. Then
$$m \geq C k \log\left(\frac{n}{k}\right)$$
where C is a constant depending only on δ2k.
The restriction to δ2k≤1/2 is arbitrary and is made merely for convenience - minor modifications to the argument establish bounds for δ2k≤δmax for any δmax<1.
Theorem 1.4 provides a precise lower bound: m ≥ C k log(n/k), i.e., the number of measurements must be at least on the order of k log(n/k). This is necessary to satisfy the RIP and tells us the fundamental minimum number of measurements needed.
3. Relationship to NSP
Theorem 1.5: Suppose that A satisfies the RIP of order 2k with $\delta_{2k} < \sqrt{2} - 1$. Then A satisfies the NSP of order 2k with constant
$$C = \frac{\sqrt{2}\,\delta_{2k}}{1 - (1 + \sqrt{2})\delta_{2k}}$$
RIP is a stronger condition than NSP. Theorem 1.5 shows that if a matrix satisfies the RIP (with a sufficiently small constant), it automatically satisfies the NSP.
Practicality
It is computationally infeasible to verify the RIP for large matrices. However, its main power comes from the fact that random matrices can be proven to satisfy the RIP with very high probability when m is on the order of klog(n/k).
4. Coherence
Coherence is the most practical and easily computable property, though it often leads to stricter requirements.
Definition
The coherence μ(A) of a matrix A is the largest absolute inner product between any two distinct normalized columns ai,aj:
$$\mu(A) = \max_{1 \leq i < j \leq n} \frac{|\langle a_i, a_j \rangle|}{\|a_i\|_2 \|a_j\|_2}$$
ai and aj are columns of the matrix A.
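Since coherence is the one property here that is cheap to compute, a small NumPy sketch may help; the random matrix and its size are an arbitrary example:

```python
import numpy as np

def coherence(A):
    """Largest absolute inner product between any two distinct normalized columns of A."""
    A_norm = A / np.linalg.norm(A, axis=0)   # normalize each column
    G = np.abs(A_norm.T @ A_norm)            # Gram matrix of normalized columns
    np.fill_diagonal(G, 0.0)                 # ignore the <a_i, a_i> terms
    return G.max()

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 100))
print(coherence(A))   # a value in (0, 1]; smaller coherence is better for recovery
```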
What it Guarantees
It provides simple, checkable conditions for unique recovery and stability.
Key Requirements & Properties
1. Uniqueness Condition
Theorem 1.7: If
$$k < \frac{1}{2}\left(1 + \frac{1}{\mu(A)}\right)$$
then for each measurement vector y∈Rm there exists at most one signal x∈Σk such that y=Ax.
This uses coherence to provide a sufficient condition for uniqueness. This often requires k to be much smaller than what RIP would allow.
It is possible to show that the coherence of a matrix is always in the range $\mu(A) \in \left[\sqrt{\frac{n - m}{m(n - 1)}}, 1\right]$; the lower bound is known as the Welch bound.
Theorem 1.7, together with the Welch bound, provides an upper bound on the level of sparsity k that guarantees uniqueness using coherence: $k = O(\sqrt{m})$.
2. Relationship to Spark
Lemma 1.4: For any matrix A,
$$\mathrm{spark}(A) \geq 1 + \frac{1}{\mu(A)}$$
Coherence provides a lower bound on the spark.
3. Relationship to RIP
Lemma 1.5: If A has unit-norm columns and coherence μ=μ(A), then A satisfies the RIP of order k with δk=(k−1)μ for all k<1/μ.
Coherence can be used to establish a (sometimes loose) RIP constant for a matrix .
Practicality
Coherence is the only property listed that is easy to compute for any given matrix.
Summary Table
| Property | Definition | What it Guarantees | Key Requirement(s) on k or m | Practicality |
| --- | --- | --- | --- | --- |
| Spark | Smallest # of linearly dependent columns | Necessary and sufficient condition: uniqueness for k-sparse signals | spark(A) > 2k (implies m ≥ 2k) | Hard to compute |
| NSP | Null space vectors are not “sparse” | Necessary for any robust/stable recovery | Must hold for order 2k | Hard to verify directly |
| RIP | Near-isometry for sparse vectors | Sufficient for stable/robust recovery for many algorithms | m = O(k log(n/k)) | Hard to verify, but some random matrices highly likely satisfy it |
| Coherence (μ) | Max inner product between columns | Uniqueness and stability, but with stricter conditions | k may be small, typically O(1/μ) | Easy to compute |
Understand different error bounds of signal recovery via l1 minimization
Here is an organization of the signal recovery theorems for l1 minimization, grouped by the main assumption made about the sensing matrix A.
Group 1: Guarantees Based on the Restricted Isometry Property (RIP)
These theorems rely on the Restricted Isometry Property (RIP), a strong theoretical condition ensuring that the matrix A nearly preserves the length of sparse vectors.
1.1 The Ideal Case: Noise-Free Recovery
This is the baseline scenario where measurements are perfect.
Theorem: Theorem 1.8
Conditions: A satisfies the RIP of order 2k, and the measurements are exact (y = Ax).
Algorithm: Basis Pursuit ($\min \|z\|_1$ subject to $Az = y$).
Error Bound: The recovery error is bounded by how “compressible” the original signal x is:
$$\|\hat{x} - x\|_2 \leq C_0 \frac{\sigma_k(x)_1}{\sqrt{k}}$$
If the signal x is truly k-sparse, then its approximation error σk(x)1 is zero, and the recovery is exact (∣∣x^−x∣∣2=0).
1.2 The Realistic Case: Bounded Noise
This scenario assumes the measurement noise is contained within a known energy level.
Theorem: Theorem 1.9
Conditions: A satisfies RIP. The measurement noise e is bounded by $\|e\|_2 \leq \epsilon$.
Algorithm: A quadratically constrained version of Basis Pursuit ($\min \|z\|_1$ subject to $\|Az - y\|_2 \leq \epsilon$).
Error Bound: The error has two parts: one from the signal’s compressibility and one from the noise:
$$\|\hat{x} - x\|_2 \leq C_0 \frac{\sigma_k(x)_1}{\sqrt{k}} + C_2 \epsilon$$
1.3 A Specific Noise Model: Gaussian Noise
This is a very common scenario in practice. There are two main approaches.
Theorem: Corollary 1.1 (derived from Theorem 1.9)
Conditions: A satisfies RIP, and the noise is i.i.d. Gaussian.
Guarantee: The error is proportional to the square root of the number of measurements m and the noise standard deviation σ: $\|\hat{x} - x\|_2 \propto \sqrt{m}\,\sigma$.
Theorem: Corollary 1.2 (derived from the Dantzig Selector, Theorem 1.10)
Conditions: A has unit-norm columns, satisfies RIP, and the noise is i.i.d. Gaussian.
Guarantee: The error is proportional to the square root of $k \log n$ and the noise standard deviation σ: $\|\hat{x} - x\|_2 \propto \sqrt{k \log n}\,\sigma$. This is often a much tighter and more useful bound, as $k \log n$ is typically much smaller than m.
Group 2: Guarantees Based on Coherence
These theorems rely on coherence (μ), which measures the maximum correlation between any two columns of A. Coherence is easier to calculate than RIP but often leads to stricter conditions on the signal’s sparsity k.
Theorem: Theorem 1.11
Conditions: A has a certain coherence μ, and the sparsity k must be small relative to it (k < (1/μ + 1)/4).
Error Bound: Provides a worst-case error bound for bounded noise:
$$\|x - \hat{x}\|_2 \leq \frac{\|e\|_2 + \epsilon}{\sqrt{1 - \mu(4k - 1)}}$$
Theorem: Theorem 1.12 (using the LASSO algorithm)
Conditions: A has coherence μ, the sparsity k is small (k ≤ 1/(3μ)), and the noise is Gaussian.
Error Bound: In addition to an l2 error bound, this provides a powerful guarantee on recovering the correct sparse structure: with high probability, the recovered non-zero entries are a subset of the true non-zero entries (supp(x^)⊂supp(x)).
Group 3: Theorems on the Nature of the Error Bounds
This final group explains why the error bounds from the RIP-based theorems look the way they do.
The “Pessimistic” Result:
Theorem: Theorem 1.13
What it says: It is impossible for any recovery algorithm to provide a “simple” error bound of the form $\|\hat{x} - x\|_2 \leq C\sigma_k(x)_2$ unless you take a huge number of measurements (m ≈ n). This explains why the theorems use the more complicated $\sigma_k(x)_1/\sqrt{k}$ term.
The “Optimistic” Result (with Randomness):
Theorem: Theorem 1.14
What it says: This result shows a way around the limitation of Theorem 1.13. If you use a randomized sensing matrix A, you can achieve the “simple” error bound ∣∣x^−x∣∣2≤Cσk(x)2 with high probability, while still using very few measurements. This highlights a key advantage of using random matrices in compressed sensing.
Deterministic or Probabilistic Guarantees
Key Recovery Formulations
The following theorems provide guarantees for solutions to sparse recovery problems. These problems are typically formulated as optimizations, with the goal of finding a signal x that is both sparse and consistent with the measurements y. The chapter focuses on the following formulations.
The Ideal l0 Minimization (Computationally Intractable)
This is the direct statement of finding the sparsest solution, but it is NP-hard and cannot be solved efficiently for large problems.
$$\hat{x} = \arg\min_{z} \|z\|_0 \quad \text{subject to} \quad z \in B(y) \qquad (1.10)$$
The Constrained l1 Minimization (Basis Pursuit-style)
This is the practical, convex relaxation of the l0 problem and is the focus of most of the theorems. The constraint set B(y) changes based on the assumptions about noise.
$$\hat{x} = \arg\min_{z} \|z\|_1 \quad \text{subject to} \quad z \in B(y) \qquad (1.12)$$
The Unconstrained l1 Minimization (LASSO)
This is an alternative, unconstrained formulation that is equivalent to the constrained version for a specific choice of the regularization parameter λ .
$$\hat{x} = \arg\min_{z} \frac{1}{2}\|Az - y\|_2^2 + \lambda\|z\|_1 \qquad (1.15)$$
Note that for some choice of the parameter λ, the optimization problem (1.15) will yield the same result as the constrained version of the problem given by (1.12) with $B(y) = \{z : \|Az - y\|_2 \leq \epsilon\}$.
However, in general the value of λ which makes these problems equivalent is unknown a priori.
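The unconstrained form (1.15) is what many practical solvers target. As an illustration (not the chapter's algorithm), here is a minimal iterative soft-thresholding (ISTA) sketch for the LASSO; the step size rule, the value of λ, and the problem sizes are arbitrary choices under the stated assumptions:

```python
import numpy as np

def ista(A, y, lam, n_iter=500):
    """Minimize 0.5*||Az - y||_2^2 + lam*||z||_1 by iterative soft-thresholding."""
    step = 1.0 / np.linalg.norm(A, ord=2) ** 2      # 1/L with L the largest squared singular value
    z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ z - y)                    # gradient of the data-fidelity term
        w = z - step * grad
        z = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)   # soft-thresholding
    return z

rng = np.random.default_rng(0)
m, n, k = 40, 120, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x + 0.01 * rng.standard_normal(m)           # noisy measurements
x_hat = ista(A, y, lam=0.01)
print(np.linalg.norm(x_hat - x))                    # typically a small reconstruction error
```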
Types of Guarantees
The chapter presents two main flavors of guarantees:
Deterministic Guarantees: These are absolute “if-then” statements. If you have a matrix A with a specific property (like RIP), then a certain error bound is guaranteed to hold. The challenge is that deterministically constructing a matrix A with these properties for large systems can be very difficult.
Probabilistic Guarantees: These state that an error bound will hold with high probability, where the probability comes from the random choice of the matrix A or the random nature of the noise. The most common approach in compressed sensing is to choose the matrix A randomly. The guarantee then is that with very high probability, the chosen matrix will have the desired properties (like RIP) and the recovery will be successful.
Group 1: Guarantees Based on the Restricted Isometry Property (RIP)
These theorems rely on the RIP, a strong theoretical condition ensuring the matrix A nearly preserves the length of sparse vectors.
1.1 The Ideal Case: Noise-Free Recovery
This is the baseline scenario where measurements are perfect.
Theorem 1.8
Suppose that A satisfies the RIP of order 2k with $\delta_{2k} < \sqrt{2} - 1$ and we obtain measurements of the form y = Ax. Then when $B(y) = \{z : Az = y\}$, the solution $\hat{x}$ to (1.12) obeys
$$\|\hat{x} - x\|_2 \leq C_0 \frac{\sigma_k(x)_1}{\sqrt{k}}$$
where
$$C_0 = 2\,\frac{1 - (1 - \sqrt{2})\delta_{2k}}{1 - (1 + \sqrt{2})\delta_{2k}}$$
Type of Guarantee: Deterministic.
Conditions: A satisfies the RIP of order 2k, and the measurements are exact (y = Ax).
Error Bound
$$\|\hat{x} - x\|_2 \leq C_0 \frac{\sigma_k(x)_1}{\sqrt{k}}$$
If the signal x is truly k-sparse, then its approximation error σk(x)1 is zero, and the recovery is exact (∣∣x^−x∣∣2=0).
1.2 The Realistic Case: Bounded Noise
This scenario assumes the measurement noise is contained within a known energy level.
Theorem 1.9
Suppose that A satisfies the RIP of order 2k with $\delta_{2k} < \sqrt{2} - 1$ and let y = Ax + e where $\|e\|_2 \leq \epsilon$. Then when $B(y) = \{z : \|Az - y\|_2 \leq \epsilon\}$ the solution $\hat{x}$ to (1.12) obeys
Conditions: A satisfies RIP. The measurement noise e is bounded by $\|e\|_2 \leq \epsilon$.
Error Bound
The error has two parts: one from the signal’s compressibility and one from the noise.
$$\|\hat{x} - x\|_2 \leq C_0 \frac{\sigma_k(x)_1}{\sqrt{k}} + C_2 \epsilon$$
Theorem 1.10
Suppose that A satisfies the RIP of order 2k with $\delta_{2k} < \sqrt{2} - 1$ and we obtain measurements of the form y = Ax + e where $\|A^T e\|_\infty \leq \lambda$. Then when $B(y) = \{z : \|A^T(Az - y)\|_\infty \leq \lambda\}$ the solution $\hat{x}$ to (1.12) obeys
Conditions: A satisfies RIP. The noise is bounded such that $\|A^T e\|_\infty \leq \lambda$.
Algorithm: The Dantzig Selector (Eq. 1.12 with $B(y) = \{z : \|A^T(Az - y)\|_\infty \leq \lambda\}$).
Error Bound: $\|\hat{x} - x\|_2 \leq C_0 \frac{\sigma_k(x)_1}{\sqrt{k}} + C_3 \sqrt{k}\,\lambda$
1.3 A Specific Noise Model: Gaussian Noise
This is a very common scenario in practice. There are two main approaches.
Corollary 1.1 (derived from Theorem 1.9)
Suppose that A satisfies the RIP of order 2k with $\delta_{2k} < \sqrt{2} - 1$. Furthermore, suppose that $x \in \Sigma_k$ and that we obtain measurements of the form y = Ax + e where the entries of e are i.i.d. $N(0, \sigma^2)$. Then when $B(y) = \{z : \|Az - y\|_2 \leq 2\sqrt{m}\,\sigma\}$, the solution $\hat{x}$ to (1.12) obeys
$$\|\hat{x} - x\|_2 \leq 8\,\frac{\sqrt{1 + \delta_{2k}}}{1 - (1 + \sqrt{2})\delta_{2k}}\,\sqrt{m}\,\sigma$$
with probability at least 1−exp(−c0m), where c0>0 is a constant.
Type of Guarantee: Probabilistic (over the random draw of the noise).
Conditions: A satisfies RIP, noise is i.i.d. Gaussian, and x is k-sparse.
Error Bound
The error is proportional to the square root of the number of measurements m and the noise standard deviation σ:
$$\|\hat{x} - x\|_2 \leq 8\,\frac{\sqrt{1 + \delta_{2k}}}{1 - (1 + \sqrt{2})\delta_{2k}}\,\sqrt{m}\,\sigma$$
Corollary 1.2 (derived from the Dantzig Selector, Theorem 1.10)
Suppose that A has unit-norm columns and satisfies the RIP of order 2k with $\delta_{2k} < \sqrt{2} - 1$. Furthermore, suppose that $x \in \Sigma_k$ and that we obtain measurements of the form y = Ax + e where the entries of e are i.i.d. $N(0, \sigma^2)$. Then when $B(y) = \{z : \|A^T(Az - y)\|_\infty \leq 2\sqrt{\log n}\,\sigma\}$, the solution $\hat{x}$ to (1.12) obeys
$$\|\hat{x} - x\|_2 \leq 4\sqrt{2}\,\frac{\sqrt{1 + \delta_{2k}}}{1 - (1 + \sqrt{2})\delta_{2k}}\,\sqrt{k \log n}\,\sigma$$
with probability at least $1 - \frac{1}{n}$.
Type of Guarantee: Probabilistic (over the random draw of the noise).
Conditions: A satisfies RIP, has unit-norm columns, noise is i.i.d. Gaussian, and x is k-sparse.
Error Bound
The error is proportional to the square root of $k \log n$ and the noise standard deviation σ: $\|\hat{x} - x\|_2 \propto \sqrt{k \log n}\,\sigma$. This might be a more useful bound, as $k \log n$ may be much smaller than m.
$$\|\hat{x} - x\|_2 \leq 4\sqrt{2}\,\frac{\sqrt{1 + \delta_{2k}}}{1 - (1 + \sqrt{2})\delta_{2k}}\,\sqrt{k \log n}\,\sigma$$
Group 2: Guarantees Based on Coherence
These theorems rely on coherence (μ), which measures the maximum correlation between any two columns of A. Coherence is easier to calculate than RIP but often leads to stricter conditions on the signal’s sparsity k (the signal should be sparser).
Theorem 1.11
Suppose that A has coherence μ and that x∈Σk with k<(1/μ+1)/4. Furthermore, suppose that we obtain measurements of the form y=Ax+e. Then when B(y)={z:∣∣Az−y∣∣2≤ϵ}, the solution x^ to (1.12) obeys
$$\|x - \hat{x}\|_2 \leq \frac{\|e\|_2 + \epsilon}{\sqrt{1 - \mu(4k - 1)}}$$
Type of Guarantee: Deterministic.
Conditions: k is small (k < (1/μ + 1)/4), and x is k-sparse.
Error Bound
$$\|x - \hat{x}\|_2 \leq \frac{\|e\|_2 + \epsilon}{\sqrt{1 - \mu(4k - 1)}}$$
This provides a worst-case error bound for bounded noise; it will typically overestimate the actual error.
Theorem 1.12
Suppose that A has coherence μ and that x∈Σk with k≤1/(3μ). Furthermore, suppose that we obtain measurements of the form y=Ax+e where the entries of e are i.i.d. N(0,σ2). Set
$$\lambda = \sqrt{8\sigma^2(1 + \alpha)\log(n - k)}$$
for some fairly small value α>0. Then with probability exceeding
$$\left(1 - \frac{1}{(n - k)^{\alpha}}\right)\left(1 - \exp(-k/7)\right)$$
the solution x^ to (1.15) is unique, supp(x^)⊂supp(x) and has the following error bound.
Type of Guarantee: Probabilistic (over the random draw of the noise).
Conditions: k is small (k ≤ 1/(3μ)), noise is i.i.d. Gaussian, and x is k-sparse.
Algorithm: LASSO (Eq. 1.15).
Error Bound
$$\|\hat{x} - x\|_2^2 \leq \left(\sqrt{3} + 3\sqrt{2(1 + \alpha)\log(n - k)}\right)^2 k\sigma^2$$
In addition to an l2 error bound, this provides a powerful guarantee on recovering the correct sparse structure: with high probability, the recovered non-zero entries are a subset of the true non-zero entries (supp(x^)⊂supp(x)).
Group 3: Theorems on the Nature of the Error Bounds
This final group explains why the error bounds from the RIP-based theorems look the way they do.
Theorem 1.13 (The “Pessimistic” Result)
Suppose that A is an m×n matrix and that Δ:Rm→Rn is a recovery algorithm that satisfies
$$\|x - \Delta(Ax)\|_2 \leq C\,\sigma_k(x)_2$$
for some k ≥ 1, then $m > \left(1 - \sqrt{1 - 1/C^2}\right)n$.
Type of Guarantee: Deterministic.
What it says: It is impossible for any algorithm with a deterministic matrix A to satisfy the “simple” error bound $\|\hat{x} - x\|_2 \leq C\sigma_k(x)_2$ unless you take a huge number of measurements (m ≈ n). This explains why the theorems use the $\sigma_k(x)_1/\sqrt{k}$ term.
Theorem 1.14 (The “Optimistic” Result with Randomness)
Let $x \in \mathbb{R}^n$ be fixed. Set $\delta_{2k} < \sqrt{2} - 1$. Suppose that A is an m×n sub-gaussian random matrix with $m = O(k \log(n/k)/\delta_{2k}^2)$. Suppose we obtain measurements of the form y = Ax. Set $\epsilon = 2\sigma_k(x)_2$. Then with probability exceeding $1 - 2\exp(-c_1 \delta_{2k}^2 m) - \exp(-c_0 m)$, when $B(y) = \{z : \|Az - y\|_2 \leq \epsilon\}$, the solution $\hat{x}$ to (1.12) obeys the following bound.
Type of Guarantee: Probabilistic (over the random choice of the matrix A).
What it says: If you use a random sensing matrix A, you can achieve the “simple” error bound with high probability, using far fewer measurements.
An “instance-optimal” guarantee means the quality of the recovery error bound adapts to the specific signal instance x that you are measuring. Instead of a single worst-case error for all signals, the error bound is a function of the properties of that particular signal, such as how compressible it is (e.g. measured by σk(x)1 or σk(x)2).
In deterministic cases, examples are Theorem 1.8, Theorem 1.9, Theorem 1.10.
This concept is then combined with probabilistic guarantees in two different ways.
The Two “Flavors” of Probabilistic Guarantees
The distinction comes down to whether one “good” random matrix A works for all signals, or if the guarantee only applies to a specific signal-matrix pair.
1. The Stronger Guarantee: A “Universal” Random Matrix
This is the typical approach.
The Process: You generate a random matrix A once.
The Guarantee: With very high probability, this single matrix A you generated is “good.” A “good” matrix is one that satisfies a deterministic guarantee (like the RIP-based theorems) for all possible signals x.
2. The Weaker Guarantee: “Instance-Optimal in Probability”
This is what Theorem 1.14 provides.
The Process: You are first given a specific signal instance x. Then, you generate a random matrix A to measure it.
The Guarantee: With high probability, the recovery of that specific signal x using that specific matrix A will be successful. The guarantee is probabilistic and only applies to the instance you started with. It doesn’t promise anything about how that same matrix A would perform on a different signal.
In summary, “instance-optimal in probability” is the weaker type of probabilistic guarantee where the success probability is tied to a specific signal instance, and you might need to draw a new random matrix for each new signal.
Non Instance-Optimal Guarantees
Examples are Corollaries 1.1, 1.2 and Theorems 1.11, 1.12.
These guarantees provide a single, uniform error bound that applies to the entire class of k-sparse signals. As long as x is k-sparse, the error bound is the same regardless of which specific k-sparse signal it is. The bound depends on parameters like m, n, k, and the noise level, but not on the structure of x beyond its k-sparsity.
I only want Tab to accept VS Code's code suggestions/completions, and Enter to be used solely for inserting a new line.
When editing Markdown or LaTeX, the suggestion popup constantly interferes with normal typing, because pressing Enter accepts the suggested word, which is not what I want at all. So I looked up how to accept suggestions only with Tab.
Settings > Text Editor > Suggestions > Accept Suggestions On Enter > select on, off or smart (set it to off so that Enter never accepts a suggestion). https://stackoverflow.com/a/72427355
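If you prefer editing settings.json directly, the same option can be set there; to my knowledge this is the corresponding setting ID (a minimal sketch of the relevant entry, not a full settings file):

```json
{
  // Only Tab accepts suggestions; Enter always inserts a newline.
  "editor.acceptSuggestionOnEnter": "off"
}
```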