You can edit almost every page by Creating an account and confirming your email.

Convolutional Sparse Coding: Difference between revisions

From EverybodyWiki Bios & Wiki
WikiMasterBot2 (talk | contribs)
m WikiMasterBot2 moved page Convolutional Sparse Coding to Kept on Wikipedia:Convolutional Sparse Coding without leaving a redirect: move to NS Kept on Wikipedia
WikiMasterBot2 (talk | contribs)
m Restauration de la révision ID 505297 (plus long contenu)
Line 1: Line 1:
{{Kept on Wikipedia}}
{{AFC submission|d|not|u=Renanar2|ns=118|decliner=AngusWOOF|declinets=20190206192059|reason2=context|ts=20181221024535}} <!-- Do not remove this line! -->
 
{{AFC comment|1=This reads like a research paper summary. Wikipedia is not for scientific journal postings [[WP:NOTJOURNAL]].  Please add context as to how this paradigm was founded and researched. There is no history on any of that as it stands. [[User:AngusWOOF|<strong><span style="color: #606060;">AngusWOOF</span></strong>]] ([[User talk:AngusWOOF#top|<span style=" color: #663300;">bark</span>]] • [[Special:Contributions/AngusWOOF|<span style="color: #006600;">sniff</span>]]) 19:20, 6 February 2019 (UTC)}}
 
----
 
The '''Convolutional Sparse Coding''' paradigm is an extension of the global Sparse Coding model, in which a redundant dictionary is modeled as a concatenation of circulant matrices. While the global sparsity constraint describes signal <math display="inline">\mathbf{x}\in \mathbb{R}^{N}</math> as a linear combination of a few atoms in the redundant dictionary <math display="inline">\mathbf{D}\in\mathbb{R}^{N\times M}, M\gg N</math>, usually expressed as <math display="inline">\mathbf{x}=\mathbf{D}\mathbf{\Gamma}</math> for a sparse vector <math display="inline">\mathbf{\Gamma}\in \mathbb{R}^{M}</math>, the alternative dictionary structure adopted by the Convolutional Sparse Coding model allows the sparsity prior to be applied '''locally''' instead of globally: independent patches of <math display="inline">\mathbf{x}</math> are generated by "local" dictionaries operating over stripes of <math display="inline">\mathbf{\Gamma}</math>.
 
The local sparsity constraint allows stronger uniqueness and stability conditions than the global sparsity prior, and has shown to be a versatile tool for inverse problems in fields such as ''Image Understanding'' and ''Computer Vision''. Also, a recently proposed multi-layer extension of the model has shown conceptual benefits for more complex signal decompositions, as well as a tight connection the ''Convolutional Neural Networks'' model, allowing a deeper understanding of how the latter operates.
 
= Overview =
 
Given a signal of interest <math display="inline">\mathbf{x}\in \mathbb{R}^{N}</math> and a redundant ''dictionary'' <math display="inline">\mathbf{D}\in\mathbb{R}^{N\times M}, M\gg N</math>, the ''Sparse Coding'' problem consist of retrieving a sparse vector <math display="inline">\mathbf{\Gamma}\in \mathbb{R}^{M}</math>, denominated the ''Sparse representation'' of <math display="inline">\mathbf{x}</math>, such that <math display="inline">\mathbf{x}= \mathbf{D}\mathbf{\Gamma}</math>. Intuitively, this implies <math display="inline">\mathbf{x}</math> is expressed as a linear combination of a small number of elements in <math display="inline">\mathbf{D}</math>. The global sparsity constraint prior has been shown to be useful in many ill-posed inverse problems such as ''image inpainting'', ''super-resolution'', and ''coding''.<ref>{{cite journal |last1=Jianchao Yang |last2=Wright |first2=John |last3=Huang |first3=Thomas S |last4=Yi Ma |title=Image Super-Resolution Via Sparse Representation |journal=IEEE Transactions on Image Processing |date=November 2010 |volume=19 |issue=11 |pages=2861–2873 |doi=10.1109/TIP.2010.2050625|pmid=20483687 |bibcode=2010ITIP...19.2861Y }}</ref><ref>{{cite journal |last1=Wetzstein |first1=Gordon |last2=Heidrich |first2=Wolfgang |last3=Heide |first3=Felix |title=Fast and Flexible Convolutional Sparse Coding |date=2015 |pages=5135–5143 |url=https://www.cv-foundation.org/openaccess/content_cvpr_2015/html/Heide_Fast_and_Flexible_2015_CVPR_paper.html}}</ref><ref>{{cite journal |last1=Wohlberg |first1=Brendt |title=SPORCO: A Python package for standard and convolutional sparse representations |journal=Proceedings of the 16th Python in Science Conference |date=2017 |pages=1–8 |doi=10.25080/shinma-7f4c6e7-001 |url=http://conference.scipy.org/proceedings/scipy2017/brendt_wohlberg.html}}</ref>. It has been of particular interest for ''image understanding'' and ''computer vision'' tasks involving natural images, allowing redundant dictionaries to be efficiently inferred <ref>{{cite journal |last1=Mairal |first1=Julien |last2=Bach |first2=Francis |last3=Ponce |first3=Jean |last4=Sapiro |first4=Guillermo |title=Online Dictionary Learning for Sparse Coding |journal=Proceedings of the 26th Annual International Conference on Machine Learning |date=2009 |pages=689–696 |doi=10.1145/1553374.1553463 |url=https://dl.acm.org/citation.cfm?id=1553463 |publisher=ACM|isbn=9781605585161 }}</ref><ref name="papyan_2017_working">{{cite journal |last1=Papyan |first1=Vardan |last2=Sulam |first2=Jeremias |last3=Elad |first3=Michael |title=Working Locally Thinking Globally: Theoretical Guarantees for Convolutional Sparse Coding |journal=IEEE Transactions on Signal Processing |date=1 November 2017 |volume=65 |issue=21 |pages=5687–5701 |doi=10.1109/TSP.2017.2733447|bibcode=2017ITSP...65.5687P |arxiv=1707.06066 }}</ref><ref name="wohlberg_2016_convolutional">{{cite journal |last1=Wohlberg |first1=Brendt |title=Convolutional sparse representation of color images |journal=2016 IEEE Southwest Symposium on Image Analysis and Interpretation (SSIAI) |pages=57–60 |date=6-8 March 2016 |doi=10.1109/SSIAI.2016.7459174 |url=https://ieeexplore.ieee.org/document/7459174|isbn=978-1-4673-9919-7 }}</ref>
 
As an extension to the global sparsity constraint, recent pieces in the literature have revisited the model to reach a more profound understanding of its uniqueness and stability conditions.<ref name="wohlberg_2016_convolutional"/>. Interestingly, by imposing a local sparsity prior in <math display="inline">\mathbf{\Gamma}</math>, meaning that its independent patches can be interpreted as sparse vectors themselves, the structure in <math display="inline">\mathbf{D}</math> can be understood as a “local" dictionary operating over each independent patch. This model extension is denominated ''Convolutional Sparse Coding'' (CSC) and drastically reduces the burden of estimating signal representations while being characterized by stronger uniqueness and stability conditions. Furthermore, it allows for <math display="inline">\mathbf{\Gamma}</math> to be efficiently estimated via projected gradient descent algorithms such as ''Orthonormal Matching Pursuit'' and ''Basis Pursuit'', while performing in a local fashion<ref name="papyan_2017_working" />
 
Besides its versatility in inverse problems, recent efforts have focused on the multi-layer version of the model and provided evidence of its reliability for recovering multiple underlying representations<ref name="papyan_2017_convolutional">{{cite journal |last1=Papyan |first1=Vardan |last2=Romano |first2=Yaniv |last3=Elad |first3=Michael |title=Convolutional Neural Networks Analyzed via Convolutional Sparse Coding |journal=J. Mach. Learn. Res. |date=2017 |volume=18 |issue=1 |pages=2887–2938 |url=http://dl.acm.org/citation.cfm?id=3122009.3176827 |issn=1532-4435|bibcode=2016arXiv160708194P |arxiv=1607.08194 }}</ref>. Moreover, a tight connection between such a model and the well-established ''Convolutional Neural Network'' model (CNN) was revealed, providing a new tool for a more rigurous understanding of its theoretical conditions.
 
The convolutional sparse coding model provides a very efficient set of tools to solve a wide range of inverse problems, including image denoising, image inpainting, and image superresolution. By imposing local sparsity constraints, it allows to efficiently tackle the global coding problem by iteratively estimating disjoint patches and assembling them into a global signal. Furthermore, by adopting a multi-layer sparse model, which results from imposing the sparsity constraint to the signal inherent representations themselves, the resulting “Layered" Pursuit algorithm keeps the strong uniqueness and stability conditions from the single-layer model. This extension also provides some interesting notions about the relation between its sparsity prior and the forward pass of the ''Convolutional Neural Network,'' which allows to understand how the theoretical benefits of the CSC model can provide a strong mathematical meaning of the CNN structure.
 
= Sparse Coding Paradigm =
 
Basic concepts and models are presented to explain into detail the ''Convolutional Sparse Representation'' framework. On the grounds that the sparsity constraint has been proposed under different models, a short description of them is presented to show its evolution up to the model of interest. Also included are the concepts of ''Mutual Coherence'' and ''Restricted Isometry Property'' to establish uniqueness stability guarantees.
 
== Global Sparse Coding Model ==
 
Allow signal <math display="inline">\mathbf{x}\in \mathbb{R}^{N}</math> to be expressed as a linear combination of a small number of atoms from a given dictionary <math display="inline">\mathbf{D}\in \mathbb{R}^{N \times M}, M>N</math>. Alternatively, the signal can be expressed as <math display="inline">\mathbf{x}=\mathbf{D}\mathbf{\Gamma}</math>, where <math display="inline">\mathbf{\Gamma}\in \mathbb{R}^{M}</math> corresponds to the ''sparse representation'' of <math display="inline">\mathbf{x}</math>, which selects the atoms to combine and their weights. Subsequently, given <math display="inline">\mathbf{D}</math>, the task of recovering <math display="inline">\mathbf{\Gamma}</math> from either the noise-free signal itself or an observation is denominated ''sparse coding''. Considering the noise-free scenario, the coding problem is formulated as follows: <math display="block">\begin{aligned}
      \hat{\mathbf{\Gamma}}_{\text{ideal}}&= \underset{\mathbf{\Gamma}}{\text{argmin}}\; \| \mathbf{\Gamma}\|_{0}\; \text{s.t.}\; \mathbf{D}\mathbf{\Gamma}=\mathbf{x}.\end{aligned}</math> The effect of the <math display="inline">\ell_{0}</math> norm is to favor solutions with as much zero elements as possible. Furthermore, given an observation affected by bounded energy noise: <math display="inline">\mathbf{Y}= \mathbf{D}\mathbf{\Gamma}+ \mathbf{E},\|\mathbf{E}\|_{2}<\epsilon</math>, the pursuit problem is reformulated as: <math display="block">\begin{aligned}
      \hat{\mathbf{\Gamma}}_{\text{noise}}&= \underset{\mathbf{\Gamma}}{\text{argmin}}\; \| \mathbf{\Gamma}\|_{0}\; \text{s.t.}\; \|\mathbf{D}\mathbf{\Gamma}-\mathbf{Y}\|_{2}<\epsilon.\end{aligned}</math>
 
=== Stability and Uniqueness Guarantees for the Global Sparse Model ===
 
Let the ''spark'' of <math display="inline">\mathbf{\mathbf{D}}</math> be defined as the minimum number of linearly independent columns: <math display="block">\begin{aligned}
      \sigma(\mathbf{D})=\underset{\mathbf{\Gamma}}{\text{min}} \quad \|\mathbf{\Gamma}\|_{0} \quad \text{s.t.}\quad \mathbf{D \Gamma}=0, \quad \mathbf{\Gamma}\neq 0.\end{aligned}</math>
 
Then, from the ''triangular inequality'', the sparsest vector <math display="inline">\mathbf{\Gamma}</math> satisfies: <math display="inline">\|\mathbf{\Gamma}\|_{0}<\frac{\sigma(\mathbf{D})}{2}</math>. Although the ''spark'' provides an upper bound, it is unfeasible to compute in practical scenarios. Instead, let the ''mutual coherence'' be a measure of similarity between atoms in <math display="inline">\mathbf{D}</math>. Assuming <math display="inline">\ell_{2}</math>-norm unit atoms, the ''mutual coherence'' of <math display="inline">\mathbf{D}</math> is defined as: <math display="inline">\mu(\mathbf{D})=\underset{i\neq j}{\text{max}}\|\mathbf{d_{i}^{T}}\mathbf{d_{j}}\|_{2}</math>, where <math display="inline">\mathbf{d}_{i}</math> are atoms. Based on this metric, it can be proven that the true sparse representation <math display="inline">\mathbf{\Gamma}^{*}</math> can be recovered if and only if <math display="inline">\|\mathbf{\Gamma}^{*}\|_{0}< \frac{1}{2}\big(1+\frac{1}{\mu(\mathbf{D})} \big)</math>.
 
Similarly, under the presence of noise, an upper bound for the distance between the true sparse representation <math display="inline">\mathbf{\Gamma^{*}}</math> and its estimation <math display="inline">\hat{\mathbf{\Gamma}}</math> can be established via the ''Restricted Isometry Property'' (''RIP''). A ''k-RIP'' matrix <math display="inline">\mathbf{D}</math> with constant <math display="inline">\delta_{k}</math> corresponds to: <math display="inline">(1-\delta_{k})\|\mathbf{\Gamma}\|_{2}^{2}\leq \|\mathbf{D\Gamma}\|_{2}^{2}\leq (1+\delta_{k})\|\mathbf{\Gamma}\|_{2}^{2}</math>, where <math display="inline">\delta_{k}</math> is the smallest number that satisfies the inequality for every <math display="inline">\|\mathbf{\Gamma}\|_{0}=k</math>. Then, assuming <math display="inline">\|\mathbf{\Gamma}\|_{0}<\frac{1}{2}\big(1+\frac{1}{\mu(\mathbf{D})} \big)</math>, it is guaranteed that <math display="inline">\|\mathbf{\hat{\Gamma}-\Gamma^{*}}\|_{2}^{2}\leq \frac{4\epsilon^{2}}{1-\mu(\mathbf{D})(2\|\mathbf{\Gamma}\|_{0}-1)}</math>.
 
Solving such a general pursuit problem is a hard task if no structure is imposed on dictionary <math display="inline">\mathbf{D}</math>. This implies learning large, highly overcomplete representations, which is extremely expensive. Assuming such a burden has been met and a representative dictionary has been obtained for a given signal <math display="inline">\mathbf{x}</math>, typically based on prior information, <math display="inline">\mathbf{\Gamma}^{*}</math> can be estimated via several pursuit algorithms.
 
=== Pursuit Algorithms for the Global Sparse Model ===
 
Two basic methods for solving the ''Global Sparse Coding'' problem are ''Orthogonal Matching Pursuit'' (''OMP'') and ''Basis Pursuit'' (''BP'') . ''OMP'' is a greedy algorithm that iteratively selects the atom best correlated with the residual between <math display="inline">\mathbf{x}</math> and a current estimation, followed by a projection onto a subset of pre-selected atoms. On the other hand, ''BP'' is a more sophisticated approach that replaces the original coding problem by a linear programming problem. Based on this algorithms, the ''Global Sparse Coding'' provides considerably loose bounds for the uniqueness and stability of <math display="inline">\hat{\mathbf{\Gamma}}</math>. To overcome this, additional priors are imposed over <math display="inline">\mathbf{D}</math> to guarantee tighter bounds and uniqueness conditions. The reader is referred to (<ref name="papyan_2017_working" />, section 2) for details regarding this properties.
 
== Convolutional Sparse Coding Model ==
 
A ''local'' prior is adopted such that each overlapping section of <math display="inline">\mathbf{\Gamma}</math> is sparse. Let <math display="inline">\mathbf{D}\in \mathbb{R}^{N \times Nm}</math> be constructed from shifted versions of a local dictionary <math display="inline">\mathbf{D_{L}}\in\mathbb{R}^{n \times m}, m\ll M</math>. Then, <math display="inline">\mathbf{x}</math> is formed by products between <math display="inline">\mathbf{D_{L}}</math> and local patches of <math display="inline">\mathbf{\Gamma}\in\mathbb{R}^{mN}</math>.
 
[[File:Structure of the Convolutional Sparse Coding Paradigm.svg|thumb|The global dictionary is expressed in terms of a stride convolutional matrix. So, signals can be generated in terms of stripes of the sparse representation multiplies by a shift-invariant local dictionary.]]
 
From the latter, <math display="inline">\mathbf{\Gamma}</math> can be re-expressed by <math display="inline">N</math> disjoint sparse vectors <math display="inline">\alpha_{i}\in \mathbb{R}^{m}</math>: <math display="inline">\mathbf{\Gamma}\in \{\alpha_{1},\alpha_{2},\dots, \alpha_{N}\}^{T}</math>. Similarly, let <math display="inline">\gamma</math> be a set of <math display="inline">(2n-1)</math> consecutive vectors <math display="inline">\alpha_{i}</math>. Then, each disjoint segment in <math display="inline">\mathbf{x}</math> is expressed as: <math display="inline">\mathbf{x}_{i}=\mathbf{R}_{i}\mathbf{D}\mathbf{\Gamma}</math>, where operator <math display="inline">\mathbf{R}_{i}\in \mathbb{R}^{n\times N}</math> extracts overlapping patches of size <math display="inline">n</math> starting at index <math display="inline">i</math>. Thus, <math display="inline">\mathbf{R}_{i}\mathbf{D}</math> contains only <math display="inline">(2n-1)m</math> nonzero columns. Hence, by introducing operator <math display="inline">\mathbf{S}_{i}\in \mathbf{R}^{(2n-1)m \times Nm}</math> which exclusively preserves them: <math display="block">\begin{aligned}
      \mathbf{x}_{i}&= \underset{\Omega}{\underbrace{\mathbf{R}_{i}\mathbf{D}\mathbf{S}_{i}^{T}}}\underset{\gamma_{i}}{\underbrace{(S_{i}\mathbf{\Gamma})}},\end{aligned}</math> where <math display="inline">\Omega</math> is known as the ''stripe dictionary'', which is independent of <math display="inline">i</math>, and <math display="inline">\gamma_{i}</math> is denominated the ''i-th stripe''. So, <math display="inline">\mathbf{x}</math> corresponds to a patch aggregation or ''convolutional interpretation'': <math display="block">\begin{aligned}
      \mathbf{x}&= \sum_{i=1}^{N}\mathbf{R}_{i}^{T}\mathbf{D}_{L}\alpha_{i}= \sum_{i=1}^{m}\mathbf{d}_{i}\ast \mathbf{z_{i}}.\end{aligned}</math> Where <math display="inline">\mathbf{d}_{i}</math> corresponds to the ''i-th'' atom from the local dictionary <math display="inline">\mathbf{D}_{L}</math> and <math display="inline">\mathbf{z_{i}}</math> is constructed by elements of patches <math display="inline">\alpha</math>: <math display="inline">\mathbf{z_{i}}\triangleq (\alpha_{1,i}, \alpha_{2,i},\dots, \alpha_{N,i})^{T}</math>. Given the new dictionary structure, let the <math display="inline">\ell_{0,\infty}</math> pseudo-norm be defined as: <math display="inline">\|\mathbf{\Gamma}\|_{0,\infty}\triangleq \underset{i}{\text{ max}}\; \|\gamma_{i}\|_{0}</math>. Then, for the noise-free and noise-corrupted scenarios, the problem can be respectively reformulated as: <math display="block">\begin{aligned}
      \hat{\mathbf{\Gamma}}_{\text{ideal}}&= \underset{\mathbf{\Gamma}}{\text{argmin}}\; \| \mathbf{\Gamma}\|_{0,\infty}\; \text{s.t.}\; \mathbf{D}\mathbf{\Gamma}=\mathbf{x},\\
      \hat{\mathbf{\Gamma}}_{\text{noise}}&= \underset{\mathbf{\Gamma}}{\text{argmin}}\; \| \mathbf{\Gamma}\|_{0,\infty}\; \text{s.t.}\; \|\mathbf{Y}-\mathbf{D}\mathbf{\Gamma}\|_{2}<\epsilon.\end{aligned}</math>
 
=== Stability and Uniqueness Guarantees for the Convolutional Sparse Model ===
 
For the local approach, <math display="inline">\mathbf{D}</math> mutual coherence satisfies: <math display="inline">\mu(\mathbf{D})\geq \big(\frac{m-1}{m(2n-1)-1}\big)^{1/2}.</math> So, if a solution obeys <math display="inline">\|\mathbf{\Gamma}\|_{0,\infty}< \frac{1}{2}\big(1+\frac{1}{\mu(\mathbf{D})}\big)</math>, then it is the sparsest solution to the <math display="inline">\ell_{0,\infty}</math> problem. Thus, under the local formulation, the same number of non-zeros is permitted for each stripe instead of the full vector!
 
Similar to the global model, the ''CSC'' is solved via ''OMP'' and ''BP'' methods, the latter contemplating the use of the ''Iterative Shrinkage Thresholding Algorithm'' (''ISTA'')<ref>{{cite journal |last1=Beck |first1=Amir |last2=Teboulle |first2=Marc |title=A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems |journal=SIAM Journal on Imaging Sciences |date=January 2009 |volume=2 |issue=1 |pages=183–202 |doi=10.1137/080716542 }}</ref>  for splitting the pursuit into smaller problems. Based on the <math display="inline">\ell_{0,\infty}</math> pseudonorm, if a solution <math display="inline">\mathbf{\Gamma}</math> exists satisfying <math display="inline">\|\mathbf{\Gamma}\|_{0,\infty}<\frac{1}{2}\big(1+\frac{1}{\mu(\mathbf{D})} \big)</math>, then both methods are guaranteed to recover it. Moreover, the local model guarantees recovery independently of the signal dimension, as opposed to the <math display="inline">\ell_{0}</math> prior. Stability conditions for ''OMP'' and ''BP'' are also guaranteed if its ''Exact Recovery Condition'' (''ERC'') is met for a support <math display="inline">\mathcal{T}</math> with a constant <math display="inline">\theta</math>. The ''ERC'' is defined as: <math display="inline">\theta= 1-\underset{i\notin \mathcal{T}}{\text{max}} \|\mathbf{D}_{\mathcal{T}}^{\dagger}\mathbf{d}_{i}\|_{1}>0</math>, where <math display="inline">\dagger</math> denotes the Pseudo-inverse. Algorithm 1 shows the Global Pursuit method based on ISTA.
 
'''Algorithm 1: 1D CSC via Local Iterative Soft-Thresholding.'''
 
''Input:''
 
<math display="inline">\mathbf{D}_{L}</math>: Local Dictionary,
 
<math display="inline">\mathbf{y}</math>: observation,
 
<math display="inline">\lambda</math>: Regularization parameter,
 
<math display="inline">c</math>: step size for <code>ISTA</code>,
 
<code>tol</code>: tolerance factor,
 
<code>maxiters</code>: maximum number of iterations.<br />
 
:::<math display="inline">\{\boldsymbol{\alpha}_{i}\}^{(0)}\gets \{\mathbf{0}_{N\times 1}\}</math> (Initialize disjoint patches.)
 
:::<math display="inline">\{\mathbf{r}_{i}\}^{(0)}\gets \{\mathbf{R}_{i}\mathbf{y}\}</math> (Initialize residual patches.)
 
:::<math display="inline">k\gets 0</math><br />
 
'''Repeat'''
 
:::<math display="inline">\{\boldsymbol{\alpha}_{i}\}^{(k)}\gets \mathcal{S}_{\frac{\lambda}{c}}\big( \{\boldsymbol{\alpha}_{i}\}^{(k-1)}+\frac{1}{c}\{\mathbf{D}_{L}^{T}\mathbf{r}_{i}\}^{(k-1)} \big)</math> (Coding along disjoint patches)
 
:::<math display="inline">\boldsymbol{\alpha}_{i}</math> <math display="inline">\hat{\mathbf{x}}^{(k)}\gets \sum_{i}\mathbf{R}_{i}^{T}\mathbf{D}_{L}\boldsymbol{\alpha}_{i}^{(k)}</math> (Patch Aggregation)
 
:::<math display="inline">\{\mathbf{r}_{i}\}^{(k)}\gets \mathbf{R}_{i}\big( \mathbf{y}-\hat{\mathbf{x}}^{(k)} \big)</math> (Update Residuals)
 
:::<math display="inline">k \gets k+ 1</math>
 
'''Until''' <math display="inline">\|\hat{\mathbf{x}}^{(k)}- \hat{\mathbf{x}}^{(k-1)}\|_{2}<</math> <code>tol</code> or <math display="inline">k></math> <code>maxiters</code>.
 
= Multi-Layered Convolutional Sparse Coding Model =
 
By imposing the sparsity prior in the inherent structure of <math display="inline">\mathbf{x}</math>, strong conditions for a unique representation and feasible methods for estimating it are granted. Similarly, such a constraint can be applied to its representation itself, generating a cascade of sparse representations: Each code is defined by a few atoms of a given set of convolutional dictionaries.
 
Based on this criteria, yet another extension denominated ''Multi-layer Convolutional Sparse Coding'' (''ML-CSC'') is proposed. A set of analytical dictionaries <math display="inline">\{\mathbf{D}\}_{k=1}^{K}</math> can be efficiently designed, where sparse representations at each layer <math display="inline">\{\mathbf{\Gamma}\}_{k=1}^{K}</math> are guaranteed by imposing the sparsity prior over the dictionaries themselves <ref name="papyan_2017_convolutional" />. In other words, by considering dictionaries to be ''stride convolutional matrices'' i.e. atoms of the local dictionaries shift <math display="inline">m</math> elements instead of a single one, where <math display="inline">m</math> corresponds to the number of channels in the previous layer, it is guaranteed that the <math display="inline">\|\mathbf{\Gamma}\|_{0,\infty}</math> norm of the representations along layers is bounded.
 
For example, given the dictionaries <math display="inline">\mathbf{D}_{1} \in \mathbb{R}^{N\times Nm_{1}}, \mathbf{D}_{2} \in \mathbb{R}^{Nm_{1}\times Nm_{2}}</math>, the signal is modeled as <math display="inline">\mathbf{D}_{1}\mathbf{\Gamma}_{1}= \mathbf{D}_{1}(\mathbf{D}_{2}\mathbf{\Gamma}_{2})</math>, where <math display="inline">\mathbf{\Gamma}_{1}</math> is its sparse code, and <math display="inline">\mathbf{\Gamma}_{2}</math> is the sparse code of <math display="inline">\mathbf{\Gamma}_{1}</math>. Then, the estimation of each representation is formulated as an optimization problem for both noise-free and noise-corrupted scenarios, respectively. Assuming <math display="inline">\mathbf{\Gamma}_{0}=\mathbf{x}</math>: <math display="block">\begin{aligned}
      \text{Find}\; \{\mathbf{\Gamma}_{i}\}_{i=1}^{K}\;\text{s.t.}&\; \mathbf{\Gamma}_{i-1}=\mathbf{D}_{i}\mathbf{\Gamma}_{i},\; \|\mathbf{\Gamma}_{i}\|_{0,\infty}\leq \lambda_{i}\\
      \text{Find}\; \{\mathbf{\Gamma}_{i}\}_{i=1}^{K}\; \text{s.t.} &\;\|\mathbf{\Gamma}_{i-1}-\mathbf{D}_{i}\mathbf{\Gamma}_{i}\|_{2}\leq \varepsilon_{i},\; \|\mathbf{\Gamma}_{i}\|_{0,\infty}\leq \lambda_{i}\end{aligned}</math>
 
In what follows, theoretical guarantees for the uniqueness and stability of this extended model are described.
 
'''Theorem 1:''' (Uniqueness of Sparse Representations) Consider signal <math display="inline">\mathbf{x}</math> satisfies the (''ML-CSC'') model for a set of convolutional dictionaries <math display="inline">\{\mathbf{D}_{i}\}_{i=1}^{K}</math> with mutual coherence <math display="inline">\{\mu(\mathbf{D}_{i})\}_{i=1}^{K}</math>. If the true sparse representations satisfy <math display="inline">\{\mathbf{\Gamma}\}_{i=1}^{K}<\frac{1}{2}\big(1+\frac{1}{\mu(\mathbf{D}_{i})}\big)</math>, then a solution to the problem <math display="inline">\{\hat{\mathbf{\Gamma}_{i}}\}_{i=1}^{K}</math> will be its unique solution if the thresholds are chosen to satisfy: <math display="inline">\lambda_{i}<\frac{1}{2}\big(1+\frac{1}{\mu(\mathbf{D}_{i})} \big)</math>.
 
'''Theorem 2:''' (Global Stability of the noise-corrupted scenario) Consider signal <math display="inline">\mathbf{x}</math> satisfies the (''ML-CSC'') model for a set of convolutional dictionaries <math display="inline">\{\mathbf{D}_{i}\}_{i=1}^{K}</math> is contaminated with noise <math display="inline">\mathbf{E}</math>, where <math display="inline">\|\mathbf{E}\|_{2}\leq \epsilon_{0}</math>. resulting in <math display="inline">\mathbf{Y=X+E}</math>. If <math display="inline">\lambda_{i}<\frac{1}{2}\big(1+\frac{1}{\mu(\mathbf{D}_{i})}\big)</math> and <math display="inline">\epsilon_{i}^{2}=\frac{4\epsilon_{i-1}^{2}}{1-(2\|\mathbf{\Gamma}_{i}\|_{0,\infty}-1)\mu(\mathbf{D}_{i})}</math>, then the estimated representations <math display="inline">\{\mathbf{\Gamma}_{i}\}_{i=1}^{K}</math> satisfy the following: <math display="inline">\|\mathbf{\Gamma}_{i}-\hat{\mathbf{\Gamma}}_{i}\|_{2}^{2}\leq \epsilon_{i}^{2}</math>.
 
== Projection-based Algorithms ==
 
As a simple approach for solving the ''ML-CSC'' problem, either via the <math display="inline">\ell_{0}</math> or <math display="inline">\ell_{1}</math> norms, is by computing inner products between <math display="inline">\mathbf{x}</math> and the dictionary atoms to identify the most representatives ones. Such a projection is described as: <math display="block">\begin{aligned}
      \hat{\mathbf{\Gamma}}_{\ell_{p}}&= \underset{\mathbf{\Gamma}}{\text{argmin}}\;\frac{1}{2}\|\mathbf{\Gamma}-\mathbf{D}^{T}\mathbf{x}\|_{2}^{2}+\beta\|\mathbf{\Gamma}\|_{p}& p\in\{0,1\},\end{aligned}</math>
 
which have closed-form solutions via the hard-thresholding <math display="inline">\mathcal{H}_{\beta}(\mathbf{D}^{T}\mathbf{x})</math> and soft-thresholding algorithms <math display="inline">\mathcal{S}_{\beta}(\mathbf{D}^{T}\mathbf{x})</math>, respectively. If a nonnegative constraint is also contemplated, the problem can be expressed via the <math display="inline">\ell_{1}</math> norm as: <math display="block">\begin{aligned}
      \hat{\mathbf{\Gamma}}&= \underset{\mathbf{\Gamma}}{\text{argmin}}\; \frac{1}{2}\|\mathbf{\Gamma}-\mathbf{D}^{T}\mathbf{x}\|_{2}^{2}+\beta\|\mathbf{\Gamma}\|_{1},\; \text{ s.t.}\; \mathbf{\Gamma}\geq 0,\end{aligned}</math> which closed-form solution corresponds to the soft nonnegative thresholding operator <math display="inline">\mathcal{S}_{\beta}^{+}(\mathbf{D}^{T}\mathbf{x})</math>, where <math display="inline">\mathcal{S}_{\beta}^{+}(z)\triangleq \max(z-\beta,0)</math>. Guarantees for the Layered soft-thresholding approach are included in the Appendix (Section [[#sec_appendix02|6.2]]).
 
'''Theorem 3''': (Stable Recovery of the Multi-layered Soft-thresholding Algorithm) Consider signal <math display="inline">\mathbf{x}</math> that satisfies the (''ML-CSC'') model for a set of convolutional dictionaries <math display="inline">\{\mathbf{D}_{i}\}_{i=1}^{K}</math> with mutual coherence <math display="inline">\{\mu(\mathbf{D}_{i})\}_{i=1}^{K}</math> is contaminated with noise <math display="inline">\mathbf{E}</math>, where <math display="inline">\|\mathbf{E}\|_{2}\leq \epsilon_{0}</math>. resulting in <math display="inline">\mathbf{Y=X+E}</math>. Denote by <math display="inline">|\mathbf{\Gamma}_{i}^{\text{min}}|</math> and <math display="inline">|\mathbf{\Gamma}_{i}^{\text{max}}|</math> the lowest and highest entries in <math display="inline">\mathbf{\Gamma}_{i}</math>. Let <math display="inline">\{\hat{\mathbf{\Gamma}}_{i}\}_{i=1}^{K}</math> be the estimated sparse representations obtained for <math display="inline">\{\beta_{i}\}_{i=1}^{K}</math>. If <math display="inline">\|\mathbf{\Gamma}_{i}\|_{0,\infty}<\frac{1}{2}\big(1+\frac{1}{\mu(\mathbf{D}_{i})}\frac{|\mathbf{\Gamma}_{i}^{\text{min}}|}{|\mathbf{\Gamma}_{i}^{\text{min}}|}\big)-\frac{1}{\mu(\mathbf{D}_{i})}\frac{\epsilon_{i-1}}{|\mathbf{\Gamma}_{i}^{\text{max}}|}</math> and <math display="inline">\beta_{i}</math> is chosen according to: <math display="block">\begin{aligned}
        \|\mathbf{\Gamma}_{i}\|_{0,\infty}^{s}<\frac{1}{2}\big( 1+\frac{1}{\mu(\mathbf{D}_{i})}\frac{|\mathbf{\Gamma}_{i}^{\text{min}}|}{|\mathbf{\Gamma}_{i}^{\text{max}}|} \big)-\frac{1}{\mu(\mathbf{D}_{i})}\frac{\epsilon_{i-1}}{|\mathbf{\Gamma}_{i}^{\text{max}}|}
      \end{aligned}</math> Then, <math display="inline">\hat{\mathbf{\Gamma}}_{i}</math> has the same support as <math display="inline">\mathbf{\Gamma}_{i}</math>, and <math display="inline">\|\mathbf{\Gamma}_{i}-\hat{\mathbf{\Gamma}_{i}}\|_{2,\infty}\leq \epsilon_{i}</math>, for <math display="inline">\epsilon_{i}=\sqrt{\|\mathbf{\Gamma}_{i}\|_{0,\infty}}(\epsilon_{i-1}+\mu(\mathbf{D}_{i})(\|\mathbf{\Gamma}_{i}\|_{0,\infty}-1)|\mathbf{\Gamma}_{i}^{\text{max}}|+\beta_{i})</math>
 
== Connections to Convolutional Neural Networks ==
 
Recall the ''Forward Pass'' of the ''Convolutional Neural Network'' (''CNN'') model, used in both training and inference steps. Let <math display="inline">\mathbf{x}\in \mathbb{R}^{Mm_{1}}</math> be its input and <math display="inline">\mathbf{W}_{k}\in\mathbb{R}^{N\times m_{1}}</math> the filters at layer <math display="inline">k</math>, which are followed by the ''Rectified Linear Unit'' <math display="inline">\text{ReLU}(\mathbf{x})= \max(0, x)</math>, for bias <math display="inline">\mathbf{b}\in \mathbb{R}^{Mm_{1}}</math>. Based on this elementary block, taking <math display="inline">K=2</math> as example, the ''CNN'' output can be expressed as: <math display="block">\begin{aligned}
      \mathbf{Z}_{2}&= \text{ReLU}\big(\mathbf{W}_{2}^{T}\; \text{ReLU}(\mathbf{W}_{1}^{T}\mathbf{x})+\mathbf{b}_{1})+\mathbf{b}_{2}\;\big).\end{aligned}</math> Finally, comparing the ''CNN'' algorithm and the Layered thresholding approach for the nonnegative constaint, it is straightforward to show that both are equivalent: <math display="block">\begin{aligned}
      \hat{\mathbf{\Gamma}}&= \mathcal{S}^{+}_{\beta_{2}}\big(\mathbf{D}_{2}^{T}\mathcal{S}^{+}_{\beta_{1}}(\mathbf{D}_{1}^{T}\mathbf{x}) \big)\\
      &= \text{ReLU}\big(\mathbf{W}_{2}^{T} \text{ReLU}(\mathbf{W}_{1}^{T}\mathbf{x}+\beta_{1})+\beta_{2}\big).\end{aligned}</math>
 
[[File:CNN Convolutional Layers.svg|thumb|Convolutional Layers from the Forward Pass Algorithm]]
 
[[File:ReLU and Nonnegative Soft Thresholding Functions.svg|thumb|Contrast between the Rectified Linear Unit function and the Nonnegative Soft Thresholding Pointwise Nonlinearities]]
 
As explained in what follows, this naive approach of solving the coding problem is a particular case of a more stable projected gradient descent algorithm for the ''ML-CSC'' model. Equiped with the stability conditions of both approaches, a more clear understanding about the class of signals a ''CNN'' can recover, under what noise conditions can an estimation be accurately attained, and how can its structure be modified to improve its theoretical conditions. The reader is referred to (<ref name="papyan_2017_convolutional"/>, section 5) for details regarding their connection.
 
== Pursuit Algorithms for the Multi-layer CSC Model ==
 
A crucial limitation of the ''Forward Pass'' is it being unable to recover the unique solution of the ''DCP'' problem, which existence has been demonstrated. So, instead of using a thresholding approach at each layer, a full pursuit method is adopted, denominated ''Layered Basis Pursuit'' (''LBP''). Considering the projection onto the <math display="inline">\ell_{1}</math> ball, the following problem is proposed: <math display="block">\begin{aligned}
      \hat{\mathbf{\Gamma}}_{i}&= \underset{\mathbf{\Gamma}_{i}}{\text{argmin}}\; \frac{1}{2}\|\mathbf{D}_{i}\mathbf{\Gamma}_{i}-\hat{\mathbf{\Gamma}}_{i}\|_{2}^{2}+\; \xi_{i}\|\mathbf{\Gamma}_{i}\|_{1},\end{aligned}</math> where each layer is solved as an independent ''CSC'' problem, and <math display="inline">\xi_{i}</math> is proportional to the noise level at each layer. Among the methods for solving the layered coding problem, ''ISTA'' is an efficient decoupling alternative. In what follows, a short summary of the guarantees for the ''LBP'' are established.
 
'''Theorem 4''': (Recovery guarantee) Consider a signal <math display="inline">\mathbf{x}</math> characterized by a set of sparse vectors <math display="inline">\{\mathbf{\Gamma}_{i}\}_{i=1}^{K}</math>, convolutional dictionaries <math display="inline">\{\mathbf{D}_{i}\}_{i=1}^{K}</math> and their corresponding mutual coherences <math display="inline">\{\mu\big(\mathbf{D}_{i}\big)\}_{i=1}^{K}</math>. If <math display="inline">\|\mathbf{\Gamma}_{i}\|_{0,\infty}<\frac{1}{2}\big(1+\frac{1}{\mu(\mathbf{D}_{i})}\big)</math>, then the LBP algorithm is guaranteed to recover the sparse representations.
 
'''Theorem 5''': (Stability in the presence of noise) Consider the contaminated signal <math display="inline">\mathbf{Y}=\mathbf{X+E}</math>, where <math display="inline">\|\mathbf{E}\|_{0,\infty}\leq \epsilon_{0}</math> and <math display="inline">\mathbf{x}</math> is characterized by a set of sparse vectors <math display="inline">\{\mathbf{\Gamma}_{i}\}_{i=1}^{K}</math> and convolutional dictionaries <math display="inline">\{\mathbf{D}_{i}\}_{i=1}^{K}</math>. Let <math display="inline">\{\hat{\mathbf{\Gamma}}_{i}\}_{i=1}^{K}</math> be solutions obtained via the LBP algorithm with parameters <math display="inline">\{\xi\}_{i=1}^{K}</math>. If <math display="inline">\|\mathbf{\Gamma}_{i}\|_{0,\infty}<\frac{1}{3}\big(1+\frac{1}{\mu(\mathbf{D}_{i})}\big)</math> and <math display="inline">\xi_{i}=4\epsilon_{i-1}</math>, then: (i) The support of the solution <math display="inline">\hat{\mathbf{\Gamma}}_{i}</math> is contained in that of <math display="inline">\mathbf{\Gamma}_{i}</math>, (ii) <math display="inline">\|\mathbf{\Gamma}_{i}-\hat\mathbf{\Gamma}_{i}\|_{2,\infty}\leq \epsilon_{i}</math>, and (iii) Any entry greater in absolute value than <math display="inline">\frac{\epsilon_{i}}{\sqrt{\|\mathbf{\Gamma}_{i}\|_{0\infty}}}</math> is guaranteed to be recovered.
 
= Applications of the Convolutional Sparse Coding Model: Image Inpainting =
 
As a practical example, an efficient ''image inpainting'' method for color images via the ''CSC'' model is shown<ref name="wohlberg_2016_convolutional" />. Consider the three-channel dictionary <math display="inline">\mathbf{D} \in \mathbb{R}^{N \times M \times 3}</math>, where <math display="inline">\mathbf{d}_{c,m}</math> denotes the <math display="inline">m</math>-th atom at channel <math display="inline">c</math>, represents signal <math display="inline">\mathbf{x}</math> by a single cross-channel sparse representation <math display="inline">\mathbf{\Gamma}</math>, with stripes denoted as <math display="inline">\mathbf{z}_{i}</math>. Given an observation <math display="inline">\mathbf{y}=\{\mathbf{y}_{r}, \mathbf{y}_{g}, \mathbf{y}_{b}\}</math>, where randomly chosen channels at unknown pixel locations are fixed to zero, in a similar way to ''impulse noise'', the problem is formulated as: <math display="block">\begin{aligned}
      \{\mathbf{\hat{z}}_{i}\}&=\underset{\{\mathbf{z}_{i}\}}{\text{argmin}}\frac{1}{2}\sum_{c}\bigg\|\sum_{i}\mathbf{d}_{c,i}\ast \mathbf{z}_{i} -\mathbf{y}_{c}\bigg\|_{2}^{2}+\lambda \sum_{i}\|\mathbf{z}_{i}\|_{1}.\end{aligned}</math> By means of ''ADMM''<ref>{{cite journal |last1=Boyd |first1=Stephen |title=Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers |journal=Foundations and Trends in Machine Learning |date=2010 |volume=3 |issue=1 |pages=1–122 |doi=10.1561/2200000016 |issn=1935-8237}}</ref>, the cost function is decoupled into simpler sub-problems, allowing an efficient <math display="inline">\mathbf{\Gamma}</math> estimation. Algorithm 2 describes the procedure, where <math display="inline">\hat{D}_{c,m}</math> is the DFT representation of <math display="inline">D_{c,m}</math>, the convolutional matrix for the term <math display="inline">\mathbf{d}_{c,i}\ast \mathbf{z}_{i}</math>. Likewise, <math display="inline">\hat{\mathbf{x}}_{m}</math> and <math display="inline">\hat{\mathbf{z}}_{m}</math> correspond to the DFT representations of <math display="inline">\mathbf{x}_{m}</math> and <math display="inline">\mathbf{z}_{m}</math>, respectively, <math display="inline">\mathcal{S}_{\beta}(.)</math> corresponds to the Soft-thresholding function with argument <math display="inline">\beta</math>, and the <math display="inline">\ell_{1,2}</math> norm is defined as the <math display="inline">\ell_{2}</math> norm along the channel dimension <math display="inline">c</math> followed by the <math display="inline">\ell_{1}</math> norm along the spatial dimension <math display="inline">m</math>. The reader is referred to (<ref name="wohlberg_2016_convolutional" />, Section II) for details on the ''ADMM'' implementation and the dictionary learning procedure.
 
'''Algorithm 2: Color image inpainting via the Convolutional Sparse Coding model.'''
 
''Input:''
 
<math display="inline">\hat{\mathbf{D}}_{c,m}</math>: DFT of convolutional matrices <math display="inline">\mathbf{D}_{c,m}</math>,
 
<math display="inline">\mathbf{y}=\{\mathbf{y}_{r},\mathbf{y}_{g},\mathbf{y}_{b}\}</math>: Color observation,
 
<math display="inline">\lambda</math>: Regularization parameter,
 
<math display="inline">\{\mu, \rho\}</math>: step sizes for <code>ADMM</code>,
 
<code>tol</code>: tolerance factor,
 
<code>maxiters</code>: maximum number of iterations.<br />
 
:::<math display="inline">k\gets k+1</math>
 
'''Repeat'''
 
:::<math display="inline">\{\hat{\mathbf{z}}_{m}\}^{(k+1)}\gets\underset{\{\hat{\mathbf{x}}_{m}\}}{\text{argmin}}\;\frac{1}{2}\sum_{c}\big\|\sum_{m}\hat{\mathbf{D}}_{c,m} \hat{\mathbf{z}}_{m}-\hat{\mathbf{y}}_{c} \big\|+\frac{\rho}{2}\sum_{m}\|\hat{\mathbf{z}}_{m}- (\hat{\mathbf{y}}_{m}+\hat{\mathbf{u}}_{m}^{(k)})\|_{2}^{2}.</math><br />
 
:::<math display="inline">\{\mathbf{y}_{c,m}\}^{(k+1)}\gets \underset{\{\mathbf{y}_{c,m}\}}{\text{argmin}}\;\lambda \sum_{c}\sum_{m}\|\mathbf{y}_{c,m}\|_{1}+\mu\|\{\mathbf{x}_{c,m}^{(k+1)}\}\|_{2,1}+\frac{\rho}{2}\sum_{m}\|\mathbf{z}_{m}^{(k+1)}- (\mathbf{y}+\mathbf{u}_{m}^{(k)})\|_{2}^{2}.</math><br />
 
:::<math display="inline">\mathbf{y}_{m}^{(k+1)}=\mathcal{S}_{\lambda/\rho}\big( \mathbf{x}_{m}^{(k+1)}+\mathbf{u}_{m}^{(k)} \big).</math><br />
 
:::<math display="inline">k \gets k+1</math>
 
'''Until''' <math display="inline">\|\{\mathbf{z}_{m}\}^{(k+1)}-\{\mathbf{z}_{m}\}^{(k)}\|_{2}< </math><code>tol</code> or <math display="inline">i></math> <code>maxiters</code>.
 
== References ==
{{reflist|colwidth=30em}}
 
== External Links ==
<ref>{{cite web |title=SPORCO |url=http://brendt.wohlberg.net/software/SPORCO/ |website=SParse Optimization Research COde (SPORCO)}}</ref>
{{Source Wikipedia}}
{{Kept on Wikipedia|Biggest version}}

Revision as of 12:00, 29 December 2024




The Convolutional Sparse Coding paradigm is an extension of the global Sparse Coding model, in which a redundant dictionary is modeled as a concatenation of circulant matrices. While the global sparsity constraint describes signal 𝐱N as a linear combination of a few atoms in the redundant dictionary 𝐃N×M,MN, usually expressed as 𝐱=𝐃Γ for a sparse vector ΓM, the alternative dictionary structure adopted by the Convolutional Sparse Coding model allows the sparsity prior to be applied locally instead of globally: independent patches of 𝐱 are generated by "local" dictionaries operating over stripes of Γ.

The local sparsity constraint allows stronger uniqueness and stability conditions than the global sparsity prior, and has shown to be a versatile tool for inverse problems in fields such as Image Understanding and Computer Vision. Also, a recently proposed multi-layer extension of the model has shown conceptual benefits for more complex signal decompositions, as well as a tight connection the Convolutional Neural Networks model, allowing a deeper understanding of how the latter operates.

Overview

Given a signal of interest 𝐱N and a redundant dictionary 𝐃N×M,MN, the Sparse Coding problem consist of retrieving a sparse vector ΓM, denominated the Sparse representation of 𝐱, such that 𝐱=𝐃Γ. Intuitively, this implies 𝐱 is expressed as a linear combination of a small number of elements in 𝐃. The global sparsity constraint prior has been shown to be useful in many ill-posed inverse problems such as image inpainting, super-resolution, and coding.[1][2][3]. It has been of particular interest for image understanding and computer vision tasks involving natural images, allowing redundant dictionaries to be efficiently inferred [4][5][6]

As an extension to the global sparsity constraint, recent pieces in the literature have revisited the model to reach a more profound understanding of its uniqueness and stability conditions.[6]. Interestingly, by imposing a local sparsity prior in Γ, meaning that its independent patches can be interpreted as sparse vectors themselves, the structure in 𝐃 can be understood as a “local" dictionary operating over each independent patch. This model extension is denominated Convolutional Sparse Coding (CSC) and drastically reduces the burden of estimating signal representations while being characterized by stronger uniqueness and stability conditions. Furthermore, it allows for Γ to be efficiently estimated via projected gradient descent algorithms such as Orthonormal Matching Pursuit and Basis Pursuit, while performing in a local fashion[5]

Besides its versatility in inverse problems, recent efforts have focused on the multi-layer version of the model and provided evidence of its reliability for recovering multiple underlying representations[7]. Moreover, a tight connection between such a model and the well-established Convolutional Neural Network model (CNN) was revealed, providing a new tool for a more rigurous understanding of its theoretical conditions.

The convolutional sparse coding model provides a very efficient set of tools to solve a wide range of inverse problems, including image denoising, image inpainting, and image superresolution. By imposing local sparsity constraints, it allows to efficiently tackle the global coding problem by iteratively estimating disjoint patches and assembling them into a global signal. Furthermore, by adopting a multi-layer sparse model, which results from imposing the sparsity constraint to the signal inherent representations themselves, the resulting “Layered" Pursuit algorithm keeps the strong uniqueness and stability conditions from the single-layer model. This extension also provides some interesting notions about the relation between its sparsity prior and the forward pass of the Convolutional Neural Network, which allows to understand how the theoretical benefits of the CSC model can provide a strong mathematical meaning of the CNN structure.

Sparse Coding Paradigm

Basic concepts and models are presented to explain into detail the Convolutional Sparse Representation framework. On the grounds that the sparsity constraint has been proposed under different models, a short description of them is presented to show its evolution up to the model of interest. Also included are the concepts of Mutual Coherence and Restricted Isometry Property to establish uniqueness stability guarantees.

Global Sparse Coding Model

Allow signal 𝐱N to be expressed as a linear combination of a small number of atoms from a given dictionary 𝐃N×M,M>N. Alternatively, the signal can be expressed as 𝐱=𝐃Γ, where ΓM corresponds to the sparse representation of 𝐱, which selects the atoms to combine and their weights. Subsequently, given 𝐃, the task of recovering Γ from either the noise-free signal itself or an observation is denominated sparse coding. Considering the noise-free scenario, the coding problem is formulated as follows: Γ^ideal=argminΓΓ0s.t.𝐃Γ=𝐱. The effect of the 0 norm is to favor solutions with as much zero elements as possible. Furthermore, given an observation affected by bounded energy noise: 𝐘=𝐃Γ+𝐄,𝐄2<ϵ, the pursuit problem is reformulated as: Γ^noise=argminΓΓ0s.t.𝐃Γ𝐘2<ϵ.

Stability and Uniqueness Guarantees for the Global Sparse Model

Let the spark of 𝐃 be defined as the minimum number of linearly independent columns: σ(𝐃)=minΓΓ0s.t.𝐃Γ=0,Γ0.

Then, from the triangular inequality, the sparsest vector Γ satisfies: Γ0<σ(𝐃)2. Although the spark provides an upper bound, it is unfeasible to compute in practical scenarios. Instead, let the mutual coherence be a measure of similarity between atoms in 𝐃. Assuming 2-norm unit atoms, the mutual coherence of 𝐃 is defined as: μ(𝐃)=maxij𝐝𝐢𝐓d𝐣2, where 𝐝i are atoms. Based on this metric, it can be proven that the true sparse representation Γ* can be recovered if and only if Γ*0<12(1+1μ(𝐃)).

Similarly, under the presence of noise, an upper bound for the distance between the true sparse representation Γ* and its estimation Γ^ can be established via the Restricted Isometry Property (RIP). A k-RIP matrix 𝐃 with constant δk corresponds to: (1δk)Γ22𝐃Γ22(1+δk)Γ22, where δk is the smallest number that satisfies the inequality for every Γ0=k. Then, assuming Γ0<12(1+1μ(𝐃)), it is guaranteed that Γ^Γ*224ϵ21μ(𝐃)(2Γ01).

Solving such a general pursuit problem is a hard task if no structure is imposed on dictionary 𝐃. This implies learning large, highly overcomplete representations, which is extremely expensive. Assuming such a burden has been met and a representative dictionary has been obtained for a given signal 𝐱, typically based on prior information, Γ* can be estimated via several pursuit algorithms.

Pursuit Algorithms for the Global Sparse Model

Two basic methods for solving the Global Sparse Coding problem are Orthogonal Matching Pursuit (OMP) and Basis Pursuit (BP) . OMP is a greedy algorithm that iteratively selects the atom best correlated with the residual between 𝐱 and a current estimation, followed by a projection onto a subset of pre-selected atoms. On the other hand, BP is a more sophisticated approach that replaces the original coding problem by a linear programming problem. Based on this algorithms, the Global Sparse Coding provides considerably loose bounds for the uniqueness and stability of Γ^. To overcome this, additional priors are imposed over 𝐃 to guarantee tighter bounds and uniqueness conditions. The reader is referred to ([5], section 2) for details regarding this properties.

Convolutional Sparse Coding Model

A local prior is adopted such that each overlapping section of Γ is sparse. Let 𝐃N×Nm be constructed from shifted versions of a local dictionary D𝐋n×m,mM. Then, 𝐱 is formed by products between D𝐋 and local patches of ΓmN.

File:Structure of the Convolutional Sparse Coding Paradigm.svg
The global dictionary is expressed in terms of a stride convolutional matrix. So, signals can be generated in terms of stripes of the sparse representation multiplies by a shift-invariant local dictionary.

From the latter, Γ can be re-expressed by N disjoint sparse vectors αim: Γ{α1,α2,,αN}T. Similarly, let γ be a set of (2n1) consecutive vectors αi. Then, each disjoint segment in 𝐱 is expressed as: 𝐱i=𝐑i𝐃Γ, where operator 𝐑in×N extracts overlapping patches of size n starting at index i. Thus, 𝐑i𝐃 contains only (2n1)m nonzero columns. Hence, by introducing operator 𝐒i𝐑(2n1)m×Nm which exclusively preserves them: 𝐱i=𝐑i𝐃𝐒iTΩ(SiΓ)γi, where Ω is known as the stripe dictionary, which is independent of i, and γi is denominated the i-th stripe. So, 𝐱 corresponds to a patch aggregation or convolutional interpretation: 𝐱=i=1N𝐑iT𝐃Lαi=i=1m𝐝iz𝐢. Where 𝐝i corresponds to the i-th atom from the local dictionary 𝐃L and z𝐢 is constructed by elements of patches α: z𝐢(α1,i,α2,i,,αN,i)T. Given the new dictionary structure, let the 0, pseudo-norm be defined as: Γ0, maxiγi0. Then, for the noise-free and noise-corrupted scenarios, the problem can be respectively reformulated as: Γ^ideal=argminΓΓ0,s.t.𝐃Γ=𝐱,Γ^noise=argminΓΓ0,s.t.𝐘𝐃Γ2<ϵ.

Stability and Uniqueness Guarantees for the Convolutional Sparse Model

For the local approach, 𝐃 mutual coherence satisfies: μ(𝐃)(m1m(2n1)1)1/2. So, if a solution obeys Γ0,<12(1+1μ(𝐃)), then it is the sparsest solution to the 0, problem. Thus, under the local formulation, the same number of non-zeros is permitted for each stripe instead of the full vector!

Similar to the global model, the CSC is solved via OMP and BP methods, the latter contemplating the use of the Iterative Shrinkage Thresholding Algorithm (ISTA)[8] for splitting the pursuit into smaller problems. Based on the 0, pseudonorm, if a solution Γ exists satisfying Γ0,<12(1+1μ(𝐃)), then both methods are guaranteed to recover it. Moreover, the local model guarantees recovery independently of the signal dimension, as opposed to the 0 prior. Stability conditions for OMP and BP are also guaranteed if its Exact Recovery Condition (ERC) is met for a support 𝒯 with a constant θ. The ERC is defined as: θ=1maxi𝒯𝐃𝒯𝐝i1>0, where denotes the Pseudo-inverse. Algorithm 1 shows the Global Pursuit method based on ISTA.

Algorithm 1: 1D CSC via Local Iterative Soft-Thresholding.

Input:

𝐃L: Local Dictionary,

𝐲: observation,

λ: Regularization parameter,

c: step size for ISTA,

tol: tolerance factor,

maxiters: maximum number of iterations.

{αi}(0){𝟎N×1} (Initialize disjoint patches.)
{𝐫i}(0){𝐑i𝐲} (Initialize residual patches.)
k0

Repeat

{αi}(k)𝒮λc({αi}(k1)+1c{𝐃LT𝐫i}(k1)) (Coding along disjoint patches)
αi 𝐱^(k)i𝐑iT𝐃Lαi(k) (Patch Aggregation)
{𝐫i}(k)𝐑i(𝐲𝐱^(k)) (Update Residuals)
kk+1

Until 𝐱^(k)𝐱^(k1)2< tol or k> maxiters.

Multi-Layered Convolutional Sparse Coding Model

By imposing the sparsity prior in the inherent structure of 𝐱, strong conditions for a unique representation and feasible methods for estimating it are granted. Similarly, such a constraint can be applied to its representation itself, generating a cascade of sparse representations: Each code is defined by a few atoms of a given set of convolutional dictionaries.

Based on this criteria, yet another extension denominated Multi-layer Convolutional Sparse Coding (ML-CSC) is proposed. A set of analytical dictionaries {𝐃}k=1K can be efficiently designed, where sparse representations at each layer {Γ}k=1K are guaranteed by imposing the sparsity prior over the dictionaries themselves [7]. In other words, by considering dictionaries to be stride convolutional matrices i.e. atoms of the local dictionaries shift m elements instead of a single one, where m corresponds to the number of channels in the previous layer, it is guaranteed that the Γ0, norm of the representations along layers is bounded.

For example, given the dictionaries 𝐃1N×Nm1,𝐃2Nm1×Nm2, the signal is modeled as 𝐃1Γ1=𝐃1(𝐃2Γ2), where Γ1 is its sparse code, and Γ2 is the sparse code of Γ1. Then, the estimation of each representation is formulated as an optimization problem for both noise-free and noise-corrupted scenarios, respectively. Assuming Γ0=𝐱: Find{Γi}i=1Ks.t.Γi1=𝐃iΓi,Γi0,λiFind{Γi}i=1Ks.t.Γi1𝐃iΓi2εi,Γi0,λi

In what follows, theoretical guarantees for the uniqueness and stability of this extended model are described.

Theorem 1: (Uniqueness of Sparse Representations) Consider signal 𝐱 satisfies the (ML-CSC) model for a set of convolutional dictionaries {𝐃i}i=1K with mutual coherence {μ(𝐃i)}i=1K. If the true sparse representations satisfy {Γ}i=1K<12(1+1μ(𝐃i)), then a solution to the problem {Γi^}i=1K will be its unique solution if the thresholds are chosen to satisfy: λi<12(1+1μ(𝐃i)).

Theorem 2: (Global Stability of the noise-corrupted scenario) Consider signal 𝐱 satisfies the (ML-CSC) model for a set of convolutional dictionaries {𝐃i}i=1K is contaminated with noise 𝐄, where 𝐄2ϵ0. resulting in 𝐘=𝐗+𝐄. If λi<12(1+1μ(𝐃i)) and ϵi2=4ϵi121(2Γi0,1)μ(𝐃i), then the estimated representations {Γi}i=1K satisfy the following: ΓiΓ^i22ϵi2.

Projection-based Algorithms

As a simple approach for solving the ML-CSC problem, either via the 0 or 1 norms, is by computing inner products between 𝐱 and the dictionary atoms to identify the most representatives ones. Such a projection is described as: Γ^p=argminΓ12Γ𝐃T𝐱22+βΓpp{0,1},

which have closed-form solutions via the hard-thresholding β(𝐃T𝐱) and soft-thresholding algorithms 𝒮β(𝐃T𝐱), respectively. If a nonnegative constraint is also contemplated, the problem can be expressed via the 1 norm as: Γ^=argminΓ12Γ𝐃T𝐱22+βΓ1, s.t.Γ0, which closed-form solution corresponds to the soft nonnegative thresholding operator 𝒮β+(𝐃T𝐱), where 𝒮β+(z)max(zβ,0). Guarantees for the Layered soft-thresholding approach are included in the Appendix (Section 6.2).

Theorem 3: (Stable Recovery of the Multi-layered Soft-thresholding Algorithm) Consider signal 𝐱 that satisfies the (ML-CSC) model for a set of convolutional dictionaries {𝐃i}i=1K with mutual coherence {μ(𝐃i)}i=1K is contaminated with noise 𝐄, where 𝐄2ϵ0. resulting in 𝐘=𝐗+𝐄. Denote by |Γimin| and |Γimax| the lowest and highest entries in Γi. Let {Γ^i}i=1K be the estimated sparse representations obtained for {βi}i=1K. If Γi0,<12(1+1μ(𝐃i)|Γimin||Γimin|)1μ(𝐃i)ϵi1|Γimax| and βi is chosen according to: Γi0,s<12(1+1μ(𝐃i)|Γimin||Γimax|)1μ(𝐃i)ϵi1|Γimax| Then, Γ^i has the same support as Γi, and ΓiΓi^2,ϵi, for ϵi=Γi0,(ϵi1+μ(𝐃i)(Γi0,1)|Γimax|+βi)

Connections to Convolutional Neural Networks

Recall the Forward Pass of the Convolutional Neural Network (CNN) model, used in both training and inference steps. Let 𝐱Mm1 be its input and 𝐖kN×m1 the filters at layer k, which are followed by the Rectified Linear Unit ReLU(𝐱)=max(0,x), for bias 𝐛Mm1. Based on this elementary block, taking K=2 as example, the CNN output can be expressed as: 𝐙2=ReLU(𝐖2TReLU(𝐖1T𝐱)+𝐛1)+𝐛2). Finally, comparing the CNN algorithm and the Layered thresholding approach for the nonnegative constaint, it is straightforward to show that both are equivalent: Γ^=𝒮β2+(𝐃2T𝒮β1+(𝐃1T𝐱))=ReLU(𝐖2TReLU(𝐖1T𝐱+β1)+β2).

File:CNN Convolutional Layers.svg
Convolutional Layers from the Forward Pass Algorithm
File:ReLU and Nonnegative Soft Thresholding Functions.svg
Contrast between the Rectified Linear Unit function and the Nonnegative Soft Thresholding Pointwise Nonlinearities

As explained in what follows, this naive approach of solving the coding problem is a particular case of a more stable projected gradient descent algorithm for the ML-CSC model. Equiped with the stability conditions of both approaches, a more clear understanding about the class of signals a CNN can recover, under what noise conditions can an estimation be accurately attained, and how can its structure be modified to improve its theoretical conditions. The reader is referred to ([7], section 5) for details regarding their connection.

Pursuit Algorithms for the Multi-layer CSC Model

A crucial limitation of the Forward Pass is it being unable to recover the unique solution of the DCP problem, which existence has been demonstrated. So, instead of using a thresholding approach at each layer, a full pursuit method is adopted, denominated Layered Basis Pursuit (LBP). Considering the projection onto the 1 ball, the following problem is proposed: Γ^i=argminΓi12𝐃iΓiΓ^i22+ξiΓi1, where each layer is solved as an independent CSC problem, and ξi is proportional to the noise level at each layer. Among the methods for solving the layered coding problem, ISTA is an efficient decoupling alternative. In what follows, a short summary of the guarantees for the LBP are established.

Theorem 4: (Recovery guarantee) Consider a signal 𝐱 characterized by a set of sparse vectors {Γi}i=1K, convolutional dictionaries {𝐃i}i=1K and their corresponding mutual coherences {μ(𝐃i)}i=1K. If Γi0,<12(1+1μ(𝐃i)), then the LBP algorithm is guaranteed to recover the sparse representations.

Theorem 5: (Stability in the presence of noise) Consider the contaminated signal 𝐘=𝐗+𝐄, where 𝐄0,ϵ0 and 𝐱 is characterized by a set of sparse vectors {Γi}i=1K and convolutional dictionaries {𝐃i}i=1K. Let {Γ^i}i=1K be solutions obtained via the LBP algorithm with parameters {ξ}i=1K. If Γi0,<13(1+1μ(𝐃i)) and ξi=4ϵi1, then: (i) The support of the solution Γ^i is contained in that of Γi, (ii) ΓiΓ^i2,ϵi, and (iii) Any entry greater in absolute value than ϵiΓi0 is guaranteed to be recovered.

Applications of the Convolutional Sparse Coding Model: Image Inpainting

As a practical example, an efficient image inpainting method for color images via the CSC model is shown[6]. Consider the three-channel dictionary 𝐃N×M×3, where 𝐝c,m denotes the m-th atom at channel c, represents signal 𝐱 by a single cross-channel sparse representation Γ, with stripes denoted as 𝐳i. Given an observation 𝐲={𝐲r,𝐲g,𝐲b}, where randomly chosen channels at unknown pixel locations are fixed to zero, in a similar way to impulse noise, the problem is formulated as: {z^i}=argmin{𝐳i}12ci𝐝c,i𝐳i𝐲c22+λi𝐳i1. By means of ADMM[9], the cost function is decoupled into simpler sub-problems, allowing an efficient Γ estimation. Algorithm 2 describes the procedure, where D^c,m is the DFT representation of Dc,m, the convolutional matrix for the term 𝐝c,i𝐳i. Likewise, 𝐱^m and 𝐳^m correspond to the DFT representations of 𝐱m and 𝐳m, respectively, 𝒮β(.) corresponds to the Soft-thresholding function with argument β, and the 1,2 norm is defined as the 2 norm along the channel dimension c followed by the 1 norm along the spatial dimension m. The reader is referred to ([6], Section II) for details on the ADMM implementation and the dictionary learning procedure.

Algorithm 2: Color image inpainting via the Convolutional Sparse Coding model.

Input:

𝐃^c,m: DFT of convolutional matrices 𝐃c,m,

𝐲={𝐲r,𝐲g,𝐲b}: Color observation,

λ: Regularization parameter,

{μ,ρ}: step sizes for ADMM,

tol: tolerance factor,

maxiters: maximum number of iterations.

kk+1

Repeat

{𝐳^m}(k+1)argmin{𝐱^m}12cm𝐃^c,m𝐳^m𝐲^c+ρ2m𝐳^m(𝐲^m+𝐮^m(k))22.
{𝐲c,m}(k+1)argmin{𝐲c,m}λcm𝐲c,m1+μ{𝐱c,m(k+1)}2,1+ρ2m𝐳m(k+1)(𝐲+𝐮m(k))22.
𝐲m(k+1)=𝒮λ/ρ(𝐱m(k+1)+𝐮m(k)).
kk+1

Until {𝐳m}(k+1){𝐳m}(k)2<tol or i> maxiters.

References

  1. Jianchao Yang; Wright, John; Huang, Thomas S; Yi Ma (November 2010). "Image Super-Resolution Via Sparse Representation". IEEE Transactions on Image Processing. 19 (11): 2861–2873. Bibcode:2010ITIP...19.2861Y. doi:10.1109/TIP.2010.2050625. PMID 20483687.
  2. Wetzstein, Gordon; Heidrich, Wolfgang; Heide, Felix (2015). "Fast and Flexible Convolutional Sparse Coding": 5135–5143.
  3. Wohlberg, Brendt (2017). "SPORCO: A Python package for standard and convolutional sparse representations". Proceedings of the 16th Python in Science Conference: 1–8. doi:10.25080/shinma-7f4c6e7-001.
  4. Mairal, Julien; Bach, Francis; Ponce, Jean; Sapiro, Guillermo (2009). "Online Dictionary Learning for Sparse Coding". Proceedings of the 26th Annual International Conference on Machine Learning. ACM: 689–696. doi:10.1145/1553374.1553463. ISBN 9781605585161.
  5. 5.0 5.1 5.2 Papyan, Vardan; Sulam, Jeremias; Elad, Michael (1 November 2017). "Working Locally Thinking Globally: Theoretical Guarantees for Convolutional Sparse Coding". IEEE Transactions on Signal Processing. 65 (21): 5687–5701. arXiv:1707.06066. Bibcode:2017ITSP...65.5687P. doi:10.1109/TSP.2017.2733447.
  6. 6.0 6.1 6.2 6.3 Wohlberg, Brendt (6–8 March 2016). "Convolutional sparse representation of color images". 2016 IEEE Southwest Symposium on Image Analysis and Interpretation (SSIAI): 57–60. doi:10.1109/SSIAI.2016.7459174. ISBN 978-1-4673-9919-7.CS1 maint: Date format (link)
  7. 7.0 7.1 7.2 Papyan, Vardan; Romano, Yaniv; Elad, Michael (2017). "Convolutional Neural Networks Analyzed via Convolutional Sparse Coding". J. Mach. Learn. Res. 18 (1): 2887–2938. arXiv:1607.08194. Bibcode:2016arXiv160708194P. ISSN 1532-4435.
  8. Beck, Amir; Teboulle, Marc (January 2009). "A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems". SIAM Journal on Imaging Sciences. 2 (1): 183–202. doi:10.1137/080716542.
  9. Boyd, Stephen (2010). "Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers". Foundations and Trends in Machine Learning. 3 (1): 1–122. doi:10.1561/2200000016. ISSN 1935-8237.

External Links

[1]


This article "Convolutional Sparse Coding" is from Wikipedia. The list of its authors can be seen in its historical and/or the page Edithistory:Convolutional Sparse Coding. Articles copied from Draft Namespace on Wikipedia could be seen on the Draft Namespace of Wikipedia and not main one.

Page kept on Wikipedia This page exists already on Wikipedia.
  1. "SPORCO". SParse Optimization Research COde (SPORCO).