^{†}Corresponding author. Email: xiangli@aoe.ac.cn
^{*}Project supported by the National Natural Science Foundation for Distinguished Young Scholars of China (Grant No. 61225024) and the National High Technology Research and Development Program of China (Grant No. 2011AA7012022).
Coded aperture snapshot spectral imaging (CASSI) has been widely discussed in recent years. It offers high optical throughput and snapshot imaging: the entire spatial-spectral datacube can be reconstructed from a single two-dimensional (2D) compressive sensing measurement. For less spectrally sparse scenes, however, insufficient sparse sampling and aliasing in the spatial-spectral images reduce the accuracy of the reconstructed three-dimensional (3D) spectral cube. To solve this problem, this paper proposes an improved CASSI. A bandpass filter array is mounted on the coded mask, so that the first image plane is divided into several continuous spectral subband areas. The entire 3D spectral cube can then be captured through the relative movement between the object and the instrument. The principle analysis and an imaging simulation are presented. A comparison of the peak signal-to-noise ratio (PSNR) and the information entropy of the reconstructed images for different numbers of spectral subband areas shows that the reconstruction fidelity of the 3D spectral cube improves observably as the number of subbands increases and the number of spectral channels in each subband correspondingly decreases.
As a significant research direction in the field of optical remote sensing, imaging spectrometry has many applications, including remote sensing,^{[1]} geology,^{[2]} biomedicine,^{[3]} and the ocean environment.^{[4]} In the early years, priority was given to conventional dispersive imaging spectrometry.^{[5,6]} As technology has developed, new types of imaging spectrometers have continued to appear and now play important roles in different application fields.
Recently, a novel imaging spectrometry technique called compressive coded aperture spectral imaging, also known as coded aperture snapshot spectral imaging (CASSI),^{[7–9]} has been put forward. Building on conventional imaging spectrometry, it introduces a coded mask into the light path to modulate and compress the three-dimensional (3D) spatial-spectral datacube of the scene. The 3D spatial-spectral information about a scene of interest is first encoded and captured in one snapshot on the two-dimensional (2D) detector array. Compressed sensing (CS)^{[10,11]} theory is then used to reconstruct the 3D datacube from the 2D measurement. Compared with conventional imaging spectrometry, CASSI avoids the disadvantages of low light throughput and temporal scanning, greatly reduces the quantity of original data, and eases the burden of storing and transmitting information.
According to CS theory, when the number of sampling points does not satisfy the Nyquist sampling theorem, a sufficient condition for the signal to still be reconstructed precisely is that the observed signal is sparse. The current CASSI system realizes the mapping from 3D spatial-spectral information to a 2D compressed sensing image, but sparse sampling takes place only in the spatial dimension, while the spectral dimension is aliased and compressed. As a result, the fidelity of the reconstruction is low. In this paper, the imaging process of CASSI is described and the key factors that affect the reconstruction of the 3D spatial-spectral information are analyzed. An improved scheme is presented based on the concept of CASSI, and its effectiveness is verified through computer simulation and comparison.
CASSI integrates spectral imaging, numerical computation, compressed sensing, and other disciplines. Through a combination of optical and computational methods, the field of view (FOV) is expanded from a line in the conventional imaging spectrometer to a plane, which greatly increases the light throughput. Through a combination of optics and compressed sensing, the spatial-spectral information of a 3D datacube is acquired with just a single 2D measurement of the coded and spectrally dispersed source field. The quantity of original data acquired by the instrument is greatly reduced and the stability of the instrument is remarkably improved. The basic principle of CASSI is shown in Fig. 1.
As shown in Fig. 1, the incident light propagates through the objective optics to the coded mask, which acts as a modulator. The modulated signal then passes through the collimating optics to the dispersive element, where it is spectrally dispersed. Finally, after the imaging optics, the coded and spectrally dispersed signal is received by the detector, yielding a compressed, spatially and spectrally aliased image. Assuming that the optical system is a 4f system, the system in Fig. 1 can be equivalently simplified as shown in Fig. 2.
Let T(x, y) be the transmission function printed on the coded mask. The spectral density entering the system is denoted as f(x, y; λ). After being modulated by the coded aperture, the spectral density arriving at the collimating optics is
Assume that the dispersion occurs only along the x direction and that the dispersive element has dispersion coefficient β(λ) and central wavelength λ_{c}. After propagating through the coded mask, the collimating optics, the dispersive element, and the imaging optics, the light has the following spectral density at the focal plane array (FPA):
Note that the detector array measures an intensity image of the incident object rather than the spectral density itself. The image obtained on the detector can therefore be represented as an integral of the spectral density over the wavelength range at the corresponding point,
Equation (3) shows that CASSI is also a “point-to-point” imaging technique: the information acquired by the detector is the superposition of signals from different wavelengths and spatial points. Therefore, with the same optical system, the energy efficiency of CASSI is close to that of an ordinary camera, and the detection sensitivity is higher than that of a spectral imager with a slit in the light path.
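The acquisition chain described above (mask modulation, dispersion along x, integration over wavelength) can be sketched in discrete form. This is a minimal illustration rather than the simulation code of this paper: the function name `cassi_measure`, the array layout, and the assumption that each spectral channel is displaced by exactly one detector pixel along x are all hypothetical choices.

```python
import numpy as np

def cassi_measure(cube, mask):
    """Discrete single-shot CASSI measurement (illustrative assumption).

    cube : array of shape (rows, cols, L), samples of the spectral density f
    mask : array of shape (rows, cols), coded-aperture transmission T
    Returns an intensity image of shape (rows, cols + L - 1).
    """
    rows, cols, n_channels = cube.shape
    image = np.zeros((rows, cols + n_channels - 1))
    for k in range(n_channels):
        # Modulate channel k by the mask, shift it k pixels along x,
        # and accumulate: the detector integrates over wavelength.
        image[:, k:k + cols] += mask * cube[:, :, k]
    return image

rng = np.random.default_rng(0)
cube = rng.random((8, 8, 5))                           # toy datacube
mask = rng.integers(0, 2, size=(8, 8)).astype(float)   # random binary code
image = cassi_measure(cube, mask)                      # shape (8, 12)
```

Under the one-pixel-per-channel assumption, a cube with 512 × 512 pixels and 33 channels would give a 512 × 544 image.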
The reconstruction task in CASSI is to invert the aliased 2D image to recover the 3D spatial-spectral information about the object scene. Because the number of measurements on the detector is much smaller than the number of spatial points multiplied by the number of spectral channels, the system is underdetermined. This underdetermined problem can be addressed with CS theory.
In CS theory, as long as the signal is compressible (or sparsely representable), it can be reconstructed with high probability at a sampling frequency much lower than the standard Nyquist sampling frequency. The two main principles of CS theory are sparsity and incoherence. The former pertains to the signals of interest, and the latter pertains to the sensing modality. A signal f ∈ R^{N} is K-sparse on an orthonormal basis Γ if f can be represented as a linear combination of K vectors of Γ
with K ≪ N and at most K nonzero components in α.^{[12]}
If we use an observation basis H_{M×N} (M ≪ N) that is incoherent with Γ to observe the coefficient vector, the linear transformation yields the observation vector ω_{M×1}. Then f can be reconstructed with high probability from ω_{M×1} using an optimization method. The mathematical model of the CS theory is shown in Fig. 3.
In the reconstruction of compressed sensing information, the task is essentially to solve a nonlinear optimization problem under certain conditions.^{[13]} That is,
where γ is a regularization parameter that adjusts the sparsity of the data. When the observation matrix is known, the key to solving Eq. (5) is to select an appropriate sparse projection matrix, which yields the optimal α.
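To illustrate the kind of problem involved, assume that Eq. (5) takes the usual l1-regularized least-squares form, i.e., minimizing ½‖ω − HΓα‖² + γ‖α‖₁ over α. One simple solver for that form is iterative soft thresholding (ISTA). The sketch below is not TwIST or any other algorithm cited in this paper; the matrix `A` stands for the product HΓ, and all names and problem sizes are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft thresholding, the proximal map of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(A, w, alpha, gamma):
    """Assumed objective: 0.5 * ||w - A alpha||^2 + gamma * ||alpha||_1."""
    r = A @ alpha - w
    return 0.5 * (r @ r) + gamma * np.abs(alpha).sum()

def ista(A, w, gamma, n_iter=2000):
    """Iterative soft thresholding for the assumed objective."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    alpha = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ alpha - w)         # gradient of the quadratic term
        alpha = soft_threshold(alpha - step * grad, step * gamma)
    return alpha

# Toy problem: M = 60 random observations of a K = 4 sparse vector of length N = 128.
rng = np.random.default_rng(1)
M, N, K = 60, 128, 4
A = rng.standard_normal((M, N)) / np.sqrt(M)
alpha_true = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
alpha_true[support] = rng.choice([-1.0, 1.0], size=K) * (1.0 + rng.random(K))
w = A @ alpha_true
alpha_hat = ista(A, w, gamma=0.01)
```

In this toy setting the number of observations is comfortably larger than the sparsity, which is the regime in which such solvers are expected to behave well.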
In general, the imaging chain of CASSI includes two processes: information acquisition, in which the 3D spatial-spectral information is recorded as a 2D compressed sensing image, and information reconstruction, in which the 3D spatial-spectral datacube is recovered from that 2D image.
For the process of information acquisition, the choice of the coded function (the observation matrix) is especially significant because it affects the accuracy of the reconstruction. Several studies have addressed the design and selection of the coded function, including continuous orthogonal functions, discrete S-matrix codes, and random functions.^{[14–16]} In most cases, however, the correlation length of the observed object is very small, so a random function is the optimal form of the coded function.^{[17,18]}
For the process of information reconstruction, compressed sensing reconstruction algorithms are mainly used. A variety of such algorithms exist, including matching pursuit (MP),^{[19]} orthogonal matching pursuit (OMP),^{[20]} gradient projection for sparse reconstruction (GPSR),^{[21]} and two-step iterative shrinkage/thresholding (TwIST).^{[22]} These algorithms behave well in compressed sensing imaging with 2D sparse sampling. However, their results are less convincing when 3D spectral images are both compressed and aliased. As can be seen in Ref. [8], the reconstructed images show some distortion. There are two reasons for this phenomenon. First, the solution reconstructed by the TwIST algorithm is not optimal. Second, the quantity of information acquired by the detector in a single sampling does not satisfy the condition for reconstruction with high probability. In CASSI, the latter is the main reason, because the reconstruction condition rests on the restricted isometry property (RIP),^{[23]} which relates the sparsity of the signal, the number of observations, and the size of the reconstructed data. That is,
where K is the sparsity of the signal and M is the size of the measured signal. In other words, to reconstruct a signal with sparsity K completely, the number of measurements M must meet the above condition. For instance, if the input datacube is composed of 512 × 512 pixels and 33 spectral channels, the FPA records 512 × 544 measurements. Supposing the spatial dimension and the spectral dimension are also K-sparse, an equation can be obtained as
Through iterative computation, K is found to be approximately 6. Clearly, whether in the spatial dimension or in the spectral dimension, it is extremely difficult to faithfully express the original information with only 6 sparse sampling points. Therefore, a single exposure in CASSI is insufficient to reconstruct the 3D datacube with high probability.
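The measurement count used in this example can be checked directly. The sketch assumes that each of the 33 channels is displaced by one detector pixel along the dispersion direction, which reproduces the 512 × 544 detector size quoted above; the variable names are illustrative, and the calculation does not attempt to reproduce the sparsity estimate K ≈ 6.

```python
rows, cols, channels = 512, 512, 33

unknowns = rows * cols * channels      # voxels in the 3D datacube
detector_cols = cols + channels - 1    # 512 + 33 - 1 = 544
measurements = rows * detector_cols    # FPA pixels recorded in one shot

ratio = unknowns / measurements        # unknowns per measurement
print(unknowns, measurements, round(ratio, 2))   # 8650752 278528 31.06
```

Each detector pixel therefore has to account for roughly 31 unknown datacube values, which is why strong sparsity is needed for a single-shot reconstruction.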
Furthermore, in the CASSI system, sparse sampling occurs only in the spatial dimension, whereas aliased sampling appears in the spectral dimension. Since spatial-spectral aliasing appears only in the dispersive dimension, we select a slice in the y dimension to describe sparse sampling and spatial-spectral aliasing, as shown in Fig. 4. The intensity of each pixel at the focal plane array is the aliased sum of the spatially sparse samples, expanded and shifted along the spectral dimension. For example, the intensity of e_{5} can be expressed as
Only e_{5} and t_{1}–t_{5} are known in Eq. (8). Considering the pixels e_{1}, …, e_{9}, we can construct a system of linear equations as follows:
As shown in Eq. (9), the system is underdetermined. Without any constraint, an accurate solution cannot be obtained. Thus, in CASSI, the inversion of the 3D spatial-spectral datacube involves not only reconstructing the sparsely sampled information, but also resolving the aliased spatial-spectral information. To ensure the correctness of the solution, the spectral cube should be highly sparse within the set of solutions. To improve the accuracy of the solution, the amount of sampling information should be increased.
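The underdetermination can be made concrete by building a measurement matrix for one slice. The sketch assumes five spatial points, five spectral channels, and a one-pixel shift per channel, so that nine detector pixels e_{1}, …, e_{9} are produced as in the example above. The indexing convention and the function name `slice_system` are assumptions and do not claim to match the exact form of Eq. (9).

```python
import numpy as np

def slice_system(code, n_channels):
    """Measurement matrix for one slice along the dispersion direction.

    Unknown f[n, k] (spatial point n, channel k) is stored at column
    n * n_channels + k and contributes code[n] * f[n, k] to detector
    pixel n + k (assumed one-pixel shift per channel).
    """
    n_points = len(code)
    n_pixels = n_points + n_channels - 1
    H = np.zeros((n_pixels, n_points * n_channels))
    for n in range(n_points):
        for k in range(n_channels):
            H[n + k, n * n_channels + k] = code[n]
    return H

H = slice_system(np.ones(5), 5)    # all five mask elements open
rank = np.linalg.matrix_rank(H)    # at most 9, against 25 unknowns
```

Even with every mask element open, the rank cannot exceed the nine detector pixels, so at least 16 of the 25 unknowns are left undetermined unless a sparsity constraint is imposed.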
According to the above analysis, a single exposure is insufficient to reconstruct the 3D datacube accurately. Multi-frame imaging has therefore been proposed to improve the accuracy of the solution,^{[24]} and a higher-order computational model, which makes image reconstruction less dependent on calibration, also improves the reconstruction quality of the underlying signals.^{[25]} It is worth noting that increasing the number of exposures only increases the amount of sampling information in the spatial dimension and does little to resolve the spectral aliasing. Because of the long sampling time, the advantage of snapshot imaging is lost. In addition, a moving element is needed to change the coded aperture between shots, which weakens the stability of the system.
Because the CASSI system usually works over a wide spectral range and the spectral information does not commonly meet the high sparsity condition, it is difficult to reconstruct the datacube accurately. We therefore look for an optimization scheme that meets the high sparsity condition in the set of solutions and operates well over a wide spectral range. To ensure high sparsity in the spectral dimension, we propose an improved system based on band division.
In the improved system, the mask in the first image plane is not only a coded function but also a spectral bandpass filter array. The first image plane is divided into several continuous segments. To enhance continuity in the spectral dimension, adjacent subband areas share one overlapping channel, as shown in Fig. 5. Each subband is continuous in the spectral and spatial dimensions, but does not cover the complete 3D space.
Assume that the object scene {f_{ijh}} has M × N pixels and L spectral channels, the coded mask {t_{pq}} has M × N elements, and the full band is segmented into Q subbands, each consisting of P spectral channels. Adjacent subband areas share one overlapping channel. P, Q, and L are related by
L = Q(P − 1) + 1. (11)
The measurement at the focal plane array can be expressed as
Equation (12) can also be written as
where ⌈χ⌉ rounds χ up to the nearest integer greater than or equal to χ. Scanning of the field of view (FOV) is therefore required to acquire the complete 3D datacube. Compared with multi-frame imaging by shifting the coded mask in traditional CASSI, the sparsity of each subband is clearly improved, and the multiple shots are obtained through the relative movement of the instrument and the object scene. No additional moving components are needed inside the instrument, so the system offers high stability and good adaptability to the environment.
We again take a single slice of the spectral cube as an example, and suppose that the number of spectral channels changes from five in CASSI to three, as shown in Fig. 6. As in traditional CASSI, the intensity of each pixel at the detector array is a coded linear combination of the spectral information from the respective datacube slice. Because the spectral band range is reduced, the system of linear equations can be written as
In Eq. (15), as the number of spectral channels decreases, the numbers of unknowns and equations are both reduced. The number of unknowns falls by the number of pixels in a single band multiplied by the number of removed channels, which is faster than the decrease in the number of equations. At the same time, the sparsity in the spectral dimension is improved. Therefore, the accuracy of the solution under this condition is higher than in the CASSI system.
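The counting argument can be illustrated with a short calculation. It assumes a slice with five spatial points and a one-pixel shift per channel, so that a slice with n channels yields 5 + n − 1 detector pixels; the helper name `slice_counts` and the slice size are illustrative assumptions.

```python
def slice_counts(n_points, n_channels):
    """Unknowns and equations for one slice (assumed one-pixel shift per channel)."""
    unknowns = n_points * n_channels
    equations = n_points + n_channels - 1
    return unknowns, equations

for n_channels in (5, 3):
    unknowns, equations = slice_counts(5, n_channels)
    print(n_channels, unknowns, equations, round(unknowns / equations, 2))
# 5 25 9 2.78
# 3 15 7 2.14
```

Going from five to three channels removes 5 × 2 = 10 unknowns but only 2 equations, so the number of unknowns per equation falls from about 2.78 to about 2.14.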
To verify the effectiveness of the improvement, imaging simulations are carried out. The original source is generated from freely available data.^{[26]} For computational reasons, our source is limited to one spatial region (512 × 512 pixels and 33 spectral channels). An intensity image of the original source is shown in Fig. 7. The coded function is a discrete random binary pattern of 512 × 512 elements.
According to Eq. (11), with L = 33 the number of subbands (Q) and the number of spectral channels in each subband (P) can be combined in different ways. The cases Q = 1, P = 33 (the traditional CASSI); Q = 2, P = 17; Q = 4, P = 9; and Q = 8, P = 5 are implemented in turn under the same settings. The TwIST algorithm is used to reconstruct the spectral cube from the continuous push-broom imaging results. The intensity images of the reconstructions are shown in Fig. 8.
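The admissible (Q, P) pairs can be enumerated with a short sketch. It assumes the relation L = Q(P − 1) + 1, which follows from Q subbands of P channels sharing one channel between neighbors and agrees with the four combinations listed above; the function names and the 0-based channel indexing are illustrative.

```python
def subband_layouts(n_channels):
    """All (Q, P) with n_channels = Q * (P - 1) + 1 and at least 2 channels per subband."""
    layouts = []
    for q in range(1, n_channels):
        if (n_channels - 1) % q == 0:
            layouts.append((q, (n_channels - 1) // q + 1))
    return layouts

def subband_channels(q, p):
    """0-based channel indices of each subband; neighbors share one channel."""
    return [list(range(i * (p - 1), i * (p - 1) + p)) for i in range(q)]

layouts = subband_layouts(33)
# [(1, 33), (2, 17), (4, 9), (8, 5), (16, 3), (32, 2)]
bands = subband_channels(8, 5)
# first subband covers channels 0..4, second 4..8, and the last 28..32
```

Under this assumption, L = 33 also admits Q = 16 and Q = 32, which are not among the cases simulated here.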
To quantitatively analyze the advantage of the improved system, the peak signal-to-noise ratio (PSNR) and information entropy^{[27]} are used to evaluate the fidelity of the reconstructed datacube, as shown in Table 1. Information entropy is defined as
where p(φ_{l}) is the normalized frequency of the l-th gray value φ_{l} in the image.
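Both metrics can be sketched in a few lines. The peak value used for PSNR, the logarithm base, and the gray-level quantization are not specified in the text, so the sketch assumes intensities scaled to [0, 1], 256 gray levels, and the base-2 Shannon form of the entropy; the function names are illustrative.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB (peak value is an assumption)."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

def information_entropy(image, n_levels=256):
    """Shannon entropy in bits of the gray-level histogram.

    Assumes intensities in [0, 1], quantized to n_levels gray values.
    """
    img = np.asarray(image, dtype=float)
    levels = np.clip(np.round(img * (n_levels - 1)), 0, n_levels - 1).astype(int)
    counts = np.bincount(levels.ravel(), minlength=n_levels)
    p = counts[counts > 0] / counts.sum()   # normalized frequencies
    return float(-(p * np.log2(p)).sum())
```

For a multichannel datacube, one option is to evaluate each spectral channel separately and average the results; whether that matches the averaging used for Table 1 is not stated here.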
Figure 9 shows the PSNR and entropy of the reconstructed datacubes as functions of the number of subbands. Both metrics show a clear quantitative benefit from the improved system. When the number of subbands is 8, the PSNR improvement approaches 5.3 dB and the information entropy is also closer to that of the original. By comparison, the reconstruction in Ref. [25] achieves a 4-dB improvement. To obtain multi-frame images, a system must stare at the object for a long sampling time, so it is better suited to capturing static scenes. In airborne and spaceborne applications, however, there is usually relative movement between the instrument and the object, and this is where the advantage of the system in this paper emerges.
CASSI fully demonstrates the advantages of multidisciplinary fusion and makes up for the shortcomings of conventional imaging spectrometry. In this paper, an improved system is presented, which takes full account of the degree of spatial-spectral aliasing in the image and improves the spectral sparsity within the reconstructed spectral channels. The reconstructed data show a noticeably higher PSNR, and an information entropy closer to the original, than the traditional CASSI. Since there is no moving element in the system, its stability and reliability are enhanced. This improvement makes CASSI more suitable for real applications such as aerospace remote sensing. However, the TwIST algorithm smooths the image and degrades its resolution, so the reconstructions are not truly optimal. Our future research will focus on more accurate reconstructions through an appropriate change of the reconstruction basis. Some theoretical and technical issues also remain to be addressed. First, there is no quantitative criterion for evaluating the sparsity of the information, which determines the minimal number of observations needed to reconstruct the compressed information. Second, the coded function needs to be further optimized for high reconstruction fidelity.