Selective Sampling for Compression and Effective Reconstruction

The current standard when creating a digital image is to have a grid of sensors, each sensor collecting the information that defines its associated pixel value in the digital image. For many applications, storing or transmitting the entire set of pixel values is undesirable due to its large size. In these cases the original data is processed to reduce its size, resulting in an image that is an approximation of the original. We present an alternative to this approach that uses only a subset of the existing sensors, thus collecting less initial data and avoiding the additional step of reducing the data size after sampling. For this sub-sampling approach we define an image construction method that takes the sub-samples and produces its own approximation of the fully sampled image. With this construction method and a strict constraint on the kinds of images we are allowed to capture, we show that there exists an optimal set of pixel locations to sample that minimizes the squared error between the constructed image and the fully sampled image. Since our method has less information about the image when it decides how to reduce its size than the current standard does, it results in a less accurate approximation of the original image. Even with this lowered performance, we show that using the supplied construction method and sample locations, under the specified constraint on the original image, the proposed method creates a visually acceptable approximation.

Introduction
A digital image is a picture made up entirely of shaded squares, called pixels, whose shadings are selected from a finite set. In figure 1, a zoomed-in view of the commonly used test image, Barbara, shows that the image is indeed made up of shaded pixels.

Figure 1: Barbara and sub-images
To create the Barbara image using a digital camera, there is a sensor in the camera dedicated to each pixel in the image, and each pixel's shading is stored as a number. As an example, the most zoomed-in portion in figure 1 could also be viewed as a grid of numbers whose entries represent their associated pixel shadings, as shown in figure 2.
Figure 2: Barbara sub-image numerical representation
Saving all of these pixel values is often referred to as storing the raw image. The problem with storing the raw image is that it takes up a lot of memory. To counter this issue, the raw image is encoded in such a way that its total memory footprint is greatly reduced, but this comes at the expense of the image: once this new data is decoded, the resulting image is no longer the same as the raw version, rather a good approximation of it. This is known as lossy compression. The progression of taking a full grid of samples and then performing lossy compression is the typical way in which digital images are created and then compressed for storage. This chain of events will be referred to as complete sampling followed by compression.
Before moving on, it is worth stating that many of the details of digital cameras, digital images, and image compression will be greatly simplified or ignored to avoid unnecessary complexity. For example, we will only work with grayscale images in this paper.
In this paper we will present an alternative to complete sampling followed by compression. We will propose taking a reduced set of samples; by doing so, our data will be smaller than the raw image, will not need the step of lossy compression, and will require fewer sensors. Since we will have collected only some of the pixel values, we will need a construction method that takes the pixel values we do have and estimates the ones we do not.
We will place a strict constraint on the types of images we are allowed to sample which will be described later. With the construction method and the constrained image we will then show that there is an optimal set of pixel locations to sample such that the total squared error between our constructed image and the raw image is minimized.
In chapter 2, we will present some background material on a popular lossy compression method that served as motivation for the proposed method. In chapter 3, we will show how a certain kind of image approximation technique yields visually pleasing results, and as such will be leveraged in our image construction method. Then in chapter 4 we will lay out our image constraint and our construction method, and show that some sample locations are better than others for minimizing the squared error between the would-be raw image and our constructed image. In chapter 5 we present some results of applying the presented approach, and finally, in chapter 6, we give a conclusion.

Inspiration and Background Material
The framework for our method is heavily inspired by a very popular image compression standard known as sequential Joint Photographic Experts Group (JPEG). For context, and to lay the groundwork for what is to come, we will step through a high-level description of JPEG and some of its components. The JPEG standard for compression has five major components, three of which we will utilize in some respect in our proposed method. First, JPEG partitions a raw image into sub-images; the rest of the JPEG processes operate on each sub-image independently of the others. A common sub-image size is 8 pixels by 8 pixels. In figure 3 we show an example of a 16 by 16 pixel image being partitioned into 8 by 8 sub-images. Secondly, JPEG takes the 2D Discrete Cosine Transform (DCT2) of each sub-image, converting its pixel values into a grid of DCT2 coefficients. The third step uniquely scales down each DCT2 coefficient, followed by the fourth step of rounding each scaled coefficient toward zero. These two steps are of little importance in relation to this paper, but are necessary to mention as steps in the JPEG process. The fifth step is the ordering of the rounded, scaled coefficients. JPEG uses the zigzag ordering shown in figure 6; this ordering tends to create long runs of zeros at the tail end of the indexing, which is desirable for further compression.
Figure 6: Zigzag ordering of the rounded scaled DCT2 coefficients
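To make these five steps concrete, the following Python sketch (a simplified illustration, not the actual JPEG implementation) encodes a single 8 by 8 sub-image: it applies the DCT2 via the orthonormal DCT1 matrix, scales and truncates the coefficients using a single hypothetical step size q in place of JPEG's full quantization table, and reads the result out in the zigzag ordering of figure 6. The helpers dct_matrix and zigzag_indices are reused in later sketches.

```python
import numpy as np

N = 8

def dct_matrix(n=N):
    # Rows are the orthonormal 1D DCT-II basis vectors v_1 .. v_n.
    k = np.arange(n)[:, None]          # frequency index
    m = np.arange(n)[None, :]          # sample index
    D = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    D[0] *= np.sqrt(1 / n)
    D[1:] *= np.sqrt(2 / n)
    return D

def zigzag_indices(n=N):
    # (row, col) pairs in the JPEG zigzag order of figure 6.
    order = []
    for d in range(2 * n - 1):
        rng = range(min(d, n - 1), -1, -1) if d % 2 == 0 else range(d + 1)
        for i in rng:
            j = d - i
            if i < n and j < n:
                order.append((i, j))
    return order

def jpeg_encode_block(block, q=16.0):
    # Steps 2-5 for one 8x8 sub-image: DCT2, scale, round toward zero,
    # zigzag-order. q is a single illustrative step size; real JPEG
    # uses a full 8x8 quantization table.
    D = dct_matrix()
    coeffs = D @ block @ D.T                      # DCT2 (step 2)
    quantized = np.trunc(coeffs / q)              # steps 3 and 4 (round toward zero)
    return np.array([quantized[i, j] for i, j in zigzag_indices()])
```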
Once every sub-image has undergone JPEG compression, it has been greatly reduced in size and is easy to store or transmit. At some point we will wish to view the image and will then need to decompress it. JPEG decompression undoes every step in reverse order, with the exception of the round-toward-zero step, as it is not invertible. The result is an image; though this image is not exactly the original, it is usually a very close approximation. Undoing the DCT2 is performed by the 2D Inverse Discrete Cosine Transform (IDCT2). The IDCT2 scales each image in the DCT2 basis by its associated coefficient and then sums all the scaled images together with a pixel-wise summing operation to create an image.
The DCT2 and the IDCT2 are at the core of our proposed method, so it is worth taking a finer look at these two operations. As we will show the images in the DCT2 basis are built of vectors from the 1D Discrete Cosine Transform (DCT1) basis and therefore the DCT1 is a good starting point for a proper introduction to the DCT2.
In the DCT1 basis there is a fair amount of number repetition: out of the 64 numbers that make up its eight vectors, only 14 are unique, with 7 of the 14 being the negatives of the other 7. One of the more important properties of the DCT1 basis is that it is orthonormal. The DCT2 is the two-dimensional version of the DCT1 and therefore has many similar properties. Instead of a basis of vectors as in the DCT1 case, the DCT2 can be intuitively visualized as a basis of images. All 64 of the 8 by 8 images in the basis, denoted $B(1), B(2), \ldots, B(64)$, are shown in figure 5, with the numbering referring to each image's location in the zigzag ordering scheme shown in figure 6.
Each image in the DCT2 basis is the outer product of vectors from the DCT1 basis, $v_i^T v_j$. For example, the (1, 2) indexed 8 by 8 matrix in figure 5, $B(2)$, is the outer product of $v_1$ and $v_2$ from figure 7, and is depicted in figure 8. To show that the images in the DCT2 basis are orthonormal under element-wise multiplication (.*) followed by summation, we use their DCT1 outer-product definition and the orthonormality of the DCT1 basis: $v_i v_j^T = 1$ if $i = j$ and zero otherwise.
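A minimal numerical check of this orthonormality claim, reusing the dct_matrix helper from the earlier sketch (the rows of D play the role of the DCT1 vectors $v_i$):

```python
# Build all 64 DCT2 basis images as outer products of DCT1 vectors and
# verify they are orthonormal under element-wise multiplication (.*)
# followed by summation.
D = dct_matrix()
B = [np.outer(D[i], D[j]) for i in range(N) for j in range(N)]

for a in range(64):
    for b in range(64):
        inner = np.sum(B[a] * B[b])          # .* then sum
        expected = 1.0 if a == b else 0.0
        assert abs(inner - expected) < 1e-12
```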
The IDCT2 by its name undoes what the DCT2 performs. The IDCT2 takes the coefficients and scales each image in the basis accordingly and then sums them together to get the raw 8 by 8 image, R. With this definition of the IDCT2 we can represent it as a system of linear equations.
In short, the IDCT2 operation can be performed by a matrix multiplication if the coefficients and pixel values are stacked in vectors rather than grids. Calling the IDCT2 matrix $M$, and representing the raw image $R$ and the coefficient grid $C$ as column vectors $r$ and $c$ respectively, this yields
$$r = Mc.$$
When referring to the entries of $r$ and $c$, their subscripts will coincide with the numbering associated with the zigzag indexing. For example, $R_{2,2}$ is equal to $r_5$.
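Continuing the Python sketch, the following builds the IDCT2 matrix M by stacking each zigzag-flattened basis image as a column, and checks that r = Mc agrees with the conventional IDCT2 (the names M, r, and c follow the text; D and zigzag_indices come from the earlier sketches):

```python
zz = zigzag_indices()

# Column k of M is the k-th basis image B(k), itself flattened in
# zigzag order, so both r and c are zigzag-stacked.
M = np.zeros((64, 64))
for k, (i, j) in enumerate(zz):
    Bk = np.outer(D[i], D[j])
    M[:, k] = np.array([Bk[p, q] for p, q in zz])

C = np.random.randn(N, N)                 # arbitrary DCT2 coefficient grid
R = D.T @ C @ D                           # conventional IDCT2
c = np.array([C[i, j] for i, j in zz])
r = M @ c
assert np.allclose(r, [R[i, j] for i, j in zz])
# M is orthonormal, so errors carry over: ||M(c - c')||^2 = ||c - c'||^2.
```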
Similarly, when referring to elements of $B$, it might be more consistent to reference their entries by the zigzag ordering as well. Notice that the IDCT2 matrix is orthonormal since the DCT2 basis is orthonormal. With this in hand we can show that the squared error in an estimate of the DCT2 coefficients is equal to the squared error in the image reproduced from those estimated coefficients: since $M$ is orthonormal, $\|Mc - M\hat{c}\|^2 = \|c - \hat{c}\|^2$. We will use this later when we want to show that our method minimizes the squared error in our image construction; instead of showing this directly, we will show that we minimize the squared error in the DCT2 coefficient estimates.

Image Approximation
If you have ever viewed a web page containing an image over a slow internet connection, you may have first observed a very blocky image that becomes more detailed as time passes. If you are familiar with this effect, chances are you have viewed a progressive JPEG image [1][2]. The idea behind progressive JPEG is that an approximation of the JPEG image can be sent more quickly than the entire JPEG image, providing the user with enough information about the image to decide whether to wait for the higher-quality version. How this is done is quite simple. The JPEG coefficients of each sub-image, the DCT2 coefficients quantized and rounded, are not sent all at once. Rather, disjoint coefficient sets for each sub-image are sent, and the user recreates each sub-image using all of its received JPEG coefficients while assuming the ones not yet received are zero. Eventually, when the full set of coefficients has been received, the complete JPEG image can be created. The question then becomes in what order the coefficients should be sent such that each approximate sub-image is as good as possible. If we sent the JPEG coefficients of all sub-images in descending order of the magnitude of their associated DCT2 coefficients, this would minimize the squared error between the approximate sub-images and the JPEG sub-images.
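As a rough illustration of the progressive idea (ignoring quantization and transmission details, and reusing D and zigzag_indices from the earlier sketches), a sub-image can be previewed from only its first k zigzag-ordered DCT2 coefficients:

```python
# Progressive-style preview: rebuild a sub-image from only its first
# k zigzag-ordered DCT2 coefficients, treating the coefficients not
# yet received as zero.
def preview(block, k):
    coeffs = D @ block @ D.T                 # DCT2 of the sub-image
    kept = np.zeros_like(coeffs)
    for i, j in zigzag_indices()[:k]:
        kept[i, j] = coeffs[i, j]            # first k "received" coefficients
    return D.T @ kept @ D                    # IDCT2 of the partial set
```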
The problem with sending, for each sub-image, the JPEG coefficients with the largest associated DCT2 values first is that the ordering cannot be assumed and would need to be coded into the messages. This would enlarge the data, which we had already decided needed to be reduced for the sake of timeliness.
The compromise was to send the coefficients in the zigzag pattern, as based on empirical data the magnitudes tend to decrease with this indexing [3][4][5]. As a quick check of this characteristic for the DCT2, we took the mean energy of all the DCT2 coefficients of every 8 by 8 sub-image over a set of Portable Gray Map (.pgm) images, shown for each coefficient in figure 9. The set of images used for this test consists of all the images on a Purdue University test image website [6]. For an arbitrary image there is no way to tell what ordering of the DCT2 coefficients would guarantee that their magnitudes monotonically decrease. With this in mind, one ordering that tends, but is not guaranteed, to decrease over its indexing is the zigzag ordering.
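A sketch of this experiment (not the exact code used for figure 9; it reuses D and zigzag_indices from above and assumes a grayscale array img with dimensions divisible by 8):

```python
# Mean energy of each zigzag-indexed DCT2 coefficient over all 8x8
# sub-images of a grayscale image.
def mean_coefficient_energy(img):
    zz = zigzag_indices()
    energy = np.zeros(64)
    count = 0
    for r0 in range(0, img.shape[0], 8):
        for c0 in range(0, img.shape[1], 8):
            coeffs = D @ img[r0:r0 + 8, c0:c0 + 8] @ D.T
            energy += np.array([coeffs[i, j] ** 2 for i, j in zz])
            count += 1
    return energy / count
```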
In the next chapter we will assume that a sub-image is made up of the first $s$ DCT2 coefficients, as defined by the zigzag ordering.

Image Constraint, Construction Method, and Optimal Sample Locations
As previously mentioned, we will place a strong constraint on the kind of image we are allowed to capture. The constraint: if the raw image were taken, each 8 by 8 sub-image would be $s$-sparse in the DCT2 domain, meaning at most $s$ of the DCT2 coefficients are nonzero. Further, the indices of the $s$ potentially nonzero coefficients are known and the same for each sub-image. We will assume the $s$ coefficients that might not be zero are the first $s$ coefficients as defined by the zigzag ordering. Though we are making this assumption, it is worth mentioning that with slight modification the presented method will work with any predefined set of $s$ potentially nonzero coefficients. All subsequent references to the ordering of the DCT2 coefficients, or the images in the DCT2 basis, will imply the zigzag ordering.
Now that we have our image constraint we can present our construction method. Remember, we are proposing taking a reduced set of samples, thus the need for a method to estimate pixel values at locations we did not sample. The construction method will operate on each sub-image region independently. The number of samples we will be permitted to take for each sub-image region is $n$, which is equal to $s - 1$, giving us a compression ratio of 64:$n$. For simplicity, the construction technique can be broken into two components: the first is a DCT2 coefficient estimator, and the latter, the constructor, takes the IDCT2 of the estimated coefficients and creates a sub-image. We will present the method with an example running parallel to the general form.
Let us assume $s$ equals 3. This is something we know a priori, so we know the raw image, $R$, if it were captured, would be a linear combination of the first three images, $B(1)$, $B(2)$, $B(3)$, from the DCT2 basis. This can be seen in figure 13, with the note that the images on the left side of the equality share the same color scale while the raw image on the right side of the equality has a different color scale; this was done to show the detail in each image. We could also write what figure 13 shows in vector form as
$$r = c_1 b(1) + c_2 b(2) + c_3 b(3),$$
or, in the general case, as
$$r = \sum_{k=1}^{s} c_k b(k),$$
where $b(k)$ denotes the $k$th column of $M$. Since we are only allowed 2, or $n$, samples, we cannot solve for all three coefficients; instead we will assume $c_3$, or $c_{n+1}$, is zero and estimate the first two, or $n$. By assuming $c_3$, or $c_{n+1}$, is zero, we can now represent $r_{19}$ and $r_{41}$ as
$$\begin{bmatrix} r_{19} \\ r_{41} \end{bmatrix} = \begin{bmatrix} M_{19,1} & M_{19,2} \\ M_{41,1} & M_{41,2} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix},$$
or, in general form, with sampled locations $\ell_1, \ldots, \ell_n$, $A_{p,q} = M_{\ell_p, q}$, and $r_S = [r_{\ell_1}, \ldots, r_{\ell_n}]^T$,
$$r_S = A \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix},$$
yielding our estimator for $s$ equal 3 as
$$\begin{bmatrix} \hat{c}_1 \\ \hat{c}_2 \end{bmatrix} = \begin{bmatrix} M_{19,1} & M_{19,2} \\ M_{41,1} & M_{41,2} \end{bmatrix}^{-1} \begin{bmatrix} r_{19} \\ r_{41} \end{bmatrix},$$
or $\hat{c} = A^{-1} r_S$ in general. It is worth noting that we must pick locations to sample such that the matrix $A$ is invertible, or our estimator yields no result. In addition, had we picked different locations to sample, we would have ended up with a different matrix and a different sample vector, and as such completely different coefficient estimates. Finally, for a different value of $s$, our vectors and matrix would scale in size accordingly. Now that we have finished the estimation portion of the construction technique, all that is left to do is zero-fill the rest of the coefficients and take the IDCT2, giving us our approximate image.
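A compact sketch of the full estimator-plus-constructor for general $s$, reusing the matrix M built earlier; note that locations holds 0-based zigzag-stacked pixel indices, whereas the $r_{19}$, $r_{41}$ of the example are 1-based:

```python
# Construction method: sample n = s - 1 pixels, assume c_{n+1} = 0,
# solve for the first n coefficients, zero-fill, then take the IDCT2.
def construct(samples, locations, s):
    n = s - 1
    A = M[np.ix_(list(locations), range(n))]  # sampled rows, first n columns
    c_hat = np.linalg.solve(A, samples)       # estimator; A must be invertible
    c_full = np.zeros(64)
    c_full[:n] = c_hat                        # zero-fill remaining coefficients
    return M @ c_full                         # IDCT2, zigzag-stacked sub-image
```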
The construction method was not designed in haste. The idea that we assume the last indexed coefficient is zero rather than any of the others was addressed at the end of the last chapter. As well the idea that we do not need all of an image's coefficients to accurately depict an image's important information was the main topic of the last chapter. With this in mind, if our estimator is able to get accurate estimates of the first n coefficients for each sub-image the complete image would seem to be a visually acceptable approximation.
Finally we arrive at the question we wanted to address: under our image constraint and using the presented construction method, are there better locations to sample such that the error in our constructed image is minimized in a least-squares sense? The answer is yes. As we addressed at the end of chapter 2, if we estimate the DCT2 coefficients $\hat{c}$ in column-vector form and recreate a sub-image by taking the IDCT2, the squared error in the estimated image is equal to the squared error in the coefficient estimate. Letting $b = [M_{\ell_1, n+1}, \ldots, M_{\ell_n, n+1}]^T$ denote the entries of column $n+1$ of $M$ at the sampled locations, so that $r_S = A c_{1:n} + c_{n+1} b$ and therefore $\hat{c}_{1:n} = c_{1:n} + c_{n+1} A^{-1} b$, our least-squares coefficient error is
$$\|c - \hat{c}\|^2 = c_{n+1}^2 \left( 1 + \|A^{-1} b\|^2 \right). \quad (6)$$
Realizing that we cannot control the value of $c_{n+1}^2$, the only part of equation 6 that we can control is
$$\|A^{-1} b\|^2. \quad (7)$$
All the values in equation 7 depend on which pixel locations we choose to sample, with some sets of sample locations not even being valid if the matrix $A$ is not invertible. To minimize the value of equation 7 we currently must check all combinations of $n$ pixel locations and see which combination yields the smallest squared error. For some values of $n$ there are multiple combinations that minimize the squared error, as will be shown in the next chapter.
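The location-dependent factor of equation 7 can be computed directly; a sketch reusing M from above (locations are again 0-based zigzag-stacked indices):

```python
# The location-dependent factor ||A^{-1} b||^2 of equation 7 for one
# candidate set of n = s - 1 sample locations.
def location_error(locations, s):
    n = s - 1
    A = M[np.ix_(list(locations), range(n))]  # sampled rows, first n columns
    b = M[list(locations), n]                 # column n + 1 at the sampled rows
    return float(np.sum(np.linalg.solve(A, b) ** 2))
```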

Selective Sampling Results
To find the minimum squared error for our construction method, given our image constraint for a given $s$ value, we currently look at all $\binom{64}{s-1}$ combinations of pixel locations and, if a combination is valid (it creates an invertible matrix), we calculate the value of equation 7. Unfortunately, for an $s$ value greater than 8 this becomes extremely computationally intensive, and as such we have only been able to obtain results up to and including $s$ equals 8.
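A sketch of this brute-force search (building on location_error above; the tolerance used for counting ties is an assumption of ours):

```python
from itertools import combinations

# Search all C(64, s - 1) candidate location sets, keeping every set
# that ties (within a tolerance) for the minimum of equation 7;
# singular, i.e. invalid, sets are skipped.
def optimal_locations(s, tol=1e-9):
    best, argbest = np.inf, []
    for locs in combinations(range(64), s - 1):
        try:
            err = location_error(locs, s)
        except np.linalg.LinAlgError:
            continue                      # matrix A not invertible
        if err < best - tol:
            best, argbest = err, [locs]
        elif err <= best + tol:
            argbest.append(locs)
    return best, argbest
```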
Performing this brute-force search yields multiple sets of sample locations that tie for the title of being optimal. Below, in figures 14 through 20, we show all combinations of optimal sample locations for $s$ equals 2 to 8. It is interesting that for each optimal combination, its vertical and horizontal reflections are also optimal sample combinations. This is not proven, but rather an observation for $s$ from 2 to 8.
Since the Barbara image does not meet our image constraint for each $s$ value, we create versions of the Barbara image that do. We do this by taking each sub-image's DCT2, zeroing out the necessary coefficient values for a given $s$ value, then reconstructing each sub-image by taking the IDCT2. In figures 21 to 27 we show the constrained image followed by the one produced by one of the optimal sample sets for a given $s$ value. In figures 28 to 34 we show the same results zoomed in on the upper left sub-image. In these figures it can be observed that the sub-sampled reconstructed image using the optimal sample locations is visually similar to the original constrained image. This is the result we stated we would show.
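For reference, a sketch of how such a constrained image can be produced (reusing D and zigzag_indices from above):

```python
# Constrain an image: zero all but the first s zigzag-ordered DCT2
# coefficients of every 8x8 sub-image, then take the IDCT2 of each.
def constrain(img, s):
    out = img.astype(float).copy()
    mask = np.zeros((8, 8))
    for i, j in zigzag_indices()[:s]:
        mask[i, j] = 1.0                      # coefficients allowed to survive
    for r0 in range(0, img.shape[0], 8):
        for c0 in range(0, img.shape[1], 8):
            coeffs = D @ out[r0:r0 + 8, c0:c0 + 8] @ D.T
            out[r0:r0 + 8, c0:c0 + 8] = D.T @ (coeffs * mask) @ D
    return out
```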

Selective Sampling Conclusion
To recap, a brief introduction of the current standard for lossy image compression, complete sampling followed by compression, was given. It was stated that this approach requires a sensor for each pixel location in an image and needs further processing after all the sensor data is collected. Before going into any detail we stated that we would propose a compression method that would not require a sensor for every pixel in the image and by collecting less initial data would not be burdened with the need for compression processing on the sampled data.
As a segue into a more detailed understanding of the current standard and our proposed alternative, we stepped through a simplified version of a popular lossy image compression method, JPEG. While going through some of the steps in the JPEG processing, we also introduced some of the ideas we would leverage in our proposed method. The JPEG concepts of operating on sub-images independently, as well as using the DCT2 and IDCT2, were all exploited in the proposed alternative.
The last bit of information presented before getting into the proposed approach was that of image approximation; more specifically, how a subset of a sub-image's DCT2 coefficients can be used to produce good image approximations. The reason for this was to put our approach on good footing, as we do not attempt to estimate all of a sub-image's DCT2 coefficients. Furthermore, the image approximation method lent insight into which coefficients we decided not to attempt to estimate.
In chapter 4 we laid out our image construction method, the constraint on the kinds of images we were allowed to capture, and how many samples were allowed per sub-image. Once the rules of our problem were defined, we walked through the matrix algebra of the squared error until we had an equation that was the product of a positive number we could not control and another positive number which was dependent on the sample locations. This isolated the squared error due to the pixel locations chosen for sampling. This is the key point of this paper: selectively sampling certain locations can decrease the error in our image approximation. Furthermore, we found the optimal sample combinations for $s$ equals 2 to 8.
Unfortunately, the current method for finding the set of optimal sample combinations is a brute-force search, which imposed limitations on the depth of our results. The observation of symmetry among optimal combinations, and the fact that the DCT2 basis consists of only 56 unique numbers with only 28 unique magnitudes, could provide insight into a more direct way of finding the optimal combinations, or possibly reduce the size of the exhaustive search.
Last, we acknowledge that the image constraint was extremely tight but the benefit of this was to have a definitive result on which to build before moving on to a weaker constraint. This was accomplished.