Video Steganalysis for Digital Forensics Investigation

Increasing use of steganography in espionage and exfiltration of company secrets means that it is important to find ways to detect such activity. Because the amount of data being transferred is also growing, channels that can hide larger amounts of data are going to become increasingly attractive. This research will focus on detecting hidden data in one such medium, namely MPEG video.


Introduction
Steganography is the process of hiding information in "plain sight". Secret communication is possible by modifying a cover medium to embed data. An analog example might be adding microscopic dots to an image or document. Digital steganography manipulates bits of data to embed the secret message with minimal impact on the interpretation of the original data.
Video steganography is an emerging sub-field of digital steganography. Most digital steganographic methods have relied on exploiting file formats to hide information in parts of files either not parsed or parts invisible to the user in normal processing and use. Some more advanced methods hide data in the noise produced by lossy compression formats, such as JPEG images or MP3 audio files. For example, compare the two flowers in Figure 1 and try to decide which is stegged.
Many of the JPEG image steganographic techniques carry over to MPEG video; however, the steganalysis for video can be different because of the increase in the volume of data. Given the relatively high capacity of video, it is likely that it will be the next most popular carrier for discreetly transferring large amounts of data. The adoption rate of steganography in general, and video steganography in particular, is not well known; however, there have been recent accounts in the news of law enforcement finding evidence of its use on suspect machines. Since steganography scanning tools are not yet mature enough for regular use, most of the evidence comes from the steganographic tools themselves being found installed.
Videos provide fairly high bandwidth for data embedding and are frequently posted and transferred on-line. The goal of steganalysis is to reduce the effective bit rate of data embedding in video by reliably detecting the higher embedding rates.
This research will focus on developing methods to detect steganography in digital video. The highest capacity channel in video is changing the Discrete Cosine Transform (DCT) coefficients that are used to encode frames of video. This method is easy to implement based on adaptations of existing JPEG steganographic tools to MPEG encoders, and therefore is likely to be the most prolific type.
Chapter 2 provides background on MPEG compression, digital steganography and digital steganalysis. Chapter 3 presents the methodology that this project developed for video steganalysis. The results of this method are found in Chapter 4.
Finally, concluding remarks and a discussion of the future direction of video steganalysis are presented in Chapter 5.

CHAPTER 2 Background
In order to understand the problem space, it is necessary to first describe how digital steganography works and how it is applied during compression of MPEG video. The following presents only the relevant parts of the MPEG specification.
An examination of how data can be hidden, and how others have sought to find it, will follow.

MPEG Compression
Almost all lossy compression techniques for perceptual media exploit the fact that human senses do not distinguish small changes in high-frequency information.
Visually, high-frequency information manifests as high-detail areas of an image. In raw video every pixel is represented by 3 bytes, either separated into red, green and blue (RGB) or, more likely, a luma and two chroma components, called YUV or YCbCr. The human eye is less sensitive to color than to intensity, so MPEG always encodes using YUV with the chroma components downsampled by a factor of two horizontally and vertically. To aid the removal of high-frequency data, MPEG uses the Discrete Cosine Transform (DCT) to convert the raw (spatial) data into the frequency domain.
It does this in 8×8 blocks of pixels. The upper left coefficient is called the DC coefficient and represents an average intensity across the block, and the rest are AC coefficients. The DCT coefficients that come from the conversion are then divided (quantized) by amounts weighted by their importance. For example, the lowest frequency, the DC coefficient, is always divided by 8, whereas the highest frequency component is divided by 83 in the default quantization matrix [1]. The default quantization matrix is designed to give the best trade-off in compression and image quality.
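The transform-and-quantize step can be sketched in Python. This is an illustrative sketch, not the MPEG reference implementation: the quantization matrix below is made up (only the DC divisor of 8 matches the specification; the real intra matrix tops out at 83), and the DCT is built directly from its definition.

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis matrix, built from the definition.
    m = np.zeros((n, n))
    for k in range(n):
        for i in range(n):
            m[k, i] = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    m[0, :] *= 1 / np.sqrt(n)
    m[1:, :] *= np.sqrt(2 / n)
    return m

def quantize_block(pixels, quant):
    # Forward 2-D DCT of one 8x8 block, then divide by the
    # quantization matrix and round, as MPEG intra coding does.
    d = dct_matrix()
    coeffs = d @ (pixels - 128.0) @ d.T   # level-shift, then transform
    return np.round(coeffs / quant).astype(int)

# Illustrative (hypothetical) quantization matrix: DC divided by 8,
# divisors growing toward the high frequencies.
quant = 8 + 2 * np.add.outer(np.arange(8), np.arange(8))

block = np.full((8, 8), 140.0)   # a flat, mid-gray 8x8 block
q = quantize_block(block, quant)
print(q[0, 0], np.count_nonzero(q))   # only the DC coefficient survives
```

For a flat block all AC coefficients quantize to zero, which is exactly the behavior the variable-length coding later exploits for compression.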
MPEG files typically have three types of video frames, called Intra-coded, Predictive-coded, and Bidirectionally predictive-coded, usually abbreviated to I, P, and B frames. I-frames are entirely encoded using DCT values as discussed above. P-frames and B-frames are coded with reference to other I- or P-frames, as shown in the top of Figure 2. These references are in the form of motion vectors that indicate blocks to copy, as shown in the bottom of Figure 2. This research looks at changes to the visual data, not the metadata.
For the purposes of this project, steganography and watermarking work interchangeably, however watermarking is usually more focused on perceptual invisibility and less on statistical invisibility. That is, a watermark should not change the appearance of the video, however attempting to remove the watermark should destroy the quality of the video. Detection of the existence of a watermark is usually not considered as a criterion. Most watermarks embed data that resolves to a black and white image, however that is not required.

Images
While digital steganography can take many forms, it has achieved most of its popularity in images. This is mostly because of the prevalence of images and their relatively high capacity. Early forms of image steganography took the form of changing the least significant bits of pixel color values [3,4]. Pixel-based least-significant-bit (LSB) steganography is unreliable when images are compressed using lossy JPEG compression, which is the most common image file type.
Steganography in JPEG images is easy to implement using libraries that give direct access to the quantized DCT coefficients. Several implementations of this exist ( [5,6,7,8]). Changing these values (usually by LSB bit-flipping) and then storing them back produces steganography that is much harder to detect than changing pixel values directly, even though this change will affect a larger visual block (8×8 pixels).
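A minimal sketch of this kind of embedding, assuming direct access to a list of quantized AC coefficients. The skip rule (coefficients with absolute value of 1 or less are left alone) mirrors the naive embedder described in Chapter 4; real tools such as [5,6,7,8] differ in path selection and coding details.

```python
def embed_lsb(coeffs, message_bits):
    """Sketch: LSB embedding in quantized AC DCT coefficients.

    Coefficients with abs(v) <= 1 are skipped so that zeros (and the
    shrinkage problem around +/-1) are avoided; real implementations
    also permute the embedding path.
    """
    out = list(coeffs)
    bits = iter(message_bits)
    for i, v in enumerate(out):
        if abs(v) <= 1:
            continue
        try:
            b = next(bits)
        except StopIteration:
            break                      # message fully embedded
        mag = (abs(v) & ~1) | b        # replace the LSB of the magnitude
        out[i] = mag if v > 0 else -mag
    return out

# 3 -> 2 (bit 0), -2 -> -3 (bit 1), 4 -> 5 (bit 1); 0 and 1 are skipped
print(embed_lsb([0, 3, -2, 1, 4], [0, 1, 1]))
```

Extraction is the mirror image: walk the same coefficient path and read the LSB of each usable coefficient.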

Video
There are many stages at which data may be embedded into compressed digital video. The first stage would work in the spatial domain directly, changing the actual color values [9,10,11,12,13]. These techniques usually do not survive compression, so they require error correcting codes to ensure data fidelity. The second stage would be during frequency transform. This can be done with either Discrete Cosine Transform [14,15,16,17,18,19,20] or Discrete Wavelet Transform [21]. There is also an example in [20] of changing the value of motion vectors to embed data, although this will generally have a lower bit-rate. Finally, in [22], they change Huffman code pairs for AC coefficients in the bit stream directly. This has the advantage of being fast because the video is not fully decoded to images.
Watermarking techniques such as [9] are designed to withstand noise attacks by embedding data throughout several bit planes of the image. Bit planes are single-bit cross-sections of an image which provide different levels of detail. Most steganographic techniques will only modify the least significant bit, or the lowest-numbered bit plane. By modifying higher bit planes (up to the fourth bit plane), [9] ensures that trying to destroy the watermark also destroys the video. As in most watermarking techniques, data fidelity is not significant as long as strong correlation with the known watermark signal is found.
In [10] the goal is to avoid detection of the watermark, as well as to provide robustness. By embedding the watermark in only several small regions within a frame, and consistently identifying those regions across the frames in a scene, they provide resistance against cropping and row/column deletion.
Although the Discrete Wavelet Transform (DWT) is similar to the DCT, since it is also a type of frequency transform, data embedded using DWT must go back into the spatial domain before the video compression starts. Because it does not necessarily use the same block size as the DCT, data embedded via DWT can be more robust than either DCT or spatial embeddings. This is shown in [13] by performing watermarking in several frequency bands and showing the results of various attacks. The other pre-compression DWT example in [12] uses the high frequency bands to determine where embedding in the low frequency bands will have the least impact.
Although not widely used, Motion-JPEG2000 uses DWT for video compression. In [21], they look at changing the bits of the DWT coefficients based on a complexity metric.
Since MPEGs also contain motion vector information, which provides subpixel resolution in block copies, there is room for some data to be embedded there without disturbing the image quality. This is the technique used in [20]. Since only P and B frames contain motion vectors, they also use DCT embedding in the I frame for control information needed by their algorithm.
By analyzing a given MPEG video, [22] finds variable-length-code (VLC) pairs, used in the lossless compression phase, that do not appear in that video. By using a "key" of these unused pairs, and modifying existing VLCs in the compressed stream, data can be embedded easily. This method has a very low bit-rate and can sometimes have a large key size.
In the category this project is focusing on, that of DCT embedding, there are several watermarking techniques, but no steganographic implementations. In [16] they embed data in the mid-range frequency coefficients where the complexity of the data embedded is less than the block it is being inserted into. Removing high-frequency coefficients to produce a pattern across blocks is employed in [17].
Adapting the embedding rate based on quality and frame type is done in [18]. An MPEG-4 based implementation is offered in [14] where testing shows that data embedding is visually noticeable in low bit rate videos.

Digital Steganalysis
This section is broken into two pieces. The first explains some of the work done with static images, which has direct applicability to video in that they use similar encodings. The second piece explains the work that has been done with video.

Images
While pixel-based LSB embedding is easy to implement and visually undetectable if there are no solid color areas in the image, it is also fairly easy to detect by simple statistics [23].
Most current work looks at DCT-embedded steganography. Early attempts to detect this type of steganography used techniques such as image quality metrics [2] and wavelets [24,25]. There have also been many approaches to detecting DCT-based steganography [26,27,28]. However, in [29] Fridrich achieves the best performance to date, with accuracies in the high nineties for some steganographic programs at 25% of the maximum embedding rate. There are two reasons this technique works so well. The first is the use of a reference image for "calibration". In "blind" steganalysis, the original non-stegged image is not available for comparison. By slightly cropping the suspect image, it is possible to create an image that is similar enough to make a good approximation; the approximation is then used as a baseline for the statistics from the original. The second reason for the increased accuracy comes from focusing on exactly the data that is being changed. It looks specifically at the distribution of DCT values both between blocks and within blocks, whereas the other techniques were looking at less specific metrics in hopes of detecting many types of steganography. It is this set of features that will form the basis of the work here, as the focus is on DCT-encoded embeddings in MPEG.

Previous work
The URI research group has done previous work [30] in the area of image steganalysis by evaluating several feature sets, starting with Farid's work with wavelets [31] and ending with Fridrich's work in [29]. Fridrich's feature sets showed the best accuracy and were then evaluated under varying conditions. In particular, whereas previously models were built with images at one quality level and then tested only against images of the same quality, our group evaluated fewer models that each spanned a range of qualities. Ranges of 10% image quality gave sufficiently good results, and images of quality less than 50% were easily classified by the 50%-59% range model. Thus only 5 models need to be built, instead of 100. In addition, a model trained with a low embedding rate did well in detecting images with higher embedding rates, which gives a further reduction in the number of models necessary.

Video
For video steganalysis, an early but comprehensive treatment is from Budhia [32]. This work looked at detecting data embedded using additive white Gaussian noise in the spatial domain. By using data from surrounding frames, which they call collusion, an estimate of the current frame is obtained. Several different collusion approaches are tried, including simple linear averaging, weighted averaging, and block-based reconstruction of reference frames. Block-based reconstruction searches for similar blocks in nearby frames and copies them into a new reference frame. The difference between this reference frame and the original is then used to estimate the embedded data. Their features use statistics such as kurtosis, entropy, and the 25th percentile over this estimate. They mention that their technique can apply to the DCT domain and test it using two different methods of embedding, though without considering the encoding process (for example, P/B frames).
A performance enhancement on [32] is proposed by Jainsky in MoViSteg [33], which also uses motion estimation to reconstruct a frame. They employ an asymptotic relative efficiency based detector, which "is efficient for large samples and weak signals" [33]. The detector uses an adaptive threshold based on statistics from sample frames in the video. While they do not give overall accuracy, they report a 60% true positive rate at a 10% false positive rate at 75 dB Peak Signal-to-Noise Ratio (PSNR).
Most recently in [34], B. and F. Liu use collusion with a window of frames limited by a predetermined correlation threshold. They use a simple linear collusion that averages the surrounding frames. While they obtain good results (from 88-100% at 40% embedding, depending on the embedding scheme), the watermarking techniques they test against make very distinctive changes in the DCT values used. Two of them increase the range of values, which will show up in the global histogram. Another simply removes several DCT values in select blocks, which would cause noise in the dual histogram.

Tools
Most of the implementation of this project used MATLAB [35] for manipulation of frame data. Because of its natural ability to deal with multi-dimensional data, manipulating and gathering data is easily expressed in its programming language. For example, sum(I(:) ~= 0) expresses the number of non-zero elements in an array, regardless of the number of dimensions. Additionally, some of the existing image steganography and steganalysis work was available in MATLAB form.
MPEG encoder [36] and decoder [37] libraries were also available as C extensions to MATLAB.
The frame classifiers used the Linear Discriminant Analysis implementation from the statistical programming language R [38]. R was able to import the features extracted, build and evaluate models, and classify the frames. Other scripts converted the frame classifications into video classifications.

CHAPTER 3 Methodology
The goal of this project, to classify videos as stegged or not stegged, requires several steps. The first is to convert raw videos into MPEG. Raw videos must be used to ensure that the stream has not been re-compressed, which can confuse the steganalysis [1]. The next step embeds data into the MPEG files to create the stegged files. Steganalysis then extracts data from the stegged files to create features for the statistical classifier. Because the steganalysis methods are derived from image steganalysis, the features are output on a per frame basis. The statistical classifier will then make the decision on whether each frame is stegged. Finally, the video steganalysis looks at some or all of the frames to determine if the entire video is stegged.

Video Steganography
Before video steganalysis could begin, a prototype video steganography tool needed to be developed. Although there is at least one freely available tool that embeds data into video, it does so at the pixel level, before compression. Because the data is embedded before compression, much of it could be lost during compression; getting around this requires repeated embedding with error-correcting codes, which significantly reduces the capacity of the channel. Embedding after compression makes this unnecessary unless the goal is to outwit an "active warden" that manipulates the video, by re-compression, for example. The reduced capacity of pre-compression embedding for most videos suggests that such channels do not hold as much data, and so might not be worth pursuing anyway.

Data generation
The stegged videos used for analysis were created using all combinations of two algorithms, two video qualities, and five embedding rates. The video qualities are normal (qscale=2) and low quality (qscale=5). The embedding rates were chosen to match previous work done with images, for comparison.
Although embedding data into a video by slightly changing the DCT coefficients might not seem to change the size of the file, it does. This happens because the coefficients are encoded using a variable-length coding (VLC) scheme in which certain values can take much longer codes than others, even though they differ only by one. Because of the way that MBSteg models the DCT coefficients into buckets, the changes it makes are not least-significant-bit changes. That is, it will change a 1 to a 2 (or vice versa), which swaps two bits. This is different from our SimpleSteg, which always flips only the last bit. The cumulative effect of these changes, when passed through the VLC encoding, is that MBSteg-embedded files are slightly smaller (less than 1% up to 50% embedding) than the clean version, whereas SimpleSteg-embedded files are slightly larger (less than 2% up to 50% embedding).
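The bit-level difference between the two embedding styles can be shown directly. This is a toy demonstration, not code from either tool:

```python
def bits_changed(a, b):
    # Count how many bits differ between two small coefficient values.
    return bin(a ^ b).count('1')

# SimpleSteg-style LSB flip: 3 -> 2 changes a single bit (11 -> 10).
print(bits_changed(3, 2))   # 1
# MBSteg-style bucket move: 1 -> 2 swaps two bits (01 -> 10).
print(bits_changed(1, 2))   # 2
```

Because the VLC tables assign different code lengths to different values, these two kinds of change push the file size in different directions, as described above.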

Video Steganalysis
To determine if a video is stegged using the above techniques, the first step will be to determine if the individual frames are stegged. Because of the similarity of JPEG and MPEG, and of the steganographic technique, it follows that the feature sets used in DCT-JPEG steganography will apply here as well. Once the frames are classified, another classification is done with those results to decide if the video as a whole is stegged.
As discussed in section 2.3, there are many feature sets available for images that could be used. Here only two are explored. The reasoning is that our prior work [5,6] with the Fridrich/Pevný set shows its accuracy to be very high. The other feature set used is [7] which is a little bit more recent. While a good feature set is important, the methodology used here can easily be updated to newer feature sets as they become available.

Feature Sets Description
Through a series of papers [8,9,10,11], Fridrich and colleagues developed a feature set with several parts:
• The first part is a global histogram of the DCT coefficient values over the range −5..5.
• The second part is a set of histograms for 5 of the lowest frequency AC coefficients, over the same −5..5 range.
• The third part is a set of dual histograms across 9 of the lowest frequency AC coefficients, capturing the distribution of the values.
• The fourth part is a measure of variation across all the DCT modes.
• The fifth part is a measure of the "blockiness" of the image, measured in the spatial domain.
• The sixth part is a co-occurrence matrix of pairs of neighboring DCT coefficients.
• The seventh part uses a Markov process based approach that observes the difference of DCT modes across neighboring blocks.
The Liu feature set from [7] measures the joint occurrence of small valued DCT coefficients both within a block, and across adjacent blocks. For example, it will look at how many times AC coefficient (3,2) = 2 when AC coefficient (2,1) = 4, both within the same block, and then with both the block to the right and the block below. The range of coefficients to look for is a parameter, which the authors set to −6..6 for a total of 169 features, and this value is also used in our testing.
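A simplified sketch of this kind of co-occurrence feature, computed here only for horizontally adjacent coefficient positions (the actual feature set also pairs positions within a block and with the block below). With T = 6 the matrix has 13 × 13 = 169 entries, matching the feature count above. The layout of `coeffs` is an assumption of this sketch.

```python
import numpy as np

def cooccurrence(coeffs, T=6):
    """Joint occurrence of small DCT coefficient values at horizontally
    adjacent positions, normalized to a probability matrix.

    coeffs: 2-D integer array of quantized DCT coefficients, laid out
    so that horizontally adjacent entries come from adjacent positions.
    """
    c = np.clip(coeffs, -T, T)                 # saturate to [-T, T]
    m = np.zeros((2 * T + 1, 2 * T + 1))
    for a, b in zip(c[:, :-1].ravel(), c[:, 1:].ravel()):
        m[a + T, b + T] += 1                   # count the (left, right) pair
    return m / m.sum()

m = cooccurrence(np.array([[1, 1], [0, 2]]))
print(m.shape)                                 # (13, 13)
```

The flattened matrix entries then serve as the input features for the classifier.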

Frame Classification
The Fridrich [11] feature set is used as a baseline. Liu's feature set, which is slightly more recent, gives a comparison point. Although extensions to these were planned that would have included inter-frame statistics, the tests using next-frame approximation, discussed next, indicated that this would be of limited value, since the noise from motion outweighs the noise from the steganographic data.
For the approximation of the cover frame needed in the Fridrich set, three different approaches were taken.
cropping The first method is the same cropping and re-compression of the image done in [11]. This involves converting the frame back to the spatial domain, cropping the image by four pixels in both directions, and then re-compressing the result. The reason for using 4 pixels is to break the alignment of the 8×8 blocks used in the DCT transform without significantly changing the image. The re-compressed image will have similar DCT coefficients that can be used in the approximation.
next-frame The second approximation technique simply uses the next frame of the video, which should be fairly close to the current frame unless there is a scene cut, and scene cuts happen infrequently.

frame-averaging
The last method is one of the approximation methods from [12], which averages frames on either side of the one under consideration. Averaging frames from both sides can give two benefits. First, it reduces the effect of a scene change, to which the next-frame method is vulnerable. Second, the frame under consideration can be considered the midpoint of the frames before and after it, so their average will be closer to it than either of those frames alone. In [12] they find that in most cases one frame from each side is sufficient.
Note that either of these last two methods requires manually recalculating nearly all of the DCT values for most frames, since in most cases they do not appear in the bitstream. For example, an I frame is followed by either a P or B frame. Since these frame types usually do not encode much of the picture as full DCT values, instead encoding residual values from motion vectors, the full spatial frame needs to be re-transformed to DCT values. The cropping method also needs to perform this transform for the equivalent of one frame per frame, and so has the same complexity as using the next frame.
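The three cover approximations can be sketched as follows. Frames are assumed to be decoded spatial-domain arrays; the cropped result would still need re-compression before features are extracted, and the function names are illustrative.

```python
import numpy as np

def approx_by_cropping(frame):
    # Drop 4 pixels in each direction to break 8x8 block alignment;
    # the result would then be re-compressed before feature extraction.
    return frame[4:, 4:]

def approx_by_next_frame(frames, i):
    # Use the following frame as a stand-in for the cover frame.
    return frames[i + 1]

def approx_by_averaging(frames, i, k=1):
    # Average k frames on either side of frame i ([12] finds k=1
    # sufficient in most cases).
    window = frames[i - k:i] + frames[i + 1:i + 1 + k]
    return np.mean(window, axis=0)

frames = [np.full((8, 8), float(v)) for v in (10, 20, 30)]
print(approx_by_averaging(frames, 1)[0, 0])   # midpoint of 10 and 30
```

In each case the approximation replaces the unavailable original cover frame when computing calibrated statistics.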

Video Classification
The above describes only a decision process for a single frame. Since a video consists of many frames, and the goal is to decide whether the video as a whole is stegged, a separate decision procedure is required. The obvious choice, taken in [12], is a "majority rules" approach: if over half of the frames are reported as stegged, then the whole video is considered stegged. This requires decoding the entire video, which may be very costly.
As shown in section 4.2, a better solution might be to use a sequential test, where after each frame a decision can be made whether to accept the video as stegged, reject it as non-stegged, or test the next frame. A statistical reference book [13] provides just such a method. If more stegged frames appear than would be predicted by the false positive rate, then the video is likely stegged. Conversely, if more clean frames appear than would be predicted by the false negative rate, then the video is likely not stegged. The method also allows for specifying how confident it should be in the determination.
A simplifying assumption made here is that either all frames have embedded data, or no frames do. Some forms of relaxing that assumption have simple solutions, and others do not. For example, the assumption that data embedding starts at the beginning of the video and continues until the entire message is encoded might be handled by looking for a fall-off in the number of stegged frames over time. However, a steganographer who embeds randomly throughout a video and keeps the overall number of embedded frames below the frame classifier's false-positive rate would be much harder to detect.

CHAPTER 4 Evaluation
Evaluation is done in two parts. First, classification systems for each frame are considered based on the parameters discussed in section 3.2. Second, methods for classifying the entire video are analyzed.

Feature Selection
Evaluation started with checking the efficacy of different approximation methods for the cover image as well as varied sets of features. In particular, tests were performed on approximation by cropping, using the adjacent frame, and frame averaging. Both the features used in Fridrich and Pevný's paper [1] as well as newer features from Liu [2] were tested.
The original paper by Fridrich applied an SVM with a Gaussian kernel for classification. In our prior work with images [3], an LDA gave similar performance with less overhead in model building. Early in this project, a comparison of LDA and linear SVMs showed that LDA performance was superior, and so the research proceeded with that. For comparison purposes, another attempt to use radial kernels, using the tune.svm function from the R [4] package e1071 [5], was performed. To evaluate the detection technique, cross-validated, trained LDA models classified a test data set of videos stegged at different embedding rates. Two different steganographic techniques were tested. One is a modified version of MBSteg [6] that embeds data in each frame. The second is a more naïve implementation that simply changes every AC DCT value with abs(x) > 1, up to the embedding rate percentage.
Previously, our JPEG work had very good accuracy down to 15% embedding [3]. Since one of the goals of steganalysis is to lower the effective bandwidth of a steganographic carrier, the effectiveness of the models was also tested at several lower rates. The cover approximations do not apply to the Markov process used in Liu's feature set. All other combinations of these parameters are valid and were tested.
In the analysis below, note that the expectation is that accuracy on videos with data embedded by MBSteg is lower than on those from SimpleSteg, due to the differences in how the steganography programs work. Additionally, the accuracy drops as the embedding rate goes down, since there are fewer changes to detect. Error rates over P and B frames (Table 5) were poor; because of this, the steganalysis will proceed by looking only at the I frames of the video.
The next set of tables (Table 6 through Table 9) gather the results from tests run across only the I frames of each video.
It appears from Table 9 that the Liu feature set is not well suited to MPEG videos, as its best performance is a 12.5% error rate. That may be because of the low resolution of the videos. As their paper shows, accuracy of their method depends on image complexity, with lower complexity making detection easier. Because the video resolution is small, but generally has the same field of view as an image, its complexity is fairly high.
Using the Fridrich feature set (Tables 6 through 8), Frame Averaging does consistently better than Next Frame.
In evaluating performance on individual videos, it is clear that videos with a lot of motion are classified more accurately by cropping than by the other approximation techniques. This indicates that the changes even in 1/30th of a second are too great to use adjacent frames as a reference, either independently or averaged with other nearby frames. From these results, it is clear that the best choice for future tests is the features from [1] with the cropping method of cover approximation. All the remaining data presented uses features extracted in this way.
Table 9. Error rates using [2] under varying Quality and Embedding Rate

Video Classification
The above results give error rates on a frame-by-frame basis. The goal now is to classify the video as a whole. In [7] they use a simple majority rule: if more frames are classified as stegged than clean, then the video is considered stegged. The effect of varying that threshold was also examined. Because the false positive and false negative rates are similar (see Table 6), a majority rule does, in fact, give the lowest overall error; that is, a threshold of 50% of the frames has the lowest error rate.
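The majority rule itself is trivial to express. A sketch, where `frame_flags` holds the per-frame classifier decisions and the threshold parameter corresponds to the varying threshold discussed above:

```python
def majority_rule(frame_flags, threshold=0.5):
    # frame_flags: per-frame classifier decisions (1 = stegged).
    # The video is declared stegged when the stegged fraction
    # exceeds the threshold (0.5 gives simple majority rule).
    return sum(frame_flags) / len(frame_flags) > threshold

print(majority_rule([1, 1, 0, 1]))  # True  (75% of frames flagged)
print(majority_rule([0, 0, 1]))     # False
```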
Note that this is also due to the similar prevalence of the two classes in the test set. In real-world use, this might need adjustment. In the test data, there are 50% stegged videos and 50% clean videos. A real data set will have a much higher proportion of clean videos, so the classifier might be adjusted to expect a lower prevalence of steg. However, making this adjustment also makes it less reliable for positively identifying stegged video frames. See [8] for a full description of how adjusting the LDA cutoff changes the Positive Predictive Value for the image classifier. The video classification errors with the majority rule (Table 11) are lower than those for the individual frames shown in Table 6. This is because the error rates of the frame classifier are low enough that the percentage of clean or stegged frames can stabilize above the threshold.
There is one further optimization that could be made when classifying video.
Because decoding video is an expensive operation, it is desirable to stop processing as soon as a decision can be reached. What is necessary is a method that looks at the cumulative results as each frame is processed and decides whether it has seen sufficient evidence to classify the entire video. For this, a method found in [9] for a sequential test of a binomial distribution is appropriate.
The method uses four parameters: α, β, θ0, and θ1. These define acceptance and rejection boundaries, and the region between them is the constraint under which the test must continue processing the video.
If the frame classifier has a false positive rate of αf and a false negative rate of βf, then θ0 = βf and θ1 = 1 − αf.
If d_m < A_m, then the video is considered stegged, because of the m frames processed, more are classified stegged than would be expected from the false positive rate plus a margin wide enough for α. On the other hand, if d_m > R_m, then more frames are classified as non-stegged than expected from the frame classifier's false negative rate plus the margin of β, so the video must not be stegged.
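A sketch of such a sequential test, written here in the standard Wald log-likelihood-ratio form rather than the boundary-line form used above (the two are equivalent up to bookkeeping). θ0 and θ1 are the expected stegged-frame proportions under the clean and stegged hypotheses; the function name and interface are illustrative.

```python
import math

def sprt_video(frame_flags, theta0, theta1, alpha=0.05, beta=0.05):
    """Wald sequential test over per-frame steg decisions (1 = stegged).

    Returns 'stegged', 'clean', or 'undecided' (ran out of frames).
    """
    log_lr = math.log(theta1 / theta0)                # a stegged frame
    log_lr_c = math.log((1 - theta1) / (1 - theta0))  # a clean frame
    a = math.log(beta / (1 - alpha))   # accept-clean boundary
    r = math.log((1 - beta) / alpha)   # accept-stegged boundary
    llr = 0.0
    for flag in frame_flags:
        llr += log_lr if flag else log_lr_c
        if llr >= r:
            return 'stegged'
        if llr <= a:
            return 'clean'
    return 'undecided'
```

With well-separated hypotheses the test typically terminates after only a handful of frames, which is the point of using it instead of decoding the entire video.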
Two plots demonstrating this method are shown in Figure 3. The upper figure shows a non-stegged video tested with the parameters for a SimpleSteg 10% embedded video with a qscale of 2. The second shows a stegged video with an embedding rate of 20% and a qscale of 2. Note that the accepting and rejecting lines are further apart at the lower embedding rate because there is more uncertainty in the underlying frame classification. Table 12 shows the results of the sequential test, and Table 13 compares the sequential test with the majority test. The accuracy of the sequential test is generally lower than that of the majority rule, especially at the low embedding rates. They both perform equally well, perfectly in fact, in the easiest case of 50% embedding using SimpleSteg. They are also comparable for MBSteg at that embedding rate, with majority rule getting 2.8% error and sequential getting 3% error for qscale=2.
At the lower embedding rates it seems the accuracy is much worse, with 50% for the sequential test while at 20-25% for the majority rule at the 10-15% embedding rate.
The sequential test does worse because the underlying frame classifiers have high false positive and false negative rates, which makes it difficult for the sequential test to terminate with confidence in its result. The videos are very short clips, so it is possible that a longer test would yield better results. For the higher embedding rates, however, the sequential tests are nearly as accurate and only need to process a fraction of the frames to make that determination. … was designed for small images and low bandwidth, which means the quality of the image is generally very poor. Even in the image results, there is a drop-off in classification accuracy as the quality of the image decreases. This happens because more of the data is forced to zero, so no useful statistics can be collected.
Detecting steganography is only the first step for investigators. The next step is to extract the data. While some work has been done in this area, such as stegbreak [2], it is necessarily incomplete. The first issue is deciding which of several steganographic techniques was applied. Second, it is necessary to decide which particular program embedded the data (see [3] for an example of multiclass analysis), and in some cases even which version. Third, most steganographic programs require a password in order to extract the data. If the program uses any sort of encryption, and most do, then testing each password requires expensive calculations.
Because of this, even relatively low false positive rates mean spending an inordinate amount of time trying to crack encryption, only to find no data is there. Deciding whether the correct data has been extracted is also a challenge. Some steganographic programs will tell the user if data was extracted successfully, but from a steganographer's point of view that is bad design. Stegbreak measures the entropy of the extracted data, which works well if the embedded data is text or a well-known file format. However, a steganographer aware of this technique could simply encrypt the data before embedding it, so that the steganographic program encrypts it a second time. This would easily foil the entropy test, as encrypted data should be indistinguishable from noise.
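To make the entropy test concrete, the hypothetical helper below computes Shannon entropy in bits per byte: plaintext and common file formats score well below the 8-bit maximum, while encrypted or random data sits very close to it. This is a sketch of the general idea, not stegbreak's actual implementation.

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte.

    0.0 for constant data; 8.0 for uniformly distributed bytes,
    which is what well-encrypted output should resemble.
    """
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A crude extraction check might flag candidate output whose entropy falls well below 8 bits per byte as plausibly structured data; but as noted above, pre-encrypted payloads defeat this, since they look like noise either way.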
In the statistical analysis of the image classifier provided in [4], it is shown that, with some manipulation of the threshold used, the classifier can be made more selective in deciding whether to classify an image or frame as stegged. It is also shown that these classifiers work best when the prevalence of steganography is nearly one-to-one with non-stegged media, which is obviously not the case in practice. Therefore an examiner must pre-classify data by other means, such as its appearance in other correspondence, or timeline information that correlates with the investigation.
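The prevalence effect can be illustrated with Bayes' rule. The figures below are hypothetical, not taken from [4]: with 95% sensitivity and 95% specificity, a classifier is highly trustworthy at one-to-one prevalence, but when only 1% of media are stegged, most positives are false.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive classification is a true positive."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical classifier: 95% sensitivity, 95% specificity.
# At 50% prevalence, a positive result is right 95% of the time;
# at 1% prevalence, it is right only about 16% of the time.
balanced = positive_predictive_value(0.95, 0.95, 0.50)
rare = positive_predictive_value(0.95, 0.95, 0.01)
```

This is why pre-classifying candidate media by investigative context matters: it raises the effective prevalence among the files actually scanned.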
While steganalysts continue to make headway in the arms race with steganographers, a good steganographer will always have the upper hand. The variety of techniques and places to hide data is nearly unlimited, and steganalysis cannot begin without some knowledge of the technique employed. Even given the caveats listed here, hopefully this research will deter some steganographers from assuming video is a safe, high-capacity way to get their message out.
Although this work focused on MPEG video, almost all other video codecs use the DCT and quantization to reduce the amount of data and make further compression possible. Other codecs also have similar concepts of key and differential frames, but differ in their spacing, calculation, and encoding. To apply the steganalysis performed in this project to other codecs, some analysis of the spread of coefficient values across the frame types would be needed to see whether the assumptions of the feature set still hold.
Different models might need to be built for the various frame types. Most of the newer codecs allow greater distance between key frames, so models specific to differential frames will become more important. Some newer codecs also allow the quantizer to change within a frame, to adjust the level of detail in part of the image. This might mean re-evaluating the feature sets, which currently build models that depend on the quality level. Additionally, other frame classifiers might yield more accurate results, as indicated by the preliminary testing of the Gaussian-kernel SVM shown in the Appendix.

APPENDIX Preliminary SVM Results
As explained in Chapter 4, a late development included testing an SVM model for frame classification instead of the LDA model. The results here show the error rates of a radial (RBF) kernel using the parameters C = 1, γ = 2^−8.
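For readers wishing to reproduce a model of this shape, the sketch below fits an RBF-kernel SVM with the stated parameters (C = 1, γ = 2^−8) using scikit-learn. The synthetic two-class data and the library choice are assumptions for illustration; they are not the feature vectors or toolchain used in these experiments.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical stand-in for the frame feature vectors: two
# well-separated 4-dimensional Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, (50, 4)),   # class 0: "non-stegged" frames
    rng.normal(3.0, 1.0, (50, 4)),   # class 1: "stegged" frames
])
y = np.array([0] * 50 + [1] * 50)

# RBF (Gaussian) kernel with the parameters reported above.
clf = SVC(kernel="rbf", C=1.0, gamma=2 ** -8)
clf.fit(X, y)
```

In practice C and γ would be chosen by a grid search with cross-validation rather than fixed up front; the very small γ here makes the RBF kernel behave almost linearly.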