SPARTA: SYSTEM FOR PORTABLE ACQUISITION WITH REAL-TIME ANALYSIS

The aim of this thesis is to design, develop and test a new portable system for digital forensics imaging with real-time analysis of every live file. Sequential imaging of today's large magnetic hard drives is impractical, taking over 40 hours to complete before any forensic analysis can begin. Previously attempted approaches include performing a limited (sparse) collection and performing distributed live analysis using a high-end server environment, neither of which is sufficient for field use. I designed and developed the code to test the system and developed comprehensive testing scenarios. I show that magnetic disk fragmentation has a direct, mostly linear impact on the speed at which a disk can be imaged while every live file is processed simultaneously. I show that RAM has a near-exponential impact on simultaneous magnetic disk forensic imaging with all live file processing. I demonstrate that CASE/UCO has the potential to be the interoperable file format for digital forensics metadata exchange. I also demonstrate that a system for simultaneous forensic disk imaging with all live file analysis can be assembled from commercial off-the-shelf parts for less than $1000.

This research seeks to answer the question of what the relationship is between disk fragmentation, RAM and processing cores such that disk imaging performance is not severely impacted by core forensic processing performed in parallel.
The overall goals for this research were:
1. Correctness - To develop a portable digital forensic imaging system that creates a correct bitstream forensic image file with simultaneous correct processing of each logical file.
2. Efficiency - To develop a portable digital forensic imaging system that performs a bitstream copy of a source medium with simultaneous live file processing.
3. Cost - To develop a system from commercial-off-the-shelf (COTS) parts at a reasonable cost (less than $1000).

Digital Forensics Imaging
The National Institute of Standards and Technology (NIST) defines digital forensics as "the application of science to the identification, collection, examination, and analysis of data while preserving the integrity of the information and maintaining a strict chain of custody for the data." [2] Forensics is especially necessary when the examined evidence is to be used in court proceedings, where preservation of evidence is critical to establishing authenticity and provenance for the examined artifacts.
The basic standard used for preservation of digital evidence is a bitstream image: a bit-for-bit copy of a source medium, called a forensic image. [2] When traditional hard drives were small in capacity, making a forensic image outside of a lab environment was relatively straightforward and not time intensive.
However, in the last decade, storage sizes have grown to the point that even making a basic forensic image is problematic, let alone processing all the data after the forensic image is created. [3] For example, the website StorageReview.com tested a 6 TB Western Digital magnetic hard drive. [4] The maximum bandwidth exhibited by the drive for sequential reads was 214.53 megabytes per second (MB/s). At that bandwidth, reading all of the sectors on the disk to make the forensic image would take approximately 7.77 hours. There would also need to be a full read-back of the data for hash verification to ensure that the data was read and copied correctly. Waiting on-site for at least 7 hours for forensic imaging alone, without verification and without forensic artifact analysis, is not feasible, especially when dealing with multiple systems and multiple drives. And if 6 TB is difficult enough to address, as of the end of 2018 major retailers were selling 14 terabyte (TB) hard drives. [5] The days of imaging on-site without artifact analysis are quickly ending due to the explosion of larger hard drives.
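The arithmetic behind these figures is straightforward (assuming decimal units, 1 TB = 10^12 bytes, and a constant sequential read bandwidth):

    # one sequential pass over the 6 TB drive at 214.53 MB/s
    6e12 / 214.53e6 / 3600        # about 7.77 hours
    # one imaging pass plus a verification read-back of a 14 TB drive
    2 * 14e12 / 214.53e6 / 3600   # about 36.3 hours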

Basic Hardware Imagers
Commercial hardware imaging devices generally perform three functions:
1. Basic sector imaging, copying the data sector-by-sector or grouping physical sectors together.
2. Creating a cryptographic hash (or multiple cryptographic hashes) of the source bitstream while copying the data.
3. Performing a read-back of the written data to compute a verification hash, validating that the copied data is an exact match to the source data.
Based on limited testing and datasheet analysis, none of these devices logically analyzes the file system for file cluster boundaries to intelligently group sectors for imaging. Additionally, none of the listed devices performs file analysis, because none of them inspects any of the logical data as it is being copied. For a 14 TB hard drive sold today, any of these devices would take over 24 hours to image, making the process infeasible.

Basic Software Imagers
Before hardware imaging devices were commonplace, disk imaging was performed primarily using Linux tools.
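For example, a drive can be imaged and hashed in one pass with dc3dd, a forensics-oriented build of dd (a representative invocation; the device and output paths here are hypothetical):

    dc3dd if=/dev/sdb of=/mnt/dest/evidence.dd hash=md5 log=/mnt/dest/evidence.log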

Triage
Various techniques have been adopted to combat the data volume problem in digital forensics, the majority of which fall into some form of triage, which selectively analyzes items prior to imaging. [12] The advantage of this technique is that it can be performed in the field and does not require all systems to be brought back to the lab for analysis, only to discover that a system did not contain relevant evidence. The problem is that it requires analysis prior to preservation in order to identify which systems to preserve and analyze in depth later. This delays the preservation of digital evidence, which can take hours as indicated above.

Evimetry
Developers have tackled the issues surrounding digital forensics imaging software and large drives. One of the most promising tools, written by Dr. Bradley Schatz, is Evimetry. [13] Evimetry allows for triage during imaging, letting the examiner indicate the areas of the disk to prioritize for collection and preservation. This is accomplished using the Advanced Forensics Format (AFF) version 4, an evidence file format that creates a forensic image in a non-sequential manner. While extremely promising, Evimetry only outputs images to AFF4, which is gaining support among commercial forensic tool manufacturers but is not as widely supported as the Expert Witness Format (E01) or raw dd files. This is addressed by a filesystem bridge developed by Schatz Forensics that allows the image file to be mounted on the examination system and analyzed with any forensics tool.
However, Evimetry does not address the issue of analyzing every file while creating the bitstream image that SPARTA addresses.

Sifting Collectors
An alternative method is the creation of a sparse or partial logical image of the evidence as opposed to a full physical bitstream forensic image. The most promising of these techniques is the use of sifting collectors. [14] The problem with this technique is that the entirety of the evidence is neither collected nor examined.
During a criminal trial, the use of sifting collectors invites the argument from opposing counsel that exculpatory evidence may exist in the regions neither collected nor analyzed, creating potential doubt among the jury about the sifting collector process.
The figure below outlines the high-value areas identified by Grier and Richard III as being important versus those that are not used for forensic analysis.

Figure 1 -Breakdown of Typical Disk According to Grier & Richard III
In contrast to sifting collectors, SPARTA analyzes all live files, including identified Windows OS files, in case individuals are masquerading files as Windows files.

LOTA
The most promising approach to expediting forensic imaging and analysis is parallel processing: while making the forensic image, perform automated analysis at the file level. This is outlined in a process called latency-optimized target acquisition (LOTA). [15] However, there are key gaps in the paper describing the LOTA system that make it unsuitable as a field device for processing all live files during a full physical acquisition.
1. There is a lack of research on the requirements for performing targeted processing of files while making a bitstream image of a digital evidence source. The LOTA paper details only the processing cores required for certain tasks and does not address memory requirements beyond "assuming a RAM-rich configuration…" [15] For a cost-conscious portable field device, a clear establishment of RAM requirements is necessary.
2. The paper clearly identified file fragmentation as outside the scope of the project. What is lacking is a quantification of the relationship between on-disk file fragmentation and the efficacy of parallel processing of logical file data.
The LOTA system's reference HDD was described as the "best case scenario as it was created in one shot." [15] No published research literature analyzes the relationship between file processing for digital forensics and file fragmentation, which is something this thesis seeks to establish.
The SPARTA system, in comparison to the LOTA system, will be a field-capable system that seeks to process all live files while simultaneously creating a full forensic disk image with verification. The results of the live file processing will be saved to a metadata file for importing into an analysis tool so that efforts do not need to be duplicated after acquisition and partial processing.

FIREBrick
Portable digital forensic tools are necessary for situations in which the digital source media cannot leave the environment. This is common in civil litigation cases where the computers are the property of the opposition and the only authorized work is to make a digital forensic image. Portable open source digital forensic devices have been developed, such as the FIREBrick prototype. [16] Yet when comparing the FIREBrick prototype against the LOTA system, for example, the FIREBrick appears woefully underpowered for any parallel processing.

Foundational Related Work
INDXParse is a suite of forensic tools written by Willi Ballenthin, a reverse engineer at Mandiant/FireEye. [17] The suite provides a range of capabilities for parsing data structures unique to the NTFS file system. The SPARTA prototype performs simultaneous analysis of NTFS volumes, and Mr. Ballenthin has made his INDXParse NTFS parser tools available as open source. The project leverages the tools in the suite to parse the Master File Table records of the drive to build the cluster-to-sector map described in Chapter 3.
The Sleuth Kit (TSK) is a collection of forensics utilities developed by Dr. Brian Carrier of Basis Technology. [18] The single use of TSK in this project is to extract the Master File Table (MFT) from the NTFS-formatted volume prior to creating the bitstream image. To do this, I used two TSK commands: mmls, which lists the partition table of a disk with its starting offsets, and icat, which extracts a file based on its inode or MFT entry number. The first command yields the integer sector number at which the NTFS volume starts. The second command extracts the file corresponding to MFT entry number 0 of the volume starting at that sector of the source drive. MFT entry number 0 always refers to the MFT itself, so the result of this command is the MFT encapsulated as a file for processing.
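A representative sequence (a sketch: /dev/sdb stands in for the source drive, and 2048 for the NTFS volume's starting sector as reported by mmls):

    sudo mmls /dev/sdb
    sudo icat -o 2048 /dev/sdb 0 > /tmp/mft.raw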

CASE/UCO
SPARTA exports all file processing results to an intermediary forensic metadata format: the Cyber-investigation Analysis Standard Expression (CASE), built on the Unified Cyber Ontology (UCO). The ontology saves the results of forensic analysis into a standard JSON file that other analysis tools can import.
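For illustration only, a single file's record in such output has roughly the following JSON-LD shape; the property and namespace names below are simplified approximations, not the normative CASE/UCO vocabulary:

    {
      "@context": {
        "kb": "http://example.org/kb/",
        "observable": "https://unifiedcyberontology.org/ontology/uco/observable#"
      },
      "@graph": [
        {
          "@id": "kb:file-0001",
          "@type": "observable:File",
          "fileName": "report.doc",
          "sizeInBytes": 48128,
          "hashMethod": "MD5",
          "hashValue": "73E83DCDEEA4DB48A83EEFFFEF856D27"
        }
      ]
    }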

Test Dataset Developed
For testing SPARTA, I created a standard lab-built dataset fragmented to precise percentages. The dataset is representative of a real-world system: the extracted files from a fully functional Windows 10 system plus the entire Govdocs1 corpus, approximately 1 million files made available through the Digital Corpora. [22] The drive holds approximately 282 GB of data on a 320 GB magnetic hard drive.
This is in contrast to the testing done on the LOTA system: an ext4-formatted volume with approximately 1.8 million files and the m57 drive from the Digital Corpora, a 10 GB virtual disk.

Fragmentation Generation Tools
To create fragmented disks for testing, I used tools provided by Raxco Software as part of its PerfectDisk product for artificially creating disk fragmentation. The specific tool was SCRAMBLE.exe. [23] This tool takes a provided disk volume and applies a pseudo-random fragmentation algorithm to a non-deterministic number of files on the disk. I then measured the fragmentation percentage using the built-in Windows 10 disk defragmentation tool. When the fragmentation percentage exceeded the target, I used Piriform's Defraggler to selectively defragment files and retested the volume fragmentation until it fell within the correct threshold.
This process allowed for five different physical disks containing the exact same data to be fragmented at the following levels: 0%, 5%, 10%, 20% and 50%.

SPARTA Requirements
The SPARTA prototype takes as input a full disk or disk image containing a single NTFS volume. NTFS is the file system of focus for two reasons: it is designed with a single structure defining all file cluster allocation (the Master File Table), and it is the default file system for all Windows systems. As output, the prototype produces a bitstream image file (dd) containing the entire contents of the input disk and a CASE/UCO metadata file with the results of the live file forensic analysis.
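Given the prototype's command-line interface (source, destination image, metadata path, and extracted MFT path, with an optional file-processing flag, matching the argument parsing in the appendix source excerpt), a typical invocation would look like the following; the script name sparta.py and all paths are hypothetical:

    python sparta.py /dev/sdb /mnt/dest/evidence.dd /mnt/dest/evidence_case.json /tmp/mft.raw --file_processing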

Software Design and Development
After performing the background research, I decided to write the project in Python 2.7. While Python is an interpreted language with known performance limitations, the theory was that with modern processors the limitations of an interpreted language would be less of a bottleneck than transfer speeds from the SATA bus. The test results described in Chapter 4 confirmed this theory.
Additionally, the CASE/UCO implementation and the INDXParse library indicated that the project implementation would be expedited by using Python, aligning with those existing libraries.

The development environment was a licensed academic version of PyCharm by JetBrains.

System Design Flow and Pseudocode
While most digital forensic imagers immediately begin reading the bitstream from the source disk, SPARTA requires some preprocessing in order to process files live. The preprocessing steps are as follows:
a. Parse each file record in the extracted Master File Table.
b. If the file has non-resident data, process the cluster runs, which contain the list of clusters holding the file's data.
c. Map each cluster run from a logical offset relative to the previous run (as stored in the Master File Table) to a physical offset from the beginning of the volume. This data is stored in a data structure called the cluster map.
d. Determine which cluster run is last in the physical ordering and mark it as the last cluster. This allows the system to know when all clusters for a given file have been read from the bitstream.

End of preprocessing for SPARTA
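A minimal sketch of the cluster map this preprocessing produces (the field names are illustrative, not the prototype's exact implementation):

    # maps the physical starting cluster of each run to the information needed
    # to route that run's data to the right file during imaging
    cluster_map = {
        1045:  {'mft_record': 312,            # file that owns this run
                'logical_offset': 0,          # where the run falls in the file
                'run_length': 16,             # consecutive clusters in the run
                'is_last': False},            # more runs follow for this file
        80212: {'mft_record': 312,
                'logical_offset': 16 * 4096,  # byte offset, assuming 4 KB clusters
                'run_length': 8,
                'is_last': True},             # final run: file can be processed
    }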
After preprocessing completes, the main portion of the system begins to execute.
The main thread begins reading the bitstream from the source drive. Reads of the preliminary portions of the drive, before the NTFS volume, are done sector-by-sector (512 bytes at a time), with each sector written in turn to the destination drive. While the sectors are being read in individually, the MD5 and SHA1 imaging hashes are computed.
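A minimal sketch of this incremental hashing (an assumed approach, not the prototype's exact code; the function name is hypothetical):

    import hashlib

    def image_leading_sectors(src, dst, sector_size=512):
        # read pre-volume sectors one at a time, updating both imaging
        # hashes and writing each sector to the destination image
        md5, sha1 = hashlib.md5(), hashlib.sha1()
        while True:
            sector = src.read(sector_size)
            if not sector:
                break
            md5.update(sector)
            sha1.update(sector)
            dst.write(sector)
        return md5.hexdigest(), sha1.hexdigest()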
Once the bitstream read pointer reaches the first sector of the NTFS volume (the Volume Boot Record), the logic proceeds with the main SPARTA parallel processing engine as follows:
1. Create a thread-safe queue called the unprocessed file queue.
a. Create a thread pool for processing these file objects. Initial testing created ten threads in the pool.
2. Create a thread-safe queue called processed file queue.
a. Create a thread pool for these processed files. Initial testing created one thread in this pool.
3. Set the cluster number to 0.
4. Look up the current cluster number in the cluster map built from the Master File Table.
5. If the cluster number does not appear in the map, read the cluster, compute the stream hash, and write the cluster to the bitstream output.
6. If the cluster is in the map:
a. Based on the data in the cluster map, read the entire cluster run, which is a variable number of consecutive clusters depending on the run.
b. Add the data block read-in to the correct logical offset of the file object.
c. If this was the last cluster run for the file (in other words, if all logical clusters have been read), then add the file object to the queue created in step one.
d. If this was not the last cluster run, mark in the cluster map that the file object has been created and is waiting for the rest of its data.
e. Write the cluster-run data to the output data stream.
7. Move on to the next cluster if we are not at the end of the drive.
8. Once we reach the end of the drive, wait until the file processing object queue is empty before ending the main thread.
While the bitstream is being processed, the thread pool created above works the file object queue as follows (a sketch of these worker pools appears after this list):
1. Wait until an item is in the unprocessed file queue.
2. Hash the logical file data based on the logical file size extracted from the MFT and perform the file signature analysis.
The processed file queue outputs the results of the file processing to the CASE/UCO standard as follows:
1. Wait until an item is in the processed file queue.
2. Pop the top item off the queue and send it to the CASE/UCO processor, which writes the appropriate JSON structure conforming to the CASE/UCO standard.
3. Wait until we have another entry.
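A minimal sketch of these two worker pools (written for Python 2.7 to match the prototype; the file-object fields and the write_case_record stub are illustrative, not the prototype's exact code):

    import hashlib
    import threading
    from Queue import Queue  # 'queue' on Python 3

    unprocessed_file_queue = Queue()  # file objects whose clusters are all read
    processed_file_queue = Queue()    # results awaiting CASE/UCO output

    def write_case_record(result):
        # stand-in for the CASE/UCO serializer described above
        print(result)

    def file_worker():
        while True:
            file_obj = unprocessed_file_queue.get()  # blocks until work arrives
            if file_obj is None:                     # sentinel shuts the worker down
                break
            # hash the logical file data up to the MFT-reported logical size
            data = file_obj['data'][:file_obj['logical_size']]
            processed_file_queue.put({'name': file_obj['name'],
                                      'md5': hashlib.md5(data).hexdigest()})

    def output_worker():
        while True:
            result = processed_file_queue.get()
            if result is None:
                break
            write_case_record(result)

    # ten file-processing threads and one output thread, as in initial testing
    threads = [threading.Thread(target=file_worker) for _ in range(10)]
    threads.append(threading.Thread(target=output_worker))
    for t in threads:
        t.daemon = True
        t.start()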
The following diagram illustrates the overall proposed digital forensics imaging process utilized by SPARTA.

Fundamental Differences With SPARTA Compared to Other Tools
The following is a list of functionality unique to SPARTA compared to other industry tools.
1. SPARTA performs an initial pass over the Master File Table to build a cluster map before imaging begins.
3. SPARTA writes the forensic processing results into the CASE/UCO format, which is positioned to become a standard for data exchange between forensic tools. It is unclear whether Wirespeed or LOTA exports metadata into any format that can be used by other tools.

Operating System Configuration
The operating system chosen for the prototype was Linux, primarily for performance and low cost. Linux is a modular system, allowing a full field unit to strip all unnecessary components from the prototype to maximize efficiency. This contrasts with consumer operating systems like Microsoft Windows or Apple macOS, which do not allow the customization necessary for a performance field unit. Additionally, Microsoft Windows has a retail cost of $129.99 for the Home Edition, an unnecessary cost for the unit.
A requirement of a forensic duplicator/imager is the ability to protect the source evidence from alteration by normal operating system behavior during forensic imaging. This was addressed with udev rules such as the following:

ACTION=="add|change", SUBSYSTEM=="block", ENV{UDISKS_IGNORE}="1"

These rules prevent attached devices from being automounted read/write by the operating system. Attached devices can still be mounted read/write, but doing so requires specific user action, ensuring that the operator explicitly indicates the destination device for all imaging actions. A user error could still result in overwriting the source (for example, a misused dd command), but this is an assumed risk for all Linux-based forensic imaging devices.
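To make such rules persistent, they can be installed as a udev rules file and reloaded without rebooting (the file name below is a hypothetical choice):

    # /etc/udev/rules.d/99-forensic-readonly.rules
    ACTION=="add|change", SUBSYSTEM=="block", ENV{UDISKS_IGNORE}="1"

    # reload the rules and re-trigger device events
    sudo udevadm control --reload-rules
    sudo udevadm trigger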

Hardware Design
The main aspect differentiating SPARTA from many other research projects on parallel processing of digital evidence during forensic imaging is that it is designed as a low-cost field unit. To achieve the cost-affordability research goal, an investigation into current hardware availability was necessary to determine whether the goal was attainable.
Selecting a motherboard, processor, RAM, video card and solid-state drive (for the operating system and OS swap space) was necessary for the basic components of the field prototype. Due to the portability requirements, the motherboard was limited to Mini-ITX models. At the time of this research, the latest AMD Mini-ITX motherboards used the AM4 processor socket, while the Intel options used the LGA 1151 socket. Based on these sockets, exemplar processors were chosen in the various core configurations needed for performance testing (for each core count, the processor listed is the lowest-cost processor available from Newegg.com, a major retailer of hardware components). It was decided to build the entire prototype around an AMD platform due to the lower price on all processors with equivalent CPU cores.
While the amount of RAM varied between 8 GB and 32 GB, the type and speed of the RAM were not used as test variables. The RAM chosen was the least expensive configuration available at the time of testing.
The solid-state drive hosts the operating system, the SPARTA software, and the operating system's virtual memory. To reduce the impact of secondary-storage throughput as a performance bottleneck, a high-speed solid-state drive was selected, preferably an M.2 model, so that the system would be I/O bound by the SATA source and destination drives.
Based on the research above, the following components were used for building and testing the SPARTA prototype as of 6/19/2018. Images of the completed prototype device follow. Note that the backside has two internal SATA data and two internal SATA power connectors for the source and destination disks.

Development of SPARTA Testing Dataset
In order to establish equivalent testing for the SPARTA prototype and existing forensic tools, I developed a dataset representative of a typical user's Windows system. The dataset included the files and folder structure of Windows 10, extracted from a virtual machine provided by the Microsoft Windows Dev Center. [24] Additionally, files from the GovDocs corpus of the Digital Corpora were included to give a variety of typical user files; approximately 410,000 GovDocs files were included in the dataset. [22]
After ensuring that each of the source drives had the dataset copied, I brought each to a specific measure of fragmentation: 0%, 5%, 10%, 20%, and 50%. Raxco Software provides free utilities for generating disk fragmentation, intended for testing defragmentation tools. [25] By combining the Scramble utility, which fragments an entire disk, with Piriform's Defraggler [26], which allows file-based defragmentation, I was able to accurately establish the appropriate level of disk fragmentation on each disk, as measured by the Windows 10 built-in disk defragmentation tool.
Each of the imaging tools was configured to make a single unsegmented DD bitstream image with MD5 and SHA1 hash computations and verifications. The results of the testing are as follows. The SPARTA prototype was tested as a simple forensic imager with no file analysis, configured at the maximum specification of 8 cores and 32 GB of RAM. Three test runs as a simple disk imager were performed and the results recorded. The average time for SPARTA to perform as a disk imager was 1 hour, 43 minutes, equaling the simple imaging software and hardware tools. This result demonstrates that the decision to use Python as the base programming language did not increase the time necessary to perform forensic imaging compared to established tools. There was a concern that a language like Python, which is not as efficient as C or C++, would increase the time to image, but the results clearly show that not to be the case.

Testing Processing Tools
Many of the forensic processing tools widely used in industry are closed-source paid products. I hold valid professional licenses for the following tools, which I used to test evidence processing: EnCase v6.19.7 by OpenText, Forensic Explorer v4.3.5 by GetData, and X-Ways Forensics 19.

Figure 5 -Forensic Tool Processing Times
With the exception of 10% fragmentation with EnCase and 50% fragmentation with Autopsy, every test showed that an increase in fragmentation led to an increase in the time necessary to process the evidence. Further analysis would be needed to determine the rate at which processing time increases with fragmentation, as no apparent linear or exponential pattern fits the data well.

Combined Imaging and Processing Times
Combining the average tool imaging time of 1 hour, 43 minutes with the average processing time at each measured level of fragmentation yields the combined measurements charted below:

Figure 6 -Combined Average Imaging and Processing Times
We now have the baseline time measurements to determine whether the SPARTA design, code and hardware can meet the efficiency standard of being within 10% of industry standard tools, based on the averages above.

Testing the SPARTA Prototype With Variable Cores and RAM
The SPARTA prototype was tested in every scenario combining disk fragmentation percentages of 0%, 5%, 10%, 20% and 50%, processing core counts of 2, 4, 6 and 8, and available system RAM of 8 GB, 16 GB and 32 GB. Each scenario was run three times, with the averages of the three runs documented in the average processing and imaging time tables. The time listed for each scenario is the total time to perform the bitstream imaging and verification as well as the full file hash computations and file signature analysis.

Analysis of Memory Dependency
When analyzing the processing speed of SPARTA imaging and processing across all cases with 8 processing cores, we get the following chart comparing times against fragmentation and RAM.

Figure 7 -SPARTA Test Results with 8 Cores
The results indicate a direct relationship between available RAM and processing times, with the dependency becoming more accentuated as fragmentation increases, as shown below. Table 6 does not reflect a significant change in processing time when varying the number of processing cores. The chart below lists all results when processing the drives with a fixed 32 GB of RAM.

Figure 11 -SPARTA Test Results with 32 GB RAM
With a fixed 32 GB of RAM, the difference between the slowest time at 0% fragmentation (2:16:39) and the fastest time at 0% fragmentation (2:15:49) is only 50 seconds, negligible at less than 1% of the total imaging and processing time of approximately 2 hours and 16 minutes. At 50% fragmentation, however, the differences between configurations appear exaggerated by the high amount of disk fragmentation.
Another interesting observation is that the greater the fragmentation, the greater the difference in time between the slowest and fastest runs. At 0%, the difference is 50 seconds; at 5%, 1 minute, 45 seconds; at 10%, 1 minute, 38 seconds; at 20%, 2 minutes, 34 seconds; and at 50%, 9 minutes, 50 seconds.
However, the overall comparison at least for 32 GB of RAM suggests that fragmentation has a near linear effect on processing time as shown below.

Figure 12 -Fragmentation Effects on Processing Speed with 32 GB RAM
The results from evaluating the performance of SPARTA given 16 GB of RAM largely reflect the results from 32 GB of RAM.

Figure 13 -SPARTA Test Results with 16 GB RAM
We again see that the difference between the most and fewest cores at 0% fragmentation is 4 minutes, 54 seconds, and it is curious that the 2-core configuration performed better than the 8-core configuration. At 50% fragmentation, however, the pattern matches the previous runs, with the 8-core configuration finishing 17 minutes, 50 seconds faster than the 2-core configuration, a 7.5% speed increase.
The results from testing SPARTA with 8 GB of RAM are largely the same as with 16 GB and 32 GB, with little variation in testing times between 2 and 8 cores and a linear increase in processing time with fragmentation.

Figure 14 -SPARTA Test Results with 8 GB RAM
It is interesting to note that for each of the test scenarios, having 50% fragmentation produced the greatest variability of times to process, even within the testing groups.

Analysis Of SPARTA Correctness Goal
There are three different aspects of SPARTA that needed to be tested for correctness:
1) The disk image being created. This was tested by computing the MD5 hash over the entire bitstream and comparing it against industry tools.
2) The file hashes being created. This was tested by comparing the tool output with industry tools.
3) The file signatures being created. This was tested by comparing the identified signatures with the base file types, as none of the files had renamed extensions that would create a mismatch with the file signatures.
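For reference, the signature comparison amounts to matching a file's leading bytes against each entry of the loaded signature table (a minimal sketch; the prototype's exact matcher may differ, though the tuple layout follows the loader in the appendix source excerpt):

    def match_signature(data, file_signatures):
        # file_signatures holds (description, signature_bytes, extension, category)
        # tuples as loaded from signatures_GCK.txt
        for description, sig_bytes, ext, category in file_signatures:
            if data[:len(sig_bytes)] == sig_bytes:
                return description, ext, category
        return None  # no known signature matched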

Full Disk Imaging Correctness Analysis
The comparison was done using the SPARTA 0% fragmented disk, as all testing of industry tools was performed using this drive.
The Tableau TX-1, TD2u, dc3dd and Guymager all reported the drive and created image file to have the following hashes: MD5 -92ba9cf58f755ec346eef3806771c96c; SHA1 -84ef8b1962c3aae4b8fce032f9a4627f6f4b8086.
The first SPARTA test was with no file processing. The log indicated that the source MD5 hash was 92ba9cf58f755ec346eef3806771c96c and the destination (image file) created hash was a match at 92ba9cf58f755ec346eef3806771c96c.
The second SPARTA test was with file processing with 0% fragmentation. The log indicated that the test run with full file processing still generated a disk image with consistent hashes of 92ba9cf58f755ec346eef3806771c96c for MD5 hashes. Each test configuration with varying cores and RAM generated the same disk image with the same MD5 hash, showing correctness with creating disk images.

File Hash and Signature Correctness Analysis
For testing file hash correctness, the CASE output files from each fragmentation variation were saved for analysis. Ten files were chosen from this group to verify that their signatures and hashes were computed correctly. The hashes were compared against an analysis performed by X-Ways, a well-known forensic tool. The signatures were compared against the files' extensions, since the files had no mismatch between extension and signature.
Among the files compared and extracted was, for example, a DOC file with MD5 hash 73E83DCDEEA4DB48A83EEFFFEF856D27. After performing the SPARTA tests on all five levels of fragmentation, the file list was analyzed for extension and MD5 match. All five test cases produced the same results, shown below, along with an indication of whether the hash and/or signature compared correctly. The single signature mismatch appears to indicate an error in the signature match lookup functionality of the tool or an error in the signature tables used. However, this is a slight error that can be corrected in future iterations or production systems. More important is that the full data reassembly was performed correctly, as indicated by the valid hash match.

Analysis of the SPARTA Efficiency Goal
One of the goals of the SPARTA research is to demonstrate that parallel forensic imaging and processing can be performed faster than sequentially imaging, verifying and then processing the data. The question is further refined by the earlier observation that disk fragmentation directly impacts processing speed, even when evidence is processed only after imaging. Each level of fragmentation is therefore analyzed independently, and configurations for which SPARTA is faster in all scenarios are determined to meet the efficiency goal.
The following table lists all configuration times at 0% fragmentation along with the average industry speeds. As shown, every configuration with 16 GB of RAM or greater beats the industry standard averages.
The following tables demonstrate 5%, 10%, 20% and 50% fragmentation. Once again, SPARTA is faster in every configuration at 20% fragmentation compared to industry tools. The same holds at 50% fragmentation, where in all configurations with at least 16 GB of RAM, SPARTA achieves faster imaging and processing speeds than industry tools.
Combining the results of all efficiency metrics establishes that a simultaneous digital forensics processing system with at least 16 GB of RAM will outperform the sequential image-then-process workflow using industry standard tools. Notably, the processing core configuration had no bearing on the system's ability to reach its efficiency goal.

Analysis of the SPARTA Cost-Effectiveness Goal
Based on the results of the efficiency goal, the minimum specification for the SPARTA prototype is 16 GB of RAM with any core configuration. The design outlined in section 3.4 used a 32 GB RAM configuration.
The configuration can be altered slightly to reduce cost while still meeting the efficiency goals, as follows:

Analysis of the SPARTA Cross-Compatibility Goal
To achieve the cross-compatibility goal, a suitable metadata interchange format was selected: the Cyber-investigation Analysis Standard Expression (CASE). The ontology saves the results of forensic analysis into a standard JSON file. A Python API is available on GitHub and was used in the SPARTA software implementation.
In testing whether the CASE output from SPARTA matches the raw comma-separated values used in determining correctness, the CASE output appeared inconsistent with the expected outputs. This could be due to a miscoding or to an implementation issue in the API. All file metadata is reflected accurately in the CASE output, but the file signature and cryptographic hash information is not. Further testing and debugging with the CASE Python API developers may be necessary, but this would be a fairly easy fix for production systems.

CONCLUSION
The primary goal of the dissertation was to determine whether a new process for digital forensics imaging could be developed to allow simultaneous processing and forensic imaging of magnetic hard drives. Based on my professional experience and discussions with other examiners, I observed that a great deal of human time was wasted waiting for a drive to be imaged before any forensic analysis could be performed. Different strategies have been researched to solve this problem. The strategy I employed was to determine whether enough processing power was available in commercial off-the-shelf (COTS) parts to allow a limited set of forensic analyses to be performed over all live files faster than sequentially processing all live files after forensic imaging.
The additional research quantification involved determining the limiting factors for performing live processing of all files. Specifically, the research endeavored to determine whether available system RAM, processing cores, or disk fragmentation were the major factors affecting the system's ability to perform simultaneous disk imaging and forensic file processing.
After performing nearly 570 hours of testing (nearly 24 straight days), the conclusions of the research are as follows:
1) Simultaneous processing of all live files during forensic imaging is not only possible but performs faster than sequentially imaging and then performing forensic analysis using industry standard tools.
2) The number of processing cores has no significant impact on the ability to perform the requisite tasks.
3) Disk fragmentation has a near-linear impact on performance for all types of file analysis, whether performed after the disk has been imaged or during forensic imaging.
4) RAM has the greatest impact on whether forensic file analysis can be performed during disk imaging, and the impact appears to be exponential. With only 8 GB of RAM available, file analysis cannot be performed, but with 16 GB or greater, all files can be subjected to a limited set of forensic analyses during bitstream imaging.
The contributions to the research community are significant. The first is establishing that disk fragmentation has a direct impact on the speed of file analysis, which has wide-ranging implications for both research and industry tools. No published research had quantified the effects of disk fragmentation on file processing, and the test results make the direct impact apparent.
Another contribution is the creation of fragmented datasets that can be used in the forensic research community. While the dataset base is the Windows Operating System and the Digital Corpora datasets, they are fragmented to precise measurements and can be used for other testing.
Yet another contribution is the finding that the standalone digital forensic imaging devices used in industry are obsolete. Manufacturers and designers should be designing devices that can perform a selected set of digital forensic file processing tasks while creating the bitstream image. Utilizing the available processing power of the hardware, provided enough RAM is available, lets forensic investigators begin analyzing processed data sooner.
There are several limitations to the research project and implementation. The first is the focus on magnetic hard drives. The design of solid-state drives suggests that fragmentation should not affect the ability to process data, since access times are constant across flash memory. Without comprehensive testing, however, this remains only a theory.
The second limitation is the selected set of file processing tasks, namely file hash and file signature analysis. There are many other types of forensic file processes that can be performed, including file indexing, compressed file expansion, registry analysis, photo EXIF data analysis, and keyword searching. While Roussev began to outline CPU cores necessary to achieve some of these tasks, further research needs to be performed to determine both CPU cores and RAM requirements to perform different forensic file analysis techniques while factoring in disk fragmentation.
The third limitation concerns the CASE standard. While it has not gained as widespread adoption as I had hoped by the end of this research, the list of contributing companies is promising, and hopefully more tools will support importing CASE data as time progresses.
A fourth limitation is the base dataset used for testing. The only filesystem tested was NTFS. FAT, APFS, ext and ZFS all have different structures for tracking fragmentation and can lead to differing results in the effects of fragmentation on forensic file analysis.
A potential limitation is the size of the source disk relative to the internal SSD used for swap space. Should the drive be too large for all of the incompletely analyzed file fragments to be stored in swap space, the system could crash. A potential remedy is to turn off all simultaneous file processing when the source disk is more than double the size of the internal swap space, ensuring that all file fragments can be held in swap until the complete file is read and removed.
The most readily identified future work is testing on solid-state drives similar to the testing of the SPARTA prototype. Comprehensive testing of fragmentation on solid-state drives (SSDs support sector-based file systems such as NTFS and FAT) could determine whether fragmentation has any bearing on SSD performance. Additionally, testing SSDs would allow forensic examiners to determine whether the increased cost of SSDs leads to more efficient use of human time.
An additional area of future work would be to apply the same technique to other file systems, such as FAT. Having this ability to perform simultaneous processing of live files while imaging large USB removable disks formatted with FAT32 would be a boon to forensic investigators.
An additional area of future work is to expand on this research and Roussev's work to determine the full requirements of forensic file processing, including all types of file processing normally and potentially performed by forensic investigators. This research established that variables beyond CPU cores affect the efficacy of file processing, and further research can be performed in this area.
This research endeavor began because I had logged too much time waiting for a disk to finish imaging before I could begin forensic file analysis. I wanted to determine whether, while waiting for the image to be created, some forensic file analysis could be performed over all live files, and whether it would save time in the long run. My research conclusively states that the answer is yes: by giving a system enough RAM, regardless of how much disk fragmentation exists, forensic analysis can be performed while creating the bitstream image, and it saves time compared to established processes. I hope that forensic imaging device manufacturers read this paper so that new devices can be made to speed up what we are trying to do: establish truth in a court of law.

APPENDIX: SPARTA SOURCE CODE EXCERPT

The following excerpt from the SPARTA prototype performs the command-line argument parsing, loads the file signature table, and builds the cluster map:

    import argparse

    parser = argparse.ArgumentParser(description='SPARTA: System for Portable '
                                                 'Acquisition with Real-Time Analysis')
    parser.add_argument('source', action="store", help="Source Path (Device or DD Image)")
    parser.add_argument('destination', action="store", help="Destination File Path")
    parser.add_argument('metadata', action="store", help="Path for file metadata")
    parser.add_argument('mft_path', action="store", help="Source MFT path")
    parser.add_argument('--file_processing', action='store_true', default=False,
                        dest='file_processing')
    arg_results = parser.parse_args()

    # load the file signature table
    file_signatures = []
    with open("signatures_GCK.txt", "r") as signatures:
        for line in signatures:
            currline = line.split(",")
            fileDescription = currline[0]
            fileSig = currline[1].replace(" ", "")
            fileExt = currline[4]
            fileCategory = currline[5].strip('\n')
            # convert the hex signature string to a byte array
            fileSigBytes = bytearray.fromhex(fileSig)
            file_signatures.append((fileDescription, fileSigBytes, fileExt, fileCategory))

    # read the MFT to build the cluster map for processing
    cluster_map = parseMFTForFiles(arg_results.mft_path)
    # printClusterMap(cluster_map)

    # dictionary of file objects holding the binary data for each file:
    # key = MFT record number, value = the binary data
    files = {}