TOWARDS A UNIFIED MODELING LANGUAGE (UML) PROFILE TO ADDRESS DIGITAL FORENSIC EVIDENCE COMPLEXITIES

There is significant complexity in digital forensics due to the numerous device types and device implementations. This complexity is exacerbated by the need for digital evidence to be understood by a wide variety of stakeholders with varying technical backgrounds. This study showed the utility of software engineering Unified Modeling Language (UML) techniques for addressing this complexity. Extensible, executable models for the digital forensics domain were developed depicting the relevant computational mechanisms involved in the who, what, when, where, and how attributes of digital evidence creation. Artifacts generated from the executable models enable a systematic constructive methodology utilizing the principle of abstraction and pattern discovery to provide a top-down view of the commonalities across implementations. It was demonstrated that the abstracted, top-down view was equivalent to implementation-specific detailed views. In addition, it was shown that the executable model artifacts could be used by software applications to illustrate the creation of digital forensic evidence at various levels of detail. Lastly, a profile was constructed to extend UML with digital forensic domain-relevant concepts and vocabulary to help enable forensic domain stakeholders, who may not have a software engineering background, to apply modeling to digital forensics. The UML profile and the defined constructive methodology provided concrete artifacts to assist others in the future to develop digital forensic models.


The top-level model needs to evolve over time to address additional implementations as new devices or versions of devices are introduced. The profile also needs to be extended to address additional forensic use-cases, as required.
This chapter introduces forensic complexities and software engineering modeling. The problem statement is documented through a discussion of the ways in which modeling can address digital forensic complexities. Lastly, the research questions, hypothesis, and objectives are identified.

Overview of digital forensics
In digital forensics, stakeholders are concerned with understanding the who, what, where, when, and how attributes of digital evidence. They need to know what evidence is available and where to look for it. In addition, they need to know, from a timeline perspective, when the evidence was created and, if possible, who created it.
Also, to defend the validity of the digital evidence, it is important to know how the evidence was created and how it could be changed.
To be able to answer these questions, the stakeholders need to have an understanding of how digital evidence is created on a device at the appropriate level of abstraction (i.e., detail) for their role (see Table 1). There is additional complexity in that a device is not static, but rather has dynamic behavior that is significant in understanding the creation of evidence.
Devices interact with users and other devices, and components within a device interact with other components. This research focused on the modeling of devices, device components, and the interactions of the device components. UML provides a formalism such that the model's graphical depiction is consistent with an underlying mathematical basis. UML-based models can be constrained so that they are unambiguous, which allows models to be internally consistent and executable like a programming language. Through model execution, the model behavior can be observed and recorded.
UML modeling utilizes principles, such as abstraction, to address complexity.
Abstraction ensures that only the details necessary for a particular stakeholder are addressed. In addition, UML can also be used to identify common structural and behavioral patterns of the system being modeled. This is beneficial since one representation of an implementation is less complex than multiple unique representations of the same underlying device or component functionality.
UML utilizes object-oriented terminology and software engineering concepts, which can be a barrier for individuals without a software background. To address this, a profile can be created to extend UML to address a domain (i.e., a specified sphere of activity or knowledge). The profile allows a system to be modeled with concepts and terminology familiar to a domain stakeholder. As an example, a widely accepted UML profile for systems engineering is the Systems Modeling Language (SysML).
SysML is used by numerous industries (e.g., auto, railway, defense, etc.). This work explored using UML profiles for the digital forensic domain.

Problem statement
There is significant complexity in digital forensics due to the numerous possible device types and device implementations.

Research questions and objectives
The questions to be answered in this study were: 3. If so, is there a potential approach to applying software modeling holistically across the digital forensic domain?
The first question was addressed in this work by answering the following question:
• Can implementation commonality be identified and measured?
The second question was addressed in this work by answering these questions:
• Can models be used to create top-level generalized diagrams?
• Can models facilitate digital forensic learning applications?
The third question was addressed in this work as follows. Digital forensics tends to be a bottom-up process in that evidence-gathering procedures focus on specific implementations. This work introduces a top-down approach which utilizes commonalities across implementations. The top-level models and associated top-down views are more abstract than the implementation models and provide alternate approaches to addressing digital forensic complexities.
Patterns were utilized to identify commonalities across the computational mechanisms being modeled. The identification of patterns was utilized to determine the degree of commonality in the different implementations of the computational mechanisms being modeled. In addition, the patterns can be reused in other modeling efforts and were also utilized to construct this work's top-level models.
Metrics were defined as a means of quantifying commonality between models and levels of abstraction between models. These metrics were utilized to assess potential relationships between these model and implementation properties.
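The study's specific metric definitions are not reproduced here; as an illustrative sketch only (not the work's actual formula), commonality between two models could be quantified as the Jaccard similarity of their element sets. The element names below are hypothetical:

```python
def commonality(model_a: set, model_b: set) -> float:
    """Jaccard similarity: shared elements over all elements."""
    return len(model_a & model_b) / len(model_a | model_b)

# Illustrative model element sets for two file system implementations.
fat_elements = {"BootSector", "Directory", "Cluster", "FAT"}
ntfs_elements = {"BootSector", "Directory", "Cluster", "MFT"}

score = commonality(fat_elements, ntfs_elements)  # 3 shared of 5 total
```

A score of 1.0 would indicate identical element sets; here three of five distinct elements are shared, giving 0.6.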
Lastly, this work identified an initial digital forensic profile for the computational mechanisms which are the subject of this study. The profile defines a set of modeling elements which capture the vocabulary and concepts of the digital forensic domain. A process on how to extend the profile was also defined.

CHAPTER 2 REVIEW OF THE LITERATURE
The validity of digital forensic data can be subject to significant scrutiny, as in the highly publicized Casey Anthony murder case, in which the digital forensics tools provided contradictory results [1] [2]. There is a critical need to train computer forensic professionals to properly gather all relevant evidence and to have the staff to process evidence in a timely manner [3]. The lack of qualified people has been identified as the most significant challenge in digital forensics, along with a recommendation to develop new tools and capabilities [4].
Visualization techniques can be used to enhance learning and understanding.
Visualization has been used to address information that might be of interest in forensic investigations [5]. Visualization has also been used specifically for understanding digital forensic information [6] [7] [8] [9] [10] [11].
UML is often used to facilitate application development as illustrated in [12].
Digital forensics investigations are performed on software-based devices that utilize standard computer architectures. Therefore, software engineering modeling techniques are also available to the digital forensics domain.
Models manage complexity by formally capturing both the static and dynamic design aspects of a system. A formal model is computationally rigorous and can be directly utilized in both the development and runtime aspects of applications. A model provides an abstraction of the system. Abstraction removes detail to allow a higher-level view through which to facilitate understanding [13]. Model abstraction needs to provide the level of detail required to address the generalized attributes that are of interest, in this case the digital evidence attributes.
Patterns in software engineering are utilized to make designs more efficient by reusing common design approaches [14]. Patterns are documented with artifacts such as UML class, object, and sequence diagrams. An approach to specify UML patterns was discussed in [15]. Additionally, [16] discussed pattern types and how patterns can be utilized in developing domain-specific models.
UML is a widely accepted software modeling approach and is an Object Management Group (OMG) standard [17]. UML can be used to model both static and dynamic aspects of software. Model Driven Architecture (MDA) utilizes UML to support design by providing capabilities such as model execution and model transformation (e.g., code generation). Models can be executed, similar to code, on model virtual machines to simulate model behaviors utilizing frameworks that are based on the OMG Foundational UML [18] and OMG Action Language to Foundational UML (ALF) standard [19] [20]. An example of an equivalent modeling framework implementation is eXecutable Translatable UML [21]. Executable models are of significant importance to this research since they allow the actual behaviors of the modeled device/device components to be captured and utilized by applications.
UML profiles are a mechanism for extending UML to reflect the terminology and concepts of a particular domain. UML profiles provide a concise dialect that consists of stereotypes (i.e., new model meta-elements), tags (e.g., attributes), and constraints that are supported by UML-compliant tools [22]. The stereotypes and tags capture domain terminology and concepts. Object Constraint Language (OCL) rules are utilized to define the constraints, which provide additional model precision and can be used for model validation. OCL complements UML [23] and can be used to formally specify UML models.
Software engineering metrics have been developed for object-oriented software systems [24]. Examples of software reuse metrics are seen in [25].
Modeling and formalism in digital forensics and cyber security. In addressing modeling in digital forensics, it is useful to also look at how modeling is used in cyber security. The cyber security domain is directly related to the digital forensics domain in that the underlying systems to be investigated or analyzed are computer systems that are based on similar technical concepts at similar levels of abstraction. In the digital forensic use-case, the analysis reasons about the existence of digital forensic evidence on the computer system, whereas in the cyber security use-case, the focus is on computer system security vulnerabilities.
Although there are a number of papers devoted to the use of formalism in digital forensics and cyber security, they have a different focus than the models proposed in this work. There is a significant amount of literature recommending the use of modeling in digital forensics to formally model the digital forensic process [26] [27] [28] [29] [30].
The formalism of modeling can be utilized algorithmically to reason about the system being modeled. A Turing machine-based model to address evidence is identified in [31]. A modeling method for forensic analysis formally using graphs to address attack vulnerabilities is discussed in [32]. The use of modeling for the analysis of evidence in storage media is introduced in [33].
Garfinkel [34] defined a limited XML schema to formally capture forensic case information. The intent of the schema was to provide an Application Programmer Interface (API) for digital forensic tools to share data sets.
The cyber security modeling focus is on identifying system vulnerabilities and likely attack scenarios [35] [36]. These models incorporated aspects of the underlying system architecture and in some cases also included a model of the human element. A method to extend UML to address security concerns has been introduced in [37]. UML profiles and Domain-Specific Modeling Languages (DSMLs) can be interchanged [42], and profiles can be used as a mechanism to design DSMLs [43].
In the cyber security realm, utilizing profiles to incorporate security patterns is discussed in [44]. The utilization of security patterns for the development of more cyber-resilient systems is addressed in [45] [46]. Fernandez and Petrie [47] suggested that UML and security patterns can be used as a mechanism to teach secure system design.
An example of the utilization of a pattern in digital forensics to isolate forensically interesting network data was reported by [48].

Modeling in other domains: Systems Modeling Language (SysML). A significant example of extending UML for other disciplines is seen in the Systems Modeling Language (SysML). SysML is a profile extension of UML with a focus on the systems engineering of complex systems and systems-of-systems through their lifecycles. SysML has been applied to complex systems in many industries, including aircraft, automotive, defense, IT, medicine, and space systems. As an example, the utilization of SysML for automotive embedded systems is discussed in [49] [50]. The application of SysML to railroad crossings was identified in [51].

Contributions of this study.
This study showed the benefits of, and provided a unique approach to, extending software modeling techniques into the digital forensic domain.
The actual subjects of the model were the computational mechanisms of the software architecture of evidence creation. The computational mechanisms, as defined by this work, were the forensically relevant data structures and control flow as dictated by the relevant device component(s). In other digital forensics works, UML was used to model evidence acquisition, analysis of evidence, and attack methodologies. Other modeling techniques were utilized to reason about the validity of evidence. These works did not address the system-level constructs of the computational mechanisms. None of the UML modeling techniques investigated for this study addressed executable models or the utilization of executable-model outputs by other applications.
Other works did address utilizing UML to produce animations. However, these animations were not related to digital forensics. This work utilized the formalism which UML provides to generate formal artifacts which could be parsed and animated by a forensic application. This animation provided insight into how model-based forensic applications can enhance domain understanding.
This work utilized the principle of abstraction along with discovered commonality patterns to define top-level models from detailed implementations. A repeatable modeling process on how to construct the abstracted top-level model was identified. An equivalent process was not seen in the literature review. Profiles are a common mechanism to extend UML in other domains, as seen with SysML. However, the literature search did not identify any digital forensic related profiles. This work also identified a process to construct profiles from top-level views.

CHAPTER 3 METHODOLOGY
This work had two primary focus areas: the construction of top-level models and profiles, and the development of a learning application utilizing model artifacts.
The first focus area resulted in the construction of the top-level models to address digital forensic complexities through abstraction. The top-level models facilitate the construction of profiles, which make modeling more accessible to digital forensic stakeholders. The second focus area addressed complexities by demonstrating that modeling could facilitate the development of applications utilized to enhance the understanding of digital forensic stakeholders.
A combination of expert review, analysis, test, and metrics was used to show that the objectives were met (see Table 3-1). Constructive methods "are heuristics that build up a complete solution from scratch by sequentially adding components to a partial solution until the solution is complete" [52]. For this work, a constructive method was utilized to define the top-level models and profiles. In addition, the constructive method provided the steps to perform the constructive analysis and test, as identified in Table 3-1.

This chapter begins by addressing the digital forensic problem space and how it relates to this work. Next, the constructive methodology is discussed and the forensic subjects to be modeled are identified, followed by the modeling implementation approach. The metrics to address commonality and complexity are then introduced.
Lastly, the animated application is described.

Mapping to the digital forensic problem space
The digital forensic problem space that this work addressed is combinatorially explosive. There is a practically uncountable number of digital forensic scenarios that can be executed on a practically uncountable number of implementations of devices and device components. The forensic areas in the context of this work include media analysis, media management analysis, file system analysis, application analysis, network analysis, operating system analysis, executable analysis, image analysis, and video analysis, as originally identified by [54]. Additional areas that were added over time include RAM (Random Access Memory), mobile, and database forensics.
Given a forensic area, there are forensic use-cases which identify specific investigation types. A use-case, in the context of this work, was a set of actions performed by the suspect on the targeted system or device that would be of interest to the forensic stakeholder. A use-case scenario was one realization of a use-case, which requires an initial configuration. A forensic attribute was either the evidence itself or contributed to evidence identification for the forensic stakeholder.
This work utilized use-case descriptions to describe a high-level scenario and the forensic attributes that were of interest for a particular type of forensic evidence creation. The descriptions were accompanied by an associated use-case diagram that provided additional details of the activities, the system boundary, and the participants (e.g., actors) in the scenario. The combination of the use-case description and use-case diagram provided the specification for that which was modeled.
The forensic scenario and the what, when, and where forensic attributes of the evidence assist in determining the data structures of the specific implementation. It should be noted that the forensic attributes may be at a much higher level of abstraction than the actual implementation of the underlying device or component.
The how of evidence creation is reflected in how the underlying computational data structures are utilized and how they change during a scenario, as summarized in Table 3-2.

Eight-Step constructive method
The eight steps which comprised the constructive methodology are detailed below. The specification was used to develop the implementation-specific models.
Once developed, the implementation specific models were analyzed for commonalities from both a black box and white box perspective. From a black box perspective, common functionality across the implementations was identified as a functional group.
The functional groups were utilized to extend the initial use-case. The resultant extended use-case described the top-level model. From a white box perspective, the forensic data structures and scenario control flow were analyzed to determine common implementation patterns.
The top-level model implements the functional groups and the associated use-case scenario. An analysis was performed to determine which implementation pattern should be utilized to implement each functional group. The top-level and implementation models were shown to be equivalent by ensuring that the attributes of the modeled data structures could be transformed to the forensic attributes.

Model specification (step 1).
The first step was to determine the use-cases and use-case descriptions. The use-cases identified the actors, the systems (i.e., devices), the activities or functions, and the components utilized by a suspect for a given scenario. The high-level functions were typically at the operating system or application level (e.g., Microsoft Word, Chrome browser, command line, etc.). After the use-case and the associated scenario were determined, the associated forensic attributes were identified. The use-case description defined a specific usage of the use-case. The forensic attributes set the model abstraction level.

Develop specific models (step 2).
There were three specific model implementations developed for the use-case. The relevant components (e.g., the operating system and applications) and the forensic data structures of the implementation were the focus of the static model. The forensic data structures were often modeled from tables in forensic documentation or in documentation which described the functionality of interest. The use-case scenario was the basis for the behavioral model, which was developed as a state diagram and utilized action language.
In addition, the initial conditions for each implementation scenario were defined.

Execute models (step 3).
The specific models were instrumented so that upon model execution, an XML script was created. The XML script logged the behavior of the model during execution with respect to the forensic attributes. In the specific models, these captured attributes may not have been the exact forensic attributes, but they could be related to the forensic attributes. For each implementation, the model behavioral script was verified against the source documentation to ensure the model implementation exhibited the expected behavior. Associated Object Action Language (OAL) was created to achieve the desired behavior of generating the forensic attributes for the use-case scenario. Metrics were taken to quantify commonality across the use-case and to quantify model abstraction.
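The generated XML scripts are not reproduced in this chapter; the sketch below, with illustrative element and attribute names (not the study's actual schema), shows how such an execution log could be assembled programmatically:

```python
import xml.etree.ElementTree as ET

# Hypothetical log structure: a behaviorLog root with one event per model
# action, each carrying the captured forensic attributes as child elements.
log = ET.Element("behaviorLog", useCase="fileCreateDelete")
event = ET.SubElement(log, "event", action="allocateCluster")
ET.SubElement(event, "attribute", name="startCluster").text = "34"
ET.SubElement(event, "attribute", name="writeTime").text = "2021-06-01T12:00:00"

# Serialize the log so it can be parsed later by another application.
script = ET.tostring(log, encoding="unicode")
```

A downstream application (such as the animation tool discussed later) could then parse this script to replay the modeled behavior step by step.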

Validate top-level model (step 6).
Model equivalence showed that the behavior of the top-level model was equivalent to the behavior of all the specific models for a given scenario. Figure 3-5 shows conceptually how model equivalence was determined.
The executable models were instrumented such that an XML script was generated, capturing the behavior of the top-level model and the behavior for each implementation-specific model with respect to the forensic attributes of the given scenario.
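The captured attributes can then be compared across models. The following sketch of such an equivalence check uses illustrative attribute names and transformations (assumptions for this example, not the study's actual mappings):

```python
# Top-level forensic attributes for a hypothetical scenario.
top_level = {"fileName": "file1.dat", "sizeBytes": 6000}

# One specific model's captured attributes (FAT-style names, illustrative).
fat_model = {"sfn": "FILE1.DAT", "lfn": "file1.dat", "sizeBytes": 6000}

# Transformations mapping specific-model attributes onto top-level ones.
transforms = {
    "fileName": lambda m: m["lfn"],         # the LFN carries the full name
    "sizeBytes": lambda m: m["sizeBytes"],  # direct correspondence
}

# Equivalence holds if every top-level attribute is accounted for.
equivalent = all(
    transform(fat_model) == top_level[attr]
    for attr, transform in transforms.items()
)
```

The check succeeds when each top-level attribute is either directly present in the specific model or reachable through a transformation, mirroring the equivalence criterion described in the text.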
To verify that the specific implementation models were equivalent to the top-level model, the forensic attributes captured from each specific model were mapped to the forensic attributes of the top-level model. This mapping was analyzed to assess equivalence. If the top-level and specific models executed the same use-case scenario and each model either directly accounted for the forensic attributes or could map to the forensic attributes through a transformation, it was reasonable to claim that the specific models were equivalent to the top-level model.

Develop use-case profile (step 7).
As shown in Table 3, the top-level model elements were captured as stereotypes in the profile diagrams. The top-level model data types were used to assist in developing the stereotype tags. However, not all top-level data types were utilized, specifically if they were too detailed or implementation specific.

Develop/integrate DF profile (step 8).
The use-case profiles were analyzed to create one digital forensic profile. Common stereotypes across all the area profiles were then refactored to ensure that one representation worked across all areas. The area profiles were adjusted so as not to duplicate the stereotypes which resided in the common stereotypes. The common stereotypes and the adjusted unique profiles were combined to create the overall digital forensic profile. Metrics were taken to quantify commonality across the use-cases.
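The refactoring of common stereotypes can be sketched with set operations; the area and stereotype names below are illustrative assumptions, not the study's actual profiles:

```python
# Hypothetical stereotype sets for three area profiles.
profiles = {
    "fileSystem": {"ForensicAttribute", "DataStructure", "Cluster"},
    "browser": {"ForensicAttribute", "DataStructure", "HistoryRecord"},
    "ram": {"ForensicAttribute", "DataStructure", "ProcessEntry"},
}

# Stereotypes shared by every area profile move into the common profile.
common = set.intersection(*profiles.values())

# Each area profile keeps only its unique stereotypes, avoiding duplication.
unique = {area: stereotypes - common for area, stereotypes in profiles.items()}
```

Combining `common` with the adjusted `unique` sets corresponds to assembling the overall digital forensic profile described above.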

Constructive Method Contribution.
A contribution of this work was the modeling of the relevant computational mechanisms for evidence creation. How this modeling was performed is defined by the constructive method. The constructive method itself resulted in artifacts and processes which are also contributions of this work:
1. a process to construct top-level implementation views of computational mechanisms,
2. modeling patterns to catalog commonalities in computational mechanisms, and
3. a digital forensic profile along with a process to extend the profile.

Model specification: use-cases and use-case descriptions-(step 1)
Three model specifications were defined: the file system create/delete use-case, the browser browse-and-download use-case, and the RAM list process/network connection use-case. For each of these use-cases, a use-case description and use-case diagram were defined.

File system forensics
The file system implementations were based on [55] and were augmented with materials from URI forensic coursework. Figure 3-7 depicts the file system create/delete use-case and Figure 3-8 provides the use-case description that included the high-level scenario and the associated forensic attributes.
The file system allocation/delete use-case scenario begins when the suspect "saves" a new file. At some later point in time, the user deletes the file by moving it to "trash". The file system use-case investigated evidence that was created during file allocation and file deletion. Evidence of interest included information on the file itself, timestamps, and location information, all of which could be used to find evidence, as well as file slack, which could contain areas in which data can be hidden.
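File slack follows from simple cluster arithmetic: a file occupies whole clusters, and the unused tail of the final cluster is slack. A small worked sketch (the sizes mirror the FAT and NTFS examples used later in this chapter):

```python
import math

def file_slack(file_size: int, cluster_size: int) -> int:
    """Bytes left over in the final cluster after the file contents end."""
    clusters = math.ceil(file_size / cluster_size)
    return clusters * cluster_size - file_size

# A 6000-byte file on 4096-byte clusters occupies 2 clusters (8192 bytes),
# leaving 2192 bytes of slack in which data could be hidden.
slack = file_slack(6000, 4096)
```

The same arithmetic applied to the NTFS example (a 4000-byte file on 2048-byte clusters) yields 96 bytes of slack.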
When modeling a use-case, only the functions that are available in all of the implementations should be included in the use-case scenario to ensure the consistency of the results in the top-level model. For example, since the journaling capability is not available in the FAT file system, it is not addressed in the file allocation/deletion use-case scenario. The major data structures of the FAT file system include the boot sector, the FAT, the directory structure, and the clusters. The boot sector contains the information required for the operating system to determine locations of the relevant data structures.
File contents are stored as clusters. The directory is represented as a set of tables whose entries contain information on a specific file or a subordinate directory. The FAT table provides information about the clusters in which the file information is stored.
An example of a file allocation and deallocation in the FAT file system is shown in Figure 3-9. In this scenario, the file "root\dir\file1.dat", which is 6000 bytes, is to be allocated to a FAT file system which has a cluster size of 4096 bytes. The FAT scenario begins when the OS reads values from the boot sector to determine cluster size and the location of key file structures. The OS then reads the root directory to determine the cluster number of the next directory in the path. Once the target directory is found, the metadata for the file entry in the target directory is inserted.
This includes long file name (LFN) and short file name (SFN), size, timestamps, and the setting of associated flags. The OS then determines the cluster to be used as the start cluster for the file. This cluster is written, and if there is more to write, the next free cluster is determined from the FAT. This process is repeated until there is no additional file information to write. At this point, the write time is updated.
For file deletion the OS determines the location of the target directory, using the same process as described above for file allocation. Utilizing the target directory entry, the target file is located, and the start cluster of the target file is identified.
The entries in the FAT for the file's clusters are marked as "empty", but the contents remain. In addition, the file names in the target directory entry are modified, but not deleted. Timestamps and relevant flags are updated as required.
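The allocation and deletion behavior described above can be sketched with a toy FAT; the marker values and table size are illustrative, and entries 0 and 1 are treated as reserved as in a real FAT:

```python
EMPTY, END = 0x0000, 0xFFFF  # illustrative "free" and "end of chain" markers

def allocate(fat: list, clusters_needed: int) -> int:
    """Chain free clusters together through the FAT; return the start cluster."""
    free = [i for i, entry in enumerate(fat) if entry == EMPTY and i >= 2]
    chain = free[:clusters_needed]
    for current, nxt in zip(chain, chain[1:]):
        fat[current] = nxt  # each entry points to the next cluster in the file
    fat[chain[-1]] = END
    return chain[0]

def delete(fat: list, start: int) -> None:
    """Walk the chain, marking each FAT entry empty; cluster contents remain."""
    while start != END:
        nxt = fat[start]
        fat[start] = EMPTY
        start = nxt

fat = [EMPTY] * 8
start = allocate(fat, 2)  # e.g., a 6000-byte file with 4096-byte clusters
delete(fat, start)
```

After `delete`, the FAT entries read as free even though the underlying cluster contents were never overwritten, which is exactly why deleted file data remains recoverable.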

New Technology File System (NTFS)
The NTFS major data structures are the Master File Table (MFT) and Clusters.
The MFT entries are composed of attributes, which are themselves complex data structures. There are MFT entries for directories and files and other data structures which are of importance to the file system.
An example of a file allocation and deallocation in the NTFS file system is shown in Figure 3-10. In this scenario, the file "root\dir\file1.dat", which is 4000 bytes, was to be allocated to an NTFS file system that had a cluster size of 2048 bytes.
The file allocation began when the OS accessed the boot sector to determine cluster size and the requisite information to process the MFT. In the data attribute within the associated MFT entry, the allocation of clusters was defined in terms of data runs. A data run compactly encodes an ordered set of clusters as a sequence of extents, each described by a run length and a cluster offset.
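The standard NTFS run-list byte layout (a header byte whose nibbles give the sizes of the length and offset fields, followed by a little-endian length and a signed offset relative to the previous run's start) can be decoded as follows; the sample bytes are illustrative:

```python
def decode_data_runs(runlist: bytes):
    """Decode an NTFS-style run list into (start_cluster, length) extents.
    Each run's offset is signed and relative to the previous run's start."""
    runs, pos, prev_start = [], 0, 0
    while pos < len(runlist) and runlist[pos] != 0x00:  # 0x00 ends the list
        header = runlist[pos]
        len_size = header & 0x0F          # bytes used for the run length
        off_size = (header >> 4) & 0x0F   # bytes used for the cluster offset
        pos += 1
        length = int.from_bytes(runlist[pos:pos + len_size], "little")
        pos += len_size
        offset = int.from_bytes(runlist[pos:pos + off_size], "little", signed=True)
        pos += off_size
        prev_start += offset              # offsets chain from the previous run
        runs.append((prev_start, length))
    return runs

# Header 0x21 = 2 offset bytes, 1 length byte: 4 clusters starting at 4096.
extents = decode_data_runs(bytes([0x21, 0x04, 0x00, 0x10, 0x00]))
```

The signed, relative offsets are what make run lists compact: a fragment placed before the previous fragment is encoded with a small negative offset rather than a full absolute cluster number.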
The MFT Entry Bitmap was processed to determine an empty MFT entry.
This entry had the relevant attributes created or updated. The Cluster Bitmap was used to determine the set of clusters to which the file contents were to be written. The relevant attributes were updated and then the file content was written.
Next, the target directory in which the file resided was updated. Starting from the root directory MFT entry, the entry for the target directory entry was determined by navigating the directory structure. The relevant attributes for the target directory entry were updated.
File deletion is accomplished by starting with the MFT entry for the root and processing the relevant attributes to determine the MFT entry for the target directory.
From this, the MFT entry for the target file is determined. The target directory is adjusted to account for the deletion of the file. The MFT Entry Bitmap and Cluster Bitmap are processed to indicate that the entry and clusters are available.
After file deletion, the file contents are still in the clusters and the pointers to the underlying attributes still exist.

Extended (EXT) file system
The implementation details for allocation and deallocation of files in the EXT file system are shown in Figure 3-11. To perform file deletion, the OS starts from the root directory and processes the EXT structures in the same way as described above for file allocation to determine the location of the targeted directory structure. The targeted file elements are removed from the block containing the directory contents, and the file Inode is deallocated. In addition, the associated entries in the Block Bitmap are deallocated.
At the end of the deallocation process, the contents in the blocks still exist.

Browser forensics
The underlying information for the browser application was either stored in databases or in files in the file systems.

RAM forensics
By analyzing the process list, the forensic analyst is able to determine the applications that are running and, potentially, the start times for the applications, both of which could be tied to user activity. In addition, the forensic analyst may want to inspect for malware. This can be accomplished by:
• inspecting for unexpected processes, or
• inspecting the parent and child processes to determine anomalies in how the child was launched.
The application utilizes the sockets API to interface with the Communication Object, which represents the network communications of the system. A socket provides an "endpoint" for sending and/or receiving data. A socket can be accessed by a process to either receive or send data. The process addresses the socket through a file descriptor in its handle table; the socket in turn is serviced by the network stack, which implements the layered (OSI seven-layer) protocol stack. The network stack contains the network information, such as IP addresses and ports, for the application.
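The endpoint information held by the network stack can be illustrated with a minimal loopback connection; this is a generic sockets sketch, not the study's model:

```python
import socket

# A loopback listener stands in for the remote party.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
server.listen(1)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server.getsockname())
conn, _ = server.accept()

# Each endpoint is an (IP address, port) pair, the core information a
# network-connection forensic artifact records for an application.
local_endpoint = client.getsockname()
remote_endpoint = client.getpeername()

client.close()
conn.close()
server.close()
```

The local/remote (IP, port) pairs recovered here are the same attributes a forensic analyst would extract from a memory image's connection list and correlate with running processes.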

Modeling overview
The UML is a modeling language based on an Object Oriented (OO) programming approach to software development. UML modeling is equivalent to programming in most procedural languages, but modeling is at a higher abstraction level. UML models are often used to provide the overarching software design for software systems. This design is then utilized to implement the software system in the more expressive procedural languages. In addition, models, like traditional source code, can be executed utilizing a model compiler. The two primary environments supporting the modeling for this work were eXecutable Translatable UML [21] and Papyrus [68]. UML consists of fifteen diagram types, seven structural and eight behavioral (depicted in Figure 3-18). Structural diagrams such as the class diagram represent the entities being modeled and the relationships between the entities in terms of associations. Another structural diagram is the object diagram, which shows specific instances of the classes. Instances are related by links.
An example of a behavioral diagram is the state machine diagram, which shows the internal behavior of the class. The use-case diagram, another behavioral diagram, identifies system functions which are contained in the system boundary along with the actors or stakeholders who utilize the system based on the use-case functions.
To be valid, a diagram needs to adhere to the UML-specified rules. These rules ensure that the model representation is consistent across the diagrams. A collection may be a set (a mathematical set with no duplicate elements), a bag (a set which may have duplicate elements), an ordered set (elements ordered by position), or a sequence (a bag in which elements are ordered).
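The four collection kinds just listed have direct analogues in most programming languages; a Python rendering, used only as an illustration of the semantics, might look like:

```python
from collections import Counter

# Set: no duplicate elements
s = {1, 2, 2, 3}                                  # duplicates collapse

# Bag (multiset): duplicates allowed, no order
bag = Counter([1, 2, 2, 3])                       # element -> multiplicity

# Ordered set: unique elements with positional order
# (dict preserves insertion order, so keys form an ordered set)
ordered_set = list(dict.fromkeys([3, 1, 3, 2]))

# Sequence: a bag in which elements are ordered
seq = [1, 2, 2, 3]
```

The distinction matters when writing constraints: a uniqueness constraint is vacuous on a set but meaningful on a sequence or bag.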

Important association types include:
Important association types include:
- Inheritance (represented as an arrow): provides a subtyping ("is-a") relationship;
- Composition (represented as a filled-in diamond): an association between class elements which have a "has-a" or "part-of" relationship;
- Aggregation (represented as an open diamond): an association between class elements which have a "has-a" or "part-of" relationship; this differs from composition in that the "parts" can exist even if the whole does not;
- Dependency association (represented as a dashed line on an association to a dependency class): a class which provides attributes on an association.

The DataElement has the following functions:
- read: provides the value for a specific key;
- write: modifies the value which is associated with an existing key.

The UML metamodel provides the metamodel rules for the models which are developed in a UML environment. This is the level of modeling performed in this work. The instantiation of these models resulted in actual objects that related to some real-world entity.
When the model was executed, instantiations were created consistent with the scenario; this is at the M0 level. OCL is a declarative language that can add constraints to UML diagrams, such as class and state diagrams. OCL is utilized to further specify the meanings in the diagrams to which it is applied. OCL can be viewed in terms of set theory. OCL has a standard library of primitive types and provides operations for both primitive and user-defined collection types. An OCL expression has a context, which is the element for which the OCL expression is defined. To reach another element, the expression can use navigation rules.
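The DataElement read/write functions described earlier can be sketched as a thin key-value wrapper. The class shape and method names follow the text; everything else (the constructor, the sample keys) is a hypothetical illustration.

```python
class DataElement:
    """Sketch of the DataElement functions: read a key, write an existing key."""

    def __init__(self, pairs):
        self._pairs = dict(pairs)

    def read(self, key):
        # provide the value for a specific key
        return self._pairs[key]

    def write(self, key, value):
        # modify the value associated with an EXISTING key only
        if key not in self._pairs:
            raise KeyError(key)
        self._pairs[key] = value

elem = DataElement({"fileName": "a.txt", "size": "0"})
elem.write("size", "4096")
```

Restricting `write` to existing keys mirrors the static single-elements pattern discussed later, where key-value pairs can be modified but not added or deleted.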
An example of a profile is displayed in Figure 3-. The modeling environment provided by xtUML is based on the Shlaer-Mellor modeling methodology, which uses a subset of UML notation. This methodology utilizes only a subset of UML diagrams, and there are some minor differences within the constructs of these diagrams that needed to be addressed for this work. As an example, xtUML state machines do not have specific initial and final state symbols.
The key elements of the executable model for this work, the class diagrams, the state machine diagrams, and the OAL action language, are depicted in Figure 3-. The differences with respect to Papyrus did not affect the outcome because they were mostly notational.
In the example, the OS class has an attribute, osContext, whose type is the nested data type OSData. Figure 3-26 shows the structure of the nested data type as modeled. The model needed to be initialized before it executed. This initialization is analogous to the instantiation of classes in an object-oriented language, or the equivalent of the object diagram in UML.
In the context of this work, the model configuration was based on the use-case scenario being modeled. The model contained a function that initialized the scenario by instantiating classes and defining links between the instances. To configure the scenario, an initialization function needed to be developed; Figure 3-29 shows a configuration function for the example. This function was manually invoked.
After the configuration function was executed, the instantiated classes and relationships can be seen in the xtUML display environment. The first operation provided information on the entity with the control flow (e.g., the operating system) and the entity on which it was focused (e.g., a forensic data structure). Also provided in this operation was the "message" that might be passed between the entities. The attributeDataII operation was used to pass the state of a model data structure attribute, to be recorded in an XML file when it was read or written during the scenario. The defined operations were invoked from the model (see Figure 3-34), and the operation implementations were static methods in the associated Java application (see Figure 3-35). A snippet of the resulting XML file is shown in Figure 3-36.

The ICM metric is analogous to the reuse leverage metric of [69] and the abstraction metric of [70]. The ICM is the ratio of common implementation structures to total implementation structures, ICM = CIS / TIS, where:
- CIS is the total number of common implementation structures (e.g., common patterns);
- TIS is the total number of implementation structures.
An implementation structure is the modeled data structure and may consist of one or more model elements. The specifics of how CIS and TIS are determined depend on how the ICM is utilized. In each case, it is key to determine the common set of structures for the entities whose commonality is to be determined. The application of the ICM equation was as follows:

Case 1 (single implementation). For a single implementation, equation 2 was utilized, where:
- CIS: the total number of common implementation structures in the implementation. The common patterns against which the implementation is measured need to be defined in advance of this calculation; as an example, the common patterns may be those which are defined for the use-case.
- TIS: the total number of implementation structures in the implementation.

Case 2 (comparing implementations within a use-case). Across implementations, equation 2 was utilized, where:
- CIS: the common implementation structures between the two implementations.
- TIS: the total implementation structures between the two implementations.
This metric can also be applied to additional levels of functionality within the implementations. As an example, in this work, the metric was applied to functional groups to provide additional granularity.
In the example in the figure, there are three implementations being compared.
When I1 and I2 are compared, C1 is the only common structure between the implementations. This common structure occurs twice and there are a total of ten structures, resulting in a metric value of 1/5 for the two implementations. This is repeated for the other two combinations. The average of these calculations results in a value of 2/5, which defines the ICM for the use-case.
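The calculation above can be reproduced with a short script. Only the I1/I2 counts (2 common occurrences out of 10 structures) come from the text; the counts for the other two pairwise comparisons are illustrative placeholders chosen solely so that the average reproduces the 2/5 reported for the use-case.

```python
def icm(cis, tis):
    """Implementation Commonality Metric: common / total implementation structures."""
    return cis / tis

# I1 vs I2: common structure C1 occurs twice among ten total structures -> 1/5
pair_i1_i2 = icm(2, 10)

# Use-case ICM: average over the three pairwise comparisons.
# The other two pairs use placeholder counts (assumed, not from the text).
pairwise = [icm(2, 10), icm(5, 10), icm(5, 10)]
use_case_icm = sum(pairwise) / len(pairwise)   # -> 2/5
```

The same `icm` function covers all three cases in the text; only the interpretation of CIS and TIS changes.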

Case 3 (comparing two use-cases). Across use-cases, equation 2 was utilized, where:
 CIS: the common implementation structures across the use-cases.
 TIS: the total implementation structures in both use-cases.
In the example of the figure, the use-cases have three common patterns, C1, C2, and C3. These common patterns occur fifteen times. There are a total of thirty elements, which resulted in a metric calculation of 1/2.

Metric Contributions.
A contribution of this work was the identification of metrics to support commonality and abstraction analysis. This was accomplished with the establishment of the TLAM and ICM metrics.

Animation Application Implementation
A prototype animation application depicting the file creation/deletion scenario for the FAT file system was created. The application showed the key attributes of evidence creation for the relevant underlying computational architecture. The application utilized a script generated by executing a model for the given scenario.
The application provided views to meet the needs of different stakeholder roles.
The animation application read in the behavioral XML script from the FAT scenario. The application animated the behavioral script. The animation showed the control flow of the operating system by showing the forensic data structures that were being read from or written to as the operating system changed states through the scenario steps. The control flow was depicted by an arrow that indicated the data structure that was being read or modified.
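Reading and rendering such a behavioral script can be sketched as below. The XML element and attribute names here are hypothetical; the real schema is whatever the executable model emits.

```python
import xml.etree.ElementTree as ET

# Hypothetical script shape; the actual schema comes from the model's XML output.
script = """<scenario>
  <step control="OS" target="FAT" action="write"/>
  <step control="OS" target="Directory" action="read"/>
</scenario>"""

root = ET.fromstring(script)
frames = [(s.get("control"), s.get("target"), s.get("action"))
          for s in root.findall("step")]

def render(frame):
    """One animation frame: the controlling entity, an arrow, the data structure."""
    control, target, action = frame
    arrow = "-->" if action == "write" else "<--"   # arrow depicts modify vs. read
    return f"{control} {arrow} {target}"
```

Each `step` becomes one animation frame: the controlling component, the forensic data structure in focus, and an arrow direction distinguishing reads from modifications.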
The architecture for the animation application is shown in Figure 3

CHAPTER 4
FINDINGS
This chapter details the findings of this work. The findings included:
- Pattern identification (step 4).
- Top-level general models and forensic area profiles (steps 5-7).
- Constructive procedure for extensibility.
- Metric results.
- Application analysis.
This chapter discusses the model implementation patterns and functional groups resulting from the analysis of the specific models. The functional groupings provided a finer-grained decomposition of the functions defined in the scenario use-cases. The constructive analysis resulted in the definition of the top-level models along with their associated profiles. Additional constructive analysis of the use-case profiles yielded the top-level profile. This overall, repeatable approach was defined in the constructive procedure. Metrics were recorded to assess the resulting commonality across models. Lastly, the results of the animation application are discussed.

Pattern analysis
The two types of patterns of interest were structural static patterns and behavioral patterns. The static patterns described the underlying architectural data structures of the forensic scenario. The behavioral patterns described the scenario and the resulting control flow as dictated by the operating system or applications, per the scenario. The patterns were derived from analysis of the implementation models.

Forensic static patterns (data structures).
There were six types of (non-unique) static structures identified from the analysis of the specific models. Structure was dictated by the classes and the associations between classes. The static structures included:
- Single elements.
- Pointers to lists.
- Amplification of additional data.
These static structures were further refined by considering the behavioral operations that could be performed on them; the results are shown in Table 4-1 as the forensic static patterns. The table identifies the pattern name, the specific structure utilized for the pattern, whether or not the elements in the pattern need to be ordered, and the behavioral aspects of the pattern. The behavior of the pattern was defined by the operations that could be performed on the classes of the structure, such as being able to add or delete elements or to read/write elements. It should also be noted that each of these patterns could be implemented with or without nested data types. The nested data types provided additional flexibility when modeling complicated data structures.

Single elements structure. The single elements structure contains one or more key-value pairs, as shown in Figure 4-1.
The key of the key-value pair is a unique string that is associated with a value of type string. The single element structure typically does not have multiple instantiations and is static; that is, key-value pairs cannot be added or deleted. This example was the SUEN pattern, which was mapped to the table from [55]. The number of attributes did not change (i.e., it remained static). In addition, these values were startup values and were not modifiable.

Unordered list. The unordered list is a collection of one-to-many of the associated type Node.
There was no ordering of elements imposed by this structure. Data structures in which order does not matter would utilize this model representation.
In other cases, the list needs to be ordered. The ordered list contains an association from the collection class to one initial entry element. From this initial element, the next Node can be reached via the next association; this is analogous to a linked list. There is an inherent order in this representation, enforced by the structure of the model. As an example, the bitmap data structure was modeled as an ordered list to allow the next element in the sequence to be selected. Specifically, in the EXT block bitmap, when allocating blocks to a file it was optimal to choose the blocks sequentially to prevent fragmentation.
If the collection membership needs to change, operations to either add or delete Nodes are required. An example of a list that never changes size is the FAT table. An example of a list that requires members to be added to or deleted from the collection is the list of operating system processes in a process list.
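The ordered-list pattern, with the collection class, initial entry, `next` association, and add/delete operations, can be sketched as a plain linked list. The class names follow the pattern description; the block-allocation usage is an illustrative assumption.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None            # the "next" association enforces order

class OrderedList:
    """Ordered-list pattern: collection -> initial Node -> next -> ..."""

    def __init__(self):
        self.initial = None         # association to the one initial entry

    def add(self, value):
        # append at the tail, preserving sequential order
        node = Node(value)
        if self.initial is None:
            self.initial = node
            return
        cur = self.initial
        while cur.next:
            cur = cur.next
        cur.next = node

    def delete(self, value):
        cur, prev = self.initial, None
        while cur:
            if cur.value == value:
                if prev:
                    prev.next = cur.next
                else:
                    self.initial = cur.next
                return True
            prev, cur = cur, cur.next
        return False

    def values(self):
        out, cur = [], self.initial
        while cur:
            out.append(cur.value)
            cur = cur.next
        return out

# Illustrative use: sequential block allocation, then one deallocation.
bitmap = OrderedList()
for block in ("b0", "b1", "b2"):
    bitmap.add(block)
bitmap.delete("b1")
```

Walking `next` from the initial entry yields the inherent order the model enforces, which is what makes sequential block selection possible in the EXT bitmap example.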
Also of note was that this pattern consisted of two types of constraints.

Forensic dynamic patterns. The dynamic patterns in this work were utilized to show the control flow of the scenario. The thread of control was scripted to the specific scenario and was at a very abstract level. The three primary dynamic patterns seen in this work are shown in Figure 4-11.

The functional grouping functions are operating system functions. RetrieveFSMetaData retrieves the file system metadata; the operating system utilizes this information to determine how to interact with the file system, such as determining the location of file system data structures. AllocateToDUStorage either reads or allocates contents to memory. DetermineStorageLocation determines the data units to which file contents are to be written. AccessDirectory accesses the directory to access the metadata of the file. Table 4-3 maps the file system use-case activities/functions to the supporting functional grouping functions.

The file system representation in the browser model did not need to be modeled to the same level of detail that was required for the file system use-case. The browser model needed only to show how the file system could be traversed to gain access to the specified file.

The following sections identify the requisite analysis and validation in determining the top-level models and the associated profiles.

File systems
The top-level model for file systems was based on the file system functional groupings (Table 4-). The operating system dictated the control flow for the file system scenario.
The behavioral states and definitions are shown in Table 4-8. The state machine is shown in Figure 4-17. The states represented the set of steps to perform the scenario based on a file "save as" event or a "delete file" event. The "save as" event was initiated at the word processing application. The delete event was the result of moving the file to trash. In the file system scenario, the state machine went through a sequential set of steps for allocation. Based on an operator-initiated delete, the state machine also traversed a sequential set of states. These states were the same across all the implementations; however, the ordering of the states differed. The key profile elements for the file system static profile are identified in Table 4-9. The static profile was derived from the top-level static model and is shown in Figure 4-18. The profile stereotypes were consistent with the top-level classes which represented the forensic data structures. In addition, the data types that were consistent with the forensic attributes were utilized to define the data types of the tags for the stereotypes. The tag data types (see Figure 4-19) were categorized as related to evidence, operating system, or application. In addition, a time data type was identified, along with enumerated types for file system identification. Constraints were added to ensure unique elements in collections and to address transitive closure to prevent infinite loops. The dynamic profile for the states of the state machine is depicted in Figure 4-20.

Browsers
The browser scenario was at a higher level of abstraction than the file system use-case. The same model was used across all three implementations; the differences were with respect to the ways in which the scenarios were initially configured. With regard to the file system, file paths and file names were different across implementations. Additionally, databases, tables, and table contents had different configurations between implementations.
The mapping of the browser functional groups and the pattern implementation is shown in Table 4-10. The browser top-level class diagram is shown in Figure 4-. The state machine states and their descriptions are in Table 4-11. The application state changed based on one of three transition events:
- Download: an artifact has been downloaded.
- History update: change in URL.
- Cookie update: cookie information is stored.

After the transition has completed, the application goes back to its typical state, ActionProcessed. The states are:
- ActionProcessed: the typical state, waiting to process a user request.
- HistoryUpdated: when a URL is visited, the history is updated.
- CookieUpdated: when cookies are enabled and a site which utilizes cookies is visited, cookie information is updated.
- DownloadUpdated: when a user performs a download, the download information is updated.

The key profile elements for the browser profile are identified in Table 4-12.
The static profile was derived from the Browser top-level static model and is shown in Figure 4-24. The profile stereotypes were consistent with the top-level classes that represented the forensic data structures. The profile also identified OCL constraints.
In addition, the data types that reflected the forensic attributes were utilized to define the data types of the tags for the stereotypes. The tag data types, shown in Figure 4-25, were categorized as related to either evidence, operating system, or application. In addition, data types for time and an enumeration for browser types were identified.
Constraints were added to ensure unique elements in collections and to address transitive closure to prevent infinite loops. The dynamic profile for the behavior states is depicted in Figure 4-26. The class-to-stereotype mapping included:
- Table class → Table stereotype class: contains Row(s).
- Row class → Row stereotype class: contains data and has metadata on the row.
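The browser state transitions described above can be sketched as a simple event dispatch. The event and state names come from the text; the dictionary-based dispatch is an illustrative assumption about how the state machine fires and then returns to its waiting state.

```python
# Event -> state mapping for the browser state machine sketch.
TRANSITIONS = {
    "Download":      "DownloadUpdated",
    "HistoryUpdate": "HistoryUpdated",
    "CookieUpdate":  "CookieUpdated",
}

def handle(event):
    """Fire the transition for an event, then return to the waiting state."""
    updated = TRANSITIONS[event]        # state entered for this event...
    return updated, "ActionProcessed"   # ...then back to the typical state

states = handle("HistoryUpdate")
```

Each user action visits exactly one update state before the machine settles back in ActionProcessed, matching the behavior the table describes.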

RAM
The RAM scenario was at a higher level of abstraction than the file system use-case. Due to the lack of available information for implementation specifics, the class and state machines were a high-level, generic, abstract, textbook configuration that represented all the configurations. The top-level dynamic and static models for RAM were the same as the implementation specific models. The differences between the models were in the terminology used to describe the specifics of the process and network evidence. These differences were reflected in the data types. The patterns associated with the functional areas, applications, and OS are shown in Table 4-13. The operating system actions were either:
- to add a process (e.g., an application starting up), or
- to add a socket (e.g., an application establishes a network connection).
These actions occurred when a new application instance was created. The application had a state machine and, depending on the initial startup parameters, was either in a network-enabled state or not. The OS states and descriptions are listed in Table 4-14 and the application state machine states and descriptions are listed in Table 4-15. The OS states were:
- an operating system waiting state;
- ProcessAdded: a process is added to the process list;
- SocketAdded: a process is added to the process list and a socket handle is created.

The application states were:
- a state in which the application is not initialized;
- Executing: the application is initialized and is not network enabled;
- ExecutingAsNetworkClient: the application is initialized and is network enabled.

The key profile elements for the RAM profile are identified in Table 4-16. The static profile was derived from the RAM top-level static model and is shown in Figure 4-31. The profile stereotypes were consistent with the top-level classes which represented the forensic data structures. In addition, the data types that reflected the forensic attributes were utilized to define the data types of the tags for the stereotypes.
The tag data types, shown in Figure 4-32, were categorized as related to either evidence, operating system, or application. In addition, data types for time and an enumeration for Application types were identified. Constraints were added to ensure unique elements in collections and to address transitive closure to prevent infinite loops. The behavior profile for the OS states is depicted in Figure 4-33 and the behavioral profile for the Application behavior states is depicted in Figure 4-34.
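The application state machine from the RAM scenario can be sketched as follows. The `Executing` and `ExecutingAsNetworkClient` names come from the text; the initial state name `NotInitialized` is a placeholder, since the table gives only its description.

```python
class Application:
    """Sketch of the RAM-scenario application lifecycle states."""

    def __init__(self):
        # placeholder name; the text only describes "application not initialized"
        self.state = "NotInitialized"

    def start(self, network_enabled):
        # initial startup parameters decide which executing state is entered
        self.state = ("ExecutingAsNetworkClient" if network_enabled
                      else "Executing")

client = Application()
client.start(network_enabled=True)

local = Application()
local.start(network_enabled=False)
```

A network-enabled start would also correspond to the OS-side SocketAdded transition, since establishing a connection creates a socket handle for the process.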

Model equivalence
The analysis determined that the top-level models were equivalent to the implementation specific models in all three forensic areas of this work.
File systems. The file system top-level model was equivalent to the file system implementation specific models. The results for the model equivalence scripts are shown in Table 4-17 for the file system area. The only significant anomalies were due to differences in when time was recorded across the file systems, specifically with regard to the type of time stamps (e.g., modify, access, create, write) and where the time stamps were located (e.g., at the file entry level or the parent directory level).
For the implementation differences, the forensic attribute for time was at too low a level of abstraction. If the time attribute were deemed to be of significant interest, it would need to be re-evaluated and the appropriate level of abstraction would need to be defined. For instance, the time of file updates might be of interest.
There were also some minor differences in the order in which the forensic attributes were generated; however, this had no effect on the forensic attribute values.

Browsers. The browser top-level model was equivalent to the browser implementation specific models. Since the specific models and top-level models were the same, the differences were in the scenario configuration of the file system and database models in the browser implementation.
There were a few instances where the location of these attributes could not be found in the existing documentation. It was verified that the relevant information does exist in the browser; however, documentation for the proprietary implementation was not readily available. In addition, there was slight variance in how the implementations stored the information as variables in the database. For example, in the download forensic attributes, for the name of the downloaded document, one browser included the path in its attribute and another implementation used separate variables for name and path. Overall, there was an approximately 94% mapping between each of the specific implementation attributes and the browser forensic attributes.

RAM.
The RAM top-level model was equivalent to the RAM implementation specific models. Since the specific models and top-level models were the same, the differences were in the scenario configurations with regard to the process attributes and network attributes. As with the browsers, the documentation for the specific proprietary implementations was incomplete. However, there was only one instance in which the attribute name could not be determined. Overall, there was an approximately 95% mapping between each of the specific implementation attributes and the RAM forensic attributes.

Digital forensic profile constructive analysis (step 8)
The static digital forensic profile was constructed by integrating the forensic use-case profiles. Since the behavior profiles were use-case specific, they were not integrated. The evidence data types support the profile stereotypes.
The operating systems were common across the categories. As a result, the OS stereotypes were transferred from the area profiles to a new OS profile. The browser and RAM use-cases showed maximum commonality, which was as expected. On the other hand, the file system use-case showed less than 100% commonality. Investigating further, the ICM was applied to three aspects of the file system use-case implementations: the application, the operating system, and the functional groups.
The calculations for these aspects were:
- Operating system: 1
- Application: 1
- Functional group: 0.72

The operating system and applications were common across implementations.
Further investigation of applying ICM case 2 to the functional groups is shown in

Application analysis
Visual animation description. The GUI depicted the application, operating system, and the various file system data structures, which included directories, the FAT, boot sectors, and clusters. The display, without detailed information, is shown in Figure 4-. The animation showed the component that determined the thread of control and the data structure which was the focus of the thread of control at a given time.
The thread of control was typically an application or operating system component.

Expert review. It should be noted that the examiners were exposed only to the application and not to the other aspects of this work. As they observed a demonstration of the application, they were asked to comment on the following:
- Would this type of animated application be useful (as illustrated by the FAT file allocation/deallocation) from an educator/student or analyst stakeholder role?
- Would this type of animated application be useful for other digital forensic stakeholder roles?
The comments provided by the expert reviewers are summarized in Table 4-19.
Comments were received on the following topics: domain complexity, level of detail, model sources, roles, and how to improve the tool.
- Roles: there is too much detail in the application for a judge and jury (2).
- Roles: a trained analyst already has accepted forensic tools (2).
- Roles: did not see how it would be useful for analysis, but good for teaching (1).
- Application improvement: adjustments to the displays were recommended to facilitate understandability (a running list of previous steps; "not a fan of the arrows").

Application improvement. The recommended improvements would be useful from both an educational perspective and potentially to assist an expert analyst.

Analysis of comments.
In sum, it was found that modeling could be used to develop applications that would be of use to an educator. The feedback from experts in the field was mixed about whether the tool would assist those in other forensic roles.

Findings summary
The overall findings address what was found during implementation, test, analysis, and expert review. The associated findings for the objectives are as follows.

Facilitate learning and analysis application. An application was developed to provide an animation utilizing artifacts generated by the model. The detail pattern was utilized to provide additional detail for the more technically advanced stakeholders.
An issue that came up during the expert review, which was also encountered in the development of the specific models, was determining authenticated sources to use. In the other extreme case, the source code is available and therefore everything is known. However, this is at a much lower level of abstraction that needs to be understood, which requires significant research to identify what is important from a forensic perspective. Regardless, research needs to be conducted before a model can be created.
The expert review identified that the animation application could be useful in an educational setting. The approach was not as positively received for the analyst stakeholder role, but there were some thoughts on how the approach may be useful to the analyst. There were also concerns with regard to other stakeholder roles which would need to be addressed.
It can be concluded that modeling has the potential to facilitate applications that enhance digital forensic understanding. The question that this work did not fully answer was whether multiple stakeholder roles could be addressed by providing different abstraction levels.
Reduced complexity. The findings which relate to the complexity objective for the constructive method are as follows:
- There were nine static patterns and three dynamic patterns found.
- Top-level models were developed for the three use-cases.

The metrics suggest that there is an inverse relationship between commonality and complexity, as measured by the level of abstraction between the top-level model and implementation models. In addition, the commonality in model implementations, which was quantified by the metrics, suggests that there is commonality in the actual computational mechanisms which were the subject of the models.
The degree of abstraction increased as the level of commonality increased. This is seen with browser and RAM use-cases which were modeled at a higher level of abstraction than the file system model and substantiated with the TLAM metric.
There were commonalities identified across model implementations within use-cases and across use-cases. The ICM provided a relative measure to identify additional areas in which to investigate potential common functions across implementations.
Extensibility. Extensibility of the constructive method was verified in that its steps were repeatable. Extensibility of a top-level model was shown within each use-case through the addition of three implementations. The addition of three use-cases demonstrated the extensibility of the digital forensic profile.

DF UML profile.
A digital forensic profile was constructed from this work.
The elements of the profile can be traced to implementation details. The profile was systematically constructed from top-level models that were constructed from implementation-specific executable models.

Summary
Digital forensics is complex in that there are a combinatorially explosive number of types and configurations of digital devices and of the components of which they are composed. In addition, there is a wide range of technical skill required of digital forensic stakeholders, depending on their roles in their organizations. This work utilized software engineering techniques based on executable models to identify two approaches to manage these digital forensic complexities. The utility of the top-level models and of the profiles was not evaluated with digital forensic stakeholders. This work focused on identifying why and how software engineering modeling concepts could be applied to digital forensics.
Assessing the potential utility and impact of applying modeling in the digital forensic domain was beyond the scope of this work.
There are a combinatorially explosive number of digital forensic scenarios along with a practically uncountable number of configurations. As such, the number of scenarios addressed by this work was infinitesimally small. This work focused on the computational mechanisms within one device in one forensic area at a time. It did not investigate more complex scenarios utilizing multiple devices. In addition, addressing virtual environments would be of interest. Additional work will need to be completed to determine whether the approach in this work can be scaled up to address additional implementations, use-case scenarios, and forensic areas.
The implemented application showed only one aspect of how modeling can be used from an application perspective: a simple animation. Implementing a more sophisticated animation on additional model scenarios would provide additional insight into the utility of this approach.
The focus of this work was to show how evidence was created. There could be other aspects of digital forensics in which modeling could be used. For example, the information in the model could be used to reason about evidence validity.
Patterns were only touched on in this work and only simple patterns were identified. To gain a better understanding of commonality, all significant data structures used in implementations would need to be represented in a model.
The metric results suggest that commonality and complexity are inversely proportional. This result was not rigorously proven.

Next Steps
This work provided a starting framework for applying software engineering modeling techniques to digital forensics. The next steps would include:
 Assess whether the constructive method scales up.
 Evaluate the utility of top-level models and profiles with digital forensic stakeholders.
 Add enhanced visualization techniques to target specific stakeholders.
 Determine if modeling can support other types of applications.
 Validate that complexity and commonality are inversely proportional.
 Address additional stakeholder roles.
A variety of additional use-cases would need to be modeled to assess whether the approach scales up. The constructive method could be assessed for extensibility as follows:
 Add additional forensic areas and associated use-cases,
 Add implementations to an existing use-case scenario, and
 Model a different level of abstraction for an existing use-case.
Given the top-level models and resultant profiles generated from this work, or follow-on top-level models and profiles developed while assessing the constructive method, the utility of those models and profiles will need to be evaluated. Utility could be assessed in an educational scenario or by using the profiles to generate digital forensic models.
Further investigation of how applications could facilitate understanding for digital forensic stakeholders is warranted. As an example, more advanced visualization techniques may have utility. In addition, applications of modeling other than understanding the who, what, when, where, and how of evidence creation may be of interest. For example, the model captures relationships between elements and, from that perspective, is equivalent to a database.
Applications can be developed to provide a front-end query to generate reports of information of interest which may be beneficial in an investigation. Another use is to develop applications utilizing artificial intelligence techniques to reason about the evidence.
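The "model as database" idea above can be sketched as a toy front-end query over exported model artifacts. This is a minimal illustration, not the work's actual application: the element names, association labels, and attribute values below are hypothetical stand-ins for data exported from an executable model.

```python
# Minimal sketch of a "model as database" query front-end.
# All element names, labels, and values are hypothetical stand-ins
# for artifacts exported from an executable model.

model_elements = {
    "File1": {"type": "File", "created": "2021-03-01T10:02:00"},
    "Inode7": {"type": "Inode", "owner_uid": 1000},
    "Block42": {"type": "DataBlock", "offset": 0x5400},
}

# Associations stored as (source, label, target) triples, i.e. relations.
associations = [
    ("File1", "described_by", "Inode7"),
    ("Inode7", "points_to", "Block42"),
]

def related(element, label):
    """Navigate an association from `element` via `label`."""
    return [t for (s, l, t) in associations if s == element and l == label]

def report(element):
    """Produce a simple textual report of an element and its neighbors."""
    lines = [f"{element}: {model_elements[element]}"]
    for (s, l, t) in associations:
        if s == element:
            lines.append(f"  --{l}--> {t}")
    return "\n".join(lines)

print(report("File1"))
```

An investigator-facing application could wrap queries like `related` behind a form or report generator; an AI-based application could instead traverse the same triples to reason about the evidence.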
The relationship between the TLAM and ICM metrics should be further investigated to determine whether the claim that commonality and complexity are inversely proportional holds. A proof should be provided to substantiate the claim.
All future investigations should reach out to stakeholders to incorporate their perspectives. For example:
 Legal stakeholders, to identify the potential benefit models could provide in presenting a case. In addition, insight could be gained into the ways in which visualization techniques may or may not help in the courtroom.
 Law Enforcement stakeholders (e.g., State Police, detectives), to understand the aspects of the evidence that are important and to understand the types of applications that would be of use for training purposes.
 Educational stakeholders, to gain an understanding of whether models could be tools for educational purposes. Moreover, these individuals could provide evidence to help develop an understanding of whether formalism allows for an exchange of ideas between researchers.
 System Analyst stakeholders, to gain an understanding of whether model formalism could help address their comprehension of the underlying systems or as a means to be able to exchange information. Additionally, these stakeholders could provide valuable insight into the applications that would be of use. Finally, these stakeholders would provide information to help develop an understanding of the authoritative sources that could be utilized.
 Tool Vendors stakeholders, to gain an understanding of whether models could be used as a method to standardize descriptions of digital forensic devices and components. Additionally, these stakeholders could provide information to help to develop an understanding of the utility in developing digital forensic model standards that would be beneficial for the digital forensic tool community.

Contributions of this work
The work provided insights into how modeling would benefit digital forensics in addressing some of the complexities of the field. The unique contributions of this work consisted of the following:
1. Developed executable models of the computational mechanisms involved in digital forensic evidence creation,
2. Demonstrated that executable model artifacts could be used by applications, such as animating model behavior,
3. Defined a repeatable constructive methodology for deriving abstract top-level models from implementation-specific models,
4. Identified implementation commonality patterns for the modeled computational mechanisms,
5. Identified metrics to support commonality and abstraction analysis, and
6. Introduced a digital forensic profile along with a process to extend the profile.
The models developed represented the computational mechanisms for the creation of digital forensic evidence. Since the models were executable, their behavior could be captured and utilized by an application to animate that behavior.
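The capture-and-animate idea can be sketched as replaying a recorded execution trace frame by frame. This is a hedged illustration only: the trace below is a hypothetical recording of state transitions (element, source state, target state, triggering event), not output from the work's actual models.

```python
# Minimal sketch of animating captured executable-model behavior.
# The trace is a hypothetical recording of state transitions emitted
# during model execution: (element, from_state, to_state, event).

trace = [
    ("FileSystem", "Idle", "Allocating", "create_file"),
    ("Inode", "Free", "InUse", "allocate_inode"),
    ("DataBlock", "Free", "Written", "write_data"),
]

def animate(trace):
    """Render each recorded transition as one frame of a text animation."""
    frames = []
    for step, (element, src, dst, event) in enumerate(trace, start=1):
        frames.append(f"Frame {step}: {element}: {src} --[{event}]--> {dst}")
    return frames

for frame in animate(trace):
    print(frame)
```

A more sophisticated application could render the same trace graphically, varying the level of detail for different stakeholder audiences.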
A repeatable constructive methodology was identified to develop and test abstract top-level models from implementation specific models. The constructive method was also utilized to construct profiles. In developing these models, implementation commonality patterns were identified for the modeled computational mechanisms.
The ICM and TLAM metrics were introduced to assess commonality and abstraction between top-level and implementation-specific models.
 The languages of interest, OAL and OCL, are model aware in that they can access the model elements. These languages refer to the elements of a class diagram.
The modeling in this work consists of navigating across class or collection class elements to either read or modify attributes. Based on the attribute values, the control flow for the next navigation is determined.
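This navigate-read-branch pattern can be sketched in ordinary Python; the class names (`Inode`, `DataBlock`) and attributes here are hypothetical illustrations, not elements of the work's actual models.

```python
# Minimal sketch of attribute-driven navigation across model objects.
# Class names and attributes are hypothetical illustrations.

class DataBlock:
    def __init__(self, offset):
        self.offset = offset

class Inode:
    def __init__(self, number, allocated, blocks):
        self.number = number
        self.allocated = allocated
        self.blocks = blocks  # association to a collection of DataBlock objects

def first_allocated_block_offset(inodes):
    """Navigate inodes; the `allocated` attribute decides the next step."""
    for inode in inodes:
        if inode.allocated:                # control flow chosen by attribute value
            return inode.blocks[0].offset  # navigate the association
    return None

inodes = [Inode(1, False, []), Inode(2, True, [DataBlock(4096)])]
print(first_allocated_block_offset(inodes))  # 4096
```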
OCL is not a programming language but a specification language. OCL cannot change the model; it can only return values. OCL is used either as a query language or to specify invariants on classes.
Action languages, such as OAL, are able to instantiate classes, change class attributes, control state transitions, and provide rudimentary constructs to implement functions. The action language can provide the control flow.
 A UML association is analogous to the formal concept of a relation. The association shows some type of relationship between two classes and their associated objects. The association contains an arrow which shows how the association is being navigated. The objects of the class being navigated from constitute the domain of definition; the objects of the class being navigated to constitute the image. The multiplicity of the association on the source class will be 1 and on the target class will be either 1 or * (meaning 0 or more). In the example, the association can be represented in relation notation as:

Set Comprehension and Relations.
{(x1, y1), (x1, y2), (x1, y3)}
where x1 is the object being navigated from and y1, y2, and y3 are the objects being navigated to. In navigation it is often necessary to select a particular object in a collection. For example, see Figure B-3. In this example, the starting context is class X and the objective is to select the object from class Z where key has a value of two. See Figure B-4 for the OCL representation.
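The navigate-and-select step just described can be sketched as a Python analogue of the OCL `select` operation. This is a hedged illustration: the class names `X` and `Z`, the association name `zs`, and the attribute `key` are assumptions standing in for the figure's example, which is not reproduced here.

```python
# Python analogue of navigating from a context object of class X to a
# collection of Z objects and selecting the one whose key equals two
# (roughly what an OCL expression like self.z->select(key = 2) would
# specify; names here are hypothetical stand-ins for the figure).

class Z:
    def __init__(self, key):
        self.key = key

class X:
    def __init__(self, zs):
        self.zs = zs  # association X --> * Z

def select(collection, predicate):
    """Return the members of `collection` satisfying `predicate`."""
    return [obj for obj in collection if predicate(obj)]

x = X([Z(1), Z(2), Z(3)])
selected = select(x.zs, lambda z: z.key == 2)
print([z.key for z in selected])  # [2]
```

Unlike OAL, an OCL `select` of this form only returns the matching objects; it cannot modify them, consistent with OCL's role as a specification language.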