Applying QbD and PAT in Biological Manufacturing for "Continued Process Verification"

The objective of this research topic is to show that QbD and PAT tools, such as multivariate analysis, can support "Continued Process Verification" by using a Real-Time Multivariate Statistical Process Monitoring (RT-MSPM) system. Pharmaceutical and bio-pharmaceutical manufacturers are facing multiple challenges, such as changing regulatory requirements, healthcare reforms, economic pressure, and the availability of advanced manufacturing technology to make better quality products at reduced cost. Due to recent technological developments, significant opportunities exist for improving pharmaceutical development, manufacturing, and quality assurance through innovation in product and process development, process analysis, and process controls. The latest FDA guidelines, such as QbD, PAT, and the 2011 process validation guidance, have opened the doors for "Real-Time Process Monitoring" concepts for "Continued Process Verification".

The regulatory agencies have taken the initiative by providing guidelines over the last ten years, such as Pharmaceutical cGMPs for the 21st Century: A Risk-Based Approach, Final Report, September 2004 [1]; Guidance for Industry: PAT, A Framework for Innovative Pharmaceutical Development, Manufacturing, and Quality Assurance [2]; Guidance for Industry: Quality Systems Approach to Pharmaceutical Current Good Manufacturing Practice Regulations [3]; the International Conference on Harmonisation (ICH) guidelines [4, 5, 6, 7]; QbD, a perspective from the Office of Biotechnology Products (OBP) [8]; and, lastly, the Guidance for Industry: Process Validation, General Principles and Practices, which defines three stages of process validation [9]. The agency's objective is to ensure that the most up-to-date concepts of risk management and quality systems approaches are incorporated into manufacturing.

The application of multivariate statistical models for process monitoring can provide information on the challenges that are routinely encountered by drug manufacturers, and the process can be monitored in real time to achieve continued process verification (CPV). The outcome of the study is intended to become a benchmark for biological manufacturers who are interested in applying PAT tools to existing legacy products or to any new manufacturing process to address challenges [10, 11] such as raw material variation and control of process variability, identification and monitoring of relevant process parameters in the operating space, and RT-MSPM with early fault detection and diagnosis of process upsets and trends. Principal Component Analysis (PCA) and Projection to Latent Structures (PLS) are the two popular techniques used to create the multivariate (MV) models.

The implementation of RT-MSPM assists in meeting the latest process validation guidance requirement to achieve continued process verification (CPV) by monitoring each and every batch in real time. With an RT-MSPM tool, every run can be considered a process validation run. If the process is monitored in real time, the sampling frequency can be reduced significantly, which can result in tremendous cost savings.

PAT is a system for designing, analyzing, and controlling manufacturing through timely measurements of critical quality and performance attributes of raw and in-process materials and processes, with the goal of ensuring final product quality.
The term analytical in PAT is viewed broadly and includes chemical, physical, microbial, mathematical, and risk analysis conducted in an integrated manner. The goal of PAT is to enhance understanding and control the manufacturing process. Quality cannot be tested into products; it should be built-in or should be by design. Consequently, the tools and principles described in this guidance should be used for gaining process understanding and can also be used to meet the regulatory requirements for validating and controlling the manufacturing process [2].
Using the approach of building quality into products, PAT guidance highlights the necessity for process understanding and opportunities for improving manufacturing efficiencies through innovation and enhanced scientific communication between manufacturers and the agency. Increased emphasis on building quality into products allows more focus to be placed on relevant multi-factorial relationships amongst material, manufacturing process, environmental variables, and their effects on quality. This enhanced focus provides a basis for identifying and understanding relationships among various critical formulation and process factors and for developing effective risk mitigation strategies (e.g., product specifications, process controls, training, etc.). The data and information to help understand these relationships can be leveraged through preformulation programs, development and scale-up studies, as well as from improved analysis of manufacturing data collected over the life of a product.
A desired goal of the PAT framework is to design and develop well understood processes that will consistently ensure a predefined quality at the end of the manufacturing process.
Such procedures would be consistent with the basic tenet of QbD and could reduce risks to quality and regulatory concerns while improving efficiency. Gains in quality, safety, and/or efficiency will vary depending on the process and the product, and are likely to come from reducing production cycle times by using on-line, in-line, and/or at-line measurements and controls; preventing rejects, scrap, and re-processing; real-time release; increasing automation to improve operator safety and reduce human errors; improving energy and material use and increasing capacity; and facilitating continuous processing to improve efficiency and manage variability.
This guidance facilitates innovation in development, manufacturing and quality assurance by focusing on process understanding. These concepts are applicable to all manufacturing situations.
Process Understanding [2,8]: A process is generally considered well understood when variability from batch to batch is explained, a good run can be distinguished from a bad run, and all the factors that can alter quality are accounted for and understood.
A focus on process understanding can reduce the burden for validating systems by providing more options for justifying and qualifying systems intended to monitor and control biological, physical, and/or chemical attributes of materials and processes.
Structured product and process development on a small scale, using experimental design and on-line or in-line process analyzers to collect data in real time, can provide increased insight and understanding for process development, optimization, scale-up, technology transfer, and control. Process understanding then continues in the production phase when other variables (e.g., environmental and supplier changes) may possibly be encountered.
Therefore, continuous learning over the life cycle of a product is important.
Real-time multivariate statistical process monitoring provides a means to proactively monitor this overall process variability. It builds the necessary foundation towards predictive monitoring, which is aligned with the regulatory agency expectation on risk management and continual process improvement post-commercialization [11,12].
Principles of PAT [12]: Pharmaceutical manufacturing processes often consist of a series of unit operations, each intended to modulate certain properties of the materials being processed. To ensure acceptable and reproducible modulation, consideration should be given to the quality attributes of incoming materials and their processability for each unit operation.
During the last three decades, significant progress has been made in developing analytical methods for chemical attributes (e.g., identity and purity). However, certain physical and mechanical attributes of pharmaceutical ingredients are not necessarily well understood.
Consequently, the inherent, undetected variability of raw materials may be manifested in the final product.
Establishing effective processes for managing physical attributes of raw and in-process materials requires a fundamental understanding of attributes that are critical to product quality. Such attributes may pose a significant challenge because of their complexities and difficulties related to collecting representative samples. Since formulation design strategies are not generalized, the quality of these formulations can be evaluated only by testing samples of in-process materials and end products.
Currently, these tests are performed off line after preparing collected samples for the analysis. Different tests are needed because they only address one attribute of the active ingredient following sample preparation (e.g., chemical separation to isolate it from other components). During sample preparation, other valuable information pertaining to the formulation matrix is often lost.
Several new technologies are now available that can acquire information on multiple attributes with minimal or no sample preparation. These technologies provide an opportunity to assess multiple attributes, often nondestructively.
Appropriate use of PAT tools and principles (described below) can provide relevant information relating to physical, chemical, and biological attributes. The process understanding gained from this information will enable process control and optimization, address the limitation of the time-defined end points discussed above, and improve efficiency.
Process Analytical Technology Tools [2] There are many new tools available that enable scientific, risk-managed pharmaceutical development, manufacture, and quality assurance. These tools, when used within a system can provide effective and efficient means for acquiring information to facilitate process understanding, develop risk-mitigation strategies, achieve continuous improvement, and share information and knowledge.
Producing a product consistently rests on four key areas of technology: multivariate data analysis, process analyzers, process automation/control and knowledge management.
When all of these ingredients are added to the mix, powerful solutions can be realized.
Typically, collecting information from sensors and instruments is not complicated. Servers are bursting with data about processes. However, getting the process engineer the information he or she needs requires intensive IT involvement.
Even more important is getting access to these data in real time to make decisions about quality.
In the PAT framework, these tools can be categorized as:
I. Multivariate (more than one variable) data acquisition and analysis
II. Modern process analyzers or process analytical chemistry tools
III. Process and endpoint monitoring and control tools
IV. Continuous improvement and knowledge management tools
An appropriate combination of some, or all, of these tools may be applicable to a single unit operation, or to an entire manufacturing process and its quality assurance.
Multivariate (more than one variable) data acquisition and analysis: From a physical, chemical, or biological perspective, pharmaceutical products and processes are complex, multi-factorial systems. There are many development strategies that can be used to identify optimal formulations and processes. The knowledge acquired in these development programs is the foundation for product and process design. Methodological experiments based on statistical principles of orthogonality, reference distribution, and randomization provide effective means for identifying and studying the effect and interaction of product and process variables. Traditional one-factor-at-a-time experiments do not address interactions among product and process variables.
When used appropriately, these tools enable the identification and evaluation of product and process variables that may be critical to product quality and performance. The tools may also identify potential failure modes and mechanisms and quantify their effects on product quality.
Modern process analyzers or process analytical chemistry tools: Process analysis has advanced significantly during the past several decades, due to an increasing appreciation for the value of collecting process data. Industrial drivers of productivity, quality, and environmental impact have supported major advancements in this area. Available tools have evolved from those that predominantly take univariate process measurements, such as pH, temperature, and pressure, to those that measure biological, chemical, and physical attributes. Indeed, some process analyzers provide nondestructive measurements that contain information related to biological, physical, and chemical attributes of the materials being processed. These measurements can be at-line, on-line, or in-line measurements.
Process analyzers typically generate large volumes of data. Certain data is likely to be relevant for routine quality assurance and regulatory decisions. In a PAT environment, batch records should include scientific and procedural information indicative of high process quality and product conformance. For example, batch records could include a series of charts depicting acceptance ranges, confidence intervals, and distribution plots (inter-and intra-batch) showing measurement results. Ease of secure access to these data is important for real time manufacturing control and quality assurance. Installed information technology systems should accommodate such functions.
Measurements collected from these process analyzers need not be absolute values of the attribute of interest. The ability to measure relative differences in materials before (e.g., within a lot, lot-to-lot, different suppliers) and during processing will provide useful information for process control. A flexible process may be designed to manage variability of the materials being processed. Such an approach can be established and justified when differences in quality attributes and other process information are used to control (e.g., feed-forward and/or feed-back) the process.
Advances in process analyzers have made real-time control and quality assurance during manufacturing feasible. However, multivariate methodologies are often necessary to extract critical process knowledge for real-time control and quality assurance.
Comprehensive statistical and risk analyses of the process are generally necessary to assess the reliability of predictive mathematical relationships. Based on the estimated risk, a simple correlation function may need further support or justification, such as a mechanistic explanation of causal links among the process, material measurements, and target quality specifications. For certain applications, sensor-based measurements can provide a useful process signature that may be related to the underlying process steps or transformations. Based on the level of process understanding, these signatures may also be useful for process monitoring, control, and end point determination when these patterns or signatures relate to product and process quality.
Design and construction of the process equipment, the analyzer, and their interfaces are critical to ensure that collected data are relevant and representative of process and product attributes. Robust design, reliability, and ease of operation are important considerations.
Installation of process analyzers on existing process equipment in production should be done after risk analysis to ensure this installation does not adversely affect process or product quality.
A review of current standard practices (e.g., ASTM International) for process analyzers can provide useful information and facilitate discussions with the Agency. A few examples of such standards are listed in the bibliography section. Additionally, standards forthcoming from the ASTM Technical Committee E55 may provide complementary information for implementing the PAT framework. We recommend that manufacturers developing a PAT process consider a scientific, risk-based approach relevant to the intended use of an analyzer for a specific process and its utility for understanding and controlling the process.
Process and endpoint monitoring and control tools: It is important to emphasize that a strong link between product design and process development is essential to ensure effective control of all critical quality attributes.
Process monitoring and control strategies are intended to monitor the state of a process and actively manipulate it to maintain a desired state. Strategies should accommodate the attributes of input materials, the ability and reliability of process analyzers to measure critical attributes, and the achievement of process end points to ensure consistent quality materials and the final product.
The design and optimization of drug formulations and manufacturing processes within the PAT framework can include steps such as identifying the critical attributes, measuring those attributes, and designing process controls to monitor and maintain them within the operating space.
Within the PAT framework, a process end point is not a fixed time; rather it is the achievement of the desired material attribute. This, however, does not mean that process time is not considered. A range of acceptable process times (process window) is likely to be achieved during the manufacturing phase and should be evaluated, and considerations for addressing significant deviations from acceptable process times should be developed.
Where PAT spans the entire manufacturing process, the fraction of in-process materials and final product evaluated during production could be substantially greater than what is currently achieved using laboratory testing. Opportunities need to be identified to improve the usefulness of available relevant product and process knowledge during regulatory decision making. A knowledge base can be of most benefit when it consists of scientific understanding of the relevant multifactorial relationships (e.g., between formulation, process, and quality attributes) as well as a means to evaluate the applicability of this knowledge in different scenarios (i.e., generalization). Today's information technology infrastructure makes the development and maintenance of this knowledge base practical.

Process Validation Guidance
A typical biologics manufacturing process starts with an inoculation phase and ends with a final product that is distributed to patients, as shown in Figure 1 [25]. The process involves several upstream unit operations, such as a series of cell culture bioreactors, centrifuges, and filtration steps, and downstream unit operations such as chromatography, ultrafiltration/diafiltration (UF/DF), viral inactivation, etc.
Traditional Process Validation Approach: Per 21 CFR Parts 210 and 211, and the Good Manufacturing Practice Regulations for Medical Devices, 21 CFR Part 820 [15], every pharmaceutical or biologics manufacturing organization has to go through a rigorous testing and qualification phase before it seeks approval for large-scale manufacturing of a drug substance or drug product. "Qualification" and "validation" are separate terms but are used interchangeably in the industry [16]. The FDA's definition of validation is "Establishing documented evidence that a process or system, when operated within established parameters, can perform effectively and reproducibly to produce a medicinal product that meets its pre-determined specifications and quality attributes" [17].
In other words, each piece of equipment used in the manufacturing facility needs to undergo Installation, Operational, and Performance Qualification to meet the guidance. Automated and computerized systems need to go through software validation [16] and Part 11 compliance for electronic records and electronic signatures [18] to ensure that the data inputs and outputs of these systems are as secure and trustworthy as paper records. This ensures that all critical equipment is installed correctly, operates within its operating ranges, and performs within the acceptance criteria.
Upon completion of the qualification activities mentioned above, process validation is performed. Process validation is a federal requirement; therefore, it is applicable to all manufacturers of pharmaceuticals and medical devices. Per the "Guideline on General Principles of Process Validation, May 1987," manufacturing processes needed to be validated. Assurance of product quality was derived from careful attention to a number of factors, including selection of quality parts and materials, adequate product and process design, control of the process, and in-process and end-product testing [17].
As stated in the old process validation guidance, manufacturers needed to perform confirmation runs, a.k.a. process validation runs, to prove that the process was capable of effectively meeting the key and critical process parameter acceptance criteria: the key and critical operating parameters were within their operating ranges, and the process was able to generate the product in a controlled manner. Analytical assays tested the incoming raw materials, in-process material, and finished product material to ensure that they met specifications. The guideline suggested establishing robust test protocols to specify "A sufficient number of replicate process runs to demonstrate reproducibility and provide an accurate measure of variability among successive runs" [17]. It did not specify exactly how many runs, so the manufacturing industry started performing three process validation runs, and three consecutive process validation runs soon became the standard industry practice [7]. In 2011, FDA issued a new process validation guidance; although this guidance does not repeat the concepts and principles explained in other guidance documents, FDA encourages the use of modern pharmaceutical development concepts, quality risk management, and quality systems at all stages of the manufacturing process lifecycle [9].
Per this new guidance, manufacturers are required to adopt the lifecycle approach by performing the process validation activities in three stages [9] after completing the equipment and facility qualification.
The first stage of the lifecycle approach covers the development phase, where product knowledge and process understanding are gained to establish the operating space.
Stage 1 is linked to the process qualification stage (stage 2) of process validation. Building on stages 1 and 2, the new expectation is to perform continued process verification (CPV) to ensure that the process remains in control and consistently makes quality product.
The three stages of process validation are outlined below:
I. In Stage 1, process design, the commercial process is defined based on knowledge gained through development and scale-up activities.
II. In Stage 2, process qualification, the process design is evaluated and assessed to determine if the process is capable of reproducible commercial manufacturing.
III. In Stage 3, continued process verification, ongoing assurance is gained during routine production that the process remains in a state of control.
Pre-Requisites of PAT Implementation [2]: In order to get PAT to a more practical and operational level, we can list a number of prerequisites. Infrastructure: Automated data acquisition systems, databases, networks, and synchronization procedures must be in place. The greatest hurdle involved in almost any analysis is the generation, integration, and organization of data. This is particularly true for the pharmaceutical industry, where data are often stored in vast warehouses but rarely, if ever, retrieved and used. Past regulatory environments did not provide incentives for analysis of manufacturing processes because implementing improvements required revalidation, and the current condition of pharmaceutical data infrastructures reflects this.
As a result, large efforts are required to assemble meaningful datasets. This challenge is further complicated because laboratory and production data are scattered across various systems. Multivariate characterization: Adequate and informative data must be measured on all steps and ingredients of the process.
Multivariate evaluation of all data: All data should be analyzed together. The data analysis should not focus on variable selection, should not be univariate in nature, and should not involve methods with many adjustable parameters, which are prone to overfitting.
The data analysis phase should entail simple, transparent, informative, and reversible projection models.
Data and information integration and communication: All data flows and data bases should be integrated onto one common platform. This facilitates use of data, visualization of data, and communication of results.
Design of Experiment (DOE): A suitable use of DOE combined with some of the steps above can augment the analysis and help ensure that critical system parameters are varied together simultaneously to obtain the optimum information from the experiments.
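To make the DOE idea concrete, the following is a minimal Python sketch that generates a two-level full-factorial design for three hypothetical bioreactor parameters so that factors are varied together rather than one at a time; the factor names and ranges are illustrative assumptions, not values from this study.

```python
# Minimal sketch: a two-level full-factorial design for three hypothetical
# bioreactor parameters, varying the factors together rather than
# one factor at a time. Values are illustrative only.
from itertools import product

# Hypothetical factor ranges (low, high).
factors = {
    "temperature_C": (35.0, 37.0),
    "pH": (6.8, 7.2),
    "agitation_rpm": (100, 150),
}

# Every combination of low/high levels: 2^3 = 8 experimental runs.
runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for i, run in enumerate(runs, start=1):
    print(f"Run {i}: {run}")
```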
Strategy for Implementation of PAT [2]: The Agency understands that to enable successful implementation of PAT, flexibility, coordination, and communication with manufacturers are critical. The Agency believes that current regulations are sufficiently broad to accommodate these strategies.
Regulations can effectively support innovation when clear, effective, and meaningful communication exists between the Agency and industry, for example, in the form of meetings or informal communications.
The first component of the PAT framework described above addresses many of the uncertainties with respect to innovation and outlines broad principles for addressing anticipated scientific and technical issues. This framework should assist a manufacturer in proposing and adopting innovative manufacturing and quality assurance. The Agency encourages such proposals and has developed a regulatory strategy to consider such proposals.
Ideally, PAT principles and tools should be introduced during the development phase.
The advantage of using these principles and tools during development is to create opportunities to improve the mechanistic basis for establishing regulatory specifications.
Manufacturers are encouraged to use the PAT framework to develop and discuss approaches for establishing mechanistic-based regulatory specifications for their products. The recommendations provided in this guidance are intended to alleviate concerns with approval or inspection when adopting the PAT framework.
In the course of implementing the PAT framework, manufacturers may want to evaluate the suitability of a PAT tool on experimental and/or production equipment and processes.
For example, when evaluating experimental on- or in-line process analyzers during production, it is recommended that a risk analysis of the impact on product quality be conducted before installation. This can be accomplished within the facility's quality system without prior notification to the Agency. Data collected using an experimental tool should be considered research data. If research is conducted in a production facility, it should be under the facility's own quality system.
When using new measurement tools, such as on- or in-line process analyzers, certain data trends, intrinsic to a currently acceptable process, may be observed. Manufacturers should scientifically evaluate these data to determine how or if such trends affect quality and the implementation of PAT tools. FDA does not intend to inspect research data collected on an existing product for the purpose of evaluating the suitability of an experimental process analyzer or other PAT tool. FDA's routine inspection of a firm's manufacturing process that incorporates a PAT tool for research purposes will be based on current regulatory standards (e.g., test results from currently approved or acceptable regulatory methods). Any FDA decision to inspect research data would be based on exceptional situations similar to those outlined in Compliance Policy Guide Sec. 130.300. Those data used to support validation or regulatory submissions will be subject to inspection in the usual manner.
Challenges of PAT Implementation [2,8]: For successful implementation of PAT, a combination of the following areas must be considered. Organizational support: One of the most important factors in ensuring the success of process analytical methods is strategic organizational support, which can afford to design, implement, and maintain the PAT systems. Because most PAT systems involve significant upfront costs and effort, they require management support.

Due to recent advancements in computer technology, we have the capability to collect large amounts of data, and this trend will continue to accelerate in the coming decades as technological development continues. Multivariate data analysis (MVDA) is becoming increasingly popular because ongoing data collection tends to overload computers and databases with data. It is therefore necessary to work on larger samples if full advantage is to be taken of all accessible information, and to derive as much information as possible from the diversity of the data rather than restricting attention to subsets of it. Multivariate data analysis techniques provide this opportunity and can be used to reveal information that would otherwise remain hidden; with MVDA we can extract much more information than univariate data analysis techniques applied to selected variables and observations. These data then need to be analyzed so that meaningful information can be extracted from them.

In a biological manufacturing process, a tremendous amount of data is generated by various sensors during each phase of the respective unit operation. If the manufacturing operation is equipped with high-tech data collection sensors, data can be generated for each phase at intervals ranging from a few seconds to several days. The in-process test results help to measure and ensure that the process is running under control. These data are stored in computer and server databases. In order to perform MVDA, it is important to understand the variability and complexity of the data and the type of data being analyzed. MVDA is well suited to dealing with variability in complex data, thereby reducing the risk of incorrect inferences.
However, all the data points are needed. One should not disregard data just because the variables are often collinear, either partially or completely; if part of the data is ignored, there is a substantial risk of overlooking important information.
In MVDA, most common and widely-used methods are PCA and PLS. These methods present the modeling results graphically and the observations and variables are easily available for diagnostics and interpretation. PCA and PLS methods are mainly popular because they can deal with the problems related to dimensionality, co-linearity, noise and missing data. These methods offer a number of diagnostic tools, which facilitate the identification of assignable causes.
PCA and PLS can be used to address three main types of data issues such as overview of data, classification & discrimination and regression modeling.
PCA is a way of identifying patterns in data and expressing the data in a way that highlights their similarities and differences. The main advantage of PCA is that once these patterns have been found, the data can be compressed, i.e., the number of dimensions can be reduced, without much loss of information.
The tasks required of the analyst to carry these out are as follows. Determining linear combinations of variables (dimensionality reduction): The data matrix X is projected into a lower-dimensional space. The PCA method provides an understanding of the relationships between the variables; these relationships are captured in a covariance matrix, whose properties are its eigenvalues and eigenvectors.
The relationships between the variables, and the length and direction of the PC vectors, are explained by the eigenvalues and eigenvectors. Eigenvectors are found for a square matrix; their direction is not affected by scaling, and they are orthogonal to each other. Eigenvalues are closely related to eigenvectors because they always come in pairs. It is important in PCA that each eigenvector be of unit length [21], meaning that the variance of the eigenvector is one. If the eigenvalue is zero, the variance of the projections on the associated eigenvector is zero; hence the eigenvector is reduced to a point. If this point is additionally the origin (i.e., the data are centered), then this allows linear combinations between the variables to be found. In fact, we can go a good deal further: by analyzing second-order variables defined from the given variables, quadratic dependencies can be straightforwardly sought. This means, for example, that in analyzing three variables, y1, y2, and y3, we would also input the variables y1², y2², y3², y1y2, y1y3, and y2y3. If the linear combination y1 = c1·y2² + c2·y1y2 exists, then we would find it. Similarly, we could feed in the logarithms or other functions of the variables.
Feature selection: the choosing of the most useful variables: In feature selection we want to simplify the task of characterizing each object by a set of attributes. Linear combinations among attributes must be found; highly correlated attributes (i.e., closely located attributes in the new space) allow some attributes to be removed from consideration, and the proximity of attributes to the new axes indicates the more relevant and important attributes. As stated earlier, the PCA method calculates the eigenvectors and eigenvalues from the relationship matrix. The eigenvector with the highest eigenvalue is the principal component of the data set. In general, once the eigenvectors are found from the covariance matrix, the next step is to order them by eigenvalue, highest to lowest; this gives the components in order of significance [21] and assists in choosing the most useful variables (a brief numerical sketch of these steps is given after the list of analyst tasks below).
Visualization of multidimensional data: In order to provide a convenient representation of multidimensional data, planar plots are necessary. An important consideration is the adequacy of the planar representation: the percentage variance explained by the pair of axes defining the plane must be looked at here.

Identification of underlying variables:
PCA is often motivated by the search for latent variables. Often it is relatively easy to label the highest or second highest components, but it becomes increasingly difficult as less relevant axes are examined. The objects with the highest loadings or projections on the axes (i.e. those which are placed towards the extremities of the axes) are usually worth examining: the axis may be characterisable as a spectrum running from a small number of objects with high positive loadings to those with high negative loadings.

Identification of groups of objects or of outliers:
A visual inspection of a planar plot indicates which objects are grouped together, thus indicating that they belong to the same family or result from the same process.
Anomalous objects can also be detected, and in some cases it might be of interest to redo the analysis with these excluded because of the perturbation they introduce.
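As an illustration of the covariance/eigenvector steps described above (mean centering, covariance matrix, eigen decomposition, and ordering of components by eigenvalue), the following is a minimal Python sketch on synthetic data; it is only an assumed, simplified analogue and not the SIMCA implementation used in this study.

```python
# Minimal sketch of PCA via the covariance matrix: mean-center, compute the
# covariance, take eigenvectors, and order them by eigenvalue (largest first).
# Synthetic data values only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))            # 50 observations x 5 process variables

Xc = X - X.mean(axis=0)                 # mean-centering
cov = np.cov(Xc, rowvar=False)          # 5 x 5 covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric matrix, ascending order
order = np.argsort(eigvals)[::-1]       # re-order, largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                   # projections (scores) on the PCs
explained = eigvals / eigvals.sum()     # fraction of variance per component
print(np.round(explained, 3))
```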
In this process, the principal components are derived as follows [23]: PCA modeling shows the correlation structure of the data matrix X, approximating it by a matrix product of lower dimension (TP′), called the principal components, plus a matrix of residuals (E). The PCA model is given by the following equation [12]:

X = TP′ + E

The PLS approach originated in 1975 with Herman Wold, who developed a simple way to estimate the parameters in the model, called NIPALS (Nonlinear Iterative Partial Least Squares); the resulting models were later called PLS models. In PLS, "partial" refers to a partial regression, since the parameter vector (the X variables) is considered fixed in the estimation. In 1980, PLS also began to be interpreted as "Projection to Latent Structures".
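The NIPALS idea mentioned above can be sketched as follows for PCA, extracting components one at a time so that X ≈ TP′ + E; this is a rough illustration on synthetic data under simplifying assumptions, not the algorithm as implemented in SIMCA.

```python
# Rough sketch of the NIPALS iteration for extracting principal components
# one at a time (X ~ T P' + E). Synthetic data; simplified illustration only.
import numpy as np

def nipals_pca(X, n_components=2, tol=1e-10, max_iter=500):
    X = X - X.mean(axis=0)                       # mean-center
    T = np.zeros((X.shape[0], n_components))     # scores
    P = np.zeros((X.shape[1], n_components))     # loadings
    E = X.copy()
    for a in range(n_components):
        t = E[:, [0]]                            # start from the first column
        for _ in range(max_iter):
            p = E.T @ t / (t.T @ t)              # loading estimate
            p = p / np.linalg.norm(p)            # unit-length loading vector
            t_new = E @ p                        # score estimate
            if np.linalg.norm(t_new - t) < tol:
                t = t_new
                break
            t = t_new
        T[:, [a]], P[:, [a]] = t, p
        E = E - t @ p.T                          # deflate: remove explained part
    return T, P, E                               # centered X ~ T @ P.T + E

rng = np.random.default_rng(1)
T, P, E = nipals_pca(rng.normal(size=(30, 6)), n_components=2)
print(T.shape, P.shape)
```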
PLS is a similar technique which also reduces the dimensionality of the input space X, however, it does this while finding the best regression fit against a response variable Y.
The PLS method utilizes regression modelling between two data matrices, usually denoted by X and Y, with the aim of predicting Y from X for new observations. This is achieved by linear multivariate modelling. In PLS modelling, the aim is to predict complex response or output variables (Y) based on the input variables (X). The precision of a PLS model increases with an increasing number of X variables.
In process modelling, the PLS method finds the relationship between the input (X) variables and the output (Y) variables. The basic idea of PLS is quite straightforward: First, the weight relations, linking the indicators to their respective unobservable variables, are estimated.
Second, case values for each unobservable variable are calculated, based on a weighted average of its indicators, using the weight relations as an input.
Finally, these case values are used in a set of regression equations to determine the parameters for the structural relations.
This explanation makes it obvious that the most crucial part of a PLS analysis is the estimation of the weight relations. Of course, it would be easier simply to assume equal weights for all indicators, but this approach has two disadvantages: First, there is no theoretical rationale for all indicators to have the same weighting.
Because it can be assumed that the resulting parameter estimates of the structural model depend on the type of weighting used, at least as long as the number of indicators is not excessively large, the (exogenous) assumption of equal weights makes the results highly arbitrary. Second, such a procedure does not take into account the fact that some indicators may be more reliable than others and should, therefore, receive higher weights.
Consequently, PLS uses a more complex, two-step estimation process to determine the weights (w): First, it starts with an outside approximation, in which case values for each latent variable are estimated, based on a weighted average of their respective indicators.
The weights used to calculate this aggregation are determined in a manner similar to a principal-components analysis for reflective indicators or a regression analysis for formative indicators. In the next step, the inside approximation, improved case values are determined as a weighted average of neighboring latent variables. For this process, there are three different weighting schemes available, but one can demonstrate that the choice between them has only a minor impact on the final results. Using this second estimate of the case values, the weight relations are modified, and the process of inside and outside approximation starts from the beginning again and is repeated until convergence of the case values is achieved.
Hence, being a limited-information approach, PLS has the advantage that it "involves no assumptions about the population or scale of measurement" and consequently works without distributional assumptions and with nominal, ordinal, and interval scaled variables. However, one has to bear in mind that PLS, like any statistical technique, also requires certain assumptions to be fulfilled. Beyond those known from the standard regression model, the most important assumption is predictor specification. This requirement states that the systematic part of the linear regression must be equal to the conditional expectation of the dependent variable, and it can be considered fulfilled in most cases. It has been shown that PLS is quite robust with regard to several inadequacies (e.g., skewness or multicollinearity of the indicators, misspecification of the structural model) and that the latent variable scores always conform to the true values.
However, there is also another side of the coin, namely, the problem of consistency at large. In general, a consistent estimator can be described as "one that converges in probability to the value of the parameter being estimated as the sample size increases".
However, because the case values for the latent variables in PLS are aggregates of manifest variables that involve measurement error, they must be considered as inconsistent.
PLS modeling therefore consists of simultaneous projections of both the X and Y spaces. The coordinates of the points in the X and Y projections constitute the elements of the score matrices T and U, the loading matrices P′ and C′, and the residual matrices E and F:

X = TP′ + E
Y = UC′ + F

The objective here [12] is to approximate the X and Y spaces well and to maximize the correlation between X and Y. The batch-level model is used to predict the final performance variable using the T scores of the X matrix [12]; T_pred,k and Y_pred,k denote the estimated scores and the predicted quality/performance attributes at time slice k in a given batch.
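As a simple illustration of PLS regression of Y on X (fit on reference batches, then applied to a new observation to obtain predicted scores and a predicted quality attribute), the sketch below uses scikit-learn's PLSRegression on synthetic data; the study itself used SIMCA, so this is only an assumed, simplified analogue.

```python
# Hedged sketch: fit a PLS model predicting an output Y (e.g., a step yield)
# from process variables X, then apply it to a "new batch" observation.
# Synthetic data only; not the SIMCA models used in the study.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(15, 8))                    # 15 reference batches x 8 inputs
Y = X @ rng.normal(size=(8, 1)) + 0.1 * rng.normal(size=(15, 1))  # output

pls = PLSRegression(n_components=2, scale=True)
pls.fit(X, Y)

x_new = rng.normal(size=(1, 8))                 # a new observation
t_pred = pls.transform(x_new)                   # estimated scores (T_pred)
y_pred = pls.predict(x_new)                     # predicted quality attribute (Y_pred)
print(t_pred, y_pred)
```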

Pre-Treatment of Data and Scaling Techniques
Like any other statistical application, PCA requires the data to be pre-processed prior to use. The variables often have different numerical ranges: a variable with a large range has a large variance, and a variable with a small range has a small variance. Since PCA is a maximum-variance projection method, a variable with a large variance is more likely to be expressed in the model than a low-variance variable. In order to give equal weight to all the variables, the data need to be standardized; this process is called scaling. A combination of scaling techniques can also be used, as shown in Figure 2 [26]. The following are ways of scaling the data. Mean centering: for PCA to work properly, the mean must be subtracted from each of the data dimensions; the mean subtracted is the average across each dimension, which produces a data set whose mean is zero. Unit variance scaling additionally divides each mean-centered variable by its standard deviation so that every variable has a variance of one.
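A minimal sketch of mean centering and unit variance scaling is shown below, assuming two hypothetical variables with very different ranges (the values are made up for illustration and echo the DO2 and air flow example used later in the methods).

```python
# Minimal sketch of mean-centering and unit-variance scaling so that a
# wide-ranging variable (e.g., DO2 around 30-90) and a narrow one (e.g., air
# flow around 0-1) carry comparable weight in a maximum-variance model.
import numpy as np

do2      = np.array([35.0, 60.0, 85.0, 50.0, 70.0])   # wide numerical range
air_flow = np.array([0.2, 0.5, 0.9, 0.4, 0.7])        # narrow numerical range
X = np.column_stack([do2, air_flow])

X_centered = X - X.mean(axis=0)                  # mean-centering only
X_uv = X_centered / X.std(axis=0, ddof=1)        # unit-variance (autoscaling)

print(X_uv.std(axis=0, ddof=1))                  # both columns now ~1.0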
Unfolding of three-dimensional data into two-dimensional data [12]: Using the SIMCA software, two levels of batch monitoring can be performed: batch evolution level monitoring (BEM) and batch level monitoring (BLM).
BEM Modelling: The aim of batch evolution level monitoring is to develop a model of the good batches and to monitor new batches against this model as they evolve, to find out if they are evolving within the confidence limits. The data generated by a batch process for a biological manufacturing process are arranged in data blocks as shown in Figures 3a and 3b [30].
The batches are depicted as "I", the variables as "J", and the time points as "K". In order to do the observation-level (BEM) modeling, the three-way batch data table must be unfolded in such a way that the direction of the variables is preserved, as shown in Figure 3a [30]; the resulting two-way matrix then has I*K rows and J columns. In batch level modelling, all the data from the input variable matrix (X) and the output variable matrix (Y) are available; therefore, the data from the whole batch are used to create a model. The aim of the whole-batch model is to verify whether a new whole batch is a good batch or a bad batch. The data generated by a batch process for a biological manufacturing process are unfolded in such a way that the direction of the batch is preserved; the resulting two-way matrix then has I rows and J*K columns, as shown in Figure 3b [30]. The number of model components is chosen to obtain an optimal balance between the goodness of fit and the predictive ability of the model. The goodness of fit is given by the parameter R² (explained variation), and the goodness of prediction is given by Q² (predicted variation). Usually, R² and Q² vary differently as the complexity of the model increases; therefore, the selection of the number of components is based on the trade-off between goodness of fit and goodness of prediction, as shown in Figure 4 [12].
There is another way of selecting the number of components: the eigenvalue of each component can be plotted against the component number, as shown in Figure 5 [31]. The eigenvalues represent the variation explained by the respective components.
Any component whose eigenvalue is less than 1.0 is in most cases eliminated because it reflects low, negligible variance [23].
Setting the Control Limits. Hotelling's T² Plot: These charts are also used for process deviation detection. They detect deviations that are explained by the process model (if DModX is in control) and are within the overall variability, but that represent unusually high variation compared to the average process behavior.
The Hotelling's T² for observation i, based on A components, is calculated using the following formula [12]:

T²_i = Σ (a = 1 to A) t²_ia / s²_ta

where s²_ta is the variance of t_a according to the class model. Hence, if T²_i exceeds the critical T² value at the 95% level, then observation i is outside the 95% confidence region of the model.

Definitions

Chemometrics [24]: A way of analyzing chemical data in which elements of both statistical and chemical thinking are combined.

Continuous Process Verification [9]: An alternative approach to process validation in which manufacturing process performance is continuously monitored and evaluated.
Design Space [4]: The multidimensional combination and interaction of input variables (e.g., material attributes) and process parameters that have been demonstrated to provide assurance of quality. Working within the design space is not considered a change. Movement out of the design space is considered to be a change and would normally initiate a regulatory post-approval change process. Design space is proposed by the applicant and is subject to regulatory assessment and approval.

DModX Plot [12]: The statistic showing the distance of an observation to the MV model plane.

Hotelling's T² Plot [12]: The statistic summarizing the selected scores. It is a measure of how far away an observation is from the center of the MV model.

QbD [8,10]: A strategic approach to drug development; Quality by Design requires getting the product, process, packaging, and manufacturing "right the first time."

Quality [33]: Per ISO, the "degree to which a set of inherent characteristics fulfills requirements."

Loading Plot [12]: A summary of the variables for the observations (batches). It is a means to interpret the patterns seen in the score plot.
Maturity Variable [12]: The variable indicating the evolution of a batch. It is used to understand how far the batch has evolved compared to the historical batches.

PAT [2]: A system for designing, analyzing, and controlling manufacturing through timely measurements (i.e., during processing) of critical quality and performance attributes of raw and in-process materials and processes, with the goal of ensuring final product quality.

Process Analytics [2]: Chemical or physical analysis of material in the process through the use of an in-line or on-line analyzer.

Process Validation [17]: Establishing by objective evidence that a process consistently produces a result or product meeting its predetermined specifications.
Score Plot [12]: A summary of the observations (batches).

These guidelines, the most recent being ICH Q11 in 2012 [7], are continuously setting new industry trends as well as continuing to raise expectations. This study was focused on the use of a multivariate statistical data analysis tool for real-time process monitoring and its validation test cases to support good manufacturing practice (GMP) decisions. It also discusses case studies to demonstrate how a batch can be monitored using multivariate analysis.

Each unit operation is comprised of multiple phases, and each phase is operated by multiple process parameters or variables. These parameters are categorized into input and output parameters. The input parameters are evaluated in the operating space and characterization studies and are maintained within the known operating ranges to achieve the desired output. The output parameters (a.k.a. performance parameters) have pre-set acceptance criteria to ensure that the process delivers consistent results every time.
For every biological process batch, there are many process variables measured during the course of production. It is important to make sure that each variable is operating within its operating range to ensure process performance consistency and product quality.
The suggested PAT framework uses a combination of the following PAT tools:
I. Multivariate (more than one variable) data acquisition and analysis
II. Process and endpoint monitoring and control tools
III. Continuous improvement and knowledge management tools
The multivariate statistical process monitoring system efficiently monitors many variables at the same time by utilizing multivariate charts. The system also explains how these variables are changing in correlation with performance variables.
The goal of this study was to demonstrate how the above-stated PAT tools can be utilized to ensure continued process verification, as outlined in FDA's 2011 process validation guidelines [6], for making critical manufacturing decisions in real time.
In order to collect data for the multivariate data analysis (MVDA), modern process analyzers such as pH, temperature, agitation, dissolved O2, CO2, and cell density probes must be installed so that the process information can be gathered at regular intervals.
Various software applications store and maintain this process data into databases enabling extraction of meaningful and critical process information from this data.
This study focused on the use of multivariate analysis tool as outlined in PAT framework by using SIMCA software to create multivariate models for real-time process monitoring.
The data collection and data mining process is a critical step which required installation of multiple software interfaces for linking the software databases, database modifications, and creation of trigger tags, timers, batch tags and monitoring markers. The three dimensional data extracted from the databases must be unfolded and saved in a specific format so that it can be used by the SIMCA software for the creation of batch evolution and batch level models.
Various MV models were generated using successful performance batches from the historical databases. Two popular and commonly used MVDA methods, principal component analysis (PCA) and partial least squares (PLS), were employed to demonstrate the use of the PAT tool [2,9,10]. New batches were then tracked against these models to ensure process consistency and to detect deviations or process failures in real time.
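The sketch below illustrates, under simplifying assumptions, how a new observation could be checked against a reference PCA model using Hotelling's T² and a 95% limit; the variable count, the synthetic data, and the SVD-based PCA are illustrative choices and do not reproduce the SIMCA models used in the study.

```python
# Rough sketch of tracking a new observation against a PCA model built on
# reference (good) batches: project it onto the model and compute Hotelling's
# T-squared from the scores, flagging it if it exceeds the 95% limit.
# Synthetic data; the study performed this in real time with SIMCA.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
X_ref = rng.normal(size=(15, 6))                 # 15 reference batches x 6 vars

# Reference model: scaling parameters plus principal components via SVD.
mu, sd = X_ref.mean(axis=0), X_ref.std(axis=0, ddof=1)
Z = (X_ref - mu) / sd
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
A = 2                                            # number of components retained
P = Vt[:A].T                                     # loadings (6 x A)
T_ref = Z @ P                                    # reference scores
s2 = T_ref.var(axis=0, ddof=1)                   # score variances

# 95% T-squared limit for a new observation, based on the F-distribution.
N = X_ref.shape[0]
t2_crit = A * (N - 1) * (N + 1) / (N * (N - A)) * stats.f.ppf(0.95, A, N - A)

def hotelling_t2(x_new):
    t = ((x_new - mu) / sd) @ P                  # project the new observation
    return np.sum(t**2 / s2)                     # sum of t_a^2 / s^2_ta

x_new = rng.normal(size=6)
t2 = hotelling_t2(x_new)
print(f"T2 = {t2:.2f}, limit = {t2_crit:.2f}, {'OK' if t2 <= t2_crit else 'alarm'}")
```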
This study proved that PAT tools can be used to achieve continued process verification, which meets the lifecycle approach in FDA's process validation guideline [6] and the adoption of the new mantra that "Process validation should not be viewed as a one-off event. A lifecycle approach should be applied linking product and process development, validation of the commercial manufacturing process and maintenance of the process in a state of control during routine commercial production" [15]. This software was used for design of experiments and multivariate data analysis.
This tool transforms data into information, which can be seen in the form of color-coded graphical control charts that enable the process analyst to make correct decisions in real time.
Historical data from consistent successful process batches (batches which have minimum deviations) were used from a biological manufacturing process. The data was extracted from the historical databases by making configurations, tags, scan rates, and compression settings to the source system. This data was pre-treated and organized in the appropriate format prior to importing into the SIMCA software for the creation of multivariate models.

METHODS
This study was conducted by using the data from a commercial biological manufacturing facility. The commercial biological manufacturing process was enabled with various modern process analyzers and was equipped with both a distributed control system (Delta-V) system and a plant data historian to collect the process data. The MVDA tool was linked to the process databases to acquire the process data. Modifications were made to the existing databases for the appropriate collection of data from the unit operations.
The case studies presented were focused on one upstream unit operation (i.e., bioreactor) and one downstream unit operation (i.e. UF/DF).
The data set used for the bioreactor and UF/DF unit operations, to develop empirical models for multivariate monitoring purposes, were gathered from one of the existing products. The data were modified (normalized for propriety reasons) as necessary prior to using it for the creation of MV models in SIMCA.
The bioreactor and UF/DF unit operations were monitored using the parameters listed in Table 1 and Table 2. These unit operations were connected to the DeltaV system to collect the process data. The process database was connected to both DeltaV and a plant data historian and the historical and current batch data was saved continuously. The plant data historian was configured with the correct tags to enable advanced monitoring [11].
Along with the online data collection, cell viability and viable cell density data were also collected from off-line measurements at regular intervals to check the process performance. The validation of the real-time process monitoring system was performed in accordance with Title 21 [12], the Software Validation Guidance [13], and the Part 11 Guidance [14]. Viability is monitored as a measure of cell culture performance. For the UF/DF unit operation, following concentration and diafiltration, the product pool is recovered by filtering it through a 0.2 µm membrane filter; the retentate control valve is monitored as a controller output, and step yield (%) is monitored as a performance output, i.e., as a measure of UF/DF performance (see Tables 1 and 2).
The data obtained from the above variables were pre-processed prior to use in the creation of the MV model. Multivariate methods are maximum-variance projection methods; a variable with a large variance is more likely to be expressed in the model than a low-variance variable. In order to give equal weight to all the variables, the data required standardization, and unit variance scaling [16] was chosen for these case studies. For example, DO2 values for the bioreactor vary over a range of roughly 30 to 90, whereas air flow values vary from 0 to 1. Without scaling, the DO2 variable would have a very high variance and a disproportionate impact on the model compared to air flow; an MV model built without data scaling may therefore not be accurate.
Two levels of batch monitoring were employed; the BEM and BLM were implemented by unfolding the three-way matrix into a two-way matrix as shown in Figures 1a and 2b [17].
BEM: The goal of BEM was to develop a model of the desired batches and monitor new batches against this model to determine if they were evolving within the confidence limits. The data generated by a batch process were arranged in data blocks as shown in Figures 1a and 2b [17]. The batches were depicted as "I", the variables as "J", and the time points as "K". In order to execute BEM, the three-way batch data table was unfolded in such a way that the direction of the variables was preserved. This gave a two-way matrix with I*K rows and J columns, in which each row contained the data points X_ijk from a single batch evolution. Figure 1b: the three-way table of historical batch process data comprises I batches, J variables, and K time points; in the BLM, this three-way data table was unfolded by preserving the batch direction.
This gave a two-way matrix with I rows and J*K columns. Each row contained data points from one single batch.
BLM: In batch level modeling, all the data from the input variable matrix (X) and the output variable matrix (Y) are available; therefore, the data from the whole batch are used to create a model. The aim of the batch level model is to verify whether a new batch is within multivariate control. The data generated are unfolded in such a way that the direction of the batch is preserved; the resulting two-way matrix then has I rows and J*K columns, as shown in Figure 1b [17]. Another important objective of the batch level model is to understand how the Y (output) variable is influenced by the X (input) variables [10]. The general expectation is that at least 75% to 85% of the variation should be accounted for by a good model [10], because the scores of the BEM model are used at the batch level to predict the output variable.
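The two unfolding schemes can be sketched as simple array reshaping; the sizes below (15 batches, 8 variables, 100 time points) are placeholders, not the actual dimensions of the study data.

```python
# Minimal sketch of the two unfolding schemes for a three-way batch data
# array with I batches, J variables and K time points. Placeholder sizes.
import numpy as np

I, J, K = 15, 8, 100                       # batches, variables, time points
data = np.random.default_rng(4).normal(size=(I, K, J))

# Batch evolution (BEM) unfolding: preserve the variable direction ->
# (I*K) rows x J columns, one row per time point of a batch.
bem = data.reshape(I * K, J)

# Batch level (BLM) unfolding: preserve the batch direction ->
# I rows x (J*K) columns, one row per whole batch.
blm = data.transpose(0, 2, 1).reshape(I, J * K)

print(bem.shape, blm.shape)                # (1500, 8) (15, 800)
```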
Scatter Plot t1 vs. t2 (Bioreactor): The score plot is a map of the observations. The scores batch control chart displays the selected score values (t1) over time for all fifteen batches; the chart depicts the average batch (green) and the ±3 standard deviation limits (red). Figures 4 and 5 show the score contributions for scores t1 and t2. The t1 score plot demonstrates that all the batches start with low scores and then increase steadily until termination, whereas for the t2 scores all of the batches move steadily and end in the same way.
All of the fifteen reference batches remain within the ±3 standard deviation limits and close to the average for both the t1 and t2 scores. The Hotelling T2 chart in Figure 6 demonstrates that the combined contribution of all the variables stays within the T2 limit in the score space. The score plot is created using the scores of the first two principal components.
The vertical axis depicts the t2 scores and the horizontal axis depicts the t1 scores. The score plot shows that all the batches are aligned properly and fall within the 95% confidence ellipse.
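For readers who want to reproduce the idea of the T² chart and the 95% score ellipse outside SIMCA, the following sketch computes per-observation Hotelling T² values from the first two principal components; the F-distribution limit shown is one common textbook form (the limit SIMCA applies may differ slightly), and the data are random placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.stats import f as f_dist

def hotelling_t2(scores):
    """Hotelling T^2 per observation from PCA scores (one column per component)."""
    var = scores.var(axis=0, ddof=1)
    return np.sum(scores**2 / var, axis=1)

def t2_limit(n_obs, n_comp, alpha=0.05):
    """One common F-distribution based T^2 limit (SIMCA's exact form may differ)."""
    a, n = n_comp, n_obs
    return a * (n - 1) * (n + 1) / (n * (n - a)) * f_dist.ppf(1 - alpha, a, n - a)

# Stand-in for the unfolded, unit-variance-scaled reference data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 14))

scores = PCA(n_components=2).fit_transform(X)   # t1 and t2
t2 = hotelling_t2(scores)
limit = t2_limit(n_obs=X.shape[0], n_comp=2)

print(f"{np.mean(t2 < limit):.0%} of reference observations inside the 95% limit")
```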

Loadings Plot W*C[1] vs. W*C[2]:
The loading plot in Figure 11 reveals that the response maturity variable ($Time) is positively correlated with the retentate flow process totalizer, the permeate flow process totalizer, the feed flow and the concentration factor; these variables increase steadily with time. All other variables remain in a reasonably steady state and are correlated with each other, except for the retentate pressure, which is negatively correlated because it shows a decreasing trend with time. The scores batch control chart displays the selected score value (t1) over time for all fifteen batches, along with the average batch (green) and the ±3 standard deviation limits (red). Figure 12 shows the score contribution for the t1 scores. The t1 score plots demonstrate that all batches start with low scores and then increase steadily until termination; all fifteen reference batches stay within the ±3 standard deviation limits and around the average for the first principal component, ending in a similar fashion. BLM Model: In batch level modelling, the entire batch data from the input variables and the output variable are used to create the PLS model. In order to accomplish the BLM, the three-way batch data table is unfolded as shown in Figure 2b [17] and arranged as explained in the bioreactor BLM section. In the UF/DF case study, the output variable is step yield. The reference PLS model of the X data matrix versus the Y data matrix, created from the fifteen batches, can then be used to classify new batches as they evolve and to demonstrate how the Y variable is influenced by the X variables [10]. The UF/DF batch level PLS model shown in Figure 13, using the first two components, demonstrates that all the batches are evenly scattered and lie within the 95% confidence ellipse. The diagnostic charts, Hotelling's T2 and DModX (Figures 14 and 15), are well within the confidence limits for all fifteen batches. The score plot in Figure 16 shows that batch# 1016 (dark red) was a good batch because it was well within the 95% confidence limit.
The batch control score plot for batch# 1016 in Figure 17 shows that the batch stayed within ±3 standard deviations; therefore, it was a good batch. Figure 18 and the batch control score plot in Figure 19 clearly show that batch# 1017 was outside the model space. The variable batch plots for culture pH in Figure 20 and for temperature in Figure 21 identify the time points at which the batch was out of the confidence limits. For the UF/DF model, the score plot shows that batch# 116 was a good batch because it was well within the 95% confidence limit.
The batch control plot for batch# 116 in Figure 17 shows that the batch was within the confidence limits; therefore, it is considered a good batch. In contrast, the variable batch plots for feed pressure in Figure 27 and for feed flow in Figure 28 show the exact time points at which these two variables were outside of the ±3 standard deviation confidence limits for the deviating batch.

VALIDATION OF MULTIVARIATE STATISTICAL MONITORING SYSTEM
An existing commercial manufacturing facility can be PAT enabled by installing state-of-the-art on-line sensors, a data management system, and data-analyzing computer hardware and software systems. The vast amount of data, collected at intervals ranging from ten seconds to fifteen minutes across the different unit operations, is saved in databases.
The entire system consists of multiple data servers, network components, software interfaces, and software applications for multivariate analysis. In this study, the SIMCA software was used for the multivariate model creation and was connected to the network via several related software interfaces. Altogether, this system constitutes an automated computerized system. As per FDA's software validation guidance [13], prior to using any computerized system in a cGMP environment, it must be qualified and validated. The cGMP guideline outlines that "Any software used to automate any part of the device production process or any part of the quality system must be validated for its intended use, as required by 21 CFR §820.70(i)". Validation is necessary to establish documented evidence to provide a high degree of assurance that the system will consistently operate according to pre-defined requirements and design specifications [12].
The SIMCA software is Part 11 compliant. In order to utilize it in the cGMP environment, it must undergo validation per the Computer Validation Guidelines [13] and the Part 11 guidelines [14]. For the validation of a computerized system, several documents are generated, executed and approved by the appropriate stakeholders in the GMP facility.
All validation related documents and the test cases for the validation of real-time process monitoring system are generated per software validation and Part 11 guidance.
The following section outlines the documents and test cases generated for the validation of the real-time process monitoring system.

Validation Plan
The purpose of the Validation Plan (VP) was to define the overall validation approach and the testing strategy. There are two basic classes of software testing: black box testing and white box testing [20]. Black box testing (also called functional testing) ignores the internal mechanism of a system or component and focuses solely on the outputs generated in response to selected inputs and execution conditions.
 White box testing (also called structural testing and glass box testing) is testing that takes into account the internal mechanism of a system or component.
A combination of black box and white box testing methods was used to test the real-time process monitoring system. Structural testing: Features are qualified by testing individual components as specified in the RD (Requirements and/or Design document). Testing ensures that each requirement stated in the RD is exercised during testing and performs its intended function.
Functional testing: All the hardware used for this system is standard hardware, which is subject to IQ to verify its installation and connection to components. Category 3 software is subjected to the validation process to ensure it meets the requirement and design specifications. Testing ignores the internal mechanism or structure of a system or component and focuses on the outputs generated in response to selected inputs and execution conditions.
In March of 1997, FDA issued final part 11 regulations that provide criteria for acceptance by FDA, under certain circumstances, of electronic records, electronic signatures, and handwritten signatures executed to electronic records as equivalent to paper records and handwritten signatures executed on paper [1].
SIMCA is compliant with 21 CFR Part 11 (Electronic Records). "Umetrics quality systems for software development and validation can be audited". The audit trail is administrator-controlled and check-sum protected, and SIMCA-4000 is OPC certified by the OPC Foundation [21]. Therefore, the test cases related to electronic records were performed to ensure Part 11 compliance.

Validation Limitations and Assumptions:
All test cases assumed that the components, systems and services of the servers were operating as expected. The test cases also assumed that the complex calculations performed during the generation of multivariate models in the SIMCA software were correct and accurate, because SIMCA is COTS (commercial off-the-shelf) software; therefore, the multivariate statistical calculations were not verified. Testing was conducted to verify the data flow from the historical databases to the SIMCA software for MV model creation for one upstream unit operation (bioreactor) and one downstream unit operation (UF/DF), and the same data flow was assumed to work in exactly the same way for every other unit operation.

Requirement and Design Specifications [13]
Every computerized system and software application is developed or designed based on its intended use. While designing the system, the developers must know the specific requirements of the user (the User Requirement Specifications, URS). The designed system must also provide certain inherent capabilities for it to function and satisfy the user specifications (the Functional Requirement Specifications, FRS). In order to validate the computerized system and software in the cGMP environment, these URSs and FRSs were required to be tested.
There can be many different kinds of requirements (e.g., design, functional, implementation, interface, performance, or physical). Software requirements are typically derived from the system requirements for those aspects of system functionality that have been allocated to software. Success in accurately and completely documenting software requirements is a crucial factor in successful validation of the resulting software. There are also many different kinds of written specifications (e.g., system requirements specification, software requirements specification, software design specification, software test specification, software integration specification, etc.). All of these documents establish "specified requirements" and are design outputs for which various forms of verification are necessary [13].
The IQ, OQ and PQ documents were written per the applicable guidance [23] with specific test cases as listed in Table 3, and, as outlined in Figure 33 [25], each test script in the validation protocol was executed to ensure that it met the expected results. Representative test cases include:
- SOP verification: ensure that all the SOPs required for the operation of the GMP system are created.
- Test case 8, Start-up and shutdown verification (OQ): ensure that each system component (servers, networked PCs, network devices) is connected properly and reboots cleanly after a power outage or a routine start-up and shutdown.
- Test case 9, Logical security for the operating system (OQ): ensure that the servers and PCs used to operate the system have restricted access so that unauthorized users cannot modify or delete the secured data folders/files or modify the recording time.
- Test case 10, Logical security for the software (OQ): ensure that the SIMCA software has restricted access so that unauthorized users cannot create new folders/files or modify existing folders/files.
- Test case 11, Password security (OQ): ensure that users can create unique passwords of a specified length with alpha-numeric combinations, that only a limited number of log-in attempts is allowed, and that only the application or IS administrator can add users or reset passwords.
- Clock synchronization (OQ): ensure that all clocks on the servers and PCs are synchronized so that there are no time-stamp errors during data transfer.
- Test case 18, Backup and restore verification (OQ): ensure that all the data can be backed up and restored in case of disaster.
- Test case 19, Alert and action limit verification (OQ): ensure that the alarm limits are configured correctly and show the appropriate alarm conditions.
- Test case 20, Audit trail verification (OQ): ensure that the MVDA system has the audit trail enabled, that the audit trail is human readable, and that its entries cannot be overwritten.
- Test case 21, End-to-end performance verification (PQ): ensure that the system meets the performance specification over a period of time.
Performance Qualification (PQ) [12,13,22]: The performance verification of the real-time process monitoring system was performed after the IQ and OQ testing was complete. The intent of the PQ was to ensure that the system performed according to expectations and was able to monitor the process in real time. The test scripts were written to test one unit operation from upstream and one unit operation from downstream from start to end as shown below.
Requirement Traceability Matrix (RTM) [13,19]: The RTM was generated to map the functional testing of the real-time process monitoring system in validation documents (IQ/OQ/PQ) to the corresponding RS and DS specifications. This mapping helped to ensure that the requirements were met and traced to the appropriate qualification document(s). All requirements were verified and were traced to the test activity to prove that each requirement had been met.
Validation Summary Report (VSR) [13,19]: The VSR summarized the deliverables, validation activities, test results and deviations encountered during validation of the system. This document was generated at the end of the validation campaign to conclude that the real-time process monitoring system was validated and is suitable for use in the GMP environment.

Summary
The PAT-enabled facility can generate data at the desired intervals. When these technologies are combined with multivariate statistical methods, the data can be analyzed to give meaningful information. Upon validation, the entire system can be used for real-time process monitoring to meet FDA's CPV requirements.

BUSINESS BENEFITS
In an ideal situation, complete implementation of the lifecycle approach using QbD and PAT tools [2] can offer several tangible and intangible benefits to biopharmaceutical manufacturers. The benefits of this system range from detecting raw material and equipment related process variability to real-time lot release, as outlined below. Real-time process monitoring at every manufacturing step results in tremendous benefits to manufacturers and regulatory agencies throughout the product lifecycle [10,24].
Each of the benefits listed below is associated with significant financial savings; ultimately, the objective is cost savings and financial gains over the product lifecycle along with meeting the regulatory expectations.
Operating Space: Leveraging scientific understanding and process knowledge helps keep the process within the established operating space. Consistent Product Quality: With a real-time process monitoring tool, every batch can be shown to consistently meet the quality requirements, which helps establish assurance and confidence with the regulatory agency and patients.
Real-time Release: The real-time process monitoring tool can assist in maintaining the patient supply and managing the inventory. Consistent product quality with minimal variability and higher yield results in a higher return on investments.
In the traditional approach, set points and operating ranges for process parameters are defined, and the control strategy is based on the demonstration of process reproducibility and testing to meet the established acceptance criteria. There are certainly flaws in the traditional approach which need to be addressed with an enhanced approach backed by risk management studies, scientific knowledge, and process understanding.
The latest guidelines, such as the PAT framework [2], FDA's 2011 process validation guideline [6] and the Q11 guideline for development and manufacture of drug substances [7], are eliciting the same message: innovative technologies can be used in drug manufacturing processes.
In this study, the use of one of the PAT tools for process monitoring showed how a state of control is achieved and how process failures such as batch discrepancies or sensor malfunctions can be detected. The study was conducted using data from an existing biologics manufacturing process to demonstrate the industrial application of the tool, and it outlined the validation of the process monitoring system to show that the tool can be used in the GMP environment. Even though adopting this tool requires an initial investment, it can be applied easily with appropriate management support. It offers a significant enhancement to process understanding, process monitoring, and scientific thoroughness in decision making, and it improves qualitative and quantitative performance and cost savings. The multivariate process monitoring tool provides an opportunity for better control by monitoring the process in real time so that issues can be addressed quickly.
There are multiple benefits of implementing PAT tools in the drug development, validation and manufacturing phases. In the development phase, they can provide thorough scientific knowledge and process understanding to achieve stage 1, process design. In stage 2, the process qualification stage, they can help determine and justify the number of PPQ batches required for process validation. In stage 3, the continued process verification stage, they can help gain confidence and assurance in real time that the batch is moving in the right direction. They may also reduce or eliminate off-line testing [24].
If the QbD and PAT tools are applied to new products, then they can help establish a solid justification for the number of PPQ batches prior to the process validation campaign.
If the PAT tools are applied to an existing product then every batch can be monitored in real-time just like a process validation batch. The early fault detection can help in assuring that the processes are running at the optimum level within the operating space to give maximum efficiency, consistent quality and higher yields. This may also result in lower production cost and energy consumption.
This project is expected to reduce costs by helping to better control process variability, improve yields, reduce waste, and consistently ensure high-quality product. The cost savings upon implementation of this system for a conventional manufacturing process or a new process can be calculated using metrics such as the number of batches, right-first-time rate, cost of quality and similar measures. This capability not only provides financial benefit but also ensures quality product and meets the regulatory expectation for continued/continuous process verification. Other guidelines [4], along with Q11, Development and Manufacture of Drug Substances [5], reflect the regulatory agencies' objective of encouraging innovation in the drug manufacturing process [6].
In January 2011, FDA published new guidance for industry entitled Process Validation: General Principles and Practices [7]. The underlying requirements are legally enforceable under the Federal Food, Drug, and Cosmetic Act and are called out in 21 CFR Parts 210 and 211 of the cGMP regulations, more specifically in Part 211.100(a) [8].
There was a gap of nearly 25 years between FDA's 1987 Guideline and the 2011 Guidance for process validation. The 2011 Guidance is entirely consistent with the basic principles of process validation articulated in the 1987 Guideline.
"Nonetheless, more than 25 years' worth of experience and regulatory oversight, along with the cGMPs for the 21 st Century Initiative [9], prompted FDA to revisit the principles and concepts in an effort to update and clarify FDA's thinking on process validation".
Per this new guidance, manufacturers are urged to adopt the lifecycle approach in three stages [7]: In Stage 1, process design, the commercial process is defined based on scientific knowledge gained through development and scale-up activities.
In Stage 2, process qualification, the process design is evaluated and assessed to determine if the process is capable of reproducible commercial manufacturing.
In Stage 3, continued process verification, ongoing assurance is gained during routine production that the process remains in a state of control.
Per the 2011 guidance, FDA states that process validation is to be a lifecycle approach instead of a one-time activity. The FDA's new approach is to make every manufacturing batch a 'validated' batch via continued process verification. Figure 1 outlines the FDA's new process validation expectation [10]. Answering such questions helps in understanding the process limits for every parameter needed to establish the design space, as shown in Figure 2 [12]. ICH guidance Q8 defines the design space as "the multidimensional combination and interaction of input variables and process parameters that provides the assurance of quality" [2]. The scientific knowledge of the operating space provides an understanding of the variability in raw materials, the relationship between a process and the product's critical quality attributes (CQAs), and the association between the CQAs and the product's clinical properties. This thorough understanding can help "control the variation in a manner commensurate with the risk it represents to the process and product" [7]. The scientific knowledge of the drug and the process parameters is obtained by conducting design of experiments (DOE), also known as characterization studies. The high degree of scientific knowledge and assurance in the performance of the manufacturing process is obtained from objective information and data from laboratory, pilot, and/or commercial scale studies [7]. The results obtained from these studies define the operating space. DOE studies can help develop process knowledge by revealing relationships, including multivariate interactions, between the variable inputs (e.g., component characteristics or process parameters) and the resulting outputs (e.g., in-process material, intermediates, or final product) [13].
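As a simple illustration of how characterization runs can be laid out, the sketch below generates a two-level full factorial design for three hypothetical bioreactor factors; the factor names and ranges are illustrative assumptions, not the parameters or ranges used in this study:

```python
from itertools import product

# Hypothetical factors and their low/high characterization levels.
factors = {
    "pH":          (6.8, 7.2),
    "temperature": (35.0, 37.0),   # degrees C
    "DO2":         (30.0, 50.0),   # percent saturation
}

# Two-level full factorial design: 2^3 = 8 runs covering every corner
# of the proposed operating space.
runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for i, run in enumerate(runs, start=1):
    print(f"Run {i}: {run}")
```

In practice, fractional factorial or response-surface designs are often chosen instead when the number of factors is larger; the full factorial here simply illustrates the idea of exercising the corners of the proposed operating space.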
Performing the Risk Assessment [13]: The most important task is performing a risk assessment of the operational and process parameters identified during the DOE characterization studies. Per the 2011 guideline, 'All the parameters should be evaluated in terms of their roles in the process and impact on the product or in-process material' [7]. A team of representatives from manufacturing, process development, quality assurance and validation is required to perform the assessment. A typical quality risk management model, outlined in Figure 3 [3], is commonly used as described in ICH Q9, Quality Risk Management [4]. Each parameter identified during a characterization study is evaluated to determine what might go wrong; the likelihood of it going wrong and the consequences are discussed during the risk assessment. Based on the evaluation of the risk model, a score is assigned in three categories: low, medium and high. The risk parameters are weighted against the likelihood of occurrence, the probability of detection, and the severity of the consequences, and the three scores are multiplied to obtain a risk priority number (RPN) as shown in Figure 3 [3]. A decision is then made to identify the parameter as critical, key or non-key to merit process characterization. The 2011 guidance expects 'a higher degree of control for the parameters that pose higher risk' [7]. The results are documented in characterization reports to establish the operating space. The operational ranges for the operational parameters and the acceptance criteria for the process parameters in the design space are the basis for the process validation protocols used to validate the process [15]. The scientific knowledge and information gathered must be documented and approved in accordance with the established procedure so that it can be used in the later stages of the lifecycle [7].
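The RPN arithmetic described above can be sketched as follows; the parameters, scores and classification thresholds are purely illustrative placeholders, not values from the study's risk assessment:

```python
# Illustrative FMEA-style scoring: 1 = low, 2 = medium, 3 = high.
# RPN = severity x likelihood of occurrence x (lack of) detectability.
parameters = {
    #                severity, occurrence, detectability
    "culture pH":        (3,        2,        2),
    "temperature":       (3,        1,        1),
    "feed flow":         (2,        2,        2),
}

def classify(rpn, high=12, medium=6):
    """Placeholder thresholds for critical / key / non-key classification."""
    if rpn >= high:
        return "critical"
    if rpn >= medium:
        return "key"
    return "non-key"

for name, (sev, occ, det) in parameters.items():
    rpn = sev * occ * det
    print(f"{name:12s} RPN = {rpn:2d} -> {classify(rpn)}")
```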

Stage 2: Process Performance Qualification (PPQ)
The goal of validating any manufacturing process is to establish scientific evidence that the process is reproducible and will consistently deliver quality products. The scientific knowledge and assurance gained during stage 1 via characterization studies set the stage for stage 2, process qualification. How do the characterization studies help in process performance qualification (PPQ)? FDA's 2011 guidance outlines that manufacturers should [7] understand the sources of variation, detect the presence and degree of variation, understand the impact of variation on the process and ultimately on product attributes, and control the variation in a manner commensurate with the risk it represents to the process and product.
The scientific evidence gathered during characterization provides the appropriate level of assurance that the manufacturing system has been designed to consistently deliver a quality product to the market. The specific information obtained from the operating space, such as the critical/key parameters and the control strategy, is used to set the normal operating ranges (NOR) and proven acceptable ranges (PAR). This information provides the scientific justification for parameter selection and is used to calculate and establish the control limits, which serve as inputs to the PPQ protocol.
This phase involves evaluating the facility and equipment for its fitness for use. Utility systems and equipment are verified to be built and installed properly, and operators ensure that they operate within the intended and anticipated operating ranges. During the PPQ stage of process validation, the process design is evaluated to determine if it is capable of reproducible commercial manufacture of products [7]. The decision to distribute the product to the market is determined by the successful completion of the PPQ. The successful completion of the PPQ demonstrates that the commercial manufacturing process performs as expected.

Number of PPQ Batches:
One of the most important discussions and interpretations of FDA's 2011 guideline concerns the number of batches. Until the new guidance came along, process validation was performed as a three-batch exercise: "...it was widely accepted throughout industry, and, indeed, implied or stated in some FDA guidance documents, that process validation was a static, three-batch demonstration event" [16]. The EU GMP Annex 15 states that "It is generally considered acceptable that three consecutive batches/runs within the finally agreed parameters would constitute a validation of the process" [17]. The 2011 guidance allows a different interpretation: one can view "process validation as a continuous process of collection and evaluation of data, rather than as a three-batch static event" [10].
The number of batches is not an acceptance criterion; however, the results of the data obtained from the batches are the acceptance criteria. The new definition of validation caused one industry member to state at the workshop that for the past 30 years, industry has been told that process validation is a documentation exercise. FDA expects industry to consider process validation as a scientific endeavor. That is quite a shift and 30-year habits are hard to break [10,18,19].
The existing products which are already in the market may have already crossed this hurdle by making three process validation (PV) batches, which got the approval of FDA for commercial manufacturing. The 2011 guidance is also applicable to these products.
The FDA directive for the manufacturers of these products is to follow the lifecycle approach. The legacy product manufacturers benefit from the knowledge they have already gained about the manufacturing process and the product over the course of commercial manufacturing. Use of PAT tools for these manufacturing processes can greatly enhance their process monitoring capabilities to achieve stage 3, continued process verification.
At the same time, new products and new manufacturing processes have the benefit of following the QbD and PAT principle right from the beginning to gain process understanding during stage 1 so that they can scientifically justify the number of batches required for PPQ. The manufacturers must make deliberate, rational decisions about whether their specific processes are validated and their products ready for commercial release. A manufacturing process that uses PAT may warrant a different PPQ approach.
PAT processes are designed to measure in real time the attributes of an in-process material and then adjust the process in a timely control loop so the process maintains the desired quality of the output material [7].
Justification for selecting the number of batches for PPQ: The PPQ validation strategy can be used to scientifically justify the required number of batches selected for PPQ. The knowledge gained from previous molecules and during the process design stage through development, scale-up activities and engineering runs can be used to demonstrate that the current process is well characterized. Sampling during PPQ batches: The 2011 guidance also emphasizes the sampling plan, sampling points, number of samples and frequency of sampling for each unit operation, and the use of a statistical approach for PPQ samples. The number of samples should be adequate to provide sufficient statistical confidence of quality both within a batch and between batches [7]. The statistical tool and approach are not specified, but manufacturers are expected to choose a suitable statistical tool. Homogeneity within a batch and consistency between batches are goals of process validation activities. The expectation is to use the heightened sampling and monitoring period to gain confidence and assurance for the high-risk parameters.

Stage 3 - Continued Process Verification (CPV):
The goal of the third validation stage is continual assurance that the process remains in a state of control (a validated state) during commercial manufacture. A system needs to be in place for detecting unplanned departures from the process as designed during stage 1 and stage 2 [7]. Ideally, this stage should be treated as an extension of stage 1 and stage 2, because all the scientific knowledge of the operating space and its verification is gathered during the process validation phases. However, this stage is still creating confusion because it is a new concept and the expectations stated in the 2011 guidance are vague.
The new guidance outlines that, upon regulatory filing and receiving approval for commercial manufacturing, the manufacturers maintain the same state of control that was demonstrated during the PPQ runs so that each batch is, in effect, a process validation batch. When implementing stage 3, manufacturers should consider the semantic difference between the terms "continued" and "continuous". The 2011 Guidance deliberately speaks to continued process verification, which some organizations have misinterpreted to mean continuous, with mandatory enablement via PAT. The expectation is decidedly not that in-process or release testing required under the cGMP regulations be replaced by PAT approaches; rather, the expectation is for ongoing (i.e., inter- and intra-batch) monitoring and review [18]. Monitoring Quality Systems: Other periodic-review quality systems are also used to help achieve CPV, such as periodic review of the post-approval change control process [7], periodic review of non-conformances and defect reporting systems, verification of root causes and the CAPA process, periodic review of validated equipment, systems and utilities at regular intervals, periodic review of CIP and SIP cycle monitoring, monthly and annual review of equipment and facility qualification [7], incorporation of appropriate detection, control and mitigation strategies, collection of regular feedback from the process operators and quality staff on process performance, and maintenance and review of product complaint data.
Statistical Evaluation and Analysis of Process Data [7,21]: Drug manufacturers are following FDA's suggestion of using statistics in the evaluation of data trends. In order to achieve this, all IPC parameters and the critical and key post-filling parameters must be monitored starting from the first lot scheduled for commercial release. Product-specific control limits must be established using generally accepted statistical process control practices, with upper and/or lower control limits computed nominally at three standard deviation units from the mean for normally distributed data. Many drug makers follow the Nelson rules to support these statistical decisions. The SIMCA software is used to create designs of experiments and to perform multivariate data analysis; the tool transforms data into information presented as graphical control charts, enabling the process analyst to make the correct decisions and take the appropriate actions in real time.
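A minimal sketch of the three-standard-deviation control limits and of Nelson rule 1 (a single point beyond the limits) is given below; the lot data are random placeholders, and this is not the statistical package used by the manufacturers cited above:

```python
import numpy as np

def control_limits(reference):
    """Shewhart-style limits: mean +/- 3 standard deviations of the reference lots."""
    mu = reference.mean()
    sigma = reference.std(ddof=1)
    return mu - 3 * sigma, mu, mu + 3 * sigma

def nelson_rule_1(values, lcl, ucl):
    """Nelson rule 1: flag any single point outside the 3-sigma limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

# Illustrative in-process data (e.g. a step-yield trend across released lots).
rng = np.random.default_rng(2)
reference = rng.normal(loc=92.0, scale=1.0, size=30)
new_lots = np.array([91.5, 92.3, 96.8, 92.0])   # third value is a deliberate outlier

lcl, mean, ucl = control_limits(reference)
print(f"LCL={lcl:.2f}  mean={mean:.2f}  UCL={ucl:.2f}")
print("Out-of-control lots:", nelson_rule_1(new_lots, lcl, ucl))
```

The remaining Nelson rules (runs, trends, alternation and the like) follow the same pattern of scanning the trend against the reference mean and sigma.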
Historical data from good batches (batches with minimal deviations) from a well-known commercial biological manufacturing process are used. The data are extracted from the historical databases by configuring tags, scan rates, and compression settings in the source system. The data are pre-treated and organized in an appropriate format prior to importing them into the SIMCA software for the creation of the multivariate models.

Methods
Two popular and commonly used multivariate data analysis (MVDA) methods, principal component analysis (PCA) and partial least squares (PLS), are used to demonstrate the PAT tool described in the QbD and PAT framework [2,11,24].
The commercial biological manufacturing process is enabled with various modern process analyzers and is equipped with a distributed control system to collect the process data. The real-time multivariate statistical process monitoring system is linked to the process databases to obtain the process data. Various modifications were made to the existing databases for the collection of data from the unit operations. The study focuses on one upstream unit operation (the bioreactor) and one downstream unit operation (UF/DF).
The proprietary data were modified and normalized as necessary prior to using them for the creation of the multivariate (MV) model in the SIMCA application.
The goal is to demonstrate that the following PAT tools can be utilized to ensure continued process verification is met as outlined in FDA's 2011 process validation guidelines [7]:

I. Multivariate tools for design, data acquisition and analysis
II. Continuous improvement and knowledge management tools
Most of the existing manufacturing processes for biologics products at scale are not inherently designed to enable real-time process monitoring. Real-time process monitoring (PAT software tools) and QbD principles are required for monitoring a biologics manufacturing process in real time.
The bioreactor and UF/DF unit operations were monitored against the input parameters listed in Table 1 and Table 2. These unit operations were connected to the distributed control system (DCS) to collect the process data [11]. The process database was connected to the DCS, which continuously saved the historical and current batch data generated from the ongoing batches. The continuous plant data historian was configured with the correct tags. The cell viability and viable cell density data were also collected from off-line measurements to check the process performance at twenty-four hour intervals during the course of the unit operation. The configuration of trigger tags, timers and batch tags was made as required to obtain all the relevant batch and continuous process data from the historical databases.
The on-line data were collected at fifteen minute intervals for the bioreactor unit operation and at ten second intervals for the UF/DF unit operation. The data from fifteen batches were used to create the bioreactor MV model. Fourteen variables (J=14) were monitored and the data were collected at fifteen minute intervals, giving a total of approximately K=279 time points per batch. The total duration of the unit operation was 68 hours and 30 minutes. The bioreactor unit operation is treated as a single-phase process.
MV for UF/DF: The dataset contains data for N=17 batches, of which fifteen were selected for the creation of the MV model. The batch selection criterion was to have little variability among the batches used for the UF/DF process. The main objective of the study was to create an MV model that could be used as a reference to monitor new batches as they evolve and to distinguish good batches from bad batches. Nineteen variables (J=19) were monitored and the data were collected at ten second intervals, giving a total of approximately K=800 time points per batch. The total duration of the unit operation was 2 hours, 31 minutes and 10 seconds. The UF/DF unit operation has three phases: concentration, diafiltration and recovery.
The data are scaled to unit variance and unfolded by the SIMCA software prior to use for the model creation [24]. Viability is monitored as a measure of cell culture performance, the retentate control valve is monitored as a controller output, and step yield (%) is monitored as a performance output reflecting UF/DF performance.
The MV model created for the bioreactor unit operation using fifteen good batches is shown in Figure 5a and Figure 5b by a score plot and a batch contribution plot. The MV model created a default 95% confidence limit using the F distribution, shown as an ellipse in Figure 5a. In Figure 5b, the batch contribution plot shows the ±3 standard deviation limits (red) and the average (green) computed from the data of the reference batches. The Figure 5a and Figure 5b plots show that all batches are aligned properly and end in a similar fashion within the confidence limits. The UF/DF unit operation has three different phases; therefore, SIMCA created a separate batch evolution model for each phase. The UF/DF MV model for the concentration phase, also created using fifteen good batches, is shown in Figure 6a and Figure 6b by a score plot and a batch contribution plot, with the same default 95% confidence ellipse and ±3 standard deviation limits. Figure 6a, the score plot (BEM) for UF/DF created using the scores of the first two principal components, shows that all batches are aligned properly and fit within the 95% confidence ellipse. In order to test the real-time process monitoring system and ensure continued process verification, two new batches were selected for each of the bioreactor and UF/DF unit operations. One of the two new batches was a good batch, and the second was deliberately modified by making changes to a few variables to see if they could be detected by the real-time process monitoring system.
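For reference, one common F-distribution based form of the Hotelling T² limit that defines such a 95% ellipse, for a model with A components fitted on N reference batches, is the following (the exact expression applied by SIMCA may differ slightly):

\[
T^2 \;=\; \sum_{a=1}^{A} \frac{t_a^2}{s_{t_a}^2}
\;\le\;
T^2_{\text{lim}} \;=\; \frac{A\,(N-1)(N+1)}{N\,(N-A)}\; F_{1-\alpha}(A,\; N-A),
\]

with \(A = 2\), \(N = 15\) and \(\alpha = 0.05\) for the score plots discussed here.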
In case study 1, a new batch (batch# 1016) was projected onto the MV model created for the bioreactor process using fifteen good batches to see if this batch was running in a state of control. The batch score plot and the individual batch plot in Figure 7 showed that batch# 1016 (shown in red) was well within the 95% control limit, and the batch contribution plot showed that it was moving within ±3 standard deviations (red) and around the average (green). In case study 2, a second new bioreactor batch (batch# 1017) was projected onto the same model and deviated outside the confidence limits. In order to find the cause of the batch deviation, the contribution plot for each variable was evaluated; it revealed that the pH and the temperature sensors were malfunctioning. Figure 9 shows the pH and temperature batch plots with the specific time points at which the batch was out of the confidence limit. In case study 3, a new batch (batch# 116) was projected onto the MV model created for UF/DF using fifteen good batches to see if this batch was running in a state of control. The batch score plot and the batch contribution plot in Figure 10 show that batch# 116 (shown in red) was well within the 95% confidence interval and was evolving within the ±3 standard deviation confidence limits around the average (shown in green).
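The projection of a new batch onto the reference model, as performed in the case studies above, can be sketched as follows; scikit-learn's PCA is used as a stand-in for the SIMCA model, the data are random placeholders, and the injected sensor drift is purely illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)

# Stand-in for the unfolded, scaled reference data from the fifteen good batches.
X_ref = rng.normal(size=(4185, 14))
mean, std = X_ref.mean(axis=0), X_ref.std(axis=0, ddof=1)

pca = PCA(n_components=2).fit((X_ref - mean) / std)
ref_scores = pca.transform((X_ref - mean) / std)
score_sd = ref_scores.std(axis=0, ddof=1)            # reference spread of t1, t2

# New batch: scale with the *reference* mean/std, then project into the
# existing score space.  Two columns are shifted to mimic drifting sensors.
X_new = rng.normal(size=(279, 14))
X_new[:, -2:] += 4.0
new_scores = pca.transform((X_new - mean) / std)

# Score-space check against +/-3 reference standard deviations.
outside_scores = np.any(np.abs(new_scores) > 3 * score_sd, axis=1)

# Residual (DModX-like) check: distance from the two-component model plane.
resid_new = (X_new - mean) / std - pca.inverse_transform(new_scores)
resid_ref = (X_ref - mean) / std - pca.inverse_transform(ref_scores)
spe_limit = np.percentile((resid_ref**2).sum(axis=1), 95)    # crude empirical limit
outside_resid = (resid_new**2).sum(axis=1) > spe_limit

print(f"time points beyond +/-3 SD in scores: {outside_scores.sum()} / {len(outside_scores)}")
print(f"time points above the residual limit: {outside_resid.sum()} / {len(outside_resid)}")
```

With real process data, a drifting sensor typically shows up in the scores, in the residual statistic, or in both, and the per-variable contribution plots then point to the offending sensor, as in the case studies.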
In case study 4, a new batch (batch# 117) was projected onto the MV model created for UF/DF using fifteen good batches to see if this batch was running in a state of control.
The batch contribution plot in Figure 11 shows that batch# 117 was outside of ±3 standard deviations at several time points. This graphical presentation of the new batch in real time revealed that it was not a good batch and that the cause of the deviation needed to be addressed immediately. Figure 11: The batch score plot for the UF/DF batch# 117, showing that it went outside of ±3 standard deviations at several places.
In order to find the cause of the batch deviation, the batch plot for each variable was evaluated; it revealed that the feed pressure and feed flow sensors were malfunctioning. Figure 12 shows the feed pressure and feed flow contribution plots with the specific time points where the batch was outside of the confidence limit. The latest Q11 guideline [5], along with FDA's new process validation guidance on continued process verification [7], elicits the same message: "Process validation should not be viewed as a one-off event. A lifecycle approach should be applied linking product and process development, validation of the commercial manufacturing process and maintenance of the process in a state of control during routine commercial production".
With this, the regulatory agencies are encouraging the manufacturers to implement QbD and PAT. In the traditional approach, set points and operating ranges for process parameters are defined, and the control strategy is based on the demonstration of process reproducibility and testing to meet the established acceptance criteria. There are certainly flaws in the traditional approach which can be addressed with an enhanced approach backed by risk management studies, scientific knowledge, and process understanding. The process knowledge and understanding gained during the process design and process qualification stages can be utilized to develop appropriate control strategies that are applicable over the lifecycle of the product.
The RT-MSPM system used in the study can be applied to any legacy manufacturing process. Even though it requires an investment of resources and time, it can be applied easily with appropriate management support. It can offer a significant enhancement to process understanding, process monitoring, scientific thoroughness in decision making, qualitative and quantitative performance, and cost savings. The use of the MVDA tool provides an opportunity for better control by monitoring the process in real time so that issues can be identified and addressed quickly.
There are multiple benefits of implementing PAT during the drug development and manufacturing phases. In the development phase, it can provide process understanding. In the manufacturing phase, it can provide real-time monitoring and assurance that the batch is moving in the right direction. It can also eliminate off-line testing and minimize batches that are out of specification [25]. If development runs and engineering runs are performed to obtain sufficient data, then the justification for the number of batches required for PPQ can be easily made prior to the process validation campaign.
If the MVDA tool is applied to an existing product, then every batch can be monitored in real time, similar to a process validation batch, and early fault detection allows any deviation from the targeted range to be identified as it occurs. These capabilities not only provide financial benefit but also meet the regulatory expectation for continued/continuous process verification.