AN OMICS BASED APPROACH FOR THE IDENTIFICATION OF BIOMARKERS IN NON-ALCOHOLIC FATTY LIVER DISEASE USING IN VITRO MODELS OF HEPATIC STEATOSIS

Nonalcoholic fatty liver disease (NAFLD) is an “umbrella” term for a broad disease spectrum that ranges from simple steatosis to the more progressive stages of nonalcoholic steatohepatitis (NASH), which include hepatocellular ballooning, inflammation, and fibrosis. NAFLD is a growing global epidemic, with 25% of the population predicted to be diagnosed with this disease. Liver biopsy is the only definitive method of diagnosis, despite the widespread use of sonography and elastography to predict the disease state. There is currently no FDA-approved medication for NAFLD/NASH. This is partly due to the lack of a translatable disease model that predicts the whole spectrum of the disease in humans, as well as the lack of a definitive biomarker of the disease state. The goals of this dissertation are (1) to build an in vitro model of NAFLD to study the influence of fat overload on xenobiotic- and lipid-metabolizing proteins and (2) to use the model to help identify novel biomarkers in liver tissue that characterize the early stages of disease. Manuscript I: Numerous human-relevant in vitro models have recently been developed to predict the disease state. They involve monoculture, coculture, and multicellular culture in both 2D and 3D formats to best represent the physiology and function of the liver in the disease state. In this review, we explore the existing human-relevant in vitro models of NAFLD and highlight the technological gaps in current in vitro models for future development. Manuscript II: Human hepatic carcinoma cell lines are commonly used in in vitro studies of lipid and xenobiotic metabolism, as well as glucose regulation, in normal and disease states. However, their validity is still under debate because protein expression differs between the cell lines and human hepatocytes.
In the present study, we used a data-independent acquisition based total protein approach (DIA-TPA) to quantify protein abundance in the different cell lines versus cryopreserved human hepatocytes (cHH) and human liver tissue (HLT). For this purpose, the global proteomes from whole cell homogenates of the HepaRG, HepG2, and Huh7 cell lines were compared to those of cHH and HLT. The MS2 spectra for all detectable peptides were quantified using Spectronaut™. In summary, 2715, 2578, 2874, 2717 and 3083 proteins were identified at 1% FDR in HepaRG, HepG2, Huh7, cHH and HLT, respectively. The global proteome of cHH differed significantly from that of the hepatic cancer cell lines. Among the cell lines, the global as well as the ADME protein profile of HepaRG correlated most closely with cHH, with 84 out of 101 ADME proteins identified in HepaRG cells. Within the gluconeogenesis and glycolysis pathways, the Huh7 cell line expressed proteins in high abundance in contrast to the other groups. The comparison therefore demonstrates the capability of untargeted global proteomics to detect differences in protein expression among the different groups. In addition, this study provides a comprehensive database of information to aid study design and model selection. Manuscript III: To manage NAFLD and related co-morbidities, patients are administered an array of pharmacological agents. An understanding of the effect of NAFLD on drug disposition is therefore warranted. Using a HepaRG model, we aimed to mimic steatosis in vitro and to examine its effects on drug-metabolizing enzymes (DME) and transporters. HepaRG cells, differentiated in-house, were exposed to a mixture of saturated and unsaturated fatty acids (1:2 ratio of 0.5 mM palmitate and oleate complex conjugated to BSA for 72 h) and were subjected to RNA sequencing and proteomic analyses.
Lipid accumulation was ascertained by Oil Red O (ORO) staining and triglyceride (TG) quantitation, and cell viability by WST-1 determination. The treatment condition resulted in a ~6-fold increase in TG concentration without reducing cell viability. RNA sequencing of lipid-loaded and control cells identified a total of 393 differentially expressed transcripts (89 up- and 304 down-regulated). Moreover, lipid loading resulted in significant downregulation of mRNA transcripts of the transcription factors NR1I2 (-1.18) and HNF4α (-0.55); phase 1 DMEs including CYP1A2 (-3.25), 2B6 (-2.02), 2C8 (-1.48), 2C9 (-2.00), 2C19 (1.32) and 3A4 (-1.87); phase 2 DMEs including UGT1A6 (-0.36) and 2B7 (-1.09), SULT2A1 (-0.75) and 1E1 (-1.41); as well as clinically relevant transporters such as ABCC11 (-1.36), ABCG5 (-1.66), SLC10A1 (-1.63) and SLCO2B1 (-1.49). However, protein expression did not show a significant change. Furthermore, lipid-loaded cells significantly upregulated AKR1B10 mRNA (2.17) and protein (0.99), which may regulate lipid as well as xenobiotic metabolism. Manuscript IV: Non-alcoholic fatty liver disease (NAFLD) is a global epidemic, present in over 10% of the world population, although most cases remain undiagnosed. Liver biopsy is the only gold standard available for confirmation of the disease state. Non-invasive diagnostics such as ultrasound and MRI are either inaccurate or too expensive for routine use. Many available markers detect the onset of inflammation and fibrosis with moderate-to-high accuracy. However, there is a large gap in specific biomarkers that can distinguish NAFLD liver tissue from normal tissue in the early stages of the disease. Using a SWATH-MS based data-independent acquisition (DIA) strategy, the dysregulated proteins in the in vitro model of hepatic steatosis were compared with human liver tissue (n=116) showing progressive stages of NAFLD.
More than 2,500 proteins were identified in the HepaRG and human hepatocyte models as well as in human liver tissue. Within the hepatocyte model, 40 proteins were dysregulated in steatosis. These proteins were screened in liver tissue and 6 common proteins were identified. The sensitivity and specificity of the markers were analyzed using receiver operating characteristic (ROC) curves for the following markers: PLIN2 (0.77), ANXA1 (0.70), H2AFY (0.80), SNX1 (0.67), GCHFR (0.69) and APO (0.69); all of these markers reached significance (P < 0.05). Conclusion. This work demonstrates that a human-relevant in vitro disease model has the potential to explain the effects of NAFLD. Furthermore, in vitro models used in conjunction with human liver tissue aid in the identification of novel biomarkers that may have therapeutic and diagnostic value.


NAFLD Background
Nonalcoholic fatty liver disease (NAFLD) is a highly prevalent disease in the United States as well as other parts of the world. A high-fat diet and lack of exercise can lead to the accumulation of free fatty acids and triacylglycerols (TAG) in hepatocytes, leading to the development of steatosis. Current treatment involves lifestyle management, which in most cases proves ineffective. Hence, our long-term goal is to identify non-invasive and conclusive diagnostic methods for the identification of NAFLD, as well as to discover a panel of biomarkers that could be potential therapeutic targets for the pharmacotherapy of NAFLD.

Importance of studying the disease state with in-vitro systems
To study steatosis, it is crucial to identify the appropriate disease model. Nutrition and energy uptake are the critical determinants for the onset of NAFLD; however, significant differences exist in the manifestation of disease between humans and mice. Diet-induced mouse models fail to recapitulate the spectrum of NAFLD as seen in humans (Gómez-Lechón, . With a high-fat diet (HFD), simple steatosis is established; however, male C57BL/6 mice fail to progress to NASH even in the presence of a fructose-rich diet (Takahashi, Soejima and Fukusato, 2012). To establish NASH, most animal models are fed a methionine-choline-deficient (MCD) diet, which results in NASH without inducing obesity. Recent reports suggest the development of liver steatosis and fibrosis with a high-fructose (HF) diet (Cydylo, Davis and Kavanagh, 2017). In addition, there are several genetically modified mouse models, such as ob/ob and db/db, that carry mutations in the leptin gene and the leptin receptor gene, respectively; however, they too fail to recapitulate the full spectrum of the disease (Larter and Yeh, 2008). This physiological mismatch makes it challenging to rely solely upon animal models for the investigation of NAFLD.
The use of primary human hepatocytes (PHH) or cryopreserved hepatocytes is considered the gold standard for in-vitro studies (Lecluyse and Alexandre, 2010). However, the limited availability of early-stage biopsy samples and the limited time in culture before the onset of dedifferentiation make PHH an ineffective tool for model development.
HepaRG cells are a human hepatocellular carcinoma cell line composed of a mixture of both hepatocyte-like and biliary-like cells. They retain hepatic functions, expression of liver-specific genes and drug metabolism capacity at levels comparable to those of PHH. Furthermore, they preserve their phenotype for over 2 weeks upon differentiation. This provides a good window for model development and treatment modification.

The use of 'Omics' based technology in the identification of biomarkers
Omics technology is a broad term to denote the global detection of genes (genomics), mRNA (transcriptomics), proteins (proteomics) and metabolites (metabolomics). To investigate changes in the disposition of administered drugs caused by altered hepatic metabolism in the disease state, as well as to inspect chemical-induced toxicity that influences NAFLD, early-stage screening tools that best represent the disease state in humans are warranted. These models are also advantageous during drug development for examining various molecular entities against therapeutic targets that contribute to disease progression, e.g., FXR agonists (8). Because widely acknowledged ethical and financial requirements limit animal use in exploratory studies (9), system-oriented approaches that integrate in vitro and in silico models to best predict changes in human liver physiology are emerging as valuable tools to study the molecular changes involved in the disease state (10).
In the forthcoming sections, we will provide context for the pathology of NAFLD, risk factors associated with the disease progression as well as the current sophisticated in-vitro human models that reflect the different stages underlying the etiology of NAFLD.
The aim of this review is to provide comprehensive information on the existing in vitro models of NAFLD to aid the choice of appropriate model as well as highlight the technological gaps in the current in vitro models for future development.

Pathology and Risk factors
Lipid accumulation in the hepatocytes is the starting point for hepatic steatosis.
The severity of steatosis is graded by the proportion of affected parenchyma, which is subdivided into three categories: 5%-33% (mild), 34%-66% (moderate), and > 66% (severe) (11). The occurrence of steatosis is spatially heterogeneous in the liver, with lipids localized to zone 3 in the early stages of disease, in contrast to the panacinar localization seen as steatosis progresses (12). This heterogeneity may lead to occasional overestimation of the degree of steatosis by histopathologists. In addition, conventional imaging (ultrasound, magnetic resonance imaging, or computed tomography) lacks the sensitivity to detect hepatic steatosis of less than 30% (13). However, newer technologies involving magnetic resonance imaging-estimated proton density fat fraction and 1H-magnetic resonance spectroscopy detect steatosis with high accuracy (14,15). Ethnicity also plays a critical role, with a higher rate of incidence seen in Hispanics and Asians, followed by Caucasians (18). Possession of the homozygous PNPLA3 allele (rs738409) in the Hispanic population contributes to 2-fold higher hepatic lipid accumulation (19). On the other hand, a lower incidence of steatosis is observed in African-Americans in comparison to Caucasians. In addition, genetic polymorphism in the TM6SF2 gene, which is involved in VLDL secretion, leads to a higher incidence of NAFLD. The variant frequencies in the TM6SF2 gene are found to be higher in whites, African-Americans, and Hispanics (19).
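The grading bands above can be written as a small lookup. This is a minimal sketch, not a clinical tool: the function name `grade_steatosis` and the label returned for values below 5% are assumptions, since the text only defines the three bands from 5% upward.

```python
def grade_steatosis(percent_affected: float) -> str:
    """Map the percentage of affected parenchyma to a steatosis grade.

    Bands follow the text: 5-33% mild, 34-66% moderate, >66% severe.
    """
    if not 0 <= percent_affected <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_affected < 5:
        # Assumption: the text gives no label below 5%, so a neutral one is used.
        return "below grading threshold"
    if percent_affected <= 33:
        return "mild"
    if percent_affected <= 66:
        # Non-integer values between 33 and 34 fall here; the text lists integer bands only.
        return "moderate"
    return "severe"


if __name__ == "__main__":
    for value in (3, 20, 50, 80):
        print(value, grade_steatosis(value))
```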
The occurrence of NAFLD increases with age, from an incidence of 20% in the age group under 20 to 40% or more in the age group above 60 (20). A diet enriched with macronutrients and carbohydrates poses a high risk of developing NAFL as well as of its progression. The risk can worsen when coupled with a sedentary lifestyle and smoking (21,22). Moreover, endocrine dysfunction such as polycystic ovary syndrome (PCOS) is often accompanied by obesity and insulin resistance (23). There is evidence that PCOS with insulin resistance worsens hyperandrogenemia, which leads to NAFL. A striking association with NAFL is also observed in patients with obstructive sleep apnea (OSA), which is caused by a complete or partial obstruction of the airway. The incomplete exchange of gases (hypoxemia and hypercapnia) leads to oxidative stress, recruitment of pro-inflammatory factors, insulin resistance, and endothelial dysfunction. This may lead to the evolution of NAFL to NASH (24).

In vitro models of Nonalcoholic fatty liver disease (NAFLD)
The current in vitro models for NAFLD are summarized in  (29). Furthermore, acute and repeated treatments with tetracycline and amiodarone induced steatosis, highlighting the robustness of HepaRG in predicting drug-induced hepatotoxicity as well as its utility for understanding the molecular mechanisms involved in the onset and progression of the disease state (7). In summary, these reports represent the possible impact of various stimuli on the onset of hepatic steatosis and/or NASH. However, results from cancer cell lines are viewed with skepticism because the translatability of the observed effects to human hepatocytes is unknown. In addition, it is well known that the Huh7 and HepG2 cell lines express drug-metabolizing enzymes at low levels in comparison to human hepatocytes (30,31). Hence, understanding DME modulation could be a challenge.
However, HepaRG cells can be used to overcome this drawback, as evidence shows that they express DME at levels comparable to human hepatocytes.

Primary Human Hepatocytes (PHH)
Primary human hepatocytes are the current gold standard to study the metabolism- and toxicity-related effects of drugs. Donato  To mitigate the drawbacks of the 2D static model and to study the long-term impact of triglyceride accumulation, three-dimensional (3D) spheroids or 3D models using scaffolds were developed (33). The 3D models using scaffolds were cultured in the microfluidic device and LiverChip platform with continuous perfusion of media to mimic the dynamic sinusoidal flow in the liver (34). This stabilized the hepatic metabolizing capacity for over 14 days in culture. Subsequently, the model was treated with control and FFA mixture (oleate and palmitate, 2:1) for 7 d. As a result, increased levels of secreted adipokines were noted; the IGFβ1, PDK4, and FABP-1 genes associated with lipid metabolism were upregulated; and a decrease in the metabolic rate of CYP3A4 and CYP2C9 was also reported.

Induced Pluripotent Stem Cells derived Hepatocytes (iPSC-Hep)
Although human hepatocytes are considered the gold standard, they present several limitations, such as donor-to-donor variability, lack of availability, cost of procurement, and variability in proliferation, which introduce mutations and polymorphisms in metabolic markers. Human-induced pluripotent stem cell (hiPSC)-derived hepatocytes

Micropatterned Tri-Cultures (MPTCs)
To stabilize the hepatocytes in long-term culture as well as induce NASH-like phenotype with activated stellate cells, a micropatterned tri-culture model was introduced.
In this model, growth-arrested 3T3-J2s (90k cells per 24-well format) were mixed with activated HSCs (2.5k), and this mixture was then seeded onto micropatterned PHH colonies (30k cells) (36). The ratio of stellate cells to PHH corresponded to the proportion in the human liver. In such micropatterned tri-cultures (MPTCs), albumin secretion was higher than the levels seen in MPCCs. Besides, the supporting non-

3D-Human Liver Micro-Tissues (3D-hLMT)
The 3D-hLMT model is the closest known in vitro culture system that mimics in vivo liver physiology. The model can recapitulate multicellularity in a 3D format and maintain individual cell phenotypes for over 5 weeks (38). PHH were co-cultured with NPCs such as hepatic stellate cells, Kupffer cells, and endothelial cells. In this model, 0.5 mM of palmitic acid (PA) induces a robust inflammatory as well as fibrogenic response in the 3D-hLMT.
PA significantly induced the expression of collagen-related genes, α-SMA, TIMP1 and IL8 transcripts, as well as the TGFβ gene activation pathway (38). Nevertheless, owing to their complexity, these models are time-consuming and expensive to establish; in addition, scalability is a limiting factor for using these models in routine high-throughput screening.

Induced Pluripotent Stem Cell (iPSCs) derived organoids
iPSCs derived from 11 human donors (healthy and diseased) were used to derive organoids comprising hepatocyte-, stellate-, Kupffer- and biliary-like cells, whose genetic signature was confirmed by transcriptomics analysis. Upon stimulation with fatty acids (800 µM of oleate) for 5 days, the human liver organoids (HLO) showed the stepwise progression of NAFLD through steatosis, ballooning, inflammation, and fibrosis phenotypes, recapitulating NASH in the latter stages (39). In addition, atomic force microscopy confirmed the severity of fibrosis in the HLOs. This novel approach sets a unique benchmark for understanding genotype-specific contributions to the progression of the disease state, and contributes to therapies based on personalized treatment.
However, the significant limitations associated with this model must also be taken into account. Donor-to-donor variability might affect the HLO phenotype, such as NASH and fibrosis (39). Additionally, the time and cost involved in organoid development may pose further limitations in routine usage.

The organotypic ex-vivo culture system

Precision-Cut Liver Slices (PCLS)
Many advancements have been made in the use of 2D and 3D monoculture to mimic NAFLD. However, the complex interactions of multiple cell types such as hepatocytes, Kupffer cells, stellate cells, and endothelial cells that induce inflammation, necrosis, and consequently fibrosis in NAFLD cannot be recapitulated in a 2D or 3D monoculture. Human precision-cut liver slices (hPCLS) retain the liver architecture, with the multi-cell interactions among the specialized liver-specific cell types as well as infiltrating lymphocytes, for up to 5 days in culture (40). Hence, they make an excellent model to study the changes observed during the different stages of NAFLD and to test new therapies. However, the limited time in culture and the inability to accommodate genetic manipulations such as transfection or RNA silencing make it challenging to understand the impact of a specific gene in the signaling pathways leading to disease progression.

Conclusion
A translatable human in vitro model of NAFLD is the need of the hour. However, a single model that fits all purposes for studying a "multiple-hit" disease state such as NAFLD poses a highly relevant challenge to the research community. The majority of models developed thus far are 2D static systems involving either PHH or human carcinoma cell lines.
However, it is impossible to study the whole spectrum of the disease state using such simple systems. Hepatic steatosis has been well recapitulated using various stressors, such as free fatty acids, glucose, fructose, insulin, interleukins, as well as other chemicals. A few studies have advanced further by incorporating microfluidics into the model to mimic the sinusoidal flow, as well as micropatterning to recapitulate the spatial organization of hepatocytes with the supporting NPCs. Sophisticated models such as 3D-hLMT and iPSC-derived organoids offer only a sneak peek of the innovations to come. These may allow the whole body to be modeled on a chip to understand the disease state beyond the liver, by including the gut axis as well as adipose tissue. Such innovative tools will contribute to new therapies as well as to understanding other complexities that come with the disease.

Conflict of interest
None of the authors have any conflict of interest to declare.

Statement of financial support
Financial support for this study was provided by National Institutes of Health grants to

Methods. In-solution trypsin digestion of proteins in whole cell homogenate was conducted using pressure-cycling technology (PCT). DIA was carried out by sequential window acquisition of all theoretical mass spectra (SWATH-MS). In addition, MS2 spectra for all detectable peptides at their corresponding m/z were quantified using Spectronaut™ and then analyzed using TPA. Conclusion. In summary, the comparison demonstrates the capability of untargeted global proteomics to detect the differences in protein expression among the different groups. In addition, this study provides a comprehensive database of information to aid better study design.

Introduction
The human liver is the major organ for metabolism and clearance of xenobiotics, maintenance of glucose homeostasis, and regulation of lipid metabolism. In hepatocytes, glucose consumption occurs via the uptake of plasma glucose, followed by the breakdown of the sugar moiety to pyruvate, which in turn is oxidized through the TCA cycle to produce ATP. Alternatively, glucose may be converted to fatty acids via de novo lipogenesis pathways. [4] Hepatocytes also play a critical role in lipid metabolism via the conversion of non-esterified fatty acids into triacylglycerols (TAG), which in turn are stored or secreted as very-low-density lipoproteins (VLDL) back into the plasma. [4] Studies characterizing the mechanisms involved in the metabolic functioning of the liver, in the healthy state as well as in disease states such as obesity, insulin resistance, and non-alcoholic fatty liver disease (NAFLD), are on the increase. [5][6][7][8] The gold standard for such in vitro studies is human hepatocytes. However, the cost of procurement, donor-to-donor variability, and limited availability make them less suitable for long-term studies. Consequently, hepatic carcinoma cell lines such as HepG2, Huh7 and HepaRG are often used as an alternative tool. In addition, immortalized hepatic cell lines have routinely been used in toxicity and drug metabolism assays in preclinical studies. [9][10][11] However, it is unclear how well the tumor-derived cell lines represent human hepatocytes with respect to the expression of proteins involved in lipid and energy metabolism as well as xenobiotic metabolism.
Shotgun proteomics is a powerful tool for analyzing proteolytic peptides of high intensity from low-throughput biological assays. The results generated are high-throughput and offer proteome coverage that outweighs the traditional labor- and resource-intensive western blotting techniques. [2,12] Data-dependent acquisition (DDA) has been used more commonly in untargeted analysis. [13] However, its bias toward abundant peptides and its lack of sensitivity for quantifying low-abundance proteins limit the use of this approach. [14] Hence, sequential window acquisition of all theoretical mass spectra (SWATH-MS), a data-independent acquisition (DIA) method, was employed to analyze the samples. The DIA-based total protein approach (DIA-TPA) was then used to compare the absolute protein levels of the human hepatocytes and the carcinoma cell lines. [15][16][17] This approach is sensitive enough to capture low-abundance proteins, which makes it well suited to comparing the expression patterns of thousands of proteins from the cell lines and human hepatocytes at the same time. In addition, we compared protein expression from whole cell homogenate without enrichment. We believe this novel approach to sample analysis overcomes the loss of proteins during fractionation and enrichment and is more reflective of the biological state. [2,18,19] The objective of this study was to compare the expression of proteins involved in the glucose, lipid, and drug metabolism pathways in immortalized hepatocyte cell lines and primary human hepatocytes. This comprehensive comparison provides a database to aid better study design in the fields of hepatology and pathophysiology.

Cell Culture
The cell lines and cHH were cultured as per conditions in-house or as per manufacturer's protocol. The conditions between the different cell lines and cHH were maintained as close as possible with minor changes that conformed to specific cell lines and cHH.

Whole cell homogenate and proteomics sample preparation
An overview of the study workflow is given in Fig. 1

In-solution trypsin digestion
Protein digestion was conducted as described previously with a few adaptations. [21] Subsequently, 10 µg trypsin was again added to each sample and digestion was repeated as mentioned above.
Furthermore, to 110 µL of digested peptide sample, 10 µL of acetonitrile (1:1, v/v containing 5% formic acid) was added to precipitate DOC. Samples were spun (10,000 rpm for 5 min at 10°C) to remove the precipitate and 100 µL of supernatant was collected.
Subsequently, 25 µL of the digested peptide sample was injected onto the analytical column and analyzed using the LC-MS/MS method described below.

SWATH-MS acquisition and data analysis
Homogenate samples were analyzed using SWATH-MS based spectra which were  (1).
The absolute protein abundance of each group was represented as pmol/mg of protein.
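The conversion from MS signal to pmol/mg can be illustrated with the commonly published form of the total protein approach, in which a protein's share of the total MS signal is divided by its molecular weight. The sketch below assumes that form; the function name, the protein identifiers and the input values are hypothetical, and the exact implementation used in this study is not given in the text.

```python
def tpa_abundance_pmol_per_mg(intensities, mol_weights_da):
    """Estimate absolute abundance (pmol per mg total protein) per protein.

    intensities: summed MS signal per protein (arbitrary units).
    mol_weights_da: molecular weight per protein in Da (g/mol).

    Signal fraction (g protein / g total protein) divided by MW (g/mol)
    gives mol/g; multiplying by 1e9 converts mol/g to pmol/mg.
    """
    total_signal = sum(intensities.values())
    if total_signal <= 0:
        raise ValueError("total MS signal must be positive")
    return {
        protein: signal / (total_signal * mol_weights_da[protein]) * 1e9
        for protein, signal in intensities.items()
    }


if __name__ == "__main__":
    # Hypothetical two-protein example, for illustration only.
    signals = {"PROT_A": 1.0, "PROT_B": 3.0}
    weights = {"PROT_A": 50_000.0, "PROT_B": 25_000.0}
    print(tpa_abundance_pmol_per_mg(signals, weights))
```

Because the calculation normalizes to the total signal of each sample, it needs no spiked standards, which is consistent with the use of unfractionated whole cell homogenate described here.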

Statistical and bioinformatic analysis
A number of normalization strategies were tested using NormalyserDE. LIMMA package with multiple comparison was used to test for significance (FDR < 0.05, log2FC > 0.58 or 2.2B), and we believe this is due to the bipotent progenitor status of HepaRG, resulting in expression of hepatocyte-like and biliary-like cells.
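The significance filter (FDR < 0.05 together with a log2 fold-change cutoff of 0.58, which corresponds to roughly a 1.5-fold change) can be sketched in a few lines. This is not the LIMMA workflow itself: it assumes that p-values have already been computed, that the multiple-comparison correction is Benjamini-Hochberg, and that the fold-change cutoff applies to the absolute value. The function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity is enforced.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def select_hits(log2_fc, p_values, fdr=0.05, min_abs_log2_fc=0.58):
    """Indices of proteins passing both the FDR and fold-change cutoffs."""
    q_values = benjamini_hochberg(p_values)
    return [
        i
        for i, q in enumerate(q_values)
        if q < fdr and abs(log2_fc[i]) > min_abs_log2_fc
    ]


if __name__ == "__main__":
    # Hypothetical values for four proteins.
    print(select_hits([1.2, -2.0, 0.1, 3.0], [0.001, 0.04, 0.03, 0.5]))
```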

Global proteome profile comparisons from
In addition, the global proteome profiles of the cell lines, cHH and HLT were analyzed using Perseus and represented as a PCA plot and heat map (Fig. 2.3B). Dramatic differences were observed between cHH, HLT and the three cell lines. In the PCA of components 1 and 2, HLT and cHH grouped closer to one another than to the hepatic carcinoma cell lines. In addition, hierarchical clustering (Fig. 2A) showed a similar pattern, with HLT and cHH grouped together at minimum distance and the hepatic carcinoma cell lines clustered closer to one another.

SWATH-MS analysis of ADME protein expression in the HLT, cHH and the hepatic cell lines
To investigate the detection reliability, we performed label-free quantification on both HLT and cHH (Fig. 2.4A). More than 2000 proteins were identified, of which 97% (2647 out of 2717) were found in both sample types. Overall, a good global correlation in the abundance of individual proteins was observed between the two sample sets. In addition, we detected 89 out of the 101 ADME proteins (88%) chosen from the literature.
The ADME protein abundance showed a good correlation around the line of unity.
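One way to put a number on agreement "around the line of unity" is to correlate log-transformed abundances and to report the mean log ratio, which measures the systematic offset from unity. The sketch below is illustrative only: the text does not state which correlation statistic was used, and the function names and example values are hypothetical.

```python
import math


def log10_pairs(x, y):
    """Keep protein pairs quantified (> 0) in both samples, on a log10 scale."""
    return [(math.log10(a), math.log10(b)) for a, b in zip(x, y) if a > 0 and b > 0]


def log_pearson(x, y):
    """Pearson correlation of log10 abundances between two sample types."""
    pairs = log10_pairs(x, y)
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two proteins quantified in both samples")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("abundances must vary within each sample type")
    return sxy / math.sqrt(sxx * syy)


def mean_log10_ratio(x, y):
    """Average of log10(y / x); 0 means the points sit on the line of unity."""
    pairs = log10_pairs(x, y)
    if not pairs:
        raise ValueError("no proteins quantified in both samples")
    return sum(b - a for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical abundances (pmol/mg) for three proteins in two sample types.
    sample_1 = [1.0, 10.0, 100.0]
    sample_2 = [2.0, 20.0, 200.0]
    print(log_pearson(sample_1, sample_2), mean_log10_ratio(sample_1, sample_2))
```

A high correlation with a non-zero mean log ratio would indicate that the two sample types rank proteins similarly but differ by a constant factor, so both numbers are worth reporting together.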

SWATH-MS analysis of proteins involved in energy and lipid metabolism in the HLT, cHH and hepatic cell lines
Hepatocytes are pivotal in energy supply and storage. Major pathways such as glycolysis/gluconeogenesis govern the conversion of glucose to pyruvate and, ultimately, ATP production (Fig. 2.5). Glucose is phosphorylated by hexokinase or glucokinase to form glucose-6-phosphate (G6P), which is metabolized via glycolysis to yield ATP. Pyruvate is then converted to acetyl-CoA, which participates in ATP production in the TCA cycle, whereas gluconeogenesis regenerates glucose from pyruvate. Alternatively, glucose is also used in lipid synthesis via the pentose-phosphate pathway.
Protein expression related to energy metabolism varied significantly among HLT, cHH and the hepatic cell lines (Fig. 2.6). For instance, the primary enzyme involved in gluconeogenesis, fructose-1,6-bisphosphatase (FBP1/2), was identified only in cHH and HLT. Other mitochondrial proteins involved in gluconeogenesis, such as PCK2 and PC, were found in higher abundance (~6-fold) in cHH than in the HepaRG cell line. Hexokinase (HK1), an important enzyme involved in glycolysis, was present in HLT and cHH at levels comparable to those in the HepaRG cell line. In contrast, LDHA, which is involved in the conversion of lactate (released by the muscles) to glucose in hepatocytes, was found in high abundance in the Huh7 cell line (9-fold) in comparison to cHH (Fig. 2

SWATH-MS analysis of proteins involved in xenobiotic transport and drug metabolism
The liver is the major organ that contributes to the clearance and metabolism of drugs and xenobiotics present in the circulation. This is accomplished by the specialized drug were detected within quantifiable limits in HepaRG but not in HepG2 and Huh7.
Among the efflux transporters, MRP6 was the most abundant, followed by MRP2, BSEP and MDR1 (Table 3.5). In addition, we also noted that the predicted top upstream regulator that was significantly inhibited across all cell lines was HNF4α. We believe this probably contributes to the dysregulated protein expression pattern seen in lipid and xenobiotic metabolism.

Discussion
Human hepatic carcinoma cell lines are commonly used in drug development to aid safety assessment and candidate selection for first-in-human (FIH) studies. Hence, there is a need for well-characterized, fit-for-purpose, proliferative hepatic cell line models, given the limited availability of primary human hepatocytes, inconsistency across donors and variability in DMET expression. [24] In addition, the need to understand the hepatic disease state along with its impact on drug-metabolizing enzymes adds urgency to the search for better preclinical models of human origin. With this objective in mind, we screened HepaRG cells as well as more traditional cell lines such as HepG2 and Huh7 for the expression of DME and of proteins involved in energy and lipid metabolism. [6,7] HepaRG cells are derived from a 66-year-old female patient with hepatocellular carcinoma. [26] They are progenitor cells that upon differentiation exhibit 50% hepatocyte-like and/or 50% biliary-like cell lineages. [27] In addition, upon differentiation, HepaRG cells are reported to express high levels of CYP450 and other Phase I and Phase II enzymes, as confirmed by several microarray studies. [9,28,29] Apart from the reported gene expression data, very few reports discuss the global proteome of these cell lines or the targeted DME protein expression. It is also important to note that reported gene expression may not extrapolate directly to protein expression or activity. [30] With respect to the proteins involved in energy and lipid metabolism, we compared proteins involved in glycolysis, ß-oxidation and lipogenesis across the different groups.
The majority of proteins involved in gluconeogenesis/glycolysis were expressed in high abundance in the Huh7 cell line (Figure 2.4B). Interestingly, the average expression across the proteins involved in fatty acid metabolism was much higher in HLT and cHH than in the cell lines. HepaRG showed a generally lower trend than cHH, followed by HepG2 and Huh7 cells. However, in contrast to DME expression, the proteins involved in lipid metabolism were highly expressed in the carcinoma cell lines, although not at the same abundance as in human hepatocytes.
In conclusion, in this publication we used a DIA-based TPA strategy to compare the proteome profiles of whole cell homogenates of three hepatocarcinoma cell lines (HepG2, Huh7 and HepaRG) with those of cHH and HLT, focusing on proteins associated with pathways of energy, lipid and xenobiotic metabolism. As discussed above, we observed significantly different protein expression profiles among the groups in energy and lipid metabolism as well as in ADME protein expression. Therefore, caution must be exercised when choosing cell lines as in vitro models for drug metabolism or when developing models of disease states. This study provides a comprehensive database of protein expression in different cell lines, and we believe it will aid informed choices of hepatic cell lines in future model development.
The protein expression is represented as the log2 fold change between the test group (HLT, HepaRG, Huh7 and HepG2) vs. human hepatocytes. The figure shows proteins involved in gluconeogenesis/glycolysis. X - not applicable when protein expression is below the limit of quantification. Abbreviation: HLT - Human liver tissue.
The protein expression is represented as the log2 fold change between the test group (HLT, HepaRG, Huh7 and HepG2) vs. human hepatocytes. The figure shows proteins involved in lipid metabolism. X - not applicable when protein expression is below the limit of quantification. Abbreviations: HLT - Human liver tissue; FA - Fatty acid.
The square in black represents the average of human hepatocyte (n=6) enzymes with min and max range. The triangle represents the average of human hepatocytes from the current study.

Background & Aims. Hepatic lipid accumulation (steatosis) is an early sign of the spectrum of non-alcoholic fatty liver disease (NAFLD) and precedes fibrosis and cirrhosis. To manage NAFLD and related co-morbidities, patients are administered an array of pharmacological agents. Therefore, an understanding of the effect of NAFLD on drug disposition is warranted. Using a HepaRG model, we aimed to mimic steatosis in vitro and to examine its effects on drug-metabolizing enzymes (DME) and transporters.
Methods. HepaRG cells, differentiated in-house, were exposed to a mixture of saturated and unsaturated fatty acids (1:2 ratio of 0.5 mM palmitate and oleate complex conjugated to BSA for 72h) and were subjected to RNA sequencing and proteomic analyses. and protein (0.99) that may regulate lipid as well as xenobiotic metabolism.

Introduction
Nonalcoholic fatty liver disease (NAFLD) is highly prevalent in the United States as well as in other developed countries. Excess calorie intake and lack of physical activity can cause accumulation of free fatty acids (FFA) and triglycerides (TG) in hepatocytes, leading to the development of steatosis. NAFLD, which is associated with metabolic syndrome, comprises a spectrum of liver disorders that begins with simple steatosis, progresses to nonalcoholic steatohepatitis (NASH) and may lead to end-stage cirrhosis (Cobbina and Akhlaghi, 2017).

Cell Culture Model
HepaRG cells were purchased from Biopredic International (through a licensing agreement from Inserm, France) and were grown and differentiated in-house according to the manufacturer's instructions. The incubation was carried out at 37 °C with 5% CO2

Oil Red O Staining
HepaRG cells were treated with 0.5 mM FA or control for 24h, 48h and 72h. Cells were

WST-1 cell viability
Cells were treated with 0.1-2 mM BSA conjugated with FA (1:2 palmitate: oleate) for 72h in a 96 well format. The positive control used for this experiment was clotrimazole (100 µM).
The media was added to the cells, which were incubated at 37 °C for 4h. The supernatant was collected and analyzed using the SpectraMax M2 plate reader at 440 and 600 nm wavelengths.

TG Accumulation
Approximately 10 million cells (HepaRG control vs. FA) were rinsed with PBS and scraped off the flask using a rubber policeman cell scraper. Cells were resuspended in PBS and lysed by sonication with twenty one-second bursts. The samples were centrifuged at 10,000 × g for 10 minutes at 4 °C, and the supernatant was collected and assayed for TG content using a TG colorimetric assay kit (Cayman Chemicals, Catalog# 10010303, Ann Arbor, MI) according to the manufacturer's protocol. The optical density was read at 540 nm on a SpectraMax M2 plate reader. Following this, samples were analyzed at 542 nm using a SpectraMax M2 plate reader.

RNA-seq data analysis
Sequencing data have been uploaded to the Gene Expression Omnibus (GEO) database and can be viewed under accession number GSE122151. Data analysis and visualization were carried out using QuickRNASeq, an integrated pipeline for the analysis of large-scale RNA-Seq data (Zhao et al., 2016). Read mapping and counting were performed using STAR and featureCounts. The RSeQC package was then used to compute the QC metrics and to remove outliers, whereas VarScan, a platform-independent software tool, was used for variant detection in the RNA-Seq data. A generalized linear model, rather than a pairwise comparison approach, was used to account for the changes between treatment and control according to the R user's guide to determine differentially expressed genes. Differentially expressed genes were defined using a cut-off of log2 FC > 0.58 combined with a False Discovery Rate (FDR) < 0.01 and p-value < 0.05.
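The selection criteria above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in this study: it assumes that the FDR is obtained with the Benjamini-Hochberg procedure (the text does not name the method), that the log2 FC cut-off applies to the absolute value, and that results are available as hypothetical `(gene, log2fc, pvalue)` tuples. A log2 FC of 0.58 corresponds to roughly a 1.5-fold change.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_de_genes(results, lfc_cut=0.58, fdr_cut=0.01, p_cut=0.05):
    """Keep genes passing all three thresholds.

    results: list of (gene, log2fc, pvalue) tuples (hypothetical layout).
    """
    fdr = benjamini_hochberg([p for _, _, p in results])
    return [
        gene
        for (gene, lfc, p), q in zip(results, fdr)
        if abs(lfc) > lfc_cut and q < fdr_cut and p < p_cut
    ]


# Toy values for illustration only; not data from this study.
toy_results = [
    ("GENE_A", 1.20, 0.0001),
    ("GENE_B", -0.90, 0.0004),
    ("GENE_C", 0.30, 0.0002),   # fails the fold-change cut-off
    ("GENE_D", 0.75, 0.0400),   # fails the FDR cut-off
]
selected = select_de_genes(toy_results)
```

With these toy inputs only `GENE_A` and `GENE_B` pass all three thresholds.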

SWATH-MS acquisition and data analysis
Cell homogenate samples were analyzed using SWATH-MS based spectra which were acquired for mass range m/z 400−1100 Da within SWATH window width of 10 m/z resulting in 70 overlapping mass windows per cycle, as described in the previous literature

…(1)
The absolute protein abundance of each group was represented as pmol/mg of protein.

Characterization of Enzyme Activity
HepaRG cells grown on twenty-four-well plates were used for activity measurements. After 72 h in culture, treatment media was removed, and the cells were … showed linearity from 9 to 925 ng/mL (R² > 0.95) and precision.

Data and statistical analyses
RNA-Seq data were analyzed using IPA (Ingenuity Pathway Analysis) software (Qiagen, Foster City, CA) to study disease and other canonical pathways, utilizing the generalized pathways that represent common properties of a signaling module from the KEGG database. Differential gene expression in FA-treated cells was normalized to the control and reported as log2 of the fold change (log2FC). HemI (Heatmap Illustrator, version 1.0) was used to generate the heatmap of differentially expressed genes. In most cases, experiments were performed in triplicate, and data were averaged and presented as mean ± SEM relative to BSA for each condition. An unpaired Student's t-test was used to compare the control and FA-treated groups. Data were analyzed and graphed using Prism software (GraphPad).
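As a worked illustration of the summary statistics described above, the following sketch normalizes triplicate readings to the mean of the BSA control and reports mean ± SEM and the log2 fold change. The function names and the numbers are hypothetical and chosen only to make the arithmetic easy to follow; the analyses in this study were performed in Prism.

```python
import math
import statistics


def relative_to_control(values, control_values):
    """Express each value as a ratio to the mean of the control group."""
    control_mean = statistics.mean(control_values)
    return [v / control_mean for v in values]


def mean_sem(values):
    """Return (mean, SEM), with SEM = sample SD / sqrt(n)."""
    n = len(values)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


# Toy triplicates for illustration only; not data from this study.
control = [2.0, 2.0, 2.0]       # BSA-only wells
treated = [10.0, 12.0, 14.0]    # FA-treated wells

relative = relative_to_control(treated, control)
mean_rel, sem_rel = mean_sem(relative)
log2fc = math.log2(mean_rel)
```

Here the treated wells average 6-fold over control with an SEM of about 0.58, which corresponds to a log2FC of about 2.58.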

Development of the steatosis model with HepaRG cell line
Upon treatment, the neutral lipid accumulation was visualized at 24h, 48h and 72h utilizing ORO stain (Fig 3.2-I). There was an increasing trend in the lipid accumulation, with 72 h post-treatment showing (Fig 3.2

Validation of FA treated HepaRG cells as a model for steatosis
Increased total TG and cholesterol levels were observed under FA treatment conditions. There was a 6-fold increase in triglyceride levels in FA treatment vs. control (Fig. 3.3A). Cholesterol concentration showed ~1.3-fold increase in the FA treatment group compared to control (Fig. 3.3B). To assess the effect of FA treatment on hepatocellular oxidative stress, the mean level of malondialdehyde (MDA) was measured (Fig. 3.3C). However, there was no significant difference between the FA treatment and control cells. This observation suggests that FA treatment may not induce oxidative stress.
Insulin signaling, which is often hindered in steatosis, was assessed in FA treated HepaRG cells. Total Akt levels remained constant for all treatment conditions. At every concentration increment of insulin, a significant reduction in the levels of phospho-Akt was noted within the FA treatment group as compared with control (Fig. 3.3D). This finding shows reduced insulin signaling in FA treated HepaRG cells.
In addition, we compared the transcriptomics data with a clinical study (Starmann et al., 2012) that reported the hepatic gene expression difference between liver steatosis (n=30) and control (n=18). Of the top 100 differentially expressed genes, 24 genes overlapped with our dataset and are summarized in Table 3

Pathway analysis of differentially regulated genes upon FA treatment vs. control
IPA was used to identify significant pathways affected by the differential regulation of genes upon FA treatment of HepaRG cells. To further examine the model, we used the differentially expressed (DE) genes to predict the disease state. Two hundred seventy-nine canonical pathways were significantly altered in the dataset of 393 DE genes (Fig. 3.4A). The top twenty pathways are listed in Table 3.2. This information should be interpreted with caution, as most of the dysregulated canonical pathways were represented by only a small number of differentially regulated genes (fewer than 20); in such cases the pathway was not considered significantly dysregulated. The enriched pathways dysregulated upon FA treatment include FXR/RXR activation, LXR/RXR activation, LPS/IL-1 mediated inhibition of RXR, xenobiotic metabolism, the coagulation system and acute phase response signaling (Fig. 3.4C). Differentially expressed genes were also grouped to predict the disease state with the IPA software (Table 3.3). Notably, the top-scoring and statistically significant (p<0.05) predictions in the category of metabolic diseases were hepatic steatosis, microvesicular steatosis, non-alcoholic fatty liver disease and steatohepatitis (Fig. 3.4B).

Protein expression pattern in FA treatment vs. control
Similar to the gene expression pattern that predicted hepatosteatosis (Fig. 3.4C), significant changes in the expression of proteins associated with hepatocellular injury were observed with fatty acid accumulation. Addition of fatty acids induced the proteins PGK1, ALDOA and PGAM1, which are involved in glycolytic pathways, as seen in steatosis; this leads to the formation of glycerol, a key component in TG formation (Fig. 3.5A). In addition, we observed a decrease in albumin production and an increase in ALT production, as seen in hepatocellular injury (Fig. 3.5B). Moreover, increased protein expression of collagen (COL18A1) and TNF-α indicates early-stage progression to NASH. Collectively, these results support the idea that FA accumulation in HepaRG cells has broad regulatory effects on immune response, lipogenesis, lipid catabolism and transport.

Effect of FA treatment on DMEs and nuclear receptor expression
We compared the transcriptomes of FA-treated and control HepaRG cells and identified significant changes in gene expression, suggesting a role for FA accumulation as an initial trigger of the onset of hepatosteatosis (Fig. 3.4C). Furthermore, FA treatment resulted in differential mRNA expression of phase I enzymes (17 different CYP isoforms, 4 different ADH isoforms and FMO5), phase II enzymes (2 different GST isoforms, 7 different UGT isoforms and 2 SULT isoforms) and transporters (5 of the ABC superfamily and 3 of the SLC superfamily) (Fig. 3.7A). Treatment with FA for 72h shows a statistically significant … with the RNA-Seq data, qRT-PCR results indicated that expression of the nuclear receptors PXR (NR1I2) and CAR (NR1I3) and of CYP3A4 was markedly reduced upon FA treatment (Fig. 3.7C). Furthermore, mRNA and protein levels of the transcriptional regulator AKR1B10 were increased. This suggests that FA accumulation in steatosis induces AKR1B10, which may modulate lipid metabolism as well as the basal expression of drug-metabolizing enzymes (DMEs) in hepatocytes.

Functional activity of CYP3A4 in control and fatty acid treatment
CYP3A4, the most abundant human hepatic CYP (Saravanakumar et al., 2019), was assessed along with PXR and CAR expression using qRT-PCR. CYP3A4 enzyme activity in FA-treated HepaRG cells was assessed using midazolam as a substrate. Upon FA treatment, CYP3A4 enzyme activity was significantly reduced: FA treatment resulted in a 10-fold reduction in CYP3A4 activity (maximal rate of metabolism, Vmax) compared to the control at 10 µM (Fig. 3.8B). Our RNA-seq, PCR and enzyme activity data all showed marked downregulation of CYPs and their corresponding nuclear receptors, demonstrating a link between lipid metabolism and xenobiotic metabolism in steatotic liver cells.

Discussion
Here we report a 6-fold increase in TG content as well as increased lipid-filled vesicles in HepaRG cells after a 72h stimulus with 0.5 mM of O/P (2:1). This in vitro model also showed a positive correlation with several genes associated with de novo lipogenesis (Table 3.3) and inflammation, thereby providing a molecular signature resembling liver steatosis.
Cells were treated for 12h in serum-free media (starvation). Cells were then stimulated with insulin (1, 10, 100 nM) for 10 min before harvest. Phosphorylated Akt (Ser473) and total Akt were assayed using an ELISA kit. Error bars represent standard deviation. All samples were assayed in triplicate. Asterisks indicate a significant (*P < 0.05; **P < 0.01; ***P < 0.001) difference compared to control.
… much is known about the dysregulated proteins between simple steatosis and NASH groups. Proteomics employing sequential window acquisition of all theoretical mass spectra (SWATH-MS) is a powerful tool that aims to determine the relative amount of proteins from low-throughput biological samples [10][11][12]. The data-independent acquisition based total protein approach (DIA-TPA) is more commonly used in untargeted proteomics to compare protein levels between groups. This approach is sensitive enough to capture low-abundance proteins extracted from cells as well as tissue homogenates, and is hence an ideal tool for biomarker discovery 12.
The objective of this study is to identify a list of secretory hepatic proteins associated with the progression of disease. To achieve this, we compared the relative protein expression in human hepatocyte and HepaRG-based models for hepatic steatosis with human liver tissue showing progressive stages of NAFLD.
This comprehensive comparison provides a panel of proteins that may act as probable biomarkers of disease state prediction.
Subsequently, 10 µg trypsin was added again for further digestion as mentioned above.
Furthermore, to 110 µL of the digested peptide sample, 10 µL of acetonitrile (1:1, v/v containing 5% formic acid) was added to precipitate DOC. Samples were spun (10,000 rpm for 5 min at 10°C) to remove the precipitate, and 100 µL of supernatant was collected.
Subsequently, 25 µL of the digested peptide sample was injected onto the analytical column and analyzed using the LC-MS/MS method described below.

LC-MS/MS method and SWATH-MS data analysis
The LC-MS/MS method was used as previously developed without modifications.
Homogenate samples were analyzed using SWATH-MS based spectra which were … The absolute protein abundance of each group was represented as pmol/mg of protein.

Statistical and bioinformatic analysis
Proteins with missing data in more than 50% of the samples were omitted from the analysis. For the cell analysis, only proteins present in all replicates were considered. Student's t-test with p value < 0.05 and a log2FC cutoff of +/-0.58 was used to identify differentially expressed proteins in pooled human hepatocytes and the HepaRG cell line. The data obtained from the cells were represented as means of quadruplicates. Proteins from human livers were analyzed using SPSS software v24.0. Protein data from human samples were ln (natural-log) transformed and analyzed using one-way ANOVA, and post hoc analysis for multiple group comparison was conducted using the Bonferroni test. In addition, the specificity and selectivity of proteins were analyzed using ROC curves in SPSS. P < 0.05 was considered significant. Prism 8 (GraphPad Inc., La Jolla, CA) and SPSS 24.0 (IBM Corp., Armonk, NY) were used for graphing and statistical testing, respectively.
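The pre-processing steps described above (missing-value filter, natural-log transformation and Bonferroni correction) can be sketched as follows. This is an illustrative outline under stated assumptions, not the SPSS workflow used here: missing values are assumed to be encoded as `None`, the protein names and abundances are hypothetical, and the p-values passed to the Bonferroni step are placeholders rather than results of this study.

```python
import math


def filter_by_missingness(table, max_missing_fraction=0.5):
    """Drop proteins whose fraction of missing (None) values exceeds the limit."""
    kept = {}
    for protein, values in table.items():
        missing = sum(v is None for v in values)
        if missing / len(values) <= max_missing_fraction:
            kept[protein] = values
    return kept


def ln_transform(values):
    """Natural-log transform the observed (non-missing) values."""
    return [math.log(v) for v in values if v is not None]


def bonferroni(pvalues):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Toy abundances (pmol/mg protein) for illustration only.
abundance = {
    "PROT_1": [2.0, 2.2, None, 1.8],    # 25% missing: kept
    "PROT_2": [None, None, None, 0.4],  # 75% missing: dropped
}
kept = filter_by_missingness(abundance)
ln_values = ln_transform(kept["PROT_1"])
adjusted = bonferroni([0.01, 0.02, 0.5])  # placeholder p-values, three comparisons
```

Bonferroni adjustment is conservative, which suits a small number of planned group comparisons such as the disease stages considered here.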

Overall workflow in the identification for dysregulated protein in disease state.
The workflow followed in this manuscript is as illustrated in Fig. 4 (Table 4.3). Notably, most of the proteins ANXA1, GCHFR, H2AFY2, PLIN2, SNX1 and APOE showed a significant change (p < 0.05) in the earlier stages of the disease, with steatosis and hepatocellular ballooning but minimal or no lobular inflammation (Fig. 4.5). Moreover, PLIN2 showed an increase with progression of disease from NAFL to NASH (data not shown). Within this pool of 6 proteins, ANXA1 and APOE are secretory proteins, and PLIN2 is associated with lipid droplets present in tissue as well as plasma.
The specificity and selectivity of the markers were analyzed using ROC analysis. All 6 markers showed significance; however, PLIN2, H2AFY2 and ANXA1 showed an ROC AUC of ≥ 0.7, indicating their potential to be good markers of disease state (Table 4.4).
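For readers less familiar with the ROC AUC, the sketch below computes it directly from its rank-based definition: the probability that a randomly chosen disease sample has a higher marker level than a randomly chosen control, with ties counted as one half. The scores and labels are hypothetical toy values, and the function is an illustration rather than the SPSS procedure used in this work.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for disease samples, 0 for controls.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("both classes are required")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy marker levels for illustration only; 1 = NAFL, 0 = control.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
```

In this toy example 8 of the 9 case-control pairs are ordered correctly, giving an AUC of about 0.89; an AUC of 0.5 would indicate no discrimination between groups.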

Discussion
Non-alcoholic fatty liver disease (NAFLD) is estimated to become, within the next 20 years, the major cause of liver-related morbidity and mortality 3. Liver biopsy is the only confirmatory test for the onset of the disease 1. Because of the invasiveness, substantial cost and variability associated with this testing method, novel non-invasive techniques for the diagnosis of NAFLD have been studied extensively in recent times. The absence of biomarkers for the early stages of disease is responsible for late diagnosis, leaving NAFLD undiagnosed in the general population. Therefore, novel biomarkers that detect both the onset and the progression of the disease are urgently needed.
Proteomics is emerging as an efficient tool for exploratory biomarker studies 12. The DIA-based SWATH-MS approach is a sensitive and highly reproducible method for measuring low-abundance proteins in complex biological matrices 10,11,15. Hence, the SWATH-MS approach was used to screen our samples as illustrated in Fig. 1. It is important to note that most of the markers showed an altered expression pattern in the earlier stages of the disease (NAFL) rather than in NASH (data not shown). This pattern could reflect the bias associated with selecting dysregulated proteins from the hepatocyte steatosis model. Furthermore, the sample size is not large enough to conclude definitively that the markers predict NAFL. Hence, studies with larger sample sizes need to be conducted to draw more conclusive remarks.
PLIN2 is an abundant hepatic protein associated with lipid droplets. Reports show that the staining pattern of perilipin distinguishes steatosis from non-alcoholic steatohepatitis in adult and pediatric populations 17. In addition, PLIN2 is concentrated and detected in human urine and is used to differentiate renal carcinomas 18. However, it is a ubiquitous protein associated with adipocyte differentiation, protumorigenesis and steatosis in cardiomyocytes. Hence, its selectivity for predicting a disease state may be low, and PLIN2 needs to be assessed together with other markers for understanding NAFLD. ANXA1 is an anti-inflammatory protein secreted by hepatocytes. Plasma levels of ANXA1 have been shown to correlate with fatty liver index in type 2 diabetes patients 19. Additionally, it is speculated to attenuate insulin resistance and hepatosteatosis. The evidence in the literature, alongside our observations in the disease state, suggests a probable role for annexin A1 as a biomarker for NAFL. The nuclear protein H2AFY2, a variant of histone H2A, is a repressor of gene expression, and APOE is a lipoprotein secreted by hepatocytes that is required for VLDL clearance from the blood. The roles of GCHFR and SNX1 in NAFLD are yet to be explored.
Many previous studies have explored the potential of multiple plasma proteins as biomarkers of NAFLD 9,20. Most studies focus on the hallmarks of disease progression, namely inflammation and fibrosis. Accordingly, the promising plasma biomarkers known thus far include adiponectin, CRP, resistin, leptin, RBP-4, IL-6 and TNFα in inflammation and fibrosis; CK-18 fragments in apoptosis; MCP-1 in hepatocellular ballooning; ALT levels in hepatocellular injury; and ferritin in fibrosis. However, not much is known about early-stage predictors of the disease. Hence, our work is novel in presenting the proteins dysregulated in the earlier stages of NAFLD, prior to the onset of NASH.
In summary, DIA-based proteomics using SWATH-MS was implemented to understand the proteins relevant to NAFLD progression. Furthermore, by utilizing an in vitro model of steatosis, we were able to delineate the proteins associated specifically with the changes caused by fatty acid overload. With this approach we narrowed the list to 6 proteins, including 2 secretory proteins (ANXA1 and APOE) along with PLIN2, which is localized ubiquitously with lipid droplets. Our work adds valuable information for future investigations and clinical biomarker research.
The HepaRG and pooled human hepatocyte models of hepatic steatosis were collected and analyzed using SWATH-MS. Simultaneously, human liver tissue (n=116) homogenates were screened using SWATH-MS and the data were analyzed using Spectronaut. Using set criteria, the numbers of proteins identified were 2580, 2653 and 2781 in HepaRG, human hepatocytes and human liver tissue, respectively. Finally, the dysregulated proteins from human hepatocytes were matched with human liver tissue to identify 6 markers of disease state.
The differential expression in control (n=42) and NAFL (n=34) in human liver tissue. * denotes P value < 0.05 and ** denotes P value < 0.01 using Bonferroni multiple comparison on ln-transformation.
Histone; SNX1 - Sorting nexin 1; GCHFR - GTP Cyclohydrolase I Feedback Regulator; APOE - Apolipoprotein E. The differential expression in human liver tissue. * denotes P value < 0.05 and ** denotes P value < 0.01 using Bonferroni multiple comparison on one-way ANOVA.

CONCLUSIONARY REMARKS
This work demonstrates the utility of in vitro models of hepatocytes to aid in the identification of novel biomarkers in hepatic steatosis. In the first part of this work … We believe that this work, which combines an in vitro model of NAFLD with the DIA-based SWATH-MS proteomics approach, provides vital information for the identification of novel biomarkers to predict the early stages of the disease. We believe this will help in the diagnosis of NAFLD at stages prior to NASH, which may aid more effective treatment before the disease has developed to more progressive stages. In addition, we believe these proteins may also have potential as predictive biomarkers in treatment and therapy.