Background: One useful application of pattern-matching algorithms is the identification of major histocompatibility complex (MHC) ligands and T-cell epitopes. Peptides that bind to MHC molecules and interact with T-cell receptors to stimulate the immune system are critical antigens for protection against infectious pathogens. We describe a genomes-to-vaccine approach to H. pylori vaccine design that uses immunoinformatics algorithms to rapidly identify T-cell epitope sequences from large genomic datasets.

Results: To design a globally relevant vaccine, we used computational methods to identify a core genome comprising 676 open reading frames (ORFs) shared among seven genetically and phenotypically diverse H. pylori strains from around the world. Of the 1,241,153 9-mer sequences encoded by these ORFs, 106,791 were identical among all seven genomes, and 23,654 scored in the top 5% of predicted HLA ligands for at least one of eight archetypal Class II HLA alleles when evaluated by EpiMatrix. To maximize the number of epitopes that can be assessed experimentally, we used a computational algorithm to increase epitope density in 20-25 amino acid stretches by assembling potentially immunogenic 9-mers so that they are positioned exactly as in the native protein antigen. This yielded 1,805 immunogenic consensus sequences (ICS). Of the selected ICS epitopes, 79% bound to a panel of 6 HLA Class II haplotypes representing >90% of the global human population.
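
The conservation filter described above (keeping only 9-mers that are identical in every genome) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study and does not reproduce EpiMatrix scoring; the function names `kmers` and `conserved_kmers`, the input layout (strain name mapped to a list of protein sequences), and the toy sequences are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Set


def kmers(sequence: str, k: int = 9) -> Set[str]:
    """Return every overlapping k-mer in one protein sequence."""
    # Sequences shorter than k yield an empty range and thus an empty set.
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def conserved_kmers(proteomes: Dict[str, Iterable[str]], k: int = 9) -> Set[str]:
    """Return the k-mers present in every strain (hypothetical helper).

    `proteomes` maps a strain name to the protein sequences of its ORFs.
    """
    per_strain = []
    for proteins in proteomes.values():
        strain_kmers: Set[str] = set()
        for protein in proteins:
            strain_kmers |= kmers(protein, k)
        per_strain.append(strain_kmers)
    # Intersect across strains; an empty input gives an empty result.
    return set.intersection(*per_strain) if per_strain else set()


# Toy input (made-up sequences): the two strains differ at one residue.
toy = {
    "strain_A": ["MKTAYIAKQRQISFVK"],
    "strain_B": ["MKTAYIAKQRQLSFVK"],
}
shared = conserved_kmers(toy)
print(sorted(shared))
```

In this toy case only the 9-mers that do not span the differing residue survive the intersection. A real analysis would then pass the conserved set to an epitope-prediction tool and keep the top-scoring fraction.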

Conclusions: The breadth of H. pylori genome datasets was computationally assessed to rapidly and carefully determine a core set of genes. Application of immunoinformatics tools to this gene set accurately predicted epitopes with promising properties for T cell-based vaccine development.

Creative Commons License

This work is licensed under a Creative Commons Attribution 3.0 License.