Fungal Genome Annotation

Haridas, Sajeet; Salamov, Asaf; Grigoriev, Igor V.

doi:10.1007/978-1-4939-7804-5_15

Part of the book series: Methods in Molecular Biology (MIMB, volume 1775)

Abstract

The term “genome annotation” includes identification of protein-coding and noncoding sequences (e.g., repeats, rDNA, and ncRNA) in genome assemblies and attaching functional information (metadata) to these annotated features. Here, we describe the basic outline of fungal nuclear and mitochondrial genome annotation as performed at the US Department of Energy Joint Genome Institute (JGI).

Acknowledgment

The work conducted by the US Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, is supported under Contract No. DE-AC02-05CH11231.

Author information

Authors and Affiliations

United States Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
Sajeet Haridas, Asaf Salamov & Igor V. Grigoriev

Corresponding author

Correspondence to Igor V. Grigoriev.

Editor information

Editors and Affiliations

Fungal Physiology, Westerdijk Fungal Biodiversity Institute, Utrecht, The Netherlands
Ronald P. de Vries
Centre for Structural and Functional Genomics, Concordia University, Montreal, Québec, Canada
Adrian Tsang
US Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
Igor V. Grigoriev

About this protocol

Cite this protocol

Haridas, S., Salamov, A., Grigoriev, I.V. (2018). Fungal Genome Annotation. In: de Vries, R., Tsang, A., Grigoriev, I. (eds) Fungal Genomics. Methods in Molecular Biology, vol 1775. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7804-5_15

DOI: https://doi.org/10.1007/978-1-4939-7804-5_15
Published: 07 June 2018
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-7803-8
Online ISBN: 978-1-4939-7804-5
eBook Packages: Springer Protocols
