Fungal Genome Annotation

  • Protocol
Fungal Genomics

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1775))

Abstract

The term “genome annotation” covers both the identification of protein-coding and noncoding sequences (e.g., repeats, rDNA, and ncRNA) in genome assemblies and the attachment of functional information (metadata) to these annotated features. Here, we describe the basic outline of fungal nuclear and mitochondrial genome annotation as performed at the US Department of Energy Joint Genome Institute (JGI).

Acknowledgment

The work conducted by the US Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, is supported under Contract No. DE-AC02-05CH11231.

Corresponding author

Correspondence to Igor V. Grigoriev.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Haridas, S., Salamov, A., Grigoriev, I.V. (2018). Fungal Genome Annotation. In: de Vries, R., Tsang, A., Grigoriev, I. (eds) Fungal Genomics. Methods in Molecular Biology, vol 1775. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7804-5_15

  • DOI: https://doi.org/10.1007/978-1-4939-7804-5_15

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-7803-8

  • Online ISBN: 978-1-4939-7804-5

  • eBook Packages: Springer Protocols
