
Measuring Chinese-English Cross-Lingual Word Similarity with HowNet and Parallel Corpus

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6609)

Abstract

Cross-lingual word similarity (CLWS) is a basic component of cross-lingual information access systems. Designing a CLWS measure faces three challenges: (i) cross-lingual knowledge bases are rare; (ii) cross-lingual corpora are limited; and (iii) no benchmark cross-lingual dataset is available for CLWS evaluation. This paper presents several Chinese-English CLWS measures that adopt HowNet as the cross-lingual knowledge base and a sentence-level parallel corpus as development data. To evaluate these measures, a Chinese-English cross-lingual benchmark dataset is compiled from the Miller-Charles dataset. Two conclusions are drawn from the experimental results. First, HowNet is a promising knowledge base for CLWS measures. Second, a parallel corpus is promising for fine-tuning word similarity measures using cross-lingual co-occurrence statistics.
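The abstract does not say which co-occurrence statistic the measures use. As an illustration only, the sketch below computes pointwise mutual information (PMI), one common co-occurrence statistic, between a Chinese word and an English word over a sentence-aligned parallel corpus. The function name `cross_lingual_pmi`, the toy corpus, and the choice of PMI are all assumptions made for this example, not the paper's method.

```python
import math

def cross_lingual_pmi(pairs, zh_word, en_word):
    """PMI between a Chinese and an English word over aligned sentence pairs.

    `pairs` is a list of (zh_tokens, en_tokens) tuples. Each word is counted
    at most once per sentence pair. This is a generic statistic chosen for
    illustration, not the formula from the paper.
    """
    n = len(pairs)
    zh_count = sum(1 for zh, _ in pairs if zh_word in zh)
    en_count = sum(1 for _, en in pairs if en_word in en)
    joint = sum(1 for zh, en in pairs if zh_word in zh and en_word in en)
    if joint == 0:
        # The two words never appear in the same aligned pair.
        return float("-inf")
    # PMI = log2( P(zh, en) / (P(zh) * P(en)) ), probabilities estimated by counts / n.
    return math.log2(joint * n / (zh_count * en_count))

# Hypothetical, hand-made toy corpus of pre-tokenized aligned sentences.
pairs = [
    (["汽车", "很", "快"], ["the", "car", "is", "fast"]),
    (["我", "买", "汽车"], ["i", "bought", "a", "car"]),
    (["他", "旅行"], ["he", "took", "a", "journey"]),
    (["旅行", "很", "长"], ["the", "journey", "is", "long"]),
]

print(cross_lingual_pmi(pairs, "汽车", "car"))      # words that co-occur in aligned pairs
print(cross_lingual_pmi(pairs, "汽车", "journey"))  # words that never co-occur
```

A score like this could in principle be combined with a knowledge-based similarity score, but how the paper combines the two sources is not stated in the abstract.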




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xia, Y., Zhao, T., Yao, J., Jin, P. (2011). Measuring Chinese-English Cross-Lingual Word Similarity with HowNet and Parallel Corpus. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19437-5_18


  • DOI: https://doi.org/10.1007/978-3-642-19437-5_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19436-8

  • Online ISBN: 978-3-642-19437-5

  • eBook Packages: Computer Science, Computer Science (R0)
