Extraction and Interpretation of Textual Data from Czech Insolvency Proceedings

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10246)

Abstract

The Czech Insolvency Register currently covers about 200,000 insolvency proceedings. To better assess the real impact of indebtedness across Czech society, data about creditors or the reasons for debt could be of great value. Unfortunately, the vast majority of this information is contained only in scanned copies of documents attached to the insolvency proceedings. This study therefore aims to find efficient pre-processing, clustering and classification techniques capable of extracting the desired information from these approximately 1,200,000 PDF files.

The first author was partially supported by the Czech Science Foundation under Grant No. 15-04960S. The second author was partially supported by the Charles University, project GA UK No. 120616.

Notes

  1. http://www.abbyy.com/finereader/

Author information

Corresponding author

Correspondence to Iveta Mrázová.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Mrázová, I., Zvirinský, P. (2017). Extraction and Interpretation of Textual Data from Czech Insolvency Proceedings. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2017. Lecture Notes in Computer Science, vol 10246. Springer, Cham. https://doi.org/10.1007/978-3-319-59060-8_12

  • DOI: https://doi.org/10.1007/978-3-319-59060-8_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59059-2

  • Online ISBN: 978-3-319-59060-8

  • eBook Packages: Computer Science, Computer Science (R0)
