Extraction and Interpretation of Textual Data from Czech Insolvency Proceedings

Mrázová, Iveta; Zvirinský, Peter

doi:10.1007/978-3-319-59060-8_12

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10246)

Included in the following conference series:

International Conference on Artificial Intelligence and Soft Computing

Abstract

Currently, the Czech Insolvency Register covers about 200,000 insolvency proceedings. Data about creditors or the reasons for debt could be of great value for better assessing the real impact of indebtedness across Czech society. Unfortunately, the vast majority of such information is contained only in scanned copies of documents attached to the insolvency proceedings. This study therefore aims to find efficient pre-processing, clustering and classification techniques capable of extracting the desired information from these approximately 1,200,000 PDF files.
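
The abstract names pre-processing, clustering and classification as the building blocks but does not specify how they are implemented. As a minimal sketch, assuming the scanned PDFs have already been converted to plain text by OCR, the code below shows one way a classification step could look: Czech diacritics are stripped, each document becomes a bag of words, and a nearest-centroid classifier using cosine similarity assigns a label. The function names, the labels `claim` and `ruling`, and the training snippets are hypothetical, and nearest-centroid is swapped in for brevity; this is not the authors' pipeline.

```python
import math
import re
import unicodedata
from collections import Counter


def preprocess(text):
    """Lowercase OCR text, strip Czech diacritics and return word tokens."""
    # NFKD splits e.g. 'ř' into 'r' plus a combining caron, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", text.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep alphabetic runs of 2+ letters; digits and 1-letter OCR debris are discarded.
    return re.findall(r"[a-z]{2,}", stripped)


def l2_normalize(counts):
    """Scale a term-count vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {t: v / norm for t, v in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(labeled_docs):
    """Average the unit-length bag-of-words vectors of each class."""
    sums, sizes = {}, Counter()
    for text, label in labeled_docs:
        acc = sums.setdefault(label, Counter())
        for tok, v in l2_normalize(Counter(preprocess(text))).items():
            acc[tok] += v
        sizes[label] += 1
    return {label: {t: v / sizes[label] for t, v in acc.items()}
            for label, acc in sums.items()}


def classify(text, centroids):
    """Nearest-centroid label (a stand-in chosen for brevity, not the authors' method).

    Returns None when the text shares no vocabulary with any class.
    """
    vec = Counter(preprocess(text))
    scores = {label: cosine(vec, c) for label, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Hypothetical training snippets; real input would be OCR output of the scanned PDFs.
TRAINING = [
    ("Přihláška pohledávky věřitele do insolvenčního řízení", "claim"),
    ("Věřitel přihlašuje pohledávku z úvěrové smlouvy", "claim"),
    ("Usnesení soudu o zjištění úpadku dlužníka", "ruling"),
    ("Soud rozhodl usnesením o úpadku a povolil oddlužení", "ruling"),
]

if __name__ == "__main__":
    centroids = train_centroids(TRAINING)
    print(classify("Přihláška pohledávky z úvěru", centroids))  # claim
```

Nearest-centroid is used here only because it needs no external libraries; for noisy OCR output and many document types, a stronger classifier such as a linear SVM is a common alternative.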

The first author was partially supported by the Czech Science Foundation under Grant No. 15-04960S. The second author was partially supported by Charles University, project GA UK No. 120616.

Author information

Authors and Affiliations

Department of Theoretical Computer Science and Mathematical Logic, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
Iveta Mrázová & Peter Zvirinský

Corresponding author

Correspondence to Iveta Mrázová.

Editor information

Editors and Affiliations

Częstochowa University of Technology, Częstochowa, Poland
Leszek Rutkowski, Marcin Korytkowski & Rafał Scherer
AGH University of Science and Technology, Kraków, Poland
Ryszard Tadeusiewicz
University of California, Berkeley, California, USA
Lotfi A. Zadeh
University of Louisville, Louisville, Kentucky, USA
Jacek M. Zurada

About this paper

Cite this paper

Mrázová, I., Zvirinský, P. (2017). Extraction and Interpretation of Textual Data from Czech Insolvency Proceedings. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2017. Lecture Notes in Computer Science, vol 10246. Springer, Cham. https://doi.org/10.1007/978-3-319-59060-8_12

DOI: https://doi.org/10.1007/978-3-319-59060-8_12
Published: 24 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59059-2
Online ISBN: 978-3-319-59060-8
eBook Packages: Computer Science, Computer Science (R0)
