skip to main content
10.1145/956750.956789acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
Article

Efficient elastic burst detection in data streams

Published:24 August 2003Publication History

ABSTRACT

Burst detection is the activity of finding abnormal aggregates in data streams. Such aggregates are based on sliding windows over data streams. In some applications, we want to monitor many sliding window sizes simultaneously and to report those windows with aggregates significantly different from other periods. We will present a general data structure for detecting interesting aggregates over such elastic windows in near linear time. We present applications of the algorithm for detecting Gamma Ray Bursts in large-scale astrophysical data. Detection of periods with high volumes of trading activities and high stock price volatility is also demonstrated using real time Trade and Quote (TAQ) data from the New York Stock Exchange (NYSE). Our algorithm beats the direct computation approach by several orders of magnitude.

References

  1. http://www.lanl.gov/milagro/,2003.Google ScholarGoogle Scholar
  2. R. Agrawal, C. Faloutsos, and A. N. Swami. Efficient Similarity Search In Sequence Databases. In D. Lomet, editor, Proceedings of the 4th International Conference of Foundations of Data Organization and Algorithms (FODO), pages 69--84, Chicago, Illinois, 1993. Springer Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In Proceedings of the Twenty-first ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 3--5, Madison, Wisconsin, USA, pages 55--68. ACM, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, and S. B. Zdonik. Monitoring streams - a new class of data management applications. In VLDB 2002, Proceedings of 28th International Conference on Very Large Data Bases, August 20--23, 2002, Hong Kong, China, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. K. Chakrabarti, M. N. Garofalakis, R. Rastogi, and K. Shim. Approximate query processing using wavelets. In VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, September 10--14, 2000, Cairo, Egypt, pages 111--122, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. K.-P. Chan and A. W.-C. Fu. Efficient time series matching by wavelets. In Proceedings of the 15th International Conference on Data Engineering, Sydney, Australia, pages 126--133, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. In Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 6--8, 2002, San Francisco, CA, USA. ACM/SIAM, 2002, pages 635--644, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. R. Atkins et. al. (The Milagro Collaboration). Evidence for TeV emission from GRB 970417a. In Ap. J. Lett. 533, L119, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  9. A. J. Smith for the Milagro Collaboration. A search for bursts of tev gamma rays with milagro. In Proceedings of the 27th International Cosmic Ray Conference (ICRC 2001), 07--15 August 2001, Hamburg, Germany, 2001.Google ScholarGoogle Scholar
  10. V. Ganti, J. Gehrke, and R. Ramakrishnan. Demon: Data evolution and monitoring. In Proceedings of the 16th International Conference on Data Engineering, San Diego, California, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  11. J. Gehrke, F. Korn, and D. Srivastava. On computing correlated aggregates over continual data streams. In Proc. ACM SIGMOD International Conf. on Management of Data, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss. Surfing wavelets on streams: One-pass summaries for approximate aggregate queries. In VLDB 2001, pages 79--88. Morgan Kaufmann, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pages 97--106. ACM Press, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. H. V. Jagadish, N. Koudas, and S. Muthukrishnan. Mining deviants in a time series database. In M. P. Atkinson, M. E. Orlowska, P. Valduriez, S. B. Zdonik, and M. L. Brodie, editors, VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7--10, 1999, Edinburgh, Scotland, UK, pages 102--113. Morgan Kaufmann, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. E. Keogh, S. Lonardi, and B. Y. Chiu. Finding surprising patterns in a time series database in linear time and space. In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23--26, 2002. Edmonton, Alberta, Canada, pages 550--556, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Kleinberg. Bursty and hierarchical structure in streams. In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23--26, 2002. Edmonton, Alberta, Canada, pages 91--101, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Y. Matias, J. S. Vitter, and M. Wang. Wavelet-based histograms for selectivity estimation. In L. M. Haas and A. Tiwary, editors, SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2--4, 1998, Seattle, Washington, USA, pages 448--459, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. H. Samet. The quadtree and related hierarchical data structures. ACM Computing Surveys, 16(2):187--260, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. C. Shahabi, X. Tian, and W. Zhao. Tsa-tree: A wavelet-based approach to improve the efficiency of multi-level surprise and trend queries on time-series data. In 12th International Conference on Scientific and Statistical Database Management (SSDBM'00), July 26--28, 2000, Berlin, Germany, pages 55--68, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. J. S. Vitter and M. Wang. Approximate computation of multidimensional aggregates of sparse data using wavelets. In SIGMOD 1999, Proceedings ACM SIGMOD International Conference on Management of Data, June 1--3, 1999, Philadephia, Pennsylvania, USA, pages 193--204, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. M. Wang, T. M. Madhyastha, N. H. Chan, S. Papadimitriou, and C. Faloutsos. Data mining meets performance evaluation: Fast algorithms for modeling bursty traffic. In ICDE 2002, 18th International Conference on Data Engineering, February 26-March 1, 2002, San Jose, California, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Y. Zhu and D. Shasha. Statstream: Statistical monitoring of thousands of data streams in real time. In VLDB 2002, Proceedings of 28th International Conference on Very Large Data Bases, August 20--23, 2002, Hong Kong, China, pages 358--369, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Efficient elastic burst detection in data streams

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2003
      736 pages
      ISBN:1581137370
      DOI:10.1145/956750

      Copyright © 2003 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 24 August 2003

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      KDD '03 Paper Acceptance Rate46of298submissions,15%Overall Acceptance Rate1,133of8,635submissions,13%

      Upcoming Conference

      KDD '24

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader