ABSTRACT
Burst detection is the activity of finding abnormal aggregates in data streams. Such aggregates are based on sliding windows over data streams. In some applications, we want to monitor many sliding window sizes simultaneously and to report those windows with aggregates significantly different from other periods. We will present a general data structure for detecting interesting aggregates over such elastic windows in near linear time. We present applications of the algorithm for detecting Gamma Ray Bursts in large-scale astrophysical data. Detection of periods with high volumes of trading activities and high stock price volatility is also demonstrated using real time Trade and Quote (TAQ) data from the New York Stock Exchange (NYSE). Our algorithm beats the direct computation approach by several orders of magnitude.
- http://www.lanl.gov/milagro/,2003.Google Scholar
- R. Agrawal, C. Faloutsos, and A. N. Swami. Efficient Similarity Search In Sequence Databases. In D. Lomet, editor, Proceedings of the 4th International Conference of Foundations of Data Organization and Algorithms (FODO), pages 69--84, Chicago, Illinois, 1993. Springer Verlag. Google ScholarDigital Library
- B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In Proceedings of the Twenty-first ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 3--5, Madison, Wisconsin, USA, pages 55--68. ACM, 2002. Google ScholarDigital Library
- D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, and S. B. Zdonik. Monitoring streams - a new class of data management applications. In VLDB 2002, Proceedings of 28th International Conference on Very Large Data Bases, August 20--23, 2002, Hong Kong, China, 2002. Google ScholarDigital Library
- K. Chakrabarti, M. N. Garofalakis, R. Rastogi, and K. Shim. Approximate query processing using wavelets. In VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, September 10--14, 2000, Cairo, Egypt, pages 111--122, 2000. Google ScholarDigital Library
- K.-P. Chan and A. W.-C. Fu. Efficient time series matching by wavelets. In Proceedings of the 15th International Conference on Data Engineering, Sydney, Australia, pages 126--133, 1999. Google ScholarDigital Library
- M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. In Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 6--8, 2002, San Francisco, CA, USA. ACM/SIAM, 2002, pages 635--644, 2002. Google ScholarDigital Library
- R. Atkins et. al. (The Milagro Collaboration). Evidence for TeV emission from GRB 970417a. In Ap. J. Lett. 533, L119, 2000.Google ScholarCross Ref
- A. J. Smith for the Milagro Collaboration. A search for bursts of tev gamma rays with milagro. In Proceedings of the 27th International Cosmic Ray Conference (ICRC 2001), 07--15 August 2001, Hamburg, Germany, 2001.Google Scholar
- V. Ganti, J. Gehrke, and R. Ramakrishnan. Demon: Data evolution and monitoring. In Proceedings of the 16th International Conference on Data Engineering, San Diego, California, 2000.Google ScholarCross Ref
- J. Gehrke, F. Korn, and D. Srivastava. On computing correlated aggregates over continual data streams. In Proc. ACM SIGMOD International Conf. on Management of Data, 2001. Google ScholarDigital Library
- A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss. Surfing wavelets on streams: One-pass summaries for approximate aggregate queries. In VLDB 2001, pages 79--88. Morgan Kaufmann, 2001. Google ScholarDigital Library
- G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pages 97--106. ACM Press, 2001. Google ScholarDigital Library
- H. V. Jagadish, N. Koudas, and S. Muthukrishnan. Mining deviants in a time series database. In M. P. Atkinson, M. E. Orlowska, P. Valduriez, S. B. Zdonik, and M. L. Brodie, editors, VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7--10, 1999, Edinburgh, Scotland, UK, pages 102--113. Morgan Kaufmann, 1999. Google ScholarDigital Library
- E. Keogh, S. Lonardi, and B. Y. Chiu. Finding surprising patterns in a time series database in linear time and space. In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23--26, 2002. Edmonton, Alberta, Canada, pages 550--556, 2002. Google ScholarDigital Library
- J. Kleinberg. Bursty and hierarchical structure in streams. In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23--26, 2002. Edmonton, Alberta, Canada, pages 91--101, 2002. Google ScholarDigital Library
- Y. Matias, J. S. Vitter, and M. Wang. Wavelet-based histograms for selectivity estimation. In L. M. Haas and A. Tiwary, editors, SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2--4, 1998, Seattle, Washington, USA, pages 448--459, 1998. Google ScholarDigital Library
- H. Samet. The quadtree and related hierarchical data structures. ACM Computing Surveys, 16(2):187--260, 1984. Google ScholarDigital Library
- C. Shahabi, X. Tian, and W. Zhao. Tsa-tree: A wavelet-based approach to improve the efficiency of multi-level surprise and trend queries on time-series data. In 12th International Conference on Scientific and Statistical Database Management (SSDBM'00), July 26--28, 2000, Berlin, Germany, pages 55--68, 2000. Google ScholarDigital Library
- J. S. Vitter and M. Wang. Approximate computation of multidimensional aggregates of sparse data using wavelets. In SIGMOD 1999, Proceedings ACM SIGMOD International Conference on Management of Data, June 1--3, 1999, Philadephia, Pennsylvania, USA, pages 193--204, 1999. Google ScholarDigital Library
- M. Wang, T. M. Madhyastha, N. H. Chan, S. Papadimitriou, and C. Faloutsos. Data mining meets performance evaluation: Fast algorithms for modeling bursty traffic. In ICDE 2002, 18th International Conference on Data Engineering, February 26-March 1, 2002, San Jose, California, 2002. Google ScholarDigital Library
- Y. Zhu and D. Shasha. Statstream: Statistical monitoring of thousands of data streams in real time. In VLDB 2002, Proceedings of 28th International Conference on Very Large Data Bases, August 20--23, 2002, Hong Kong, China, pages 358--369, 2002. Google ScholarDigital Library
Index Terms
- Efficient elastic burst detection in data streams
Recommendations
EclatDS: An efficient sliding window based frequent pattern mining method for data streams
Mining frequent patterns over data streams is an interesting problem due to its wide application area. The researchers in this field have been facing two key challenges, namely reduction in runtime and memory usage. In this study, a novel method for ...
Detecting Volatility Shift in Data Streams
ICDM '14: Proceedings of the 2014 IEEE International Conference on Data MiningCurrent drift detection techniques detect a change in distribution within a stream. However, there are no current techniques that analyze the change in the rate of these detected changes. We coin the term stream volatility, to describe the rate of ...
Improved approximate detection of duplicates for data streams over sliding windows
Detecting duplicates in data streams is an important problem that has a wide range of applications. In general, precisely detecting duplicates in an unbounded data stream is not feasible in most streaming scenarios, and, on the other hand, the elements ...
Comments