Abstract
We describe three generations of information integration systems developed at George Mason University. All three systems adopt a virtual database design: a global integration schema, a mapping between this schema and the schemas of the participating information sources, and automatic interpretation of global queries. Multiplex focuses on the rapid integration of very large, evolving, and heterogeneous collections of information sources. Fusionplex strengthens these capabilities with powerful tools for resolving data inconsistencies. Finally, Autoplex takes a more proactive approach to integration by "recruiting" contributions to the global integration schema from available information sources. Using machine learning techniques, it confronts a major cost of integration: mapping new sources into the global schema.
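The three ingredients of the virtual database design named above (a global schema, per-source mappings, and automatic interpretation of global queries) can be illustrated with a small sketch. This is only a toy illustration under stated assumptions, not the method used by Multiplex, Fusionplex, or Autoplex: the class names `Source` and `VirtualDatabase`, the attribute names, and the sample rows are all hypothetical, mappings are simple attribute renamings, and answers from different sources are merely concatenated, with no inconsistency resolution or learned mappings.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]


class Source:
    """Hypothetical participating source: a name plus rows in its own schema."""

    def __init__(self, name: str, rows: List[Row]) -> None:
        self.name = name
        self.rows = rows


class VirtualDatabase:
    """Toy virtual database: stores no data, only a global schema and mappings."""

    def __init__(self, global_schema: List[str]) -> None:
        self.global_schema = global_schema
        # Each entry pairs a source with {global attribute: source attribute}.
        self.mappings: List[Tuple[Source, Dict[str, str]]] = []

    def add_source(self, source: Source, mapping: Dict[str, str]) -> None:
        # A mapping may only refer to attributes of the global schema.
        unknown = set(mapping) - set(self.global_schema)
        if unknown:
            raise ValueError(f"not in global schema: {sorted(unknown)}")
        self.mappings.append((source, mapping))

    def query(
        self,
        attrs: List[str],
        predicate: Optional[Callable[[Row], bool]] = None,
    ) -> List[Row]:
        """Interpret a global selection/projection against every mapped source."""
        results: List[Row] = []
        for source, mapping in self.mappings:
            for row in source.rows:
                # Rewrite the source row in global terms; attributes the
                # source does not supply are left as None.
                global_row = {
                    g: (row.get(mapping[g]) if g in mapping else None)
                    for g in self.global_schema
                }
                if predicate is None or predicate(global_row):
                    results.append({a: global_row[a] for a in attrs})
        return results


# Hypothetical sources whose schemas differ from the global one.
hr = Source("hr", [{"emp_name": "Ada", "division": "R&D"},
                   {"emp_name": "Bob", "division": "Sales"}])
payroll = Source("payroll", [{"full_name": "Ada", "pay": 120}])

vdb = VirtualDatabase(["name", "dept", "salary"])
vdb.add_source(hr, {"name": "emp_name", "dept": "division"})
vdb.add_source(payroll, {"name": "full_name", "salary": "pay"})

answer = vdb.query(["name", "dept"], lambda r: r["name"] == "Ada")
```

In this sketch the query is phrased purely in global terms, and each source contributes whatever it can. The second answer row carries no department because the hypothetical payroll source does not supply one; reconciling such partial or conflicting answers is the kind of problem the abstract attributes to Fusionplex.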