[BSS06a] Scalability of source identification in data integration systems

International peer-reviewed conference: ACM/IEEE SITIS Conference, January 2006, Vol. 4879, pp. 270-279 (DOI: 10.1007/978-3-642-01350-8_25)
Abstract: Given a large number of data sources, each indexed by attributes from a predefined set A, and a query q over a subset Q of A containing k attributes, we are interested in identifying all combinations of sources whose attributes together cover Q. Each combination c may lead to a rewriting of q as a join over the sources in c. Furthermore, to limit redundancy and combinatorial explosion, we require each combination of sources to be a minimal cover of Q, that is, a cover from which no source can be removed. Although this work is motivated by query rewriting in OpenXView, an XML data integration system with a large number of XML sources, we believe that the solutions provided in this paper apply to other scalable data integration schemes. We focus on the case where the number of sources is very large while the size of queries is small. We propose a novel algorithm for computing the set of minimal covers of a query and experimentally evaluate its performance.
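
To make the notion of a minimal cover concrete, here is a minimal brute-force sketch in Python. It is not the algorithm proposed in the paper, only an illustrative baseline under stated assumptions: sources are modelled as a dictionary mapping a source name to its set of attributes, and the function name `minimal_covers` and the sample data are hypothetical.

```python
from itertools import combinations


def minimal_covers(sources, query):
    """Enumerate every minimal cover of `query` by exhaustive search.

    sources: dict mapping a source name to its set of attributes (assumed model).
    query:   iterable of attributes, i.e. the set Q of the abstract.
    """
    query = frozenset(query)
    # Restrict each source to the attributes of Q and drop irrelevant sources.
    relevant = {name: frozenset(attrs) & query for name, attrs in sources.items()}
    relevant = {name: attrs for name, attrs in relevant.items() if attrs}
    names = sorted(relevant)

    covers = []
    # A minimal cover has at most k = |Q| sources, because each of its
    # sources must supply an attribute that no other source in it supplies.
    for size in range(1, len(query) + 1):
        for combo in combinations(names, size):
            covered = frozenset().union(*(relevant[n] for n in combo))
            if covered != query:
                continue
            # Sizes are visited in increasing order, so any smaller cover
            # contained in `combo` has already been recorded.
            if any(set(found) <= set(combo) for found in covers):
                continue
            covers.append(combo)
    return covers


# Hypothetical data: five sources, query over Q = {a, b, c}.
example_sources = {
    "s1": {"a", "b"},
    "s2": {"b", "c"},
    "s3": {"c"},
    "s4": {"a", "b", "c"},
    "s5": {"d"},
}
example_result = minimal_covers(example_sources, {"a", "b", "c"})
print(example_result)
```

With n relevant sources this baseline examines on the order of n^k combinations, which illustrates why a dedicated algorithm is needed when the number of sources is very large, even if k stays small.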

Collaboration: ETIS


@inproceedings {
title="{Scalability of source identification in data integration systems}",
author="F. Boisson and M. Scholl and I. Sebei and D. Vodislav",
booktitle="{ACM/IEEE SITIS Conference}",