Provenance is metadata about the where, the why, and the how of data. It is evidence that can answer questions such as: Where exactly did this piece of data come from? Why is this row in my result? How was it produced? Answers to these questions are useful for judging the trustworthiness of data, and for finding and correcting mistakes. Most programs that use a database at all already use one crude form of provenance: they manually propagate row identifiers together with database values, in case the values need to be updated later. More sophisticated forms of provenance are exceedingly rare, because they are more difficult to implement manually. Tools to calculate data provenance systematically exist only as research prototypes. Even standard database systems are hard to set up, as evidenced by the rise of hosted database services, so it is little surprise that prototypes of provenance systems are not used much.

This dissertation shows how a programming language can provide support for provenance. Based on language-integrated query technology, it can systematically rewrite queries to produce various forms of provenance. We describe such query transformations for where-provenance and lineage, and discuss how to enable programmers to define their own forms of provenance. Thanks to query normalization, the resulting queries still execute efficiently on mainstream database systems. A programming language can help further by giving provenance metadata precise types to ensure that it is handled appropriately.

Language-integrated queries make it easy to write programs that deal with data: no special query language needed. Language-integrated provenance makes it just as easy to deal with data provenance: no special database needed.
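To make the two forms of provenance named above concrete, the following is a minimal sketch in Python. It is not the dissertation's query rewriting, which targets language-integrated queries run by a database. It only illustrates, over in-memory rows, what the provenance of a result might look like: where-provenance as a (table, column, row id) triple attached to each value, and lineage as the set of source rows that contributed to a result row. The names `Prov`, `annotate`, and `lineage`, the `id` column, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Prov:
    """A value paired with its where-provenance: (table, column, row id)."""
    value: Any
    table: str
    column: str
    row_id: int


def annotate(table_name, rows):
    """Wrap every field of every row with its source location.

    Assumes each row has an integer 'id' key (hypothetical schema).
    """
    return [
        {col: Prov(val, table_name, col, row["id"]) for col, val in row.items()}
        for row in rows
    ]


def lineage(*annotated_rows):
    """Lineage of a result: the set of (table, row id) pairs it was built from."""
    return {(cell.table, cell.row_id)
            for row in annotated_rows
            for cell in row.values()}


# Hypothetical example data.
agencies = annotate("agencies", [
    {"id": 1, "name": "EdinTours", "phone": "412 1200"},
    {"id": 2, "name": "Burns's", "phone": "607 3000"},
])
tours = annotate("tours", [
    {"id": 10, "agency": "EdinTours", "destination": "Edinburgh"},
    {"id": 11, "agency": "Burns's", "destination": "Islay"},
])

# A join: phone numbers of agencies that offer a tour to Islay.
# Each result keeps the annotated phone value (where-provenance)
# and records which source rows produced it (lineage).
results = [
    {"phone": a["phone"], "lineage": lineage(a, t)}
    for a in agencies
    for t in tours
    if a["name"].value == t["agency"].value
    and t["destination"].value == "Islay"
]
```

With this data, `results[0]["phone"]` carries both the phone number and the location it was copied from, which is the information a program would need to update that value in place. `results[0]["lineage"]` names the one agency row and the one tour row that justify the result's presence.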