
Field | Value | Language
dc.contributor.advisor | Osborne, Miles |
dc.contributor.author | Becker, Markus |
dc.date.accessioned | 2008-05-22T13:27:57Z |
dc.date.available | 2008-05-22T13:27:57Z |
dc.date.issued | 2008-06-24 |
dc.identifier.uri | http://hdl.handle.net/1842/2219 |
dc.description | Institute for Communicating and Collaborative Systems |
dc.description.abstract | Active learning reduces annotation costs for supervised learning by concentrating labelling efforts on the most informative data. Most active learning methods assume that the model structure is fixed in advance and focus upon improving parameters within that structure. However, this is not appropriate for natural language processing, where the model structure and associated parameters are determined using labelled data. Applying traditional active learning methods to natural language processing can fail to produce the expected reductions in annotation cost. We show that one of the reasons for this problem is that active learning can only select examples which are already covered by the model. In this thesis, we better tailor active learning to the needs of natural language processing as follows. We formulate the Unreliable Parameter Principle: active learning should explicitly and additionally address unreliably trained model parameters in order to optimally reduce classification error. To do so, we should target both missing events and infrequent events. We demonstrate the effectiveness of such an approach for a range of natural language processing tasks: prepositional phrase attachment, sequence labelling, and syntactic parsing. For prepositional phrase attachment, the explicit selection of unknown prepositions significantly improves coverage and classification performance for all examined active learning methods. For sequence labelling, we introduce a novel active learning method which explicitly targets unreliable parameters by selecting sentences with many unknown words and a large number of unobserved transition probabilities. For parsing, targeting unparseable sentences significantly improves coverage and f-measure in active learning. | en
dc.contributor.sponsor | Engineering and Physical Sciences Research Council (EPSRC) | en
dc.format.extent | 926993 bytes |
dc.format.mimetype | application/pdf |
dc.language.iso | en | en
dc.relation.hasversion | Markus Becker, Miles Osborne: A two-stage method for active learning of statistical grammars. IJCAI 2005 | en
dc.subject | Informatics | en
dc.subject | Computer Science | en
dc.subject | selective sampling | en
dc.subject | machine learning | en
dc.subject | natural language processing | en
dc.subject | active learning | en
dc.title | Active Learning - An Explicit Treatment of Unreliable Parameters | en
dc.type | Thesis or Dissertation | en
dc.type.qualificationlevel | Doctoral | en
dc.type.qualificationname | PhD Doctor of Philosophy | en
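The abstract's sequence-labelling criterion (prefer sentences with many unknown words and many unobserved transitions) can be illustrated with a minimal Python sketch. This is not the thesis's actual method. The additive score, the weight `alpha`, and the use of tags predicted by some current tagger as a stand-in for the unknown gold tags of unlabelled sentences are all assumptions. The function names `training_statistics`, `unreliability_score` and `select_batch` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

TaggedSentence = List[Tuple[str, str]]  # [(word, tag), ...]
Transition = Tuple[str, str]            # (previous tag, current tag)


def training_statistics(tagged: Sequence[TaggedSentence]) -> Tuple[Set[str], Set[Transition]]:
    """Collect the word vocabulary and the tag-to-tag transitions seen in labelled data."""
    vocab: Set[str] = set()
    transitions: Set[Transition] = set()
    for sentence in tagged:
        tags = ["<s>"] + [tag for _, tag in sentence]  # "<s>" marks sentence start
        vocab.update(word for word, _ in sentence)
        transitions.update(zip(tags, tags[1:]))
    return vocab, transitions


def unreliability_score(words: Sequence[str], predicted_tags: Sequence[str],
                        vocab: Set[str], transitions: Set[Transition],
                        alpha: float = 1.0) -> float:
    """Hypothetical score: unknown words plus alpha times unobserved transitions.

    ASSUMPTION: predicted_tags come from the current model, because gold tags
    of unlabelled sentences are not available.
    """
    unknown_words = sum(1 for w in words if w not in vocab)
    tags = ["<s>"] + list(predicted_tags)
    unseen_transitions = sum(1 for t in zip(tags, tags[1:]) if t not in transitions)
    return unknown_words + alpha * unseen_transitions


def select_batch(pool: Sequence[Tuple[List[str], List[str]]],
                 vocab: Set[str], transitions: Set[Transition], k: int) -> List[int]:
    """Return indices of the k pool sentences with the highest score."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: unreliability_score(pool[i][0], pool[i][1], vocab, transitions),
        reverse=True,
    )
    return ranked[:k]
```

For example, after training on the single sentence "the/DT dog/NN barks/VBZ", the pool sentence "dogs bark loudly" with predicted tags NNS VBP RB scores 6 (three unknown words, three unseen transitions). It is therefore ranked above "a cat sleeps" tagged DT NN VBZ, which scores 3 (three unknown words, no unseen transitions). Summing the two counts is only one possible way of combining them.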

