


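Centering Theory is a theory of discourse coherence and salience. Since it was proposed, it has been widely applied to anaphora resolution, discourse coherence analysis, and related areas. However, so that the theory could account for phenomena across different languages, its proponents did not strictly define its core concepts, such as the utterance, the ranking of forward-looking centers (C_f), realization, and R1-pronouns. A concrete anaphora resolution algorithm therefore requires a parametric analysis of each of these concepts in order to find the best parameter setting for the specific resolution task.

Building on earlier work on the parameterization of Centering Theory, this dissertation surveys centering-based anaphora resolution algorithms and presents an empirical study of parameter setting for Centering Theory in Chinese anaphora resolution.

The corpus collected for this study contains more than 30,000 characters from three discourse genres and includes 5,148 noun phrases. The noun phrases were first annotated with their grammatical functions and with features such as number and gender; a program was then written to store the noun phrase information automatically in an Access database. To meet the research objectives, six centering-based anaphora resolution algorithms were designed, each implementing a different parameter setting. Using the noun phrase information in the database, these algorithms resolve the pronouns and zero pronouns in the corpus.

The centering parameters examined are the definition of utterance, the ranking of forward-looking centers, and the selection of R1-pronouns.

Two definitions of utterance were examined. The first defines an utterance as a punctuation-delimited segment containing at least one predication structure; the second defines an utterance as a sentence. These two segmentation methods are referred to as Udef.1 and Udef.2, respectively. The results show that the accuracy of zero pronoun resolution is far higher under Udef.1 than under Udef.2. The choice between the two segmentations affects the accuracy of pronoun resolution less markedly than it affects zero pronoun resolution.

The factors examined for the ranking of forward-looking centers are the order in which discourse entities occur, their grammatical roles, grammatical role parallelism, continuity of the backward-looking center (C_b), and the syntactic level at which a discourse entity occurs. The resolution results show that the grammatical role of a discourse entity reflects its salience more accurately than its linear order of occurrence. Introducing grammatical role parallelism into anaphora resolution has a positive effect on both pronoun and zero pronoun resolution, and the effect is more pronounced when utterances are segmented according to Udef.2. Introducing a preference for C_b continuity into the resolution algorithms did not improve the results, and under some parameter settings it had a negative effect. This suggests that the "entity coherence" proposed in Centering Theory is not the only means by which discourse achieves coherence.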
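The contrast between ranking forward-looking centers by linear order and by grammatical role can be illustrated with a minimal sketch. It is not the dissertation's implementation: the `Entity` fields, the role hierarchy in `ROLE_RANK` (subject before object before other), the number-agreement filter, and the toy utterance are all assumptions introduced for illustration only.

```python
from dataclasses import dataclass

# Hypothetical role hierarchy (an assumption, not taken from the dissertation).
ROLE_RANK = {"subject": 0, "object": 1, "other": 2}


@dataclass(frozen=True)
class Entity:
    name: str      # surface form of the noun phrase
    position: int  # linear position within its utterance
    role: str      # "subject", "object", or "other"
    number: str    # "sg" or "pl"


def rank_linear(entities):
    """Order the forward-looking centers by order of occurrence."""
    return sorted(entities, key=lambda e: e.position)


def rank_grammatical(entities):
    """Order the forward-looking centers by grammatical role, then by position."""
    return sorted(
        entities,
        key=lambda e: (ROLE_RANK.get(e.role, len(ROLE_RANK)), e.position),
    )


def resolve(pronoun_number, previous_utterance, rank):
    """Return the highest-ranked entity that agrees in number, or None."""
    for candidate in rank(previous_utterance):
        if candidate.number == pronoun_number:
            return candidate
    return None


# Toy previous utterance with a fronted object (hypothetical data).
previous = [
    Entity("the letter", 0, "object", "sg"),
    Entity("Zhang San", 1, "subject", "sg"),
]

print(resolve("sg", previous, rank_linear).name)       # the letter
print(resolve("sg", previous, rank_grammatical).name)  # Zhang San
```

In this toy case the two rankings pick different antecedents for the same singular pronoun, which is the kind of difference the parameter comparison is designed to measure.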


  • Acknowledgements
  • Abstract
  • Abstract (in Chinese)
  • Contents
  • List of Tables
  • List of Figures
  • Chapter 1 Introduction
  • 1.1 The Subject Matter of the Dissertation
  • 1.1.1 Parameter Setting in Centering Theory
  • 1.1.2 Anaphora Resolution and NLP
  • 1.2 Research Hypothesis and Objectives
  • 1.3 Corpus and Research Methods
  • 1.4 Organization of the Dissertation
  • Chapter 2 Centering Theory and its Parameter Setting
  • 2.1 Introduction of Centering Theory
  • 2.1.1 Goals and Methods
  • 2.1.2 Claims and Rules
  • 2.1.3 Examples
  • 2.2 Parametric Configuration of Centering Theory
  • 2.2.1 Main Claims
  • 2.2.2 The Parameters of Centering
  • Utterance and Previous Utterance
  • Realization
  • Ranking
  • R1-Pronouns
  • Segmentation
  • 2.2.3 A Corpus-based Comparison of Centering's Instantiations
  • 2.2.4 Main Results
  • The Vanilla Instantiation
  • Varying Parameter Setting
  • 2.2.5 Discussion
  • Parameter Setting
  • Claims of Centering Revisited
  • Chapter 3 Centering-based Anaphora Resolution Algorithms
  • 3.1 Introduction
  • 3.2 Centering-based Anaphora Resolutions in English
  • 3.2.1 The BFP Algorithm
  • 3.2.2 S-list
  • 3.2.3 Left-right Centering Algorithm
  • 3.2.4 Optimization Theory in Centering Theory
  • 3.3 Centering-based Anaphora Resolution Algorithms in Chinese
  • 3.3.1 Yeh and Chen's Centering-based Resolution Algorithm
  • 3.3.2 Wang's Centering-based Zero Anaphora Resolution
  • 3.4 Parameter Setting and Centering Claims in the Algorithms
  • 3.4.1 Parameter Setting of the Algorithms
  • 3.4.2 Basic Claims in the Algorithms
  • Chapter 4 Centering and its Parametric Instantiations in This Research
  • 4.1 Centering Theory Revisited
  • 4.2 Claims of Centering and their Roles
  • 4.3 Centering Update Units and Definition of Utterance
  • 4.3.1 Kameyama (1998): Clauses as Utterances
  • 4.3.2 The RAFT/RAPT's Extending for Complex Sentences
  • 4.3.3 Utterance Definitions Examined
  • 4.4 Ranking
  • 4.4.1 Salience Theories: A Brief Survey of Literature
  • Prince's Definition of Topic and Topicalization
  • Chafe's Definition of Subject and Topic
  • Givon: Topicality as a Continuum
  • Gundel et al. and Givenness Hierarchy
  • Ariel and Accessibility Hierarchy
  • 4.4.2 Ranking Examined
  • Linear Order vs. Grammatical Function
  • Cb Continuity vs. Parallelism
  • Main/subordinate Consideration in Ranking
  • 4.5 R1-pronouns
  • 4.6 Other Parameters
  • 4.6.1 Realization
  • 4.6.2 Segmentation
  • Chapter 5 Corpus Annotation and Database Generation
  • 5.1 Introduction to Corpus Annotation and Database Establishment
  • 5.2 Annotation
  • 5.2.1 NP Types
  • 5.2.2 Grammatical Roles of Nominal Constituents
  • 5.2.3 Hierarchical Structure Information and Clause Segmentation
  • 5.2.4 Morphological Features of NPs
  • 5.2.5 Reference Types
  • 5.3 Database Structure and Generation
  • 5.4 General Information about the Corpus
  • Chapter 6 Algorithms Developed in This Research
  • 6.1 An Overview of Anaphora Resolutions in NLP
  • 6.2 General Introduction of Algorithms Designed in This Research
  • 6.2.1 Inter-algorithms Relations
  • 6.2.2 The Structure of the Algorithms
  • Eliminating Principles
  • Preference Principles
  • 6.2.3 Inner Structure of the Algorithms
  • 6.3 Algorithms Designed in This Research
  • 6.3.1 Algorithm 1: Lin
  • 6.3.2 Algorithm 2: Grm
  • 6.3.3 Algorithm 3: Para
  • 6.3.4 Algorithm 4: Cb
  • 6.3.5 Algorithm 5: Para+Cb
  • 6.3.6 Algorithm 6: Sub
  • Chapter 7 Data Analysis and Discussion
  • 7.1 General Introduction of Resolution Results
  • 7.2 Effects of Different Ranking-affecting Factors
  • 7.2.1 Linear Order as a Ranking Criterion
  • 7.2.2 Grammatical Functions in Cfs Ranking
  • 7.2.3 Parallelism of Grammatical Functions in Cfs Ranking
  • 7.2.4 Cb Continuity in Cfs Ranking
  • 7.2.5 Main/subordinate Distinction in Cfs Ranking
  • 7.2.6 Analysis of Resolution Results in Terms of Precision Rate
  • 7.2.7 Analysis of Resolution Results in Terms of Significance
  • 7.3 Data Analysis in Terms of the Utterance Parameter
  • 7.4 Data Analysis in Terms of R1-pronouns
  • 7.5 Data Analysis in Terms of Referring Distance
  • Chapter 8 Conclusion
  • 8.1 Major Findings of This Study
  • 8.2 Theoretical Implications of the Studies
  • 8.3 Limitations of the Study
  • 8.4 Suggestions for Further Research
  • References
  • Appendix Ⅰ. Annotated Corpus (excerpt)
  • Appendix Ⅱ. Interface for Database Information Input
  • Appendix Ⅲ. Program of Resolution Algorithms (excerpt)
  • Appendix Ⅳ. Resolution Results under Udef.1
  • Appendix Ⅴ. Resolution Results under Udef.2
