Abstract
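Current Web search engines pay too little attention to analyzing queries posed in natural language. Typically, after word segmentation, every resulting token is passed to the retrieval system as a retrieval unit, without distinction. Because the front-end language analysis fails to capture the user's information need accurately, the elaborate back-end processing becomes water without a source, a tree without roots. This thesis focuses on analyzing users' natural-language query sentences (hereafter "Query"), a distinctive linguistic phenomenon that expresses a user's specific information need, in order to give the retrieval system a sound front end.

The work covers the following four aspects:

(1) Distinguishing information content words from stop words in a Query. Filtering out stop words effectively removes the interference that words which need not appear in the target text cause to the Query's information content words. A Query is itself a kind of controlled language: it expresses the user's information need, and its format is relatively constrained. We therefore further distinguish general stop words from query-specific stop words, describe their different distributional characteristics, and propose a method for constructing stop word lists based on left and right entropy and Kullback-Leibler distance, together with a probabilistic dynamic identification method that uses N-grams and position information. Experimental results show that this approach improves considerably on tagging based solely on a static stop word list. This part is based on a corpus analysis of 200,000 user question sentences.

(2) Proposing the principle of salience for Query topic words, and using it as a means of making retrieval more targeted. Users express a given information need in many different ways. This part mainly determines which information content words are the central topic to be made salient and which are content the user does not want to see in the results, so as to prevent false retrievals. According to whether a retrieval concept appears in the target text, four cases are distinguished and handled separately: must appear, must not appear, may appear, and may be absent. This part is based on the SGML corpora of the TREC and 863 IR test question sets; after analyzing the forms of expression in the corpus, regular expression matching is used to divide each Query into topic functional blocks, thereby making the topic salient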
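The abstract names left and right entropy and Kullback-Leibler distance as the basis for building the stop word list, but does not give the formulas or how the two are combined. The Python sketch below is only one plausible reading: it computes the entropy of each token's left or right neighbor distribution over a set of segmented queries, and each token's pointwise contribution to the KL divergence between a query corpus and a background corpus. The function names, the `<s>`/`</s>` boundary markers, and the additive smoothing are assumptions made for illustration, not details taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def neighbor_entropy(queries, side="left"):
    """Entropy (in bits) of each token's left or right neighbor distribution.

    `queries` is a list of token lists. Boundary markers <s> and </s> are an
    assumption of this sketch so that first and last tokens have a neighbor.
    """
    neighbors = defaultdict(Counter)
    for tokens in queries:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for i in range(1, len(padded) - 1):
            context = padded[i - 1] if side == "left" else padded[i + 1]
            neighbors[padded[i]][context] += 1

    entropies = {}
    for word, counts in neighbors.items():
        total = sum(counts.values())
        entropies[word] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropies


def kl_contributions(query_counts, background_counts, smoothing=1.0):
    """Per-word term p_q(w) * log2(p_q(w) / p_b(w)) of KL(query || background).

    Additive smoothing (an assumption of this sketch) keeps both
    probabilities non-zero over the shared vocabulary.
    """
    vocab = set(query_counts) | set(background_counts)
    q_total = sum(query_counts.values()) + smoothing * len(vocab)
    b_total = sum(background_counts.values()) + smoothing * len(vocab)

    scores = {}
    for word in vocab:
        p_q = (query_counts.get(word, 0) + smoothing) / q_total
        p_b = (background_counts.get(word, 0) + smoothing) / b_total
        scores[word] = p_q * math.log2(p_q / p_b)
    return scores


if __name__ == "__main__":
    # Toy, hand-made data purely for illustration.
    toy_queries = [
        ["please", "find", "weather", "in", "beijing"],
        ["please", "find", "hotels", "in", "shanghai"],
        ["please", "find", "flights", "to", "tokyo"],
    ]
    toy_background = Counter(
        {"weather": 4, "beijing": 4, "hotels": 4, "tokyo": 4, "find": 1, "in": 3}
    )
    query_counts = Counter(t for q in toy_queries for t in q)

    right_h = neighbor_entropy(toy_queries, side="right")
    kl = kl_contributions(query_counts, toy_background)
    for word in sorted(kl, key=kl.get, reverse=True)[:3]:
        print(word, round(kl[word], 4), round(right_h[word], 4))
```

In this toy setup, tokens that are frequent in queries but rare in the background text receive the largest KL contributions, which is the kind of signal that could mark query-specific stop word candidates. How the thesis thresholds or combines these scores is not stated in the abstract.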
Tags: analysis; information retrieval; information need; information content words; stop words; generalizing words; specific information; concept salience