The Derivation, Reporting and Interpretation of Language Test Scores: Some Recommendations for the TEM

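Thesis title: The Derivation, Reporting and Interpretation of Language Test Scores: Some Recommendations for the TEM

Thesis type: Doctoral dissertation

Discipline: English Language and Literature

Author: Xi Zhong'en (席仲恩)

Supervisor: Zou Shen (邹申)

Keywords: edumetrics (教育计量学), score scale (分标), scaling (分标分化), appreciative scoring (分数赏识化), language testing, Test for English Majors (TEM)

Source: Shanghai International Studies University (上海外国语大学)

Year: 2005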

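Abstract: The Test for English Majors, Band 4 and Band 8 (TEM 4 and TEM 8), is a large-scale national test of English as a foreign language. Its candidates include all full-time foreign-language majors in the mainland of the People's Republic of China. Because society accepts its results, the test has in effect become a high-stakes, external, comprehensive language test. Yet it has always combined and reported scores directly as raw scores, as is common in classroom tests. This limits both the comparability of results across administrations and the interpretability of results within a single administration. To further improve the validity of interpretations of the test results, to make the results more usable, and to bring out more of their value, this dissertation tentatively offers the TEM several recommendations on the derivation, reporting and interpretation of language test scores. The guiding idea is that the TEM is a public test, a test that belongs to the people; it should therefore accept the people's supervision and so provide the people with more and better service.

The recommendations are made at two levels, general and technical. The former are addressed to the decision-making departments or bodies responsible for the TEM, the latter to its technical staff. There are six general recommendations, concerning (1) further clarification of the test's purposes and intended uses, (2) agreement on the dimensions to be tested and the testing methods, (3) the overall policy for score reporting, (4) the issuing of certificates, (5) the training of score interpreters, and (6) the building of a dedicated website. There are three major technical recommendations, concerning (1) the choice of score scale, (2) the reporting of results, and (3) the interpretation of scores.

On the choice of score scale, the dissertation recommends that the TEM establish an independent scale that marks its own identity, that TEM 4 and TEM 8 share one scale, and that the scale may range from 0 to 1000. To make scores easy to interpret while meeting the needs of different users as far as possible, it recommends a system of a primary scale and auxiliary scales. To make scores more interpretable, it recommends that, in addition to the raw scores reported at present, the TEM report standardized item-based scores, percentile rank scores, grade equivalent scores and TEM scale scores. To make section scores easier to compare, it recommends an appreciative score scale (赏识性分数分标) as the primary scale.

On score reporting, the dissertation provides a design blueprint for the TEM score report booklet (or sheet). It recommends that the TEM report not only scores but also their reliability, their uncertainty and the norms, so as to make the scores more usable, and it designs one such blueprint each for TEM 4 and TEM 8.

On score interpretation, the dissertation recommends that the TEM scoring standards and their descriptions be made public. To make scores easier to interpret, to provide more information, and to prevent misuse and abuse of the results, it also recommends compiling a users' guide to TEM results.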
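The abstract names several derived scores (standard scores, percentile ranks, and scale scores on a 0 to 1000 range) without giving formulas. The sketch below illustrates the textbook versions of these transformations only; it is not the dissertation's method. The function names, the linear mapping, and the choice of 500 as the scale centre and 100 as the scale spread are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def z_scores(raw):
    """Standard scores: distance from the group mean in SD units."""
    m = mean(raw)
    s = pstdev(raw)  # population SD of the norm group
    if s == 0:
        raise ValueError("all raw scores are identical; z-scores are undefined")
    return [(x - m) / s for x in raw]


def percentile_ranks(raw):
    """Percentile rank: percent of scores below x plus half the percent equal to x."""
    n = len(raw)
    ranks = []
    for x in raw:
        below = sum(1 for y in raw if y < x)
        equal = sum(1 for y in raw if y == x)
        ranks.append(100.0 * (below + 0.5 * equal) / n)
    return ranks


def scale_scores(raw, center=500, spread=100, low=0, high=1000):
    """Hypothetical linear scale score, rounded and clipped to [low, high].

    center and spread are illustrative assumptions, not values from the source.
    """
    return [min(high, max(low, round(center + spread * z))) for z in z_scores(raw)]
```

For example, `scale_scores([40, 50, 60, 70, 80])` places the middle candidate at the assumed centre of the scale, and `percentile_ranks` on the same list gives that candidate a rank of 50. A linear mapping keeps the shape of the raw-score distribution; an operational scale would normally also need equating across administrations, which this sketch does not attempt.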

Table of contents:

Acknowledgements

Abstract (in Chinese)

Abstract

Contents

List of Tables

List of Figures

List of Acronyms and Abbreviations

Chapter 1 Introduction

1.1 Noticing the Significance

1.1.1 Test Scores and Language Related Research

1.1.2 Test Scores and Decision Making in Educational Programs

1.1.2.1 Selection and Test Scores for Selection Decisions

1.1.2.2 Placement and Test Scores for Placement Decisions

1.1.2.3 Diagnosis and Test Scores for Diagnostic Decisions

1.1.2.4 Test Scores and Program Evaluation

1.1.2.5 Minimum Competence and Test Scores in Minimum Competence Decisions

1.1.3 Test Scores and the Reliability and Uncertainty of Test Results

1.2 Identifying the Object for Research

1.2.1 Identifying Some Theoretical Problems

1.2.1.1 The Important and the Neglected

1.2.1.2 The Conflicts between Theories

1.2.2 Identifying Some Practical Needs

1.2.2.1 The Bleak Picture of Testing Practice in China

1.2.2.2 The Hopeful Future in China’s Testing Practice

1.3 Overview of the Dissertation

1.3.1 Purpose and Scope of the Study

1.3.2 Study Questions

1.3.3 Overview of the Dissertation

1.4 Summary

Chapter 2 Types of Language Tests

2.1 Language Tests: Norm-Referenced and Criterion-Referenced

2.1.1 Norm-Referenced Tests

2.1.1.1 The Origin and Types of Norm-Referencing

2.1.1.2 The Distinctive Features of a Norm-Referenced Test

2.1.1.3 Purposes of and Scores for Norm-Referencing

2.1.2 Criterion-Referenced Tests

2.1.2.1 The Origin and Types of Criterion-Referencing

2.1.2.2 The Distinctive Features of a Criterion-Referenced Test

2.1.2.3 Purposes of and Scores for Criterion-Referencing

2.2 Language Tests: Power and Speed

2.2.1 Power Tests

2.2.1.1 Definition and Design Features

2.2.1.2 Purpose of and Score for a Power Test

2.2.2 Speed Tests

2.2.2.1 Definition and Design Features

2.2.2.2 Purpose of and Score for a Speed Test

2.3 Language Tests: Mental Power and Mental Work

2.3.1 Tests of Mental Power

2.3.1.1 Definition and Design Features

2.3.1.2 Purpose of and Score for a Test of Mental Power

2.3.2 Tests of Mental Work

2.3.2.1 Definition and Design Features

2.3.2.2 Purpose of and Score for a Test of Mental Work

2.4 Language Tests: Extensive and Intensive

2.4.1 Extensive Tests

2.4.1.1 Definition and Design Features

2.4.1.2 Purpose of and Score for a Test of Extensive Quantity

2.4.2 Intensive Tests

2.4.2.1 Definition and Design Features

2.4.2.2 Purpose of and Score for a Test of Intensive Quantity

2.5 Language Tests: Weakness Based and Strength Based

2.5.1 Definition and Design Features of the Weakness Based Tests

2.5.2 Purposes of and Score for a Weakness Based Test

2.6 Language Tests: Nominal, Ordinal, Interval, and Ratio

2.6.1 Tests at the Nominal Level of Measurement

2.6.1.1 Definition

2.6.1.2 Property of the Scale, Statistics Allowed and Common Mistakes or Misbeliefs

2.6.2 Tests at the Ordinal Level of Measurement

2.6.2.1 Definition

2.6.2.2 Property of the Scale, Statistics Allowed and Common Mistakes or Misbeliefs

2.6.3 Tests at the Interval Level of Measurement

2.6.3.1 Definition

2.6.3.2 Property of the Scale, Statistics Allowed and Common Mistakes or Misbeliefs

2.6.4 Tests at the Ratio Level of Measurement

2.6.4.1 Definition

2.6.4.2 Property of the Scale, Statistics Allowed and Common Mistakes or Misbeliefs

2.7 Summary

Chapter 3 The Derivation of Scores for Language Tests

3.1 Scale, Scaling, Score and Scoring

3.1.1 Scale

3.1.2 Scaling

3.1.3 Score

3.1.4 Scoring

3.2 Some Frequently Used Score Scales: a Critical Review

3.2.1 The Raw Score Scale

3.2.1.1 Definition and Illustration

3.2.1.2 Application(s)

3.2.1.3 Evaluating the Scale

3.2.2 The Percentile Rank Score Scale

3.2.2.1 Definition and Illustration

3.2.2.2 Application(s)

3.2.2.3 Evaluating the Scale

3.2.3 The Standard Score Scale

3.2.3.1 Definition and Illustration

3.2.3.2 Application(s)

3.2.3.3 Evaluating the Scale

3.2.4 The Grade Equivalent Score Scale

3.2.4.1 Definition and Illustration

3.2.4.2 Application(s)

3.2.4.3 Evaluating the Scale

3.2.5 The Latent Trait Score Scale

3.2.5.1 Definition and Illustration

3.2.5.2 Application(s)

3.2.5.3 Evaluating the Scale

3.3 The Standardized Item-Based Score Scale

3.3.1 Definition and Illustration

3.3.2 Application(s)

3.3.3 Evaluating the Models

3.4 Three Models for Scoring

3.4.1 Limitations of Conventional Scoring Models

3.4.2 Fundamental Considerations of Scoring Models

3.4.3 Three Scoring Models

3.4.3.1 The Power Scoring Models

3.4.3.2 The Logistic Scoring Model

3.4.3.3 Standard Uncertainty of the Generated Scores

3.4.3.4 Some General Suggestions

3.5 Summary

Chapter 4 The Reporting of Language Test Scores

4.1 Some General Considerations of Score Reporting

4.1.1 The Purposes of Testing

4.1.1.1 The Primary Purposes of Testing

4.1.1.2 The Secondary Purposes of Testing

4.1.2 The Anticipated Users of Test Results

4.1.2.1 The Non-qualified Users

4.1.2.2 The Less-qualified Users

4.1.2.3 The Well-qualified Users

4.1.3 Information on the Score Report and Information Reserved for the Supporting Documents

4.1.3.1 Information on the Score Report

4.1.3.2 What to Be Provided in the Supporting Documents

4.2 Some Technical Considerations of Score Reporting

4.2.1 True Score, Its Estimate and the Uncertainty of the Estimate

4.2.1.1 The True Score

4.2.1.2 The Estimates of True Scores

4.2.1.3 The Uncertainty of an Estimate: Its Evaluation and Expression

4.2.1.4 The Correction for Guessing

4.2.2 The Reliability of Test Scores

4.2.2.1 The Stability of Scores

4.2.2.2 The Parallel Form Reliability

4.2.2.3 The Generalizability of Observed Scores over the Item Universe

4.2.2.4 The Generalizability of Observed Scores over the Rater Universe

4.2.2.5 The Generalizability of Observed Scores over Both the Item and the Rater Universe

4.3 Summary

Chapter 5 The Interpretation of Language Test Scores

5.1 Validity and Score Interpretation

5.1.1 The Evolving Concept of Validity

5.1.1.1 Validity as Test-Criterion Correlation

5.1.1.2 Validity as Consisting of Different Types

5.1.1.3 Validity as a Unitary Concept

5.1.2 Validity as the Appropriateness of Score Interpretation

5.2 Norms and Norm-Referenced Score Interpretation

5.2.1 Norms and Norming

5.2.1.1 Norms, Norm Groups and the Criteria for Norms

5.2.1.2 Classification of Norms

5.2.2 Interpreting Test Scores by Referencing to the Norms

5.2.2.1 Interpreting Test Scores by Referencing to the Percentile Rank Norms

5.2.2.2 Interpreting Test Scores by Referencing to the Group Average Norm

5.2.2.3 Summary of the Section

5.3 Criterion and Criterion-Referenced Score Interpretation

5.3.1 The Criterion

5.3.1.1 Criterion as Mastery of Domain Knowledge

5.3.1.2 Criterion as Performance on Target Tasks

5.3.1.3 Criterion as Proficiency in Relation to Future Needs

5.3.2 Criterion-Referenced Score Interpretation

5.3.2.1 Interpreting the Criterion Score by Referencing to the Cut Score(s)

5.3.2.2 Interpreting the Criterion Score by Referencing to the Expectancy Table

5.3.2.3 Interpreting the Criterion Score by Referencing to Proficiency Descriptors

5.3.2.4 Interpreting the Criterion Score by Referencing to the Scoring Standards

5.4 Summary

Chapter 6 Analyzing the TEM

6.1 Background Information

6.1.1 General Background Information

6.1.2 A Brief History of TEM

6.1.2.1 A Brief History of TEM 4

6.1.2.2 A Brief History of TEM 8

6.1.3 The Growing Population of TEM

6.1.3.1 The Growing Population of TEM 4

6.1.3.2 The Growing Population of TEM 8

6.1.4 The Changing Formats of TEM

6.1.4.1 The Changing Formats of TEM 4

6.1.4.2 The Changing Formats of TEM 8

6.2 Analyzing the Structure of the TEM Test

6.2.1 The Semantic Structure of the E-TEM 4

6.2.1.1 The Surface Structure

6.2.1.2 The Deep Structure

6.2.2 The Structure of the New Generation TEM 4

6.2.3 The Structure of TEM 8

6.2.3.1 The Surface Structure of TEM 8

6.2.3.2 The Deep Structure of TEM 8

6.3 The TEM Scoring Practice and the TEM Certificates

6.3.1 Marking the TEM Tests

6.3.1.1 Machine-marking the Multiple Choice Questions

6.3.1.2 Hand-marking the Constructed Response Questions

6.3.2 Reporting the TEM Result

6.3.2.1 Reporting the TEM Score at the Individual Level

6.3.2.2 Reporting the TEM Score at the Institutional Level

6.3.3 Granting the TEM Certificates

6.4 Summary

Chapter 7 Some Recommendations for the TEM

7.1 General Recommendations for TEM

7.1.1 Purposes and Intended Uses of TEM

7.1.2 Dimensionality and Testing Methods of TEM

7.1.3 Raw Score or Scale Score? Skill Scores or Total Score?

7.1.4 The TEM Certificates

7.1.5 Training Score Interpreters

7.1.6 Building an Official Website for the TEM

7.2 Technical Recommendations for TEM

7.2.1 Scoring TEM

7.2.1.1 The Dimensionality of TEM

7.2.1.2 Score Scales

7.2.2 Reporting TEM Result

7.2.2.1 The TEM Score Report

7.2.2.2 The Uncertainty and Normative Information of TEM

7.2.3 Interpreting TEM Scores

7.2.3.1 Descriptors

7.2.3.2 Users’ Guide

7.3 Summary

Chapter 8 Concluding Remarks

8.1 Major Contributions

8.1.1 Theoretical Contributions

8.1.2 Practical Contributions

8.2 Limitations and Suggestions for Further Research

8.3 Summary

Bibliography

Published: 2006-12-30
