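CLEC (the Chinese Learner English Corpus) contains more than one million words of text from five types of learners: middle-school students, College English Band 4 and Band 6 students, and lower-year and upper-year English majors. The errors in the texts are annotated. The aim is to observe the characteristics of each group's English and the errors they make, to describe Chinese learner English fairly precisely by quantitative and qualitative methods, and to provide useful feedback for English teaching in China.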

Table 1. Distribution of the CLEC data

| Type | Word tokens |
|---|---|
| ST2 | 208088 |
| ST3 | 209043 |
| ST4 | 212855 |
| ST5 | 214510 |
| ST6 | 226106 |
| Total | 1070602 |

 

 

Principles of error annotation

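1. Simple, reasonable, and easy to apply systematically. Many people take part in the annotation, and an overly elaborate classification scheme would be hard to master. We adopt a two-level classification. The first level has 11 categories: word form (fm), verb phrase (vp), noun phrase (np), pronoun (pr), adjective phrase (aj), adverb (ad), prepositional phrase (pp), conjunction (cj), word (wd), collocation (cc), and sentence (sn). Each category is subdivided by number. For example, [cc] marks an improper collocation: [cc1] is a noun/noun collocation, [cc2] a noun/verb collocation, [cc3] a verb/noun collocation, and so on.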

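2. The granularity of the scheme should be moderate. A coarse scheme is easy to apply consistently but carries too little information for analyzing learners' errors. An overly fine scheme is hard to apply consistently, and the same error easily ends up in different categories. Our current approach is to classify common errors finely (vp and np each have 9 subclasses) and rare errors coarsely (cj has only 2). The current scheme has 61 error codes, which makes it a medium-sized scheme.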

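3. Provide enough information about each error (the error itself, its type, and its scope). For example, in "In the past, people are [vp6,4-] kind to each other…", the error is marked in square brackets placed after the erroneous word. [vp6,4-] says that "are" is a verb-phrase (vp) error of type 6 (tense). "4-" is the scope of the error: "-" marks the position of the error and "4" means that four words precede it. Only by taking these four words into account can one tell that "are" is wrong.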

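4. Openness. Researchers may add error types or subdivide existing ones as needed. For example, [sn8] marks a structurally deficient sentence, and a researcher can split this error into finer classes. This requires retrieving all sn8 errors and then defining third-level categories such as sn81, sn82, and so on.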

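5. Register and the source of an error are not annotated for now, because they require more subjective judgment from annotators and are even harder to apply consistently.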
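The tag format described in principles 3 and 4 lends itself to mechanical retrieval. The following is a minimal Python sketch, not the CLEC project's own tooling. It assumes tags look like `[vp6,4-]` or `[cc1]`; the names `TAG_RE`, `ErrorTag`, `find_tags`, and `error_context` are hypothetical. Only the `4-` form (four words of left context) is documented above, so reading a number after the dash as right context is an assumption.

```python
import re
from typing import List, NamedTuple

# Assumed tag shape: [<two-letter category><digits>,<words before>-<words after>]
# The scope part is optional; the number after the dash is an assumption.
TAG_RE = re.compile(r"\[([a-z]{2})(\d+)\s*(?:,\s*(\d*)-(\d*))?\]")


class ErrorTag(NamedTuple):
    category: str      # first-level category, e.g. "vp"
    subtype: str       # digits after the category, e.g. "6" or "81"
    words_before: int  # words of left context needed to judge the error
    words_after: int   # assumed: words of right context
    position: int      # character offset of the tag in the text


def find_tags(text: str) -> List[ErrorTag]:
    """Collect every error tag found in an annotated text."""
    tags = []
    for m in TAG_RE.finditer(text):
        cat, sub, before, after = m.groups()
        tags.append(ErrorTag(cat, sub, int(before or 0), int(after or 0), m.start()))
    return tags


def error_context(text: str, tag: ErrorTag) -> List[str]:
    """Return the left-context words followed by the flagged word."""
    words = TAG_RE.sub("", text[:tag.position]).split()
    return words[-(tag.words_before + 1):]


sample = "In the past, people are [vp6,4-] kind to each other"
for tag in find_tags(sample):
    print(tag.category + tag.subtype, error_context(sample, tag))
```

Filtering on `tag.category == "sn" and tag.subtype.startswith("8")` would retrieve every sn8 error, including any third-level codes such as sn81 that a researcher adds later.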

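Error classification table (total: 61 codes)

| Word form | Type | Verb phrase | Type | Noun phrase | Type | Pronoun | Type |
|---|---|---|---|---|---|---|---|
| fm1 | spelling | vp1 | pattern | np1 | pattern | pr1 | reference |
| fm2 | word building | vp2 | set phrase | np2 | set phrase | pr2 | anticipatory it |
| fm3 | capitalization | vp3 | agreement | np3 | agreement | pr3 | agreement |
| | | vp4 | finite/non-finite | np4 | case | pr4 | case |
| | | vp5 | non-finite | np5 | countability | pr5 | wh- |
| | | vp6 | tense | np6 | number | pr6 | indefinite |
| | | vp7 | voice | np7 | article | | |
| | | vp8 | mood | np8 | quantifiers | | |
| | | vp9 | modal/auxiliary | np9 | other determiners | | |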

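| Adjective phrase | Type | Adverb | Type | Prepositional phrase | Type | Conjunction | Type |
|---|---|---|---|---|---|---|---|
| aj1 | pattern | ad1 | order | pp1 | pattern | cj1 | pattern |
| aj2 | set phrase | ad2 | modification | pp2 | set phrase | cj2 | set phrase |
| aj3 | degree | ad3 | degree | | | | |
| aj4 | -ed/-ing confusion | | | | | | |
| aj5 | predicative/attributive | | | | | | |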

| Word | Type | Collocation | Type | Sentence | Type |
|---|---|---|---|---|---|
| wd1 | order | cc1 | noun/noun | sn1 | run-on sentence |
| wd2 | part of speech | cc2 | noun/verb | sn2 | sentence fragment |
| wd3 | substitution | cc3 | verb/noun | sn3 | dangling modifier |
| wd4 | absence | cc4 | adj/noun | sn4 | illogical comparison |
| wd5 | redundancy | cc5 | verb/adv | sn5 | topic prominence |
| wd6 | repetition | cc6 | adv/adj | sn6 | coordination |
| wd7 | ambiguity | | | sn7 | subordination |
| | | | | sn8 | structural deficiency |
| | | | | sn9 | punctuation |
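The two-level scheme can be written down as a small lookup structure, which is handy for validating tags during annotation. The sketch below is an illustration only: the subclass counts are read off the classification table, while `SUBCLASS_COUNTS`, `all_codes`, and `is_valid_code` are hypothetical names.

```python
# Number of second-level codes per first-level category, read off the table.
SUBCLASS_COUNTS = {
    "fm": 3, "vp": 9, "np": 9, "pr": 6, "aj": 5, "ad": 3,
    "pp": 2, "cj": 2, "wd": 7, "cc": 6, "sn": 9,
}


def all_codes(counts=SUBCLASS_COUNTS):
    """Enumerate every error code, e.g. fm1..fm3, vp1..vp9."""
    return [f"{cat}{i}" for cat, n in counts.items() for i in range(1, n + 1)]


def is_valid_code(code, counts=SUBCLASS_COUNTS):
    """Check that a code belongs to the two-level scheme."""
    return code in set(all_codes(counts))


print(len(SUBCLASS_COUNTS), len(all_codes()))
print(is_valid_code("vp6"), is_valid_code("cj3"))
```

Enumerating the codes this way confirms the figures quoted in the principles: 11 first-level categories and 61 codes in total.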

 

                 

Annotation notes

| Code | Unit | Type | Description |
|---|---|---|---|
| fm1 | word | spelling | spelling, coinage, abbreviation, apostrophe |
| fm2 | word | word building | derivation, inflection, compounding, plurality (noun), irregularity (verb), 3rd person singular form (verb), syllabification, hyphenation, word division or fusion |
| fm3 | word | capitalization | lower initial letter for upper initial letter or vice versa |
| vp1 | vb phr | pattern (transitivity pattern) | error in transitivity (vi as vt or vice versa), transitive verb pattern / grammatical (cf. Oxford Advanced Learner's Dictionary of Current English, edited by A. S. Hornby) |
| vp2 | vb phr | set phrase | phrasal verb and verbal phrase: error in form or use |
| vp3 | vb phr | agreement (subject-verb) | number agreement with its subject (noun or pronoun) |
| vp4 | vb phr | finite/non-finite | finite verb for non-finite verb or vice versa |
| vp5 | vb phr | non-finite | infinitive error: form and use / infinitive for participle or vice versa / -ed participle for -ing participle or vice versa |
| vp6 | vb phr | tense | error in tense use within a sentence / the sequence of tenses between sentences |
| vp7 | vb phr | voice | error in the use of voice: active for passive or vice versa |
| vp8 | vb phr | mood | error in the use of mood: imperative, subjunctive / improper structure of conditional sentences |
| vp9 | vb phr | modal/auxiliary | misuse of modal/auxiliary verbs / wrong form of modal verb (or auxiliary verb) and verb combination (e.g. tense form, voice form, etc.) |
| np1 | nn phr | pattern (noun pattern) | error in combination with other words / grammatical |
| np2 | nn phr | set phrase | omission or replacement of a fixed element that goes after a certain noun |
| np3 | nn phr | agreement | number agreement of a noun with its determiner or a word that refers to it |
| np4 | nn phr | case | possessive case error: form or use |
| np5 | nn phr | countability | uncountable noun used as countable noun |
| np6 | nn phr | number | countable noun used with no determiner or -s / a or -s with plural noun |
| np7 | nn phr | article | a/an confusion or definite/indefinite confusion |
| np8 | nn phr | quantifiers | misuse or confusion between many/much, (a) few/(a) little, some/any, etc. |
| np9 | nn phr | other determiners | misuse or confusion of demonstratives, wh- determiners, numerals, etc. |
| pr1 | pron | reference | incorrect/ambiguous pronoun reference / anaphoric |
| pr2 | pron | anticipatory it | improper or wrong use of anticipatory it / it replaced by a demonstrative, etc. |
| pr3 | pron | agreement | number agreement with a noun it refers to |
| pr4 | pron | case | case error of any personal pronoun |
| pr5 | pron | wh- (wh- pronouns) | misuse or confusion of interrogative, relative and conjunctive pronouns |
| pr6 | pron | indefinite | misuse or confusion of indefinite pronouns such as all/both, few/little, some/any, either/neither, etc. |
| aj1 | adj | pattern (adjective pattern) | error in the combination with other words / grammatical |
| aj2 | adj | set phrase | error in the idiomatic use of an adjectival phrase / omission or replacement of a fixed element that goes after a certain adjective |
| aj3 | adj | degree | adjective degree error: form and use |
| aj4 | adj | -ed/-ing confusion | -ed adjective for -ing adjective or vice versa |
| aj5 | adj | predicative/attributive | predicative adjective used as attributive adjective |
| ad1 | adv | order | improper adverb placement / wrong position |
| ad2 | adv | modification | adjective modifier used as verb modifier / other kinds of confusion |
| ad3 | adv | degree | adverb degree error: form and use |
| pp1 | prep | pattern (preposition pattern) | unacceptable combination with other words / grammatical |
| pp2 | prep | set phrase | error in the formation or use of an idiomatic prepositional phrase |
| cj1 | conj | pattern (conjunction pattern) | unacceptable combination with other words / grammatical |
| cj2 | conj | set phrase | error in the formation or use of a phrase functioning as a conjunction |
| wd1 | word | order | misplacement of any word other than an adverb |
| wd2 | word | part of speech | error in part of speech: right root but wrong word class |
| wd3 | word | substitution | error in word choice: right word class but wrong selection (any part of speech) |
| wd4 | word | absence | omission of a word (any part of speech) |
| wd5 | word | redundancy | oversuppliance of a word (any part of speech) |
| wd6 | word | repetition | unnecessary repeating of a word |
| wd7 | word | ambiguity | not clear word meaning / semantic |
| cc1 | notional | n/n collocation | improper noun (phrase) and noun (phrase) combination / semantic |
| cc2 | notional | n/v collocation | improper noun (phrase) and verb (phrase) combination / semantic |
| cc3 | notional | v/n collocation | improper verb and noun (phrase) combination / semantic |
| cc4 | notional | a/n collocation | improper adjective and noun (phrase) combination / semantic |
| cc5 | notional | v/ad collocation | improper verb and adverb (or ad/v) combination / semantic |
| cc6 | notional | ad/a collocation | improper adverb and adjective combination / semantic |
| sn1 | sentence | run-on sentence | improper addition of clauses / fused sentence |
| sn2 | sentence | sentence fragment | subordinate clause as a sentence / any phrase as a sentence |
| sn3 | sentence | dangling modifier | illogical adverbial modification of a clause |
| sn4 | sentence | illogical comparison | error in the comparison of words or phrases in a sentence which cannot be compared |
| sn5 | sentence | topic prominence | the co-occurrence of an initial noun phrase and its equivalent (usually a pronoun) in the same sentence |
| sn6 | sentence | coordination | faulty parallelism of clauses (or words/phrases) in a sentence |
| sn7 | sentence | subordination | faulty attachment of a subordinate clause to the main clause |
| sn8 | sentence | structural deficiency | error in the grammatical construction of a sentence: improper splitting, pattern shifting, confusing structure, etc. |
| sn9 | sentence | punctuation | overuse, absence, choice, apostrophe, comma splice, etc. |

 

Normalized frequencies and proportions of each error type

| Error type | st2 | st3 | st4 | st5 | st6 | Total | Percentage (%) |
|---|---|---|---|---|---|---|---|
| fm1 | 1928.8 | 2877.4 | 2112.6 | 1826.7 | 1686.7 | 10432.2 | 17.47 |
| fm2 | 349.3 | 448.9 | 438.9 | 226.9 | 328.7 | 1792.7 | 3 |
| fm3 | 1474.4 | 731.8 | 405.8 | 694.1 | 174.6 | 3480.7 | 5.83 |
| vp1 | 259.4 | 325.9 | 498.4 | 103.4 | 200.8 | 1387.9 | 2.32 |
| vp2 | 179 | 139.3 | 61.2 | 104.2 | 22.1 | 505.8 | 0.85 |
| vp3 | 374 | 524.6 | 785.2 | 273.1 | 327 | 2283.9 | 3.82 |
| vp4 | 140.8 | 159.1 | 110.8 | 63.9 | 51.6 | 526.2 | 0.88 |
| vp5 | 140 | 118.7 | 107.4 | 89.9 | 46.7 | 502.7 | 0.84 |
| vp6 | 1165.7 | 356 | 311.6 | 379.8 | 215.6 | 2428.7 | 4.07 |
| vp7 | 172.7 | 104.1 | 98.4 | 63.9 | 46.7 | 485.8 | 0.81 |
| vp8 | 27.1 | 16.3 | 8.3 | 25.2 | 11.5 | 88.4 | 0.15 |
| vp9 | 111.4 | 274.3 | 278.5 | 42.9 | 86.1 | 793.2 | 1.33 |
| np1 | 46.9 | 33.5 | 28.9 | 16.8 | 10.7 | 136.8 | 0.23 |
| np2 | 24.7 | 22.4 | 17.4 | 19.3 | 2.5 | 86.3 | 0.14 |
| np3 | 202.1 | 247.7 | 249.6 | 210.9 | 186 | 1096.3 | 1.84 |
| np4 | 66.8 | 55.9 | 26.4 | 22.7 | 21.3 | 193.1 | 0.32 |
| np5 | 58.9 | 98 | 71.9 | 60.5 | 84.4 | 373.7 | 0.63 |
| np6 | 374 | 654.4 | 481 | 358.8 | 354.1 | 2222.3 | 3.72 |
| np7 | 237.9 | 107.5 | 89.3 | 174.8 | 54.9 | 664.4 | 1.11 |
| np8 | 35 | 65.4 | 47.9 | 13.4 | 7.4 | 169.1 | 0.28 |
| np9 | 6.4 | 41.3 | 12.4 | 7.6 | 5.7 | 73.4 | 0.12 |
| pr1 | 82 | 236.5 | 205 | 89.9 | 18.9 | 632.3 | 1.06 |
| pr2 | 16.7 | 78.3 | 23.1 | 4.2 | 0 | 122.3 | 0.2 |
| pr3 | 52.5 | 54.2 | 172.7 | 28.6 | 60.6 | 368.6 | 0.62 |
| pr4 | 74.8 | 37 | 20.7 | 48.7 | 10.7 | 191.9 | 0.32 |
| pr5 | 26.3 | 53.3 | 14.1 | 7.6 | 10.7 | 112 | 0.19 |
| pr6 | 9.5 | 2.6 | 5 | 3.4 | 0 | 20.5 | 0.03 |
| aj1 | 6.4 | 18.9 | 15.7 | 5 | 9 | 55 | 0.09 |
| aj2 | 9.5 | 3.4 | 9.9 | 5.9 | 7.4 | 36.1 | 0.06 |
| aj3 | 38.2 | 39.6 | 32.2 | 43.7 | 97.5 | 251.2 | 0.42 |
| aj4 | 16.7 | 2.6 | 22.3 | 12.6 | 5.7 | 59.9 | 0.1 |
| aj5 | 0.8 | 3.4 | 7.4 | 1.7 | 0 | 13.3 | 0.02 |
| ad1 | 35.8 | 96.3 | 39.7 | 27.7 | 15.6 | 215.1 | 0.36 |
| ad2 | 42.2 | 37.8 | 12.4 | 9.2 | 4.9 | 106.5 | 0.18 |
| ad3 | 7.2 | 12 | 9.9 | 1.7 | 2.5 | 33.3 | 0.06 |
| pp1 | 136.1 | 98 | 43 | 169.7 | 28.7 | 475.5 | 0.8 |
| pp2 | 25.5 | 262.3 | 143.8 | 37 | 27.9 | 496.5 | 0.83 |
| cj1 | 27.8 | 20.6 | 18.2 | 21.8 | 12.3 | 100.7 | 0.17 |
| cj2 | 4 | 7.7 | 13.2 | 5.9 | 4.9 | 35.7 | 0.06 |
| wd1 | 43.8 | 151.3 | 114.1 | 25.2 | 37.7 | 372.1 | 0.62 |
| wd2 | 324.6 | 929.6 | 772.8 | 226.9 | 242.6 | 2496.5 | 4.18 |
| wd3 | 1102 | 1634.7 | 1815 | 757.1 | 359.8 | 5668.6 | 9.49 |
| wd4 | 585.6 | 829.8 | 443.8 | 403.3 | 427 | 2689.5 | 4.5 |
| wd5 | 410.6 | 613.1 | 518.2 | 265.5 | 171.3 | 1978.7 | 3.31 |
| wd6 | 27.1 | 37 | 22.3 | 34.5 | 29.5 | 150.4 | 0.25 |
| wd7 | 261.8 | 430.8 | 261.2 | 228.6 | 209.8 | 1392.2 | 2.33 |
| cc1 | 72.4 | 65.4 | 76 | 23.5 | 36.1 | 273.4 | 0.46 |
| cc2 | 35 | 177.1 | 49.6 | 6.7 | 21.3 | 289.7 | 0.49 |
| cc3 | 168.7 | 514.2 | 417.4 | 75.6 | 112.3 | 1288.2 | 2.16 |
| cc4 | 64.5 | 94.6 | 134.7 | 42 | 39.3 | 375.1 | 0.63 |
| cc5 | 23.9 | 40.4 | 29.8 | 5 | 4.1 | 103.2 | 0.17 |
| cc6 | 17.5 | 12 | 6.6 | 2.5 | 1.6 | 40.2 | 0.07 |
| sn1 | 419.3 | 596.8 | 576.9 | 118.5 | 42.6 | 1754.1 | 2.94 |
| sn2 | 424.9 | 389.6 | 303.3 | 132.8 | 76.2 | 1326.8 | 2.22 |
| sn3 | 10.3 | 20.6 | 17.4 | 2.5 | 10.7 | 61.5 | 0.1 |
| sn4 | 17.5 | 24.9 | 6.6 | 20.2 | 4.9 | 74.1 | 0.12 |
| sn5 | 9.5 | 14.6 | 17.4 | 2.5 | 4.9 | 48.9 | 0.08 |
| sn6 | 84.3 | 41.3 | 39.7 | 41.2 | 1.6 | 208.1 | 0.35 |
| sn7 | 49.3 | 55.9 | 63.6 | 23.5 | 3.3 | 195.6 | 0.33 |
| sn8 | 1103.6 | 446.3 | 862.1 | 493.2 | 231.9 | 3137.1 | 5.25 |
| sn9 | 861.7 | 573.6 | 337.2 | 649.5 | 322.9 | 2744.9 | 4.6 |
| Total | 14105.2 | 16160.6 | 13935.9 | 8883.4 | 6633.8 | 59718.9 | 100 |
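The Total and Percentage columns follow from the per-level frequencies: each row total is the sum over the five learner levels, and each percentage is that total divided by the grand total of 59718.9. The sketch below recomputes both for three rows copied from the table and ranks them. The names `ROWS`, `GRAND_TOTAL`, and `summarize` are hypothetical, and the rounding (one decimal for totals, two for percentages) is an assumption chosen to match the printed values.

```python
GRAND_TOTAL = 59718.9  # sum of all 61 row totals, from the table's last row

# Normalized frequencies for st2..st6, copied from three rows of the table.
ROWS = {
    "fm3": (1474.4, 731.8, 405.8, 694.1, 174.6),
    "fm1": (1928.8, 2877.4, 2112.6, 1826.7, 1686.7),
    "wd3": (1102.0, 1634.7, 1815.0, 757.1, 359.8),
}


def summarize(rows, grand_total=GRAND_TOTAL):
    """Return (code, total, percentage) tuples, most frequent first."""
    out = []
    for code, freqs in rows.items():
        total = round(sum(freqs), 1)
        pct = round(100 * total / grand_total, 2)
        out.append((code, total, pct))
    return sorted(out, key=lambda row: row[1], reverse=True)


for code, total, pct in summarize(ROWS):
    print(f"{code}\t{total}\t{pct}")
```

Sorting by total in this way is one route from the full frequency table to a ranking of the most common error types.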

 

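Errors ranked by major category

| Category | st2 | st3 | st4 | st5 | st6 | Total | Percentage | Cumulative percentage |
|---|---|---|---|---|---|---|---|---|
| Word form | 3752.5 | 4058.1 | 2957.3 | 2747.7 | 2190 | 15705.6 | 26.299 | 26.299 |
| Lexis | 2755.5 | 4626.3 | 3947.4 | 1941.1 | 1477.7 | 14748 | 24.696 | 50.995 |
| Syntax | 2980.4 | 2163.6 | 2224.2 | 1483.9 | 699 | 9551.1 | 15.993 | 66.988 |
| Verb | 2570.1 | 2018.3 | 2259.8 | 1146.3 | 1008.1 | 9002.6 | 15.075 | 82.063 |
| Noun | 1052.7 | 1326.1 | 1024.8 | 884.8 | 727 | 5015.4 | 8.398 | 90.461 |
| Collocation | 382 | 903.7 | 714.1 | 155.3 | 214.7 | 2369.8 | 3.968 | 94.429 |
| Pronoun | 261.8 | 461.9 | 440.6 | 182.4 | 100.9 | 1447.6 | 2.424 | 96.853 |
| Preposition | 161.6 | 360.3 | 186.8 | 206.7 | 56.6 | 972 | 1.628 | 98.481 |
| Adjective | 71.6 | 67.9 | 87.5 | 68.9 | 119.6 | 415.5 | 0.696 | 99.177 |
| Adverb | 85.2 | 146.1 | 62 | 38.6 | 23 | 354.9 | 0.594 | 99.771 |
| Conjunction | 31.8 | 28.3 | 31.4 | 27.7 | 17.2 | 136.4 | 0.228 | 99.999 |
| Total | 14105.2 | 16160.6 | 13935.9 | 8883.4 | 6633.8 | 59718.9 | 99.999 | |
| Share of total | 0.24 | 0.27 | 0.23 | 0.15 | 0.11 | | | |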
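The percentage and cumulative-percentage columns can be recomputed from the category totals alone. The sketch below does so under one assumption: the cumulative column is the running sum of the already rounded percentages, which would explain why it ends at 99.999 and not 100. The names `CATEGORY_TOTALS` and `pareto` are hypothetical.

```python
from itertools import accumulate

# Category totals copied from the table (Total column).
CATEGORY_TOTALS = [
    ("word form", 15705.6), ("lexis", 14748.0), ("syntax", 9551.1),
    ("verb", 9002.6), ("noun", 5015.4), ("collocation", 2369.8),
    ("pronoun", 1447.6), ("preposition", 972.0), ("adjective", 415.5),
    ("adverb", 354.9), ("conjunction", 136.4),
]


def pareto(totals):
    """Rank categories and attach percentage and cumulative percentage."""
    grand = sum(value for _, value in totals)
    ranked = sorted(totals, key=lambda kv: kv[1], reverse=True)
    pcts = [round(100 * value / grand, 3) for _, value in ranked]
    # Assumption: accumulate the rounded percentages, as the table appears to do.
    cums = [round(c, 3) for c in accumulate(pcts)]
    return [(name, p, c) for (name, _), p, c in zip(ranked, pcts, cums)]


for name, pct, cum in pareto(CATEGORY_TOTALS):
    print(f"{name:12s} {pct:7.3f} {cum:7.3f}")
```

Read this way, the first two categories (word form and lexis) already account for about half of all annotated errors, and the first four for more than 80 percent.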

 

 

 

 

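The most common errors of Chinese learners

| Type | st2 | st3 | st4 | st5 | st6 | Total | Percentage |
|---|---|---|---|---|---|---|---|
| fm1 | 1928.8 | 2877.4 | 2112.6 | 1826.7 | 1686.7 | 10432.2 | 17.47 |
| wd3 | 1102 | 1634.7 | 1815 | 757.1 | 359.8 | 5668.6 | 9.49 |
| fm3 | 1474.4 | 731.8 | 405.8 | 694.1 | 174.6 | 3480.7 | 5.83 |
| sn8 | 1103.6 | 446.3 | 862.1 | 493.2 | 231.9 | 3137.1 | 5.25 |
| sn9 | 861.7 | 573.6 | 337.2 | 649.5 | 322.9 | 2744.9 | 4.6 |
| wd4 | 585.6 | 829.8 | 443.8 | 403.3 | 427 | 2689.5 | 4.5 |
| wd2 | 324.6 | 929.6 | 772.8 | 226.9 | 242.6 | 2496.5 | 4.18 |
| vp6 | 1165.7 | 356 | 311.6 | 379.8 | 215.6 | 2428.7 | 4.07 |
| vp3 | 374 | 524.6 | 785.2 | 273.1 | 327 | 2283.9 | 3.82 |
| np6 | 374 | 654.4 | 481 | 358.8 | 354.1 | 2222.3 | 3.72 |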

wd5

410.6

613.1