Commit 10085da

Revert "documents baseline comes with CRF"
This reverts commit 7ed792a.
1 parent 7ed792a commit 10085da

23 files changed, 61 additions and 179 deletions

README.md

Lines changed: 4 additions & 4 deletions
@@ -8,10 +8,10 @@ and [benchmark results](/docs/transformers_benchmark.md) with fine-tuning BERT).
 
 | Model| Dataset | Precision | Recall | F1 |
 |-------| ------- | :---------: | :------: | :--: |
-|BERT-base-cased + CRF (this repo)| CONLL-2003 | 91.69 | 92.05 | 91.87 |
-|Roberta-base + CRF (this repo)| CoNLL-2003 | **91.88** | **93.01** |**92.44**|
-|BERT-base-cased + CRF (this repo)| OntoNotes 5 |89.57 | 89.45 | 89.51 |
-|Roberta-base + CRF (this repo)| OntoNotes 5 | **90.12** | **91.25** |**90.68**|
+|BERT-base-cased (this repo)| CONLL-2003 | 91.69 | 92.05 | 91.87 |
+|Roberta-base (this repo)| CoNLL-2003 | **91.88** | **93.01** |**92.44**|
+|BERT-base-cased (this repo)| OntoNotes 5 |89.57 | 89.45 | 89.51 |
+|Roberta-base (this repo)| OntoNotes 5 | **90.12** | **91.25** |**90.68**|
 
 More [details](/docs/transformers_benchmark.md)
 
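The revert only changes the model names in the table; the scores are untouched. As a quick sanity check on those rows, the sketch below recomputes F1 as the harmonic mean of the listed precision and recall. The helper name `f1_score` and the `rows` dictionary are illustrative and not part of the repository, and the standard F1 definition is assumed to be the one the benchmark uses.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the README table rows.
rows = {
    "BERT-base-cased / CoNLL-2003": (91.69, 92.05, 91.87),
    "Roberta-base / CoNLL-2003": (91.88, 93.01, 92.44),
    "BERT-base-cased / OntoNotes 5": (89.57, 89.45, 89.51),
    "Roberta-base / OntoNotes 5": (90.12, 91.25, 90.68),
}

for name, (p, r, reported) in rows.items():
    computed = round(f1_score(p, r), 2)
    print(f"{name}: computed F1 = {computed:.2f}, reported F1 = {reported:.2f}")
```

Under that assumption, each recomputed value agrees with the reported F1 to two decimal places.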

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.

config/reader.py

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+#
+# @author: Allan
+#
+
+from tqdm import tqdm
+from common import Sentence, Instance
+from typing import List
+import re
+
+
+class Reader:
+
+    def __init__(self, digit2zero:bool=True):
+        """
+        Read the dataset into Instance
+        :param digit2zero: convert the digits into 0, which is a common practice for LSTM-CRF.
+        """
+        self.digit2zero = digit2zero
+        self.vocab = set()
+
+    def read_txt(self, file: str, number: int = -1) -> List[Instance]:
+        print("Reading file: " + file)
+        insts = []
+        with open(file, 'r', encoding='utf-8') as f:
+            words = []
+            ori_words = []
+            labels = []
+            for line in tqdm(f.readlines()):
+                line = line.rstrip()
+                if line == "":
+                    insts.append(Instance(Sentence(words, ori_words), labels))
+                    words = []
+                    ori_words = []
+                    labels = []
+                    if len(insts) == number:
+                        break
+                    continue
+                ls = line.split()
+                word, label = ls[0],ls[-1]
+                ori_words.append(word)
+                if self.digit2zero:
+                    word = re.sub(r'\d', '0', word) # replace each digit with 0 (raw string avoids an invalid-escape warning).
+                words.append(word)
+                self.vocab.add(word)
+                labels.append(label)
+        print("number of sentences: {}".format(len(insts)))
+        return insts
+
+
+
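To make the parsing logic in `read_txt` easier to follow, here is a minimal, self-contained sketch of the same idea: blank lines separate sentences, the first whitespace-separated column is the word, the last column is the label, and digits are optionally mapped to `0`. The function `read_conll_lines` and the `Example` tuple are hypothetical stand-ins for the repository's `Reader`, `Sentence` and `Instance`, which are not reproduced here. The sketch also deviates from the diff in two labeled ways: it ignores repeated blank lines, and it flushes a final sentence that is not followed by a blank line (the loop in the diff only appends a sentence when it sees a blank line).

```python
import re
from typing import List, Tuple

# Hypothetical stand-in for the repo's Instance:
# (normalized words, original words, labels)
Example = Tuple[List[str], List[str], List[str]]


def read_conll_lines(lines: List[str], digit2zero: bool = True,
                     number: int = -1) -> List[Example]:
    """Group CoNLL-style lines into sentences; stop after `number` sentences."""
    examples: List[Example] = []
    words: List[str] = []
    ori_words: List[str] = []
    labels: List[str] = []
    for line in lines:
        line = line.rstrip()
        if line == "":
            if words:  # deviation from the diff: skip repeated blank lines
                examples.append((words, ori_words, labels))
                words, ori_words, labels = [], [], []
                if len(examples) == number:
                    break
            continue
        fields = line.split()
        word, label = fields[0], fields[-1]  # first column: word, last: label
        ori_words.append(word)
        if digit2zero:
            word = re.sub(r"\d", "0", word)  # replace every digit with 0
        words.append(word)
        labels.append(label)
    if words:  # deviation from the diff: keep a last sentence with no blank line
        examples.append((words, ori_words, labels))
    return examples


sample = """EU B-ORG
rejects O
German B-MISC
call O

Peter B-PER
Blackburn I-PER
1996-08-22 O
""".splitlines()

examples = read_conll_lines(sample)
for norm_words, orig_words, tags in examples:
    print(norm_words, orig_words, tags)
```

Keeping both the normalized and the original words mirrors the `words` / `ori_words` pair in the diff, so the original surface forms remain available after digit normalization.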
File renamed without changes.
File renamed without changes.
