@@ -9,10 +9,10 @@ We strictly follow the optimizer configuration as in HuggingFace and use batch s
| -------| ------- | :---------: | :------: | :--: |
| HuggingFace Default (bert-base-cased)| Test Set | 90.71 | 92.04 | 91.37 |
| HuggingFace Default (roberta-base)* | Test Set | 89.41 | 91.47 | 90.43 |
- | BERT-base-cased (this repo)| Test set | 91.69 | 92.05 | 91.87 |
- | BERT-large-cased (this repo)| Test Set | 92.03 | 92.17 | 92.10 |
- | Roberta-base (this repo)| Test Set | 91.88 | 93.01 | 92.44|
- | Roberta-large (this repo)| Test Set | ** 92.27** | ** 93.18** | ** 92.72** |
+ | BERT-base-cased + CRF (this repo)| Test Set | 91.69 | 92.05 | 91.87 |
+ | BERT-large-cased + CRF (this repo)| Test Set | 92.03 | 92.17 | 92.10 |
+ | Roberta-base + CRF (this repo)| Test Set | 91.88 | 93.01 | 92.44 |
+ | Roberta-large + CRF (this repo)| Test Set | **92.27** | **93.18** | **92.72** |
HuggingFace Default (roberta-base)* has a tokenization issue: words are encoded without their leading space.
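To see why the missing leading space matters: RoBERTa's byte-level BPE marks word-initial tokens with a leading-space symbol ("Ġ"), so encoding each word without restoring that space yields different token ids than the model saw in pretraining (in `transformers`, the RoBERTa tokenizer's `add_prefix_space=True` option restores it). A toy sketch with a made-up two-entry vocabulary, not the real tokenizer:

```python
# Toy illustration of RoBERTa-style byte-level BPE: word-initial tokens carry a
# leading-space marker ("Ġ"). TOY_VOCAB and encode_word are hypothetical.
TOY_VOCAB = {"York": 0, "ĠYork": 1}

def encode_word(word: str, add_prefix_space: bool) -> int:
    """Look up one word, optionally restoring the leading space first."""
    key = ("Ġ" + word) if add_prefix_space else word
    return TOY_VOCAB[key]

# A mid-sentence word encoded per-word without the prefix space gets the
# sentence-initial variant (id 0) instead of the expected mid-sentence id 1.
print(encode_word("York", add_prefix_space=True))   # 1
print(encode_word("York", add_prefix_space=False))  # 0
```

This is why word-by-word NER pipelines built on RoBERTa need to re-insert the space before each non-initial word.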
We didn't achieve 92.4 F1 as reported in the BERT paper.
@@ -25,8 +25,8 @@ I think one of the main reasons is they are using the document-level dataset ins
| Model| Dataset | Precision | Recall | F1 |
| -------| ------- | :---------: | :------: | :--: |
- | BERT-base-cased (this repo)| Test Set | 89.57 | 89.45 | 89.51 |
- | BERT-large-cased (this repo)* | Test Set | - | -| -|
- | Roberta-base (this repo)| Test Set | ** 90.12** | ** 91.25** | ** 90.68** |
+ | BERT-base-cased + CRF (this repo)| Test Set | 89.57 | 89.45 | 89.51 |
+ | BERT-large-cased + CRF (this repo)* | Test Set | - | - | - |
+ | Roberta-base + CRF (this repo)| Test Set | **90.12** | **91.25** | **90.68** |
BERT-large-cased (this repo)* is still running; its results are not available yet.
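The "+ CRF" rows above add a linear-chain conditional random field on top of the transformer's per-token tag scores: the CRF learns transition scores between adjacent tags, and prediction uses Viterbi decoding to pick the highest-scoring tag sequence. A minimal pure-Python decoding sketch; the emission and transition scores below are made-up numbers, not trained weights:

```python
# Viterbi decoding for a linear-chain CRF over NER tags.
def viterbi_decode(emissions, transitions):
    """emissions: [seq_len][num_tags] scores; transitions: [num_tags][num_tags]."""
    num_tags = len(emissions[0])
    # score[t] = best score of any tag path ending in tag t at current position
    score = list(emissions[0])
    back = []  # back[i][t] = best predecessor tag for tag t at position i + 1
    for emit in emissions[1:]:
        prev, score, pointers = score, [], []
        for t in range(num_tags):
            best_prev = max(range(num_tags), key=lambda p: prev[p] + transitions[p][t])
            score.append(prev[best_prev] + transitions[best_prev][t] + emit[t])
            pointers.append(best_prev)
        back.append(pointers)
    # Trace back-pointers from the best final tag to recover the full path.
    best = max(range(num_tags), key=lambda t: score[t])
    path = [best]
    for pointers in reversed(back):
        best = pointers[best]
        path.append(best)
    return path[::-1]

# Hypothetical scores for 3 tokens and 2 tags (0 = O, 1 = ENT); the transition
# matrix rewards staying in the same tag, so decoding favors contiguous spans.
emissions = [[1.0, 3.0], [2.0, 2.5], [3.0, 0.0]]
transitions = [[1.0, -1.0], [-1.0, 1.0]]
print(viterbi_decode(emissions, transitions))  # [1, 1, 0]
```

Note the middle token is tagged ENT even though its O emission is slightly higher: the transition scores make the contiguous path win, which is the behavior the CRF layer adds over per-token argmax.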