Resources

Dataset

Dataset for Japanese Short Answer Scoring
MED
- Hitomi Yanaka, Koji Mineshima, Daisuke Bekki, Kentaro Inui, Satoshi Sekine, Lasha Abzianidze, and Johan Bos. MED: Monotonicity Entailment Dataset. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP 2019).
HELP
- Hitomi Yanaka, Koji Mineshima, Daisuke Bekki, Kentaro Inui, Satoshi Sekine, Lasha Abzianidze, and Johan Bos. HELP: A Dataset for Handling Entailments with Lexical and Logical Phenomena. In Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019).
Right-for-Right-Reasons Reading Comprehension (R4C)
- Naoya Inoue, Pontus Stenetorp, and Kentaro Inui. R4C: A Benchmark for Evaluating RC Systems to Get the Right Answer for the Right Reason. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020).
GitHub Typo Corpus
- Masato Hagiwara and Masato Mita. GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors. In Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020).
PheMT
- Ryo Fujii, Masato Mita, Kaori Abe, Kazuaki Hanawa, Makoto Morishita, Jun Suzuki, Kentaro Inui. PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on User-Generated Contents. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020).
COPA-SSE
- Ana Brassard, Benjamin Heinzerling, Pride Kavumba, and Kentaro Inui. COPA-SSE: Semi-structured Explanations for Commonsense Reasoning. Preprint arXiv:2201.06777.

Code

ELMo-Japanese
GEC Cross-Corpora Evaluation
- Masato Mita, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata, Kentaro Inui. Cross-Corpora Evaluation and Analysis of Grammatical Error Correction Models — Is Single-Corpus Evaluation Enough? In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019).
Pseudo Data for GEC
- Shun Kiyono, Jun Suzuki, Masato Mita, Tomoya Mizumoto, and Kentaro Inui. An Empirical Study of Incorporating Pseudo Data into Grammatical Error Correction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019).
BERT-GEC
- Masahiro Kaneko, Masato Mita, Shun Kiyono, Jun Suzuki, Kentaro Inui. Can Encoder-decoder Models Benefit from Pre-trained Language Representation in Grammatical Error Correction? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020).
Instance-Based Named Entity Recognizer
- Hiroki Ouchi, Jun Suzuki, Sosuke Kobayashi, Sho Yokoi, Tatsuki Kuribayashi, Ryuto Konno, Kentaro Inui. Instance-Based Learning of Span Representations: A Case Study through Named Entity Recognition. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020).
LM-as-KB
- Benjamin Heinzerling and Kentaro Inui. Language Models as Knowledge Bases: On Entity Representations, Storage Capacity, and Paraphrased Queries. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021).
Shifted Absolute Position Embedding (SHAPE)
- Shun Kiyono, Sosuke Kobayashi, Jun Suzuki, Kentaro Inui. SHAPE: Shifted Absolute Position Embeddings for Transformers. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021).
SUQA: Summarizer-Augmented QA
- Naoya Inoue, Harsh Trivedi, Steven Sinha, Niranjan Balasubramanian, and Kentaro Inui. Summarize-then-Answer: Generating Concise Explanations for Multi-hop Reading Comprehension. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021).
Instance-Based Dependency Parser
- Hiroki Ouchi, Jun Suzuki, Sosuke Kobayashi, Sho Yokoi, Tatsuki Kuribayashi, Masashi Yoshikawa, Kentaro Inui. Instance-Based Neural Dependency Parsing. Transactions of the Association for Computational Linguistics 2021 (TACL 2021).
Feedback Comment Generation
- Kazuaki Hanawa, Ryo Nagata, Kentaro Inui. Exploring Methods for Generating Feedback Comments for Writing Learning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021).

NLU Resources

RIKEN AIP Natural Language Understanding Team
