This work was presented at ACL 2020.
This project automatically predicts the CEFR grades of L2 English speakers from ASR transcriptions of a business English exam. The motivation for the no-audio scenario is to simulate smart speaker systems: for privacy reasons, the APIs of smart speakers (such as Alexa and Google Home) expose only transcriptions to third parties.
The two key experiments performed in the paper are:
The auxiliary objectives investigated in this work were:
The paper also provides some analysis on the impact of filled pauses and ASR word error rate on the performance of a speech grader.
The speech graders were built using Python, PyTorch and the HuggingFace transformers library.
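To make the setup concrete, here is a minimal sketch of a transcription-based grader in plain PyTorch: token embeddings, a Transformer encoder, mean pooling, and a scalar regression head for the proficiency score. All hyperparameters, the class name, and the architecture details are illustrative assumptions, not the configuration used in the paper (which builds on the HuggingFace transformers library).

```python
import torch
import torch.nn as nn

class SpeechGrader(nn.Module):
    """Illustrative speech grader: embed ASR tokens, encode them with a
    small Transformer, mean-pool over the sequence, and regress a single
    CEFR-style score. Sizes below are toy values, not the paper's."""

    def __init__(self, vocab_size=10000, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)  # scalar proficiency score

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer IDs from an ASR transcription
        h = self.encoder(self.embed(token_ids))
        pooled = h.mean(dim=1)  # average token representations
        return self.head(pooled).squeeze(-1)  # (batch,) of scores

# Two dummy "transcriptions" of 20 tokens each
tokens = torch.randint(0, 10000, (2, 20))
scores = SpeechGrader()(tokens)
print(scores.shape)  # one score per transcription
```

A real system would tokenize the ASR output with the pretrained model's tokenizer and fine-tune a pretrained encoder rather than training from scratch.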