Web Scraping for preparing training data for Machine Translation

October 3, 2020 in Blog

During my internship at Birch.ai, we wanted to test and improve the performance of our machine translation model. To do this, we first needed a foreign-language data set paired with its corresponding English data set, ideally one translated by humans rather than by machine. In this article, I share my experience scraping The New England Journal of Medicine from scratch using Python.

Linlin Li