This notebook automates downloading the CORD-19 dataset from kaggle.com.

Making sure that all requirements are installed.
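A requirements check of this kind could look like the minimal sketch below. The helper name `missing_packages` and the package list (`kaggle`, `pandas`) are assumptions made for illustration; the notebook's real requirements may differ.

```python
import importlib.util

# Hypothetical requirement list; adjust to the notebook's real dependencies.
REQUIRED_PACKAGES = ["kaggle", "pandas"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_packages(REQUIRED_PACKAGES)` returns an empty list when everything is importable. Any name it does return can then be installed with pip before continuing.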

 

Preparing destination folder
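One way to prepare the folder is sketched below with `pathlib`. The helper name `prepare_destination` is hypothetical. The default path mirrors the `target_path` default of `download_cord19_dataset`.

```python
from pathlib import Path


def prepare_destination(target_path="./datasets/CORD-19-research-challenge"):
    """Create the destination folder (and any missing parents) and return it as a Path."""
    target = Path(target_path)
    # exist_ok=True makes the call safe to repeat when the notebook is re-run.
    target.mkdir(parents=True, exist_ok=True)
    return target
```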

download_cord19_dataset

download_cord19_dataset(target_path='./datasets/CORD-19-research-challenge')
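For readers who want to see what such a function might do internally, here is a rough sketch built on the official `kaggle` Python package. It is not the notebook's actual implementation. The function name `download_cord19_sketch`, the `api` parameter, and the dataset slug in `CORD19_SLUG` are assumptions introduced for illustration.

```python
from pathlib import Path

# Assumed Kaggle dataset slug; it is not stated in this notebook.
CORD19_SLUG = "allen-institute-for-ai/CORD-19-research-challenge"


def download_cord19_sketch(target_path="./datasets/CORD-19-research-challenge", api=None):
    """Illustrative stand-in for download_cord19_dataset; returns the target folder."""
    target = Path(target_path)
    target.mkdir(parents=True, exist_ok=True)

    if api is None:
        # Imported lazily so the sketch can be loaded without kaggle credentials.
        from kaggle.api.kaggle_api_extended import KaggleApi
        api = KaggleApi()

    print("Authenticating with kaggle.com...")
    api.authenticate()  # reads credentials from kaggle.json or environment variables

    print("Proceeding to download the dataset. This might take a while.")
    # unzip=True extracts the archive after download; quiet=False shows a progress bar.
    api.dataset_download_files(CORD19_SLUG, path=str(target), unzip=True, quiet=False)
    return target
```

The optional `api` argument is a design choice for testability: passing a stub object avoids a multi-gigabyte download when only the control flow is being checked.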

download_cord19_dataset()
Authenticating with kaggle.com...
Proceeding to download the dataset. This might take a while.
  0%|          | 0.00/4.37G [00:00<?, ?B/s]
Downloading CORD-19-research-challenge.zip to datasets/CORD-19-research-challenge
100%|██████████| 4.37G/4.37G [11:30<00:00, 6.79MB/s]

/Users/lmarti/.pyenv/versions/3.8.2/envs/risotto/lib/python3.8/site-packages/IPython/core/interactiveshell.py:3263: DtypeWarning: Columns (1,5,6,13,14,15,16) have mixed types.Specify dtype option on import or set low_memory=False.
  if (await self.run_code(code, result,  async_=asy)):
Total records loaded: 187938
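The `DtypeWarning` above is raised by pandas when it infers column types chunk by chunk and the chunks disagree. A minimal sketch of one way to avoid it follows, using the `low_memory=False` option that the warning itself suggests. The helper name `load_metadata` is hypothetical, and which CSV file the notebook actually loads is an assumption.

```python
import pandas as pd


def load_metadata(csv_path):
    """Load a CSV file in a single pass so pandas infers one dtype per column."""
    # low_memory=False trades higher memory use for consistent type inference.
    return pd.read_csv(csv_path, low_memory=False)
```

Passing an explicit `dtype` mapping for the affected columns is the other option named in the warning, and it uses less memory on large files.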