Today I’m sharing a global call from a coalition of leading research groups for artificial intelligence talent to help with COVID-19 research. Public COVID-19 datasets can be found on Kaggle, along with additional information on the specific analysis tasks that are needed. The summary after my brief personal note comes directly from Kaggle.

*** On a personal note, please take care of yourselves and your families. Life is a precious gift to cherish. Be extra patient and kind with one another, as emotions may be difficult to understand, and assume the best intentions. Reach out to one another on video calls and FaceTime. If you are struggling with working alone from home for the first time, try to find joy in the little things: sitting in the sunlight, lighting a scented candle, working near your pet, opening a window, watching a birdfeeder, decorating with flowers, and/or listening to music. Try to focus on the positive and appreciate what is good in your life during this challenging time. If you can help with the Kaggle challenge, please do. ***

Summary from Kaggle

In response to the COVID-19 pandemic, the US White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). They are issuing a call to action to the world’s artificial intelligence experts to develop text and data mining tools that can help the medical community develop answers to high priority scientific questions. The CORD-19 dataset represents the most extensive machine-readable coronavirus literature collection available for data mining to date.

By sharing the data publicly, the worldwide AI research community has an opportunity to apply text and data mining approaches to find answers to questions within, and connect insights across, this content in support of the ongoing COVID-19 response efforts worldwide. There is a growing urgency for these approaches because of the rapid increase in coronavirus literature, making it difficult for the medical community to keep up.

A list of initial key questions can be found under the Tasks section. These key scientific questions are drawn from the NASEM’s SCIED (National Academies of Sciences, Engineering, and Medicine’s Standing Committee on Emerging Infectious Diseases and 21st Century Health Threats) research topics and the World Health Organization’s R&D Blueprint for COVID-19.

Many of these questions are suitable for text mining, and we encourage researchers to develop text mining tools to provide insights into these questions.


Kaggle is sponsoring a $1,000 per task award to the winner whose submission is identified as best meeting the evaluation criteria. The winner may elect to receive this award as a charitable donation to COVID-19 relief/research efforts or as a monetary payment. More details on the prizes and timeline can be found on the discussion post.

Accessing the Dataset

We have made this dataset available on Kaggle, and are periodically updating it from its source. To learn more and access the latest copy of the dataset, you can also go here:

The licenses for each dataset can be found in the all_sources_metadata CSV file.
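If you want a quick overview of which licenses apply before you start mining, a few lines of Python can tally them. This is only a minimal sketch, assuming the metadata CSV has a header row with a column named license; that column name and the helper names tally_licenses and count_licenses are my own assumptions, so check the header of the copy you actually download, since file names and columns can change between dataset releases.

```python
import csv
from collections import Counter


def tally_licenses(lines, license_column="license"):
    """Count records per license value from an iterable of CSV text lines.

    `lines` can be an open file or a list of strings. The default column name
    "license" is an assumption; pass the real header name if it differs.
    """
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or license_column not in reader.fieldnames:
        raise KeyError(f"column {license_column!r} not found in CSV header")
    counts = Counter()
    for row in reader:
        # Blank or missing cells are counted as "unknown" instead of being dropped.
        value = (row[license_column] or "").strip() or "unknown"
        counts[value] += 1
    return counts


def count_licenses(csv_path, license_column="license"):
    """Open the metadata CSV at `csv_path` (hypothetical path) and tally licenses."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return tally_licenses(f, license_column)
```

For example, count_licenses("path/to/all_sources_metadata.csv").most_common() would list license values from most to least frequent; the path here is a placeholder. Using the standard-library csv module keeps the sketch dependency-free, though pandas would work just as well for larger exploration.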


The dataset was created by the Allen Institute for AI in partnership with the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, and the National Library of Medicine – National Institutes of Health, in coordination with The White House Office of Science and Technology Policy.