Given the growing threat of coronavirus disease 2019 (COVID-19), it is essential to explore methods and resources that can predict expected case numbers and identify the locations of outbreaks. We therefore conducted this study to explore the potential use of Google Trends (GT) for predicting the COVID-19 outbreak in India.
The Google search terms used for the analysis were “coronavirus”, “COVID”, “COVID 19”, “corona”, and “virus”. We obtained GT data for these terms from Google Web, News, and YouTube searches, along with data on COVID-19 case numbers. Spearman correlation and lag correlation were used to determine the correlation between COVID-19 cases and the Google search terms.
“Coronavirus” and “corona” were the terms most commonly used by Internet users in India. The correlation of GTs for the search terms “coronavirus” and “corona” was high (
Our study revealed that, in India, GTs may predict COVID-19 outbreaks 2 to 3 weeks earlier than routine disease surveillance. Google search data may be considered a supplementary tool in COVID-19 monitoring and planning in India.
Coronavirus disease 2019 (COVID-19) is spreading rapidly across the globe and has become a significant public health threat to humankind, infecting millions worldwide [
India has an established disease surveillance system, the Integrated Disease Surveillance Program (IDSP), to identify the signals, suspects, and cases of certain notified diseases [
Internet usage in India has been rising, reaching about 451 million (36%) monthly active users, two-thirds of whom are daily users [
It has been shown that the relative search volumes (RSVs) of disease-specific terms in GTs can predict outbreaks of those diseases in India [
Our study was based on Google Trends, which draws on the most commonly used search engine in India, and used different keywords that the public might have used to access information on COVID-19 from January 30, 2020 to April 15, 2020. All data used in our study were openly available, and no explicit permission was required to use them.
The Google Trends homepage (
Web search is a generic search covering all content types, whether images, videos, or text news. News search is restricted to articles published in the media. The RSVs of each search term over the study period were retrieved from GTs for India [
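Google describes Trends values as search interest relative to the highest point for the selected region and period, with 100 marking peak popularity; the normalization behind the raw figures is not disclosed. The sketch below illustrates only that final 0 to 100 rescaling, assuming hypothetical raw daily search shares (Google does not publish them); the function name `to_relative_search_volume` is our own.

```python
def to_relative_search_volume(search_share):
    """Rescale a daily series so that its highest value becomes 100.

    search_share: hypothetical raw values (e.g. a term's share of all
    searches per day). Google does not publish these, so this only
    illustrates the 0-100 scaling seen in Google Trends output.
    """
    peak = max(search_share)
    if peak <= 0:
        # No search interest in the period: every point is 0.
        return [0 for _ in search_share]
    # Express each day relative to the peak day, rounded to whole numbers.
    return [round(100 * value / peak) for value in search_share]


# Hypothetical shares (per million searches) on four consecutive days.
print(to_relative_search_volume([2, 10, 40, 30]))  # [5, 25, 100, 75]
```

Because each download is rescaled to its own peak, RSVs are comparable within a single query but not directly across separately downloaded series.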
The numbers of daily new confirmed cases and cumulative confirmed cases in India up to April 15 were obtained from
Data were downloaded in Excel format, and the analysis was performed using SPSS trial version 26.0 (IBM, Armonk, NY, USA). Spearman correlation was used to assess the correlation of the Google search terms with daily new confirmed cases and with cumulative cases. To examine temporal relationships at lags of up to 30 days, we also performed a lag correlation analysis. An
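The correlation and lag steps described above can be sketched as follows. This is a plain-Python illustration, not the SPSS procedure the authors used: Spearman's rho is computed as the Pearson correlation of tie-averaged ranks (its standard definition), and at lag k the search series is paired with cases k days later, so `searches[t]` is matched with `cases[t + k]`. The function names and the toy series in the usage lines are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def lag_correlations(searches, cases, max_lag=30):
    """Map each lag k (days) to rho between searches on day t and cases on day t + k."""
    results = {}
    for k in range(max_lag + 1):
        x = searches[: len(searches) - k]  # search volume k days earlier
        y = cases[k:]                      # cases on the later day
        if len(x) < 3:
            break  # too few paired days left to correlate
        results[k] = spearman(x, y)
    return results


# Hypothetical series in which cases repeat the search pattern 3 days later.
searches = [5, 1, 4, 9, 2, 8, 3, 10, 7, 6, 12, 11]
cases = [0, 0, 0] + searches[:9]
rhos = lag_correlations(searches, cases, max_lag=5)
print(max(rhos, key=rhos.get))  # 3
```

Ranking before correlating makes rho depend only on the ordering of days, which suits search volumes and case counts that rise nonlinearly.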
Search queries have been widely used to predict disease outbreaks all over the world [
We found that GTs from Google Web, Google News, and YouTube correlated strongly with both cumulative and new COVID-19 case numbers. The longest lag for predicting COVID-19 cases was 21 days, observed for the News search of the term “coronavirus”; that is, the search volume for “coronavirus” peaked 21 days before the peak number of cases. Li et al. [
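The sentence above reads the 21-day lag as a gap between peaks. A minimal sketch of that peak-to-peak reading is shown below, assuming aligned daily series of search volume and daily new cases (cumulative counts never decrease, so their maximum is reached at the end of the study period). The function name and toy data are hypothetical, and the gap between peaks need not coincide with the lag at which the lag correlation is strongest.

```python
from datetime import date, timedelta


def peak_lag_days(dates, searches, new_cases):
    """Days from the peak of search volume to the peak of daily new cases.

    A positive result means searches peaked first. If a series has several
    equal maxima, max() picks the earliest one.
    """
    search_peak = max(range(len(searches)), key=lambda i: searches[i])
    case_peak = max(range(len(new_cases)), key=lambda i: new_cases[i])
    return (dates[case_peak] - dates[search_peak]).days


# Hypothetical 77-day series from January 30, 2020: searches peak on day 10,
# daily new cases on day 31.
start = date(2020, 1, 30)
dates = [start + timedelta(days=i) for i in range(77)]
searches = [100 if i == 10 else i % 7 for i in range(77)]
new_cases = [500 if i == 31 else i for i in range(77)]
print(peak_lag_days(dates, searches, new_cases))  # 21
```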
The longer lag observed for India may be attributable to Indians having already been sensitized to the disease by news from China and other countries, which could have influenced their search behavior. The Internet search patterns and behavior of a population depend on various factors, such as peer groups, mass media bulletins, government actions, and social media interactions, which are among the determinants of health-seeking behavior [
In recent years, GTs have been widely explored as a means of predicting various diseases. Shin et al. [
In contrast, Provenzano et al. [
However, our findings should be interpreted cautiously because of the following limitations. We included only English-language search terms; India is a multilingual country, but search terms in the other major Indian languages were not considered. The fundamental measure of association studied here is correlation, and even a strong correlation is not, by itself, sufficient evidence for making GTs a primary surveillance tool [
The details of the algorithm Google uses to generate these search data are also unclear. For GTs to be an effective predictor, a large proportion of the country's population must be regular Internet users [
This phenomenon might have occurred in our study: we observed spikes in searches for COVID-19-related keywords whenever the WHO or the Indian government announced a landmark decision, which may have received wider media coverage. Such announcements might have caused disproportionate swings in the public's Internet search patterns and led to an overestimation of the actual situation on the ground. Conversely, when the general public has poor knowledge about a disease, GTs tend to underestimate its epidemiological burden [
In conclusion, our study revealed that Google Web, YouTube, and News searches might be useful for predicting COVID-19 outbreaks 2 to 3 weeks earlier than the routine disease surveillance or reporting system in India. This approach can be further explored and tested for each state in India, using search terms in state-specific languages. However, Google search data may be considered only a supplementary tool in COVID-19 monitoring and planning in India until more evidence is generated on its reliability and real-time predictive performance. Further, positive search terms related to public awareness, such as “handwashing” and “masks”, could be explored for their usefulness in assessing the overall effectiveness of measures to prevent COVID-19 transmission.
No potential conflict of interest relevant to this article was reported.
Time series plots of Google Trends relative search volume (RSV) in Web search, YouTube search, and News search.
Correlation matrix of “coronavirus” and “corona” keywords used in different sub-searches with cumulative confirmed cases and daily new cases.
Lag correlation of cumulative confirmed cases (A, B, C) and daily new cases (D, E, F) with data from Google Trends.
Lag correlation coefficients and SE between Google Trends data and cumulative laboratory-confirmed cases in India, January 30, 2020 to April 15, 2020
Days earlier | Web: Coronavirus | SE | Web: Corona | SE | YouTube: Coronavirus | SE | YouTube: Corona | SE | News: Coronavirus | SE | News: Corona | SE |
---|---|---|---|---|---|---|---|---|---|---|---|---|
30 | 0.201 | 0.146 | 0.171 | 0.146 | 0.339 | 0.146 | 0.293 | 0.146 | 0.454 | 0.146 | 0.375 | 0.146 |
29 | 0.232 | 0.144 | 0.209 | 0.144 | 0.376 | 0.144 | 0.332 | 0.144 | 0.484 | 0.144 | 0.411 | 0.144 |
28 | 0.257 | 0.143 | 0.237 | 0.143 | 0.402 | 0.143 | 0.367 | 0.143 | 0.517 | 0.143 | 0.448 | 0.143 |
27 | 0.294 | 0.141 | 0.268 | 0.141 | 0.443 | 0.141 | 0.412 | 0.141 | 0.552 | 0.141 | 0.479 | 0.141 |
26 | 0.328 | 0.140 | 0.307 | 0.140 | 0.487 | 0.140 | 0.460 | 0.140 | 0.585 | 0.140 | 0.515 | 0.140 |
25 | 0.363 | 0.139 | 0.350 | 0.139 | 0.535 | 0.139 | 0.507 | 0.139 | 0.617 | 0.139 | 0.551 | 0.139 |
24 | 0.401 | 0.137 | 0.394 | 0.137 | 0.581 | 0.137 | 0.553 | 0.137 | 0.644 | 0.137 | 0.585 | 0.137 |
23 | 0.439 | 0.136 | 0.440 | 0.136 | 0.624 | 0.136 | 0.595 | 0.136 | 0.673 | 0.136 | 0.620 | 0.136 |
22 | 0.475 | 0.135 | 0.483 | 0.135 | 0.662 | 0.135 | 0.634 | 0.135 | 0.698 | 0.135 | 0.651 | 0.135 |
21 | 0.513 | 0.134 | 0.524 | 0.134 | 0.692 | 0.134 | 0.666 | 0.134 | 0.678 | 0.134 | | |
20 | 0.551 | 0.132 | 0.562 | 0.132 | 0.699 | 0.132 | 0.706 | 0.132 | | | | |
19 | 0.588 | 0.131 | 0.600 | 0.131 | | | | | | | | |
18 | 0.624 | 0.130 | 0.638 | 0.130 | | | | | | | | |
17 | 0.658 | 0.129 | 0.673 | 0.129 | | | | | | | | |
16 | 0.685 | 0.128 | | | | | | | | | | |
15 | | | | | | | | | | | | |
14 | | | | | | | | | | | | |
13 | | | | | | | | | | | | |
12 | | | | | | | | | | | | |
11 | | | | | | | | | | | | |
10 | | | | | | | | | | | | |
9 | | | | | | | | | | | | |
8 | | | | | | | | | | | | |
7 | | | | | | | | | | | | |
6 | | | | | | | | | | | | |
5 | | | | | | | | | | | | |
4 | | | | | | | | | | | | |
3 | | | | | | | | | | | | |
2 | | | | | | | | | | | | |
1 | | | | | | | | | | | | |
0 | | | | | | | | | | | | |
Value in bold text shows high correlation with
SE: standard error.
Lag correlation coefficients and SE between Google Trends data and daily new cases in India, January 30, 2020 to April 15, 2020
Days earlier | Web: Coronavirus | SE | Web: Corona | SE | YouTube: Coronavirus | SE | YouTube: Corona | SE | News: Coronavirus | SE | News: Corona | SE |
---|---|---|---|---|---|---|---|---|---|---|---|---|
30 | 0.188 | 0.146 | 0.155 | 0.146 | 0.328 | 0.146 | 0.280 | 0.146 | 0.443 | 0.146 | 0.360 | 0.146 |
29 | 0.225 | 0.144 | 0.202 | 0.144 | 0.367 | 0.144 | 0.322 | 0.144 | 0.470 | 0.144 | 0.396 | 0.144 |
28 | 0.256 | 0.143 | 0.235 | 0.143 | 0.397 | 0.143 | 0.363 | 0.143 | 0.516 | 0.143 | 0.444 | 0.143 |
27 | 0.293 | 0.141 | 0.271 | 0.141 | 0.438 | 0.141 | 0.409 | 0.141 | 0.540 | 0.141 | 0.473 | 0.141 |
26 | 0.327 | 0.140 | 0.309 | 0.140 | 0.481 | 0.140 | 0.457 | 0.140 | 0.578 | 0.140 | 0.516 | 0.140 |
25 | 0.355 | 0.139 | 0.341 | 0.139 | 0.520 | 0.139 | 0.495 | 0.139 | 0.602 | 0.139 | 0.537 | 0.139 |
24 | 0.391 | 0.137 | 0.386 | 0.137 | 0.566 | 0.137 | 0.543 | 0.137 | 0.621 | 0.137 | 0.570 | 0.137 |
23 | 0.431 | 0.136 | 0.434 | 0.136 | 0.609 | 0.136 | 0.583 | 0.136 | 0.655 | 0.136 | 0.605 | 0.136 |
22 | 0.466 | 0.135 | 0.478 | 0.135 | 0.645 | 0.135 | 0.622 | 0.135 | 0.672 | 0.135 | 0.631 | 0.135 |
21 | 0.503 | 0.134 | 0.514 | 0.134 | 0.674 | 0.134 | 0.649 | 0.134 | 0.661 | 0.134 | | |
20 | 0.537 | 0.132 | 0.552 | 0.132 | 0.680 | 0.132 | 0.681 | 0.132 | | | | |
19 | 0.575 | 0.131 | 0.589 | 0.131 | | | | | | | | |
18 | 0.612 | 0.130 | 0.626 | 0.130 | | | | | | | | |
17 | 0.644 | 0.129 | 0.661 | 0.129 | | | | | | | | |
16 | 0.672 | 0.128 | 0.689 | 0.128 | | | | | | | | |
15 | 0.686 | 0.127 | | | | | | | | | | |
14 | | | | | | | | | | | | |
13 | | | | | | | | | | | | |
12 | | | | | | | | | | | | |
11 | | | | | | | | | | | | |
10 | | | | | | | | | | | | |
9 | | | | | | | | | | | | |
8 | | | | | | | | | | | | |
7 | | | | | | | | | | | | |
6 | | | | | | | | | | | | |
5 | 0.693 | 0.118 | | | | | | | | | | |
4 | 0.698 | 0.117 | 0.669 | 0.117 | | | | | | | | |
3 | 0.676 | 0.116 | 0.646 | 0.116 | | | | | | | | |
2 | 0.659 | 0.115 | 0.695 | 0.115 | 0.635 | 0.115 | 0.677 | 0.115 | | | | |
1 | 0.640 | 0.115 | 0.684 | 0.115 | 0.599 | 0.115 | 0.657 | 0.115 | | | | |
0 | 0.606 | 0.114 | 0.676 | 0.114 | 0.574 | 0.114 | 0.667 | 0.114 | | | | |
Value in bold text shows high correlation with
SE: standard error.
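In both tables the SE depends only on the lag, not on the search term, search type, or case series. The printed values are consistent with the common large-sample approximation SE ≈ 1/sqrt(n − k) for a lag-k cross-correlation (which assumes no true cross-correlation and at least one white-noise series), with n = 77 daily observations and k the lag in days. This is an inference from the tabulated numbers, not a formula stated by the authors; the check below recomputes it for every lag that has a printed SE.

```python
from datetime import date
from math import sqrt

# Number of daily observations: January 30 to April 15, 2020, inclusive.
N_DAYS = (date(2020, 4, 15) - date(2020, 1, 30)).days + 1

# SE by lag ("days earlier") as printed in the two tables; lags 6 to 14 show none.
TABLE_SE = {
    0: 0.114, 1: 0.115, 2: 0.115, 3: 0.116, 4: 0.117, 5: 0.118,
    15: 0.127, 16: 0.128, 17: 0.129, 18: 0.130, 19: 0.131, 20: 0.132,
    21: 0.134, 22: 0.135, 23: 0.136, 24: 0.137, 25: 0.139, 26: 0.140,
    27: 0.141, 28: 0.143, 29: 0.144, 30: 0.146,
}


def approx_se(n, lag):
    """Approximate SE of a lag-k cross-correlation: 1 / sqrt(n - k)."""
    return 1 / sqrt(n - lag)


# Collect any lag whose printed SE differs from the rounded approximation.
mismatches = {
    lag: se
    for lag, se in TABLE_SE.items()
    if round(approx_se(N_DAYS, lag), 3) != se
}
print(N_DAYS, mismatches)  # 77 {}
```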