I. Introduction
Prompt detection is a cornerstone of the control and prevention of infectious diseases. The Integrated Disease Surveillance Programme (IDSP) of India (http://www.idsp.nic.in) was launched in November 2004 as a project and was later converted to a programme. The IDSP is a one-stop portal through which almost 97% of Indian districts report disease surveillance data for 22 notifiable epidemic-prone diseases. The portal has facilities for surveillance and monitoring of disease trends and supports outbreak response through trained rapid response teams (RRTs) [1]. The data obtained using S (syndrome), P (presumptive), and L (laboratory) forms flow from bottom to top, i.e., from the community to the state/central level. It takes at least 7 to 10 days for the central surveillance unit to recognise an outbreak through the current reporting process. Therefore, any system that can supplement the existing system in gathering timely intelligence on infectious diseases may reduce the impact of outbreaks. A novel Internet-based surveillance approach driven by the Internet search behaviour of the community has recently emerged as a promising technique [2]. Teledensity in India has been increasing rapidly in the present decade, and the Internet has become indispensable to people [3,4]. A large proportion of Internet users go online to search for medical or health-related information [5]. Recent studies have also shown that the Internet is among the primary sources of information for the population actively using it [6,7,8,9].
Data generated from queries fed into search engines are recorded and can be used for surveillance purposes, just as they are used for marketing purposes. Targeted sources include Internet-search metrics, online news stories, social network data, and blog/microblog data [2]. The application of these data to monitoring systems of interest is called ‘nowcasting’ [10]. It can estimate the magnitude of outbreaks in their prodromal stages and produce timely information. Additionally, this near real-time technique can be implemented within the scope of existing infrastructure and human resources. Therefore, this approach is becoming more relevant in resource-constrained countries with already overburdened health systems. Studies from other parts of the world suggest that Google Trends can be a useful tool for disease surveillance [11,12]. It is crucial to study the application of this tool to the surveillance of communicable diseases in India, particularly those listed under the IDSP. This is the first study of its kind to assess the feasibility of using Internet-based surveillance systems for the prediction of disease outbreaks in India. The primary aim of this study was to evaluate the temporal correlation between Google Trends and conventional surveillance data generated for diseases reported under the IDSP in Haryana and Chandigarh, India.
IV. Discussion
The Google Trends-based prediction system can identify disease outbreaks well in advance for the studied diseases with modest reliability [14]. Real-time disease monitoring may alert the respective health departments and other stakeholders in the early phases of a disease outbreak, empowering them to initiate adequate response measures, including case finding, disease containment, and treatment accessibility, thus limiting the disease burden [15,16]. The application of disease surveillance has been tried in both communicable and non-communicable diseases in developing countries, with robust reporting systems and quick response teams. Bragazzi [11] reported the feasibility of a web-based surveillance system for monitoring non-suicidal self-injuries. Since the launch of Google Flu Trends in 2009, much-needed attention and respect have been devoted to the evolving branch of ‘digital epidemiology’ [17,18].
The investigation and application of Internet-based surveillance is widely recognised [19,20]. To date, it has not been used for any surveillance system in India. This is the first study reported from India to assess the potential use of Internet search trends for disease surveillance. The ‘P’ form data used here represent cases notified from both public and private health facilities and provide a holistic picture of the disease burden in the community on a weekly basis.
The present study demonstrates that an Internet-search-based surveillance system has the potential to effectively contribute to the control of various diseases. However, correlations alone should not be viewed as definitive evidence of impending outbreaks or epidemics as the analyses performed were univariate and exploratory in nature. The results of this study should be interpreted with caution keeping in mind the biological plausibility and natural history of the disease concerned.
The Internet-based surveillance system collects data and provides necessary information instantly, circumventing traditional administrative structures that impede information flow [10]. The epidemic curves for chikungunya and dengue in Haryana and Chandigarh are associated with the rainy season in Northern India and showed sharp peaks during 2016. Malaria showed a similar trend, with a broader curve than those of chikungunya and dengue. Enteric fever, on the other hand, is transmitted via the faecal-oral route or urine-oral route; thus, cases were reported throughout the year, with a peak around the rainy season. In addition, the IDSP reports for all the febrile illnesses included in the study showed a good positive correlation with each other, which adds to the robustness of the IDSP data retrieved from the P-forms.
The lag period used in this study was −4 to +4 weeks. This range was nearly two times the incubation period of any febrile illness studied. The negative lag period helps to estimate the approximate time at which primary cases occurred and to guide further analysis of biologically plausible associations. The observed maximum correlation 2 to 3 weeks before the actual outbreak provides sufficient time to deploy RRTs for timely action. Similarly, the positive lag period may help the surveillance team to confirm that the outbreak is over.
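The lagged comparison described above can be illustrated with a minimal sketch. It is not the analysis code used in this study: the function name `lagged_correlations`, the use of a plain Pearson coefficient on the overlapping weeks, and the sign convention (a negative lag means that the search series precedes the case series) are all assumptions made for illustration.

```python
import numpy as np

def lagged_correlations(search, cases, max_lag=4):
    """Correlate weekly search volume with weekly case counts at each lag.

    Assumed convention: at lag k, search[t + k] is paired with cases[t],
    so a negative k compares cases with searches made |k| weeks earlier.
    Returns a dict mapping each lag to a Pearson correlation coefficient.
    """
    search = np.asarray(search, dtype=float)
    cases = np.asarray(cases, dtype=float)
    if search.ndim != 1 or search.shape != cases.shape:
        raise ValueError("both series must be 1-D and of equal length")
    n = len(search)
    result = {}
    for k in range(-max_lag, max_lag + 1):
        if k < 0:
            s, c = search[: n + k], cases[-k:]   # searches lead cases
        elif k > 0:
            s, c = search[k:], cases[: n - k]    # searches trail cases
        else:
            s, c = search, cases
        # Pearson correlation over the weeks where both series overlap
        result[k] = float(np.corrcoef(s, c)[0, 1])
    return result

# Synthetic illustration only: cases reproduce searches two weeks later.
rng = np.random.default_rng(0)
base = rng.random(54)
demo = lagged_correlations(search=base[2:], cases=base[:-2])
best_lag = max(demo, key=demo.get)
```

Under this convention, a maximum at a negative lag would indicate that search activity rose before notified cases did. Statistical packages may use a different estimator or the opposite sign convention, so the lag direction should always be checked against the software documentation.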
The spike in Internet searches for a term such as ‘chikungunya’ may be attributed to various factors. It may be due to an increased number of cases in the community and to increased attention from social media. The media can be a source of bias, as it may seriously affect the trending of searches for a particular disease [21,22]. In northern India, an increase in chikungunya cases was first reported from the national capital and adjoining areas, which was highlighted by the media, and cases were later reported from Haryana and Chandigarh during the study period. Thus, this coverage may have increased interest in chikungunya among people in the adjoining states, which may be responsible for the sudden surge in Google Trends.
The studied febrile illnesses are common in India. Therefore, whenever a patient with fever visits a health facility, a battery of laboratory investigations is conducted depending on previous experience from the community. This list of investigations also serves as a driver for searches related to the diseases. However, these two processes, i.e., Internet searches as captured by Google Trends and the actual number of cases in the community and their notification, may not be mutually exclusive.
The study had the following limitations. First, the study used only the ‘P’ form data of malaria, enteric fever, chikungunya, and dengue. It did not use the ‘S’ form data because that form did not differentiate the fever cases reported. Similarly, the ‘L’ form data were not included in the analysis because case reporting is usually delayed for laboratory confirmation. There is also a need to test and establish the correlation of Internet search data with other diseases and other forms of IDSP data, and to demonstrate that Internet search data can be applied in all states. Second, in a country like India with varied cultures, a variety of languages are used as primary languages by mobile and Internet users. However, only English was used as the main language to retrieve the search results, which may have caused underreporting of cases and thus errors in the correlation. Third, the established correlation may not help to identify the exact place of an outbreak or epidemic at the intrastate and intra-district levels because Google Trends does not provide data at these levels. Fourth, this study assessed the performance of only the one term that had the maximum correlation with each of the febrile illnesses included in the study. Other search terms may also add to the burden of searches related to a particular disease. Despite this, we observed a positive correlation with all the febrile illnesses, though the strength varied. Finally, seasonal differencing could not be applied to the cross-correlations to remove cyclic seasonal trends, as IDSP data were available for only 1 year.
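The final limitation can be made concrete with a short sketch, given here only as an illustration and not as part of the study's analysis. Seasonal differencing of weekly data subtracts from each observation the value observed one season earlier, so it needs more than one full season of data. The function name `seasonal_difference` and the assumed season length of 52 weeks are hypothetical choices.

```python
import numpy as np

def seasonal_difference(series, period=52):
    """Return series[t] - series[t - period] for every t >= period.

    `period` is the assumed season length in weeks. With only one season
    of data, no observation has a counterpart one season earlier, so the
    differenced series would be empty and an error is raised instead.
    """
    series = np.asarray(series, dtype=float)
    if len(series) <= period:
        raise ValueError("need more than one full season of observations")
    return series[period:] - series[:-period]

# Synthetic illustration: two years sharing one seasonal shape, with the
# second year shifted upwards by 5 cases per week.
season = np.sin(2 * np.pi * np.arange(52) / 52)
two_years = np.concatenate([season, season + 5])
deseasonalised = seasonal_difference(two_years)  # seasonal shape cancels
```

With a single year of weekly counts the function raises an error, which mirrors why the seasonal component could not be removed from the series before cross-correlation.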
We recommend the use of an Internet-based surveillance system to supplement the existing IDSP system. Such a system can be tested at the field level for taking timely action, especially for epidemic-prone diseases. Future studies should focus on forecasting epidemics and outbreaks of various other diseases by using mathematical modelling that adjusts for other parameters. Search trends from social media platforms can also be assessed, along with trends from Google or other portal sites, for disease surveillance.
In conclusion, applying the approach of previous studies to specific diseases yielded similar results, and many other diseases should be studied at national and sub-national levels. Internet-based surveillance systems have broader applicability for the surveillance of infectious diseases than is currently recognised, especially in resource-constrained areas. Despite its huge potential, this method cannot be used as an alternative to traditional surveillance systems and can only supplement the existing system. Nevertheless, the results of this study suggest that Internet-based surveillance systems have a potential role in forecasting emerging infectious disease events.