Sentiment and Topic Modeling Analysis on Twitter Reveals Concerns over Cannabis-Containing Food after Cannabis Legalization in Thailand
Abstract
Objectives
Twitter has been used to express a diverse range of public opinions about cannabis legalization in Thailand. The purpose of this study was to observe changes in sentiments after cannabis legalization and to investigate health-related topics discussed on Twitter.
Methods
Tweets in Thai and English related to cannabis were scraped from Twitter between May 1 and June 13, 2022, during cannabis legalization in Thailand. Sentiment and topic-modeling analyses were used to compare the content of tweets before and after legalization. Health-related topics were manually grouped into categories by their content and rated according to the number of corresponding tweets.
Results
We collected 21,242 and 6,493 tweets, respectively, for Thai and English search terms. A sharp increase in the number of tweets related to cannabis legalization was detected at the time of its public announcement. Sentiment analysis in the Thai search group showed a significant change (p < 0.0001) in sentiment distribution after legalization, with increased negative and decreased positive sentiments. A significant change was not found in the English search group (p = 0.4437). Topic-modeling analysis revealed that concern over cannabis-containing food emerged as the leading issue after legalization in the Thai search group, but not in the English one. Topics related to cannabis tourism surfaced only in the English search group.
Conclusions
Since cannabis legalization, the primary health-related concern has been cannabis-containing food. Education and clear regulations on cannabis use are required to strengthen oversight of cannabis in the Thai population, as well as among medical tourists.
I. Introduction
Cannabis legalization is emerging as a critical issue in global health and policy landscapes. The World Drug Report 2022 of the United Nations Office on Drugs and Crime underscored the increase in daily cannabis consumption and its associated health ramifications in regions that have legalized the substance [1]. On June 9, 2022, Thailand embarked on a significant move by removing cannabis from Narcotic Category 5, making Thailand the first country in Southeast Asia to legalize cannabis [2]. This allowed individuals to sell cannabis after registering and applying through the Thai FDA website (https://plookganja.fda.moph.go.th/) [3]. People who would like to grow cannabis for in-house consumption are encouraged to register, but registration is not mandatory. There is also no limit on the number of cannabis plants that can be grown, sold, or possessed by individuals. The government discourages but does not prohibit recreational use of cannabis. Despite these provisions, Thailand lacks some regulations present in other countries, notably concerning tetrahydrocannabinol possession limits, plant cultivation, non-medical product tracking, and driving under the influence [4–6].
Twitter is the most suitable social media platform for gathering public opinion on a theme or event. Its real-time feed allows the immediate capture and analysis of user reactions. Hashtags and trends categorize discussions, making it easy to track specific topics. Public visibility enables access to a wide range of opinions from experts and the general public. Moreover, Twitter provides diverse perspectives that are essential for a comprehensive understanding of public sentiment on cannabis legalization, encompassing potential health implications, societal attitudes, and areas where regulation may be needed. Whereas other platforms offer opinion-gathering features, Twitter’s focus on real-time updates, hashtags, and public conversations makes it ideal for tracking and analyzing public opinion. Studies have already utilized Twitter for similar purposes. For example, to analyze public sentiment, Mann et al. collected the top hashtags and the topics discussed from Twitter during the United States House of Representatives’ vote regarding cannabis decriminalization [7].
Due to the lack of clear regulations during legalization and insufficient previous studies concerning the misuse of cannabis [8], it is important to quickly understand how people perceive legislative change and to see which health-related issues have been raised during this period. In this study, we utilized Twitter’s application programming interface (API) to systematically collect tweets in both Thai and English while cannabis legalization took place in Thailand. The data were preprocessed for further analysis by natural language processing (NLP) tools, including sentiment analysis and topic modeling, to understand sentiment changes and health topics discussed on Twitter during this period. With this information we aimed to improve cannabis oversight by public health agencies following legalization.
II. Methods
The overall workflow of the methodology is presented in Supplementary Figure S1.
1. Data Collection
To quickly understand the public perceptions on Twitter of local Thai people and global bystanders regarding cannabis legalization in Thailand on June 9, 2022 [9], we collected tweets through the Twitter API for academic research using the Tweepy Python package (version 4.9.0) before and after the legalization. For Thai language (TH) searching, we used กัญชา (cannabis). For the English language (EN), we modified a previous study on marijuana-related keywords to make them fit with our study [10]. The cannabis-related search terms were “cannabis Thailand” or “weed Thailand” or “marijuana Thailand” or “pot Thailand” or “blunt Thailand” or “mary jane.” The collection period was between May 1 and June 13, 2022. Retweets were excluded during data collection.
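The search described above can be sketched as a query string for the Twitter API v2. This is a minimal sketch, not the study's actual code: the helper name build_query, the treatment of multi-word terms as exact phrases, and the use of UTC for the date window are all assumptions. The `-is:retweet` operator is the standard API v2 way to exclude retweets.

```python
from datetime import datetime, timezone

# Search terms quoted from the Methods text.
THAI_TERMS = ["กัญชา"]
ENGLISH_TERMS = [
    "cannabis Thailand", "weed Thailand", "marijuana Thailand",
    "pot Thailand", "blunt Thailand", "mary jane",
]


def build_query(terms, exclude_retweets=True):
    """Join search terms with OR.

    Assumption: multi-word terms are quoted so they match as exact phrases;
    the paper does not say whether phrases or word conjunctions were used.
    """
    parts = [f'"{term}"' if " " in term else term for term in terms]
    query = "(" + " OR ".join(parts) + ")"
    if exclude_retweets:
        query += " -is:retweet"  # Twitter API v2 operator that drops retweets
    return query


# Collection window from the text (May 1 to June 13, 2022).
# Assumption: UTC boundaries; the end bound is exclusive so June 13 is covered.
START = datetime(2022, 5, 1, tzinfo=timezone.utc)
END = datetime(2022, 6, 14, tzinfo=timezone.utc)

thai_query = build_query(THAI_TERMS)
english_query = build_query(ENGLISH_TERMS)
```

With Tweepy 4.x, such a query could, for example, be passed to `tweepy.Client.search_all_tweets` together with `start_time` and `end_time`; that call requires academic-access credentials and is left out of the sketch.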
2. Data Preprocessing
Two preprocessing steps were applied to obtain a set of unique tweets before sentiment and topic-modeling analysis. First, non-letters (e.g., emoji, symbols such as “”.#, and numerals), @usernames, and hyperlinks were discarded. After this first round of preprocessing, the Thai tweets were translated into English using the deep-translator Python package (version 1.8.3). Next, the Contractions Python package (version 0.1.68) was used to expand all contractions and slang in English, and all tweets were converted to lowercase. Finally, redundant tweets were excluded. Hereafter, the Thai and English search word groups are denoted as cannabis (TH) and cannabis (EN), respectively.
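The cleaning and de-duplication steps can be illustrated with the standard library alone. The sketch below is an assumption-laden simplification rather than the study's pipeline: the regular expressions and the function names clean_tweet and unique_tweets are hypothetical, the letter filter keeps only Latin letters (so it suits English or already-translated text, not Thai script), and the translation and contraction-expansion steps are omitted.

```python
import re

# Hypothetical patterns; the paper does not list the exact expressions used.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^A-Za-z\s]")  # Latin letters only (assumption)


def clean_tweet(text):
    """Drop hyperlinks, @usernames, and non-letters, then lowercase."""
    text = URL_RE.sub(" ", text)         # remove links before touching symbols
    text = MENTION_RE.sub(" ", text)     # remove @usernames
    text = NON_LETTER_RE.sub(" ", text)  # remove emoji, symbols, numerals
    return " ".join(text.split()).lower()


def unique_tweets(tweets):
    """Clean every tweet and keep the first occurrence of each distinct text."""
    seen = set()
    result = []
    for tweet in tweets:
        cleaned = clean_tweet(tweet)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


raw = ["Check THIS out @user https://t.co/abc #weed 420!!", "Legal now!", "legal NOW"]
cleaned = unique_tweets(raw)
```

Removing links first matters because URLs contain symbols and letters that would otherwise survive the letter filter as stray fragments.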
To observe the effects of legalizing cannabis, two periods were compared. The period between May 1 and June 8, 2022 was set as before legalization, whereas the remaining period, from June 9 to June 13, 2022, was defined as after legalization. We analyzed the dynamics of Twitter usage by counting the number of tweets per day and comparing patterns between the “before” and “after” tweets.
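The period split and the per-day counts can be expressed compactly. The following sketch assumes that each tweet has already been reduced to its calendar date; the function names and the sample dates are hypothetical and serve only to show the boundary handling around June 9, 2022.

```python
from collections import Counter
from datetime import date

LEGALIZATION_DAY = date(2022, 6, 9)  # first day of the "after" period


def period_of(day):
    """Label a calendar date as 'before' or 'after' legalization."""
    return "before" if day < LEGALIZATION_DAY else "after"


def daily_counts(days):
    """Number of tweets posted on each calendar date."""
    return Counter(days)


def counts_by_period(days):
    """Number of tweets in the 'before' and 'after' periods."""
    return Counter(period_of(day) for day in days)


# Hypothetical posting dates, one entry per tweet.
sample = [
    date(2022, 5, 1), date(2022, 6, 8),
    date(2022, 6, 9), date(2022, 6, 9), date(2022, 6, 13),
]
per_day = daily_counts(sample)
per_period = counts_by_period(sample)
```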
3. Sentiment Analysis
Sentiment analysis was employed to understand Twitter users’ attitudes toward legalizing cannabis. In this study, we used the Transformers package for sentiment analysis. Transformers is an NLP library that provides pretrained machine-learning models, including models that classify sentiment in text data [11]. The Transformers package (version 4.18.0) was used with TensorFlow (version 2.8.0), a Python-based open-source toolkit for numerical calculation, which enabled us to undertake complicated tasks such as sentiment analysis [11]. To assign sentiments to the tweets, we used the text-classification pipeline model in Transformers, namely distilbert-base-uncased-finetuned-sst-2-english, which returns labels (i.e., negative and positive) and confidence scores. Based on our previous study, a positive or negative result with a score of less than 0.99 was reclassified as neutral [12]. Thus, the sentiment analysis resulted in three labels: positive, neutral, and negative. We analyzed the sentiment distribution by counting the number of tweets in each sentiment and normalizing each as a percentage of the total.
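The reclassification rule and the percentage normalization can be sketched without loading the model. The code below assumes predictions shaped like the dictionaries a text-classification pipeline returns (a label and a score); the function names and the example predictions are hypothetical and are not results from the study.

```python
from collections import Counter

NEUTRAL_THRESHOLD = 0.99  # cut-off stated in the text


def to_three_class(label, score, threshold=NEUTRAL_THRESHOLD):
    """Map a binary prediction to positive, neutral, or negative.

    Predictions with a confidence score below the threshold become neutral.
    """
    if score < threshold:
        return "neutral"
    return label.lower()


def sentiment_percentages(predictions):
    """Share of each sentiment as a percentage of all tweets."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(to_three_class(p["label"], p["score"]) for p in predictions)
    total = len(predictions)
    return {
        name: 100.0 * counts[name] / total
        for name in ("positive", "neutral", "negative")
    }


# Hypothetical classifier outputs, for illustration only.
example = [
    {"label": "POSITIVE", "score": 0.999},
    {"label": "NEGATIVE", "score": 0.995},
    {"label": "NEGATIVE", "score": 0.80},
    {"label": "POSITIVE", "score": 0.60},
]
distribution = sentiment_percentages(example)
```

A high threshold such as 0.99 sends every low-confidence prediction to the neutral class, which is consistent with neutral being the largest class in the reported distributions.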
4. Statistical Analysis
The difference in sentiment distribution between the “before” and “after” groups was assessed using the contingency chi-square test in the SciPy Python package (version 1.7.1), with p < 0.05 considered statistically significant.
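For a table with two periods and three sentiment labels, the test can be worked through by hand. The sketch below is a pure-Python illustration, not the SciPy call used in the study; SciPy's `scipy.stats.chi2_contingency` performs the same Pearson test on such a table. The function name and the counts are hypothetical. With 2 rows and 3 columns there are (2 - 1)(3 - 1) = 2 degrees of freedom, for which the chi-square upper-tail probability reduces to exp(-x/2).

```python
import math


def chi_square_2x3(before, after):
    """Pearson chi-square test of independence for a 2x3 contingency table.

    Assumes every column contains at least one observation. Returns the test
    statistic and the p-value for 2 degrees of freedom.
    """
    if len(before) != 3 or len(after) != 3:
        raise ValueError("each period needs exactly three sentiment counts")
    table = [list(before), list(after)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    p_value = math.exp(-statistic / 2)  # exact survival function for df = 2
    return statistic, p_value


# Hypothetical counts (neutral, negative, positive); not the study's data.
stat, p = chi_square_2x3([500, 400, 100], [450, 450, 100])
significant = p < 0.05
```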
5. Topic Modeling Analysis
For an overview of the topics discussed in the tweets, topic analysis was performed using the BERTopic Python package (version 0.10.0) [13], which can group a large amount of unlabeled text data, such as tweets, into topic groups. A dataset from the 20 newsgroups collection on netnews was used to generate the model. The results yielded the number of topics, the number of representative tweets within each topic, and the top 15 words with scores indicating the most relevant word presented coherently in the topic.
To gain an in-depth understanding of public perception regarding health-related topics, we had all such topics read and categorized manually into nine distinct categories by two medical experts from our research group, who agreed on topic categorization. The topic categories were arranged based on previous studies [14] as follows: (1) healthcare accessibility, national herbal list, national health social office, universal health coverage, regulation, policy; (2) concerns over the effects of cannabis on children, teenagers, pregnant women, and individuals with illnesses; (3) concern over allergies to cannabis and its smell; (4) discussion of whether cannabis could be used to treat certain diseases or physical barriers (e.g., cancer or pain); (5) side effects of cannabis; (6) cannabis addiction; (7) being intoxicated and losing self-control, endangering others (i.e. driving under the influence or loss of emotional control); (8) concern over cannabis-containing food; and (9) cannabis tourism. Each topic was counted and normalized as a percentage in order to compare its manifestations before and after legalization. The summation of related tweets in each category was then calculated.
All figures were generated using the Matplotlib (version 3.4.3) and Seaborn (version 0.11.2) Python packages.
III. Results
1. Dynamics of Tweets during Cannabis Legalization
In total, we collected 21,242 and 6,493 tweets for the two language search terms, cannabis (TH) and cannabis (EN), respectively, as shown in Table 1. Overall, the number of tweets per day ranged from 122 to 3,585 for cannabis (TH) and from 5 to 1,514 for cannabis (EN). Figure 1 shows the changes in the number of tweets over time and the different patterns between cannabis (TH) and cannabis (EN). For cannabis (TH), the trend peaked around the day of cannabis legalization. On June 8, 2022, the number of tweets was 1,503. During June 9–12, 2022, the daily number of tweets exceeded 2,000: June 9 (n = 2,274), June 10 (n = 2,082), June 11 (n = 2,233), and June 12 (n = 2,248). On June 13, 2022, the number of tweets spiked to 3,585. The pattern of cannabis (EN) showed relative peaks at two different periods: May 11–14, 2022 (peak on May 12, 2022; n = 360) and June 8–11, 2022 (peak on June 9, 2022; n = 1,514).
Next, we compared the numbers of tweets before and after legalization. The number of tweets before legalization (cannabis (TH), n = 8,820; cannabis (EN), n = 2,457) was lower than that after legalization (cannabis (TH), n = 12,422; cannabis (EN), n = 4,036), even though the “before” collection period (May 1–June 8, 2022; 39 days) was longer than the “after” period (June 9–13, 2022; 5 days).
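Because the two periods differ in length, the contrast is easier to see as a mean number of tweets per day. The short calculation below uses only the counts and period lengths reported above; the variable and function names are illustrative.

```python
# Tweet counts and period lengths as reported in the text.
COUNTS = {
    "cannabis (TH)": {"before": 8820, "after": 12422},
    "cannabis (EN)": {"before": 2457, "after": 4036},
}
DAYS = {"before": 39, "after": 5}


def mean_per_day(counts, days):
    """Average number of tweets per day for each group and period."""
    return {
        group: {period: n / days[period] for period, n in by_period.items()}
        for group, by_period in counts.items()
    }


rates = mean_per_day(COUNTS, DAYS)
for group, rate in rates.items():
    fold = rate["after"] / rate["before"]
    print(f"{group}: {rate['before']:.1f} -> {rate['after']:.1f} tweets/day ({fold:.1f}x)")
```

On these figures, the mean daily volume rose roughly 11-fold for cannabis (TH) and roughly 13-fold for cannabis (EN) after legalization.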
2. Sentiment Analysis before and after Legalization
Sentiment analysis before and after legalization showed that the majority of the sentiments were neutral, whereas low proportions of positive sentiments were found for both cannabis (TH) and cannabis (EN) search results, as shown in Figure 2. For cannabis (TH), the percentages of neutral, negative, and positive sentiments before legalization were 51.4%, 39.2%, and 9.4%, respectively, whereas such percentages after legalization were 47.4%, 45%, and 7.6%, respectively. For cannabis (EN), the percentages of neutral, negative, and positive sentiments before legalization were 73%, 17.5%, and 9.5%, respectively, whereas such percentages after legalization were 72.5%, 18.5%, and 8.9%, respectively. A comparison of sentiment distribution before and after legalization showed a significant difference in cannabis (TH) (p < 0.0001), but not in cannabis (EN) (p = 0.4437) (Figure 2).
3. Topic Modeling of Health-Related Categories
Before and after legalization, respectively, we identified 156 and 138 topics in cannabis (TH) and 48 and 91 topics in cannabis (EN). The proportion of health-related topics in cannabis (TH) and cannabis (EN) increased after legalization from 5.8% (9/156) to 30.4% (42/138) and from 6.25% (3/48) to 16.5% (15/91), respectively. Overall, the number of topics in cannabis (TH) was higher than that in cannabis (EN) (Table 2).
Next, we analyzed the number of health-related topic categories and found that it differed between the “before” and “after” groups in cannabis (TH), as shown in Figure 3, left panel. In cannabis (TH), seven of the nine categories were found in the period before legalization, and eight of the nine categories were found in the period after legalization; the additional category was “concern over cannabis-containing food.” In cannabis (EN), by contrast, only two categories occurred before and after legalization.
We analyzed the change in the distribution of each category based on the number of related tweets from the cannabis (TH) group. The result revealed that the top category before legalization, “healthcare accessibility, national herbal list, national health social office, universal health coverage, regulation, policy,” was replaced by “concern over cannabis-containing food” after legalization.
On the other hand, the analysis of the cannabis (EN) group showed that the top category before and after legalization, “healthcare accessibility, national herbal list, national health social office, universal health coverage, regulation, policy,” remained unchanged. Table 3 shows examples of representative tweets for each topic by category.
IV. Discussion
This study demonstrated the use of Twitter as a social media platform to monitor public perception regarding the recent legalization of cannabis in Thailand. One recent study used data from Facebook to analyze the content and emotional tone of Thai-language posts related to cannabis and kratom during April and November 2015 [15], when possession of either substance was illegal. This indicates that social media can be an informative means of observing opinions regarding cannabis use.
Our results revealed a spike in the number of tweets for both cannabis (TH) and cannabis (EN) associated with cannabis legalization on June 9, 2022. Moreover, we observed that the small spike in cannabis (EN) during May 11–13, 2022 was related to the “Thai officials are giving away 1 million free cannabis plants for citizens to grow at home” campaign [16]. One study that collected data from Twitter during November 2016 demonstrated that the peak in the tweet count on November 8 was related to the United States presidential election: the newly elected president supported the use of cannabis for medical purposes and pushed for more states to allow votes on recreational marijuana legalization [17]. Together, these studies show that key public events could influence the volume of tweets. Future studies will focus on how public events or news related to cannabis influence public responses on social media and identify the characteristics of these events that are associated with positive and negative sentiments. Policymakers and relevant officials could use timely social media monitoring to capture prevalent public concerns or questions, setting key messages for effective public communication.
As shown by our study, negative sentiment was higher than positive sentiment across the periods analyzed for both cannabis (TH) and cannabis (EN). Moreover, a sentiment-analysis comparison before and after legalization showed that negative sentiment increased after the country announced legalization. This change may be explained by two points. First, there have been reports of people becoming ill after consuming food containing cannabis [18,19], which may have raised concerns over food contamination. Second, there is confusion about taking legal action. As a result, concerns may have risen among people in Thailand and elsewhere, including sellers, growers, chefs, and consumers. However, this supposition is contradicted by a study from New Zealand, which collected tweets between July 2009 and August 2020, a period before the vote to legalize cannabis for recreational use; it found a positive view of cannabis [20]. A study of cannabis-related tweets in the United States between March and May 2016 found that personal tweets elicited more positive than negative sentiments [21], with more positive sentiment found in states with fewer restrictions. A similar result was found in a study analyzing sentiments toward cannabis-related tweets in the United States and Canada between 2017 and 2019, in which the increase in positive sentiment was correlated with states where cannabis was legal for adult recreational use [22]. The sentiment changes in our results differ from those in other studies, and not only because of differences in social or general characteristics; the perception of incomplete regulation may be the reason for negative sentiment on Twitter and in other nations’ viewpoints [2,23]. The negative sentiments in our findings may be explained by baseline marijuana regulations in the Thai setting, which have relied on a medical marijuana policy since 2019.
Further research should identify factors associated with different sentiments between EN tweets and tweets in other languages on the same topics.
Our findings from topic modeling revealed a health-related emphasis. A manual examination of these topics resulted in nine health categories. Some of these categories were also used in other cannabis-related Twitter studies [24–27]. Regarding our result on “concern over the effects of cannabis on children, teenagers, pregnant women, and individuals with illnesses,” one study used a topic-modeling approach to investigate tweets related to cannabis use in pregnancy [24]. The study identified nine topic clusters, including effects of cannabis during pregnancy, cannabis exposure on infants, and legalization and police. The topic “discussion of whether cannabis could be used to treat certain diseases or physical barriers (e.g., cancer or pain)” appeared in our topic analysis, which agrees with a study in which scraping posts containing cannabis-related terms and exploring the topic discussion suggested that posters thought cannabis might help relieve many health conditions, such as Crohn’s disease, sleep, pain, depression, and cancer [25]. Another topic from our results, “concern over cannabis-containing food,” is consistent with a study of cannabis edibles on Twitter by other researchers, who reported a prevailingly positive sentiment toward edibles. However, their analysis of the content of negative sentiments demonstrated the unreliability of consumption of edibles [27]. The “cannabis tourism” category in our results was found solely in cannabis (EN). One study raised the concern that countries with decriminalization and legalization of cannabis could become an attractive destination for tourists [26]. Thus, the study recommended vigilance regarding vulnerable travelers, particularly those with mental disorders. In this manner, topic-modeling analyses of Twitter data could be used to observe health-related concerns during the legalization of cannabis.
A strength of our study is that Twitter allows us to quickly capture a variety of health-related concerns regarding cannabis legalization. Moreover, this study may help policymakers and healthcare professionals set up a proper oversight program for cannabis to prevent misuse, minimize risks when cannabis is used in food or personal care, and educate vulnerable tourists regarding legalization. Real-time monitoring through Twitter may also enable tailoring fast actions to educate the public on related topics.
However, this study has several limitations. First, due to the voluntary nature of Twitter, our data may not represent the general population; instead, they were inherently limited to those who choose to use Twitter to publicly express their opinions, which may introduce sampling bias. Second, Twitter’s character limit may constrain the depth and complexity of expressible sentiments, potentially preventing us from capturing nuanced feelings. In addition, the selection and naming of topics were based on the agreement of two researchers with medical backgrounds, which might limit the diversity of perspectives. Experts from fields such as sociology, linguistics, public health, and policymaking could have provided additional insights into topic selection. Moreover, medical viewpoints might have had a strong influence on the topics while potentially ignoring social, legal, or cultural factors. Lastly, the process of sentiment analysis may have struggled with detecting ambiguous phrases, slang, and sarcasm, leading to possible inaccuracies in interpretation.
Notes
Conflict of Interest
No potential conflict of interest relevant to this article was reported.
Data Availability
The code used in this study was deposited at https://github.com/tlerksuthirat/public_perception_cannabis. The raw data are available upon reasonable request.
Acknowledgments
Tassanee Lerksuthirat is supported by the National Research Council of Thailand (NRCT) and Mahidol University (No. NRCT5-TRG63009-04).
Supplementary Materials
Supplementary materials can be found via https://doi.org/10.4258/hir.2023.29.3.269