A research framework for passive surveillance for food safety from social media: Identification and evaluation of customer reviews for regulatory use and case study of 30 restaurants

Akash Gajanan Prabhune*; Neeraj Kumar Sethiya; Heemanshu Arora

Journal Information

Journal ID (nlm-ta): Innovative Publication

Journal ID (publisher-id): Innovative Publication

Journal ID (journal_submission_guidelines): https://www.innovativepublication.com/journal/IJFCM

Title: Indian Journal of Forensic and Community Medicine

ISSN: 2394-6776

Article Information

Date received: 18 December 2022

Date accepted: 22 December 2022

Publication date: 9 January 2023

Volume: 9

Issue: 4

Page: 146

DOI: 10.18231/j.ijfcm.2022.031

Akash Gajanan Prabhune[1]

Email: prabhunesky@gmail.com

Designation: PhD Fellow

Neeraj Kumar Sethiya[2]

Designation: Associate Professor

Heemanshu Arora[3]

Designation: Public Health Researcher

[1] Population Health Informatics, DIT University, Dehradun, Uttarakhand, India

[2] Dept. of Pharmacology, DIT University, Dehradun, Uttarakhand, India

[3] Foundation of Healthcare Technologies Society, New Delhi, India

Abstract

The primary objective of this paper is to develop a framework that uses social media to continuously monitor the food safety of food business operators without overburdening established regulatory systems. A phase-wise methodology was adopted. Phase 1 identified the available literature on Adverse Drug Reaction (ADR) reporting using social media data. Phase 2 used Google Maps reviews of restaurants to replicate a similar methodology for food safety surveillance. We identified five themes for a complete surveillance framework: theme 1 involves data collection from social media, theme 2 involves pre-processing of data for analysis, theme 3 involves data annotation, theme 4 involves identifying the relationship between a regulatory violation and an event, and theme 5 involves evaluation of the model. We demonstrated that the ADR reporting methodology could be adopted up to theme 3, whereas theme 4 requires the development of an algorithm to assess the causality of an event against the Food Safety Code. According to our research, it is possible to develop a passive surveillance system for food safety that adheres to the principles of ADR reporting; however, the main obstacle is the absence of a causality assessment algorithm that can link an event to the Food Safety Code and help regulators take immediate action.

Introduction

Food safety is an important aspect of health and well-being. The very nature of the food industry in India makes it a challenge for a regulator to monitor the quality of food supplied through a street cart vendor, a low-key restaurant near an office complex, the hostel mess of a university, or a restaurant in a 5-star establishment. The limited manpower and resources of the Food Safety and Standards Authority of India (FSSAI) compound the problem: the regulator is under-resourced to maintain vigilance on par with its global counterparts.1

Developing nations like India have a highly fragmented and unorganized food industry and limited resources at their disposal, which necessitates a passive surveillance system.2 The Global Foodborne Infections Network (GFN)3 is one such system; it aims to build the capacity to detect, control and prevent foodborne and other enteric infections from farm to table. Such systems aim to promote integrated, laboratory-based surveillance through intersectoral collaboration among human health, veterinary and food-related disciplines. Another well-known active surveillance system, FoodNet, coordinated by the Centers for Disease Control and Prevention (CDC) in the United States, focuses on nine organisms and related illnesses, such as hemolytic uremic syndrome (HUS), associated with Escherichia coli.4

FSSAI, the apex body under the Ministry of Health and Family Welfare, Government of India, for regulation of the food industry, has incorporated active surveillance as a key mandate under its purview.5, 6 The Manual of Food Safety Officers published by FSSAI divides the task of regulating and licensing the industry into three tiers: a) Central Licensing Authority, b) State Licensing Authority, and c) Registration Authority. The three tiers are coordinated through the State and National Commissioners. The food industry is broadly divided into four categories: a) manufacturers/millers, b) hotels of 3 stars and above, c) all food service providers, including restaurants, boarding houses, clubs, canteens, caterers, and banquet halls with food catering arrangements, and d) any other food business operator (street vendors). FSSAI has an annual active surveillance plan targeting state-specific food products with historical evidence of adulteration (milk, ghee, dairy products) or contamination (seafood, meat) at specific times of the year. For example, both vegetables and milk are to be tested for E. coli during the monsoon season.

Though there are provisions for passive surveillance under the Food Safety and Standards (FSS) Act, 2006, few details are presented in the manuals, and little published or grey literature describes how to undertake passive surveillance.1, 7 This presents a unique opportunity to use two rapidly growing fields, digital epidemiology and consumer health informatics, to develop a first-line passive surveillance network using the principles of population health informatics. Much work has been done to monitor adverse events related to drugs. Web-Recognizing Adverse Drug Reactions (WEB-RADR)8 by the Innovative Medicines Initiative, the Vigi4Med project,9 and EudraVigilance10 are some of the well-established adverse drug event reporting programs utilizing data from social media and online forums. In this paper, we propose a framework for using data available on social media to generate credible evidence for preventive action by regulatory agencies in India.

Objectives

The primary objective of this paper is to develop a framework that uses social media to continuously monitor the food safety of food business operators without overburdening established regulatory systems. The secondary objectives are to develop a clear pathway for valid signal detection in food safety reporting from social media, to define the technological needs for signal detection, and to specify the data protection and confidentiality requirements for using public social media data.

Methodology

This paper was written using a two-phase methodology. Phase 1 comprised a review of the literature on the use of social media data for Adverse Drug Reaction (ADR) monitoring, to derive a comprehensive framework for passive surveillance of food business operators. Phase 2 used data from social media to evaluate the conceptual feasibility of developing a strong signal using the framework developed in Phase 1.

Under Phase 1, we developed a search strategy based on the Participants, Interventions, Comparison and Outcome (PICO) format. The key search terms included: “Pharmacovigilance, Adverse Drug Reaction Report, ADR Reporting, ADR Monitoring, Drug Monitoring, Adverse Events, Social Media, Twitter, Facebook, Online Forums, Secondary data, Crowd Sourced data, Framework, Evaluation of systems, Assessment”. Suitable Boolean operators (OR, AND) were used. The PubMed and Google Scholar searches returned 155 articles. We screened the articles for eligibility according to the following criteria.

We included studies that used secondary data related to ADR monitoring and were published in English. We excluded studies that relied on primary data collection over the internet for ADR monitoring and studies published in a language other than English. The process is presented in Figure 1. This led to 22 papers being shortlisted for full-text reading. We screened the 22 articles and stratified them into the following categories: a) Theme 1: Data Extraction, b) Theme 2: Pre-Processing, c) Theme 3: Data Annotation, d) Theme 4: Identifying the relationship between regulatory violation and event, e) Theme 5: Evaluation.

Figure 1

Study flow chart (PRISMA flowchart)

https://s3-us-west-2.amazonaws.com/typeset-prod-media-server/16fdb6b8-7162-4f62-9646-2ee281de83a4image1.png

We reviewed the frameworks and procedures available across the literature and drew parallels between ADR reporting and food safety surveillance. A combined framework was thus developed in Phase 1. Under Phase 2, we used Google Maps reviews of 10 Food Business Operators (FBOs) in the Byatarayanapura ward of Bruhat Bengaluru Mahanagara Palike, from 2016 to 2021, to obtain results from the framework in line with regulatory requirements. We included FBOs with a minimum of 100 reviews operating in dine-in mode. We excluded FBOs serving alcohol, FBOs operating for less than 1 year, and FBOs operating only in delivery mode.
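The inclusion and exclusion criteria for FBOs can be expressed as a simple filter. The following is a minimal sketch in Python, assuming each FBO is described by a hypothetical record (`FBO`); the field names and the sample candidates are illustrative assumptions and are not taken from the study's data.

```python
from dataclasses import dataclass


@dataclass
class FBO:
    """Hypothetical record for one food business operator (field names are assumptions)."""
    name: str
    review_count: int
    dine_in: bool
    delivery_only: bool
    serves_alcohol: bool
    years_operating: float


def is_eligible(fbo: FBO) -> bool:
    """Apply the stated criteria: at least 100 reviews, dine-in mode,
    no alcohol, operating for at least 1 year, and not delivery-only."""
    return (
        fbo.review_count >= 100
        and fbo.dine_in
        and not fbo.delivery_only
        and not fbo.serves_alcohol
        and fbo.years_operating >= 1
    )


# Illustrative, made-up candidates (not study data).
candidates = [
    FBO("Example Dine-In", 250, True, False, False, 4.0),
    FBO("Example Bar", 900, True, False, True, 6.0),
    FBO("Example Cloud Kitchen", 300, False, True, False, 2.0),
    FBO("Example New Cafe", 120, True, False, False, 0.5),
]
eligible = [fbo.name for fbo in candidates if is_eligible(fbo)]
print(eligible)
```

Keeping the criteria in one predicate makes it easy to audit which rule excluded a given FBO if the thresholds are later revised.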

Data extraction and processing were carried out as per the framework outlined in Phase 1. Descriptive data analysis was undertaken, with evaluation analysis based on the F-measure, precision, recall and accuracy.

Results

We present the results of Phase 1 and Phase 2 according to the themes outlined in the methodology section.

Theme 1: Data Extraction - Under data extraction, we looked for answers to specific questions: What are the possible sources of data? What are the ethical implications of using data available on social media? What processes are associated with extracting quality data from social media? Social media is defined as “a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0 that allow the creation and exchange of user-generated content based on mobile and web-based technologies to create highly interactive platforms via which individuals and communities share, co-create, discuss, and modify user-generated content”.11 A report from the WEB-RADR study by Caster et al.12 found that Twitter and Facebook data were not useful for signal detection in pharmacovigilance. Sarker et al.13 reported that a key reason for lower F-scores in ADR reporting from Twitter data was that the data were vague or too descriptive. Studies by Comfort et al. and Pierce et al. used Twitter, Facebook, and Tumblr data with limited success.14, 15

We then searched for sources of FBO data. While Twitter, Facebook, Tumblr, and Instagram had videos and pictures related to food, most were promotional content. Another source was reviews left by diners on Google Maps and Zomato. Amazon, Flipkart and other electronic marketplaces also had food-related product reviews; however, these were mostly for groceries and ready-to-eat meals, which relate to Type A FBOs (manufacturers and millers). Thus, we focused specifically on the data available on Google Maps.

A typical review on Google Maps for an FBO is as follows:

“Food was very good, but considering the current covid pandemic, I'd say the Restaurant is not as prepared. The spoons are kept in water for folks to pick from... Felt very unhygienic because it's unclear if that water is replaced. Five stars for food, but only 2 for seating arrangements. And considering close together the seats I'm going to give them 3 stars for dine-in. I recommend taking it out. Overall, 4 stars” - A Level 7 Local Guide on Google Maps.

The Local Guides program by Google Maps is a global community of explorers who write reviews, share photos, answer questions, add or edit places, and check facts on Google Maps. Millions of people rely on these contributions "to decide where to go and what to do".16 Local Guides are assigned levels from Level 1 to Level 10, and the credibility of the information is taken to increase with the level.

The question of ethics was also explored in the literature; many ADR reporting studies using social media have put forth ethical benchmarks for the use of such data. Bousquet et al.8 mention that privacy becomes a key issue while using social media data, as ownership of the data remains with the original contributors. Azam et al.17 mention that consent for the use of social media data is not guaranteed, as many people are not aware of the actual terms of use governing the data they post.

The Google Maps end-user policy states that reviews added by users are visible to everyone; however, users can control access to their personal information. In view of these challenges and policies, the ethical framework adopted for this study is: a) The FBO's name will be visible, as the FBO has voluntarily registered on Google Maps for business visibility, while the review data remain the property of the individual Local Guides or reviewers. b) The identity of the Local Guide or reviewer will not be disclosed, and all data will be anonymized. Obtaining consent from each Local Guide or reviewer is not feasible, so we assume that the user has read the End User License Agreement provided by Google Maps. The data from Google Maps were extracted using a Python script in the Anaconda environment and exported to an Excel sheet.
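The anonymization rule in point b) can be illustrated with a short sketch. This is not the extraction script used in the study; it assumes the reviews have already been collected as a list of dictionaries, and the field names (`reviewer_name`, `reviewer_profile_url`, `fbo_name`) and the sample record are hypothetical. For simplicity it writes CSV text with the Python standard library rather than an Excel file.

```python
import csv
import io

# Hypothetical names of fields that could identify a reviewer.
IDENTIFYING_FIELDS = {"reviewer_name", "reviewer_profile_url"}


def anonymize(records):
    """Drop reviewer-identifying fields and assign a neutral sequential ID.
    The FBO name is kept, in line with point a) of the ethical framework."""
    anonymized = []
    for index, record in enumerate(records, start=1):
        kept = {"review_id": f"R{index:05d}"}
        kept.update(
            {key: value for key, value in record.items() if key not in IDENTIFYING_FIELDS}
        )
        anonymized.append(kept)
    return anonymized


def to_csv(records) -> str:
    """Serialize anonymized records to CSV text (stand-in for the Excel export)."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Illustrative, made-up record (not study data).
raw_reviews = [
    {
        "fbo_name": "Example Restaurant",
        "reviewer_name": "Jane Doe",
        "reviewer_profile_url": "https://example.com/u/1",
        "rating": 4,
        "text": "Good food",
    }
]
anonymized = anonymize(raw_reviews)
print(to_csv(anonymized))
```

Free-text reviews can still contain names or other identifiers, so field-level removal of this kind would need to be combined with the personal identifier removal described under pre-processing.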

Theme 2 Pre-Processing: The data pre-processing step prepares the raw data for analysis. Data from social media are usually in the form of free text. Data pre-processing consists of two steps: text cleaning and sentence boundary detection.18 Various pre-processing methods are described in the literature, including sentence splitting, parsing, stemming, and lemmatization.19 We used the two-step pre-processing method prescribed by Liu et al.18

The first step was text cleaning: removal of punctuation, personal identifiers, and URLs. A Google Maps review of a multicuisine restaurant was written as:

“Stepped in as the reviews were good but stepped out with a bad taste after experiencing cockroaches crawl out of our table and over our plates.

The reason given was pest control was recently done.

We changed tables but the next family that walked in was seated at the same until we told them about the cockroaches.”

After the first step of text cleaning, the review was processed as:

“Stepped in as the reviews were good but stepped out with a bad taste after experiencing cockroaches crawl out of our table and over our plates the reason given was pest control was recently done we changed tables but the next family that walked in was seated at the same until we told them about the cockroaches.”

The second step was sentence boundary detection; here the review was split at Boolean operators (in this example, the conjunction "but"), giving three segments:

(1) Stepped in as the reviews were good

(2) Stepped out with a bad taste after experiencing cockroaches crawl out of our table and over our plates the reason given was pest control was recently done we changed tables

(3) The next family that walked in was seated at the same until we told them about the cockroaches
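The two pre-processing steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script used in the study: the URL pattern and the choice to split only at the word "but" are assumptions made to mirror the worked example, and personal identifier removal is omitted because it depends on how reviewer identifiers appear in the raw data.

```python
import re
import string

# Assumed pattern for URLs; real data may need a broader expression.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Assumed connective list; the worked example splits only at "but".
CONNECTIVE_PATTERN = re.compile(r"\s+but\s+", flags=re.IGNORECASE)


def clean_text(review: str) -> str:
    """Step 1: remove URLs and ASCII punctuation, then collapse whitespace."""
    without_urls = URL_PATTERN.sub(" ", review)
    without_punctuation = without_urls.translate(
        str.maketrans("", "", string.punctuation)
    )
    return " ".join(without_punctuation.split())


def split_segments(cleaned: str) -> list[str]:
    """Step 2: split the cleaned review into segments at the connective."""
    return [part.strip() for part in CONNECTIVE_PATTERN.split(cleaned) if part.strip()]


review = (
    "Stepped in as the reviews were good but stepped out with a bad taste "
    "after experiencing cockroaches crawl out of our table and over our plates.\n"
    "The reason given was pest control was recently done.\n"
    "We changed tables but the next family that walked in was seated at the same "
    "until we told them about the cockroaches."
)
segments = split_segments(clean_text(review))
for number, segment in enumerate(segments, start=1):
    print(number, segment)
```

Applied to the cockroach review, the sketch yields three segments at the same boundaries as the worked example; a production version would likely need a longer connective list and handling for non-ASCII punctuation.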

Theme 3 Data Annotation: Under theme 3, the literature presented dictionary- or lexicon-based, rule-based, and machine learning-based techniques.13, 19, 20 We used the New Mexico Restaurant Association (NMRA) dictionary21 on food safety, as it was the most comprehensive open-access dictionary available for food safety data on social media. Another tool at our disposal was the CDC Guidelines for Confirmation of Foodborne-Disease Outbreaks.22 We subjected 10 reviews, randomly selected from the sample of 10 restaurants, to the NMRA dictionary and the guidelines. With the NMRA dictionary, we used the shortest dependency path to extract food violations; the shortest dependency path algorithm is presented in Figure 2.

Figure 2

Shortest dependency path algorithm

https://s3-us-west-2.amazonaws.com/typeset-prod-media-server/16fdb6b8-7162-4f62-9646-2ee281de83a4image2.png

From the 10 restaurants, 10505 Google Maps reviews were extracted, cleaned for punctuation and grammatical errors, and split. Of the 10505 reviews, 263 had more than three words of comprehensible text, and 125 reviews were evaluated for keywords after excluding reviews without a proper semantic framework. The common keywords extracted from these 125 reviews are summarized in Table 1, along with their frequency.

Table 1

Frequency of keywords across the sample restaurants

Themes	Keywords	Frequency
Food Taste	Oily	4
Food Taste	Spicy	1
Cost	Pricey	7
Cost	Expensive	5
Hygiene	Not Hygienic	7
Hygiene	Not Clean	2
Pest	Cockroaches	3
Food Taste	Bad taste	5
Food Taste	Salty	3
Uncategorised	Not Good	5
Cost	Price Hiked	1
Service related	Late Service	1
Uncategorised	Worst Restaurant	1
Uncategorised	Bad Food	8
Food Quality	Not Cooked Properly	3
Service related	Wrong order	1
Service related	Worst Service	14
Service related	Pathetic Service	6
Service related	Worst Customer Service	4
Health Issues	Fell ill	1
Uncategorised	Sub Standard	1
Service related	Slow Service	4
External items in food	Stapler Pin	1
Health Issues	Suffered from very bad health	1
Food Taste	Horrible Food	2
Service related	Order Delay	2
Food Quality	Smelly	3
Pest	Mosquitoes	4
Food Taste	Tasteless	1
Service related	Untidy Waiters	1
Health Issues	Food Poisoning	3
Uncategorised	Noisy	1
Food Quality	Food Sucks	1
Health Issues	Stomach Pain	2
Uncategorised	Not recommended	12
Uncategorised	Not Fresh	1
Pest	Worms	1
External items in food	Hair in the food	1
Uncategorised	Soda	1
Total		125
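The keyword-to-theme mapping in Table 1 can be illustrated with a plain lexicon lookup. The sketch below is an assumption-laden simplification: it uses a small excerpt of keyword and theme pairs copied from Table 1 as a hypothetical lexicon (not the NMRA dictionary itself) and simple substring matching rather than the shortest dependency path shown in Figure 2.

```python
# Excerpt of keyword -> theme pairs copied from Table 1 (hypothetical lexicon,
# not the NMRA dictionary).
LEXICON = {
    "cockroaches": "Pest",
    "mosquitoes": "Pest",
    "worms": "Pest",
    "not hygienic": "Hygiene",
    "not clean": "Hygiene",
    "food poisoning": "Health Issues",
    "stomach pain": "Health Issues",
    "fell ill": "Health Issues",
    "stapler pin": "External items in food",
    "hair in the food": "External items in food",
}


def annotate(segment: str) -> list[tuple[str, str]]:
    """Return (keyword, theme) pairs whose keyword occurs in the segment.
    Matching is case-insensitive substring matching on normalized whitespace."""
    normalized = " ".join(segment.lower().split())
    return [(keyword, theme) for keyword, theme in LEXICON.items() if keyword in normalized]


segment = (
    "Stepped out with a bad taste after experiencing cockroaches crawl out "
    "of our table and over our plates"
)
annotations = annotate(segment)
print(annotations)
```

Substring matching cannot tell literal from figurative usage or handle negation, which is one reason the literature reviewed under theme 3 also considers rule-based and machine learning techniques.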

The thematic area-wise frequency of keywords is presented in Table 2.

Table 2

Thematic area wise frequency of keywords

Themes	Frequency	%
Food Taste	16	12.8
Cost	13	10.4
Hygiene	9	7.2
Pest	8	6.4
Service Related	33	26.4
Health Issues	7	5.6
External items in food	2	1.6
Food Quality	7	5.6
Uncategorised	30	24
Total	125	100
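The theme-wise totals and percentages in Table 2 follow directly from the keyword frequencies in Table 1. As a consistency check, the aggregation can be sketched as follows; the rows are transcribed from Table 1, while the function name and data layout are illustrative assumptions.

```python
from collections import Counter

# (theme, keyword, frequency) rows transcribed from Table 1.
TABLE_1 = [
    ("Food Taste", "Oily", 4),
    ("Food Taste", "Spicy", 1),
    ("Cost", "Pricey", 7),
    ("Cost", "Expensive", 5),
    ("Hygiene", "Not Hygienic", 7),
    ("Hygiene", "Not Clean", 2),
    ("Pest", "Cockroaches", 3),
    ("Food Taste", "Bad taste", 5),
    ("Food Taste", "Salty", 3),
    ("Uncategorised", "Not Good", 5),
    ("Cost", "Price Hiked", 1),
    ("Service related", "Late Service", 1),
    ("Uncategorised", "Worst Restaurant", 1),
    ("Uncategorised", "Bad Food", 8),
    ("Food Quality", "Not Cooked Properly", 3),
    ("Service related", "Wrong order", 1),
    ("Service related", "Worst Service", 14),
    ("Service related", "Pathetic Service", 6),
    ("Service related", "Worst Customer Service", 4),
    ("Health Issues", "Fell ill", 1),
    ("Uncategorised", "Sub Standard", 1),
    ("Service related", "Slow Service", 4),
    ("External items in food", "Stapler Pin", 1),
    ("Health Issues", "Suffered from very bad health", 1),
    ("Food Taste", "Horrible Food", 2),
    ("Service related", "Order Delay", 2),
    ("Food Quality", "Smelly", 3),
    ("Pest", "Mosquitoes", 4),
    ("Food Taste", "Tasteless", 1),
    ("Service related", "Untidy Waiters", 1),
    ("Health Issues", "Food Poisoning", 3),
    ("Uncategorised", "Noisy", 1),
    ("Food Quality", "Food Sucks", 1),
    ("Health Issues", "Stomach Pain", 2),
    ("Uncategorised", "Not recommended", 12),
    ("Uncategorised", "Not Fresh", 1),
    ("Pest", "Worms", 1),
    ("External items in food", "Hair in the food", 1),
    ("Uncategorised", "Soda", 1),
]


def theme_summary(rows):
    """Sum keyword frequencies per theme and express each as a percentage of the total."""
    totals = Counter()
    for theme, _keyword, frequency in rows:
        totals[theme] += frequency
    grand_total = sum(totals.values())
    return {
        theme: (count, round(100 * count / grand_total, 1))
        for theme, count in totals.items()
    }


summary = theme_summary(TABLE_1)
for theme, (count, percent) in summary.items():
    print(f"{theme}\t{count}\t{percent}")
```

Run on the Table 1 rows, the sketch reproduces the counts and percentages reported in Table 2, with a grand total of 125.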

Across the 10 restaurants, the health issues (N=7), such as stomach ache and food poisoning, were attributed to biryani, rice, and fried rice. Almost all restaurants had a review mentioning pests such as cockroaches and mosquitoes in the dining area, and 1 restaurant had a review mentioning worms in food items. All the restaurants had negative reviews with respect to service, staff, and food taste.

Theme 4 Identifying the relationship between regulatory violation and event: To relate keywords to food safety violations, the ADR reporting papers suggest rule-based assessment (the Kramer algorithm and the World Health Organisation (WHO) algorithm for severity assessment) and statistical assessment, which link an adverse event with a drug based on the time of drug intake and the symptoms, and so determine the causality of the event. For the proposed food safety framework, the CDC Guidelines for Confirmation of Foodborne-Disease Outbreaks offered a standard algorithm; however, they require laboratory confirmation of organisms and clinical matching of symptoms, which was not feasible with the data available from Google Maps reviews. A statistical method was an option; however, it requires stakeholder discussion to define the hypothesis and to define a signal from a regulatory perspective.

Theme 5 Evaluation: To evaluate the performance of the system, we propose to use the statistical measures of F-measure, accuracy, precision, and recall. As the data available in the feasibility study were limited to 125 reviews, the evaluation of the system was not undertaken. The proposed framework is presented in Figure 3.

Figure 3

Proposed framework for the food surveillance

https://s3-us-west-2.amazonaws.com/typeset-prod-media-server/16fdb6b8-7162-4f62-9646-2ee281de83a4image3.png
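The evaluation measures proposed under theme 5 can be computed from a confusion matrix of system output against human annotation. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall); the counts in the usage example are hypothetical and are not results from this study, in which no evaluation was undertaken.

```python
def evaluation_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute precision, recall, F1 and accuracy from confusion-matrix counts.
    tp/fp/fn/tn = true positives, false positives, false negatives, true negatives.
    Ratios with a zero denominator are reported as 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts for illustration only (not study results):
# 8 reviews correctly flagged, 2 wrongly flagged, 4 missed, 86 correctly not flagged.
metrics = evaluation_metrics(tp=8, fp=2, fn=4, tn=86)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because food safety events are rare among reviews, accuracy alone can look high even when many events are missed, which is why precision, recall and the F-measure are reported alongside it.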

Discussion

Various research papers have analysed restaurant reviews from platforms such as Google Maps and Yelp using sentiment analysis techniques. Krishna et al., Hossain et al., and Laksono et al. presented machine learning techniques, including the Naïve Bayes approach, for sentiment analysis that can be used to identify the sentiments of the population visiting the restaurants.23, 24, 25 The analysis was used to classify restaurants based on the services offered and the general cause of negative or positive feedback. Harris et al.26 used a different approach, combining machine learning and human analysis to classify tweets relevant to food poisoning and to automate replies asking the individuals to report the food poisoning to the local health authority, with the aim of increasing the reporting of food poisoning outbreaks. This indicates the need for an algorithm to validate the causality between event and outcome.

To our knowledge, this study is the first attempt to develop a framework for a passive surveillance network that uses social media data to provide high-quality signals with regulatory value. Our study has shown that data from Google reviews can be used for general sentiment analysis of the restaurant business. The methodology prescribed for ADR reporting proved to be closely aligned with our approach to food safety; however, we were not able to identify an algorithm that would assess causality between a regulatory violation and an event. This necessitates separate studies that consult stakeholders (specifically regulatory bodies) to understand their needs and current methods, develop a consensus method for building an algorithm, and test the validity of such an algorithm.

Conclusions

The review of the literature indicates that work on using social media to derive regulatory signals, specifically in pharmacovigilance, has been carried out successfully. The key barrier to replicating the same model in food safety and regulation is the non-availability of causality assessment algorithms that would link an event with a food safety violation. The current approach of using review data to gauge sentiment around FBOs is statistically robust, and sentiment analysis techniques do indicate the sentiment behind the review left by the customer. Further segmentation and analysis have the potential to positively impact the quality of food and customer experience. The study indicates the need to use informatics tools to develop and pilot a tech-enabled model in accordance with the current Food Safety Code.

Source of Funding

None.

Conflict of Interest

None.

References

1. S Gardner. Consumers and food safety: A food industry perspective. 1993. https://www.fao.org/3/v2890t/v2890t05.htm

2. LL Sharma, SP Teret, KD Brownell. The Food Industry and Self-Regulation: Standards to Promote Success and to Avoid Public Health Failures. Am J Public Health. 2010;100(2):240-6.

3. N Desai, H Joshi, TM Chiller. Global foodborne infections network (GFN): An opportunity for capacity-building in enteric diseases in India. 141st APHA Annual Meeting and Exposition. 2013.

4. BJ Mccabe-Sellers, SE Beattie. Food safety: Emerging trends in foodborne illness surveillance and prevention. J Am Diet Assoc. 2004;104(11):1708-17.

5. FSSAI. Chapter 6 Inspection of Food Establishment. https://www.fssai.gov.in/upload/uploadfiles/files/Chapter6.pdf

6. FSSAI. Chapter 8 Annual Surveillance Plan. https://www.fssai.gov.in/upload/uploadfiles/files/Chapter8.pdf

7. J Freeman, N Ryan, J Glenesk. Strategic Surveillance for Food Safety: Designing a surveillance approach and considerations for implementation. 2019. https://www.rand.org/pubs/research_reports/RR2519.html

8. C Bousquet. The Adverse Drug Reactions from Patient Reports in Social Media Project: Five Major Challenges to Overcome to Operationalize Analysis and Efficiently Support Pharmacovigilance Process. JMIR Res Protoc. 2017;6(9):e179.

9. B Audeh, F Bellet, MN Beyens, ALL Louët, C Bousquet. Use of Social Media for Pharmacovigilance Activities: Key Findings and Recommendations from the Vigi4Med Project. Drug Saf. 2020;43(9):835-51.

10. R Postigo, S Brosch, J Slattery, AV Haren, JM Dogné, X Kurz. EudraVigilance Medicines Safety Database: Publicly Accessible Data for Research and Public Health Protection. Drug Saf. 2018;41(7):665-75.

11. Council conclusions on shaping Europe’s digital future. 2020. https://www.consilium.europa.eu/media/44389/st08711-en20.pdf

12. O Caster, J Dietrich, ML Kürzinger, M Lerch, S Maskell, GN Norén. Assessment of the Utility of Social Media for Broad-Ranging Statistical Signal Detection in Pharmacovigilance: Results from the WEB-RADR Project. Drug Saf. 2018;41(12):1355-69.

13. A Sarker, R Ginn, A Nikfarjam, K O'Connor, K Smith, S Jayaraman. Utilizing social media data for pharmacovigilance: A review. J Biomed Inform. 2015;54:202-12.

14. CE Pierce, K Bouri, C Pamer, S Proestel, HW Rodriguez, HV Le. Evaluation of Facebook and Twitter Monitoring to Detect Safety Signals for Medical Products: An Analysis of Recent FDA Safety Alerts. Drug Saf. 2017;40(4):317-31.

15. S Comfort, S Perera, Z Hudson, D Dorrell, S Meireis, M Nagarajan. Sorting Through the Safety Data Haystack: Using Machine Learning to Identify Individual Case Safety Reports in Social-Digital Media. Drug Saf. 2018;41(6):579-90.

16. Overview - Local Guides Help. https://support.google.com/local-guides/answer/6225846?hl=en

17. R Azam. Accessing social media information for pharmacovigilance: what are the ethical implications? Ther Adv Drug Saf. 2018;9(8):385-7.

18. J Liu, S Zhao, G Wang. SSEL-ADE: A semi-supervised ensemble learning framework for extracting adverse drug events from social media. Artif Intell Med. 2018;84:34-49.

19. AC Tricco, W Zarin, E Lillie, S Jeblee, R Warren, PA Khan. Utility of social media and crowd-intelligence data for pharmacovigilance: a scoping review. BMC Med Inform Decis Mak. 2018;18(1):38.

20. S Rees, S Mian, N Grabowski. Using social media in safety signal management: is it reliable? Ther Adv Drug Saf. 2018;9(10):591-9.

21. Food Safety Vocabulary. NMRA. 2020. https://www.nmrestaurants.org/food-safety-vocabulary/

22. Appendix B. Guidelines for Confirmation of Foodborne-Disease Outbreaks. p. 54-62. https://www.cdc.gov/mmwr/preview/mmwrhtml/ss4901a3.htm

23. A Krishna, V Akhilesh, A Aich, C Hegde, V Sridhar, MC Padma, KAR Rao. Sentiment Analysis of Restaurant Reviews Using Machine Learning Techniques. Emerging Research in Electronics, Computer Science and Technology. Springer; 2019. p. 687-96.

24. N Hossain, MR Bhuiyan, ZN Tumpa, SA Hossain. Sentiment Analysis of Restaurant Reviews using Combined CNN-LSTM. 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT). 2020. p. 1-5.

25. RA Laksono, KR Sungkono, R Sarno, CS Wahyuni. Sentiment Analysis of Restaurant Customer Reviews on TripAdvisor using Naïve Bayes. 12th International Conference on Information & Communication Technology and System (ICTS). 2019. p. 49-54.

26. JK Harris, JB Hawkins, L Nguyen, EO Nsoesie, G Tuli, R Mansour. Using Twitter to Identify and Respond to Food Poisoning: The Food Safety STL Project. J Public Health Manag Pract. 2017;23(6):577-80.

Subject: Review Article

Keywords

Consumer health informatics

Digital epidemiology

Framework for surveillance system

Passive surveillance

Food safety regulations

Adverse drug reaction reporting

This is an Open Access (OA) journal, and articles are distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as appropriate credit is given and the new creations are licensed under the identical terms.
