Using Social Media Data to Gain Insights into Community Trends

Workshop Description and Syllabus

In this workshop, you will learn how to use social media data (social media listening) to gain insights into emerging community needs, trends, and conversations.

During the workshop, you will get access to Symplur Signals, a growing, daily updated database that currently holds more than 750 million disease- and healthcare-related Twitter user messages (tweets), 6,000 hashtags, 12,000 unique healthcare topics, and 285 million Twitter user profiles.

Part 1: Using a series of example studies, our partners from Symplur will show you how to best use Symplur Signals and explain what types of questions can be answered leveraging this type of social media data.

Part 2: Attendees will work in groups of four to complete one of the proposed analysis projects and develop data reports. Please see the section Student Data Analysis Projects for more detail.

Deadline workshop 1: Please submit your final data analysis and report by Tuesday, March 10, prior to the second workshop. Send it to


  • Understand how to use digital and social media data, focused on more than 750 million Twitter user messages (tweets)
  • Understand how to use that data to identify emerging healthcare, disease and health trends at the local and national levels as well as potential study participants for research projects
  • Understand what types of questions digital and social media data can help answer
  • Understand the use of hashtags (#) in online conversations on Twitter
  • Understand privacy guidelines for using that data in support of your work and research (e.g., protected health information, de-identification vs. anonymization)
  • Develop experience in using the Symplur Signals database to answer research questions

Pre-Workshop Reading

Understanding how individuals and communities talk about healthcare issues, diseases, and health concerns can provide valuable insights that inform multiple areas of work such as business, communications and marketing, social work, medicine, arts, and research. However, this type of knowledge is scarce.

Social media listening describes the analysis of online user conversations. It is being used by the pharmaceutical industry, marketers, and a limited number of researchers who have used manual and computerized approaches to gain insights from social media data (e.g., Lyles et al., 2013; Pawelek et al., 2014).

Online disease/health communities are growing, and they are particularly active on the microblogging social media platform Twitter, as evidenced by the growing number of disease-related conversations on Twitter and Symplur’s popular hashtag projects (disease-focused, healthcare-focused). For example, did you know that between June 2012 and October 2014, 13,372 users who self-identified as located in Los Angeles sent 35,295 messages on Twitter using the word “diabetes” in more than a dozen languages, including English, Spanish, Tagalog, Haitian, Korean, and Vietnamese (Symplur Signals)?

Please view and study the following materials prior to the workshop:

Student Data Analysis Projects

Symplur Demonstration

Please choose one of the following case studies. You will work in groups of four to complete one of the proposed analysis projects and to develop a data report.

Deadline workshop 1: Please submit your final data analysis and report by Tuesday, March 10, 2015, prior to the second workshop. Send it to


You are doing marketing research for a healthcare firm that is exploring the potential for a new product idea. The product, a continuous blood glucose monitoring (CGM) device, is designed to be as minimally invasive as possible and to communicate directly with patients’ smartphone devices. You are in charge of gathering novel insights into perceptions of the various healthcare stakeholders and decide to leverage social media data. You focus your analysis on the following aspects/questions:

  • Who is participating in these conversations and what can you learn about them and their communities?
  • What languages are the conversations in?
  • Where (locations in the U.S.) are the conversations regarding CGM taking place on Twitter?
  • How do they reference CGM? What can you say about their perception of currently available practices for monitoring glucose levels?
  • Identify examples of CGM-related pain points that users express.
  • What resources regarding CGM are being shared and who are the sources? How influential are they?
  • Describe the opportunity that exists in the market for a new CGM device for direct integration with patients’ smartphone devices. In what areas could it potentially improve healthcare for patients, providers and other third parties?
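Several of the questions above (languages, locations) come down to counting how often a value occurs across a set of tweets. The following is a minimal sketch of that tally, assuming you have tweet records available outside of Symplur Signals as a list of dictionaries. The field names `lang` and `location` and the sample records are made up for illustration; they are not the Symplur Signals export format.

```python
from collections import Counter

# Made-up sample records for illustration only.
# The field names "lang" and "location" are assumptions, not a real export schema.
tweets = [
    {"text": "Trying a new CGM sensor this week", "lang": "en", "location": "Los Angeles, CA"},
    {"text": "Mi sensor de glucosa falla otra vez", "lang": "es", "location": "Los Angeles, CA"},
    {"text": "CGM alarms kept me up all night", "lang": "en", "location": "Chicago, IL"},
    {"text": "Anyone else calibrating twice a day?", "lang": "en", "location": ""},
]


def tally(records, field):
    """Count how often each non-empty value of `field` occurs in the records."""
    return Counter(r[field] for r in records if r.get(field))


language_counts = tally(tweets, "lang")
location_counts = tally(tweets, "location")

# List values from most to least frequent.
for value, count in language_counts.most_common():
    print(value, count)
```

Records with an empty or missing value are skipped, so self-reported locations that users left blank do not show up as their own category.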


You have started a new job in a non-profit organization that is focused on increasing childhood vaccination rates for measles in the U.S. Your new supervisor gives you the opportunity to showcase your skills and independently develop a proposed communications plan. S/he asks you to support your recommendation with data, but no market communication data is available within the organization. You decide to tap into social media data to develop a data-driven communications plan. You focus your analysis on the following aspects/questions:

  • What can you learn about parents’ (target audience) perspectives on the measles vaccination? Are there different communities to be considered for the planned outreach?
  • What topics are being discussed?
  • What languages are being used?
  • Is there a particular time when the target audience is active online?
  • Who are key influencers among the target audience that are active on Twitter? Are all stakeholders engaged in the discussions?
  • How do the various stakeholders communicate? What different communications styles are being used?
  • What concerns and fears do parents express regarding measles vaccinations? How are they being addressed? Are there any positive stories being told?
  • Do they share resources? What are the sources?
  • Should the non-profit partner with any other active and influential organizations in these communities?
  • Based on the insight you can glean from Twitter social media data, what overall approach would you recommend for communicating to the target audience?
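The question above about when the target audience is active online can be approached by grouping tweet timestamps by hour of day. The following is a minimal sketch, assuming the timestamps are available as ISO 8601 strings that are already in the audience's local time. The sample timestamps are made up for illustration, and the function name `activity_by_hour` is hypothetical.

```python
from collections import Counter
from datetime import datetime

# Made-up sample timestamps for illustration only (assumed to be local time).
timestamps = [
    "2015-02-03T14:05:00",
    "2015-02-03T14:40:00",
    "2015-02-04T21:15:00",
]


def activity_by_hour(stamps):
    """Count tweets per hour of day (0-23) from ISO 8601 timestamp strings."""
    return Counter(datetime.fromisoformat(s).hour for s in stamps)


hourly = activity_by_hour(timestamps)

# The hour with the most tweets, with its count.
busiest_hour, busiest_count = hourly.most_common(1)[0]
print(busiest_hour, busiest_count)
```

If the timestamps are recorded in a single reference time zone rather than local time, they would need to be converted before the hourly pattern says anything about a specific community.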


You are a student in a research lab. You have been thinking about proposing a new research idea to your supervisor. You want to find out more about how patients living with chronic pain may or may not openly express their condition and how providers, caretakers, family and friends can deepen their understanding, thereby better supporting those patients. To strengthen your case, you have decided to conduct a preliminary analysis using social media data. By providing preliminary insight into the following questions, you want to persuade your supervisor to apply for funding in support of a broader study. You focus your analysis on the following aspects/questions:

  • What communities on social media exist that chronic pain patients are part of?
  • Find out why chronic pain patients choose to join these communities.
  • How has the activity level of these online communities changed over time?
  • Are all stakeholders (patients, providers, caretakers, family and friends) present in these communities? How do they communicate differently from each other?
  • Can you detect different languages?
  • Where are people who participate in these online conversations located?
  • Describe the interactions between the participants.
  • What topics do they discuss? Are the discussions professional or personal in nature? Do they involve how we may change and improve healthcare?
  • What unique insight can you glean that a typical healthcare provider visit may not pick up on?
  • What value may these communities bring to the patients? To the providers? To family and friends? Should you consider them in your research plan?

Q&A Forum

Please post your comments and questions about your data analysis projects, how to use Symplur Signals, the workshop in general, etc. on our workshop discussion forum so that all students can benefit from the knowledge we generate together.

You can also join our conversation on Twitter using the hashtag #DigiScholar15

P.S. Thanks to the Symplur team for creating the forum!

Training Video

NIH Funding Acknowledgment: Important - All publications resulting from the utilization of SC CTSI resources are required to credit the SC CTSI grant by including the NIH funding acknowledgment and must comply with the NIH Public Access Policy.