Document Type

Student Research Paper


Spring 2021

Academic Department

Computer Science

Faculty Advisor(s)

Dr. Peilong Li


The term ‘Broadway’ refers to the live theater performances, either plays or musicals, that take place in the 41 professional theaters with 500 or more seats located in the Theater District and Lincoln Center in New York City, NY. The primary dataset supplies detailed Broadway grosses broken down by week, theater, and individual show, dating back to 1985. To supplement this, two additional datasets were collected. The first consists of Tony Award wins broken down by show (close to 800 plays and musicals in total) for each Broadway season (year) dating back to 1997. The second contains opinion pieces and opening-night or opening-week reviews of shows, scraped from the New York Times website for use in sentiment analysis. Using all of these datasets together, it is possible to identify the key factors that produce high sales and long runs for shows on Broadway.


Honors Senior Thesis; Honors in the Discipline; DS 495 Data Science Capstone

Included in

Data Science Commons