Introduction

We all listen to music wherever we are, whether traveling, working, at parties, or just relaxing. Each of us has our own taste in music, yet some songs are popular with nearly everyone. What is it about certain songs that earns them billions of streams? In this project, we try to understand which factors influence the popularity of songs. The results should be useful to artists, giving them insight into what types of songs remain popular among listeners, when to release a song, and which parameters make a song a hit. The analysis could also help Spotify decide what kinds of songs to recommend to its users and how to curate playlists accordingly.

In this project, we propose to understand the factors that influence the popularity of songs on Spotify. We use the US Daily Top 200 Spotify charts from 2018 to 2021, together with Spotify audio features, to perform this analysis. We look at three broad categories: first, the effect of time period on the popularity of genres; second, the happiness of songs played before and after Covid-19; and third, the importance of audio features for the popularity of songs.

Dataset

We focus our analysis on the US region from 2018-2021 and use two data sources to create our dataset. First (Figure 1), we make use of the Spotify API which gives access to audio features like danceability, energy, loudness, etc. This API also has metadata like the duration, name of the artist, artist genre, album name, etc. Second (Figure 2), we scrape the daily top 200 charts which gives us the top songs by their stream count. We combine these two data sources to get the final dataset (Figure 3) with 1,710,807 observations.

  1. Spotify Track Features
    (Spotify API)

The Spotify Track Features dataset shows audio features for each track streamed. A full list of these, along with their verbal definitions, can be found on Spotify’s page for developers. There are 12 audio features for each track, including confidence measures acousticness, instrumentalness, liveness, speechiness; perceptual measures danceability, energy, loudness, valence; and descriptors key, duration, mode, tempo.

 

  1. Daily Top 200 Charts
    (Spotify API)

The Daily Top 200 Charts shows the top \(200\) most streamed tracks each day from January 1, 2018 to December 31, 2021. For example, Spotify Daily Top Songs USA shows the daily update of the most played tracks across the US right now. The variables included in this dataset are rank, uri, artist_names, track_name, source, peak_rank, previous_rank, days_on_chart, streams.

 

Final Data

We merged the two data sources into a single data frame with the combination of being the unique identifier of an observation.

Variable | Type | Description
date | Categorical, date | Date of the Spotify chart
track_id | Categorical, str | Unique identifier for each track
track_name | Categorical, str | Title of the track
all_artists | Categorical, str | List of all artist names that appeared on the track
main_artist | Categorical, str | Name of the main artist
main_artist_id | Categorical, str | Unique identifier for each artist
rank | Quantitative, int | Rank from 1-200 (1 is the most streamed track that day)
streams | Quantitative, int | Total number of global streams that day
acousticness | Quantitative, float | Confidence measure of whether the track is acoustic (1.0 is the most acoustic)
danceability | Quantitative, float | Measure of how suitable a track is for dancing (1.0 is most danceable)
energy | Quantitative, float | Perceptual measure of intensity and activity
instrumentalness | Quantitative, float | Prediction of whether a track contains no vocals (values near 1.0 are most likely instrumental)
key | Categorical, int | Overall key of the track, encoded as an integer pitch class
liveness | Quantitative, float | Detection of whether a track was performed live with an audience
loudness | Quantitative, float | Overall loudness of a track in decibels (dB)
mode | Categorical, int | Modality (major or minor) of a track, the type of scale
speechiness | Quantitative, float | Measure of the presence of spoken words in a track
tempo | Quantitative, float | Estimated tempo of a track in beats per minute (BPM)
valence | Quantitative, float | Measure from 0.0 to 1.0 describing the musical positiveness
duration | Quantitative, int | Duration of track in milliseconds
explicit | Categorical, boolean | True or false if contains explicit content
genre | Categorical, str | Name of the genre associated with that track

 


Question Analysis

We examine the following questions in our analysis:

    1. Does time period change the popularity (proportion of songs in the top 200 list) of pop, rap, hip hop, r&b, and rock genres in the Spotify US daily charts?
      1. Does whether or not it is a weekday change the popularity (proportion of songs in the top 200 list) of pop, rap, hip hop, r&b, and rock genres in the Spotify US daily charts?
      2. Does whether or not it is during the holiday season change the popularity (proportion of songs in the top 200 list) of pop, rap, hip hop, r&b, and rock genres in the Spotify US daily charts?
      3. Does what meteorological season it is change the popularity (proportion of songs in the top 200 list) of pop, rap, hip hop, r&b, and rock genres in the Spotify US daily charts?
    2. Did the popularity of happy songs (mean valence) in the top 200 Spotify US daily streams change during Covid?
    3. What parameters are the most important in predicting the popularity on Spotify in the US?

 

Question 1

  1. Does time period change the popularity of genres in the Spotify US daily charts?
    (Question 1)

This question explores the proportion of songs of certain genres (pop, rap, hip hop, r&b, and rock) within a certain time-based constraint (weekday vs weekend, the holiday season or not, meteorological season). As the chart above demonstrates, time plays an important part in which genres people listen to. For example, pop has a very strong seasonal component. Over the period studied, people have also been listening to more rock but less r&b. We want to look at smaller periods of time and see what effect they may have on the proportion of genres in Spotify’s Top 200 charts.

Data

track_id Date Weekday Holiday Season Pop Rap Hiphop Rb Rock
OwbnC9AlJenxp613TYalsGK 2018-04-03 TRUE FALSE spring FALSE TRUE FALSE FALSE FALSE
71J1100y21WplE<ib2ErSA 2021-12-07 TRUE TRUE winter TRUE FALSE FALSE FALSE FALSE
lhy6kKvsPbv7VTcllCw 2018-12-23 FALSE TRUE winter TRUE FALSE FALSE FALSE FALSE
5uCalC9HTNlzGyblSt03vOh 2019-08-13 TRUE FALSE summer TRUE FALSE FALSE FALSE FALSE

The variable of interest is the proportion of songs of a certain genre within a certain time-based constraint (weekday vs weekend, the holiday season or not, meteorological season). This is calculated by grouping the data by the time-based constraint, and then calculating the proportion by summing the column for the genre being tested and dividing by the number of rows in that group. This works because summing the column is just counting the number of TRUEs in that column which is equivalent to the number of songs with that genre.

Although the calculation is done on rows and ignores the track_id, the count in the numerator is equivalent to taking each unique song of the genre, counting the number of days it appeared on Spotify’s Top 200 chart in the relevant time period, and summing those counts over all songs of the genre.
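The grouping calculation described above can be sketched in a few lines of R. This is only an illustration: the toy data frame `charts` and its values are made up, and the column names merely mirror the table shown earlier.

```r
# Toy chart entries (hypothetical values, one row per track per day).
charts <- data.frame(
  track_id = c("a", "a", "b", "c", "d", "e"),
  Weekday  = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE),
  Pop      = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
)

# mean() of a logical column is (number of TRUEs) / (number of rows),
# i.e. the proportion of chart entries in each group tagged as pop.
pop_share <- tapply(charts$Pop, charts$Weekday, mean)
print(pop_share)

# The numerator can also be obtained per song: days on chart for each
# pop track, summed over tracks, equals the number of TRUE rows.
days_per_pop_track <- table(charts$track_id[charts$Pop])
print(sum(days_per_pop_track) == sum(charts$Pop))
```

Because the genre columns are logical, no explicit counting step is needed; the mean of each group is already the proportion.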

 

Subquestion A

  1. Does whether or not it is a weekday change the popularity (proportion of songs in the top 200 list) of pop, rap, hip hop, r&b, and rock genres in the Spotify US daily charts?

A weekday is considered to be Monday, Tuesday, Wednesday, or Thursday. A weekend is considered to be Friday, Saturday, or Sunday. The following hypothesis tests are performed for each of the following genres: pop, rap, hip hop, r&b, and rock.
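As a rough sketch of how this weekday flag could be derived from a chart date (the variable names are illustrative and not taken from the project code), one option is the ISO weekday number, which avoids locale-dependent day names:

```r
# Dates taken from the example rows shown in the Data table.
dates <- as.Date(c("2018-04-03", "2018-12-23", "2019-08-13"))

# "%u" formats the ISO weekday number: 1 = Monday, ..., 7 = Sunday.
day_num <- as.integer(format(dates, "%u"))

# Monday through Thursday count as weekdays under the definition above.
is_weekday <- day_num <= 4
print(data.frame(dates, day_num, is_weekday))
```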

We set up the following hypotheses, where \(p\) is the proportion of songs of a given genre in the Spotify US Top 200 Daily Charts: \(H_0: p_\mathit{weekday} = p_\mathit{weekend}\) versus \(H_A: p_\mathit{weekday} \neq p_\mathit{weekend}\).

  • Null hypothesis (\(H_0\)): the proportion of songs in the Spotify US Top 200 Daily Charts during weekdays is equal to the proportion of songs during weekends.
  • Alternative hypothesis (\(H_A\)): the proportion of songs in the Spotify US Top 200 Daily Charts during weekdays is NOT equal to the proportion of songs during weekends.

 

Statistical Methods

We want to compare the proportion of songs for each genre on weekdays vs weekends. The two samples (weekday and weekend) are independent with independent observations, and the sample sizes \(n_\mathrm{weekday} = 982,758\) and \(n_\mathrm{weekend} = 728,049\) are both large. Under these assumptions, we can use the two-sided two-sample large sample Z-test to compare the proportions for each genre on weekdays vs the weekend. Because the test will be performed on \(5\) genres, a Bonferroni correction will be applied to the significance level by dividing \(0.05\) by \(5\) (the number of genres), resulting in a significance level of \(\alpha = 0.01\).

The Z-value is calculated using a pooled sample proportion in the following equation:

\[\begin{align}Z = \frac{\hat{p}_A - \hat{p}_B}{\sqrt{\hat{p} \left(1- \hat{p} \right) \left(1/n_A + 1/n_B \right)}}, && \hat{p} = \frac{n_A \, \hat{p}_A + n_B \, \hat{p}_B}{n_A + n_B}\end{align}\]
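The pooled Z statistic above can be sketched as a small R function. The helper name `two_prop_z` is hypothetical, and the inputs are the rounded pop proportions from the Results table, so the output will differ slightly from the reported value, which was presumably computed from unrounded data.

```r
# Two-sided, two-sample Z-test for proportions with a pooled estimate.
two_prop_z <- function(p_a, p_b, n_a, n_b) {
  p_pool <- (n_a * p_a + n_b * p_b) / (n_a + n_b)
  se <- sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
  z <- (p_a - p_b) / se
  # pnorm(-abs(z)) gives one tail; doubling it gives the two-sided p-value.
  c(z = z, p_value = 2 * pnorm(-abs(z)))
}

# Pop: weekday vs weekend, using the rounded proportions and sample sizes.
res <- two_prop_z(0.3607, 0.3512, 982758, 728049)

# Bonferroni-corrected significance level for 5 genres.
alpha <- 0.05 / 5
reject <- res[["p_value"]] < alpha
print(res)
print(reject)
```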

Results
Genre Weekday Weekend Max \(\Delta\) % Difference \(Z\) Value P-Value
Pop 0.3607 0.3512 0.0096 2.65% 12.8944 4.8375e-38
Rap 0.3969 0.4042 -0.0073 -1.84% -9.6403 5.4052e-22
Hip Hop 0.1489 0.1523 -0.0034 -2.25% -6.0738 1.2491e-09
R&B 0.0186 0.0176 0.0010 5.17% 4.6519 3.2891e-06
Rock 0.0340 0.0363 -0.0024 -7.02% -8.3912 4.8109e-17

Based on the p-values produced using a significance level of 0.01, we have evidence to reject the null hypotheses that the proportion of songs by genre is the same on weekdays and the weekend for all the genres tested (pop, rap, hip hop, r&b, and rock).

Looking at the bar chart, however, it is difficult to see much difference in the proportions for any genre. Although the differences are statistically significant, they are small in magnitude; the significance is likely due to the extremely large sample sizes, which make even small differences detectable. Interestingly, pop and r&b songs have a higher proportion during the week, whereas rap, hip hop, and rock have a higher proportion during the weekend.

 

Subquestion B

  1. Does whether or not it is during the holiday season change the popularity (proportion of songs in the top 200 list) of pop, rap, hip hop, r&b, and rock genres in the Spotify US daily charts?

The holiday season is considered to be the day after Thanksgiving through December 31st. The hypothesis tests were performed for each of the following genres: pop, rap, hip hop, r&b, and rock.
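One way to encode this definition is sketched below, assuming Thanksgiving is taken as the fourth Thursday of November. The helper names `thanksgiving` and `is_holiday` are hypothetical and not part of the project code.

```r
# Fourth Thursday of November for a given year (US Thanksgiving).
thanksgiving <- function(year) {
  nov <- seq(as.Date(sprintf("%d-11-01", year)), by = "day", length.out = 30)
  nov[format(nov, "%u") == "4"][4]  # "%u": 4 = Thursday
}

# TRUE when a date falls between the day after Thanksgiving and December 31.
is_holiday <- function(dates) {
  year <- as.integer(format(dates, "%Y"))
  start <- as.Date(vapply(
    year,
    function(y) as.character(thanksgiving(y) + 1),
    character(1)
  ))
  end <- as.Date(sprintf("%d-12-31", year))
  dates >= start & dates <= end
}

dates <- as.Date(c("2018-04-03", "2021-12-07", "2018-12-23", "2018-11-22"))
print(is_holiday(dates))
```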

We set up the following hypotheses, where \(p_\mathit{holiday}\) and \(p_\mathit{no \, holiday}\) are the proportions of songs in the Spotify US Top 200 Daily Charts during the holiday season and the non-holiday season, respectively: \(H_0: p_\mathit{holiday} = p_\mathit{no \, holiday}\) versus \(H_A: p_\mathit{holiday} \neq p_\mathit{no \, holiday}\).

  • Null hypothesis (\(H_0\)): the proportion of songs in the Spotify US Top 200 Daily Charts during the holiday season is equal to the proportion of songs during the rest of the year.
  • Alternative hypothesis (\(H_A\)): the proportion of songs in the Spotify US Top 200 Daily Charts during the holiday season is NOT equal to the proportion of songs during the rest of the year.

 

Statistical Methods

We want to compare the proportion of songs for each genre during the holiday season and otherwise. The two samples (holiday and not_holiday) are independent with independent observations, and the sample sizes \(n_\mathrm{holiday} = 164,294\) and \(n_\mathrm{not \, holiday}= 1,546,513\) are both large. Under these assumptions, we can use the two-sided two-sample large sample Z-test to compare the proportions for each genre during the holiday season and otherwise. Because the test will be performed on \(5\) genres, a Bonferroni correction will be applied to the significance level by dividing \(0.05\) by \(5\) (the number of genres), resulting in a significance level of \(\alpha = 0.01\).

The Z-value is calculated using a pooled sample proportion in the following equation:

\[ \begin{align} Z = \frac{\hat{p}_A - \hat{p}_B}{\sqrt{\hat{p} \left(1- \hat{p} \right) \left(1/n_A + 1/n_B \right)}}, && \hat{p} = \frac{n_A \, \hat{p}_A + n_B \, \hat{p}_B}{n_A + n_B} \end{align} \]

Results
Genre Not Holiday Holiday Max \(\Delta\) % Difference \(Z\) Value P-Value
Pop 0.3614 0.3118 0.0496 13.73% -39.9270 \(\lt\) 2.2e-16
Rap 0.4094 0.3116 0.0978 23.90% -76.9650 \(\lt\) 2.2e-16
Hip Hop 0.1547 0.1096 0.0450 29.12% -48.5640 \(\lt\) 2.2e-16
R&B 0.0184 0.0159 0.0026 13.87% -7.3751 1.6425e-13
Rock 0.0312 0.0710 -0.0398 -127.76% 83.4927 \(\lt\) 2.2e-16

Based on the p-values produced using a significance level of 0.01, we have evidence to reject the null hypotheses that the proportion of songs by genre is the same during the holiday season and otherwise for all the genres tested (pop, rap, hip hop, r&b, and rock).

This difference is also much more apparent in the bar chart as compared to the bar chart for subquestion A comparing weekday and weekend, meaning the difference is not only statistically significant, but also strong for a majority of the genres tested. Another interesting observation is that of the 5 genres tested, only rock has a higher proportion during the holiday season than outside of it.

 

Subquestion C

  1. Does what meteorological season it is change the popularity (proportion of songs in the top 200 list) of pop, rap, hip hop, r&b, and rock genres in the Spotify US daily charts?

Spring is considered to be March, April, and May. Summer is considered to be June, July, and August. Fall is considered to be September, October, and November. Winter is considered to be December, January, and February. The hypothesis tests were performed for each of the following genres: pop, rap, hip hop, r&b, and rock.
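The month-to-season mapping above could be written as a simple lookup, shown here as a sketch with a hypothetical helper `season_of`:

```r
# Map each date to its meteorological season via the month number.
season_of <- function(dates) {
  month <- as.integer(format(dates, "%m"))
  lookup <- c(
    "winter", "winter",            # Jan, Feb
    "spring", "spring", "spring",  # Mar, Apr, May
    "summer", "summer", "summer",  # Jun, Jul, Aug
    "fall", "fall", "fall",        # Sep, Oct, Nov
    "winter"                       # Dec
  )
  lookup[month]
}

dates <- as.Date(c("2018-04-03", "2021-12-07", "2019-08-13", "2020-10-01"))
print(season_of(dates))
```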

We set up the following hypotheses, where \(p_\mathit{spring}\), \(p_\mathit{summer}\), \(p_\mathit{fall}\), and \(p_\mathit{winter}\) are the proportions of songs in the Spotify US Top 200 Daily Charts during the spring, summer, fall, and winter, respectively: \(H_0: p_\mathit{spring} = p_\mathit{summer} = p_\mathit{fall} = p_\mathit{winter}\) versus \(H_A: p_\mathit{season \,1} \neq p_\mathit{season\, 2}\) for at least one pair of seasons.

  • Null hypothesis (\(H_0\)): there is no difference in the proportion of songs in the Spotify US Top 200 Daily Charts between the different seasons.
  • Alternative hypothesis (\(H_A\)): there is a difference in the proportion of songs in the Spotify US Top 200 Daily Charts between at least two of the four seasons.

 

Statistical Methods

We want to compare the proportion of songs for each genre during each season. The four samples (spring, summer, fall, winter) are independent with independent observations, and the sample sizes are all large with \(n_\mathrm{spring} = 460,979\), \(n_\mathrm{summer} = 413,090\), \(n_\mathrm{fall} = 394,288\), and \(n_\mathrm{winter} = 442,450\).

Under these assumptions, we can use the chi-squared test (which is equivalent to the two-sided two-sample large-sample Z-test when there are two samples, but can also handle more than two) to compare the proportions for each genre during each season. Because the test will be performed on \(5\) genres, a Bonferroni correction will be applied to the significance level by dividing \(0.05\) by \(5\) (the number of genres), resulting in a significance level of \(\alpha = 0.01\).

The p-value was calculated using the chisq.test function with ‘correct’ set to false. The data input into this function looks similar to the following table (without the season column):
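As an illustration of such a call, here is a minimal sketch with hypothetical counts (not the project's data): each row is a season, and the two columns count chart entries that are and are not tagged with the genre being tested.

```r
# Hypothetical season-by-genre counts, for illustration only.
counts <- matrix(
  c(360, 640,
    365, 635,
    350, 650,
    345, 655),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    season = c("spring", "summer", "fall", "winter"),
    genre  = c("pop", "not_pop")
  )
)

# correct = FALSE disables the continuity correction (which R only
# applies to 2x2 tables in any case).
test <- chisq.test(counts, correct = FALSE)
print(test$statistic)  # X-squared
print(test$parameter)  # degrees of freedom: (4 - 1) * (2 - 1) = 3
print(test$p.value)
```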

 

Results
Genre Spring Summer Fall Winter Max \(\Delta\) % Difference \(X^2\) Value P-Value
Pop 0.3624 0.3629 0.3535 0.3476 0.0153 4.39% 147.82 \(\lt\) 2.2e-16
Rap 0.4137 0.4069 0.4047 0.3751 0.0386 10.28% 701.28 \(\lt\) 2.2e-16
Hip Hop 0.1541 0.1542 0.1504 0.1428 0.0114 7.98% 218.88 \(\lt\) 2.2e-16
R&B 0.0190 0.0152 0.0208 0.0178 0.0056 36.65% 359.73 \(\lt\) 2.2e-16
Rock 0.0285 0.0355 0.0371 0.0394 0.0109 38.19% 827.50 \(\lt\) 2.2e-16

Based on the p-values produced using a significance level of 0.01, we have evidence to reject the null hypotheses that the proportion of songs by genre is the same regardless of season for all the genres tested (pop, rap, hip hop, r&b, and rock).

We can also see that the maximum increase from one season to another is over 35% for both r&b and rock, but less than 5% for pop. Rap has the largest straight difference in proportion with winter being 0.04 lower than spring. Additionally, summer and spring appear to be the most similar overall in terms of proportion of genre.


Question 2

  1. Did the popularity of happy songs in the top 200 Spotify charts change during Covid?
    (Question 2)

To assess the happiness of a song, we analyze the valence: an audio feature from Spotify’s API that describes the musical positiveness conveyed by a track. Tracks with high valence sound more positive (happy, cheerful, euphoric), while tracks with low valence sound more negative (sad, depressed, angry). As the chart above demonstrates, the averaged valence fluctuates with time and potential seasonality effects. For example, we identify peaks near the end of each year during November and December, around the time of the prominent US holiday season. Hence, we may attribute these peaks to seasonal consequences when Christmas music, which tends to be higher valence, populates Spotify’s Top 200 charts for consecutive days.

 

Data

We use the Spotify Daily Top Tracks data described in the Dataset section above. In particular, Question 2 uses the track_name, valence, and Date columns. From these columns, we created a covid variable indicating whether a track entry appeared in the Top 200 chart before Covid (Date < “03/13/2020”, labeled before) or after the onset of Covid (Date >= “03/13/2020”, labeled after).

track valence date covid
Wow. 0.388 2019-12-08 before
Moonlight 0.711 2020-07-10 after
Naked 0.238 2018-01-10 before
ROCKSTAR (feat. Roddy Ricch) 0.497 2021-01-30 after
The Middle 0.437 2019-07-03 before
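The labeling step could look roughly like the sketch below. The data frame is a toy subset of the rows shown above, and fixing the factor levels as before, after is an assumption made here so that grouped summaries come out in that order.

```r
# Toy subset of the table above.
df <- data.frame(
  track = c("Wow.", "Moonlight", "Naked"),
  date  = as.Date(c("2019-12-08", "2020-07-10", "2018-01-10"))
)

# Entries strictly before the cutoff are "before"; the rest are "after".
cutoff <- as.Date("2020-03-13")
df$covid <- factor(
  ifelse(df$date < cutoff, "before", "after"),
  levels = c("before", "after")  # assumed ordering, not from the source
)
print(df)
```

Without explicit levels, R orders factor levels alphabetically (after, before), which changes the order of results returned by functions such as tapply.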

 

Statistical Method

We want to compare the average valence values for track entries added to Spotify’s Top 200 playlists before and after Covid, i.e., March 13, 2020. For the data, the two samples (before and after) are independent with independent observations, and the sample sizes \(n_{\mathrm{before}}\) and \(n_{\mathrm{after}}\) are both large. Under these assumptions, we can use the two-sided two-sample large sample Z-test to compare the mean valences before and after Covid. This means that we compare the Z test statistic to the standard normal distribution.

We set up the following hypotheses, where \(\mu_{\mathrm{before}}\) and \(\mu_{\mathrm{after}}\) are the mean valences per song for songs added before and after Covid, respectively: \(H_0: \mu_{\mathrm{before}} = \mu_{\mathrm{after}}\) versus \(H_A: \mu_{\mathrm{before}} \neq \mu_{\mathrm{after}}\).

  • Null hypothesis (\(H_0\)): there is no difference between the average valences per song from the Spotify US Top 200 Daily Charts before and after Covid.
  • Alternative hypothesis (\(H_A\)): there is a difference between the average valences per song from the Spotify US Top 200 Daily Charts before and after Covid.

 

Results:

First, we calculate the mean (\(\mu_{\mathrm{before}}\), \(\mu_{\mathrm{after}}\)), standard deviation (\(s_{\mathrm{before}}\), \(s_{\mathrm{after}}\)), and size (\(n_{\mathrm{before}}\), \(n_{\mathrm{after}}\)) for each of the two samples.

m = with(df, tapply(valence, covid, mean))
s = with(df, tapply(valence, covid, sd))
n = with(df, tapply(valence, covid, length))
mean (\(\mu_i\)) std dev (\(\sigma_i\)) size (\(n\))
before 0.4572265 0.2014658 1117863
after 0.4826522 0.2272466 592944

Using these values, we can then calculate the test statistic:

\[ \begin{align} Z & = \frac{\left|\bar{X}_\mathrm{before} - \bar{X}_\mathrm{after}\right|}{\sqrt{s^2_\mathrm{before} / n_\mathrm{before} + s^2_\mathrm{after} / n_\mathrm{after}}} = \frac{\left|0.4572 - 0.4826 \right|}{\sqrt{0.2015^2 / 1117863 + 0.2273^2/592944}} = 72.379 \end{align} \]

Next, we calculate the p-value using the standard normal distribution (since \(n_\mathrm{before}\) and \(n_\mathrm{after}\) are both large). The p-value is a probability about the test statistic, calculated under the assumption that the null hypothesis is true.

If the p-value is less than \(\alpha\) (i.e., \(p \lt 0.05\)), then we reject the null hypothesis of equal means. This would mean that the mean valences per song before and after Covid are not equal (i.e., the popularity of happy songs changed during Covid). If the p-value is greater than \(\alpha\) (i.e., \(p \gt 0.05\)), then we do not reject the null hypothesis of equal means. This would mean that we lack sufficient evidence that the mean valences per song before and after Covid differ (i.e., no evidence that the popularity of happy songs changed during Covid).

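# index by group name so the result does not depend on factor level order
z = (m["before"] - m["after"] - 0) / sqrt(sum(s^2 / n))
# two-sided p-value; pnorm(-abs(z)) is valid for either sign of z
p = 2 * pnorm(-abs(z))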

The p-value for the test is \(p \lt 0.001\). Based on the test, we reject the null hypothesis of equal valence means at the \(0.05\) level of significance.

We can also calculate the confidence interval for the difference between population means. A confidence interval provides additional information beyond the hypothesis test. In general, we can interpret a confidence interval as the set of all values of the population parameter that would not have been rejected by the corresponding hypothesis test. We evaluate this by checking whether the confidence interval contains the value 0.

Using the same sample mean (\(\mu_{\mathrm{before}}\), \(\mu_{\mathrm{after}}\)), standard deviation (\(s_{\mathrm{before}}\), \(s_{\mathrm{after}}\)), and size (\(n_{\mathrm{before}}\), \(n_{\mathrm{after}}\)) computed above, the estimated SE is calculated as

\[ \begin{align} SE &= \sqrt{\frac{s^2_\mathrm{before}}{n_\mathrm{before}} + \frac{s^2_\mathrm{after}}{n_\mathrm{after}}} = \sqrt{\frac{{0.2015}^2}{{1117863}} + \frac{{0.2273}^2}{592944}} = 0.00035129 \end{align} \]

We now write the 95% confidence interval for the difference between valence means as follows:

\[ \begin{align} &\left( \left |{\bar{X}_\mathrm{before} - \bar{X}_\mathrm{after}}\right| - 1.96 \times \mathrm{SE}, \left | {\bar{X}_\mathrm{before} - \bar{X}_\mathrm{after}} \right| + 1.96 \times \mathrm{SE} \right) \\ &= \left(\left |{0.4572-0.4826}\right| - 1.96 \times 0.0003513, \left |{0.4572-0.4826}\right| + 1.96 \times 0.0003513 \right) \\ &= \left(0.0247372, 0.02611422\right) \end{align} \]

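se = sqrt(s["before"]^2 / n["before"] + s["after"]^2 / n["after"])
z.05 = qnorm(0.975)  # critical value for a 95% interval
# difference taken as after - before, matching the positive interval reported
lower = m["after"] - m["before"] - z.05 * se
upper = m["after"] - m["before"] + z.05 * se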

The 95% confidence interval for the difference between the population means is \((0.0247, 0.0261)\), which agrees with the result of the Z-test above. Since the interval does not contain the value 0, we reject the null hypothesis and conclude that there is statistically significant evidence that the average valence per song differs before and after Covid.
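As a check, the test statistic and interval can be recomputed from the summary statistics in the table above alone. The sketch below hard-codes those reported values; the variable names are illustrative.

```r
# Summary statistics as reported in the table above.
m <- c(before = 0.4572265, after = 0.4826522)
s <- c(before = 0.2014658, after = 0.2272466)
n <- c(before = 1117863,   after = 592944)

# Standard error of the difference in means (unpooled variances).
se <- sqrt(sum(s^2 / n))

# Z statistic and two-sided p-value.
diff <- m[["after"]] - m[["before"]]
z <- diff / se
p <- 2 * pnorm(-abs(z))

# 95% confidence interval for the difference (after - before).
ci <- diff + c(-1, 1) * qnorm(0.975) * se
print(c(se = se, z = z, p = p))
print(ci)
```

At a Z value this large the p-value underflows to zero in double precision, which is why it is reported only as an upper bound.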

Our graphs display a slightly increasing trend in mean valence after the specified date marking the pandemic’s beginning. Having initially speculated that the pandemic would have a negative effect on valence, we were surprised to find that valence continued to trend upward in the years following the onset of Covid. So, despite an increase in reported cases of depression and restlessness during the pandemic, we cannot assume that the majority of the American public turned to sad (low-valence) songs.


Question 3

  1. What parameters are the most important in predicting the popularity on Spotify in the US?
    (Question 3)

Data

For this question we used the Spotify Daily Top Tracks data described in the dataset section. We then aggregated the data by song id, so each song has its own row. Each song has the same values for each attribute so we take the mean of these attributes. These attributes include explicit, acousticness, danceability, duration, energy, instrumentalness, key, liveness, loudness, mode, speechiness, tempo, and valence. Then, we created a new variable for popularity and named it updated_rank. To calculate the updated_rank we do \(201 - \mathrm{Rank}\) (i.e. rank \(1\) has score \(200\), rank \(200\) has score \(1\)) so a higher updated_rank means the song is performing better in the ranks. We then sum this updated_rank for each song to get a popularity score. This way the longer the song is in the Spotify top 200 the higher the popularity score.
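The aggregation described above can be sketched as follows, using a made-up data frame with one audio feature; the real data has more columns, and the object names here are illustrative.

```r
# Toy daily chart rows: track "a" charts for 3 days, track "b" for 2.
charts <- data.frame(
  track_id     = c("a", "a", "a", "b", "b"),
  rank         = c(1, 3, 10, 200, 150),
  danceability = c(0.8, 0.8, 0.8, 0.5, 0.5)
)

# Flip the rank so that a higher value means a better chart position.
charts$updated_rank <- 201 - charts$rank

# Popularity score: sum of updated_rank over all days on the chart.
popularity <- aggregate(updated_rank ~ track_id, data = charts, FUN = sum)

# Audio features are constant per song, so the mean just recovers them.
features <- aggregate(danceability ~ track_id, data = charts, FUN = mean)

songs <- merge(popularity, features, by = "track_id")
print(songs)
```

Track a scores 200 + 198 + 191 = 589 and track b scores 1 + 51 = 52, so both a better rank and more days on the chart raise the score.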

 

Statistical Methods

To answer this question we created several linear regression models and looked at the significance of each independent variable in each model. For each variable \(x\) in a model, we look at the p-value for the t-test of \(H_0\colon \beta_x=0\) versus \(H_a\colon \beta_x \neq 0\). We started by looking at the correlation matrix.

From the above correlation matrix graph we can see that there is little to no correlation between the other variables and the updated_rank.

Next, we create multiple linear regression models in R. Several of our models implement the backwards AIC method using the step() function. This algorithm starts with the full model and, at each step, removes the variable whose removal most lowers the Akaike Information Criterion (AIC), stopping when no removal lowers the AIC further. Below are a few of the models we created:

  1. Model 1: Simple additive model using all predictor variables in the correlation matrix above.
    1. Adjusted R-squared: 0.01226
    2. Does not meet linearity or constant variance assumptions
  2. Model 2: Model created using backwards AIC and all predictor variables in the correlation matrix above as well as all the interaction terms.
    1. Adjusted R-squared: 0.01749
    2. Does not meet linearity or constant variance assumptions
  3. Model 3: Model created using backwards AIC and all predictor variables in the correlation matrix above as well as all the interaction terms and taking log base 10 of the updated_rank.
    1. Adjusted R-squared: 0.02856
    2. Meets the linearity and constant variance assumptions
  4. Model 4: Model created using backwards AIC and all predictor variables in the correlation matrix above and taking log base 10 of the updated_rank.
    1. Adjusted R-squared: 0.02008
    2. Meets the linearity and constant variance assumptions

After looking over all the models, we decided to use Model 4 for our analysis.

## Full Model
full.model <- lm(log10(updated_rank) ~ acousticness + energy + duration + instrumentalness + key + liveness + loudness + mode + speechiness + tempo + valence + danceability, data = dfQ3)

## Backwards AIC Method
backwards = step(full.model, direction = "backward")

## Reduced Model
model4 <- lm(log10(updated_rank) ~ acousticness + energy + duration + instrumentalness + key + liveness + loudness + speechiness + tempo + valence + danceability, data = dfQ3)

As seen below, the residuals in the fitted vs. residuals plot seem to have a mean around \(0\), so Model 4 satisfies the linearity assumption. The residuals are also fairly equally spread out, so we can assume constant variance. Lastly, the sample size is large enough that we can assume normality. Hence, Model 4 meets all the requirements of hypothesis testing for linear regressions.

 

Results

The estimates of the coefficients and their p-values are below:

Estimate P.Value
(Intercept) 2.521 \(\lt\) 2e-16
danceability 1.003 \(\lt\) 2e-16
duration 5.081e-07 0.045577
energy -1.364e-01 0.108633
instrumentalness -5.448e-01 0.000215
mode -7.221e-02 0.008720
speechiness -6.841e-01 9.82e-11
tempo 1.117e-03 0.011740
valence 1.877e-01 0.005147

Overall, we can see that danceability has the highest coefficient estimate out of all the other independent variables, so increasing the danceability score by \(1\) would have more effect on the popularity score than changing any other variables by \(1\). Let’s take a closer look at the relationship between danceability and the popularity score.

From the above graph we can see that the more popular songs tend to have a higher danceability score. From our statistical analysis, we can conclude that for each increase of \(0.1\) in the danceability score, the popularity score is expected to increase by \(25.99\%\), holding all other variables constant. We decided to look at an increase of \(0.1\) instead of \(1\) since the danceability score ranges from \(0\) to \(1\), so an increment of \(0.1\) made more sense to analyze in this context.
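The percentage comes from back-transforming the coefficient of the log base 10 model: an increase of \(\Delta\) in a predictor multiplies the expected response by \(10^{\beta \Delta}\). A short sketch using the rounded coefficient from the table above (so the last digit may differ from the reported figure, which was presumably based on the unrounded estimate):

```r
# Rounded danceability coefficient from the results table.
beta_danceability <- 1.003
delta <- 0.1

# In a log10(y) model, adding delta to x multiplies y by 10^(beta * delta).
multiplier <- 10^(beta_danceability * delta)
pct_change <- (multiplier - 1) * 100
print(round(pct_change, 2))
```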

Along with looking at the hypothesis tests in the linear regression model, it is also important to analyze the data visualizations. For example, duration is statistically significant and its coefficient is positive, but as we can see in the graph, popularity increases with duration only up to a certain point. At around \(3.5\) minutes the popularity actually begins to decline. Combining the results from our significance tests and the data visualizations above, if a music producer were to ask us which features they should focus on to create a popular song, we would recommend creating a song with high danceability, low instrumentalness, and low speechiness. These three variables have the largest coefficients in absolute value in the linear regression model, and the data visualizations support this conclusion.


Conclusion

All of the questions we explored included statistically significant results, but many were not practically significant.

The effect of time period on the popularity of genres was not particularly strong when comparing weekdays and weekends, with a maximum 7% difference. The change during the holiday season, however, was much stronger, with all differences being at least 13% and rock having a 127% increase during the holiday season. Meteorological season was hit or miss, with pop having a 4% difference between winter and summer and rock having a 38% difference between winter and spring.

Based on statistical tests comparing average valences before and after Covid, we reject the null hypothesis and conclude that there is statistically significant evidence that the valence per song differs before and after Covid. Although the statistical analysis rejected the null hypothesis, indicating that the popularity of happy songs changed during Covid, we determined that the difference in valence values has no practical significance: the effect is not large enough, and the study has too many limitations, for the result to be meaningful in the real world. In other words, the average valence is about the same as it was before the pandemic. One limitation of this analysis is that we considered only one audio feature of a song; it might be helpful to look at the roles of other features in happiness levels and at their interactions. Another limitation is the seasonality of music listening habits. In future work, this effect could potentially be controlled by using a seasonally adjusted valence measure.

Almost all of the independent variables in our model are statistically significant. Because statistical significance alone is not enough, we also looked at visualizations of the independent variables vs. the popularity score. Combining the statistical significance of the variables, the values of the coefficients, and the patterns in the data visualizations, we recommend that a song meant to be popular on Spotify have high danceability, low instrumentalness, low speechiness, and a duration of around 3.5 minutes.

 

Limitations

We faced some major limitations with our dataset, as follows:

Missing genres for each song - The Spotify API provides us with genres for each artist but not for individual songs. As a result, we mapped all the genres belonging to artists with their songs and used these artists’ genres for each song. This resulted in songs having multiple genres, and some may not be the actual genre of that song.

Popularity of artist - We are biased towards popular artists and tend to like their songs more as compared to other artists. Hence, songs by artists with huge popularity become popular much faster. However, we could not make use of this intuition as we did not have the popularity of the artist at each timestamp. Instead, we had the most recent count of the number of followers for an artist. If we had an artist popularity score at each timestamp, we could have made use of that information and come up with questions like - “Does popularity of artists have an impact on the popularity of songs?”

Different song titles across datasets - Initially, our plan was to combine the Billboard Weekly Top 100 charts dataset with Spotify audio features to examine questions like “Which audio features are significant for a song to reach the Billboard top 100 charts?”. However, we couldn’t join the two data sources on song title, because the Billboard and Spotify data often had slightly different titles for the same song, and different songs can have the same title. As a result, a straightforward join on titles was not possible.

Limited dataset and seasonality - Our dataset only shows the top 200 songs within a narrow time window, creating a sampling bias. Another limitation is the seasonality of music listening habits; in future work, this effect could potentially be controlled by using a seasonally adjusted valence measure.


References

Dave, Dhruvil. 2021. “Billboard "the Hot 100" Songs.” Kaggle. https://doi.org/10.34740/KAGGLE/DS/1211465.
Thompson, Charlie, Daniel Antal, Josiah Parry, Donal Phipps, and Tom Wolff. 2021. Spotifyr: R Wrapper for the Spotify Web API. https://github.com/charlie86/spotifyr.
“Web API Reference: Spotify for Developers.” 2021. Spotify for Developers. https://developer.spotify.com/documentation/web-api/reference/#/.