Investigating Spotify’s Danceability Index & Other Song Attributes
This study uses R and statistical methods to analyze and visualize data collected on >170,000 Spotify songs and their attributes, with a primary focus on the danceability index.
I recommend also checking out this interactive R ShinyApp website I created that includes further visualization along with a sortable and filterable table of the data that I used: https://dkhurjekar.shinyapps.io/spotify/
Abstract
While music is an art of emotional and poetic expression, it is also inherently mathematical: a sequence of tones and/or words that follows certain patterns and is accompanied by harmony and rhythm is what people call music. A dataset publicly available on Kaggle includes a variety of song attributes, many of which are indexes that Spotify has created for its research and analytics. This study first uses hypothesis tests and confidence intervals to understand the song data and to learn how categorical variables relate to each other and to quantitative variables. The focus then shifts to correlating song attributes, with the ultimate goal of making predictions about song popularity and danceability. Four important conclusions will be made in this study: songs in F# are likely more danceable than songs in C, decade and key appear to be associated, explicit songs are likely more danceable than clean songs, and one can predict average danceability from a given year.
1. Utilizing Tests & Confidence Intervals to Understand the Data
Before getting into the tests and comparative techniques used to explore the song data, it is essential to define the key variables around which this study centers. One binary variable is used: explicitness (songs are either ‘explicit’ or ‘clean’). The categorical variables of interest are decade (ranging from the 1920s to the 2010s, plus the years 2020–21) and key (0 = C, 1 = C#… 11 = B). Tables of the binary and categorical variables are shown below:
The dataset also includes several quantitative variables: danceability (an index created by Spotify using tempo, beat, and other variables to measure how easily one can dance to a given song; it ranges continuously from 0, no danceability, to 1, high danceability), tempo (beats per minute), and loudness (the overall loudness in decibels, ranging continuously from -60.00 dB to a sample maximum of 3.855 dB). Some summary statistics of these quantitative variables are shown here:
The standard deviations of danceability, popularity, and loudness are 0.1760, 5.6916, and 0.1708, respectively.
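As a rough illustration of the variable encodings described above, the sketch below maps the 0 to 11 key codes to note names and derives a decade label from a release year. The column names (`year`, `key`, `explicit`), the 0/1 coding of explicitness, and the helper `decade_label()` are assumptions made for this example, and the rows are invented rather than taken from the Kaggle file.

```r
# Note names in the 0-11 key encoding (0 = C, 1 = C#, ..., 11 = B).
key_names <- c("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")

# Hypothetical example rows; the real data would be read from the Kaggle file.
songs <- data.frame(
  year     = c(1928, 1957, 1999, 2014, 2021),
  key      = c(0, 6, 11, 1, 6),
  explicit = c(0, 0, 1, 1, 0)   # assumed coding: 0 = clean, 1 = explicit
)

# R vectors are 1-indexed, so shift the 0-based key code by one.
songs$key_name <- factor(key_names[songs$key + 1], levels = key_names)

# Turn the assumed 0/1 flag into a labeled binary variable.
songs$explicitness <- factor(songs$explicit, levels = c(0, 1),
                             labels = c("clean", "explicit"))

# Label each year by its decade (1928 -> "20s", 2014 -> "10s").
# The years 2020-21 are kept as their own group, as in the text.
decade_label <- function(year) {
  ifelse(year >= 2020, "2020-21",
         sprintf("%02ds", (year %/% 10 * 10) %% 100))
}
songs$decade <- decade_label(songs$year)

# Frequency tables of the categorical variables.
table(songs$explicitness)
table(songs$decade)
table(songs$key_name)
```

Storing `key_name` as a factor with all twelve levels keeps keys with zero songs visible in the frequency table.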
Distributions of the three quantitative variables are shown below with vertical lines representing their means and a normal curve displayed over each as a way to identify skewness:
Most analyses in this study involve some of the variables above, but among the quantitative variables, danceability remains the center of focus throughout. Except for the left skew and low outliers of loudness, the distributions above appear to be relatively symmetrical with no outliers.
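A minimal sketch of how a histogram with a mean line and a normal reference curve might be drawn in base R follows. The data are simulated from a beta distribution as a stand-in for a 0 to 1 index, and the function name `plot_with_normal()` is hypothetical; none of this reproduces the study's actual figures.

```r
set.seed(1)
# Simulated stand-in for a 0-1 index such as danceability (not the real data).
sim_index <- rbeta(5000, 5, 4)

plot_with_normal <- function(values, label) {
  m <- mean(values)
  s <- sd(values)
  # freq = FALSE puts the histogram on the density scale,
  # so the normal curve is directly comparable to the bars.
  hist(values, breaks = 40, freq = FALSE, main = label, xlab = label)
  abline(v = m, col = "red", lwd = 2)                # vertical line at the sample mean
  curve(dnorm(x, mean = m, sd = s), add = TRUE,      # normal curve with the same mean and sd
        col = "blue", lwd = 2)
  invisible(c(mean = m, sd = s))
}

stats <- plot_with_normal(sim_index, "danceability (simulated)")
```

Bars that sit well above the curve on one side and below it on the other suggest skewness in that direction.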
1.1 Key and Danceability
At first, these two variables may seem unrelated. Key describes the grouping of pitches that a song follows, while danceability describes how easily one can dance to a given song.
However, the comparative boxplot shown above suggests that songs in three keys, in particular, have danceability levels that stand out from the rest:
Specifically, the keys of B, C#, and F# appear to have higher medians than the overall sample median of 0.548. The key of F# would perhaps be the most surprising to most musicians and music enthusiasts, especially since the sample proportion of songs in F# (0.0529) is the second-smallest of all twelve keys (D# is the lowest at 0.0417). Since C is commonly known as the most popular key (as substantiated by the table shown below, in which C has the largest sample proportion of songs), it makes sense to run a two-sample t-test to find out whether there is convincing evidence of a difference in true mean danceability between songs in the keys of F# and C.
All conditions are met for this test: a simple random sample of songs from each key can be assumed; each sample is independent because the sample sizes of 9,226 songs in F# and 21,967 songs in C are both < 10% of all songs in each respective key; and the sampling distribution of the mean is approximately normal because both sample sizes are > 30, so the Central Limit Theorem holds. The null hypothesis is that the true mean danceability of both keys is equal, and the alternative hypothesis is that the true mean danceability of songs in F# is greater than that of songs in C. Results from the t-test using R’s t.test() function are shown below:
The very low p-value (partly a consequence of the large samples) gives convincing evidence that the true difference in mean danceability between the keys of F# and C is > 0. One can also be 99% confident that the true difference in mean danceability of songs in F# minus songs in C is greater than 0.0240.
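For readers who want to try a test of this shape, here is a small sketch of a one-sided two-sample t-test with a 99% lower confidence bound using `t.test()`. The two groups are simulated with invented sizes and means, so the output will not match the results reported above.

```r
set.seed(42)
# Simulated danceability values; group sizes, means, and sds are invented.
f_sharp <- rnorm(900,  mean = 0.58, sd = 0.17)
c_key   <- rnorm(2200, mean = 0.53, sd = 0.17)

# H0: mu_F# - mu_C = 0   versus   Ha: mu_F# - mu_C > 0
# t.test() uses the Welch (unequal variances) test by default.
res <- t.test(f_sharp, c_key, alternative = "greater", conf.level = 0.99)

res$statistic    # t statistic
res$p.value      # one-sided p-value
res$conf.int[1]  # 99% lower confidence bound for the difference in means
res$conf.int[2]  # Inf, because the interval is one-sided
```

With `alternative = "greater"`, the reported interval is one-sided, which is why the conclusion is phrased as "greater than" a single bound rather than as a two-sided range.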
Perhaps this means that this key is “underrated” in that artists who want to create more danceable songs should consider making more songs in the key of F#. In fact, the proportion of songs in the key of F# appears to have steadily increased since the 50s from about 0.035 to about 0.075, as shown in the barplot below:
Maybe artists and/or producers have grown more skilled in the methods and techniques of creating more “danceable” music, but this could also be attributed to the rise of Hip-Hop and Electronic Dance Music (EDM) in recent decades. These songs tend to have heavier bass and punchier rhythms and beats, elements that possibly make songs easier to dance to according to Spotify’s formula.
As a side note, the current danceability index created by Spotify does not include key as one of its variables (it instead uses variables like tempo and considers the presence of beat and rhythm).
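The per-decade share of songs in F# discussed above could be computed along the following lines. This is a toy sketch: the data frame and its column names (`decade`, `key`) are assumptions, and the eight rows are invented.

```r
# Hypothetical toy data: one row per song with a decade label and a key name.
songs <- data.frame(
  decade = c("50s", "50s", "50s", "50s", "10s", "10s", "10s", "10s"),
  key    = c("C",   "G",   "F#",  "C",   "F#",  "C",   "F#",  "A")
)

# Row-wise proportions: within each decade, the key shares sum to 1.
key_by_decade <- prop.table(table(songs$decade, songs$key), margin = 1)

# Share of songs in F# within each decade.
f_sharp_share <- key_by_decade[, "F#"]
f_sharp_share
```

Passing `f_sharp_share` to `barplot()` would give a bar chart of the same kind as the one described in the text. Note that `table()` orders the decade labels alphabetically, so they may need reordering before plotting.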
1.2 Decade and Key
This general upward trend prompts an investigation into how the distribution of song keys has changed over the past decades. Below, a stacked barplot shows the distribution of song keys for each of the past ten decades.
Some change in distribution is evident from the stacked barplot, indicating the need to perform a chi-squared test of independence to make inferences about any possible association between the two categorical variables of interest.
A chi-squared test for independence will produce a value that can determine whether or not a song’s key and the decade from which the song comes are both likely independent of each other. Below are two tables that compare different decades for their respective distribution of song keys. The first displays the expected values of each subcategory, and the second shows the observed values (with totals).
The chi-squared test can be run since all conditions are met (a simple random sample from all Spotify songs since 1920 can be assumed, both variables are categorical, expected values are all > 5, and there is no effect of songs from one decade on songs of another). The results of the chi-squared test are below:
The very low p-value from this chi-squared test of independence provides convincing evidence to reject the null hypothesis that key and decade are independent, meaning that there appears to be an association between key and decade. This likely means that the keys that artists and producers use for songs have changed over the years.
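A chi-squared test of independence of this kind can be sketched with `chisq.test()` on a two-way table. The data below are simulated with only two decades and four keys, and the key proportions are made up so that the distribution shifts between decades.

```r
set.seed(7)
# Simulated songs; the key distribution is made to drift between decades.
keys  <- c("C", "D", "G", "F#")
early <- sample(keys, 2000, replace = TRUE, prob = c(0.40, 0.25, 0.30, 0.05))
late  <- sample(keys, 2000, replace = TRUE, prob = c(0.30, 0.25, 0.30, 0.15))
songs <- data.frame(
  decade = rep(c("50s", "10s"), each = 2000),
  key    = c(early, late)
)

# Observed counts for each decade-key combination.
observed <- table(songs$decade, songs$key)

res <- chisq.test(observed)

res$expected   # expected counts under independence: row total * column total / n
res$statistic  # chi-squared statistic
res$parameter  # degrees of freedom: (rows - 1) * (columns - 1)
res$p.value
```

Checking that every entry of `res$expected` exceeds 5 corresponds to the expected-count condition mentioned above.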
1.3 Explicitness and Danceability
Explicitness is a binary variable that may potentially have some effect on danceability. The boxplot below compares the danceability of songs based on whether they are clean or explicit.
From the boxplots, it appears that explicit songs might be more danceable than clean songs. A two-sample t-test for a difference in means can determine whether there is convincing evidence of such a difference. All conditions are met: a simple random sample from each category of explicitness within the larger random sample of songs from 1920–2021 can be assumed; each sample is independent since the 11,882 explicit songs and 162,507 clean songs are both < 10% of all explicit and all clean songs, respectively; and the sampling distribution of the mean is approximately normal because both sample sizes are > 30, so the Central Limit Theorem holds. The null hypothesis is that the true mean danceability of clean songs is equal to that of explicit songs, while the alternative hypothesis is that the true difference in means of explicit songs minus clean songs is > 0. Results from the two-sample t-test for means are shown below:
The low p-value indicates that the null hypothesis can be rejected, giving convincing evidence that the true difference in mean danceability between explicit songs and clean songs is > 0. One can also be 99% confident that the true difference in mean danceability of explicit songs minus clean songs is greater than 0.1367. Explicit songs being more danceable might be attributed to the fact that Hip-Hop and rap songs, which tend to be explicit more often than other genres, have pronounced beats and rhythms that can contribute toward a more danceable song according to Spotify’s index.
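To show where a one-sided 99% lower bound like the one above comes from, the sketch below computes it by hand for a Welch two-sample comparison. The samples are simulated with invented sizes and means, so the resulting bound is not the study's 0.1367.

```r
set.seed(3)
# Simulated danceability values; sizes, means, and sds are invented.
explicit <- rnorm(1200,  mean = 0.67, sd = 0.16)
clean    <- rnorm(16000, mean = 0.53, sd = 0.17)

# Point estimate of the difference in means (explicit minus clean).
diff_hat <- mean(explicit) - mean(clean)

# Squared standard errors of each sample mean, and the combined standard error.
v1 <- var(explicit) / length(explicit)
v2 <- var(clean) / length(clean)
se <- sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom.
df <- (v1 + v2)^2 /
  (v1^2 / (length(explicit) - 1) + v2^2 / (length(clean) - 1))

# One-sided 99% lower confidence bound: estimate minus t* times the standard error.
lower_99 <- diff_hat - qt(0.99, df) * se
lower_99
```

The same bound should be returned by `t.test(explicit, clean, alternative = "greater", conf.level = 0.99)$conf.int[1]`, which is a convenient way to check the hand calculation.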
2. Correlating Song Attributes & Predicting Danceability
As mentioned above, artists and producers are likely growing more aware of the industry and of what types of songs, beats, rhythms, melodies, etc. it takes to make a song that people enjoy. If one assumes that danceability is one of the song traits that producers want to improve in their music, then the following analysis is pertinent.
Above is a series of density plots that display the distribution of song danceability for each decade. While the steeper peaks evident in the 20s and 30s will not be investigated in this study, a trend of increased danceability since the 50s is apparent and will be analyzed. The density curve changes from roughly symmetrical in the 50s to increasingly left-skewed over the following decades. This trend could potentially be substantiated by creating a linear regression model like this one:
To confirm that this linear regression model is valid, its conditions must first be checked, for which a residual plot of the data, like the one below, is required.
All conditions are met: the data appear to be relatively linear according to the scatter plot at the bottom left; there appears to be independence among the residuals, which is consistent with the correlation between the danceability and the residuals in the above plot being -1.147791e-14 (essentially 0); the residuals appear to have constant variance throughout the plot; and the residuals appear to be relatively normally distributed. The data points on the scatterplot to the left indicate a strong, positive, linear correlation between year and danceability. The equation displayed is that of the linear regression model, from which one can predict the average danceability for a given year. The slope of 0.001865 indicates that the average danceability increases by about 0.001865 each year.
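A regression of yearly average danceability on year, together with basic residual checks, might look like the sketch below. The yearly averages are simulated around an invented line, and the data frame `yearly` and its columns are assumptions for illustration only.

```r
set.seed(10)
# Simulated yearly averages; the slope, intercept, and noise level are invented.
yearly <- data.frame(year = 1950:2020)
yearly$danceability <- 0.002 * yearly$year - 3.4 +
  rnorm(nrow(yearly), sd = 0.01)

# Least-squares fit of average danceability on year.
fit <- lm(danceability ~ year, data = yearly)
coef(fit)  # intercept and slope (change in average danceability per year)

res <- resid(fit)

# For a least-squares fit with an intercept, residuals are uncorrelated with
# the fitted values by construction, so a value near 0 here is expected.
cor(fitted(fit), res)

# Residual plot: look for curvature (non-linearity) or a funnel shape
# (non-constant variance).
plot(fitted(fit), res, xlab = "Fitted danceability", ylab = "Residual")
abline(h = 0, lty = 2)

# Normal Q-Q plot as a rough check that residuals are approximately normal.
qqnorm(res)
qqline(res)
```

Because the near-zero correlation holds for any least-squares fit, the visual checks in the residual and Q-Q plots carry most of the diagnostic weight.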
For example, one may predict through extrapolation that the average danceability of songs in 2025 will be 0.001865 × 2025 − 3.163114 ≈ 0.6135.
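The extrapolation can be written as a one-line function using the slope and intercept reported above. The function name `predict_danceability()` is made up for this sketch.

```r
# Coefficients as reported in the text.
slope     <- 0.001865
intercept <- -3.163114

# Predicted average danceability for a given year under the linear model.
predict_danceability <- function(year) slope * year + intercept

predict_danceability(2025)  # about 0.6135
```

Because the danceability index is bounded between 0 and 1, a straight-line trend cannot continue indefinitely, so predictions are most reasonable for years close to the observed data.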
3. Conclusion
Besides providing a basic understanding of the Spotify song data, this study has reached four important conclusions: songs in F# are likely more danceable than songs in C, decade and key appear to be associated, explicit songs are likely more danceable than clean songs, and one can predict average danceability from a given year. Provided the random-sample assumption holds, these inferences and trends can be generalized to all Spotify songs and are thus applicable to current artists and producers in the industry.
Code (from RStudio): https://docs.google.com/document/d/14afTO_GU0J9w2Ov_-5cD4W5R0aGHuI9N3YuCB3dDFIw/edit?usp=sharing
(Note: The code is not cleaned of unused lines; code is in no specific order; please excuse any repetition)
And again, please check out this interactive website I created that provides more visuals and the data itself: https://dkhurjekar.shinyapps.io/spotify/