import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd
client_id = "xxx"
client_secret = "xxx"
my_auth = SpotifyClientCredentials(client_id, client_secret)
sp = spotipy.Spotify(auth_manager=my_auth)
Extracting Spotify Data
To get started, we extract data for a set of tracks from one of Spotify’s top featured playlists. The Spotify Web API provides detailed data for each song, such as the performing artist, the album it belongs to, its release date, popularity, and audio features like danceability, energy, and tempo.
Python libraries like spotipy offer a user-friendly way to interact with the Spotify API, with functions that streamline tasks like API authentication, retrieving playlist data, and obtaining information about any given song.
Accessing the Spotify Web API
To access data from Spotify, we import the spotipy library and its SpotifyClientCredentials class. We also use the pandas package for data manipulation and display. To authenticate against the Spotify API, we pass our client ID and client secret to a client credentials manager. Once authenticated, we can use the spotipy client to query the Spotify API and retrieve data.
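The setup code hard-codes placeholder credentials. As a minimal sketch of one alternative, the helper below reads them from environment variables instead. The function name `load_spotify_credentials` and its `env` parameter are hypothetical and not part of the original article; the variable names `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` are the ones spotipy itself looks for.

```python
import os


def load_spotify_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    `env` is a hypothetical hook for passing a plain dict (useful in tests);
    by default the real process environment is used.
    """
    env = os.environ if env is None else env
    names = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET")
    # Report every missing or empty variable at once
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]
```

The returned pair can then be passed to SpotifyClientCredentials in place of the "xxx" strings, which keeps secrets out of the notebook or script.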
Spotify’s Featured Playlists
Let’s take a look at popular Spotify playlists. The code below retrieves the playlists of the spotify user and builds a dataframe with details for each playlist: its name, ID, description, thumbnail, total number of tracks, and follower count. The resulting dataframe is displayed as an HTML table.
username = "spotify"
spotify_playlists = sp.user_playlists(username)
top_playlists = get_top_playlists(username, 6)
The function get_top_playlists retrieves all playlists for a given user and returns those with the most followers. It iterates over each playlist item to extract the thumbnail image URL, name, ID, description, total number of tracks, and follower count, builds a DataFrame from that data, sorts it by follower count in descending order, and returns the top rows (six here).
| | thumbnail | name | id | description | tracks | followers |
|---|---|---|---|---|---|---|
| 0 | | Today’s Top Hits | 37i9dQZF1DXcBWIGoYBM5M | Karol G is on top of the Hottest 50! | 50 | 34735922 |
| 1 | | RapCaviar | 37i9dQZF1DX0XUsuxWHRQd | New music from Eminem, Ice Spice and BossMan DLow. | 50 | 15979449 |
| 3 | | Viva Latino | 37i9dQZF1DX10zKzsJ2jva | Today's top Latin hits, elevando nuestra música. Cover: Natanael Cano, Oscar Maydon | 50 | 15144445 |
| 12 | | All Out 2000s | 37i9dQZF1DX4o1oenSJRJd | The biggest songs of the 2000s. Cover: The Killers | 150 | 12331870 |
| 7 | | Rock Classics | 37i9dQZF1DWXRqgorJj26U | Rock legends & epic songs that continue to inspire generations. Cover: The Rolling Stones | 200 | 12205175 |
| 14 | | All Out 80s | 37i9dQZF1DX4UtSsGT1Sbe | The biggest songs of the 1980s. Cover: Bruce Springsteen | 150 | 11312274 |
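The article calls get_top_playlists without showing its body. The sketch below is one way the described steps could look, not the author's implementation. The name `top_playlist_rows` and the explicit `client` parameter are hypothetical. It also assumes that the paged playlist listing carries no follower count, so each playlist is fetched individually with `client.playlist`.

```python
def top_playlist_rows(client, username, n):
    """Collect one row per playlist of `username`; return the n most followed.

    `client` is assumed to be a spotipy.Spotify instance (or any object that
    offers user_playlists, playlist and next with the same behaviour).
    """
    rows = []
    page = client.user_playlists(username)
    while page:
        for item in page["items"]:
            images = item.get("images") or []
            # Assumption: follower counts are only on the full playlist object
            full = client.playlist(item["id"])
            rows.append({
                "thumbnail": images[0]["url"] if images else None,
                "name": item["name"],
                "id": item["id"],
                "description": item.get("description"),
                "tracks": item["tracks"]["total"],
                "followers": full["followers"]["total"],
            })
        # Move on to the next page of results, if there is one
        page = client.next(page) if page.get("next") else None
    # Most-followed playlists first
    rows.sort(key=lambda row: row["followers"], reverse=True)
    return rows[:n]
```

Passing the result to `pd.DataFrame(...)` would give a dataframe with the same columns as the table above. Note that this makes one extra request per playlist, which can be slow for users with many playlists.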
Extracting Tracks From a Playlist
The following script collects song and artist data from any Spotify playlist, given its URI. To analyze a particular playlist, copy the URI from the Spotify Player interface and pass it to the function defined below. The get_playlist_tracks function returns the complete list of track items in the selected playlist, each including the track's ID and its artists.
def get_playlist_tracks(playlist_URI):
    # Fetch the first page of playlist items
    results = sp.playlist_tracks(playlist_URI)
    tracks = results["items"]
    # Follow the "next" links until every page has been collected
    while results["next"]:
        results = sp.next(results)
        tracks.extend(results["items"])
    return tracks
Extracting Features from Tracks
The following script uses Spotify’s API to extract further details about each song in the playlist. It obtains metadata such as the track name, artist, album, and release date, along with track features such as danceability, tempo, and popularity.
def playlist_features(id, artist_id, playlist_id):
    # One request each for the track, its audio features, the artist, and the playlist
    meta = sp.track(id)
    audio_features = sp.audio_features(id)
    artist_info = sp.artist(artist_id)
    playlist_info = sp.playlist(playlist_id)

    # Skip tracks for which Spotify returns no audio features
    if audio_features[0] is None:
        return None

    # Track and album metadata
    name = meta['name']
    track_id = meta['id']
    album = meta['album']['name']
    artist = meta['album']['artists'][0]['name']
    artist_id = meta['album']['artists'][0]['id']
    release_date = meta['album']['release_date']
    length = meta['duration_ms']
    popularity = meta['popularity']

    # Artist-level details
    artist_pop = artist_info["popularity"]
    artist_genres = artist_info["genres"]
    artist_followers = artist_info["followers"]['total']

    # Audio features
    acousticness = audio_features[0]['acousticness']
    danceability = audio_features[0]['danceability']
    energy = audio_features[0]['energy']
    instrumentalness = audio_features[0]['instrumentalness']
    liveness = audio_features[0]['liveness']
    loudness = audio_features[0]['loudness']
    speechiness = audio_features[0]['speechiness']
    tempo = audio_features[0]['tempo']
    valence = audio_features[0]['valence']
    key = audio_features[0]['key']
    mode = audio_features[0]['mode']
    time_signature = audio_features[0]['time_signature']

    playlist_name = playlist_info['name']
    return [name, track_id, album, artist, artist_id, release_date, length, popularity,
            artist_pop, artist_genres, artist_followers, acousticness, danceability,
            energy, instrumentalness, liveness, loudness, speechiness,
            tempo, valence, key, mode, time_signature, playlist_name]
Choose a playlist to analyze, for example by copying its URL from the Spotify Player interface. Given the playlist's ID or link, the get_playlist_tracks function retrieves the ID and artists of each track in the playlist. Here, we analyze Spotify’s Today’s Top Hits playlist, the first row of top_playlists.
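The commented-out line in the loop below hints at how a copied link can be reduced to a bare playlist ID. As a minimal sketch, the hypothetical helper `playlist_id_from_link` (not part of the original article) handles the three forms a reader is likely to paste: a web URL, a `spotify:playlist:` URI, or an ID.

```python
from urllib.parse import urlparse


def playlist_id_from_link(link):
    """Return the bare playlist ID from a URL, a Spotify URI, or an ID."""
    link = link.strip()
    if link.startswith("spotify:"):
        # URI form: spotify:playlist:<id>
        return link.split(":")[-1]
    if "://" in link:
        # URL form: keep the last path segment; urlparse separates the ?query part
        return urlparse(link).path.rstrip("/").split("/")[-1]
    # Anything else is assumed to already be a bare ID
    return link
```

For example, a link ending in `/playlist/37i9dQZF1DXcBWIGoYBM5M?si=...` would be reduced to `37i9dQZF1DXcBWIGoYBM5M`, the ID shown for Today’s Top Hits in the table earlier.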
# Take the first row by position (the most-followed playlist)
playlist_links = [top_playlists['id'].iloc[0]]
for playlist_URI in playlist_links:
    # playlist_URI = link.split("/")[-1].split("?")[0]
    all_tracks = [  # Loop over track ids
        playlist_features(i["track"]["id"], i["track"]["artists"][0]["uri"], playlist_URI)
        for i in get_playlist_tracks(playlist_URI)
    ]
Putting it all together, the get_playlist_tracks function retrieves basic details for each song in a specified Spotify playlist using its URI. The playlist_features function is then called with each track's ID to extract additional information, such as danceability, energy, loudness, speechiness, acousticness, instrumentalness, liveness, valence, tempo, and more. From there, we create a pandas dataframe from the extracted information.
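The article does not show the call that builds the dataframe, so the following is only a sketch of one way to prepare the rows. It drops the `None` entries that playlist_features returns for tracks without audio features and labels each value with a column name. The helper `rows_to_records` is hypothetical. The column names follow the table shown later; those hidden behind the `...` there (artist_followers through instrumentalness) are assumptions taken from the variable names in playlist_features.

```python
# 24 names, in the same order as the list returned by playlist_features
COLUMNS = [
    "name", "track_id", "album", "artist", "artist_id", "release_date",
    "length", "popularity", "artist_pop", "artist_genres", "artist_followers",
    "acousticness", "danceability", "energy", "instrumentalness", "liveness",
    "loudness", "speechiness", "tempo", "valence", "key", "mode",
    "time_signature", "playlist",
]


def rows_to_records(all_tracks, columns=COLUMNS):
    """Drop skipped tracks (None) and pair each remaining value with its column."""
    records = []
    for row in all_tracks:
        if row is None:
            # playlist_features returns None when audio features are missing
            continue
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(row)}")
        records.append(dict(zip(columns, row)))
    return records
```

With pandas imported as pd, `df = pd.DataFrame(rows_to_records(all_tracks))` would then yield one row per track with named columns.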
Loudness Scaled
# Loudness Scaled
from sklearn import preprocessing
scaler = preprocessing.MinMaxScaler()
# scale loudness to fit the same range [0, 1]
loudness2 = df["loudness"].values
loudness_scaled = scaler.fit_transform(loudness2.reshape(-1, 1))
df['loudness_scaled'] = loudness_scaled
| | name | track_id | album | artist | artist_id | release_date | length | popularity | artist_pop | artist_genres | ... | liveness | loudness | speechiness | tempo | valence | key | mode | time_signature | playlist | loudness_scaled |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Please Please Please | 5N3hjp1WNayUPZrA8kJmJP | Please Please Please | Sabrina Carpenter | 74KM79TiuVKeVCqs8QtB0B | 2024-06-06 | 186365 | 98 | 91 | [pop] | ... | 0.1040 | -6.073 | 0.0540 | 107.071 | 0.579 | 9 | 1 | 4 | Today’s Top Hits | 0.575663 |
| 1 | Si Antes Te Hubiera Conocido | 6WatFBLVB0x077xWeoVc2k | Si Antes Te Hubiera Conocido | KAROL G | 790FomKkXshlbRYZFtlgla | 2024-06-21 | 195824 | 91 | 89 | [reggaeton, reggaeton colombiano, trap latino,... | ... | 0.0678 | -6.795 | 0.0469 | 128.027 | 0.787 | 11 | 1 | 4 | Today’s Top Hits | 0.495503 |
| 2 | BIRDS OF A FEATHER | 6dOtVTDdiauQNBQEDOtlAB | HIT ME HARD AND SOFT | Billie Eilish | 6qqNVTkY8uBg9cP3Jd7DAH | 2024-05-17 | 210373 | 98 | 94 | [art pop, pop] | ... | 0.1170 | -10.171 | 0.0358 | 104.978 | 0.438 | 2 | 1 | 4 | Today’s Top Hits | 0.120684 |
| 3 | Good Luck, Babe! | 0WbMK4wrZ1wFSty9F7FCgu | Good Luck, Babe! | Chappell Roan | 7GlBOeep6PqTfFi59PTUUN | 2024-04-05 | 218423 | 94 | 86 | [indie pop, pov: indie] | ... | 0.0881 | -5.960 | 0.0356 | 116.712 | 0.785 | 11 | 0 | 4 | Today’s Top Hits | 0.588209 |
| 4 | A Bar Song (Tipsy) | 2FQrifJ1N335Ljm3TjTVVf | A Bar Song (Tipsy) | Shaboozey | 3y2cIKLjiOlp1Np37WiUdH | 2024-04-12 | 171291 | 93 | 81 | [pop rap] | ... | 0.0804 | -4.950 | 0.0273 | 81.012 | 0.604 | 9 | 1 | 4 | Today’s Top Hits | 0.700344 |
5 rows × 25 columns
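Loudness is reported in decibels (negative values in the table above), while features such as liveness and valence already lie between 0 and 1. With its default settings, MinMaxScaler maps each value x to (x - min) / (max - min), using the column's own minimum and maximum. The plain-Python sketch below illustrates that formula; `min_max_scale` is a hypothetical helper, not part of the original code.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input has no spread; map every value to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Three illustrative loudness values in dB (not taken from the playlist)
print(min_max_scale([-10.0, -5.0, 0.0]))  # [0.0, 0.5, 1.0]
```

Because the minimum and maximum come from the data being scaled, loudness_scaled is relative to the tracks in this playlist: the quietest track gets 0 and the loudest gets 1.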