Extracting Spotify Data

To get started, we extract data for the tracks in one of Spotify’s top featured playlists. Through the Spotify Web API, we can obtain detailed data for each song, such as the performing artist, the album it belongs to, its release date, its popularity, and audio features like danceability, energy, and tempo.

Python libraries like spotipy provide a user-friendly way to interact with the Spotify API, with functions that streamline tasks such as API authentication, retrieving playlist data, and obtaining information about any given song.

Accessing the Spotify Web API

To access data from Spotify, we import the spotipy library and the SpotifyClientCredentials class from spotipy.oauth2. We also use the pandas package for data manipulation and display. To authenticate with the Spotify API, we pass our client ID and client secret to a client credentials manager. Once authenticated, we can use the spotipy client to query the API and retrieve data.

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd

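# Placeholders: replace with the client ID and secret of your own Spotify app
client_id = "xxx"
client_secret = "xxx"
# The client credentials flow authenticates the app itself, without a user login
my_auth = SpotifyClientCredentials(client_id, client_secret)
sp = spotipy.Spotify(auth_manager=my_auth)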
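
Hard-coding the client ID and secret is convenient for a demo but easy to leak when a notebook is shared. A minimal sketch, assuming the credentials are stored in environment variables: SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET are the names spotipy itself looks up when no arguments are passed, while load_credentials is a hypothetical helper introduced here for illustration.

```python
import os


def load_credentials(env=os.environ):
    """Read the Spotify client ID and secret from environment variables.

    load_credentials is a hypothetical helper, not part of spotipy.
    """
    client_id = env.get("SPOTIPY_CLIENT_ID")
    client_secret = env.get("SPOTIPY_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET first")
    return client_id, client_secret


# Usage with the authentication code above (requires real credentials):
# client_id, client_secret = load_credentials()
# my_auth = SpotifyClientCredentials(client_id, client_secret)
```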

Extracting Tracks From a Playlist

The following function collects song and artist data from any Spotify playlist through its URI. To analyze a particular playlist, copy its URI from the Spotify Player interface and pass it to the function defined below. The get_playlist_tracks function returns the complete list of playlist items, following pagination until no pages remain; each item contains the track's ID and its artists.

def get_playlist_tracks(playlist_URI):
    # Fetch the first page of playlist items
    results = sp.playlist_tracks(playlist_URI)
    tracks = results["items"]
    # Follow the "next" links until every page has been collected
    while results["next"]:
        results = sp.next(results)
        tracks.extend(results["items"])
    return tracks
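
Each element returned by get_playlist_tracks is a playlist item: a dictionary whose "track" entry holds the track object. The sketch below shows one way to pull the track ID and the first artist URI out of such items. Here extract_ids is a hypothetical helper, the sample dictionaries are made up and trimmed to the fields used, and skipping items with no track object or no track ID is an assumption about how to handle unavailable tracks.

```python
def extract_ids(items):
    """Return (track_id, artist_uri) pairs from playlist items."""
    pairs = []
    for item in items:
        track = item.get("track")
        # Assumption: skip entries with no track object or no track ID
        if not track or not track.get("id"):
            continue
        pairs.append((track["id"], track["artists"][0]["uri"]))
    return pairs


# Made-up sample items, trimmed to the fields used above
sample_items = [
    {"track": {"id": "track123", "artists": [{"uri": "spotify:artist:artist456"}]}},
    {"track": None},
]
print(extract_ids(sample_items))
```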

Extracting Features from Tracks

The following function uses Spotify’s API to extract further details about each song in the playlist. It collects metadata such as the track name, artist, album, release date, and popularity, along with audio features such as danceability and tempo. Tracks for which no audio features are available are skipped by returning None.

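def playlist_features(track_id, artist_id, playlist_id):
    # Fetch track metadata, audio features, artist details, and playlist details
    meta = sp.track(track_id)
    audio_features = sp.audio_features(track_id)
    artist_info = sp.artist(artist_id)
    playlist_info = sp.playlist(playlist_id)

    # Skip tracks for which Spotify returns no audio features
    if not audio_features or audio_features[0] is None:
        return None
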
    name = meta['name']
    track_id = meta['id']
    album = meta['album']['name']
    # Use the track's first artist so these fields match artist_info below
    artist = meta['artists'][0]['name']
    artist_id = meta['artists'][0]['id']
    release_date = meta['album']['release_date']
    length = meta['duration_ms']
    popularity = meta['popularity']

    artist_pop = artist_info["popularity"]
    artist_genres = artist_info["genres"]
    artist_followers = artist_info["followers"]['total']

    acousticness = audio_features[0]['acousticness']
    danceability = audio_features[0]['danceability']
    energy = audio_features[0]['energy']
    instrumentalness = audio_features[0]['instrumentalness']
    liveness = audio_features[0]['liveness']
    loudness = audio_features[0]['loudness']
    speechiness = audio_features[0]['speechiness']
    tempo = audio_features[0]['tempo']
    valence = audio_features[0]['valence']
    key = audio_features[0]['key']
    mode = audio_features[0]['mode']
    time_signature = audio_features[0]['time_signature']
    
    playlist_name = playlist_info['name']

    return [name, track_id, album, artist, artist_id, release_date, length, popularity, 
            artist_pop, artist_genres, artist_followers, acousticness, danceability, 
            energy, instrumentalness, liveness, loudness, speechiness, 
            tempo, valence, key, mode, time_signature, playlist_name]
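
Because playlist_features issues several requests per track, long playlists can be slow to process. One possible optimization, sketched here under the assumption that the client passed in is the authenticated sp object from above: sp.audio_features also accepts a list of track IDs (the endpoint takes up to 100 per request), so the IDs can be sent in batches. The names chunked and batched_audio_features are hypothetical helpers.

```python
def chunked(seq, size=100):
    """Yield consecutive slices of seq with at most `size` elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def batched_audio_features(client, track_ids):
    """Fetch audio features in batches instead of one request per track.

    `client` is assumed to be an authenticated spotipy.Spotify instance.
    """
    features = []
    for batch in chunked(track_ids, 100):
        features.extend(client.audio_features(batch))
    return features
```

With this approach the result list lines up with track_ids by position, so entries that come back as None can still be matched to their tracks and skipped.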

Choose a specific playlist to analyze by copying its link from the Spotify Player interface. Given that link or the playlist ID, the get_playlist_tracks function retrieves the playlist's items, each of which carries the track ID and its artists. Specifically, we analyze Spotify’s Today’s Top Hits playlist.

playlist_links = [top_playlists['id'][0]]

all_tracks = []
for playlist_URI in playlist_links:
    # If starting from a share link instead of an ID, extract the ID first:
    # playlist_URI = link.split("/")[-1].split("?")[0]
    all_tracks.extend(
        playlist_features(i["track"]["id"], i["track"]["artists"][0]["uri"], playlist_URI)
        for i in get_playlist_tracks(playlist_URI)  # loop over playlist items
    )

Putting it all together, the get_playlist_tracks function retrieves the items of a specified Spotify playlist using its URI. The playlist_features function is then called once per track, using the track ID to extract additional information such as danceability, energy, loudness, speechiness, acousticness, instrumentalness, liveness, valence, tempo, and more. From there, we create a pandas DataFrame from the extracted rows.
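
The DataFrame construction itself is not shown in the code above. A minimal sketch, assuming all_tracks is the list built in the loop and may contain None for skipped tracks: the column names follow the order of the list returned by playlist_features, with the last column labelled playlist to match the output shown further down. The sample row is a made-up placeholder, not real playlist data.

```python
import pandas as pd

columns = [
    "name", "track_id", "album", "artist", "artist_id", "release_date",
    "length", "popularity", "artist_pop", "artist_genres", "artist_followers",
    "acousticness", "danceability", "energy", "instrumentalness", "liveness",
    "loudness", "speechiness", "tempo", "valence", "key", "mode",
    "time_signature", "playlist",
]

# Placeholder data standing in for the real all_tracks list
sample_row = ["Song", "id1", "Album", "Artist", "a1", "2024-01-01",
              200000, 50, 60, ["pop"], 1000,
              0.1, 0.5, 0.6, 0.0, 0.1,
              -6.0, 0.05, 120.0, 0.5, 9, 1,
              4, "Example Playlist"]
all_tracks = [sample_row, None]

# Drop tracks that playlist_features skipped, then build the DataFrame
rows = [row for row in all_tracks if row is not None]
df = pd.DataFrame(rows, columns=columns)
print(df.shape)
```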

Loudness Scaled
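from sklearn import preprocessing

# Loudness is measured in dB; scale it into the same [0, 1] range
# used by features such as danceability and energy
scaler = preprocessing.MinMaxScaler()
loudness2 = df["loudness"].values
# MinMaxScaler expects a 2D array, so reshape to a single column
loudness_scaled = scaler.fit_transform(loudness2.reshape(-1, 1))
# Flatten back to 1D before storing the result as a new column
df['loudness_scaled'] = loudness_scaled.ravel()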

| | name | track_id | album | artist | artist_id | release_date | length | popularity | artist_pop | artist_genres | ... | liveness | loudness | speechiness | tempo | valence | key | mode | time_signature | playlist | loudness_scaled |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Please Please Please | 5N3hjp1WNayUPZrA8kJmJP | Please Please Please | Sabrina Carpenter | 74KM79TiuVKeVCqs8QtB0B | 2024-06-06 | 186365 | 98 | 91 | [pop] | ... | 0.1040 | -6.073 | 0.0540 | 107.071 | 0.579 | 9 | 1 | 4 | Today’s Top Hits | 0.575663 |
| 1 | Si Antes Te Hubiera Conocido | 6WatFBLVB0x077xWeoVc2k | Si Antes Te Hubiera Conocido | KAROL G | 790FomKkXshlbRYZFtlgla | 2024-06-21 | 195824 | 91 | 89 | [reggaeton, reggaeton colombiano, trap latino,... | ... | 0.0678 | -6.795 | 0.0469 | 128.027 | 0.787 | 11 | 1 | 4 | Today’s Top Hits | 0.495503 |
| 2 | BIRDS OF A FEATHER | 6dOtVTDdiauQNBQEDOtlAB | HIT ME HARD AND SOFT | Billie Eilish | 6qqNVTkY8uBg9cP3Jd7DAH | 2024-05-17 | 210373 | 98 | 94 | [art pop, pop] | ... | 0.1170 | -10.171 | 0.0358 | 104.978 | 0.438 | 2 | 1 | 4 | Today’s Top Hits | 0.120684 |
| 3 | Good Luck, Babe! | 0WbMK4wrZ1wFSty9F7FCgu | Good Luck, Babe! | Chappell Roan | 7GlBOeep6PqTfFi59PTUUN | 2024-04-05 | 218423 | 94 | 86 | [indie pop, pov: indie] | ... | 0.0881 | -5.960 | 0.0356 | 116.712 | 0.785 | 11 | 0 | 4 | Today’s Top Hits | 0.588209 |
| 4 | A Bar Song (Tipsy) | 2FQrifJ1N335Ljm3TjTVVf | A Bar Song (Tipsy) | Shaboozey | 3y2cIKLjiOlp1Np37WiUdH | 2024-04-12 | 171291 | 93 | 81 | [pop rap] | ... | 0.0804 | -4.950 | 0.0273 | 81.012 | 0.604 | 9 | 1 | 4 | Today’s Top Hits | 0.700344 |

5 rows × 25 columns
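
With its default range, MinMaxScaler maps each value x to (x - min) / (max - min), so the quietest track in the playlist gets 0 and the loudest gets 1. As a small worked example with made-up loudness values (not taken from the playlist), the same scaling in plain Python might look like this; min_max_scale is a hypothetical helper.

```python
def min_max_scale(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: map a constant column to zeros to avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


loudness = [-12.0, -6.0, -3.0]  # made-up values in dB
print(min_max_scale(loudness))
```

Because the result depends on the minimum and maximum of the data at hand, scaled loudness values are only comparable within the same playlist unless the scaler is fitted once and reused.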