Genre Classification Model

Objective. Construct a classification model that can accurately predict the genre of a song using a dataset containing Spotify track information, including artist details and genres. The model will utilize the audio features and potentially the lyrics of each track to make its predictions. Additionally, it will be trained using the artist genre list as well as a separate dataset of tracks that have been labeled with their respective genres.

Constructing a classification model to predict the genre of a song involves several steps, including data preprocessing, feature engineering, model selection, training, and evaluation. Here’s a step-by-step guide to achieve this:

genres_v2['genre'].value_counts()
pop         142
acoustic    139
emo         139
chill       139
grunge      134
punk        134
romance     133
rock        132
sad         132
happy       132
piano       131
hip-hop     130
indie       129
dance       127
techno       95
r-n-b        91
edm          84
Name: genre, dtype: int64

Step 1: Data Collection

Gather the necessary datasets:
- Spotify API: Use the Spotify API to collect track information, including audio features (e.g., danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo).
- Artist Genre Data: Gather the genres associated with each artist from Spotify’s API.
- Lyrics Data: Include the lyrics of the tracks (optional, but can enhance the model’s performance).
- Genre Labels: Obtain a labeled dataset of tracks with their respective genres for training.

We want to predict a song’s genre from these audio features. Spotify provides “genre seeds”: an array of genres used by the recommendation function of the API. For each genre seed, we request recommended tracks from the API, pull the audio features for each track, and attach the seed as the genre label.

popularity acousticness danceability energy instrumentalness liveness loudness speechiness tempo valence key mode time_signature
count 2143.000000 2143.000000 2143.000000 2143.000000 2143.000000 2143.000000 2143.000000 2143.000000 2143.000000 2143.000000 2143.000000 2143.000000 2143.000000
mean 28.085394 0.229999 0.559915 0.654700 0.114476 0.188180 -7.523677 0.078464 122.833333 0.438788 5.211386 0.645824 3.929071
std 27.611174 0.312395 0.165951 0.245871 0.265948 0.151159 4.443536 0.078789 28.301679 0.231271 3.584855 0.478375 0.377482
min 0.000000 0.000001 0.062100 0.001500 0.000000 0.021500 -41.446000 0.022600 42.646000 0.027500 0.000000 0.000000 1.000000
25% 0.000000 0.005250 0.446000 0.488500 0.000000 0.095900 -8.931000 0.035500 100.304500 0.255000 2.000000 0.000000 4.000000
50% 26.000000 0.058200 0.559000 0.710000 0.000044 0.126000 -6.340000 0.048200 123.279000 0.415000 5.000000 1.000000 4.000000
75% 53.000000 0.362000 0.678500 0.856000 0.012050 0.234000 -4.699500 0.083600 139.987000 0.610000 8.000000 1.000000 4.000000
max 85.000000 0.996000 0.969000 0.999000 0.978000 0.988000 -1.264000 0.578000 214.008000 0.980000 11.000000 1.000000 5.000000

Step 2: Data Preprocessing

Clean and preprocess the data to prepare it for model training.
- Handle Missing Values: Remove or impute missing data.
- Normalize/Standardize Features: Normalize or standardize the audio features so they are on a similar scale.
- Text Preprocessing for Lyrics: Tokenize, remove stopwords, and potentially use techniques like TF-IDF or word embeddings for the lyrics.

# Normalize audio features
from sklearn.preprocessing import StandardScaler

# Define audio features
audio_features = ['danceability', 'energy', 'key', 'loudness', 'speechiness', 'acousticness',
                  'instrumentalness', 'liveness', 'valence', 'tempo']

# Fit and transform audio features
scaler = StandardScaler()
df[audio_features] = scaler.fit_transform(df[audio_features])
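
The snippet above covers standardization only. The missing-value bullet could be handled along the following lines. This is a minimal sketch that assumes median imputation is acceptable for these features; the `toy` DataFrame and its values are made up for illustration and are not taken from the Spotify data.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with gaps; not taken from the Spotify dataset
toy = pd.DataFrame({
    "danceability": [0.50, np.nan, 0.70, 0.60],
    "energy":       [0.80, 0.40, np.nan, 0.60],
    "tempo":        [120.0, 90.0, 150.0, np.nan],
})

# Median imputation followed by standardization, chained so that both
# steps are fitted together on the same rows
preprocess = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
toy_scaled = preprocess.fit_transform(toy)

print(np.isnan(toy_scaled).any())        # checks that no missing values remain
print(toy_scaled.mean(axis=0).round(6))  # column means after standardization
```

Fitting such a pipeline on the training split only, and calling `transform` on the test split, would keep test-set statistics out of the imputer and scaler.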

Step 3: Feature Engineering

Create features from the available data.
- Audio Features: Use the provided audio features.
- Artist Genre Encoding: Encode categorical features (i.e., genres) using one-hot encoding or other suitable methods.
- Lyrics Features: Convert lyrics into numerical features using techniques like TF-IDF, Word2Vec, or BERT embeddings.
- Extract additional features from lyrics (e.g., sentiment analysis, topic modeling).
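
The artist-genre and lyrics bullets are not implemented in the code that follows, which uses audio features only. As a rough sketch of what they might look like, the example below multi-hot encodes genre lists with `MultiLabelBinarizer` (an assumption; each artist can carry several genres, so a plain one-hot encoding does not fit directly) and turns lyrics into TF-IDF vectors. The genre lists and lyric strings are invented for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical examples standing in for the artist_genres column and a lyrics column
artist_genres = [["pop", "dance pop"], ["grunge", "rock"], ["pop"]]
lyrics = [
    "dancing all night under the lights",
    "rain on the empty street tonight",
    "love you under the city lights",
]

# Multi-hot encode the genre lists: one column per distinct genre
mlb = MultiLabelBinarizer()
genre_matrix = mlb.fit_transform(artist_genres)

# TF-IDF over the lyrics, dropping English stop words
tfidf = TfidfVectorizer(stop_words="english")
lyrics_matrix = tfidf.fit_transform(lyrics)

# Stack both feature groups column-wise into one dense matrix
features = np.hstack([genre_matrix, lyrics_matrix.toarray()])

print(list(mlb.classes_))
print(features.shape)
```

For a large lyrics corpus, keeping the TF-IDF output sparse (for example with `scipy.sparse.hstack`) would likely be preferable to calling `toarray()`.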

Label Encoder

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()  # Encode the target variable
df['genre_encoded'] = le.fit_transform(df['genre'])

Combine and arrange the data to create a final dataset for training the model.

X = df[['danceability', 'energy', 'key', 'loudness', 'speechiness',
        'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo']]
y = df['genre_encoded']  # 1-D Series; scikit-learn estimators expect a 1-D target

Step 4: Model Selection and Training

Choose an appropriate classification algorithm and train the model.
- Algorithms: Consider using algorithms like Random Forest, Gradient Boosting, Support Vector Machine (SVM), or Neural Networks.
- Cross-Validation: Use cross-validation to tune hyperparameters and avoid overfitting.

Model Training

The following code splits the input variables and the target variable at random into training and testing subsets. Because test_size is not set, scikit-learn’s default applies: 75% of the rows go to training and 25% to testing.

from sklearn.model_selection import train_test_split

# Split the data (test_size is not passed, so the default of 0.25 is used)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
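
The split above is purely random. Since some genres have fewer tracks than others, a stratified split may be worth considering; it keeps the class proportions the same in both subsets. The sketch below shows the `stratify` argument on made-up labels (`X_toy` and `y_toy` are hypothetical).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 samples of class 0, 20 of class 1
y_toy = np.array([0] * 80 + [1] * 20)
X_toy = np.arange(100).reshape(-1, 1)

# stratify=y_toy preserves the 80/20 class ratio in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, stratify=y_toy, random_state=42
)

print(len(y_tr), len(y_te))
print(y_tr.mean(), y_te.mean())  # share of class 1 in each subset
```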

Model Fitting

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier
from sklearn.metrics import classification_report, accuracy_score, roc_auc_score
from prettytable import PrettyTable
# Define models
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000, C=0.5, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, max_depth=7, min_samples_split=5, random_state=42),
    "SVM": SVC(probability=True, random_state=42),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(max_depth=7, min_samples_split=5, random_state=42),
    "Gradient Boost": GradientBoostingClassifier(n_estimators=100, learning_rate=0.05, random_state=42),
    "XGB": XGBClassifier(n_estimators=300, random_state=42)
}
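
The models above use fixed hyperparameters. The cross-validation bullet in Step 4 could be realized with a grid search, sketched below on synthetic data generated by `make_classification` as a stand-in for the audio features. The grid values are illustrative assumptions, not tuned settings from this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the audio features (10 columns, 3 classes)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=6,
    n_classes=3, random_state=42,
)

# Small illustrative grid; these values are assumptions, not tuned settings
param_grid = {"max_depth": [3, 7], "n_estimators": [50, 100]}

# 5-fold stratified cross-validation over every grid combination
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    scoring="accuracy",
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Running the search on the training split only leaves the test split untouched for the final evaluation.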

Step 6: Model Evaluation

Evaluate the model’s performance using appropriate metrics.
- Metrics: Use accuracy, precision, recall, F1-score, and confusion matrix to assess the model.
- Validation Set: Use a separate validation set to test the model’s generalization ability.

We compare the models on several criteria: Accuracy, ROC-AUC, Precision, Recall, and F1-Score. The loop below fits each model on the training set and records these metrics on the test set; Precision, Recall, and F1-Score are weighted averages across genres, and ROC-AUC is computed one-vs-rest.

# Initialize an empty list to store the results
results = []

# Train models and evaluate
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test) if hasattr(model, "predict_proba") else None

    report = classification_report(y_test, y_pred, output_dict=True, zero_division=0)

    results.append({
        "Model": name, 
        "Accuracy": accuracy_score(y_test, y_pred), 
        "ROC-AUC": roc_auc_score(y_test, y_prob, multi_class='ovr') if y_prob is not None else None,
        "Precision": report['weighted avg']['precision'], 
        "Recall": report['weighted avg']['recall'], 
        "F1-Score": report['weighted avg']['f1-score']
    })

# Show the collected metrics as a table
pd.DataFrame(results)
   Model                Accuracy  ROC-AUC   Precision  Recall    F1-Score
0  Logistic Regression  0.292910  0.823484  0.322340   0.292910  0.275275
1  Random Forest        0.266791  0.826900  0.257971   0.266791  0.237800
2  SVM                  0.270522  0.816350  0.275629   0.270522  0.252128
3  Neural Network       0.268657  0.815905  0.270748   0.268657  0.258513
4  KNN                  0.203358  0.662278  0.201823   0.203358  0.191504
5  Decision Tree        0.238806  0.729930  0.262212   0.238806  0.223554
6  Gradient Boost       0.270522  0.808805  0.269965   0.270522  0.263241
7  XGB                  0.248134  0.778676  0.249968   0.248134  0.241814
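
The metrics list also mentions a confusion matrix, which the loop above does not compute. A minimal sketch on made-up labels shows how to read one; with the real objects, `confusion_matrix(y_test, y_pred)` would show which genres are mistaken for one another.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted genre labels for six tracks
y_true_toy = ["pop", "pop", "rock", "rock", "sad", "sad"]
y_pred_toy = ["pop", "rock", "rock", "rock", "pop", "sad"]
labels = ["pop", "rock", "sad"]

# Rows are true genres, columns are predicted genres, both in `labels` order
cm = confusion_matrix(y_true_toy, y_pred_toy, labels=labels)
print(cm)
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]
```

Here the first row says that one "pop" track was classified correctly and one was mistaken for "rock".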

Conclusion

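Considering all metrics, no model stands out clearly, and the absolute scores are low: accuracy ranges from about 0.20 (KNN) to 0.29 (Logistic Regression) across 17 genres. Logistic Regression has the highest accuracy (0.293), precision (0.322), recall (0.293), and F1-Score (0.275), while Random Forest has the highest ROC-AUC (0.827), with Logistic Regression close behind (0.823). SVM, Neural Network, and Gradient Boost follow with accuracies near 0.27.

These results suggest that Logistic Regression and Random Forest are the most effective models for predicting the genre of a song from the given dataset, with KNN and Decision Tree the weakest. Random Forest, the top model by ROC-AUC, is the one saved and applied in the next step.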
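
To put the accuracies in the results table in context, it helps to compare them with chance-level baselines. The short calculation below uses the genre counts from the `value_counts()` output near the top of the document; note that it is computed on the full dataset, so the majority-class rate on the test split may differ slightly.

```python
# Genre counts copied from the value_counts() output shown earlier
genre_counts = {
    "pop": 142, "acoustic": 139, "emo": 139, "chill": 139, "grunge": 134,
    "punk": 134, "romance": 133, "rock": 132, "sad": 132, "happy": 132,
    "piano": 131, "hip-hop": 130, "indie": 129, "dance": 127,
    "techno": 95, "r-n-b": 91, "edm": 84,
}

total = sum(genre_counts.values())
uniform_baseline = 1 / len(genre_counts)                # guess a genre uniformly at random
majority_baseline = max(genre_counts.values()) / total  # always predict the largest class

print(total)                        # 2143
print(round(uniform_baseline, 3))   # 0.059
print(round(majority_baseline, 3))  # 0.066
```

Against these baselines of roughly 6 to 7%, an accuracy in the 0.20 to 0.29 range is several times better than guessing, even though it is far from reliable.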


Step 7: Model Deployment

Deploy the model for practical use.
- Save the Model: Save the trained model using libraries like joblib or pickle.
- API Creation: Create an API using Flask or FastAPI to make predictions on new data.

import joblib

# Train the Random Forest model
random_forest_model = RandomForestClassifier(n_estimators=100, max_depth=7, min_samples_split=5, random_state=42)
random_forest_model.fit(X_train, y_train)

# Save scaler + model for future use
joblib.dump(scaler, 'scaler.pkl')
joblib.dump(random_forest_model, 'random_forest_model.pkl')
['random_forest_model.pkl']
# Evaluate the model
y_pred = random_forest_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Accuracy: 0.2667910447761194
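
The code above saves the scaler and the model but not the label encoder `le`, which is needed later to turn predicted class indices back into genre names. One possible arrangement, sketched here on made-up data, is to wrap the scaler and classifier in a single pipeline and store it together with the encoder. The file name `genre_bundle.joblib` and all `*_toy` names are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical toy data: two features, three genres
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(60, 2))
genres_toy = np.array(["pop", "rock", "sad"] * 20)

le_toy = LabelEncoder()
y_toy = le_toy.fit_transform(genres_toy)

# Scaler and classifier travel together, so new data is scaled consistently
pipe = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=20, random_state=42),
)
pipe.fit(X_toy, y_toy)

# Save the pipeline and the label encoder in one file (hypothetical file name)
path = os.path.join(tempfile.mkdtemp(), "genre_bundle.joblib")
joblib.dump({"pipeline": pipe, "label_encoder": le_toy}, path)

# Reload and decode predictions back to genre names
bundle = joblib.load(path)
decoded = bundle["label_encoder"].inverse_transform(
    bundle["pipeline"].predict(X_toy[:5])
)
print(decoded)
```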

Applying Model to New Data

# Load the trained models + scaler
random_forest_model = joblib.load('random_forest_model.pkl')
scaler = joblib.load('scaler.pkl')
# Load new data
df_new = pd.read_csv("../assets/data/all_tracks+lyrics.csv")

# Extract relevant columns
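# .copy() gives an independent DataFrame, so adding columns later does not
# trigger pandas' SettingWithCopyWarning
new_data = df_new[['name', 'danceability', 'energy', 'key', 'loudness', 'speechiness', 
                   'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo']].copy()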

# Preprocess new data
X_new_data = new_data.drop(columns=['name'])
X_new_data_scaled = scaler.transform(X_new_data)
# Make predictions
predictions = random_forest_model.predict(X_new_data_scaled)
probabilities = random_forest_model.predict_proba(X_new_data_scaled)

# Decode the predicted genre labels
predictions_label = le.inverse_transform(predictions)

# Display predictions
new_data['Predicted Genre'] = predictions
new_data['Predicted Genre Label'] = predictions_label
new_data['Prediction Probabilities'] = probabilities.tolist()

new_data[['name', 'Predicted Genre Label', 'Predicted Genre', 'Prediction Probabilities']]
name Predicted Genre Label Predicted Genre Prediction Probabilities
0 Please Please Please pop 10 [0.04088154652446443, 0.07714395919805248, 0.0...
1 Si Antes Te Hubiera Conocido pop 10 [0.052278933050807844, 0.06228149646432559, 0....
2 BIRDS OF A FEATHER dance 2 [0.07264102912485536, 0.10432647001182196, 0.1...
3 Good Luck, Babe! pop 10 [0.05870991975676431, 0.09131179743471457, 0.0...
4 A Bar Song (Tipsy) pop 10 [0.04773552198684028, 0.0836338792955852, 0.04...
5 Not Like Us hip-hop 7 [0.02365012599849557, 0.07161269506638383, 0.0...
6 MILLION DOLLAR BABY pop 10 [0.02922418578908816, 0.0647413931106139, 0.08...
7 Too Sweet pop 10 [0.025137350612849513, 0.07401057278171215, 0....
8 Beautiful Things romance 14 [0.05636255764656002, 0.06888807365798695, 0.0...
9 I Had Some Help (Feat. Morgan Wallen) happy 6 [0.0324812151873648, 0.07718756832982102, 0.09...
10 Espresso pop 10 [0.03976771088890484, 0.11091839764030585, 0.0...
11 i like the way you kiss me rock 13 [0.00560173930301027, 0.027934558669221385, 0....
12 Stargazing romance 14 [0.07844059571018996, 0.08715430776260266, 0.0...
13 LUNCH dance 2 [0.06268786025919988, 0.04074153271739373, 0.2...
14 End of Beginning dance 2 [0.058459964608319995, 0.045299973823992, 0.14...
15 we can't be friends (wait for your love) pop 10 [0.04767297734830211, 0.09013230830759586, 0.0...
16 Lose Control romance 14 [0.06999320818385321, 0.09762855667926634, 0.0...
17 Tough acoustic 0 [0.1487665952511997, 0.07956401979979427, 0.02...
18 Austin pop 10 [0.05794414166297137, 0.07773720145272671, 0.0...
19 I Can Do It With a Broken Heart pop 10 [0.04152820229550991, 0.07795210201071162, 0.0...
20 Houdini pop 10 [0.012428955551854977, 0.04178841837693284, 0....
21 Nasty pop 10 [0.07229835155549594, 0.07798138478900962, 0.0...
22 Belong Together pop 10 [0.1184642865957953, 0.08894661601186675, 0.04...
23 Slow It Down romance 14 [0.04562594460281343, 0.06860379702332979, 0.0...
24 HOT TO GO! pop 10 [0.018093243817939027, 0.07192985122486334, 0....
25 GIRLS pop 10 [0.030047792403698946, 0.07162699583368382, 0....
26 greedy pop 10 [0.0391825562304293, 0.07982381429220817, 0.04...
27 Move dance 2 [0.03153192004354079, 0.0632106471284883, 0.27...
28 Fortnight (feat. Post Malone) acoustic 0 [0.2133077064313661, 0.10533632362310526, 0.00...
29 Saturn sad 15 [0.15633979786968363, 0.09415369169525718, 0.0...
30 28 chill 1 [0.08481086165547902, 0.14493631982323849, 0.0...
31 Close To You pop 10 [0.043913102280624665, 0.07931487946844533, 0....
32 the boy is mine pop 10 [0.03326174503752976, 0.07819873548158905, 0.0...
33 Stick Season acoustic 0 [0.15164660763833773, 0.08639492608005701, 0.0...
34 I Don't Wanna Wait pop 10 [0.04034787361025403, 0.07919812766986792, 0.0...
35 Smeraldo Garden Marching Band (feat. Loco) pop 10 [0.012025901882804802, 0.0892535139898251, 0.0...
36 Stumblin' In dance 2 [0.03182662544837536, 0.06430270117639993, 0.1...
37 360 dance 2 [0.08284826354380695, 0.06311558351697714, 0.2...
38 Rockstar pop 10 [0.028486403608907106, 0.06298695375092218, 0....
39 One Of The Girls (with JENNIE, Lily Rose Depp) romance 14 [0.0467947393194539, 0.06589514531102753, 0.03...
40 Scared To Start acoustic 0 [0.1788457069830126, 0.10889137568133479, 0.01...
41 Lies Lies Lies romance 14 [0.06906881049759486, 0.12520424564543942, 0.0...
42 feelslikeimfallinginlove dance 2 [0.06058770720420721, 0.10143051335809605, 0.1...
43 Parking Lot pop 10 [0.04112327314489984, 0.08773701130340866, 0.0...
44 Gata Only pop 10 [0.08002179081376724, 0.080812180724709, 0.048...
45 BAND4BAND (feat. Lil Baby) hip-hop 7 [0.009146801699282305, 0.04043458967408511, 0....
46 Santa pop 10 [0.02718144861517665, 0.08773498147207358, 0.0...
47 Magnetic pop 10 [0.030559852949859363, 0.061780658803238436, 0...
48 Water pop 10 [0.025002878313200597, 0.06980209749484347, 0....
49 Illusion pop 10 [0.022670167757488244, 0.06973317734184772, 0....
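
The probability lists above are hard to read at a glance. A small sketch of how the top few genres per track could be extracted is shown below, using made-up class names and probabilities; with the real objects, `le.classes_` and `probabilities` would take the place of `classes` and `probs`.

```python
import numpy as np

# Hypothetical class names and predicted probabilities for two tracks
classes = np.array(["acoustic", "dance", "pop", "rock"])
probs = np.array([
    [0.10, 0.25, 0.45, 0.20],
    [0.40, 0.05, 0.35, 0.20],
])

k = 2
# argsort is ascending: take the last k columns, then reverse for descending order
top_idx = np.argsort(probs, axis=1)[:, -k:][:, ::-1]
top_genres = classes[top_idx]
top_probs = np.take_along_axis(probs, top_idx, axis=1)

for genres, p in zip(top_genres, top_probs):
    print(list(zip(genres.tolist(), p.tolist())))
```

For the first toy track this yields "pop" followed by "dance", and for the second "acoustic" followed by "pop".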

import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import warnings
warnings.simplefilter("ignore")

# import track_data
genres_v2 = pd.read_csv("../assets/data/genre_seeds.csv")

# Credentials are read from the SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET
# environment variables rather than being hard-coded in the source
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())


def track_features(id, artist_id, note):
    meta = sp.track(id)
    audio_features = sp.audio_features(id)
    artist_info = sp.artist(artist_id)

    if audio_features[0] is None:
        return None

    name = meta['name']
    track_id = meta['id']
    album = meta['album']['name']
    artist = meta['album']['artists'][0]['name']
    artist_id = meta['album']['artists'][0]['id']
    release_date = meta['album']['release_date']
    length = meta['duration_ms']
    popularity = meta['popularity']

    artist_pop = artist_info["popularity"]
    artist_genres = artist_info["genres"]
    artist_followers = artist_info["followers"]['total']

    acousticness = audio_features[0]['acousticness']
    danceability = audio_features[0]['danceability']
    energy = audio_features[0]['energy']
    instrumentalness = audio_features[0]['instrumentalness']
    liveness = audio_features[0]['liveness']
    loudness = audio_features[0]['loudness']
    speechiness = audio_features[0]['speechiness']
    tempo = audio_features[0]['tempo']
    valence = audio_features[0]['valence']
    key = audio_features[0]['key']
    mode = audio_features[0]['mode']
    time_signature = audio_features[0]['time_signature']

    return [name, track_id, album, artist, artist_id, release_date, length, popularity,
            artist_pop, artist_genres, artist_followers, acousticness, danceability,
            energy, instrumentalness, liveness, loudness, speechiness,
            tempo, valence, key, mode, time_signature, note]


# sp.recommendation_genre_seeds() "trip-hop", "trance"
genre_seeds = ["acoustic", "chill", "dance", "edm", "emo", "grunge", "happy", "hip-hop", "indie",
               "piano", "pop", "punk", "rock", "romance", "sad", "techno", "r-n-b"]

all_genre_seed_tracks = []

for genre in genre_seeds:
    genre_rec = sp.recommendations(seed_genres=[genre])['tracks']

    for song in genre_rec:
        song_id = song['id']
        song_artist_id = song['artists'][0]['id']
        song_audio = track_features(
            id=song_id, artist_id=song_artist_id, note=genre)
        # track_features returns None when no audio features are available
        if song_audio is not None:
            all_genre_seed_tracks.append(song_audio)


df = pd.DataFrame(all_genre_seed_tracks,
                  columns=['name', 'track_id', 'album', 'artist', 'artist_id', 'release_date', 'length', 'popularity',
                           'artist_pop', 'artist_genres', 'artist_followers', 'acousticness', 'danceability',
                           'energy', 'instrumentalness', 'liveness', 'loudness', 'speechiness',
                           'tempo', 'valence', 'key', 'mode', 'time_signature', 'genre'])

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df_add = pd.concat([df, genres_v2], ignore_index=True)
df_add = df_add.drop_duplicates(subset=['track_id', 'genre'])
#df_add = df_add.drop(columns=['Unnamed: 0.1', 'Unnamed: 0'])

df_add.to_csv("../assets/data/genre_seeds.csv", index=False)