Genre Classification Model

Objective. Construct a classification model that can accurately predict the genre of a song using a dataset containing Spotify track information, including artist details and genres. The model will utilize the audio features and potentially the lyrics of each track to make its predictions. Additionally, it will be trained using the artist genre list as well as a separate dataset of tracks that have been labeled with their respective genres.

Building the model involves data collection, preprocessing, feature engineering, model selection and training, evaluation, and deployment. The steps below walk through each in turn. The labeled dataset used throughout contains the following number of tracks per genre:

genres_v2['genre'].value_counts()

pop         142
acoustic    139
emo         139
chill       139
grunge      134
punk        134
romance     133
rock        132
sad         132
happy       132
piano       131
hip-hop     130
indie       129
dance       127
techno       95
r-n-b        91
edm          84
Name: genre, dtype: int64

Step 1: Data Collection

Gather the necessary datasets:
- Spotify API: Use the Spotify API to collect track information, including audio features (e.g., danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo).
- Artist Genre Data: Gather the genres associated with each artist from Spotify’s API.
- Lyrics Data: Collect the lyrics of the tracks (optional, but can enhance the model’s performance).
- Genre Labels: Obtain a labeled dataset of tracks with their respective genres for training.

We want to predict a song’s genre from these audio features. Spotify provides “genre seeds”: a list of genre names that the API’s recommendation function accepts as input. For each genre seed, we request recommended tracks from the API, pull the audio features for each track, and add the seed as the genre label. Summary statistics for the numeric columns of the resulting dataset are shown below.

	popularity	acousticness	danceability	energy	instrumentalness	liveness	loudness	speechiness	tempo	valence	key	mode	time_signature
count	2143.000000	2143.000000	2143.000000	2143.000000	2143.000000	2143.000000	2143.000000	2143.000000	2143.000000	2143.000000	2143.000000	2143.000000	2143.000000
mean	28.085394	0.229999	0.559915	0.654700	0.114476	0.188180	-7.523677	0.078464	122.833333	0.438788	5.211386	0.645824	3.929071
std	27.611174	0.312395	0.165951	0.245871	0.265948	0.151159	4.443536	0.078789	28.301679	0.231271	3.584855	0.478375	0.377482
min	0.000000	0.000001	0.062100	0.001500	0.000000	0.021500	-41.446000	0.022600	42.646000	0.027500	0.000000	0.000000	1.000000
25%	0.000000	0.005250	0.446000	0.488500	0.000000	0.095900	-8.931000	0.035500	100.304500	0.255000	2.000000	0.000000	4.000000
50%	26.000000	0.058200	0.559000	0.710000	0.000044	0.126000	-6.340000	0.048200	123.279000	0.415000	5.000000	1.000000	4.000000
75%	53.000000	0.362000	0.678500	0.856000	0.012050	0.234000	-4.699500	0.083600	139.987000	0.610000	8.000000	1.000000	4.000000
max	85.000000	0.996000	0.969000	0.999000	0.978000	0.988000	-1.264000	0.578000	214.008000	0.980000	11.000000	1.000000	5.000000

Step 2: Data Preprocessing

Clean and preprocess the data to prepare it for model training.
- Handle Missing Values: Remove or impute missing data.
- Normalize/Standardize Features: Normalize or standardize the audio features to ensure they are on a similar scale.
- Text Preprocessing for Lyrics: Tokenize, remove stopwords, and potentially use techniques like TF-IDF or word embeddings for the lyrics.

# Normalize audio features
from sklearn.preprocessing import StandardScaler

# Define audio features
audio_features = ['danceability', 'energy', 'key', 'loudness', 'speechiness', 'acousticness',
                  'instrumentalness', 'liveness', 'valence', 'tempo']

# Fit and transform audio features
scaler = StandardScaler()
df[audio_features] = scaler.fit_transform(df[audio_features])

Step 3: Feature Engineering

Create features from the available data.
- Audio Features: Use the provided audio features.
- Artist Genre Encoding: Encode categorical features (i.e., genres) using one-hot encoding or other suitable methods.
- Lyrics Features: Convert lyrics into numerical features using techniques like TF-IDF, Word2Vec, or BERT embeddings.
- Extract additional features from lyrics (e.g., sentiment analysis, topic modeling).
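
Because an artist can carry several genres at once, the artist genre list is a multi-label field rather than a single category. One possible encoding, sketched here on invented rows, uses scikit-learn's `MultiLabelBinarizer` to create one indicator column per genre. It assumes `artist_genres` holds Python lists; after a round trip through CSV the lists arrive as strings and would need parsing first.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Invented rows; artist_genres mimics the list returned for each artist
toy = pd.DataFrame({
    "name": ["track_a", "track_b", "track_c"],
    "artist_genres": [["pop", "dance pop"], ["grunge"], ["pop", "indie pop"]],
})

# One 0/1 column per distinct artist genre
mlb = MultiLabelBinarizer()
genre_flags = pd.DataFrame(
    mlb.fit_transform(toy["artist_genres"]),
    columns=[f"artist_genre_{g}" for g in mlb.classes_],
    index=toy.index,
)

features = pd.concat([toy[["name"]], genre_flags], axis=1)
print(features)
```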

Label Encoder

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()  # Encode the target variable
df['genre_encoded'] = le.fit_transform(df['genre'])
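
`LabelEncoder` assigns integer codes in sorted order of the class names, which is why "acoustic" becomes 0 and "pop" becomes 10 in the prediction table later on. The short sketch below reproduces that mapping for the 17 genres in the dataset; `le_demo` and `mapping` are illustrative names.

```python
from sklearn.preprocessing import LabelEncoder

# The 17 genre labels present in the dataset
genres = ["pop", "acoustic", "emo", "chill", "grunge", "punk", "romance", "rock", "sad",
          "happy", "piano", "hip-hop", "indie", "dance", "techno", "r-n-b", "edm"]

le_demo = LabelEncoder().fit(genres)

# classes_ is sorted, so position in classes_ equals the encoded integer
mapping = {label: code for code, label in enumerate(le_demo.classes_)}
print(mapping)

# inverse_transform goes from integer codes back to genre names
print(le_demo.inverse_transform([7, 14]))
```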

Combine and arrange the data to create a final dataset for training the model.

X = df[['danceability', 'energy', 'key', 'loudness', 'speechiness',
        'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo']]
y = df['genre_encoded']  # 1-D target, as scikit-learn estimators expect

Step 4: Model Selection and Training

Choose an appropriate classification algorithm and train the model.
- Algorithms: Consider using algorithms like Random Forest, Gradient Boosting, Support Vector Machine (SVM), or Neural Networks.
- Cross-Validation: Use cross-validation to tune hyperparameters and avoid overfitting.
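
The cross-validation item can be sketched as a grid search over a pipeline. Wrapping the scaler and classifier in a `Pipeline` means the scaler is refit on the training folds only, so no statistics from the held-out fold leak into preprocessing. The synthetic data, the parameter grid, and the weighted-F1 scoring choice are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the audio features and genre labels
X_demo, y_demo = make_classification(n_samples=300, n_features=10, n_informative=6,
                                     n_classes=3, random_state=42)

# Scaling happens inside each fold because it is part of the pipeline
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", RandomForestClassifier(random_state=42)),
])

# Small illustrative grid; "clf__" routes each parameter to the classifier step
param_grid = {"clf__max_depth": [3, 7], "clf__n_estimators": [50, 100]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

search = GridSearchCV(pipe, param_grid, cv=cv, scoring="f1_weighted")
search.fit(X_demo, y_demo)

print(search.best_params_)
print(round(search.best_score_, 3))
```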

Model Training

The following code splits the features and the target at random into training and testing subsets. Because test_size is not passed, scikit-learn’s default applies: 75% of the rows go to training and 25% to testing. Fixing random_state makes the split reproducible.

from sklearn.model_selection import train_test_split

# Split the data (test_size defaults to 0.25 when not specified)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
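
Some genres have noticeably fewer tracks than others (84 for edm versus 142 for pop), so a purely random split can shift the class proportions between the two subsets. A variant worth considering, sketched here on toy labels, passes `test_size` explicitly and uses `stratify` to keep the proportions equal. The arrays are invented for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Imbalanced toy labels: 80 samples of class 0 and 20 of class 1
y_demo = np.array([0] * 80 + [1] * 20)
X_demo = np.arange(100).reshape(-1, 1)

# stratify keeps the 80/20 class ratio in both the train and test subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42)

print(len(y_tr), len(y_te))
print(y_tr.mean(), y_te.mean())
```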

Model Fitting

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier
from sklearn.metrics import classification_report, accuracy_score, roc_auc_score
from prettytable import PrettyTable

# Define models
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000, C=0.5, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, max_depth=7, min_samples_split=5, random_state=42),
    "SVM": SVC(probability=True, random_state=42),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(max_depth=7, min_samples_split=5, random_state=42),
    "Gradient Boost": GradientBoostingClassifier(n_estimators=100, learning_rate=0.05, random_state=42),
    "XGB": XGBClassifier(n_estimators=300, random_state=42)
}

Step 6: Model Evaluation

Evaluate the model’s performance using appropriate metrics.
- Metrics: Use accuracy, precision, recall, F1-score, and confusion matrix to assess the model.
- Validation Set: Use a separate validation set to test the model’s generalization ability.
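
A confusion matrix shows which genres get mistaken for which, something a single accuracy figure hides. The sketch below builds one from a handful of invented labels; rows are true genres and columns are predicted genres, in the order given by `labels`.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Invented true and predicted genres for six tracks
y_true = ["pop", "pop", "rock", "rock", "sad", "sad"]
y_hat = ["pop", "rock", "rock", "rock", "pop", "sad"]
labels = ["pop", "rock", "sad"]

# cm[i, j] counts tracks whose true genre is labels[i] and predicted genre is labels[j]
cm = confusion_matrix(y_true, y_hat, labels=labels)
print(cm)

# Per-genre precision, recall, and F1 alongside the averages
print(classification_report(y_true, y_hat, labels=labels, zero_division=0))
```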

We compare the models on several criteria: accuracy, ROC-AUC (one-vs-rest), and weighted-average precision, recall, and F1-score. The loop below fits each model on the training set and records these metrics on the test set:

# Initialize an empty list to store the results
results = []

# Train models and evaluate
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test) if hasattr(model, "predict_proba") else None

    report = classification_report(y_test, y_pred, output_dict=True, zero_division=0)

    results.append({
        "Model": name, 
        "Accuracy": accuracy_score(y_test, y_pred), 
        "ROC-AUC": roc_auc_score(y_test, y_prob, multi_class='ovr') if y_prob is not None else None,
        "Precision": report['weighted avg']['precision'], 
        "Recall": report['weighted avg']['recall'], 
        "F1-Score": report['weighted avg']['f1-score']
    })

	Model	Accuracy	ROC-AUC	Precision	Recall	F1-Score
0	Logistic Regression	0.292910	0.823484	0.322340	0.292910	0.275275
1	Random Forest	0.266791	0.826900	0.257971	0.266791	0.237800
2	SVM	0.270522	0.816350	0.275629	0.270522	0.252128
3	Neural Network	0.268657	0.815905	0.270748	0.268657	0.258513
4	KNN	0.203358	0.662278	0.201823	0.203358	0.191504
5	Decision Tree	0.238806	0.729930	0.262212	0.238806	0.223554
6	Gradient Boost	0.270522	0.808805	0.269965	0.270522	0.263241
7	XGB	0.248134	0.778676	0.249968	0.248134	0.241814

Conclusion

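No single model wins on every metric. Logistic Regression has the highest accuracy (0.293), precision (0.322), recall (0.293), and F1-score (0.275), while Random Forest has the highest ROC-AUC (0.827) but a lower F1-score (0.238). With 17 roughly balanced genres, random guessing would score about 0.06 accuracy, so every model picks up some signal, yet none classifies more than about 30% of the test tracks correctly.

Logistic Regression, Random Forest, SVM, Neural Network, and Gradient Boost form a leading group with ROC-AUC of roughly 0.81 to 0.83; XGB, Decision Tree, and KNN trail behind. Random Forest, the top model by ROC-AUC, is the one saved and applied to new data in the next step.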
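
To check which model leads on each metric, the reported results table can be loaded into a DataFrame and queried directly. The numbers below are copied from the table above; only the variable names are new.

```python
import pandas as pd

# Metrics exactly as reported in the results table
reported = pd.DataFrame(
    [
        ("Logistic Regression", 0.292910, 0.823484, 0.322340, 0.292910, 0.275275),
        ("Random Forest", 0.266791, 0.826900, 0.257971, 0.266791, 0.237800),
        ("SVM", 0.270522, 0.816350, 0.275629, 0.270522, 0.252128),
        ("Neural Network", 0.268657, 0.815905, 0.270748, 0.268657, 0.258513),
        ("KNN", 0.203358, 0.662278, 0.201823, 0.203358, 0.191504),
        ("Decision Tree", 0.238806, 0.729930, 0.262212, 0.238806, 0.223554),
        ("Gradient Boost", 0.270522, 0.808805, 0.269965, 0.270522, 0.263241),
        ("XGB", 0.248134, 0.778676, 0.249968, 0.248134, 0.241814),
    ],
    columns=["Model", "Accuracy", "ROC-AUC", "Precision", "Recall", "F1-Score"],
).set_index("Model")

# Name of the best model for each metric
best_per_metric = reported.idxmax()
print(best_per_metric)

# Full ranking by weighted F1-score
print(reported.sort_values("F1-Score", ascending=False))
```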

Step 7: Model Deployment

Deploy the model for practical use.
- Save the Model: Save the trained model using libraries like joblib or pickle.
- API Creation: Create an API using Flask or FastAPI to make predictions on new data.

import joblib

# Train the Random Forest model
random_forest_model = RandomForestClassifier(n_estimators=100, max_depth=7, min_samples_split=5, random_state=42)
random_forest_model.fit(X_train, y_train)

# Save scaler + model for future use
joblib.dump(scaler, 'scaler.pkl')
joblib.dump(random_forest_model, 'random_forest_model.pkl')

['random_forest_model.pkl']

# Evaluate the model
y_pred = random_forest_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.2667910447761194
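
The inference code that follows decodes predictions with the label encoder, which works only while the encoder is still in memory from training. One way to make the saved artifacts self-sufficient, sketched here on synthetic data, is to store the scaler, model, and label encoder together in a single file. The bundle layout, the file name, and all `*_demo` objects are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Synthetic features and genre labels standing in for the real training data
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(60, 4))
genres_demo = np.array(["pop", "rock", "sad"])[rng.integers(0, 3, size=60)]

le_demo = LabelEncoder()
y_demo = le_demo.fit_transform(genres_demo)
scaler_demo = StandardScaler().fit(X_demo)
model_demo = RandomForestClassifier(n_estimators=20, random_state=42)
model_demo.fit(scaler_demo.transform(X_demo), y_demo)

# Save everything needed for inference in one file
bundle = {"scaler": scaler_demo, "model": model_demo, "label_encoder": le_demo}
path = os.path.join(tempfile.mkdtemp(), "genre_bundle.pkl")
joblib.dump(bundle, path)

# A fresh session would only need this file to predict and decode genres
loaded = joblib.load(path)
codes = loaded["model"].predict(loaded["scaler"].transform(X_demo[:3]))
print(loaded["label_encoder"].inverse_transform(codes))
```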

Applying Model to New Data

# Load the trained models + scaler
random_forest_model = joblib.load('random_forest_model.pkl')
scaler = joblib.load('scaler.pkl')

# Load new data
df_new = pd.read_csv("../assets/data/all_tracks+lyrics.csv")

# Extract relevant columns (copy so new columns can be added without a SettingWithCopyWarning)
new_data = df_new[['name', 'danceability', 'energy', 'key', 'loudness', 'speechiness', 
                   'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo']].copy()

# Preprocess new data
X_new_data = new_data.drop(columns=['name'])
X_new_data_scaled = scaler.transform(X_new_data)

# Make predictions
predictions = random_forest_model.predict(X_new_data_scaled)
probabilities = random_forest_model.predict_proba(X_new_data_scaled)

# Decode the predicted genre labels
predictions_label = le.inverse_transform(predictions)

# Display predictions
new_data['Predicted Genre'] = predictions
new_data['Predicted Genre Label'] = predictions_label
new_data['Prediction Probabilities'] = probabilities.tolist()

new_data[['name', 'Predicted Genre Label', 'Predicted Genre', 'Prediction Probabilities']]

	name	Predicted Genre Label	Predicted Genre	Prediction Probabilities
0	Please Please Please	pop	10	[0.04088154652446443, 0.07714395919805248, 0.0...
1	Si Antes Te Hubiera Conocido	pop	10	[0.052278933050807844, 0.06228149646432559, 0....
2	BIRDS OF A FEATHER	dance	2	[0.07264102912485536, 0.10432647001182196, 0.1...
3	Good Luck, Babe!	pop	10	[0.05870991975676431, 0.09131179743471457, 0.0...
4	A Bar Song (Tipsy)	pop	10	[0.04773552198684028, 0.0836338792955852, 0.04...
5	Not Like Us	hip-hop	7	[0.02365012599849557, 0.07161269506638383, 0.0...
6	MILLION DOLLAR BABY	pop	10	[0.02922418578908816, 0.0647413931106139, 0.08...
7	Too Sweet	pop	10	[0.025137350612849513, 0.07401057278171215, 0....
8	Beautiful Things	romance	14	[0.05636255764656002, 0.06888807365798695, 0.0...
9	I Had Some Help (Feat. Morgan Wallen)	happy	6	[0.0324812151873648, 0.07718756832982102, 0.09...
10	Espresso	pop	10	[0.03976771088890484, 0.11091839764030585, 0.0...
11	i like the way you kiss me	rock	13	[0.00560173930301027, 0.027934558669221385, 0....
12	Stargazing	romance	14	[0.07844059571018996, 0.08715430776260266, 0.0...
13	LUNCH	dance	2	[0.06268786025919988, 0.04074153271739373, 0.2...
14	End of Beginning	dance	2	[0.058459964608319995, 0.045299973823992, 0.14...
15	we can't be friends (wait for your love)	pop	10	[0.04767297734830211, 0.09013230830759586, 0.0...
16	Lose Control	romance	14	[0.06999320818385321, 0.09762855667926634, 0.0...
17	Tough	acoustic	0	[0.1487665952511997, 0.07956401979979427, 0.02...
18	Austin	pop	10	[0.05794414166297137, 0.07773720145272671, 0.0...
19	I Can Do It With a Broken Heart	pop	10	[0.04152820229550991, 0.07795210201071162, 0.0...
20	Houdini	pop	10	[0.012428955551854977, 0.04178841837693284, 0....
21	Nasty	pop	10	[0.07229835155549594, 0.07798138478900962, 0.0...
22	Belong Together	pop	10	[0.1184642865957953, 0.08894661601186675, 0.04...
23	Slow It Down	romance	14	[0.04562594460281343, 0.06860379702332979, 0.0...
24	HOT TO GO!	pop	10	[0.018093243817939027, 0.07192985122486334, 0....
25	GIRLS	pop	10	[0.030047792403698946, 0.07162699583368382, 0....
26	greedy	pop	10	[0.0391825562304293, 0.07982381429220817, 0.04...
27	Move	dance	2	[0.03153192004354079, 0.0632106471284883, 0.27...
28	Fortnight (feat. Post Malone)	acoustic	0	[0.2133077064313661, 0.10533632362310526, 0.00...
29	Saturn	sad	15	[0.15633979786968363, 0.09415369169525718, 0.0...
30	28	chill	1	[0.08481086165547902, 0.14493631982323849, 0.0...
31	Close To You	pop	10	[0.043913102280624665, 0.07931487946844533, 0....
32	the boy is mine	pop	10	[0.03326174503752976, 0.07819873548158905, 0.0...
33	Stick Season	acoustic	0	[0.15164660763833773, 0.08639492608005701, 0.0...
34	I Don't Wanna Wait	pop	10	[0.04034787361025403, 0.07919812766986792, 0.0...
35	Smeraldo Garden Marching Band (feat. Loco)	pop	10	[0.012025901882804802, 0.0892535139898251, 0.0...
36	Stumblin' In	dance	2	[0.03182662544837536, 0.06430270117639993, 0.1...
37	360	dance	2	[0.08284826354380695, 0.06311558351697714, 0.2...
38	Rockstar	pop	10	[0.028486403608907106, 0.06298695375092218, 0....
39	One Of The Girls (with JENNIE, Lily Rose Depp)	romance	14	[0.0467947393194539, 0.06589514531102753, 0.03...
40	Scared To Start	acoustic	0	[0.1788457069830126, 0.10889137568133479, 0.01...
41	Lies Lies Lies	romance	14	[0.06906881049759486, 0.12520424564543942, 0.0...
42	feelslikeimfallinginlove	dance	2	[0.06058770720420721, 0.10143051335809605, 0.1...
43	Parking Lot	pop	10	[0.04112327314489984, 0.08773701130340866, 0.0...
44	Gata Only	pop	10	[0.08002179081376724, 0.080812180724709, 0.048...
45	BAND4BAND (feat. Lil Baby)	hip-hop	7	[0.009146801699282305, 0.04043458967408511, 0....
46	Santa	pop	10	[0.02718144861517665, 0.08773498147207358, 0.0...
47	Magnetic	pop	10	[0.030559852949859363, 0.061780658803238436, 0...
48	Water	pop	10	[0.025002878313200597, 0.06980209749484347, 0....
49	Illusion	pop	10	[0.022670167757488244, 0.06973317734184772, 0....
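
The probability lists are hard to read in raw form. Since the columns of `predict_proba` follow the order of the model's `classes_`, the top few genres per track can be pulled out with `argsort`. The sketch below does this on a small invented probability matrix; `classes`, `proba`, and `k` are illustrative.

```python
import numpy as np

# Invented class names and probabilities for two tracks
classes = np.array(["acoustic", "chill", "dance", "pop"])
proba = np.array([
    [0.10, 0.20, 0.30, 0.40],
    [0.50, 0.10, 0.25, 0.15],
])

k = 2
# Sort each row ascending, reverse to descending, keep the first k column indices
top_idx = np.argsort(proba, axis=1)[:, ::-1][:, :k]
top_labels = classes[top_idx]
top_scores = np.take_along_axis(proba, top_idx, axis=1)

for labels_row, scores_row in zip(top_labels, top_scores):
    print(list(zip(labels_row.tolist(), scores_row.tolist())))
```

With the trained model, `classes` would correspond to the label encoder's class names, since the model's own classes are the encoded integers.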

import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import warnings
warnings.simplefilter("ignore")

# import track_data
genres_v2 = pd.read_csv("../assets/data/genre_seeds.csv")

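import os

# Read API credentials from environment variables instead of hard-coding them in the notebook
client_id = os.environ["SPOTIPY_CLIENT_ID"]
client_secret = os.environ["SPOTIPY_CLIENT_SECRET"]
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id, client_secret))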


def track_features(id, artist_id, note):
    meta = sp.track(id)
    audio_features = sp.audio_features(id)
    artist_info = sp.artist(artist_id)

    if audio_features[0] is None:
        return None

    name = meta['name']
    track_id = meta['id']
    album = meta['album']['name']
    artist = meta['album']['artists'][0]['name']
    artist_id = meta['album']['artists'][0]['id']
    release_date = meta['album']['release_date']
    length = meta['duration_ms']
    popularity = meta['popularity']

    artist_pop = artist_info["popularity"]
    artist_genres = artist_info["genres"]
    artist_followers = artist_info["followers"]['total']

    acousticness = audio_features[0]['acousticness']
    danceability = audio_features[0]['danceability']
    energy = audio_features[0]['energy']
    instrumentalness = audio_features[0]['instrumentalness']
    liveness = audio_features[0]['liveness']
    loudness = audio_features[0]['loudness']
    speechiness = audio_features[0]['speechiness']
    tempo = audio_features[0]['tempo']
    valence = audio_features[0]['valence']
    key = audio_features[0]['key']
    mode = audio_features[0]['mode']
    time_signature = audio_features[0]['time_signature']

    return [name, track_id, album, artist, artist_id, release_date, length, popularity,
            artist_pop, artist_genres, artist_followers, acousticness, danceability,
            energy, instrumentalness, liveness, loudness, speechiness,
            tempo, valence, key, mode, time_signature, note]



# sp.recommendation_genre_seeds() "trip-hop", "trance"
genre_seeds = ["acoustic", "chill", "dance", "edm", "emo", "grunge", "happy", "hip-hop", "indie",
               "piano", "pop", "punk", "rock", "romance", "sad", "techno", "r-n-b"]

all_genre_seed_tracks = []

for genre in genre_seeds:
    genre_rec = sp.recommendations(seed_genres=[genre])['tracks']

    for song in genre_rec:
        song_id = song['id']
        song_artist_id = song['artists'][0]['id']
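        song_audio = track_features(
            id=song_id, artist_id=song_artist_id, note=genre)
        # track_features returns None when no audio features are available; skip those tracks
        if song_audio is not None:
            all_genre_seed_tracks.append(song_audio)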


df = pd.DataFrame(all_genre_seed_tracks,
                  columns=['name', 'track_id', 'album', 'artist', 'artist_id', 'release_date', 'length', 'popularity',
                           'artist_pop', 'artist_genres', 'artist_followers', 'acousticness', 'danceability',
                           'energy', 'instrumentalness', 'liveness', 'loudness', 'speechiness',
                           'tempo', 'valence', 'key', 'mode', 'time_signature', 'genre'])


# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df_add = pd.concat([df, genres_v2], ignore_index=True)
df_add = df_add.drop_duplicates(subset=['track_id', 'genre'])
#df_add = df_add.drop(columns=['Unnamed: 0.1', 'Unnamed: 0'])

df_add.to_csv("../assets/data/genre_seeds.csv", index=False)
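
The collection loop above makes three API calls per track (track, audio features, artist). If request volume becomes a concern, audio features could be fetched in batches instead. The sketch below assumes the client's `audio_features` method accepts a list of up to 100 track IDs, as spotipy documents; `chunked`, `fetch_audio_features`, and `FakeClient` are hypothetical helpers, and the fake client exists only so the example runs offline.

```python
def chunked(items, size):
    """Yield successive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_audio_features(client, track_ids, batch_size=100):
    """Collect audio features for many tracks, one request per batch."""
    features = []
    for batch in chunked(track_ids, batch_size):
        # Entries can be None when no audio features exist for a track
        features.extend(f for f in client.audio_features(batch) if f is not None)
    return features


class FakeClient:
    """Offline stand-in for a spotipy.Spotify client (hypothetical, for illustration)."""

    def __init__(self):
        self.calls = 0

    def audio_features(self, tracks):
        self.calls += 1
        return [{"id": t, "tempo": 120.0} for t in tracks]


fake = FakeClient()
result = fetch_audio_features(fake, [f"id{i}" for i in range(250)])
print(len(result), fake.calls)
```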