ENG | Finding duplicates in a Spotify playlist
A Python script to find duplicate tracks in your Spotify playlists using the Spotipy library and Pandas.
Managing large playlists on Spotify can be a daunting task, especially when it comes to duplicate tracks. Whether they come from different albums or from inadvertent additions (especially in the web client), duplicates clutter your playlists and disrupt your listening experience. In this post, we'll look at a Python script that uses the Spotipy library and Pandas to identify duplicate tracks in a Spotify playlist, so you can clean them up and keep your music library organized.
Prerequisites
Python libraries: Spotipy and Pandas
The script needs the spotipy and pandas modules, which can be installed as follows.
Linux:
pip install spotipy pandas
Windows:
py -m pip install spotipy pandas
Spotify API keys
Visit the Spotify Developer Dashboard and create an app to obtain your Client ID and Client Secret.
Read the documentation for details.
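Pasting the keys directly into a script makes it easy to leak them when the file is shared. As a minimal sketch, assuming the keys are exported as the environment variables SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET (the names Spotipy itself looks for when no keys are passed explicitly), a small helper could read them at startup. The function name load_credentials is hypothetical and not part of the original script.

```python
import os


def load_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    `env` is any mapping; it defaults to the process environment.
    """
    try:
        return env["SPOTIPY_CLIENT_ID"], env["SPOTIPY_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail early with a readable message instead of a bare KeyError.
        raise RuntimeError(f"Environment variable {missing} is not set") from None
```

The two returned values could then be assigned to CLIENT_ID and CLIENT_SECRET instead of the placeholder strings.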
Script
#!/usr/bin/env python3
# Author: Pavel Perina
# Changelog:
# * 2022-12-28 Initial version
# * 2024-04-26 API update

import spotipy as sp
import pandas as pd
from spotipy.oauth2 import SpotifyClientCredentials

CLIENT_ID = "***AddYourOwn***"
CLIENT_SECRET = "***AddYourOwn***"
PLAYLIST_LINK = "https://open.spotify.com/playlist/6JVCQmscHKSf4q78SnyChq"

# Authentication
client_credentials_manager = SpotifyClientCredentials(
    client_id=CLIENT_ID,
    client_secret=CLIENT_SECRET
)


# Extract metadata from a playlist that is longer than 100 songs
# (the API returns results in pages, so we follow the 'next' links)
def get_playlist_tracks_more_than_100_songs(playlist_id):
    session = sp.Spotify(client_credentials_manager=client_credentials_manager)
    results = session.playlist_items(playlist_id)
    tracks = results['items']
    print(f"Fetching results ({results['total']}): .", end='', flush=True)
    while results['next']:
        results = session.next(results)
        tracks.extend(results['items'])
        print(".", end='', flush=True)
    print(" done")

    print("Creating database")
    track_data = []
    for track_item in tracks:
        track = track_item['track']
        if track:  # skip entries without track data
            track_info = {
                'id': track['id'],
                'title': track['name'],
                'all_artists': ", ".join([artist['name'] for artist in track['artists']]),
                'popularity': track['popularity'],
                'release_date': track['album']['release_date'],
                'album': track['album']['name'],
                'added_at': track_item['added_at'],
            }
            track_data.append(track_info)
    return pd.DataFrame(track_data)


# The playlist ID is the last path segment of the link
playlist_id = PLAYLIST_LINK.split("/")[-1]
df = get_playlist_tracks_more_than_100_songs(playlist_id)

print("Saving database")
df.to_csv("spotify_rock.csv", sep='\t', encoding="utf-8")

print("Saving duplicates")
# Count how often each (title, all_artists) pair occurs; keep pairs seen more than once
g = df.groupby('title')['all_artists'].value_counts()
g[g > 1].to_csv("duplicates.csv")
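The script takes the playlist ID to be whatever follows the last "/" in the link. Links copied from Spotify's share menu usually carry a tracking suffix such as "?si=...", which would then end up in the ID. The sketch below shows one way to strip it with the standard library. The helper name extract_playlist_id is hypothetical, and the sketch assumes the link has the open.spotify.com/playlist/<id> shape used above.

```python
from urllib.parse import urlparse


def extract_playlist_id(link):
    """Return the playlist ID from an open.spotify.com playlist link."""
    # urlparse separates the path from the query string ("?si=...") and fragment.
    path = urlparse(link).path
    # Ignore a trailing slash, then take the last path segment.
    return path.rstrip("/").split("/")[-1]
```

For a plain link without a query string, the result is the same as PLAYLIST_LINK.split("/")[-1].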
Example outputs
spotify_rock.csv (excerpt, tab-separated):
    id                      title                         all_artists       popularity  release_date  album                               added_at
0   0lcsIpA9Ff2DZufaMIQe4k  Crocodile Tears               little hurricane  0           2012-05-01    Homewrecker                         2015-08-06T15:58:51Z
5   2FbkxLsDnOOJUsp61T8MiA  A View To A Kill              Duran Duran       41          2005-10-31    Singles Box '81 - '85               2015-08-06T15:58:51Z
9   4PSiPZp8MYMDZzuBhCLgc6  Wasted Years - 1998 Remaster  Iron Maiden       0           1986          Somewhere In Time                   2015-08-06T15:58:51Z
15  34bdE38G1hhlxZanAEBewY  It's No Good                  Depeche Mode      35          1997-04-14    Ultra                               2015-08-06T15:58:51Z
17  3pQNcwnGQycAACpwu7IY3G  One Night in Bangkok          Murray Head       0           2012-04-17    Retro Disco Cocktail (Party Album)  2015-08-06T15:58:51Z
20  3wfujdbamR3Z46F4xav7LM  Drive                         The Cars          67          1984-03-13    Heartbeat City                      2015-08-06T15:58:51Z
duplicates.csv:
title,all_artists,count
All of This and Nothing,"Dave Gahan, Soulsavers",2
Angels,Within Temptation,2
Chariots Of Fire,Vangelis,2
Crazy,Aerosmith,2
Dream On,Aerosmith,2
Edge of Seventeen,Stevie Nicks,2
Englishman In New York,Sting,2
Lava,Special Providence,2
Lookin' Out My Back Door,Creedence Clearwater Revival,2
Radar Love,Golden Earring,2
Ruby,Kaiser Chiefs,2
Sky Is Over,Serj Tankian,2
Sleeping Satellite,Tasmin Archer,2
Sunshine Of Your Love,Cream,2
Tell It to My Heart,Taylor Dayne,2
Wonderful Life,Black,2
At this point, you need to go through the Spotify playlist, sort it by artist, and remove the duplicates by hand.
Is it reliable? No. There are always entries such as “Song” and “Song - Remastered 2008”, and the comparison should also be case-insensitive, but it's at least something.
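The two weaknesses mentioned above can be reduced by normalizing titles before counting. The following is a minimal sketch, not part of the original script: the names normalize_title and find_duplicates are hypothetical, and the suffix pattern is an assumption that only covers forms like " - Remastered 2008", " - 1998 Remaster" and " - Remastered". Other variants (for example "Remastered Version" or "Live") would need extra patterns.

```python
import re
from collections import Counter

# Assumed suffix forms: " - Remastered 2008", " - 1998 Remaster", " - Remastered"
_REMASTER_SUFFIX = re.compile(
    r"\s+-\s+(\d{4}\s+)?remaster(ed)?(\s+\d{4})?\s*$", re.IGNORECASE
)


def normalize_title(title):
    """Strip a trailing remaster suffix and fold case for comparison."""
    title = _REMASTER_SUFFIX.sub("", title)
    return title.casefold().strip()


def find_duplicates(tracks):
    """Count (title, all_artists) pairs after normalization.

    `tracks` is an iterable of (title, all_artists) tuples.
    Returns a dict of normalized pairs that occur more than once.
    """
    counts = Counter(
        (normalize_title(title), artists.casefold()) for title, artists in tracks
    )
    return {key: n for key, n in counts.items() if n > 1}
```

With the DataFrame from the script, the same idea could be applied by mapping normalize_title over the title column (for example with df['title'].map(normalize_title)) before grouping. Normalization can also produce false positives, such as a live version and a studio version sharing one title, so the result is still a list of candidates to check by hand.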