
ENG | Finding duplicates in a Spotify playlist

A Python script to find duplicate tracks in your Spotify playlists using the Spotipy library and Pandas.

Managing large playlists on Spotify can be a daunting task, especially when it comes to duplicate tracks. Whether they come from the same song appearing on different albums or from inadvertent additions (especially in the web client), duplicates clutter your playlists and disrupt your listening experience. In this post, we’ll explore a Python script that uses the Spotipy library and Pandas to find duplicate tracks in a Spotify playlist, so you can clean it up.

Prerequisites

Python libraries: Spotipy and Pandas

The script needs the spotipy and pandas modules, which can be installed with:

Linux:

pip install spotipy pandas

Windows:

py -m pip install spotipy pandas

Spotify API keys

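Visit the Spotify Developer Dashboard, create an app, and copy its Client ID and Client Secret. The script authenticates with the Client Credentials flow, which can read only public playlists; a private playlist would need the Authorization Code flow (spotipy’s SpotifyOAuth) with the playlist-read-private scope.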

Read the docs

Spotipy docs

Script

#!/usr/bin/env python3

# Author: Pavel Perina
# Changelog:
# * 2022-12-28 Initial version
# * 2024-04-26 API update

import spotipy as sp
import pandas as pd
from spotipy.oauth2 import SpotifyClientCredentials

CLIENT_ID="***AddYourOwn***"
CLIENT_SECRET="***AddYourOwn***"
PLAYLIST_LINK="https://open.spotify.com/playlist/6JVCQmscHKSf4q78SnyChq"

# Authentication (Client Credentials flow: public playlists only)
client_credentials_manager = SpotifyClientCredentials(
    client_id=CLIENT_ID,
    client_secret=CLIENT_SECRET
)

# Fetch metadata of all tracks in a playlist. The API returns at most 100
# items per page, so follow the 'next' links until every page is loaded.
def get_playlist_tracks_more_than_100_songs(playlist_id):
    session = sp.Spotify(client_credentials_manager=client_credentials_manager)
    results = session.playlist_items(playlist_id)
    tracks = results['items']

    print(f"Fetching results ({results['total']}): .", end='', flush=True)
    while results['next']:
        results = session.next(results)
        tracks.extend(results['items'])
        print(".", end='', flush=True)
    print(" done")

    print("Creating database")
    track_data = []
    for track_item in tracks:
        track = track_item['track']
        # Skip unavailable entries (None) and podcast episodes,
        # which have no 'album' or 'popularity' fields
        if track and track.get('type') == 'track':
            track_info = {
                'id': track['id'],
                'title': track['name'],
                'all_artists': ", ".join([artist['name'] for artist in track['artists']]),
                'popularity': track['popularity'],
                'release_date': track['album']['release_date'],
                'album': track['album']['name'],
                'added_at': track_item['added_at'],
            }
            track_data.append(track_info)

    return pd.DataFrame(track_data)

# Take the ID from the link, dropping a "?si=..." share suffix if present
playlist_id = PLAYLIST_LINK.split("/")[-1].split("?")[0]
df = get_playlist_tracks_more_than_100_songs(playlist_id)
print("Saving database")
df.to_csv("spotify_rock.csv", sep='\t', encoding="utf-8")

print("Saving duplicates")
# Count each (title, artists) pair; keep the pairs that occur more than once
g = df.groupby('title')['all_artists'].value_counts()
g[g > 1].to_csv("duplicates.csv")
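Why group by title and artists rather than by track ID? The same song released on a studio album and on a compilation gets a different Spotify track ID, so an ID-based check would miss it. This sketch runs the script's last two lines on a small, made-up DataFrame (the IDs are placeholders, not real Spotify IDs) to show the difference.

```python
import pandas as pd

# Made-up rows mimicking the DataFrame built by the script. "Drive" appears
# twice with different (placeholder) track IDs, e.g. album vs. compilation.
df = pd.DataFrame(
    {
        "id": ["id-a", "id-b", "id-c", "id-d"],
        "title": ["Drive", "Drive", "Dream On", "Crazy"],
        "all_artists": ["The Cars", "The Cars", "Aerosmith", "Aerosmith"],
    }
)

# An ID-based check finds nothing: every ID is unique.
print(df["id"].duplicated().any())  # False

# The script's approach: within each title, count how often each artist string occurs.
g = df.groupby("title")["all_artists"].value_counts()
dups = g[g > 1]
print(dups)  # a single row: ("Drive", "The Cars") with count 2
```

Keeping the artist string in the key also means that two different songs sharing a title, such as two unrelated tracks called "Crazy", are not reported as duplicates.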

Example outputs

spotify_rock.csv (excerpt):

	id	title	all_artists	popularity	release_date	album	added_at
0	0lcsIpA9Ff2DZufaMIQe4k	Crocodile Tears	little hurricane	0	2012-05-01	Homewrecker	2015-08-06T15:58:51Z
5	2FbkxLsDnOOJUsp61T8MiA	A View To A Kill	Duran Duran	41	2005-10-31	Singles Box '81 - '85	2015-08-06T15:58:51Z
9	4PSiPZp8MYMDZzuBhCLgc6	Wasted Years - 1998 Remaster	Iron Maiden	0	1986	Somewhere In Time	2015-08-06T15:58:51Z
15	34bdE38G1hhlxZanAEBewY	It's No Good	Depeche Mode	35	1997-04-14	Ultra	2015-08-06T15:58:51Z
17	3pQNcwnGQycAACpwu7IY3G	One Night in Bangkok	Murray Head	0	2012-04-17	Retro Disco Cocktail (Party Album)	2015-08-06T15:58:51Z
20	3wfujdbamR3Z46F4xav7LM	Drive	The Cars	67	1984-03-13	Heartbeat City	2015-08-06T15:58:51Z

duplicates.csv:

title,all_artists,count
All of This and Nothing,"Dave Gahan, Soulsavers",2
Angels,Within Temptation,2
Chariots Of Fire,Vangelis,2
Crazy,Aerosmith,2
Dream On,Aerosmith,2
Edge of Seventeen,Stevie Nicks,2
Englishman In New York,Sting,2
Lava,Special Providence,2
Lookin' Out My Back Door,Creedence Clearwater Revival,2
Radar Love,Golden Earring,2
Ruby,Kaiser Chiefs,2
Sky Is Over,Serj Tankian,2
Sleeping Satellite,Tasmin Archer,2
Sunshine Of Your Love,Cream,2
Tell It to My Heart,Taylor Dayne,2
Wonderful Life,Black,2

At this point, it’s necessary to go through the Spotify playlist, sort it by artist, and remove the duplicates manually.
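Part of this manual clean-up could be automated. The following is a minimal sketch with a hypothetical helper, plan_removals (not part of the original script or of spotipy). It walks the playlist in order, keeps the first copy of each (title, artists) pair, and splits the later copies into two groups: copies with a different URI, which can be removed outright, and exact repeats of the kept URI. The URIs are placeholders.

```python
def plan_removals(items):
    """Hypothetical helper: decide which later duplicates to remove.

    `items` is a list of (uri, title, all_artists) tuples in playlist order.
    Returns (remove_uris, exact_repeats).
    """
    kept = {}  # (title, artists) key -> URI of the first occurrence
    remove_uris = []  # duplicates whose URI differs from the kept copy
    exact_repeats = []  # duplicates with the same URI as the kept copy
    for uri, title, artists in items:
        key = (title.casefold(), artists.casefold())
        if key not in kept:
            kept[key] = uri
        elif uri == kept[key]:
            exact_repeats.append(uri)
        elif uri not in remove_uris:
            remove_uris.append(uri)
    return remove_uris, exact_repeats


items = [
    ("spotify:track:AAA", "Drive", "The Cars"),
    ("spotify:track:BBB", "Dream On", "Aerosmith"),
    ("spotify:track:CCC", "Drive", "The Cars"),
    ("spotify:track:BBB", "Dream On", "Aerosmith"),
]
print(plan_removals(items))  # (['spotify:track:CCC'], ['spotify:track:BBB'])
```

Assuming a client authenticated through spotipy's SpotifyOAuth with the playlist-modify-public (or playlist-modify-private) scope, the first list could be passed to playlist_remove_all_occurrences_of_items(playlist_id, remove_uris); the Client Credentials flow used by the script cannot modify playlists. The Web API accepts at most 100 items per removal request, so longer lists would need batching. Exact repeats are kept apart because removing all occurrences of that URI would also delete the copy you want to keep.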

Is it reliable? No. There are always entries such as “Song”, “Song - Remastered 2008”, …, and the comparison should also be case-insensitive, but it’s at least something.
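Both weaknesses could be reduced by comparing normalized keys instead of raw strings. The sketch below is one assumed heuristic, not an exhaustive rule: it drops a trailing “ - …” suffix that mentions a remaster (as in “Song - Remastered 2008” or “Wasted Years - 1998 Remaster” from the example output) and case-folds titles and artists before counting. The helper names and the “Some Artist” entries are made up for illustration.

```python
import re
from collections import Counter

# Assumed heuristic: a " - ..." suffix containing "remaster"/"remastered"
# marks a re-release of the same song. Other suffixes are left untouched.
_REMASTER_SUFFIX = re.compile(r"\s+-\s+.*\bremaster(?:ed)?\b.*$", re.IGNORECASE)


def normalize_title(title):
    """Return a comparison key: remaster suffix removed, case-folded."""
    return _REMASTER_SUFFIX.sub("", title).strip().casefold()


def find_duplicates(tracks):
    """Count normalized (title, artists) pairs; keep pairs seen at least twice.

    `tracks` is an iterable of (title, all_artists) tuples.
    """
    counts = Counter(
        (normalize_title(title), artists.casefold()) for title, artists in tracks
    )
    return {key: n for key, n in counts.items() if n > 1}


tracks = [
    ("Song", "Some Artist"),
    ("Song - Remastered 2008", "Some Artist"),
    ("Wasted Years - 1998 Remaster", "Iron Maiden"),
    ("wasted years", "Iron Maiden"),
    ("Drive", "The Cars"),
]
print(find_duplicates(tracks))
# {('song', 'some artist'): 2, ('wasted years', 'iron maiden'): 2}
```

The same function could be applied to the script's DataFrame with df['title'].map(normalize_title) and the result used as the grouping column instead of 'title'.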

This post is licensed under CC BY 4.0 by the author.