Backing up Spotify
annas-archive.li/blog/backing-up-spotify.html
They downloaded 300 terabytes of Spotify data (song files and metadata) and are sharing it as torrents 😲.
damn, annas is fucking based. hope they stay safe..
I fucking love annas archive.
Anna does seem rather insane though
Edit: I was thinking of Alexandra Elbakyan, the creator of Sci-Hub
C'mon, you gotta say why, I'm curious
iirc her writings gave me the impression that she basically worships communism and famous soviet communist leaders in a vaguely religious way. I can go look again later and elaborate. It could just be the language barrier that gave me that impression.
Edit: I was misremembering Alexandra Elbakyan, the creator of Sci-Hub, as Anna of Anna's Archive. Oops
Down with the bourgeoisie
Eat the rich
Sodomize the land-owners
Impale all people who have more than 25 reál in their pocket
Literally murder all human beings regardless of their political beliefs
Huh, weird. Where can I find the writings?
I went to look for a link for you and found that I was misremembering. I was thinking of Alexandra Elbakyan, the creator of Sci-Hub, one of the libraries that Anna's Archive indexes. In case you're curious about her, here's an archive of her personal page on Sci-Hub
After Meta scraped all their books they have the perfect defense now. All they have to say is "we're training a music AI" and they're apparently untouchable.
Well, they have to say "we're training a music AI" while slipping several million dollars into the pockets of the right people. Rich people don't win legal battles by actually proving what they did isn't illegal, they do it by discreetly paying people to say they did.
Does anyone know if the Spotify metadata has BPM and key?
Mashup artist detected
Would love to be, lmao. I just bought a second-hand VDJ and I'm starting to experiment with Mixxx, and I don't know if it's the style I like (latincore and adjacent genres) or if Mixxx's BPM detection just isn't that good.
Good on you for starting that up! I wish you much success in your mixing and/or producing journey!
Both. Per the SQL schema printed in the article, table track_audio_features has both fields tempo and key along with many other technicals. Worth checking out, it's near the bottom of the page.
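For anyone who wants to poke at those fields, here is a minimal sketch of a tempo/key lookup. The table name track_audio_features and the columns tempo and key come from the schema mentioned above; everything else is an assumption made for illustration: that the metadata is loaded into SQLite, that there is a track_id column, and that key uses the 0-11 pitch-class numbering from Spotify's public audio-features API. The demo runs against a tiny in-memory table with made-up rows, not the real dump.

```python
import sqlite3

# Pitch-class names, assuming key is stored as 0-11 (C = 0), as in Spotify's
# public audio-features API. Whether the dump matches is an assumption.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def key_name(key):
    """Translate a numeric key into a note name, or 'unknown' if out of range."""
    if key is None or not 0 <= key <= 11:
        return "unknown"
    return PITCH_CLASSES[key]


def tracks_in_tempo_range(conn, low_bpm, high_bpm):
    """Return (track_id, tempo, key name) for tracks within a BPM range."""
    rows = conn.execute(
        'SELECT track_id, tempo, "key" FROM track_audio_features '
        "WHERE tempo BETWEEN ? AND ? ORDER BY tempo",
        (low_bpm, high_bpm),
    ).fetchall()
    return [(track_id, tempo, key_name(key)) for track_id, tempo, key in rows]


def demo_connection():
    """Build an in-memory stand-in table; track_id and all rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE track_audio_features (track_id TEXT, tempo REAL, "key" INTEGER)'
    )
    conn.executemany(
        "INSERT INTO track_audio_features VALUES (?, ?, ?)",
        [("a", 174.0, 9), ("b", 120.5, 0), ("c", 180.0, -1)],
    )
    return conn


if __name__ == "__main__":
    for row in tracks_in_tempo_range(demo_connection(), 170, 185):
        print(row)
```

To try it on the real metadata you would swap demo_connection() for sqlite3.connect() on the actual database file and adjust the column names to whatever the published schema uses.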
Yes, and it hasn’t been easy to dig up until recently. There were a few ways to search the “hidden” metadata fields that Spotify uses internally. But it definitely hasn’t been easy or straightforward.
Those hidden fields are how Spotify recommends similar artists. You have a few bands on repeat with specific instruments, chord progressions, and singer vocal range? Gee, maybe you’ll enjoy other bands that are similar to that…
Does anyone see the torrent links?
It says,
The data will be released in different stages on our Torrents page:
[X] Metadata (Dec 2025)
[ ] Music files (releasing in order of popularity)
[ ] Additional file metadata (torrent paths and checksums)
[ ] Album art
[ ] .zstdpatch files (to reconstruct original files before we added embedded metadata)
Yeah, I saw that and assumed they'd be torrents of the metadata.
I didn't find it in the blog post, but it's listed on the website:
https://annas-archive.org/torrents/spotify
Are these the actual music files you can use to play as well?
Would be amazing if it was. I would love to just have Spotify's music on my nas
If your nas has 300tb spare, you could.
How much could it cost? $10,000?
A RAID6 of 24 * 20TB drives could contain that with both parity and hotswap, with room to spare. Let's say $400 per refurb drive, $2500 rackmount SAS enclosure, $2000 SAS RAID card, $14,100 total. Assuming you already have the server and power and SAS cables.
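The arithmetic in that estimate can be checked with a short sketch. The prices and drive counts are the ones quoted above; reading "hotswap" as one drive held back as a hot spare is an assumption, and the function names are made up for illustration.

```python
def raid6_usable_tb(total_drives, drive_tb, hot_spares=0):
    """Usable capacity of a RAID 6 array: two drives' worth goes to parity."""
    array_drives = total_drives - hot_spares
    if array_drives < 4:
        raise ValueError("RAID 6 needs at least 4 drives in the array")
    return (array_drives - 2) * drive_tb


def build_cost(total_drives, price_per_drive, enclosure, raid_card):
    """Total hardware cost, excluding server, power and cables."""
    return total_drives * price_per_drive + enclosure + raid_card


if __name__ == "__main__":
    # Figures quoted in the comment above; one hot spare is an assumption.
    print(build_cost(24, 400, 2500, 2000))        # 14100
    print(raid6_usable_tb(24, 20, hot_spares=1))  # 420
```

Capacity here is in raw decimal terabytes; formatted capacity will come out somewhat lower.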
You could budget this way down. I run 10+2 12TB with Unraid. No reason for a raid card if it's for archive and personal use.
Oh totally, could do SATA instead of SAS too. I used to build out servers and render farms for motion graphics studios that needed the ability for multiple people to be doing high-bandwidth operations on the same network drive, and the above was just kind of the default offering.
100% this. People who store easily replaceable media on RAID are just throwing away money (unless you have a need for faster read/write). If it's your family photos, copy of your in progress thesis, or other irreplaceable piece of info/content go for it.
I have like 40tb Unraid NAS and I get asked pretty much every time I talk to someone about it how I do backups. Easy, I backup my *arr stack databases and in case of a failure I restore them and let it pull down everything over time. Which I have done in the past when I wanted to upgrade quality, easier for me to scrub it all and start over than make upgrade profiles and such.
Or that's what I would have done, now I mostly use DebridService du jour and Stremio :-)
Considering the Price per TB is 10-11 US dollars, it's gonna cost $3500 max
10US dollar per TB?? 🤣🤣 More like 30/35€ per TB for a good graded HDD!
Let's not talk about SSDs or nvme which are more in the 120€/TB.
I always hear people say that storage comes cheap nowaday... I'm still looking for that cheap HDD on amazon... It has been 10 years 🤣🤣
$10/TB is a bit low, but not far off. Serverpartdeals has refurbed enterprise/NAS drives at about $15/TB right now, and that's with the AI pressure driving up prices. I recall seeing 18TB drives around $12/TB a few months back.
The above is a well-loved vendor in the IT space. Much better place to buy from than Amazon, as they actually guarantee their inventory is legitimate.
It seems this post was locked down for some reason so I couldn't reply sooner.
Here you go -
$10.61/TB - HGST Ultrastar He8 HUH728080AL5200 | 0F23268 | 8TB 7.2K RPM 128MB Cache SAS 12Gb/s 512E 3.5"
$10.83/TB - WD-40 Ultrastar HE12 12TB SAS 512E
$12.65/TB - Seagate Expansion Desktop 26TB, Externe Harde Schijf, 3.5"
$13.63/TB - Seagate Expansion 22TB External Hard Drive HDD - USB 3.0
If it weren't for AI creating a shortage of drives, these prices would've been even cheaper
Is it that cheap now?? I would kill for 10tb
Here you go mate. They don't have 10TB in stock, but they do have 20TB refurbs at around $15/TB.
I'd wager 70% of what's on Spotify is not worth preserving since it's AI slop.
I'm not convinced AI slop can compete with the back log of organic slop personally.
But yeah a fuckton is probably slop either way
AI slop is accelerating exponentially for the foreseeable future. It won't take long for world data storage to be a limiting factor.
Interestingly enough, with the data they provide, figuring out how much of it is AI slop wouldn't be that hard I think
They've released torrents of the metadata, and they plan to release the music files, but they haven't yet. They intend to start by offering the downloads as bulk torrents, but they're open to considering implementing the ability to download single songs in the future.
So in short, yes, but you can't download them yet
Not yet, but that’s the end goal. The tricky part is that they’re only offering bulk downloads for now, which means downloading a single artist or album would be difficult/impossible. You’d need to download the entire compressed file of like 300GB of music, then extract the specific songs/artists/albums you wanted. The goal for now is preservation, meaning they want to make the bulk download as easy as possible, to make sure people can preserve it. Once they’ve got that in a pretty good spot, they may look into allowing more granular downloads.

Dns blocked in germany. fun.
Simply choose a private DNS server like mullvad,quad,etc. and it should work..
"Honey, all I need is $10,000 for a server and we'll never pay for Spotify again"
Almost makes me want to get into torrenting again. But dab.yeet.su, squid.wtf, and doubledouble.top usually have me covered with ddl
Now make it streamable and make a stremio-like music client 🤞
I cannot fathom the legal fees that will be incurred if they release 99.6% of Spotify to the public for free. Holy fucking shit.
I wonder why Spotify and not YouTube Music, Tidal or Apple Music all of which are higher quality.
They said this in the linked blog post:
A while ago, we discovered a way to scrape Spotify at scale.
Seems like reason enough to choose to scrape Spotify to me.
Spotify has lossless now. Although if you're listening on anything with Bluetooth then you probably won't notice anyway.
Spotify claims to offer lossless quality on much of their catalog; is this claim false or is there something more I'm missing here?
That's for premium accounts, which they probably aren't scraping with. And I think it's still not FLAC quality
Would be interesting if someone checked what % of that archive is slopified.
Honestly, this is the best time to snapshot it, because even with the slop already there, the exponential increase that's about to happen will absolutely dwarf what's there now.
Ok, how do we download this?
Step 1: Buy £6,000 worth of identical hard drives and a motherboard with 16 SATA ports. Or £12,000 worth and a RAID 1 server rig. Or £24,000 and RAID 6
6k for 10TB?
A Raspberry pi with a sata hat and 16TB hard disk could download and serve this for well under 500 quid.
An off the shelf 4 bay NAS with 4x 6TB drives in raid 5 would give you 15TB or so formatted capacity with redundancy. That would easily be under 2k... Jeez grab 2 and sync contents to a second location.. even a third location.. Still under 6k.
160kbps ogg files, is it really worth backing up?
Why wouldn't it be?
Archives should be lossless unless there's literally no other source available.
Archiving low quality sources like this just degrades the overall integrity of the whole
Normally you're mostly right, but in this case I have to agree that lossy existence is better than lossless absence. 300TB puts it at the upper limit of pro-sumer capacity, but it's still doable from a personal archive perspective. If you went FLAC lossless, though, you're looking at 3-6PB. That quantity is almost completely unattainable by hobbyists, and presents challenges even for enterprise entities. This archive is the "photo of the original document" for the collection. It's not optimal, and there's a lot of room for improvement, but the alternative is to just not do it at all
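A rough way to play with that size estimate, assuming archive size scales linearly with average bitrate (a simplification). The 300TB and 160 kbps figures are from the thread; 1411 kbps is uncompressed CD audio, which FLAC usually undercuts, so the result depends heavily on which lossless bitrate you assume. The function names are made up for illustration.

```python
def scaled_archive_tb(base_tb, base_kbps, target_kbps):
    """Estimate archive size at a different average bitrate (linear scaling)."""
    return base_tb * target_kbps / base_kbps


def implied_bitrate_kbps(base_tb, base_kbps, target_tb):
    """Average bitrate that a given target archive size would correspond to."""
    return base_kbps * target_tb / base_tb


if __name__ == "__main__":
    # 300 TB at 160 kbps, rescaled to uncompressed CD audio (1411 kbps).
    print(scaled_archive_tb(300, 160, 1411))

    # Average bitrates that 3 PB and 6 PB totals would correspond to.
    print(implied_bitrate_kbps(300, 160, 3000))
    print(implied_bitrate_kbps(300, 160, 6000))
```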
They actually have a section where you can generate a magnet link based on how many terabytes of data you have available.
I don't think we need people to be able to download the entire thing. There should be a viable way to partially download the torrent and archive based on your storage availability.
I'd argue that no one is gonna be archiving 300tb either though and will likely be picking and choosing which files to download from the torrents.
What I don't know is if this is how Spotify is storing this music on their end or if they have some other lossless source they pull from. I know Deezer has flacs available for most stuff and mp3 320 (I think) for what isn't lossless.
Not many hobbyists are going to dedicate that much space to bad quality audio, or even have that much space to begin with.
Eh - maybe - there are definitely hoarders with the ability to absorb 300TB. They're not common, but they do exist. There are probably close to zero hoarders that could spare 3PB, especially for a collection that they won't listen to a majority of. It's like saying that it isn't worth digitizing wax tube recordings because the source is so low quality. If preservation is the goal, anything is better than nothing.
Spotify has a lossless quality option in their apps.
The link says "The quality is the original OGG Vorbis at 160kbit/s", so I guess that's what Spotify uses for the "high" desktop/mobile setting described at https://support.spotify.com/us/article/audio-quality/
How many millions would they need to spend on downloading the lossless equivalent of 300 TB of ogg files lol.
160kbps ogg is not exactly low quality. Most people can't tell the difference between 160kbps ogg and lossless, nor do they have the equipment to hear it. And with a huge amount of data like this, it might be impossible, too expensive, or too time-consuming for them to archive in lossless quality.
I agree, archiving audio files should be lossless when possible, but that is not a requirement. 160kbps ogg is "good enough".
I consider anything under 256kbps to be not worth getting unless it's the only ever rip of something that doesn't exist anymore. If it's lossy it should be 320kbps mp3 ideally.
I also try to stay away from VBR rips
You just say it should not, but why? As said, 160kbps ogg is for most people not distinguishable from uncompressed. I think it is worth archiving this, especially at a scale like this. Why do you stay away from VBR?
Archival should be as close to source quality as possible. VBR just adds more noise to the audio whether you can hear it or not. That means that as it gets copied to different mediums, the quality reduction will eventually become noticeable.
What they are doing isn't legal, is it? Don't they basically have the whole world's police forces after them already? Where are they even hosting?
They just need to say they are using the archive for AI training data. Then it's legal.
This is the view from Belgium indeed.
Edit: alternative link available: https://nl.annas-archive.org/blog/backing-up-spotify.html

Deleted by moderator
VPN providers usually log your IP too, and they have to hand over your IP address if the government requests it. This might not be true for all countries and all VPNs, but be mindful about this. I wouldn't do anything illegal thinking it's safe with a VPN.
It is illegal to distribute these files. And they accept money, so it makes it even worse:
Donate to Anna’s Archive. Any amount helps!
Isn’t that just stealing royalties from the musicians?
ha! no, what royalties? play the song one million times and they get a dollar? spotify can get bent. long live bandcamp.
So let me get this straight: Anna’s Archive taking 100% from artists = good, Bandcamp taking ~20% = good, but Spotify taking ~30% = bad? That suggests the issue isn’t artist pay, it’s just which platform you’ve decided to hate.
And Anna’s Archive’s framing around ‘free access to culture’ seems to mean free for scraping and ideological cover, but for-profit when it’s packaged and sold to AI companies. That’s not anarchy - it’s anarcho-capitalism.
i’m not so sure about your numbers there, friend.
also bands don’t make money on streaming or selling records in stores anymore. they make money selling tickets and product at the shows.
source: i have run a live music venue for 35 years. watching the changes in the business model has been wild.
Not sure what the problem with the numbers is. Piracy = 0% to artists, Bandcamp = around 80% after fees and Visa, Spotify = pays rights holders around 70% of gross revenue, and artists often see 20% or less from labels. Blaming Spotify misses the real problem: labels control the payouts, not the platform.
I get that you promote your business as a saviour, but how do people find the artists they want to go see without streaming or distribution platforms?
That is truly beautiful
That's huge!
Good morning