Epstein Files Jan 30, 2026 Release - Archived from Justice.gov

submitted by edited

Epstein Files Jan 30, 2026

Data hoarders on reddit have been hard at work archiving the latest Epstein Files release from the U.S. Department of Justice. Below is a compilation of their work with download links.

Please seed all torrent files to distribute and preserve this data.

Ref: https://old.reddit.com/r/DataHoarder/comments/1qrk3qk/epstein_files_datasets_9_10_11_300_gb_lets_keep/

Epstein Files Data Sets 1-8: INTERNET ARCHIVE LINK

Epstein Files Data Set 1 (2.47 GB): TORRENT MAGNET LINK
Epstein Files Data Set 2 (631.6 MB): TORRENT MAGNET LINK
Epstein Files Data Set 3 (599.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 4 (358.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 5 (61.5 MB): TORRENT MAGNET LINK
Epstein Files Data Set 6 (53.0 MB): TORRENT MAGNET LINK
Epstein Files Data Set 7 (98.2 MB): TORRENT MAGNET LINK
Epstein Files Data Set 8 (10.67 GB): TORRENT MAGNET LINK


Epstein Files Data Set 9 (Incomplete). Only contains 49 GB of 180 GB. Multiple reports of being cut off by the DOJ server at byte offset 48995762176.

ORIGINAL JUSTICE DEPARTMENT LINK

  • TORRENT MAGNET LINK (removed due to reports of CSAM)

/u/susadmin's More Complete Data Set 9 (96.25 GB)
De-duplicated merger of (45.63 GB + 86.74 GB) versions
  • TORRENT MAGNET LINK (removed due to reports of CSAM)


Epstein Files Data Set 10 (78.64GB)

ORIGINAL JUSTICE DEPARTMENT LINK

  • TORRENT MAGNET LINK (removed due to reports of CSAM)
  • INTERNET ARCHIVE FOLDER (removed due to reports of CSAM)
  • INTERNET ARCHIVE DIRECT LINK (removed due to reports of CSAM)

Epstein Files Data Set 11 (25.55GB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 574950c0f86765e897268834ac6ef38b370cad2a


Epstein Files Data Set 12 (114.1 MB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 20f804ab55687c957fd249cd0d417d5fe7438281
MD5: b1206186332bb1af021e86d68468f9fe
SHA256: b5314b7efca98e25d8b35e4b7fac3ebb3ca2e6cfd0937aa2300ca8b71543bbe2


This list will be edited as more data becomes available, particularly with regard to Data Set 9 (EDIT: NOT ANYMORE)


EDIT [2026-02-02]: After being made aware of potential CSAM in the original Data Set 9 releases and seeing confirmation in the New York Times, I will no longer support any effort to maintain links to archives of it. There is suspicion of CSAM in Data Set 10 as well. I am removing links to both archives.

Some in this thread may be upset by this action. It is right to be distrustful of a government that has not shown signs of integrity. However, I do trust journalists who hold the government accountable.

I am abandoning this project and removing any links to content that commenters here and on reddit have suggested may contain CSAM.

Ref 1: https://www.nytimes.com/2026/02/01/us/nude-photos-epstein-files.html
Ref 2: https://www.404media.co/doj-released-unredacted-nude-images-in-epstein-files


245 Comments

I'm in the process of downloading both dataset 9 torrents (45.63 GB + 86.74 GB). I will then compare the filenames in both versions (the 45.63GB version has 201,358 files alone), note any duplicates, and merge all unique files into one folder. I'll upload that as a torrent once it's done so we can get closer to a complete dataset 9 as one file.

  • Edit 31Jan2026 816pm EST - Making progress. I finished downloading both dataset 9s (45.6 GB and the 86.74 GB). The 45.6GB set is 200,000 files and the 86GB set is 500,000 files. I have a .csv of the filenames and sizes of all files in the 45.6GB version. I'm creating the same .csv for the 86GB version now.

  • Edit 31Jan2026 845pm EST -

    • dataset 9 (45.63 GB) = 201357 files
    • dataset 9 (86.74 GB) = 531257 files

    I did an exact filename combined with an exact file size comparison between the two dataset9 versions. I also did an exact filename combined with a fuzzy file size comparison (tolerance of +/- 1KB) between the two dataset9 versions. There were:

    • 201330 exact matches
    • 201330 fuzzy matches (+/- 1KB)

    Meaning there are 201330 duplicate files between the two dataset9 versions.

    These matches were written to a duplicates file. Then, from each dataset9 version, every file whose name and size match an entry in the duplicates file will be moved to a subfolder. Then I'll merge both parent folders into one enormous folder containing all unique files plus a folder of duplicates. Finally, compress it, make a torrent, and upload it.
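A minimal sketch of that name-plus-size pass (the folder names ds9_45g and ds9_86g are placeholders for wherever the two extracted versions live):

import csv
import os

def index_by_name_and_size(root):
    # Map (filename, exact size in bytes) -> full path for every file under root.
    idx = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            idx[(name, os.path.getsize(path))] = path
    return idx

a = index_by_name_and_size("ds9_45g")   # 45.63 GB version (assumed location)
b = index_by_name_and_size("ds9_86g")   # 86.74 GB version (assumed location)

dupes = sorted(set(a) & set(b))         # same name AND same exact size
with open("duplicates.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "size_bytes", "path_45g", "path_86g"])
    writer.writerows([name, size, a[(name, size)], b[(name, size)]] for name, size in dupes)

print(f"{len(dupes)} name+size duplicates between the two versions")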


  • Edit 31Jan2026 945pm EST -

    Still moving duplicates into subfolders.


  • Edit 31Jan2026 1027pm EST -

    Going off of xodoh74984's comment (https://lemmy.world/post/42440468/21884588), I'm increasing the rigor of my check on whether the files that share a filename and size between both versions of dataset9 are in fact duplicates. Like rsync --checksum, this calculates an MD5 hash of each file to verify bit-for-bit that the contents are the same. This will take a while but is the best way.
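For reference, a sketch of that stricter content check: hash each candidate pair and only count it as a duplicate when the MD5 digests match. It reuses the duplicates.csv layout from the sketch above, which is an assumption, not the poster's actual script.

import csv
import hashlib

def md5_of(path, chunk_size=1 << 20):
    # Stream the file so large PDFs/videos never have to fit in memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

same, different = 0, 0
with open("duplicates.csv") as f:
    for row in csv.DictReader(f):
        if md5_of(row["path_45g"]) == md5_of(row["path_86g"]):
            same += 1
        else:
            different += 1    # same name and size, different bytes
print(f"confirmed duplicates: {same}, name/size collisions: {different}")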


  • Edit 01Feb2026 1227am EST -

    Checksum comparison complete. 73 files found that have the same file name and size but different content. Total number of duplicate files = 201257. Merging both dataset versions now, while keeping one subfolder of the duplicates, so nothing is deleted.


  • Edit 01Feb2026 1258am EST -

    Creating the .tar.zst file now. 531285 total files, which includes all unique files between dataset9 (45.6GB) and dataset9 (86.7GB), as well as a subfolder containing the files that were found in both dataset9 versions.


  • Edit 01Feb2026 215am EST -

    I was using way too high a compression level for no reason (zstd --ultra -22). Restarted the .tar.zst file creation (with zstd -12) and it's going 100x faster now. Should be finished within the hour.


  • Edit 01Feb2026 311am EST -

    .tar.zst file creation is taking a very long time. I'm going to let it run overnight - will check back in a few hours. I'm tired boss.


  • EDIT 01Feb2026 831am EST -

COMPLETE!

And then I doxxed myself in the torrent. One moment please while I fix that....


Final magnet link is HERE. GO GO GOOOOOO

I'm seeding @ 55 MB/s. I'm also trying to get into the new r/EpsteinPublicDatasets subreddit to share the torrent there.

Deleted by author

Deleted by moderator

looking forward to your torrent, will seed.

I have several incomplete sets of files from dataset 9 that I downloaded with a scraped set of urls - should I try to get them to you to compare as well?

Yes! I'm not sure the best way to do that - upload them to MEGA and message me a download link?

maybe archive.org? that way they can be torrented if others want to attempt their own merging techniques? either way it will be a long upload, my speed is not especially good. I'm still churning through one set of urls that is 1.2M lines, most are failing but I have 65k from that batch so far.

archive.org is a great idea. Post the link here when you can!

I'll get the first set (42k files in 31G) uploading as soon as I get it zipped up. it's the one least likely to have any new files in it since I started at the beginning like others but it's worth a shot

edit 01FEB2026 1208AM EST - 6.4/30gb uploaded to archive.org

edit 01FEB2026 0430AM EST - 13/30gb uploaded to archive.org; scrape using a different url set going backwards is currently at 75.4k files

edit 01FEB2026 1233PM EST - had an internet outage overnight and lost all progress on the archive.org upload, currently back to 11/30gb. the scrape using a previous url set seems to be getting very few new files now, sitting at 77.9k at the moment

I'm downloading 8-11 now and seeding 1-7 and 12. I've tried checking up on reddit, but every other time I check in, the post is nuked or something. My home server never goes down and I'm outside the USA. I'm working on the 100GB+ #9 right now and I'll seed whatever you can get up here too.

Thank you so much for keeping us updated!!

Have a good night. I'll be waiting to download it, seed it, make hardcopies and redistribute it.

Please check back in with us

When merging versions of Data Set 9, is there any risk of loss with simply using rsync --checksum to dump all files into one directory?

rsync --checksum is better than my file name + file size comparison, since it calculates the hash of each file and compares it to the hashes of all other files. For example, if there is a file called data1.pdf with size 1024 bytes in dataset9-v1, and another file called data1.pdf with size 1024 bytes in dataset9-v2, but their content is different, my method will still flag them as identical files.

I'm going to modify my script to calculate and compare the hashes of all files that I previously determined to be duplicates. If the hashes of the duplicates in dataset9 (45GB torrent) match the hashes of the duplicates in dataset9 (86GB torrent), then they are in fact duplicates between the two datasets.

Amazing, thank you. That was my thought, check hashes while merging the files to keep any copies that might have been modified by DOJ and discard duplicates even if the duplicates have different metadata, e.g. timestamps.

Thank you so much for re-archiving it in a better format

here is the file contents w/ SHA-256 hashes: deleted this

the original post on reddit was deleted after sharing this https://old.reddit.com/r/DataHoarder/comments/1qsfv3j/epstein_9_10_11_12_reddit_keeps_nuking_thread_we/o2vqgoc/

anyone have the original 186gb magnet link from that thread? someone said reddit keeps nuking it because it implicates reddit admins like spez

This is it, encoded in base 64 format, according to the comment:

bWFnbmV0Oj94dD11cm46YnRpaDo3YWM4Zjc3MTY3OGQxOWM3NWEyNmVhNmMxNGU3ZDRjMDAzZmJmOWI2JmRuPWRhdGFzZXQ5LW1vcmUtY29tcGxldGUudGFyLnpzdCZ4bD05NjE0ODcyNDgzNyZ0cj11ZHAlM0ElMkYlMkZ0cmFja2VyLm9wZW50cmFja3Iub3JnJTNBMTMzNyUyRmFubm91bmNlJnRyPXVkcCUzQSUyRiUyRm9wZW4uZGVtb25paS5jb20lM0ExMzM3JTJGYW5ub3VuY2UmdHI9dWRwJTNBJTJGJTJGZXhvZHVzLmRlc3luYy5jb20lM0E2OTY5JTJGYW5ub3VuY2UmdHI9aHR0cCUzQSUyRiUyRm9wZW4udHJhY2tlci5jbCUzQTEzMzclMkZhbm5vdW5jZSZ0cj11ZHAlM0ElMkYlMkZvcGVuLnN0ZWFsdGguc2klM0E4MCUyRmFubm91bmNlJnRyPXVkcCUzQSUyRiUyRnplcjBkYXkuY2glM0ExMzM3JTJGYW5ub3VuY2UmdHI9dWRwJTNBJTJGJTJGd2Vwem9uZS5uZXQlM0E2OTY5JTJGYW5ub3VuY2UmdHI9dWRwJTNBJTJGJTJGdHJhY2tlcjEubXlwb3JuLmNsdWIlM0E5MzM3JTJGYW5ub3VuY2UmdHI9dWRwJTNBJTJGJTJGdHJhY2tlci50b3JyZW50LmV1Lm9yZyUzQTQ1MSUyRmFubm91bmNlJnRyPXVkcCUzQSUyRiUyRnRyYWNrZXIudGhlb2tzLm5ldCUzQTY5NjklMkZhbm5vdW5jZSZ0cj11ZHAlM0ElMkYlMkZ0cmFja2VyLnNydjAwLmNvbSUzQTY5NjklMkZhbm5vdW5jZSZ0cj11ZHAlM0ElMkYlMkZ0cmFja2VyLnF1LmF4JTNBNjk2OSUyRmFubm91bmNlJnRyPXVkcCUzQSUyRiUyRnRyYWNrZXIuZGxlci5vcmclM0E2OTY5JTJGYW5ub3VuY2UmdHI9dWRwJTNBJTJGJTJGdHJhY2tlci5iaXR0b3IucHclM0ExMzM3JTJGYW5ub3VuY2UmdHI9dWRwJTNBJTJGJTJGdHJhY2tlci5hbGFza2FudGYuY29tJTNBNjk2OSUyRmFubm91bmNlJnRyPXVkcCUzQSUyRiUyRnRyYWNrZXItdWRwLmdiaXR0LmluZm8lM0E4MCUyRmFubm91bmNlJnRyPXVkcCUzQSUyRiUyRnJ1bi5wdWJsaWN0cmFja2VyLnh5eiUzQTY5NjklMkZhbm5vdW5jZSZ0cj11ZHAlM0ElMkYlMkZvcGVudHJhY2tlci5pbyUzQTY5NjklMkZhbm5vdW5jZSZ0cj11ZHAlM0ElMkYlMkZvcGVuLmRzdHVkLmlvJTNBNjk2OSUyRmFubm91bmNlJnRyPWh0dHBzJTNBJTJGJTJGdHJhY2tlci56aHVxaXkuY29tJTNBNDQzJTJGYW5ub3VuY2UmdHI9dWRwJTNBJTJGJTJGdHJhY2tlci5maWxlbWFpbC5jb20lM0E2OTY5JTJGYW5ub3VuY2UmdHI9dWRwJTNBJTJGJTJGdC5vdmVyZmxvdy5iaXolM0E2OTY5JTJGYW5ub3VuY2UmdHI9dWRwJTNBJTJGJTJGbWFydGluLWdlYmhhcmR0LmV1JTNBMjUlMkZhbm5vdW5jZSZ0cj11ZHAlM0ElMkYlMkZldmFuLmltJTNBNjk2OSUyRmFubm91bmNlJnRyPXVkcCUzQSUyRiUyRmQ0MDk2OS5hY29kLnJlZ3J1Y29sby5ydSUzQTY5NjklMkZhbm5vdW5jZSZ0cj11ZHAlM0ElMkYlMkY2YWhkZHV0YjF1Y2MzY3AucnUlM0E2OTY5JTJGYW5ub3VuY2U
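To recover the plain-text magnet URI from that blob, standard base64 decoding is all that's needed (a minimal sketch; the input filename is a placeholder):

import base64

with open("magnet_base64.txt") as f:    # paste the blob above into this file
    blob = f.read().strip()

blob += "=" * (-len(blob) % 4)          # restore any stripped '=' padding
print(base64.b64decode(blob).decode("utf-8"))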

Be prepared to wait a while... idk why this person chose xz, it is so slow. I've been just trying to get the tarball out for an hour.

Thank you for the final link, downloading now. Will seed forever if needed.

Deleted by author

this method is not working for me anymore

Deleted by author

I messaged you on the other site; I'm currently getting a Could not determine Content-Length (got None) error

Deleted by author

I was also getting the same error. Going to the link directly downloads successfully.

Updating the cookies fixed the issue.

Can also confirm, receiving more chunks again.

EDIT: Someone should play around with the retry and backoff settings to see if a certain configuration can avoid being blocked for a longer period of time. IP rotating is too much trouble.

Deleted by author

Nor I. I got a single chunk back before never getting anything again.

I’m using a partial download I already had and not the 48gb version but I will be gathering as many chunks as I can as well. Thanks for making this

Deleted by moderator

Is anyone able to get this working again? It seemed to stop. I have updated cookies. If I remove the chunks it seems to start connecting again but when I put them back it runs for a few mins and then kicks the bucket.

Funny how a rag-tag ad-hoc group can seed data so much better than the DOJ. Beautiful to see in action.

The DOJ could do better; they are ordered not to.

Heads up that the DOJ site is a tar pit: it returns 50 files per page regardless of the page number you're on, and somewhere between 2k and 5k pages it just wraps around right now.

Testing page 2000... ✓ 50 new files (out of 50)
Testing page 5000... ○ 0 new files - all duplicates
Testing page 10000... ○ 0 new files - all duplicates
Testing page 20000... ○ 0 new files - all duplicates
Testing page 50000... ○ 0 new files - all duplicates
Testing page 100000... ○ 0 new files - all duplicates
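A rough sketch of that kind of probe, assuming the listing page links straight to the per-file PDFs (requests and BeautifulSoup are assumed to be installed; the link filter is a guess at the page structure, not verified against the live site):

import requests
from bs4 import BeautifulSoup

BASE = "https://www.justice.gov/epstein/doj-disclosures/data-set-9-files"
seen = set()

def pdf_links(page):
    # Return the set of .pdf hrefs listed on a given page number.
    resp = requests.get(BASE, params={"page": page}, timeout=60)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    return {a["href"] for a in soup.find_all("a", href=True)
            if a["href"].lower().endswith(".pdf")}

for page in (2000, 5000, 10000, 20000, 50000, 100000):
    links = pdf_links(page)
    new = links - seen
    seen |= links
    print(f"Testing page {page}... {len(new)} new files (out of {len(links)})")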

I saw this too; yesterday I tried manually accessing the page to explore just how many there are. Seems like some of the pages are duplicates (I was simply comparing the last listed file name and content between some of the first 10 pages, and even had 1-2 duplications.)

As far as the maximum page number goes, if you use the query parameter ?page=200000000 it will still resolve a list of files. Actually crazy.

https://www.justice.gov/epstein/doj-disclosures/data-set-9-files?page=200000000

The last page I got a non-duplicate URL from was 10853, which curiously had only 36 URLs on the page. When I browsed directly to page 10853, 36 URLs were displayed, but after moving back and forth through the page numbers the tar-pit logic must have re-looped and it went back to displaying 50. I ended with 224,751 URLs.

I was quick to download dataset 12 after it was discovered to exist, and apparently my dataset 12 contains some files that were later removed. Uploaded to IA in case it contains anything that later archivists missed. https://archive.org/details/data-set-12_202602

Specifically doc number 2731361 and others around it were at some point later removed from DoJ, but are still within this early-download DS12. Maybe more, unsure

The files in this (early) dataset 12 are identical to the dataset 12 here, which is the link in the OP. The MD5 hashes are identical.

I shared a .csv file of the calculated MD5 hashes here

I've got that one too, maybe we should compare dataset 12 versions too

Deleted by author

Deleted by author

What version of dataset 9 is this?

Deleted by author

Ah I see now! Sorry, I'm new to this platform and I need to get used to the structure of it.

Thanks

Deleted by author

They're probably too dumb to understand ">" means "greater than", or in your sentence: people are worth more than property / people over property.

They probably read it like "People are property" which would obviously be "=" or "->" instead of ">".

reposting a full magnet list (everything besides 9) of all the datasets that were on reddit with healthy seeds:

Dataset 1 (2.47GB)

magnet:?xt=urn:btih:4e2fd3707919bebc3177e85498d67cb7474bfd96&dn=DataSet+1&xl=2658494752&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.moeking.me%3A6969%2Fannounce

Dataset 2 (631.6MB)

magnet:?xt=urn:btih:d3ec6b3ea50ddbcf8b6f404f419adc584964418a&dn=DataSet+2&xl=662334369&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.moeking.me%3A6969%2Fannounce

Dataset 3 (599.4MB)

magnet:?xt=urn:btih:27704fe736090510aa9f314f5854691d905d1ff3&dn=DataSet+3&xl=628519331&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.moeking.me%3A6969%2Fannounce

Dataset 4 (358.4MB)

magnet:?xt=urn:btih:4be48044be0e10f719d0de341b7a47ea3e8c3c1a&dn=DataSet+4&xl=375905556&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.moeking.me%3A6969%2Fannounce

Dataset 5 (61.5MB)

magnet:?xt=urn:btih:1deb0669aca054c313493d5f3bf48eed89907470&dn=DataSet+5&xl=64579973&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.moeking.me%3A6969%2Fannounce

Dataset 6 (53.0MB)

magnet:?xt=urn:btih:05e7b8aefd91cefcbe28a8788d3ad4a0db47d5e2&dn=DataSet+6&xl=55600717&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.moeking.me%3A6969%2Fannounce

Dataset 7 (98.2MB)

magnet:?xt=urn:btih:bcd8ec2e697b446661921a729b8c92b689df0360&dn=DataSet+7&xl=103060624&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.moeking.me%3A6969%2Fannounce

Dataset 8 (10.67GB)

magnet:?xt=urn:btih:c3a522d6810ee717a2c7e2ef705163e297d34b72&dn=DataSet%208&xl=11465535175&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.moeking.me%3A6969%2Fannounce

Dataset 10 (78.64GB)

magnet:?xt=urn:btih:d509cc4ca1a415a9ba3b6cb920f67c44aed7fe1f&dn=DataSet%2010.zip&xl=84439381640

Dataset 11 (25.55GB)

magnet:?xt=urn:btih:59975667f8bdd5baf9945b0e2db8a57d52d32957&xt=urn:btmh:12200ab9e7614c13695fe17c71baedec717b6294a34dfa243a614602b87ec06453ad&dn=DataSet%2011.zip&xl=27441913130&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fopen.demonii.com%3A1337%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Fexodus.desync.com%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=http%3A%2F%2Fopen.tracker.cl%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.srv00.com%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.filemail.com%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.dler.org%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker-udp.gbitt.info%3A80%2Fannounce&tr=udp%3A%2F%2Frun.publictracker.xyz%3A6969%2Fannounce&tr=udp%3A%2F%2Fopen.dstud.io%3A6969%2Fannounce&tr=udp%3A%2F%2Fleet-tracker.moe%3A1337%2Fannounce&tr=https%3A%2F%2Ftracker.zhuqiy.com%3A443%2Fannounce&tr=https%3A%2F%2Ftracker.pmman.tech%3A443%2Fannounce&tr=https%3A%2F%2Ftracker.moeblog.cn%3A443%2Fannounce&tr=https%3A%2F%2Ftracker.alaskantf.com%3A443%2Fannounce&tr=https%3A%2F%2Fshahidrazi.online%3A443%2Fannounce&tr=http%3A%2F%2Fwww.torrentsnipe.info%3A2701%2Fannounce&tr=http%3A%2F%2Fwww.genesis-sp.org%3A2710%2Fannounce

Dataset 12 (114.0MB)

magnet:?xt=urn:btih:EE6D2CE5B222B028173E4DEDC6F74F08AFBBB7A3&dn=DataSet%2012.zip&tr=udp%3a%2f%2ftracker.openbittorrent.com%3a80%2fannounce&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce

Thank you for this!

I've added all magnet links for sets 1-8 to the original post. Magnet links for 9-11 match OP. Magnet link for 12 is different, but we've identified that there are at least two versions. DOJ removed files before the second version was downloaded. OP contains the early version of data set 12.

Does no one have a CSAM-removed torrent for data sets 9 or 10 yet? ... and why is no one seeding data set 11?

Deleted by author

Thx for posting, seed if you can ppl.

Does anyone have an index of filenames/links from the DOJ website scraped?

Edit, specifically for DataSet 9.

I'm waiting for /u/Kindly_District9380 's version but I've been slowly working backwards on this in the meantime https://archive.org/details/dataset9_url_list

Deleted by author

No worries, thank you!

edit: I'll start on that url list (randomized) tomorrow, my run from the previously generated url list is still going (currently 75.6k files)

Deleted by moderator

yeah I'm not the one who generated the url list but I've also been getting a lot without a downloadable document. I'm going to start on one of the url lists posted here soon

Deleted by moderator

As far as CSAM and the “don’t go looking for data set 9”…

Look I’ll be straight up.

If I find any CSAM it gets deleted…

But if you believe for one second that the DOJ didn't remove relevant files because they are protecting people, then I have a timeshare to sell you at a cheap price on a beautiful scenic swamp in Florida…

It's literally left in on purpose to try to have something over people who download and/or seed the torrents. We need a file list to know what not to dl/seed, or a new torrent for that set.

I am seeding sets 1-8, 10-12, and the larger set 9. Seedbox is outside the US and has a very fast connection.

I will keep an eye on this post for other sets. 👍

Is there any grunt work that needs to be done? I would like to help out but I'm not sure how to make sure my work isn't redundant. I mean like looking through individual files etc. Is there an organized effort to comb through everything?

Deleted by moderator

I don't have a matrix account currently, but would be willing to get one.

Deleted by moderator

Do you have a recommendation on provider choice?

Deleted by moderator

Ok everyone, I have done a complete indexing of the first 13,000 pages of the DOJ Data Set 9.

KEY FINDING: 3 files are listed but INACCESSIBLE

These appear in DOJ pagination but return error pages - potential evidence of removal:

EFTA00326497

EFTA00326501

EFTA00534391

You can try them yourself (they all fail):

https://www.justice.gov/epstein/files/DataSet%209/EFTA00326497.pdf

The 86GB torrent is 7x more complete than the DOJ website

DOJ website exposes: 77,766 files

Torrent contains: 531,256 files

Page Range    Min EFTA       Max EFTA       New Files

0-499         EFTA00039025   EFTA00267311   21,842
500-999       EFTA00267314   EFTA00337032   18,983
1000-1499     EFTA00067524   EFTA00380774   14,396
1500-1999     EFTA00092963   EFTA00413050   2,709
2000-2499     EFTA00083599   EFTA00426736   4,432
2500-2999     EFTA00218527   EFTA00423620   4,515
3000-3499     EFTA00203975   EFTA00539216   2,692
3500-3999     EFTA00137295   EFTA00313715   329
4000-4499     EFTA00078217   EFTA00338754   706
4500-4999     EFTA00338134   EFTA00384534   2,825
5000-5499     EFTA00377742   EFTA00415182   1,353
5500-5999     EFTA00416356   EFTA00432673   1,214
6000-6499     EFTA00213187   EFTA00270156   501
6500-6999     EFTA00068280   EFTA00281003   554
7000-7499     EFTA00154989   EFTA00425720   106
7500-7999     (no new files - all wraps/redundant)
8000-8499     (no new files - all wraps/redundant)
8500-8999     EFTA00168409   EFTA00169291   10
9000-9499     EFTA00154873   EFTA00154974   35
9500-9999     EFTA00139661   EFTA00377759   324
10000-10499   EFTA00140897   EFTA01262781   240
10500-12999   (no new files - all wraps/redundant)

TOTAL UNIQUE FILES: 77,766

Pagination limit discovered: page 184,467,440,737,095,516 (2^64/100)

I searched random pages between 13k and this limit - NO new documents found. The pagination is an infinite loop. All work at: https://github.com/degenai/Dataset9

DOJ Epstein Files: I found what's around those 3 missing files (Part 2)

Follow-up to my Dataset 9 indexing post. I pulled the adjacent files from my local copy of the torrent. What I found is... notable.


TLDR

The 3 missing files aren't random corruption. They all cluster around one event: Epstein's girlfriend Karyna Shuliak leaving St. Thomas (the island) in April 2016. And one of the gaps sits directly next to an email where Epstein recommends a novel to her about a sympathetic pedophile, two days before the book was publicly released.


The Big Finding: Duplicate Processing Batches

Two of the missing files (326497 and 534391) are the same document processed twice (once with redactions, once without), 208,000 files apart in the index.

Redacted Batch     Unredacted Batch    Content
326494-326496      534388-534390       AmEx travel booking, staff emails
326497 - MISSING   534391 - MISSING    ???
326498-326500                          Email chain continues
326501 - MISSING                       ???
326502-326506                          Reply + Invoice
                   534392              Epstein personal email

Random file corruption hitting the same logical document in two separate processing runs, 208,000 positions apart? That's not how corruption works. That's how removal works.


What's Actually In These Files

I pulled everything around the gaps. It's all one email chain from April 10, 2016:

The event: Karyna Shuliak (Epstein's girlfriend) booked on Delta flight from Charlotte Amalie, St. Thomas → JFK on April 13, 2016.

St. Thomas is where you fly in/out to reach Little St. James. She was leaving the island.

The chain:
- 11:31 AM — AmEx Centurion (black card) sends confirmation to lesley.jee@gmail.com
- 11:33 AM — Lesley Groff (Epstein's executive assistant) forwards to Shuliak, CC's staff
- 11:35 AM — Shuliak replies "Thanks so much"
- 3:52 PM — Epstein personally emails Shuliak
- Next day — AmEx sends invoice

The unredacted batch (534xxx) reveals the email addresses that are blacked out in the redacted batch (326xxx):
- Lesley Groff: lesley.jee@gmail.com
- Ann Rodriquez: annrodriquez@yahoo.com
- Bella Klein: bklein575@gmail.com
- Karyna Shuliak: karynashuliak@icloud.com


The Epstein Email (EFTA00534392)

The document immediately after missing file 534391:

From: "jeffrey E." <jeevacation@gmail.com>
To: Karyna Shuliak
Date: Sun, 10 Apr 2016 19:52:13 +0000

order http://softskull.com/dd-product/undone/

He's telling her to buy a book. The same day she's being booked to leave his island.


The Book

"Undone" by John Colapinto (Soft Skull Press)

On-sale date: April 12, 2016
Epstein's email: April 10, 2016

He recommended it two days before public release.

Publisher's description:

"Dez is a former lawyer and teacher—*an ephebophile with a proclivity for teenage girls*, hiding out in a trailer park with his latest conquest, Chloe. Having been in and out of courtrooms (and therapists' offices) for a number of years, *Dez is at odds with a society that persecutes him over his desires.*"

The protagonist is a pedophile who resents society for judging him.

The author (John Colapinto) is a New Yorker staff writer, former Vanity Fair and Rolling Stone contributor. Exactly the media circles Epstein cultivated.


What's Missing

So now we know the context:

  • EFTA00326497 — Between AmEx confirmation and Groff's forward. Probably the PDF ticket attachment referenced in the emails.

  • EFTA00326501 — Between the forward chain and Shuliak's reply. Unknown.

  • EFTA00534391 — Immediately before Epstein's personal email about the pedo book. Unknown, but its position is notable.


Open Questions

  1. How did Epstein have this book before release? Advance copy? Knows the author?

  2. What is 534391? It sits between staff logistics emails and Epstein's direct correspondence. Another Epstein email? An attachment?

  3. Are there other Shuliak travel records with similar gaps? Is April 2016 unique or part of a pattern?

  4. What else is in the corpus from jeevacation@gmail.com?


Verify It Yourself

Try the DOJ links (all return errors):
- https://www.justice.gov/epstein/files/DataSet%209/EFTA00326497.pdf
- https://www.justice.gov/epstein/files/DataSet%209/EFTA00326501.pdf
- https://www.justice.gov/epstein/files/DataSet%209/EFTA00534391.pdf

Check the torrent: Pull the EFTA numbers I listed. Confirm the gaps. Confirm the adjacencies.

Grep the corpus: Search for "QWURMO" (booking reference), "Shuliak", "jeevacation", "Colapinto"
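One way to run that grep over a folder of PDFs, assuming poppler's pdftotext is installed (the directory name is a placeholder; the search terms are the ones listed above):

import os
import subprocess

TERMS = ["QWURMO", "Shuliak", "jeevacation", "Colapinto"]
ROOT = "dataset9"    # assumed path to the extracted torrent

for dirpath, _dirs, files in os.walk(ROOT):
    for name in files:
        if not name.lower().endswith(".pdf"):
            continue
        path = os.path.join(dirpath, name)
        # "pdftotext <file> -" writes the extracted text to stdout
        out = subprocess.run(["pdftotext", path, "-"], capture_output=True, text=True)
        text = out.stdout.lower()
        for term in TERMS:
            if term.lower() in text:
                print(f"{term}\t{path}")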


Summary

Three files missing from 531,256. All three cluster around one girlfriend's April 2016 departure from St. Thomas. Same gaps appear in two processing batches 208,000 files apart. One gap sits adjacent to Epstein personally recommending a novel about a sympathetic pedophile, sent before the book was even publicly available.

This isn't random corruption.

Full analysis + all code: https://github.com/degenai/Dataset9


If anyone has the torrent and wants to grep for Colapinto connections or other Shuliak trips, please do. This is open source for a reason.

Deleted by moderator

That is new information! I wasn't even able to get that 'no images produced' page, so good to know, thank you. I just hit a file corruption error when I tried to download from the DOJ. I guess this means the content is still missing in a way, but at least it's accounted for.

Deleted by moderator

Just like I said… In NO way do I trust DOJ… Our only hope is if someone drops the full data set 9 somewhere.

My question is, why is the total download size so large and the range of displayed documents so small? Only 15% of the known documents are individually served on the site, and some aren't seen until page 10,000.

It's an effort to obscure for sure.

Yup.. hopefully someone is able to get the full zip

That’s why you need the full zip…

Deleted by moderator

Oh no... I didn't know this. On one hand I now need to run another scan, but on the other it could reveal something. The torrent has 500k+ files, so there is still a gap. I will run the scraper again and do a new analysis in the next day or two.

Deleted by author

just advising you that there is confirmed csam in dataset9-more-complete.tar.zst and probably the other partial dataset9s

This is very concerning. DOJ has stated explicitly that any CSAM was removed before releasing the files. Should I remove the magnet link to the merged Data Set 9 torrent?

I haven't looked inside any of these sets myself. My primary goal has been to get the DOJ data distributed.

Seems like an interesting excuse for why they all need to be removed from public viewing…

Have you actually seen it? Or are you just going off a report?

I am downloading dataset 9 and should have the full 180GB zip done in a day. To confirm: is the dataset 9 zip linked on the DOJ site now updated to be clean of CSAM or not? As much as I wish to help the cause, I do not want any of that type of material on my server unless permission has been given to host it for credible researchers who need access to all files for their investigation. I have no way of knowing what is within my legal rights when it comes to redistributing the files to legitimate investigators, so my plans to help create a torrent may be squashed. Please let me know.

Amazing - Once you have the 180GB Set 9 downloaded, I'll seed.

At this point, my working assumption is that the version you're downloading should be presumed to be free of CSAM, but we can't know for sure until we check it. In addition, I assume that legitimate files were also removed from the version you're downloading, but the legitimate files are preserved in the archives we already have (along with, tragically, the CSAM.)

I think that after you download the 180GB set, we should compare it to our existing files to identify files that were removed. Then, we can identify which of the removed files were CSAM, and which of the removed files were legitimate. Going to be a hell of a task....

Ok great. As for comparing files, I would likely do a hash check; that should make it straightforward to identify truly unique files. It'll take a few days for a decent computer to generate all the hashes, but it should be pretty automated. I'll reach out once I have it completed.
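A minimal sketch of that comparison: build a SHA-256 manifest for each tree, then diff the digest sets to see what exists only in the archived copies (both directory names are assumptions):

import hashlib
import os

def manifest(root):
    # Map SHA-256 digest -> relative path for every file under root.
    out = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            out[h.hexdigest()] = os.path.relpath(path, root)
    return out

old = manifest("dataset9_archived")     # the merged copies already being seeded
new = manifest("dataset9_doj_180gb")    # the re-downloaded official zip, extracted

removed = {d: p for d, p in old.items() if d not in new}
print(f"{len(removed)} files present in the archive but absent from the new release")
for digest, path in sorted(removed.items(), key=lambda kv: kv[1]):
    print(digest, path)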

Thank you! I'm not very tech savvy, so I'm very little help in this whole process. Please LMK what you find.

someone posted the list of the original links. If it helps to cross reference I can check to see if I have it.

I have it as a text file. Shoot me a DM and I can send it to you.

wondering the same thing myself. Not sure about the latest DS9 dump, but I've definitely seen some of the other leaks that included some CSAM. crazy that DOJ let that out the door. :/

From my understanding, nobody knows. The DOJ said it was already removed, but the NYTimes claimed they found 40 images of CSAM. The DOJ said they immediately removed them Saturday, but a lot of files that didn't contain CSAM were also removed. I've extracted the 101GB torrent and haven't come across any yet, but there's a ton of files in there too. People haven't yet been able to download the entire ZIP and are trying to scrape everything individually as far as I know.

As for the legality, I'm not a lawyer and I don't live in the States, but it's all information that's been released to the public by the US DOJ as required by a court order, so it's a call that only you can make. With the amount of data that's already disappeared, I'm personally choosing to host it regardless, and I'll seed whatever anyone else can salvage of dataset 9 too.

Good… I don't trust what the DOJ says. If I see it with my own eyes, that's one thing, and I'll promptly delete it. But I don't believe anything the DOJ says.

What’s your method for getting the zip file without being cut off by the CDN?

I have various chunking techniques that I use. I adaptively modify the request size of the chunks as I've noticed at times the CDN will give large amounts then micro amounts. I haven't figured out the exact backoff rate but I have retry mechanisms in place. The CDN is very annoying but so far my methods are working, just slow.

I have tried dozens of different settings. Cookies. Etc., etc. I haven't had much luck.

I was being cut off; I manage it with chunking techniques. Unfortunately they took down the file, so now I have no source to pull from.

Any luck?

yeah still chugging away slowly, it may take me a few days actually, it's quite slow but so far it appears to be getting it.

Deleted by moderator

I was, and that is why it was taking so long for me to download as I use my custom downloader which uses various techniques to chunk the download. Unfortunately it seems like they've now removed the file completely so my downloader has no source to pull from and is stopped at 36gb.

I’ve been trying all day to get chunks from that CDN…

I can also help seed. I've got lots of TBs free.

I'm not sure if it is useful to anyone, but the partial dataset 9 zip from the DOJ website does contain the eDiscovery index files, VOL00009.DAT and VOL00009.OPT, which are conveniently at the very start of the zip. They are text files, and it's easy to parse out what files they thought were included in the massive zip... IDK if you have one from zero hour, but I have the first few GB saved from the one the CDN occasionally spits out, if anyone wants them to see what files may be missing from the "index".

Hi, OG 101GB dataset uploaded here. The DAT/OPT files are exactly what I used to fetch the files for this dataset.

I want to go through the other partial dataset 9 zips and check for deltas in the contents of the DAT/OPT files but haven’t had the time yet.
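For anyone who hasn't worked with these, Opticon (.OPT) load files are usually plain comma-separated lines, one per page, with a "Y" marking the first page of each document. A minimal parser sketch under that assumption (the exact column layout of VOL00009.OPT hasn't been verified here):

import csv

page_ids, doc_ids = set(), set()

# Assumed Opticon layout: PAGE_ID,VOLUME,IMAGE_PATH,DOC_BREAK,BOX,FOLDER,PAGE_COUNT
with open("VOL00009.OPT", newline="", encoding="utf-8", errors="replace") as f:
    for row in csv.reader(f):
        if not row:
            continue
        page_ids.add(row[0])
        if len(row) > 3 and row[3].strip().upper() == "Y":
            doc_ids.add(row[0])    # first page of a document = one expected PDF

print(f"{len(page_ids)} pages listed, {len(doc_ids)} document breaks (expected PDFs)")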

Where can I get the 101GB dataset?

While I feel hopeful that we will be able to reconstruct the archive and create some sort of baseline that can be put back out there, I also can't stop thinking about the "and then what" aspect here. We've seen our elected officials do nothing with this info over and over again, and I'm worried this is going to repeat itself.

I'm fully open to input on this, but I think having a group path forward is useful here. These are the things I believe we can do to move the needle.

Right Now:
1. Create a clean Data Archive for each of the known datasets (01-12). Something that is actually organized and accessible.
2. Create a working Archive Directory containing an "itemized" reference list (SQL DB?) for the full Data Archive, with each document listed as a row with certain metadata. Imagining a Github repo that we can all contribute to as we work.
   - File number
   - Dir. location
   - File type (image, legal record, flight log, email, video, etc.)
   - File status (Redacted bool, Missing bool, Flagged bool)
3. Infill any MISSING records where possible.
4. Extract images away from .pdf format, break out the "Multi-File" PDFs, renaming images/docs by file number. (I made a quick script that does this reliably well.)
5. Determine which files were left in as CSAM and "redact" them ourselves, removing any liability on our part.

What's Next:
Once we have the Archive and Archive Directory, we can begin safely and confidently walking through the Directory as a group effort and fill in as many files/blanks as possible.
1. Identify and de-redact all documents with garbage redactions (remember the copy/paste DOJ blunders from December), and identify poorly positioned redaction bars to uncover obfuscated names.
2. LABELING! If we could start adding labels to each document in the form of tags that contain individuals, emails, locations, businesses - this would make it MUCH easier for people to "connect the dots".
3. Event timeline... This will be hard, but if we can apply a timeline ID to each document, we can put the archive in order of events.
4. Create some method for visualizing the timeline, searching, or making connections with labels.

We may not be detectives, legislators, or lawmen, but we are sleuth nerds, and the best thing we can do is get this data into a place that allows others to push for justice and put an end to this crap once and for all. It's lofty, I know, but enough is enough.
...Thoughts?

GFD….

My 2 cents. As a father of only daughters…

If we don’t weed out this sick behavior as a society we never will.

My thoughts are enough is enough.

Once the files are gone there is little to 0 chance they are ever public again….

You expect me to believe that an "oh shit we messed up" was an accident?

It’s the perfect excuse… so no one looks at the files.

That’s my 2 cents.

I've been thinking a lot about this whole thing.
I don't want to be worried or fearful here - we have done nothing wrong! Anything we have archived was provided to us directly by them in the first place.
There are whispers all over the internet, random torrents being passed around, conspiracies, etc., but what are we actually doing other than freaking ourselves out (myself at least) and going viral with an endless stream of "OMG LOOK AT THIS FILE" videos/posts.

I vote to remove any of the 'concerning' files and backfill with blank placeholder PDFs with justification, then collect everything we have so far, create file hashes, and put out a clean, stable, safely indexed archive of everything we have. We wipe away any concerns and can proceed methodically through the blood trail of documents, resulting in an obvious and accessible collection of evidence. From there we can actually start organizing to create a tool that can be used to crowdsource tagging, timestamping, and parsing the data. I'm a developer and am happy to offer my skillset.

Taking a step back - it's fun to do the "digital sleuth" thing for a while, but then what? We have the files.. (mostly).. Great. We all have our own lives, jobs, and families, and taking actual time to dig into this and produce a real solution that can actually make a difference is a pretty big ask. That said, this feels like a moment where we finally can make an actual difference, and I think it's worth committing to. If any of you are interested in helping beyond archival, please lmk.

I just downloaded Matrix, but I'm new to this, so I'm not sure how it all works.
Happy to link up via Discord, Matrix, email, or whatever.

We definitely need a crowdsourced method for going through all the files. I am currently building a solo Cytoscape tool to try making an affiliation graph, but expanding this into a tool for a community, with authorization so only whitelisted individuals can work on it, is beyond my scope; I can't volunteer to make such an important tool alone, but I am happy to offer my help building it. I can convert my existing tool to a prototype if anyone wants to collaborate with me on it. I am an amateur, but I will spend all the Cursor credits on this.

So I know how to do a lot of this and bring something significant insofar as an understanding of both the gravity and volume of things here. Looking through the way everything and anything that has been released has been organized, well, it's not. This isn't how an evidence production should ever look.

There is a way to best organize this and to do so how it would be expected for the presentation of a catalog of digital evidence. I'm aware of this because I've done it for years.

But almost if not maybe even more important is that while there are monsters still hidden in these documents, whether released or still held back, there is something else to consider.

Those who are involved and know who the monsters are and can never forget them. Ever.

I took an interest in this specifically because I felt a moral obligation as someone who has been personally affected in this way just not by these specific monsters. However what I do know is the very structure that allows them to roam free, unscathed, even able to sleep at night. What failed to protect those who were harmed also failed me and when I do sleep it is the nightmare that also can never be forgotten.

This resulted in learning how to spot their fuck ups because I knew what they were and had no reason to trust that it would fix itself. With that said the insight of someone who understands this through unfortunate lived experience provides something that cannot be learned and something I hope others will never be forced to.

I have msged a few people. One responded. Just trust me when I say that if you are to work collaboratively, have someone who understands the pain you are just going to be reading.

I will help where it's needed and it's needed.

Holy shit

The entire Court Records and FOIA page is completely gone too! Fuckers!

Have a scraper running on web.archive.org pulling all previously posted Court Records and FOIA material (docs, audio, etc.) from Jan 30th.

I have backups of both of these. I can make a torrent in a bit.

I told you…

We need dataset 9…

Here is the download link for a text file that has all the original URLs: https://wormhole.app/PpjJ3P#SFfAOKm1bnCyi-h2YroRyA
The link will only last for 24 hours.

I have never made a torrent file before, so feel free to correct me if it doesn't work. Here is the magnet link for this as a torrent file so it's up for more than an hour:
magnet:?xt=urn:btih:694535d1e3879e899a53647769f1975276723db7&xt=urn:btmh:12207cf818f0f0110ca5e44614f2c65e016eca2fe7bc569810f9fb25e80ff608fc9b&dn=DOJ%20Epstein%20file%20urls.txt&xl=81991719&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce

What does this contain? anything new?

It's the URLs of the original dataset 9. It was posted on the original reddit post.

please post again. thank you.

it's a file list but not the actual files tho.

For anyone looking into doing some OSINT work, this is an epic file EFTA00809187

It contains lists of ALL known JE emails, usernames, websites, social media accounts, etc. from that time.

nah, i didn't hear anything back

EFTA00809187
Did that guy from pastebin with the complete DS9 file ever answer you?

I’ve been working on a structured inventory of the datasets with a slightly different angle: rather than maximizing scrape coverage, I’m focusing on understanding what’s present vs. what appears to be structurally missing based on filename patterns, numeric continuity, file sizes, and anchor adjacency.

For Dataset 9 specifically, collapsing hundreds of thousands of files down into a small number of high-confidence “missing blocks” has been useful for auditing completeness once large merged sets (like yours) exist. The goal isn’t to assume missing content, but to identify ranges where the structure strongly suggests attachments or exhibits likely existed.

If anyone else here is doing similar inventory or diff work, I'd be interested in comparing methodology and sanity-checking assumptions. No requests for files (yet), just notes on structure and verification.

Keep in mind when looking at the file names: the file name is the number of the first page of the document, and each page in the document is part of the numbering scheme.

EFTA00039025.pdf

EFTA00039026
...

...
EFTA00039152

Just tested whether numeric gaps represent missing files or page-level numbering. In at least one major Dataset 9 block, the adjacent PDF's page count exactly matches the numeric span, indicating page bundling rather than missing documents. I'm incorporating page counts into the audit model to distinguish the two.
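A small check along those lines, as a sketch only: assuming the PDFs sit in one folder and pypdf is installed, compare each file's page count against the numeric distance to the next Bates number.

import os
import re
from pypdf import PdfReader

ROOT = "dataset9"    # placeholder path; the real set is split across subfolders
pattern = re.compile(r"EFTA(\d{8})\.pdf$", re.IGNORECASE)

docs = sorted(
    (int(m.group(1)), os.path.join(ROOT, name))
    for name in os.listdir(ROOT)
    if (m := pattern.search(name))
)

for (num, path), (next_num, _) in zip(docs, docs[1:]):
    span = next_num - num                    # numeric gap to the next first page
    pages = len(PdfReader(path).pages)
    if pages != span:
        print(f"EFTA{num:08d}: {pages} pages, but next doc starts {span} numbers later")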

Thanks so much for setting that straight.

Take a minute to look at the eDiscovery database in the zip, it lays out each page.

Deleted by moderator

Deleted by moderator

Deleted by moderator

Same. Every damn time.

Deleted by moderator

Deleted by moderator

F if I know. I've been messing with it for days. I've tried chunking, different scripts, different cookies.

Check Available Pieces for the torrents. My guess is that you'll see half of them are missing and UNavailable.

someone on reddit ( u/FuckThisSite3 ) posted a more complete DataSet 9:

I assembled a tar file with all I got from dataset-9.

Magnet link: magnet:?xt=urn:btih:5b50564ee995a54009fec387c97f9465eb18ba00&dn=dataset-9_by_fuckthissite3.tar&xl=148072017920

SHA256: 5adc043bcf94304024d718e57267c1aa009d782835f6adbe6ad7fdbb763f15c5

The tar contains 254,477 files which is 148072017920 bytes (148.07GB/137.9GiB)

Seeding 1 node, 3 on the way
EDIT: 3 running, 4th one planned to be temporary but should soon be up

While it's bigger in size, this one seems to be missing a ton of files? I grabbed the earlier collections and my file count is at 531,256. I'll have to compare when I finish downloading.

Hello… I followed the breadcrumbs from Reddit. So I have the dataset 9 48GB torrent downloaded. I've been trying to get the chunk script in Python that someone dropped below to yield results.

I did my cookie. Exported to Netscape. Yadda yadda.

I can seemingly connect if I start at chunk 00000000.

But the minute I try to connect at the chunk number of the bin file from the 48GB torrent, it just connects once and then fails over and over.

Has anyone found a magic formula to get more from dataset 9?

Hi everyone, maybe I'm a bit late to this, but I wanted to share my findings.
I parsed every page up to 40k in DS9 three times, and the results matched PeoplesElbow's findings by distribution (no content after page 14k and a lot of duplications), but I parsed four times more unique urls: 246,079 (still 2x short of the official size).
And a strange thing: on the second pass (one day after the first one) I started receiving new urls on old pages.

Here are the stats by file type:

 count  | file type 
--------+------
      1 | ts
      8 | mov
    236 | mp4
 244326 | pdf
     73 | m4a
      1 | vob
      1 | docx
      1 | doc
      9 | m4v
   1422 | avi
      1 | wmv

Nice work man! I also discovered something yesterday that I think is worth pointing out.

DUPLICATE FILES:
Within the datasets, there are often emails, doc scans, etc. that are duplicate entries. (I'm not talking about multi-torrent stitching, but actual duplicate documents within the raw dataset.)
These duplicates must be preserved. When looking at two copies of the same duplicate file, I found that sometimes the redactions are in different places! This can be used to extract more info later down the road.

Can you make a torrent of the new files if you find any?

Finally got my hands on the original DS9 OPT file and I have started downloading files from it. Don't know how long it will take.
Also made a git repo with stats and index files from the DOJ website and the OPT from the archive: https://github.com/ArzymKoteyko/JEDatasets
In short, the only difference is that I got an additional 1,753 links to video files and a strange .docx file with a size of 0 bytes [EFTA00335487.docx].

So it turns out there's a pile of videos if you rename files from pdf to mp4. There are some video torrents for anyone who wants to back them up (datasets 8-12):

Reddit post: https://www.reddit.com/r/Epstein/comments/1qx81dj/type_this_in_and_change_pdf_to_mp4
Fediverse post: https://lemmy.world/post/42756746

I can work on a script that crawls/detects which of these files are like this.
I looked at the hexadecimal data of one of the examples from the reddit thread and there might be some indicators. I wasn't able to get a video to play, but

Just from this file I can see that they created/edited these pdfs using something called reportlab.com, edited on 12/23/2025. There's also some text referring to Kids? "Count 1/ Kids [ 3 0 R ]" This is odd.
screenshot
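A starting point for that detector, based only on file signatures: genuine PDFs begin with %PDF-, while MP4-family files carry an ftyp box a few bytes in. (The folder name is a placeholder. As an aside, /Count and /Kids in that hex dump are ordinary PDF page-tree keys, not a reference to children.)

import os

ROOT = "DataSet 9"    # placeholder; point at any extracted dataset folder

def sniff(path):
    # Classify a file by its leading bytes rather than its extension.
    with open(path, "rb") as f:
        head = f.read(16)
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head[4:8] == b"ftyp":    # ISO base media: mp4 / m4v / m4a / mov
        return "mp4-like"
    return "other"

for dirpath, _dirs, files in os.walk(ROOT):
    for name in files:
        if name.lower().endswith(".pdf"):
            path = os.path.join(dirpath, name)
            kind = sniff(path)
            if kind != "pdf":
                print(f"{path} looks like {kind}; try renaming the extension")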

File EFTS0024813.pdf/mp4?
(Dataset 8, folder 0005)

I assume you mean EFTA00024813?

If so, while it doesn't show on the website, it is included in dataset 8 if you just download the complete set.

It is a .xlsx file.

Its in DataSet 8\VOL00008\NATIVES\0001

This is the case for most of these files, dataset 9 might be missing a few though.

Fediverse post I linked has a similar script already made

I'm still seeding the partial Dataset 9 (45.63GB and 89.54GB) and all the other datasets. Is there a newer dataset 9 available?

Do people here have the partial dataset 9? or are you all missing the entire set?
There is a magnet link floating around for ~100GB of it, the one removed in the OP

I am trying to figure out exactly how many files dataset 9 is supposed to have in it.
Before the zip file went dark, I was able to download about 2GB of it. This was today, so maybe not the original zip file from Jan 30th.
At the head of the zip file is an index file, VOL00009.OPT; you don't need the full download in order to read this index file.
The index file says there are 531,307 pdfs.
The 100GB torrent has 531,256, so it's missing 51 pdfs.
I checked the 51 file names and they no longer exist as individual files on the DOJ website either. I'm assuming these are the CSAM.

Note that the 3M number of released documents != 3M pdfs. Each pdf page is counted as a "document". Dataset 9 contains 1,223,757 documents, and according to the index we are missing only 51 documents; they are not multipage.
In total, I have 2,731,789 documents from datasets 1-12, short of the 3M number. The index I got also was not missing any document ranges.

It's curious that the zip file had an extra 80GB when only 51 documents are missing. I'm currently scraping links from the DOJ webpage to double-check the filenames.

That's pretty cool...

Can you send me a DM of the 51? If I come across one and it isn't some sketchy porn, I'll let you know.

I analyzed with AI the ~36gb that I was able to download before they erased the zip file from the server.

Complete Volume Analysis

  Based on the OPT metadata file, here's what VOL00009 was supposed to contain:

  Full Volume Specifications

  - Total Bates-numbered pages: 1,223,757 pages
  - Total unique PDF files: 531,307 individual PDFs
  - Bates number range: EFTA00039025 to EFTA01262781
  - Subdirectory structure: IMAGES\0001\ through IMAGES\0532\ (532 folders)
  - Expected size: ~180 GB (based on your download info)

  What You Actually Got

  - PDF files received: 90,982 files
  - Subdirectories: 91 folders (0001 through ~0091)
  - Current size: 37 GB
  - Percentage received: ~17% of the files (91 out of 532 folders)

  The Math

  Expected:  531,307 PDF files / 180 GB / 532 folders
  Received:   90,982 PDF files /  37 GB /  91 folders
  Missing:   440,325 PDF files / 143 GB / 441 folders

  ★ Insight ─────────────────────────────────────
  You got approximately the first 17% of the volume before the server deleted it. The good news is that the DAT/OPT index files are complete, so you have a full manifest of what should be there. This means:
  - You know exactly which documents are missing (folders 0092-0532)

I haven't looked into downloading the partials from archive.org yet to see if I have any useful files that archive.org doesn't have yet from dataset 9.

So what's the consensus on what to do about all the fully uncensored CSAM the DOJ released on the 30th?
Much of it has been removed as of today, but that shit is still fully up on archive.org.. 🙄...Not Great...

My two cents: I have nothing but daughters.. all my children are daughters…

If we don't take care of this weird sexual abuse problem between authority figures and the like now, we never will…

I don't think I could sleep at night if I didn't do my due diligence, because someday time will just move on and all of us will be too old to do something about it..

We either take care of this now or we never will as a society…

Think about it: do you ever think there will be another point in the future to root out this kind of evil?

So I say release the files and let the chips fall where they fall but that’s just my two cents..

Would be one thing if this entire process felt like we could really trust justice to do the right thing…

Just look over there in the Epstein forum on Reddit. There are all kinds of pictures and names of really, really wealthy people who can just easily buy their way out of trouble..

Hey that makes sense to me man.

I think there will be plenty of falling chips in the coming weeks. Once the data is aggregated and truly accessible and searchable, someone is going to make some AI something that can connect the dots faster than the justice system - because my god is it slow as molasses.

I'm so tired of waiting around.

It's "great" that the DOJ removed CSAM at the same time as they were removing perfectly legitimate files that are in the public interest. That's just really smart. Puts us all in a hell of a bind.

I can't speak for others, but I'll plan to preserve the 87GB Set 9, the 90GB Set 9, and Set 10, until we can get an updated "complete" (current) Set 9 that can be presumed to be free of CSAM. After that, we can try to identify the legitimate files that are missing from the "complete" Set 9, and preserve those while purging the CSAM.

This seems like a valid plan - although I'm not that confident in the 'purge'.
It might be good to redact those images ourselves, and then nobody is pressed to store them. Better to have a confidently safe dataset that can be passed around safely.

Also, it looks like they went back and repaired the shitty text redactions on docs that were released in late 2025, from what I can tell.
I ran a script that auto-detects and removes "fake" redactions and it's not getting any hits anymore, even on files that it flagged in the past. They are definitely trying to cover their tracks by the day.
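For anyone curious what that kind of detector can look like, here's a minimal sketch (not the commenter's script) using pdfplumber: flag words of selectable text that sit underneath a filled black rectangle, the classic "fake" redaction.

import pdfplumber

def overlaps(word, rect):
    # True if the word's bounding box intersects the rectangle's box.
    return not (word["x1"] <= rect["x0"] or word["x0"] >= rect["x1"]
                or word["bottom"] <= rect["top"] or word["top"] >= rect["bottom"])

def fake_redactions(path):
    hits = []
    with pdfplumber.open(path) as pdf:
        for page_no, page in enumerate(pdf.pages, start=1):
            # Black fill colors show up in a few forms depending on the PDF's color space.
            boxes = [r for r in page.rects
                     if r.get("fill") and r.get("non_stroking_color") in (0, (0,), (0, 0, 0))]
            if not boxes:
                continue
            for word in page.extract_words():
                if any(overlaps(word, box) for box in boxes):
                    hits.append((page_no, word["text"]))
    return hits

for page_no, text in fake_redactions("EFTA_example.pdf"):    # placeholder filename
    print(f"page {page_no}: text under a black box -> {text!r}")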

Where can I get magnet links or torrents for this 87GB and 90GB sets?

90GB

The 90 GB is a de-duplicated merge of the 87 GB and 48 GB incomplete downloads.

Here's the magnet link for the 90 GB file.

And have you seen this or are you speculating?

Without a timestamp on the photo it's impossible to be 100% sure, but it was obvious enough for me to ask the question. :/
It seems like it was a mistake on their part, because everything else has heavily redacted nudity. You can also see references in the internal memo docs preceding the content.

There's a lawsuit trying to have a judge issue an injunction against the file release. There isn't a lot of time left…

Once those files go away, do you honestly think anybody will ever get to see them again?

Someone mentioned a matrix group. Can they DM and invite me? I want to help. Thx

@wild_cow_5769:matrix.org If someone has a group working on finding the dataset.

There are billions of people on earth. Someone downloaded dataset 9 before the link was taken down. We just have to find them :)

where did the party move?

It’s still here. No one dropped a complete dataset 9 yet tho..

Hasn't moved AFAIK, just going slowly.

This entire thing smells funny. Even OP went ghost at the threat of suspect images that no one has seen…

Ask yourself: how did the Times, or whoever came up with this narrative, even find these "suspect" images in a few hours when it seems no one in the world could even download the zip…

A person made a website just to host links and thumbnails for a better interface to the videos on the DoJ website.

They deleted everything including their account the same day.

Everyone. I know website is showing all blank. This is unfortunately the end of my little project. Due to certain circumstances, I had to take it down. Thank you everyone for supporting me and my effort.

Edit: Link

Link is dead

It still works for me. I can only see the comments on the post since it was deleted, but that’s what’s important here.

OP’s last message

Scary Comment 1

Scary Comment 2

The BBC is now reporting that "thousands" of documents have been removed because the DOJ improperly redacted information that can be used to identify the victims: https://www.bbc.com/news/articles/cn0k65pnxjxo

Epstein Files - Complete Dataset Audit Report

Generated: 2026-02-16 | Scope: Datasets 1–12 (VOL00001–VOL00012) | Total Size: ~220 GB


Background

The Epstein Files consist of 12 datasets of court-released documents, each containing PDF files identified by EFTA document IDs. These datasets were collected from links shared throughout this Lemmy thread, with Dataset 9 cross-referenced against a partial copy we had downloaded independently.

Each dataset includes OPT/DAT index files — the official Opticon load files used in e-discovery — which serve as the authoritative manifest of what each dataset should contain. This audit was compiled to do four things (a minimal sketch of the completeness check follows the list):

  1. Verify completeness — compare every dataset against its OPT index to identify missing files
  2. Validate file integrity — confirm that all files are genuinely the file types they claim to be, not just by extension but by parsing their internal structure
  3. Detect duplicates — identify any byte-identical files within or across datasets
  4. Generate checksums — produce SHA256 hashes for every file to enable downstream integrity verification
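As a rough illustration of the completeness check (point 1), here is a minimal sketch that parses an OPT load file and reports entries whose referenced files are missing on disk. It assumes the common Opticon column layout (document ID, volume, file path, then page flags); the exact layout and path style of the DOJ load files may differ, so treat this as a sketch rather than a drop-in tool.

```python
# Minimal sketch of the OPT-vs-disk completeness check, assuming a standard
# Opticon layout: DOCID,VOLUME,FILEPATH,Y,,,PAGECOUNT
# (real load files may use absolute paths that need re-rooting first).
import csv
import os
import sys


def missing_from_disk(opt_path, dataset_root):
    missing = []
    with open(opt_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 3:
                continue
            doc_id, _volume, rel_path = row[0], row[1], row[2]
            # Load files often use Windows-style separators
            rel_path = rel_path.replace("\\", os.sep).lstrip(os.sep)
            if not os.path.exists(os.path.join(dataset_root, rel_path)):
                missing.append(doc_id)
    return missing


if __name__ == "__main__":
    opt_file, root = sys.argv[1], sys.argv[2]
    for doc_id in missing_from_disk(opt_file, root):
        print(doc_id)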

Executive Summary

Metric                               Value
Total Unique Files                   1,380,939
Total Document IDs (OPT)             2,731,789
Missing Files                        25 (Dataset 9 only)
Corrupt PDFs                         3 (Dataset 9 only)
Duplicates (intra + cross-dataset)   0
Mislabeled Files                     0
Overall Completeness                 99.998%

Dataset Overview

EPSTEIN FILES - DATASET SUMMARY
┌─────────┬──────────┬───────────┬──────────┬─────────┬─────────┬─────────┐
│ Dataset │ Volume   │ Files     │ Expected │ Missing │ Corrupt │ Size    │
├─────────┼──────────┼───────────┼──────────┼─────────┼─────────┼─────────┤
│ 1       │ VOL00001 │     3,158 │    3,158 │       0 │       0 │  2.5 GB │
│ 2       │ VOL00002 │       574 │      574 │       0 │       0 │  633 MB │
│ 3       │ VOL00003 │        67 │       67 │       0 │       0 │  600 MB │
│ 4       │ VOL00004 │       152 │      152 │       0 │       0 │  359 MB │
│ 5       │ VOL00005 │       120 │      120 │       0 │       0 │   62 MB │
│ 6       │ VOL00006 │        13 │       13 │       0 │       0 │   53 MB │
│ 7       │ VOL00007 │        17 │       17 │       0 │       0 │   98 MB │
│ 8       │ VOL00008 │    10,595 │   10,595 │       0 │       0 │   11 GB │
│ 9       │ VOL00009 │   531,282 │  531,307 │      25 │       3 │   96 GB │
│ 10      │ VOL00010 │   503,154 │  503,154 │       0 │       0 │   82 GB │
│ 11      │ VOL00011 │   331,655 │  331,655 │       0 │       0 │   27 GB │
│ 12      │ VOL00012 │       152 │      152 │       0 │       0 │  120 MB │
├─────────┼──────────┼───────────┼──────────┼─────────┼─────────┼─────────┤
│ TOTAL   │          │ 1,380,939 │1,380,964 │      25 │       3 │ ~220 GB │
└─────────┴──────────┴───────────┴──────────┴─────────┴─────────┴─────────┘

Notes

  • DS1: Two identical copies found (6,316 files on disk). Byte-for-byte identical via SHA256. Table above reflects one copy (3,158). One copy is redundant.
  • DS2: 699 document IDs map to 574 files (multi-page PDFs)
  • DS3: 1,847 document IDs across 67 files (~28 pages/doc avg)
  • DS5: 1:1 document-to-file ratio (single-page PDFs)
  • DS6: Smallest dataset by file count. ~37 pages/doc avg.
  • DS9: Largest dataset. 25 missing from OPT index, 3 structurally corrupt.
  • DS10: Second largest. 950,101 document IDs across 503,154 files.
  • DS11: Third largest. 517,382 document IDs across 331,655 files.

Dataset 9 — Missing Files (25)

EFTA00709804  EFTA00823221  EFTA00932520
EFTA00709805  EFTA00823319  EFTA00932521
EFTA00709806  EFTA00877475  EFTA00932522
EFTA00709807  EFTA00892252  EFTA00932523
EFTA00770595  EFTA00901740  EFTA00984666
EFTA00774768  EFTA00912980  EFTA00984668
EFTA00823190  EFTA00919433  EFTA01135215
EFTA00823191  EFTA00919434  EFTA01135708
EFTA00823192

Dataset 9 — Corrupted Files (3)

File                Size     Error
EFTA00645624.pdf    35 KB    Missing trailer dictionary, broken xref table
EFTA01175426.pdf    827 KB   Invalid xref entries, no page tree (0 pages)
EFTA01220934.pdf    1.1 MB   Missing trailer dictionary, broken xref table

All three have valid %PDF- headers but cannot be rendered due to structural corruption; they were likely corrupted during the original document production or transfer.


File Type Verification

Two levels of verification performed on all 1,380,939 files:

  1. Magic Byte Detection (file command) — all files contain valid %PDF- headers; 0 mislabeled.
  2. Deep PDF Validation (pdfinfo, poppler 26.02.0) — parsed xref tables, trailer dictionaries, and page trees; 3 structurally corrupt (Dataset 9 only). A minimal sketch of this deep-validation pass follows.
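Here is a minimal sketch of that deep-validation pass, assuming poppler's pdfinfo is on the PATH; a non-zero exit status or a missing/zero "Pages:" count is treated as structural corruption.

```python
# Minimal sketch of the deep-validation pass: run poppler's pdfinfo on each PDF
# and flag files that fail to parse or report zero pages. Assumes pdfinfo is on PATH.
import subprocess
import sys


def is_structurally_valid(pdf_path):
    result = subprocess.run(["pdfinfo", pdf_path], capture_output=True, text=True)
    if result.returncode != 0:
        # pdfinfo could not parse the file at all (e.g. broken xref / trailer)
        return False, result.stderr.strip().splitlines()[:1]
    pages = 0
    for line in result.stdout.splitlines():
        if line.startswith("Pages:"):
            pages = int(line.split(":", 1)[1].strip() or 0)
    return pages > 0, []


if __name__ == "__main__":
    for path in sys.argv[1:]:
        ok, err = is_structurally_valid(path)
        if not ok:
            print(f"CORRUPT: {path} {err}")
```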

Duplicate Analysis

  • Within Datasets: 0 intra-dataset hash duplicates across all 12 datasets.
  • Cross-Dataset: All 1,380,939 SHA256 hashes compared. 0 cross-dataset duplicates — every file is unique.
  • Dataset 1 Two Copies: Both copies byte-for-byte identical (SHA256 verified). One is redundant (~2.5 GB).

Integrity Verification

SHA256 checksums were generated for every file across all 12 datasets. Individual checksum files are available per dataset:

File                         Hashes    Size
dataset_1_SHA256SUMS.txt     3,158     256 KB
dataset_2_SHA256SUMS.txt     574       47 KB
dataset_3_SHA256SUMS.txt     67        5.4 KB
dataset_4_SHA256SUMS.txt     152       12 KB
dataset_5_SHA256SUMS.txt     120       9.7 KB
dataset_6_SHA256SUMS.txt     13        1.1 KB
dataset_7_SHA256SUMS.txt     17        1.4 KB
dataset_8_SHA256SUMS.txt     10,595    859 KB
dataset_9_SHA256SUMS.txt     531,282   42 MB
dataset_10_SHA256SUMS.txt    503,154   40 MB
dataset_11_SHA256SUMS.txt    331,655   26 MB
dataset_12_SHA256SUMS.txt    152       12 KB

To verify any file against its checksum:

```bash
shasum -a 256 <filename>
```

To check an entire dataset against its manifest, run `shasum -a 256 -c dataset_N_SHA256SUMS.txt` from the directory the listed paths are relative to.

If you'd like access to the SHA256 checksum files or can help host them, send me a DM.


Methodology

1. Hash Generation: SHA256 checksums via shasum -a 256 with 8-thread parallel processing (a rough Python equivalent is sketched after this list)
2. OPT Index Comparison: Each dataset's OPT load file parsed for expected file paths, compared against files on disk
3. Intra-Dataset Duplicate Detection: SHA256 hashes compared within each dataset
4. Cross-Dataset Duplicate Detection: All 1,380,939 hashes compared across all 12 datasets
5. File Type Verification (Level 1): Magic byte detection via file command
6. Deep PDF Validation (Level 2): Structure validation via pdfinfo (poppler 26.02.0) — xref tables, trailer dictionaries, page trees
7. Cross-Copy Comparison: Dataset 1's two copies compared via full SHA256 diff
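For illustration, here is a minimal Python sketch of steps 1, 3, and 4 combined, using hashlib in-process rather than shelling out to shasum; the worker count mirrors the report's 8 threads, and the chunk size is an arbitrary choice, not a value used for the report.

```python
# Minimal sketch: parallel SHA256 hashing plus duplicate grouping.
# Prints SHA256SUMS-style lines to stdout and duplicate groups to stderr.
import hashlib
import os
import sys
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def sha256_of(path, chunk=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            h.update(block)
    return path, h.hexdigest()


def hash_tree(root, workers=8):
    paths = [os.path.join(d, name)
             for d, _dirs, files in os.walk(root) for name in files]
    by_hash = defaultdict(list)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for path, digest in pool.map(sha256_of, paths):
            print(f"{digest}  {path}")  # SHA256SUMS-style line
            by_hash[digest].append(path)
    # Any hash seen more than once is a byte-identical duplicate
    return {d: p for d, p in by_hash.items() if len(p) > 1}


if __name__ == "__main__":
    for digest, paths in hash_tree(sys.argv[1]).items():
        print(f"# duplicate {digest[:12]}…: {paths}", file=sys.stderr)
```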


Recommendations

  1. Remove Dataset 1 duplicate copy — saves ~2.5 GB
  2. Document the 25 missing Dataset 9 files — community assistance may help locate these
  3. Preserve OPT/DAT index files — authoritative record of expected contents
  4. Distribute SHA256SUMS.txt files — for downstream integrity verification

Report generated as part of the Epstein Files preservation and verification project.

Has anyone made a Dataset 9 and 10 torrent file without the files in it that the NYT reported as potentially CSAM?

I don't think anyone knows for sure what files those are. It would've been helpful if NYT published the file names. But maybe NYT isn't sure themselves as they wrote some of the images are "possibly" of teenagers.

To be on the safe side, I guess you could just remove all nude images from the dataset. It is a ton of images to go through though, hundreds of thousands.

The site epsteinfilez.com claims to have the full Dataset 9. Can't find a way to download it directly from them, since the site is only set up for searching. Perhaps if we asked nicely?

Hi, I am the admin of epsteinfilez.com. I have never claimed that I have the full Dataset 9. The banner says that I have 101GB of Dataset 9, the one that is also shared here with the magnet link.

The flashing banner at the top says that it includes 101GB of Data Set 9. Unfortunately, I think they just grabbed the larger of the two torrents.

Exactly! It clearly says that I used the 101GB magnet link :)


Can anybody verify these hashes?

https://03c.de/?30a9ce3df3d88c3c#A6EKCNKa1NtfJShxAqMRkbVQewhJ2H2n4DfL6YhRSmUa

EDIT: The reports about some of the victims having their personal data left unredacted in the files kind of make me not want to download or seed them. I mean... not that it makes much of a difference now, but it just feels wrong.

Theoretically speaking, if a website has the archives, what is stopping people from downloading each file on a page-by-page basis from the archive?

Edit: Never mind this; I saw a full list of URLs that the archive managed to save and it is missing a lot.

Nothing, but even the archived pages aren't 100%, because some of the files were "faked" in the paginated file lists on the DOJ site.
It does work well enough though. I did this to recover all the court records and FOIA files.
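For anyone who wants to try the same page-by-page recovery, here is a minimal sketch using the Wayback Machine CDX API. The justice.gov prefix below is only a placeholder, not the exact path that was scraped, and as noted above the archive's coverage is incomplete.

```python
# Hypothetical sketch of page-by-page recovery via the Wayback Machine CDX API.
# The URL prefix is a placeholder; adjust it to the listing you want to recover.
import os
import requests

CDX = "https://web.archive.org/cdx/search/cdx"
PREFIX = "https://www.justice.gov/epstein-files/"  # placeholder prefix


def archived_urls(prefix):
    """Yield (timestamp, original_url) pairs for captures under a URL prefix."""
    params = {
        "url": prefix,
        "matchType": "prefix",
        "output": "json",
        "fl": "timestamp,original",
        "collapse": "urlkey",  # one capture per unique URL
    }
    rows = requests.get(CDX, params=params, timeout=60).json()
    for ts, original in rows[1:]:  # first row is the header
        yield ts, original


def download_capture(ts, original, out_dir="wayback"):
    # The "id_" modifier asks for the original bytes without the Wayback toolbar
    snapshot = f"https://web.archive.org/web/{ts}id_/{original}"
    os.makedirs(out_dir, exist_ok=True)
    name = original.rstrip("/").rsplit("/", 1)[-1] or "index"
    resp = requests.get(snapshot, timeout=120)
    if resp.ok:
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(resp.content)


if __name__ == "__main__":
    for ts, original in archived_urls(PREFIX):
        download_capture(ts, original)
```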

for DS9, does anyone have the following files:

  EFTA00709804
  EFTA00709805
  EFTA00709806
  EFTA00709807
  EFTA00770595
  EFTA00774768
  EFTA00823190
  EFTA00823191
  EFTA00823192
  EFTA00823221
  EFTA00823319
  EFTA00877475
  EFTA00892252
  EFTA00901740
  EFTA00912980
  EFTA00919433
  EFTA00919434
  EFTA00932520
  EFTA00932521
  EFTA00932522
  EFTA00932523
  EFTA00984666
  EFTA00984668
  EFTA01135215
  EFTA01135708

If so, please DM them to me and I can include them in my master archive.

A consolidated (and structured) torrent file has been released: https://github.com/yung-megafone/Epstein-Files/issues/1#issuecomment-3860836655

Currently clearing data from my seedbox to get this added.

In it with 4 nodes still :)

I don't see this posted here yet. Below is the Github repo link for an index of torrent files, DOJ links, and mirrors, for every dataset.
https://github.com/yung-megafone/Epstein-Files

I am seeding 8, 10, 11, 12 (the ones only available as .zip on justice.gov) for the foreseeable future (as well as the partials of Set 9).
I'm looking "everywhere" hoping for some success on Set 9 and will keep pushing that one until bandwidth dies or until a dozen or so seeders are on - whenever the complete bundle is assembled.
Hoping for some good news soon, things seem to be nuked very rapidly now.

I also read that the court documents and one other page were taken down - I have those files, but they are not sorted by page, just thrown into a bulk download directory, since I had a feeling this would happen and wanted to pull them quickly.
In case they're of any use, I put them on Mega and Gofile a few days ago and they've not been taken down so far:

https://gofile.io/d/dff931d5-a646-46f1-b34e-079798f508a2
https://mega.nz/folder/XVMCgLLR#EKVS8Sfiry-VtVAxZ7q_Ig

It's most likely files that "everyone" already has, but better one mirror too many than one too few.

Thanks for the links. I downloaded the docs and will add them to the pile.

Also seeding (but will probably not for very long unless seeders start dropping, they are all at 300-1200 ATM)
59975667f8bdd5baf9945b0e2db8a57d52d32957
0a3d4b84a77bd982c9c2761f40944402b94f9c64
7ac8f771678d19c75a26ea6c14e7d4c003fbf9b6
c3a522d6810ee717a2c7e2ef705163e297d34b72
d509cc4ca1a415a9ba3b6cb920f67c44aed7fe1f
e618654607f2c34a41c88458bf2fcdfa86a52174
acb9cb1741502c7dc09460e4fb7b44eac8022906

Trying to pull
c100b1b7c4b1e662dd8adc79ae3e42eef6080aee (redundant limited dataset for that GitHub relations chart)

Pulling
f5cbe5026b1f86617c520d0a9cd610d6254cbe85 (just listed on the GitHub repo that lists the same magnets as here - will probably become 2nd seeder in an hour or two and will stay seeding on that one for at least a week or until the swarm looks healthy by the dozens or so.)

Will continue to monitor whatever progress is being made here. I should also have a small subset of DS9 but it will likely only be the first 200 files or so at most. Needless to say I will compare against the existing torrents just in case.

Thanks everyone for your hard work, this is exactly why I started hoarding :)

EDIT: The last magnet ID I listed is the summarized torrent from the repo linked by Nomad64.

For those curious, here's the NYTimes article where they report on the CSAM in the publicly-released files: https://www.nytimes.com/2026/02/01/us/nude-photos-epstein-files.html (behind paywall.)

NYTimes says that they discovered the CSAM on Friday and notified the DOJ on Saturday, and the DOJ was diligent in removing the files NYTimes had flagged.

NYTimes does not say that the material is in Dataset 9 specifically, but we observed that the DOJ was removing files from Dataset 9 on Saturday and not other datasets, so the server behavior would be consistent with CSAM in Dataset 9.

That sounds bad and it must be awful for the victims. Still, the evidence must be preserved. The administration can't be trusted to do so. The stakes are too high.

And even though removing CSAM might be the official tagline, I have my doubts that that's the only stuff that is getting redacted/removed.

100% - That's why I haven't deleted my copy of Set 9. I have no plans to unzip it, and I'm glad that DOJ is removing the CSAM now, but I'm going to hold onto the set to preserve the valuable docs that DOJ is removing.

https://archive.org/details/ds-9-efta-gap-repair

Repaired gaps from the partial Dataset 9:
EFTA00593870.pdf
EFTA00595160.pdf
EFTA00595410.pdf
EFTA00595694.pdf
EFTA00595820.pdf
EFTA00597207.pdf
EFTA00605675.pdf
EFTA00645624.pdf
EFTA00774768.pdf
EFTA01175426.pdf
EFTA01220934.pdf

Is anyone else having issues getting dataset 10 11* to start downloading? It has been sitting at 0 percent for a day while everything else is done and seeding. It shows connections to peers; rechecking does nothing, deleting and re-adding does nothing, asking the tracker for more peers does nothing.

I have been seeding all of the datasets since Sunday. The copy of set 9 has been the busiest, with set 10 a distant second. I plan on seeding them for quite a while yet, and also picking up a consolidated torrent when that becomes available.

Hopefully you are able to get connected via the Swarm.

Is there something I'm missing about why it isn't connecting, given how much time I've spent and how many times I've redone it? Or is it just an "eventually" thing?

I'm getting errors for 1 and 8, all the rest went smooth.

I am not seeing any errors; it has just been stuck in downloading status with nothing going through. I originally added everything around the same time and all the other ones went through fine. I figured it was bugged or something, so I removed and re-added it several times to no avail. I am not sure what else to try.

It's really strange, because on my other machine everything's going fine.

read the OP re. DS9 and DS10

Regardless of whether OP removed the magnet links or not, the torrents are still out there and that shouldn't stop it. Secondly, I meant 11.

Deleted by moderator
transmission

Deleted by moderator

Some bad news: it looks like the Dataset 9 zip file link doesn't work anymore. They appear to have removed the file, so my download stopped at 36 GB. I'm not familiar with their site, so is it normal for them to remove files and maybe put them back at the same link once they've reorganized them? Or do we have to scrape each PDF like another user has been doing?

All the zip files are gone on the DOJ website. The links are gone.

this is ridiculous. Good thing we got in when we did!

Does anyone have the OTHER data sets from before? I've been lasered in on DS1-DS12 but haven't looked at the other documents at all.

All the zip download links are gone on the DOJ website.

It’s only a matter of time before all the files just go poof.

Let me ask a question.

For all the folks saying there are news reports of CSAM… Does that mean the news outlets got the full zip? How did they get it? No one else seems to be able to get it. Were they given it first?

If they don’t have the zip how did they even find it within hours of the files being released?

Did they provide proof where they redacted the “danger” and said look… here is the proof?

Seems rather suspect…

Considering the massive effort by regular people to comb through the files, I would think the outcry would be gigantic…

Deleted by moderator

Yes…. It’s just an excuse to pull the files back and go after anyone who has them.

We didn't have trouble getting Datasets 10, 11, or 12. I think Dataset 9 was probably delivered fine on Friday, so the NYTimes was able to grab a complete copy. Then, NYTimes started reporting the abusive material, which prompted the DOJ to yoink the ZIP, and it's been screwy ever since.

I saw a post from a random Redditor confirming that they found abusive material, if that's the concern. I doubt that the reports are fabricated, but I also agree that the reports are a great excuse for the DOJ to remove legitimate files.

I'm not sure of the exact files that were reported by the NYT, but there certainly were some concerning images in the initial Jan 30 release, and it was certainly more than the reported 40. I saw others as well, but I don't remember what the file numbers were.

spoiler

[246249_247010]

From my own observation timeline on the images in question:
Jan 30: Images were accessible through the DOJ directly. File numbers were skipped in the list, but the files were manually reachable by URL. All these photos were fully unredacted (uncensored).
Feb 1: Images were NOT accessible through the DOJ anymore; the URLs return "Page not found". However, the images were (and still are) snapshotted via web.archive.org.
Feb 2: Downloading the 87GB Set 9, it appeared to contain these images as well, meaning we likely all have them on our computers. yikes

These files were scrubbed from the DOJ website, along with many others.

I found many of the scrubbed files by parsing through the lists and finding large gaps in file numbers, where the preceding file did not contain multiple images/documents in one PDF (a rough sketch of that gap-finding step is below). There are also tons of internal memos in the dataset that precede file groups and talk about the content ahead. These memos surrounded files that seemed like they were meant to be redacted, so it's worth poking around. I didn't go nuts, but things I found around these that were interesting and were also removed:
- [EFTA00276493]: internal memo referring to Clinton photographed with "nude Gretchen".
- [EFTA00273790-EFTA276487]: (removed) looks like aerial LiDAR scans of the full estate?
- [EFTA00276220]: (removed) panoramic infrared / x-ray scan of a room
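Here is a minimal, hypothetical sketch of that gap-finding step (not the commenter's actual workflow): given a plain-text list of EFTA document IDs scraped from the file listings or an OPT manifest, it reports runs of missing numbers between consecutive IDs.

```python
# Hypothetical sketch: report numbering gaps in a list of EFTA document IDs.
# Input: any text file containing EFTA IDs (one or more per line).
import re
import sys


def find_gaps(id_file, min_gap=2):
    """Print runs of missing EFTA numbers between consecutive listed IDs."""
    nums = []
    with open(id_file) as f:
        for line in f:
            for m in re.finditer(r"EFTA(\d{8})", line):
                nums.append(int(m.group(1)))
    nums = sorted(set(nums))
    for prev, cur in zip(nums, nums[1:]):
        if cur - prev >= min_gap:
            print(f"gap of {cur - prev - 1}: EFTA{prev:08d} -> EFTA{cur:08d}")


if __name__ == "__main__":
    find_gaps(sys.argv[1], min_gap=int(sys.argv[2]) if len(sys.argv) > 2 else 2)
```

Large gaps are only a lead, not proof of removal, since a single multi-page PDF can legitimately account for a run of document IDs, as noted above.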

One Redditor said that they reported more than 500 nude images to the DOJ, all from Dataset 9.

I’m still waiting for just the first zip file to uncompress and it’s been HOURS. The ONLY reasonable explanation to bolster the NYT claim is that they put “AI” on the datasets running on a supercomputer, and “caught” the DOJ distributing CP! Show us the proof NYT! (redact faces and genitalia and show the images!) Then: CONVICT THEM ALL! LIFE IN PRISON FOR THE ENTIRE DOJ!!! ;-P

Or… wealthy people wanting the files off the internet.

Quite a few bugs in the script:

│ 1886339.3% (173,015,040 / 9,172 bytes) │

The most worrying thing is reading people who just post that they executed the code without understanding what it is doing.

Copy paste the code into an AI thingy and ask it if it is safe.

Obviously OP has vibe-coded this together too, but there are multiple attack vectors.

Deleted by moderator

In regard to Dataset 9, it's currently being shared on Dread (forum).

I have no idea if it's legit or not, and I don't care to find out after reading about what's in it from the NYT.

This dude on Pastebin posted the file tree of his Epstein Ubuntu env - I have high confidence in whatever lives in his DataSet9Complete.zip file haha

No doubt. High confidence…. :)

Bro is about to be deported by ICE 

Deleted by author

Can't add you. Sent you a DM to get the deets.