The Wayback Machine: how does it work?

thrasher_565 · Jan 11, 2026

So how does the Wayback Machine work? Is it P2P? Or is stuff saved on servers, and sometimes the servers get turned off? How does stuff get lost?
For D2 stuff it's a pain, and I have to keep refreshing to get some things to work. The D2 wiki is bad too...

fearedbliss · Jan 11, 2026

It's not P2P. Everything is saved on the Internet Archive's servers, so it's centralized. As for things getting lost, I believe the Internet Archive just periodically crawls the internet and takes snapshots of whatever it can at that point in time: basic HTML pages, text, images. I'm sure they have some sort of algorithm to decide whether to revisit a specific website more often, and whether to take another snapshot when they do visit (to save space).

If that's the case, say the Internet Archive decides to archive Arreat Summit (or the previous The Chaos Sanctuary), and their algorithm determines that the bot should revisit the website in 1 month. Anything that appeared and disappeared between those two snapshots would _not_ be backed up, and would thus be lost in time. Data could be lost in other ways too, for example if they ran some cleanups on their side, or if bit rot corrupted some files on their servers. This is an oversimplification and just my idea of how they are doing this.
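To make the "lost between snapshots" idea concrete, here is a small sketch. It is only a toy model of my guess above, not how the Internet Archive actually schedules crawls: the fixed 30-day interval, the dates, and the function names `crawl_dates` and `captured` are all made up for illustration.

```python
from datetime import date, timedelta

def crawl_dates(start, end, interval_days):
    """Yield the dates on which a hypothetical crawler visits a site."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=interval_days)

def captured(live_from, live_until, visits):
    """Return the visit dates on which a page was online, i.e. got snapshotted."""
    return [d for d in visits if live_from <= d <= live_until]

# Assumed schedule: one visit every 30 days over the first quarter of 2026.
visits = list(crawl_dates(date(2026, 1, 1), date(2026, 3, 31), interval_days=30))

# A page that went up and came down between two visits is never seen.
short_lived = captured(date(2026, 1, 5), date(2026, 1, 25), visits)

# A page that stayed up long enough to overlap a visit gets at least one snapshot.
long_lived = captured(date(2026, 1, 5), date(2026, 2, 10), visits)

print("visits:", visits)
print("short-lived page snapshots:", short_lived)
print("long-lived page snapshots:", long_lived)
```

In this toy schedule the short-lived page ends up with no snapshots at all, while the longer-lived one is caught once. A shorter revisit interval would shrink the gap, at the cost of more storage.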

thrasher_565 · Jan 11, 2026

Is there a way to download a page for offline use?
