Best Self-Hosted Archiving Tools in 2026
Quick Picks
| Use Case | Best Choice | Why |
|---|---|---|
| General web archiving | ArchiveBox | Saves any URL in multiple formats (HTML, PDF, screenshot, WARC) |
| Offline reference libraries | Kiwix | Serves Wikipedia, Arch Wiki, Stack Exchange, and thousands more |
| Lightest resource usage | Kiwix | 128 MB RAM, zero dependencies, static content serving |
| Bookmark preservation | ArchiveBox | Import from bookmarks, RSS feeds, or browser history |
The Full Ranking
1. ArchiveBox — Best Overall Web Archiver
ArchiveBox is a self-hosted personal Wayback Machine. Feed it URLs from bookmarks, RSS feeds, or browser history, and it saves complete snapshots in multiple formats — raw HTML, cleaned HTML, PDF, screenshot, WARC, and plain text. Each archived page gets a searchable entry in the web UI.
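To see which links a bookmark export would feed into an archiver, a short script can list the URLs in a Netscape-format file. This is a minimal sketch using only the Python standard library, not ArchiveBox's own importer; the sample export and the function name `extract_bookmark_urls` are made up for illustration.

```python
from html.parser import HTMLParser


class BookmarkLinkParser(HTMLParser):
    """Collects href values from <A> tags in a Netscape-format bookmark export."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> arrives as "a"/"href".
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip bookmarklets and other non-web schemes.
        if href and href.startswith(("http://", "https://")):
            self.urls.append(href)


def extract_bookmark_urls(html_text):
    """Hypothetical helper: return the unique http(s) URLs in a bookmark export."""
    parser = BookmarkLinkParser()
    parser.feed(html_text)
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(parser.urls))


# Made-up sample in the Netscape bookmark layout that browsers export.
SAMPLE_EXPORT = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL><p>
    <DT><A HREF="https://example.com/article" ADD_DATE="1700000000">Example article</A>
    <DT><A HREF="https://example.org/docs">Example docs</A>
    <DT><A HREF="https://example.com/article">Duplicate entry</A>
    <DT><A HREF="javascript:void(0)">Bookmarklet</A>
</DL><p>
"""

if __name__ == "__main__":
    for url in extract_bookmark_urls(SAMPLE_EXPORT):
        print(url)
```

Reviewing the list first makes it easier to prune bookmarks you do not want to archive before handing the rest to ArchiveBox.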
ArchiveBox handles JavaScript-heavy sites by rendering them through Chromium (via Playwright). This means modern SPAs, paywalled articles (if you’re logged in), and dynamic content all get properly archived. The WARC output is the gold standard for digital preservation — the same format used by the Internet Archive.
Pros:
- Archives any public URL in 6+ formats simultaneously
- JavaScript rendering via Chromium captures modern web pages
- Searchable web UI with timeline view
- REST API for programmatic archiving
- Imports from bookmarks (Netscape format), RSS, Pinboard, Pocket, browser history
- WARC output for long-term preservation
Cons:
- Resource-heavy during archiving (Chromium uses 1–2 GB RAM)
- Initial setup requires admin user creation and format configuration
- Archiving speed depends on target site response time
Best for: Anyone who wants permanent offline copies of web pages, articles, or research. The “link rot insurance” tool.
[Read our full guide: How to Self-Host ArchiveBox]
2. Kiwix — Best for Offline Libraries
Kiwix serves pre-built ZIM archives of entire websites. The Kiwix Foundation maintains a library of thousands of ZIM files — Wikipedia in 300+ languages, Arch Wiki, Project Gutenberg, Stack Exchange, TED Talks, WikiHow, and more. Download the files you want, point Kiwix at them, and browse everything offline through a clean web interface.
Kiwix is extraordinarily lightweight. The server uses 128–256 MB of RAM, has zero external dependencies (no database, no Chromium), and runs on hardware as modest as a Raspberry Pi 3. It was designed for schools and libraries in areas without reliable internet, which means the software is rock-solid and optimized for minimal resources.
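"Pointing Kiwix at" your downloads amounts to passing the ZIM file paths to the kiwix-serve binary. The sketch below only assembles that command line from a folder of ZIM files. It assumes kiwix-serve is on your PATH and accepts a port option followed by ZIM paths; the helper name, the default port, and the placeholder file names are hypothetical.

```python
import tempfile
from pathlib import Path


def build_kiwix_serve_command(zim_dir, port=8080):
    """Return an argv list that serves every .zim file found directly in zim_dir."""
    zim_files = sorted(str(p) for p in Path(zim_dir).glob("*.zim"))
    if not zim_files:
        raise FileNotFoundError(f"no .zim files found in {zim_dir}")
    # Assumed invocation shape: kiwix-serve --port=PORT file1.zim file2.zim ...
    return ["kiwix-serve", f"--port={port}", *zim_files]


if __name__ == "__main__":
    # Demo with empty placeholder files; real ZIM files come from the Kiwix library.
    with tempfile.TemporaryDirectory() as demo_dir:
        for name in ("wikipedia_en_all.zim", "archlinux_en_all.zim"):
            (Path(demo_dir) / name).touch()
        print(build_kiwix_serve_command(demo_dir))
```

To actually start the server, pass the returned list to `subprocess.run(cmd, check=True)` on a machine where Kiwix is installed.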
Pros:
- Thousands of pre-built ZIM archives available (Wikipedia, Arch Wiki, Stack Exchange, etc.)
- Ultra-lightweight: 128 MB RAM, runs on Raspberry Pi
- Zero configuration — point at ZIM files and start
- Full-text search built into ZIM format
- Multi-architecture support (amd64, arm64, armv7, armv6)
- No internet required after initial ZIM download
Cons:
- Cannot archive custom URLs (pre-built content only)
- ZIM library is curated — not every website is available
- No API for programmatic control
- Large ZIM files require significant disk space (full Wikipedia: 100+ GB)
Best for: Offline reference libraries. Schools, homelab knowledge bases, disaster preparedness, or anyone who wants Wikipedia available without internet.
[Read our full guide: How to Self-Host Kiwix]
Comparison Table
| Feature | ArchiveBox | Kiwix |
|---|---|---|
| Primary purpose | Archive specific URLs | Serve pre-built site archives |
| Content source | Any public URL | Kiwix Foundation ZIM library |
| Output formats | HTML, PDF, screenshot, WARC, text | ZIM (browsable via HTTP) |
| JavaScript rendering | Yes (Chromium) | N/A (pre-rendered) |
| Full-text search | Yes | Yes |
| RAM (idle) | 300–500 MB | 128–256 MB |
| RAM (active) | 1–2 GB | 256–512 MB |
| Docker image | archivebox/archivebox:0.8.5rc52 | ghcr.io/kiwix/kiwix-tools:3.8.1 |
| Runs on Raspberry Pi | Possible but slow | Yes, designed for it |
| License | MIT | GPL-3.0 |
How to Choose
Want to save your own bookmarks and web pages? → ArchiveBox. Of the tools covered here, it’s the only one that archives arbitrary URLs with JavaScript rendering.
Want offline Wikipedia and reference sites? → Kiwix. Nothing else serves pre-built website archives this efficiently.
Want both? Run them together. Combined idle RAM is under 1 GB. ArchiveBox handles your personal web archiving, Kiwix handles reference libraries.
Honorable Mentions
Wallabag (guide) and Linkwarden (guide) are bookmark managers with article saving. Wallabag extracts and stores article text; Linkwarden also keeps a screenshot and PDF of each link. Neither produces WARC files, and both are built for read-later use rather than full-page archiving. If you just want to save articles to read later, they’re simpler alternatives to ArchiveBox.
Paperless-ngx (guide) handles document archiving (PDFs, scanned documents) rather than web archiving. Different use case but complementary — ArchiveBox for web pages, Paperless-ngx for documents.
Frequently Asked Questions
How much disk space does web archiving require?
It depends on what you archive. ArchiveBox saves each URL in multiple formats (HTML, PDF, screenshot, WARC), and a typical web page uses 5–20 MB across all formats, so archiving 1,000 pages requires roughly 5–20 GB. Kiwix ZIM files vary widely: English Wikipedia is ~100 GB, Stack Overflow is ~40 GB, but many smaller reference sites are under 1 GB each.
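For a rough capacity plan, multiply the per-page figure by the number of pages you expect to archive. The snippet below is a back-of-the-envelope sketch that uses the 5–20 MB per-page range quoted above as its defaults; the function name is hypothetical, and real usage depends on which output formats you enable and how media-heavy the pages are.

```python
def estimate_archive_size_gb(page_count, mb_per_page_low=5, mb_per_page_high=20):
    """Return (low, high) disk estimates in GB, using 1 GB = 1,000 MB."""
    low_gb = page_count * mb_per_page_low / 1000
    high_gb = page_count * mb_per_page_high / 1000
    return low_gb, high_gb


if __name__ == "__main__":
    for pages in (100, 1_000, 10_000):
        low, high = estimate_archive_size_gb(pages)
        print(f"{pages:>6} pages: {low:g}-{high:g} GB")
```

Swap in your own per-page numbers once you have archived a few dozen pages and can measure the actual average.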
Can ArchiveBox archive paywalled articles?
Yes, if you’re logged in. ArchiveBox renders pages with Chromium, so if you point it at a Chrome profile that holds your logged-in session (via the CHROME_USER_DATA_DIR setting), it can access and archive content behind paywalls. This works for most sites but not those with aggressive anti-scraping measures.
What’s the difference between web archiving and bookmarking?
Bookmarking saves a link. Web archiving saves the actual page content. If the original site goes down, changes, or disappears, a bookmark is useless, while an archive still holds the content. Tools like Wallabag and Linkwarden sit in between: they save article text (and, in Linkwarden’s case, a screenshot and PDF) but don’t produce WARC files.
Can I run Kiwix on a Raspberry Pi?
Yes. Kiwix was designed for exactly this use case: providing offline reference libraries in resource-constrained environments. It runs on a Raspberry Pi 3 or newer, and the server itself needs only about 128–256 MB of RAM. The main constraint is storage: you need enough disk space for the ZIM files you want to serve.
Is WARC the best format for long-term preservation?
WARC (Web ARChive) is the gold standard. It’s the same format used by the Internet Archive’s Wayback Machine and is an ISO standard (ISO 28500:2017). WARC captures the complete HTTP exchange including headers, cookies, and redirects. For maximum preservation, ArchiveBox’s default of saving in multiple formats simultaneously (HTML + PDF + screenshot + WARC) is ideal.
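To make "the complete HTTP exchange" concrete, here is a hand-rolled sketch of a single WARC response record: a version line, a few named header fields, a blank line, the raw HTTP response (headers, cookies, and body included), and a two-CRLF terminator. It is illustrative only. The function name and the sample response are made up, and real archiving tools write richer records (request records, digests, compression) than this.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri, http_response_bytes):
    """Serialize one WARC 'response' record: header lines, blank line, block, two CRLFs."""
    headers = [
        "WARC/1.1",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http;msgtype=response",
        # Content-Length counts the bytes of the block that follows the blank line.
        f"Content-Length: {len(http_response_bytes)}",
    ]
    header_block = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return header_block + http_response_bytes + b"\r\n\r\n"


# Made-up HTTP response: status line, headers (including a cookie), then the body.
SAMPLE_HTTP_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Set-Cookie: session=abc123\r\n"
    b"\r\n"
    b"<html><body>Hello</body></html>"
)

if __name__ == "__main__":
    record = build_warc_response_record("https://example.com/", SAMPLE_HTTP_RESPONSE)
    print(record.decode("utf-8"))
```

Because the server's response is stored byte for byte, a replay tool can later reconstruct what the browser originally received, which a PDF or screenshot cannot do.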
Can ArchiveBox archive an entire website recursively?
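Only to a limited extent. The --depth flag makes ArchiveBox follow links from the starting page: archivebox add --depth=1 https://example.com archives the page plus all directly linked pages. The releases we are aware of accept only --depth=0 and --depth=1, so check archivebox add --help for the values your version allows before planning a deeper crawl. Even at depth 1, a link-heavy page can consume significant disk space and take hours.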
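If you script your archiving, a thin wrapper can keep the depth setting from being raised by accident. The sketch below builds the archivebox add command shown above and, as a guardrail of its own, refuses any depth other than 0 or 1. The function names are hypothetical, and running the command assumes the archivebox CLI is installed and that the current directory is an initialized ArchiveBox data folder.

```python
import subprocess


def build_archivebox_add_command(url, depth=0):
    """Return the argv list for `archivebox add`, allowing only depth 0 or 1."""
    if depth not in (0, 1):
        raise ValueError("this sketch only allows depth 0 or 1 to keep crawls bounded")
    return ["archivebox", "add", f"--depth={depth}", url]


def archive_url(url, depth=0):
    # Runs the CLI in the current working directory and raises if it exits non-zero.
    subprocess.run(build_archivebox_add_command(url, depth), check=True)


if __name__ == "__main__":
    # Only prints the command; call archive_url(...) to actually run it.
    print(build_archivebox_add_command("https://example.com", depth=1))
```

Keeping the command builder separate from the function that executes it lets you log or dry-run a batch of URLs before committing disk space to them.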
How do I access my archives if the server goes down?
ArchiveBox stores archives as static files (HTML, PDF, screenshots) that are readable without the application running. You can browse the archive directory directly. WARC files can be replayed with tools like replayweb.page. Kiwix ZIM files can be opened with the Kiwix desktop reader on any platform.