<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Browsertrix Crawler on Cloudron]]></title><description><![CDATA[<p dir="auto">"Browsertrix Crawler is a simplified (Chrome) browser-based high-fidelity crawling system, designed to run a complex, customizable browser-based crawl in a single Docker container. Browsertrix Crawler uses puppeteer-cluster and puppeteer to control one or more browsers in parallel."</p>
<p dir="auto">One brilliant use for this tool is demonstrated by Zimit, which lets you nominate a website, crawl it, and then archive its webpages into a .zim file that is readable offline in Kiwix.</p>
<p dir="auto"><a href="https://github.com/webrecorder/browsertrix-crawler" target="_blank" rel="noopener noreferrer nofollow ugc">https://github.com/webrecorder/browsertrix-crawler</a><br />
AGPL v3<br />
A Docker image is available</p>
<p dir="auto"><a href="https://webrecorder.net/tools#browsertrix" target="_blank" rel="noopener noreferrer nofollow ugc">https://webrecorder.net/tools#browsertrix</a></p>
<p dir="auto">Zimit:<br />
<a href="https://youzim.it" target="_blank" rel="noopener noreferrer nofollow ugc">https://youzim.it</a><br />
Kiwix:<br />
<a href="https://kiwix.org" target="_blank" rel="noopener noreferrer nofollow ugc">https://kiwix.org</a><br />
.zim:<br />
<a href="https://www.openzim.org/wiki/ZIM_file_format" target="_blank" rel="noopener noreferrer nofollow ugc">https://www.openzim.org/wiki/ZIM_file_format</a></p>
<p dir="auto">This is a complex piece of software, and the busy maintainer would like to help make it easier to use. It might be possible for open instances of Browsertrix Crawler to help scale up the power of a crawl on larger websites. I suppose it might also be possible to share crawl results between co-operating instances at some stage.</p>
<p dir="auto">For example, you can create a dump of all of Wikipedia, about 60 GB compressed into a .zim file, and then browse it offline.</p>
]]></description><link>https://forum.cloudron.io/topic/6665/browsertrix-crawler-on-cloudron</link><generator>RSS for Node</generator><lastBuildDate>Fri, 15 May 2026 13:48:31 GMT</lastBuildDate><atom:link href="https://forum.cloudron.io/topic/6665.rss" rel="self" type="application/rss+xml"/><pubDate>Sat, 19 Mar 2022 01:37:41 GMT</pubDate><ttl>60</ttl></channel></rss>