Browsertrix Crawler on Cloudron
"Browsertrix Crawler is a simplified (Chrome) browser-based high-fidelity crawling system, designed to run a complex, customizable browser-based crawl in a single Docker container. Browsertrix Crawler uses puppeteer-cluster and puppeteer to control one or more browsers in parallel."
One brilliant use for this tool is demonstrated by Zimit, which enables you to nominate a website, crawl it and then archive its webpages into .zim file which is readable offline in Kiwix.
Docker Image is available
This is a complex piece of software and the busy maintainer would like to help make it easier to use. It might be possible for open instances of Browsertix-Crawler to help scale-up the power of a crawl on larger websites. I suppose it might be possible to share results of crawls between co-operating instances too, at some stage.
For example, you can create a dump of all of Wikipedia, about 60GB, compressed into a .zim, and then browse it offline.