Firecrawl on Cloudron - Turn any site into LLM data by web scraping
-
[EDITED by Mod]
- Main Page: https://www.firecrawl.dev
- Git: https://github.com/mendableai/firecrawl
- Licence: GNU Affero General Public License v3.0
- Docker: Yes https://github.com/mendableai/firecrawl/blob/main/docker-compose.yaml
- Demo: https://www.firecrawl.dev/playground?url=https%3A%2F%2Fcloudron.io&mode=scrape
-
Summary:
Firecrawl (https://www.firecrawl.dev) is a self-hostable web scraping tool that prepares data in an LLM-readable format.
Crawl and convert any website into LLM-ready markdown or structured data. Built by Mendable.ai and the Firecrawl community. Includes powerful scraping, crawling and data extraction capabilities.
This repository is in its early development stages. We are still merging custom modules in the mono repo. It is not yet completely ready for full self-hosted deployment, but you can already run it locally.
- Notes:
Cloudron doesn't have a self-hosted scraper yet, so maybe this could be a good addition.
Here is the self-hosting guide: https://github.com/mendableai/firecrawl/blob/main/SELF_HOST.md
- Alternative to / Libhunt link: e.g.
- Screenshots:
-
Hey, @ekevu123, thank you for this brilliant app wish! I really hope this is supported on Cloudron soon.
I have heavily edited your initial post to try and use the new template that is being developed for the App Wishlist forum.
- What do you think about the new appearance of your post?
- Is it very objectionable to have somebody mod your post like this?
- Do you have any suggestions about doing this in the future?
Thanks!
-
@ekevu123 The template is posted at https://forum.cloudron.io/topic/12472/please-use-this-template-to-make-an-app-wishlist-request by @LoudLemur. It looks like a good idea to have posts in this category formatted a certain way. In other parts of the forum, moderators generally don't edit posts (only obvious typos and language).
I am hoping people don't consider it rude if moderators edit the posts in the App Requests Category alone. Besides, the original poster gets reputation (the up arrow) anyway.
-
As I said, I would mark it accordingly; then it should be fine. I didn't know about the template, but I will try to use it next time.
-
Definitely needed! This would catapult the possibilities for N8N & Cloudron, which could leverage its capabilities in big ways!
-
I second this. Firecrawl would be a great addition to the App Library.
-
Has anyone used Firecrawl self-hosted? They describe one main difference between the self-hosted and cloud variants: the cloud version rotates IP addresses, so it gets around blockers more easily. I have never used Firecrawl self-hosted; has this been an issue for anyone?
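For anyone who wants to test this against their own instance, here is a minimal sketch of a scrape request. The base URL (`http://localhost:3002`), the `/v1/scrape` path and the payload fields are assumptions taken from my reading of the project's self-hosting guide, not something confirmed in this thread, and the helper names are made up for illustration. Requests sent this way leave from the server's own IP address, so it should show fairly quickly whether a given site blocks it.

```python
import json
import urllib.request

# Assumption: a self-hosted Firecrawl instance listening locally; adjust to your setup.
FIRECRAWL_BASE_URL = "http://localhost:3002"


def build_scrape_request(target_url, base_url=FIRECRAWL_BASE_URL):
    """Build (but do not send) a POST request asking for markdown output."""
    # Assumption: the endpoint accepts a JSON body with "url" and "formats".
    payload = {"url": target_url, "formats": ["markdown"]}
    return urllib.request.Request(
        url=f"{base_url}/v1/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(target_url, timeout=60):
    """Send the request and return the decoded JSON response."""
    request = build_scrape_request(target_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints what would be sent; call scrape() once an instance is running.
    req = build_scrape_request("https://cloudron.io")
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it makes it easy to inspect the URL and body before pointing it at a real instance.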
-
A new release of Firecrawl has become available:
https://github.com/mendableai/firecrawl/releases/tag/v1.11.0
There are lots of new improvements:
Firecrawl v1.11.0 is here!
Major Features:
- Launched our Firecrawl Index
- Speed up scrapes 5x if opted in
- Improved Activity Logs
- View webhook events
- Active crawl management
- Fire Enrich Example (Open Source Clay)
- Community Java SDK
- and a lot more
Features:
- Improved Playwright tests and webhook test coverage
- Added GET /crawl/ongoing endpoint
- Introduced tag support in change tracking
- Added integration field to jobs and propagated through queue worker
- Parallel testing for runpod v2 and updated mu
- Ported queryIndexAtSplitLevel to RPC
- Enhanced SDK with index and missing parameters
- Removed redundant GCS check to improve performance
- Added credits_billed field across pipeline
- Enabled domain-level index splitting for better map querying
- Used index in search and extract operations
- Removed unused index columns
Fixes & Improvements:
- Fixed crawl pre-finishing logic
- Refactored callWebhook and added logging
- Improved index testing (FIR-2214)
- Fixed JS SDK tests
- Clarified scrape options usage in README
- Fixed missing PLAYWRIGHT_MICROSERVICE_URL in env example
- Improved concurrency limit notification emails
- Removed query param sanitization that broke extract