Installing Gotenberg and Tika was simpler than I expected (if you have spare VPSes)!
-
Someone HAS shared this info already, but I wanted to share it again.
I really wanted to upload Word files. But obviously they won't get processed unless you have Gotenberg and Tika installed. Thanks to the post above from @ChristopherMag, I thought I'd try to get those installed.
My challenges: I didn't want to install them on the VPS running Cloudron because I don't know if that would mess it up. Conceptually, I also wasn't sure what address to use for an install on the same VPS as Cloudron, since they wouldn't be in the same Docker world (I don't know what it's called) as Cloudron. So, I looked at my other VPSes.
VPS 2 runs CapRover, and CapRover offers Gotenberg. My first attempt failed with a 504 error, which is a gateway timeout. So I lengthened the access times and wait times for Gotenberg in the CapRover install form, and the second time it worked. I then copied and pasted the four lines that @ChristopherMag had in his post and edited them for my Gotenberg url*.
Then, I needed another VPS since the RAM on VPS 2 was close to max with that latest app. Fortunately, I have other VPSes! One of them is running a LAMP setup plus a Presearch install. I figured that adding Tika to that should be fine, and if something borked up, it wouldn't be so tragic a loss. I followed https://github.com/apache/tika-docker, running
docker run -d -p 127.0.0.1:9998:9998 apache/tika:<tag>
(replacing localhost with my VPS' IP). It worked. Then I entered that url in the paperless-ngx paperless.conf. Voila! It all works.
However, I had earlier tried to install Gotenberg using Easypanel (on yet another VPS - yeah, I'm a LEB fan). The Easypanel dashboard made it seem like I needed to have port 3000 as part of the url for paperless-ngx, so I entered the url with :3000, but it never worked.
So when I found I could use CapRover for Gotenberg, I saw that its dashboard just gave the domain as the url, minus the port. So, I thought, "OK, I will try that." Now, in my paperless.conf, the Tika url includes the port 9998, but the Gotenberg url doesn't, and it works. I wonder whether I should have left the port off when using Easypanel too, but I am not going to try because right now it's all working.
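One likely explanation for the port difference, offered as an assumption rather than something confirmed in this thread: a URL with no explicit port falls back to the scheme's default (80 for http, 443 for https), which is where a panel's reverse proxy normally listens before forwarding to the container. The sketch below only shows which port each URL implies. The hostnames and IP are hypothetical placeholders. PAPERLESS_TIKA_ENDPOINT and PAPERLESS_TIKA_GOTENBERG_ENDPOINT are the paperless-ngx setting names, but whether they match the four lines from the earlier post is also an assumption.

```python
from urllib.parse import urlsplit

# Default ports implied by a URL scheme when no port is written out.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the TCP port a client would connect to for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        # An explicit ":9998"-style port always wins.
        return parts.port
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"no default port known for scheme {parts.scheme!r}")
    return DEFAULT_PORTS[parts.scheme]


# Hypothetical values, for illustration only.
endpoints = {
    "PAPERLESS_TIKA_ENDPOINT": "http://203.0.113.10:9998",
    "PAPERLESS_TIKA_GOTENBERG_ENDPOINT": "https://gotenberg.example.com",
}

for name, url in endpoints.items():
    print(f"{name} -> port {effective_port(url)}")
```

If that reading is right, the Tika url needs :9998 because the container's port is published directly, while the Gotenberg url goes through the proxy on the default port.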
I guess I'm surprised because in the past, when I've tried to install more than one thing using Docker by hand, it never worked. I always had to connect them somehow (I think it was more docker-compose at the time), and I could never figure it out. But maybe I understand it a bit better now, and I was pretty sure that plopping in Tika beside Presearch shouldn't mess anything up. I'm glad I could use CapRover for Gotenberg though, since Gotenberg needs more working parts than Tika to function. My only concern is a warning from the Tika page:
"In the example above, we recommend binding the server to localhost because Docker alters iptables and may expose your tika-server to the internet. If you are confident that your tika-server is on an isolated network you can simply run:"
I need to do some reading to see if using the IP of the VPS might weaken that server somehow.
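To make the concern in that warning concrete, here is a minimal sketch that classifies the host part of a Docker -p publish spec. It assumes IPv4 specs of the form host:hostport:containerport or hostport:containerport, and relies on Docker publishing on all interfaces when no host is given. The helper names are hypothetical, and 8.8.8.8 is only a stand-in for a VPS's public IP.

```python
import ipaddress


def publish_host(spec: str) -> str:
    """Extract the bind address from an IPv4 '-p' spec.

    'host:hostport:containerport' -> host
    'hostport:containerport'      -> '0.0.0.0' (all interfaces)
    """
    parts = spec.split(":")
    if len(parts) == 3:
        return parts[0]
    return "0.0.0.0"


def exposure(bind_host: str) -> str:
    """Roughly describe who can reach a port bound to this address."""
    addr = ipaddress.ip_address(bind_host)
    if addr.is_loopback:
        return "loopback only"
    if addr.is_unspecified:
        return "all interfaces"
    if addr.is_private:
        return "private network"
    return "public address"


for spec in ("127.0.0.1:9998:9998", "8.8.8.8:9998:9998", "9998:9998"):
    print(spec, "->", exposure(publish_host(spec)))
```

Anything other than "loopback only" means the Tika port may be reachable from outside the machine unless a firewall or provider-level rule restricts it, which is the situation the quoted warning describes.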
-
@scooke maybe you can clarify one thing which is not clear in my mind
I put PDF and JPG/PNG into Paperless because these formats are usually not edited, they're semi-frozen.
XLS(X) and DOC(X) are often more living documents with edits, especially XLS(X). Does that mean you re-upload into Paperless when you make a local edit, and delete the old one? Or do you only upload MS documents which are "finished" and won't change?
I think Paperless is great and use it for "documents of record", invoices, agreements etc. I tend to think Nextcloud (or Seafile in my case) is more appropriate for living documents.
Interested in your and other views.
-
@timconsidine Yes, definitely for "finished" documents.