indexing of office documents?
-
@girish said in indexing of office documents?:
I think we have to investigate what nextcloud needs for FTS as well
On that, I recently spotted that they now at least mention Solr under Platform Apps over on https://github.com/nextcloud/fulltextsearch, but as far as I can tell from the linked wiki (https://github.com/nextcloud/fulltextsearch/wiki) there is still only an Elasticsearch Platform App to date.
-
@ChristopherMag do you or anyone else have experience using this kind of setup for macOS document formats like Pages and Numbers?
-
@timconsidine It looks like Apache Tika supports the document formats from the iWork suite, such as Pages.
I tried to upload a .pages file to paperless-ngx with Tika and Gotenberg configured, and paperless popped up a failure message with the error "File type application/zip not supported".
I believe the signals.py file in the paperless-ngx project would need to add support for the various iWork suite formats to resolve this error and get this working, assuming you already have Tika and Gotenberg set up and working with paperless-ngx.
You could probably open an issue in the paperless-ngx repository on GitHub and see if they can assist with adding support for this.
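For anyone wondering why the error mentions application/zip: as far as I know, a current .pages file is itself a zip archive, so generic type detection only sees the zip container. Below is a rough Python sketch of how one could tell such a file apart from an ordinary zip. The function name `looks_like_iwork` is made up for illustration, and the check for `.iwa` entries under `Index/` is an assumption about the newer iWork layout, not something taken from paperless-ngx or Tika.

```python
import io
import zipfile


def looks_like_iwork(data: bytes) -> bool:
    """Guess whether raw bytes are an iWork document stored as a zip archive."""
    buffer = io.BytesIO(data)
    # Anything that is not a zip archive at all cannot match.
    if not zipfile.is_zipfile(buffer):
        return False
    with zipfile.ZipFile(buffer) as archive:
        names = archive.namelist()
    # Assumption: newer iWork files keep their parts as .iwa entries under Index/.
    return any(n.startswith("Index/") and n.endswith(".iwa") for n in names)
```

This is only a heuristic for illustration; older iWork files use a different layout and would not match.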
-
Any updates on this?
-
@necrevistonnezr nothing yet...
-
Hi, is there an update on this? I tried @ChristopherMag's suggestion. However, I get connection refused for Tika. Is this something to do with iptables? How can I allow the paperless app to connect to the container?
EDIT: I should add that I have installed the Tika and Gotenberg Docker containers on the same system as Cloudron.
-
I don't think we have an update on this yet. Possibly your containers are not within the same Docker network on the system? Either way, adding Docker containers alongside Cloudron will break on Cloudron updates, so this is not very useful to investigate as such. Have you tried running the required services on a separate, isolated server instead?
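Whichever host ends up running the services, a quick reachability check from the machine running paperless can help tell a network problem apart from a configuration problem. A minimal Python sketch, where `can_connect` is just an illustrative helper name:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_connect("tika.example.com", 9998)` and `can_connect("gotenberg.example.com", 3000)` would probe the default Tika and Gotenberg ports; the hostnames here are placeholders, and your ports may differ if you remapped them.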
-
@nebulon, thanks for your reply. Tried both ways, containers outside and inside the Cloudron network. Good to know doing the latter will break updates. Removed those containers now. Is it difficult to pre-install these containers via the app itself? Alternatively, maybe provide them as separate Cloudron apps which can be linked to paperless?
-
FYI, Gotenberg publishes Cloudron-specific images now. Not sure of the history of how or why that was started, but I would assume those are meant to be used as a Cloudron app, though I don't see any app in the app store for it.
PS, don't use these images for your own Gotenberg instance that you're integrating with paperless; they exist hopefully to make it easier one day to run Gotenberg on Cloudron directly.
-
@ChristopherMag It says ‘cloudrun’ - are you sure it’s just a typo, or does it mean something like ‘cloud-run’?
-
@necrevistonnezr Wow, you're right, those images probably have nothing to do with Cloudron!
Thanks for pointing that out. For all others, please disregard my previous comment.