PDF to Text/RTF fails
-
-
@girish There is also a problem when converting any file to PDF.
java.io.IOException: Command process failed with exit code 1. Error message: /usr/local/bin/unoconv:19: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
  from distutils.version import LooseVersion
unoconv: Cannot find a suitable office installation on your system.
ERROR: Please locate your office installation and send your feedback to:
       http://github.com/dagwieers/unoconv/issues
    at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:93)
    at stirling.software.SPDF.controller.api.converters.ConvertOfficeController.convertToPdf(ConvertOfficeController.java:42)
    at stirling.software.SPDF.controller.api.converters.ConvertOfficeController.processPdfWithOCR(ConvertOfficeController.java:74)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:207)
    at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:152)
    at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:118)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:884)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:797)
    at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1081)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:974)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1011)
    at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:914)
    at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:590)
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:885)
    at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:658)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:205)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
    at stirling.software.SPDF.config.MetricsFilter.doFilterInternal(MetricsFilter.java:41)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
    at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
    at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
    at org.springframework.web.filter.ServerHttpObservationFilter.doFilterInternal(ServerHttpObservationFilter.java:109)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
    at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:166)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:90)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:482)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:115)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:93)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
    at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:738)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:341)
    at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:390)
    at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63)
    at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:894)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1741)
    at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52)
    at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
    at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.base/java.lang.Thread.run(Thread.java:833)
-
-
For me, it’s doing something but doesn’t let me access the file after conversion - nothing happens:
Jul 01 23:05:45 Running command: soffice --infilter=writer_pdf_import --convert-to doc --outdir /tmp/output_12731918207241714492 /tmp/input_12820457046940439487.pdf
Command output:
Jul 01 23:05:45 convert /tmp/input_12820457046940439487.pdf -> /tmp/output_12731918207241714492/input_12820457046940439487.doc using filter : MS Word 97
Jul 01 23:06:04 Running command: soffice --infilter=writer_pdf_import --convert-to docx --outdir /tmp/output_13110194667904771198 /tmp/input_2013883566298576050.pdf
Command output:
Jul 01 23:06:04 convert /tmp/input_2013883566298576050.pdf -> /tmp/output_13110194667904771198/input_2013883566298576050.docx using filter : MS Word 2007 XML
Jul 01 23:06:35 Running command: soffice --infilter=writer_pdf_import --convert-to rtf --outdir /tmp/output_12759007429764915141 /tmp/input_5102818123428729636.pdf
Command output:
Jul 01 23:06:35 convert /tmp/input_5102818123428729636.pdf -> /tmp/output_12759007429764915141/input_5102818123428729636.rtf using filter : Rich Text Format
-
@LoudLemur said in PDF to Text/RTF fails:
I tried this yesterday, pdf-to-rtf, pdf-to-txt. Both failed, and created empty files instead.
Same.
@nebulon said in PDF to Text/RTF fails:
Were there any errors shown
No errors were shown on screen nor in the app logs.
@nebulon said in PDF to Text/RTF fails:
And just to be sure, have you tried other pdf documents?
Yes, I've tried numerous PDFs all with the same empty .txt file as the result.
-
@girish said in PDF to Text/RTF fails:
Yeah, I have never got the conversion to work.
So I guess this shouldn't be marked as solved.
@girish said in PDF to Text/RTF fails:
I think it's probably an upstream bug.
If that's the case perhaps @froodle can help?
-
@jdaviescoates You can also try if running
soffice --infilter=writer_pdf_import --convert-to txt:Text --outdir /tmp/invoice Invoice.pdf
works. For me, it doesn't produce anything even on my laptop.
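To make that check repeatable, here is a minimal Python sketch that runs the same soffice command and reports whether the output file is missing, empty, or has content. The function names are made up for illustration, it assumes soffice is on the PATH, and the output file name (input name with the new extension) is only inferred from the logs earlier in this thread.

```python
import subprocess
from pathlib import Path


def build_convert_command(pdf_path, out_dir, target="txt:Text"):
    """Build the soffice invocation quoted above as an argument list."""
    return [
        "soffice",
        "--infilter=writer_pdf_import",
        "--convert-to", target,
        "--outdir", str(out_dir),
        str(pdf_path),
    ]


def expected_output(pdf_path, out_dir, target="txt:Text"):
    """Assumed output path: the input's stem plus the target extension."""
    extension = target.split(":", 1)[0]  # "txt:Text" -> "txt"
    return Path(out_dir) / (Path(pdf_path).stem + "." + extension)


def describe_result(output_path):
    """Classify the outcome as 'missing', 'empty', or 'ok'."""
    output_path = Path(output_path)
    if not output_path.exists():
        return "missing"
    if output_path.stat().st_size == 0:
        return "empty"
    return "ok"


def convert_and_check(pdf_path, out_dir, target="txt:Text", timeout=120):
    """Run soffice and return (exit code, outcome, stderr)."""
    completed = subprocess.run(
        build_convert_command(pdf_path, out_dir, target),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    outcome = describe_result(expected_output(pdf_path, out_dir, target))
    return completed.returncode, outcome, completed.stderr
```

Calling `convert_and_check("Invoice.pdf", "/tmp/invoice")` would then distinguish "no file was written at all" from "an empty file was written", which are the two symptoms reported in this thread.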
-
@girish said in PDF to Text/RTF fails:
@jdaviescoates You can also try if running
soffice --infilter=writer_pdf_import --convert-to txt:Text --outdir /tmp/invoice Invoice.pdf
works. For me, it doesn't produce anything even on my laptop.
When I try that on my laptop I just get:
Warning: failed to launch javaldx - java may not function correctly
-
@jdaviescoates said in PDF to Text/RTF fails:
Warning: failed to launch javaldx - java may not function correctly
I made this error go away by installing a whole bunch of packages. Some openoffice java support etc. I don't recall which ones now.
-
@girish This step does no OCR, if that's what you're after; you would need to run the OCR step first.
In that case the conversion would only carry over the image, which txt can't hold.
However, I can try to debug this and see if I can reproduce it, if you mean a PDF containing actual text.
It could very likely be an issue on the Stirling PDF side.
@froodle said in PDF to Text/RTF fails:
However I can try debug this to see if I can reproduce if you are on about a pdf containing actual text
Could very likely be issue on stirling pdf side
I've tried numerous PDFs with actual text, and all of them failed, just resulting in a blank text file.