Cloudron Forum

Email healthcheck notification: "Relay error: Connect to smtp.live.com timed out"

    girish
    Staff
    wrote on last edited by girish
    #12

    Indeed, smtp.live.com is apparently gone or does not respond to port 25 anymore.

    Some background: Cloudron tries to connect to some well-known servers on port 25 for diagnostic purposes. It uses this to check whether outbound port 25 is allowed on the VPS. It's not really used for anything else. The list of servers comes from https://git.cloudron.io/cloudron/box/-/blob/master/src/mail.js#L172

    The warning can be ignored for the moment. I have removed it in the next release.

    I think we will try to create a smtpdiag.cloudron.io or something to test port 25 reachability.
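
As a rough illustration of the diagnostic described above (this is not Cloudron's actual code, which lives in the linked mail.js), an outbound port 25 check can be a plain TCP connect with a timeout. The function name `probe_port25`, the 5-second timeout, and the message wording are assumptions made for this sketch.

```python
import socket
from typing import Optional


def probe_port25(host: str, port: int = 25, timeout: float = 5.0,
                 connect=socket.create_connection) -> Optional[str]:
    """Try a TCP connection; return None on success or an error message.

    `connect` is injectable (hypothetical parameter for this sketch) so the
    function can be exercised without touching the network.
    """
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError as exc:  # covers timeouts, refusals and DNS failures
        return f"Connect to {host} failed: {exc}"
    conn.close()
    return None
```

Calling `probe_port25("smtp.example.com")` (a placeholder host) from the VPS returns `None` when the connection succeeds. A failure only shows that this one host could not be reached, which is the ambiguity discussed in the rest of this thread.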

      d19dotca
      wrote on last edited by d19dotca
      #13

      @girish Hi Girish! I think it's a good idea to add a Cloudron-controlled SMTP server for testing purposes. I would still suggest a two-check failure workflow to avoid false positives like this, as that is best practice in similar scenarios outside of Cloudron (for example, liveness probes in Kubernetes, which generally require multiple consecutive failures before acting). If it's too much work I understand, but I still think it would be really helpful for these types of scenarios, so I'd love to see health checks done in a way that avoids false positives like this one.

      --
      Dustin Dauncey
      www.d19.ca

        girish
        Staff
        wrote on last edited by girish
        #14

        @d19dotca The thing is, since we don't control external services, it's hard to tell why something failed. Did they blacklist the server IP? Was it because outbound port 25 is blocked? Was it because the service died temporarily or even permanently (as in the case of this post)?

        At least, when I wrote the code, I didn't expect these services to go away 🙂 By now, all but 2 services remain. We started with around 5 services, 5 years ago. Anyway, I have now deployed port25check.cloudron.io and the code from the next release will use that to check connectivity. Since we don't blacklist there and will keep it running, a failed connection means we can be fairly certain that the VPS outbound port 25 is blocked. Let's see.

          d19dotca
          wrote on last edited by d19dotca
          #15

          @girish said in Email healthcheck notification: "Relay error: Connect to smtp.live.com timed out":

          Thing is since we don't control external services, it's hard to tell why something failed.

          For sure, but that's also why double-checking in the event of a failure would be best, to avoid false positives instead of one failure generating a ton of alerts. 😉 If a single failure caused Cloudron to skip that server for a few hours, rate-limiting or blacklisting would have time to resolve on its own, and there would be no need to wait for an entire new release just to update the list of SMTP servers as they change. If one fails but another succeeds, we automatically know outbound port 25 is not blocked.
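
The "skip a failed server for a few hours" idea could look roughly like the sketch below. Nothing here is Cloudron code: the class name `ProbeServerPool`, its methods, and the 3-hour default cooldown are all hypothetical choices for illustration.

```python
import time
from typing import Dict, List, Optional


class ProbeServerPool:
    """Track probe servers and skip the ones that failed recently."""

    def __init__(self, servers: List[str], cooldown_seconds: float = 3 * 3600):
        self.servers = list(servers)
        self.cooldown_seconds = cooldown_seconds  # assumed value, not from the thread
        self._failed_at: Dict[str, float] = {}

    def mark_failed(self, server: str, now: Optional[float] = None) -> None:
        # Remember when the server last failed so it can be skipped for a while.
        self._failed_at[server] = time.time() if now is None else now

    def mark_ok(self, server: str) -> None:
        # A successful probe clears any earlier failure.
        self._failed_at.pop(server, None)

    def usable(self, now: Optional[float] = None) -> List[str]:
        # Servers that never failed, or whose cooldown has expired.
        now = time.time() if now is None else now
        return [s for s in self.servers
                if s not in self._failed_at
                or now - self._failed_at[s] >= self.cooldown_seconds]
```

With this shape, a server that disappears for good simply stays in cooldown between retries rather than raising an alert on every check, as long as at least one other server in the pool still answers.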

          I have now deployed port25check.cloudron.io and the code from next release will use that to check connectivity. Since, we don't blacklist there and will keep it running, we can be fairly certain that the VPS outbound port 25 is blocked.

          That's awesome and will add to the troubleshooting ability!! Happy to see that too.

          Personally I'd still love to see redundancy in place, as there will certainly be the rare outage on your end too as with other services, but this will at least add a bit more under your control to help lessen the likelihood of false positives which is still a step in the right direction. If I'm banging the drum well past my allotted time on this then that's understandable as it certainly isn't major, just something I'd love to see improved further still. I'll let it go now. 😛

          Thanks for everything you do! 🙂

            girish
            Staff
            wrote on last edited by
            #16

            @d19dotca no worries 🙂 I think the issue is this: let's say we add another external service as a dependency and the connection does not work. What should we show in the UI? That outbound port 25 works, or that it does not? Is it useful to have messages like "We managed to connect to port25check.cloudron.io but not to smtp.live.com" (or any of those combinations)? I suspect users will come back with the same questions/confusion as they do now. At least, the code is currently written with the assumption that connectivity (or not) is a "reliable" indicator of outbound port 25. Maybe I misunderstood what you mean by redundancy.

              d19dotca
              wrote on last edited by d19dotca
              #17

              @girish said in Email healthcheck notification: "Relay error: Connect to smtp.live.com timed out":

              what should we show in the UI? That outbound port 25 works or it does not? Is it useful to have messages like "We managed to connect to port25check.cloudron.io but not to smtp.live.com" (or any of those combinations).

              Good question! I don't actually think anything should be shown in the UI if only one SMTP test fails out of two, as that scenario would imply a false-positive.

              So what I envision is the following (hopefully this explains it better):

              -- Cloudron runs periodic checks on one of several SMTP servers for testing purposes.
              ---- If the check succeeds, then wait for the next check (30-minute interval).
              ---- If the check fails, then run one more test right away (or even 60 seconds later to avoid network blips on the VPS) to a second/different SMTP server to validate the finding.
              ------ If the second SMTP server succeeds, then ignore the initial failure and mark as successful. Possibly make a log entry, but nothing needed in the UI.
              ------ If the second SMTP server fails, log the errors with more details (mention both SMTP servers that were checked and failed). In the UI, show a message similar to Relay error: SMTP connection tests failed. Check if port 25 (outbound) is blocked. View the Cloudron logs for more details.

              I don't really think the exact servers need to be listed in the UI if they're already in the logs. If both SMTP servers fail, it'll be with much higher confidence that port 25 outbound is blocked and that should be the admin's focus. If they can confirm that it's not blocked, then they can use the logs to get more details and run additional tests from their server.
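
The workflow proposed above can be sketched as follows. This is only an illustration of the proposal, not an existing Cloudron feature: `confirmed_check`, `probe`, and `wait` are hypothetical names, and the UI text is the wording suggested in this post.

```python
import logging
from typing import Callable, Optional, Sequence

log = logging.getLogger("port25check")  # hypothetical logger name

UI_MESSAGE = ("Relay error: SMTP connection tests failed. "
              "Check if port 25 (outbound) is blocked. "
              "View the Cloudron logs for more details.")


def confirmed_check(servers: Sequence[str],
                    probe: Callable[[str], Optional[str]],
                    wait: Callable[[], None] = lambda: None) -> Optional[str]:
    """Return None if port 25 looks open, else the message for the UI.

    `probe(server)` returns None on success or an error string on failure.
    `wait` runs before the confirming probe (for example, a 60-second sleep).
    """
    if len(servers) < 2:
        raise ValueError("need at least two servers to confirm a failure")
    first, second = servers[0], servers[1]

    first_error = probe(first)
    if first_error is None:
        return None  # first check passed, nothing else to do

    wait()  # give a short network blip time to clear
    second_error = probe(second)
    if second_error is None:
        # One failure, one success: treat as a false positive, log only.
        log.info("Ignoring single failure: %s", first_error)
        return None

    # Both servers failed: details go to the log, the UI stays generic.
    log.error("Both probes failed: %s; %s", first_error, second_error)
    return UI_MESSAGE
```

Passing the probe as a function keeps the decision logic separate from the network code, so the three outcomes (pass, single failure ignored, double failure reported) can be tested without opening any sockets.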

              That's how I picture it anyways. 🙂 I see that as helping avoid false positives while also providing enough details in the logs for when an issue is actually detected (and more confidently in that case too). The UI can be simplified in a small way to refer the admin to their logs for further details while still suggesting that port 25 may be blocked.

              Side note: I just checked and the "troubleshooting" hyperlink at the bottom of the alert message overall leads to an incorrect spot. May need to be updated to perhaps https://docs.cloudron.io/email/#outbound-smtp or something like that.

                timconsidine
                App Dev
                wrote on last edited by
                #18

                @girish are we supposed to be doing anything with our email dashboards? I have a number of domains which are shown as red, but when I check some of them in that domain's status panel, everything shows as green.
                Not too worried, just not sure what we should be doing.

                  girish
                  Staff
                  wrote on last edited by
                  #19

                  @timconsidine yes, correct, nothing to worry here. It will be fixed in the upcoming update.

                    RoundHouse1924
                    wrote on last edited by
                    #20

                    @girish
                    https://port25check.cloudron.io/ produces an error, as the cert is only for api.cloudron.io

                      girish
                      Staff
                      wrote on last edited by
                      #21

                      @roundhouse1924 That's expected. It's not a website and is not meant to be reached via http/https. It only listens on port 25. You can try telnet port25check.cloudron.io 25.
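
The same manual check can be scripted. A minimal sketch, assuming the server sends a short greeting on connect: the function name `read_greeting` and the 10-second timeout are illustrative and not part of Cloudron.

```python
import socket


def read_greeting(host: str, port: int = 25, timeout: float = 10.0,
                  connect=socket.create_connection) -> str:
    """Connect and return whatever text the server sends first.

    Reading stops when the server closes the connection or when no more
    data arrives within `timeout` (a normal SMTP server keeps the session
    open after its banner). `connect` is injectable for testing.
    """
    chunks = []
    with connect((host, port), timeout=timeout) as conn:
        while True:
            try:
                data = conn.recv(1024)
            except socket.timeout:
                break  # server is waiting for us; keep what we have
            if not data:
                break  # empty read: the server closed the connection
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace").strip()
```

Running `read_greeting("port25check.cloudron.io")` over SSH on the server is roughly equivalent to the telnet command: if it raises a timeout or connection error before any greeting arrives, outbound port 25 is probably blocked.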

                        JOduMonT
                        wrote on last edited by JOduMonT
                        #22

                        Has anybody tried one of these delisting processes?

                        https://sender.office.com

                        This one seems to be specifically for live.com and ...

                        https://support.microsoft.com/en-us/supportrequestform/8ad563e3-288e-2a61-8122-3ba03d6b8d75

                        https://sendersupport.olc.protection.outlook.com/snds/index.aspx

                        I did the first 2. The first is pretty quick: you receive an email and validate whether the IP of your server is in their internal block list.
                        The second is a form that is a little more elaborate: they ask for the error message and whether you have a website related to that domain.

                          jayonrails
                          translator
                          wrote on last edited by
                          #23

                          Hi,

                          I have the same problem on my Cloudron right now:

                          Relay error: Connect to port25check.cloudron.io timed out. Check if port 25 (outbound) is blocked

                          Port 25 is not blocked.

                            nebulon
                            Staff
                            wrote on last edited by
                            #24

                            Can you run telnet port25check.cloudron.io 25 via SSH on your server to see if it works?

                              jayonrails
                              translator
                              wrote on last edited by
                              #25

                              Hi,

                              it does work on my server:

                              telnet port25check.cloudron.io 25
                              Trying 165.227.67.76...
                              Connected to api.cloudron.io.
                              Escape character is '^]'.
                              works
                              Connection closed by foreign host.
                              

                              Is it important? I am using Postmark as a mail relay for all my outgoing mail, so I think it is not necessary to have port 25 open in general, because it is never used?

                                nebulon
                                Staff
                                wrote on last edited by
                                #26

                                If you use a mail relay for all your domains, then this should not be relevant. I do wonder why it tests for it then, and also why the check fails, since the code performs the same kind of connection check. Can you open the mail status tabs on all domains to see if this was just a temporary issue?

                                  jayonrails
                                  translator
                                  wrote on last edited by
                                  #27

                                  @nebulon said in Email healthcheck notification: "Relay error: Connect to smtp.live.com timed out":

                                  If you use a mail relay for all your domains, then this should not be relevant.

                                  Thanks for clarification!

                                  @nebulon said in Email healthcheck notification: "Relay error: Connect to smtp.live.com timed out":

                                  Can you open the mail status tabs on all domains to see if this was just a temporary issue?

                                  I will check and let you know if I find anything.

                                    RoundHouse1924
                                    wrote on last edited by
                                    #28

                                    I had one domain using an external relay and having port 25 closed on the VPS. The above error was present, but disappeared when port 25 was opened.

                                    So, the port 25 check seems to be unnecessary and confusing for domains that use external relays.

                                      girish
                                      Staff
                                      wrote on last edited by
                                      #29

                                      @RoundHouse1924 said in Email healthcheck notification: "Relay error: Connect to smtp.live.com timed out":

                                      So, the port 25 check seems to be unnecessary and confusing for domains that use external relays.

                                      The port 25 check is skipped for domains with a relay. If you find otherwise, please let us know, because that would be a bug. I just tested it with a relay and it is skipped.

                                        RoundHouse1924
                                        wrote on last edited by RoundHouse1924
                                        #30

                                        @girish
                                        The situation I described was with v7.3.6 when I had only one outgoing domain and it used an external relay.

                                        Now with v7.4.1, I have 3 outgoing domains. One via the same external relay; the other 2 using the internal SMTP. Port 25 is open on the VPS and all 3 status lights are green.

                                        So, in order to test your answer, I blocked outgoing Port 25 on the VPS firewall.

                                        As expected, the 2 direct domains go red.

                                        However, the external relay domain's Cloudron status page shows:-
                                        MX record = Current value: [not set]
                                        DMARC record = Current value: [not set]
                                        SMTP Status Outbound SMTP (Relay) = Connection timeout

                                        Looks to me that, with Port 25 closed, the SMTP check is made, but times out.

                                        The puzzler is to know what could be causing the MX and DMARC record checks to fail --- just because Port 25 is closed.

                                        EDIT:
                                        With Port 25 closed, Uptime Kuma and Tiny Tiny RSS cannot do their stuff, so I've now reopened it.

                                          girish
                                          Staff
                                          wrote on last edited by
                                          #31

                                          Strange, why would Uptime Kuma and TTRSS fail with port 25 closed?

                                          Here's what I did:

                                          • Block the outbound port 25 - iptables -A OUTPUT -p tcp --destination-port 25 -j DROP

                                          • Check status:
                                            image.png

                                          • Unblock - iptables -D OUTPUT 1

                                          • Check status again:
                                            image.png
