Email healthcheck notification: "Relay error: Connect to smtp.live.com timed out"

Category: Support (Solved). Tags: email, outbound
      d19dotca wrote (#1, last edited by girish):

      Hello,

      I noticed there was an incident with smtp.live.com about 3 hours ago, according to the Cloudron notifications page. It only affected five domains, but they all shared the same SMTP endpoint, so I suspect there was a blip on Microsoft's side. Just an FYI.

      I'm not concerned because I know it's a false alert, but it did get me thinking: wouldn't it be better to try one more SMTP destination if the healthcheck reports a failure on the first one? That would likely avoid false positives like this one.

      [Screenshot: Cloudron notification showing the smtp.live.com relay error]

      --
      Dustin Dauncey
      www.d19.ca

          d19dotca wrote (#2):

          By the way, I continue to get periodic connection failures specifically to the smtp.live.com server, causing random health check failures even though everything is actually fine.

          Jan 30 00:01:58 box:mail Ignored error - relay : Connect to smtp.live.com timed out.
          Jan 30 00:01:58 box:mail Ignored error - relay : Connect to smtp.live.com timed out.
          Jan 30 00:01:58 box:mail Ignored error - relay : Connect to smtp.live.com timed out.
          Jan 30 00:01:59 box:mail Ignored error - relay : Connect to smtp.live.com timed out.
          Jan 30 00:01:59 box:mail Ignored error - relay : Connect to smtp.live.com timed out.
          Jan 30 00:01:59 box:mail Ignored error - relay : Connect to smtp.live.com timed out.
          

            archos wrote (#3):

            @d19dotca Hi, I was just about to write about this too. I'm having the same problem. So I shouldn't worry; it's nothing serious?

              timconsidine (App Dev) wrote (#4):

              @d19dotca I'm getting a similar notice on some of my domains. I imagine it is a DNS lookup error.

                ccfu wrote (#5):

                I have also been getting random healthcheck failures since yesterday. Each time, one or two mail domains randomly show red; if I refresh the page they show green, but a different one or two will probably show red instead. Mail is working fine, though, and if I check the Status tab of a domain that shows red on the overview page, everything is green. Looks like a DNS error or timeout, as @timconsidine mentioned.

                  d19dotca wrote (#6, last edited by d19dotca):

                  @timconsidine & @ccfu - I don't think this is a DNS error at all (if it were, I'd expect different log entries). This is just a simple timeout. Cloudron knows where smtp.live.com is and tries to connect, but the connection times out (in other words, smtp.live.com isn't responding within the allotted time). I'm pretty sure the issue is on Microsoft's side in this case.
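To make the DNS-versus-timeout distinction concrete, here is a minimal Python sketch, assuming the check is a plain TCP connection to port 25. A DNS problem surfaces as `socket.gaierror` before any connection is attempted, whereas a timeout only happens after the name has resolved and the connect step gets no answer. The helper names `classify_probe_error` and `probe` are hypothetical and not taken from Cloudron's code.

```python
import socket


def classify_probe_error(exc: BaseException) -> str:
    """Map a connection error to a coarse category (hypothetical helper)."""
    if isinstance(exc, socket.gaierror):
        # Name resolution failed: we never learned the server's IP address.
        return "dns"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        # The address was known, but nothing answered before the deadline.
        return "timeout"
    if isinstance(exc, ConnectionRefusedError):
        # The host (or a firewall) actively rejected the connection.
        return "refused"
    return "other"


def probe(host: str, port: int = 25, timeout: float = 10.0) -> str:
    """Attempt a TCP connection; return "ok" or an error category."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return classify_probe_error(exc)
```

With this split, `probe("smtp.live.com")` returning `"timeout"` rather than `"dns"` would match the behaviour described in this post.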

                  @archos - as long as it's intermittent for you, then yes, it should be nothing to worry about. It's likely the same check against smtp.live.com that I'm seeing.

                  @staff - I think it'd be great if we could have some redundancy built in. To my knowledge, this isn't the first time this has happened; free SMTP services sometimes have issues. I'd suggest changing the logic to: "Connect to SMTP A and see if it succeeds. If the connection to SMTP A fails, attempt one more connection, this time to SMTP B, to verify the failure. If SMTP B succeeds, mark the check as a success. If both SMTP A and SMTP B fail, mark it as a failure."
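The proposed A-then-B logic can be sketched in a few lines of Python. This is only an illustration of the suggestion, not Cloudron's implementation: `can_connect` is a hypothetical callable (for example, a TCP connect to port 25), and the server names in the usage note are placeholders.

```python
from typing import Callable, List, Sequence, Tuple


def check_with_confirmation(
    servers: Sequence[str],
    can_connect: Callable[[str], bool],
    max_attempts: int = 2,
) -> Tuple[bool, List[str]]:
    """Return (passed, failed_servers).

    A failure only counts once a second, different server has confirmed it.
    """
    failed: List[str] = []
    for server in servers[:max_attempts]:
        if can_connect(server):
            # One success is enough: outbound port 25 is evidently open.
            return True, failed
        failed.append(server)
    # Every server we tried failed (or no servers were configured).
    return False, failed
```

For example, `check_with_confirmation(["smtp-a.example", "smtp-b.example"], can_connect)` reports a pass if either server answers, and the `failed` list can still be logged for diagnostics.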

                    d19dotca wrote (#7, last edited by d19dotca):

                    FYI - MXToolbox reports the same issue: https://mxtoolbox.com/SuperTool.aspx?action=smtp%3Asmtp.live.com&run=toolpage

                    Connecting to 204.79.197.212
                    1/30/2022 10:56:31 AM Connection attempt #1 - Unable to connect after 15 seconds. [15.02 sec]
                    
                    LookupServer 15082ms
                    

                      d19dotca wrote (#8, last edited by d19dotca):

                      Perhaps the short list of servers here needs a few more entries: https://git.cloudron.io/cloudron/box/-/blob/master/src/mail.js#L171-176

                      Based on some testing, I'd suggest adding these three to the list of servers to check:

                      smtp.mail.yahoo.com (Report: https://mxtoolbox.com/SuperTool.aspx?action=smtp%3Asmtp.mail.yahoo.com&run=toolpage)

                      smtp.aol.com (Report: https://mxtoolbox.com/SuperTool.aspx?action=smtp%3Asmtp.aol.com&run=toolpage)

                      mail.gmx.com (Report: https://mxtoolbox.com/SuperTool.aspx?action=smtp%3Amail.gmx.com&run=toolpage)

                        p44 (translator) wrote (#9):

                        @d19dotca Me too, I had the same problem.

                          nebulon (Staff) wrote (#10):

                          Just to rule out one point: those SMTP servers rate-limit quickly, so if you refresh the status check from your Cloudron dashboard several times within a short period, the checks will start to fail because the server's IP gets temporarily blocked.
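One way to keep dashboard refreshes from tripping those rate limits is to reuse a recent result instead of reconnecting every time. The Python sketch below is a hypothetical illustration, not how Cloudron behaves: `CachedProbe` and its 10-minute default TTL are assumptions, and `probe` stands for any function that performs the actual connection check.

```python
import time
from typing import Callable, Dict, Tuple


class CachedProbe:
    """Reuse a recent probe result instead of reconnecting on every refresh."""

    def __init__(
        self,
        probe: Callable[[str], bool],
        ttl_seconds: float = 600.0,  # arbitrary example value
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._probe = probe
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, bool]] = {}

    def check(self, host: str) -> bool:
        now = self._clock()
        cached = self._cache.get(host)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh enough: no new connection attempt
        result = self._probe(host)
        self._cache[host] = (now, result)
        return result
```

Injecting `clock` keeps the helper easy to test; in normal use the default monotonic clock is fine.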

                            jdaviescoates wrote (#11):

                            Just adding to the chorus of people noticing this happen to them.

                            I just spotted this notification from 12 hours ago:

                            Relay error: Connect to smtp.live.com timed out. Check if port 25 (outbound) is blocked
                            

                            I wonder whether the timeout settings on smtp.live.com have recently changed so that it times out more quickly.

                            I use Cloudron with Gandi & Hetzner

                              girish (Staff) wrote (#12, last edited by girish):

                              Indeed, smtp.live.com is apparently gone or does not respond to port 25 anymore.

                              Some background: Cloudron tries to connect to some well-known servers on port 25 for diagnostic purposes, to check whether outbound port 25 is allowed on the VPS. It's not used for anything else. The list of servers comes from https://git.cloudron.io/cloudron/box/-/blob/master/src/mail.js#L172

                              The warning can be ignored, for the moment. I have removed it in the next release.

                              I think we will try to create an smtpdiag.cloudron.io or something similar to test port 25 reachability.
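A reachability check of this kind can go one small step beyond a bare TCP connect by reading the server's greeting: under RFC 5321, an SMTP server that is ready for mail opens with reply code 220. The Python sketch below is a hypothetical illustration of that idea; it is not how Cloudron or port25check.cloudron.io actually performs the check.

```python
import socket


def smtp_greeting_ok(host: str, port: int = 25, timeout: float = 10.0) -> bool:
    """Connect to host:port and check for an SMTP "220" service-ready greeting."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with sock.makefile("rb") as reader:
                # Read only the first line; single- and multi-line
                # greetings both begin with the 220 code.
                greeting = reader.readline(512)
    except OSError:
        # DNS failure, refusal and timeout all mean "could not reach it".
        return False
    return greeting.startswith(b"220")
```

A `False` result here only says the target could not be reached or did not greet properly; as discussed later in the thread, it cannot by itself tell a blocked port from a dead or rate-limiting server.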

                                d19dotca wrote (#13, last edited by d19dotca):

                                @girish Hi Girish! I think it's a good idea to add a Cloudron-controlled SMTP server for testing purposes. I would still suggest a two-check failure workflow to avoid false positives like this one, as that is best practice in similar scenarios outside of Cloudron (for example, Kubernetes liveness probes generally require multiple failures before acting, to avoid false positives). If it's too much work, I understand; I still think it would be really helpful in these scenarios, so I'd love to see health checks designed to avoid this kind of false positive.
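The liveness-probe comparison boils down to a consecutive-failure counter: stay quiet until N checks in a row have failed, and reset on any success. This Python sketch is loosely modeled on the idea behind Kubernetes' `failureThreshold` setting, with a threshold of 2 matching the "two-check" suggestion; the `FailureThreshold` class is hypothetical.

```python
class FailureThreshold:
    """Only report unhealthy after `threshold` consecutive failed checks."""

    def __init__(self, threshold: int = 2) -> None:
        if threshold < 1:
            raise ValueError("threshold must be >= 1")
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, succeeded: bool) -> bool:
        """Record one check result; return True if an alert should be raised."""
        if succeeded:
            # Any success resets the streak, so isolated blips never alert.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold
```

Unlike a second check against a different server, this approach retries over time, so the two ideas can be combined.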

                                  girish (Staff) wrote (#14, last edited by girish):

                                  @d19dotca The thing is, since we don't control external services, it's hard to tell why something failed. Did they blacklist the server IP? Was outbound port 25 blocked? Did the service die temporarily, or even permanently (as in the case of this post)?

                                  At least when I wrote the code, I didn't expect these services to go away 🙂 By now, all but 2 services remain. We started with around 5 services, 5 years ago. Anyway, I have now deployed port25check.cloudron.io, and the code from the next release will use it to check connectivity. Since we don't blacklist there and will keep it running, if that check fails we can be fairly certain that the VPS's outbound port 25 is blocked. Let's see.

                                    d19dotca wrote (#15, last edited by d19dotca):

                                    @girish said in Email healthcheck notification: "Relay error: Connect to smtp.live.com timed out":

                                    > Thing is since we don't control external services, it's hard to tell why something failed.

                                    For sure, but that's also why double-checking in the event of a failure would be best, to avoid false positives rather than having one failure generate a ton of alerts. 😉 Logic where a single failure makes Cloudron skip that server for a few hours would give rate limiting or blacklisting time to resolve on its own, and would avoid waiting for an entire new release to update the list of SMTP servers as they change. If one check fails but another succeeds, we automatically know outbound port 25 is not blocked.
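The "skip a failed server for a few hours" idea could look roughly like the Python sketch below. It is a hypothetical illustration of the suggestion, not existing Cloudron behaviour: `ServerPool`, its method names and the 3-hour default cooldown are all assumptions.

```python
import time
from typing import Callable, Dict, List, Sequence


class ServerPool:
    """Pick probe targets, skipping servers that failed recently."""

    def __init__(
        self,
        servers: Sequence[str],
        cooldown_seconds: float = 3 * 3600,  # "a few hours"; example value
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.servers: List[str] = list(servers)
        self.cooldown = cooldown_seconds
        self.clock = clock
        self.benched_until: Dict[str, float] = {}

    def available(self) -> List[str]:
        """Servers that are not currently sitting out a cooldown."""
        now = self.clock()
        return [
            s for s in self.servers
            if s not in self.benched_until or self.benched_until[s] <= now
        ]

    def mark_failed(self, server: str) -> None:
        # Give rate limits or temporary blocks time to clear on their own.
        self.benched_until[server] = self.clock() + self.cooldown
```

Each check would draw its targets from `available()` and call `mark_failed()` on any server that did not answer.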

                                    > I have now deployed port25check.cloudron.io and the code from next release will use that to check connectivity. Since, we don't blacklist there and will keep it running, we can be fairly certain that the VPS outbound port 25 is blocked.

                                    That's awesome and will really help with troubleshooting!! Happy to see that too.

                                    Personally, I'd still love to see redundancy in place, as there will certainly be the occasional outage on your end too, as with any service. Still, this puts a bit more under your control and lessens the likelihood of false positives, which is a step in the right direction. If I'm banging the drum well past my allotted time on this, that's understandable; it certainly isn't major, just something I'd love to see improved further. I'll let it go now. 😛

                                    Thanks for everything you do! 🙂


                                      girish (Staff) wrote (#16):

                                      @d19dotca No worries 🙂 I think the issue is this: say we add another external service as a dependency and the connection to it does not work. What should we show in the UI? That outbound port 25 works, or that it does not? Is it useful to have messages like "We managed to connect to port25check.cloudron.io but not to smtp.live.com" (or any of those combinations)? I suspect users will come back with the same questions and confusion as they do now. At least, the code is currently written on the assumption that connectivity (or the lack of it) is a "reliable" indicator of outbound port 25. Maybe I misunderstood what you mean by redundancy.

                                        d19dotca wrote (#17, last edited by d19dotca):

                                        @girish said in Email healthcheck notification: "Relay error: Connect to smtp.live.com timed out":

                                        > what should we show in the UI? That outbound port 25 works or it does not? Is it useful to have messages like "We managed to connect to port25check.cloudron.io but not to smtp.live.com" (or any of those combinations).

                                        Good question! I don't think anything should be shown in the UI if only one SMTP test out of two fails, as that scenario implies a false positive.

                                        So what I envision is the following (hopefully this explains it better):

                                        -- Cloudron runs periodic checks against one of several SMTP servers for testing purposes.
                                        ---- If the check succeeds, wait for the next check (30-minute interval).
                                        ---- If the check fails, run one more test right away (or even 60 seconds later, to ride out network blips on the VPS) against a second, different SMTP server to validate the finding.
                                        ------ If the second SMTP server succeeds, ignore the initial failure and mark the check as successful. Possibly make a log entry, but nothing is needed in the UI.
                                        ------ If the second SMTP server also fails, log the errors with more detail (mention both SMTP servers that were checked and failed). In the UI, show a message similar to "Relay error: SMTP connection tests failed. Check if port 25 (outbound) is blocked. View the Cloudron logs for more details."
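The workflow above, one scheduled iteration of it, can be sketched in Python as follows. This is a hypothetical illustration of the proposal: `run_periodic_check`, the logger name and the injected `sleep` are assumptions, and the UI text is the message suggested in the last step. Scheduling the 30-minute interval is left to whatever runs this function.

```python
import logging
import time
from typing import Callable, Optional, Sequence

log = logging.getLogger("port25-check")  # hypothetical logger name

UI_MESSAGE = (
    "Relay error: SMTP connection tests failed. Check if port 25 (outbound) "
    "is blocked. View the Cloudron logs for more details."
)


def run_periodic_check(
    servers: Sequence[str],
    can_connect: Callable[[str], bool],
    retry_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Run one check; return the UI message, or None if nothing should be shown."""
    if len(servers) < 2:
        raise ValueError("need at least two servers to confirm a failure")
    first, second = servers[0], servers[1]
    if can_connect(first):
        return None  # healthy: wait for the next scheduled check
    sleep(retry_delay)  # ride out a brief network blip on the VPS
    if can_connect(second):
        log.info("Check against %s failed but %s succeeded; ignoring", first, second)
        return None
    log.error("SMTP connection tests failed against both %s and %s", first, second)
    return UI_MESSAGE
```

Keeping the server names in the log and out of the UI message mirrors the suggestion that follows: the admin gets a short prompt in the dashboard and the detail in the logs.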

                                        I don't really think the exact servers need to be listed in the UI if they're already in the logs. If both SMTP servers fail, we can be much more confident that outbound port 25 is blocked, and that should be the admin's focus. If they can confirm that it's not blocked, they can use the logs to get more details and run additional tests from their server.

                                        That's how I picture it, anyway. 🙂 I see it as helping avoid false positives while still providing enough detail in the logs when an issue is actually detected (and detected with more confidence, too). The UI message can be simplified slightly to refer the admin to their logs for further details while still suggesting that port 25 may be blocked.

                                        Side note: I just checked, and the "troubleshooting" hyperlink at the bottom of the alert message leads to the wrong place. It may need to be updated to something like https://docs.cloudron.io/email/#outbound-smtp.

                                          timconsidine (App Dev) wrote (#18):

                                          @girish Are we supposed to do anything with our email dashboards? I have a number of domains shown as red, but when I check the Status panel for some of them, everything shows as green.
                                          Not too worried, just not sure what we should be doing.

                                            girish (Staff) wrote (#19):

                                            @timconsidine Yes, correct, nothing to worry about here. It will be fixed in the upcoming update.

                                              RoundHouse1924 wrote (#20):

                                              @girish https://port25check.cloudron.io/ produces an error, as the certificate is only valid for api.cloudron.io.
