Email healthcheck notification: "Relay error: Connect to smtp.live.com timed out"

    d19dotca wrote (#6):

    @timconsidine & @ccfu - I don't think this is a DNS error at all (if it was I'd expect different log entries). This is just a simple timeout. It knows where smtp.live.com is and tries to connect but it times out (in other words smtp.live.com isn't responding within a specified time). I'm pretty sure the issue is on Microsoft's side in this case.

    @archos - as long as it's intermittent for you, then yes it should be nothing to worry about. It's likely the same checks to smtp.live.com as I'm experiencing too.

    @staff - I think it'd be great if we could have some redundancy built-in. This isn't the first time this has happened to my knowledge. Sometimes free SMTP services have issues. I think it'd be great to change the logic to be "Connect to SMTP A and see if it succeeds. If Connect to SMTP A fails, then attempt one more Connect but this time to SMTP B to verify the failure. If SMTP B is a success, mark as success. If both SMTP A and SMTP B are failures, mark as failure."

    --
    Dustin Dauncey
    www.d19.ca

      d19dotca wrote (#7):

      FYI - MXToolbox reports the same issue: https://mxtoolbox.com/SuperTool.aspx?action=smtp%3Asmtp.live.com&run=toolpage

      Connecting to 204.79.197.212
      1/30/2022 10:56:31 AM Connection attempt #1 - Unable to connect after 15 seconds. [15.02 sec]
      
      LookupServer 15082ms
      

      --
      Dustin Dauncey
      www.d19.ca

        d19dotca wrote (#8):

        Perhaps the short list here needs to be updated with a few more too: https://git.cloudron.io/cloudron/box/-/blob/master/src/mail.js#L171-176

        In doing some testing, I'd suggest adding these three to the list for checks too...

        smtp.mail.yahoo.com (Report: https://mxtoolbox.com/SuperTool.aspx?action=smtp%3Asmtp.mail.yahoo.com&run=toolpage)

        smtp.aol.com (Report: https://mxtoolbox.com/SuperTool.aspx?action=smtp%3Asmtp.aol.com&run=toolpage)

        mail.gmx.com (Report: https://mxtoolbox.com/SuperTool.aspx?action=smtp%3Amail.gmx.com&run=toolpage)

        --
        Dustin Dauncey
        www.d19.ca

          d19dotca wrote (parent post of the reply below):

          Hello,

          I noticed there was an incident with smtp.live.com about 3 hours ago according to the Cloudron notifications page. It was only for five domains, but they all shared the same SMTP endpoint so I suspect there was a blip on Microsoft's side. Just an FYI.

          Not concerned because I know it's a false alert, but it did get me thinking... would it not be better to perhaps try one more SMTP destination if the first one reports a failure by the healthcheck? That would likely avoid false-positives like this one.

          p44 (translator) wrote (#9):

          @d19dotca Me too, I had the same problem.

            nebulon (Staff) wrote (#10):

            Just to rule out one point: those SMTP servers rate-limit quickly. If you refresh the status check from your Cloudron dashboard a few times within a short period, the checks will start to fail because the IP gets temporarily blocked.

              jdaviescoates wrote (#11):

              Just adding to the chorus of people noticing this happen to them.

              I just spotted this notification from 12 hours ago:

              Relay error: Connect to smtp.live.com timed out. Check if port 25 (outbound) is blocked
              

              I wonder if the timeout settings on smtp.live.com have recently changed or something to make it time out quicker.

              I use Cloudron with Gandi & Hetzner

                girish (Staff) wrote (#12):

                Indeed, smtp.live.com is apparently gone or does not respond to port 25 anymore.

                Some background: Cloudron tries to connect to some well-known servers on port 25 for diagnostic purposes. It uses this to check if outbound port 25 is allowed on the VPS. It's not really used for anything else. The list of servers comes from https://git.cloudron.io/cloudron/box/-/blob/master/src/mail.js#L172

                The warning can be ignored, for the moment. I have removed it in the next release.

                I think we will try to create a smtpdiag.cloudron.io or something to test port 25 reachability.
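
For readers who want to see what such a diagnostic amounts to, here is a minimal sketch of an outbound port check in Node.js. It is not Cloudron's actual code from mail.js; the function name `checkPort` and the shape of its result are assumptions made for illustration. It only tests whether a TCP connection can be opened within a time limit, which is what the "timed out" notification in this thread is about.

```javascript
const net = require('net');

// Hypothetical helper (not from Cloudron's source): resolves to { ok: true } if a
// TCP connection opens within timeoutMs, otherwise { ok: false, reason }.
function checkPort(host, port, timeoutMs) {
    return new Promise((resolve) => {
        const socket = net.connect({ host, port });
        let settled = false;

        // Make sure we resolve only once and always release the socket.
        const finish = (result) => {
            if (settled) return;
            settled = true;
            socket.destroy();
            resolve(result);
        };

        socket.setTimeout(timeoutMs);
        socket.once('connect', () => finish({ ok: true }));
        socket.once('timeout', () => finish({ ok: false, reason: `Connect to ${host} timed out` }));
        socket.once('error', (err) => finish({ ok: false, reason: err.message }));
    });
}
```

A call such as `checkPort('port25check.cloudron.io', 25, 5000)` would then report whether the server could be reached on port 25. A failure by itself does not say whether the VPS provider blocks the port or the remote server is simply down, which is the ambiguity discussed in the rest of the thread.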

                  d19dotca wrote (#13):

                  @girish Hi Girish! I think it's a good idea to add a Cloudron-controlled SMTP server for testing purposes. I would still suggest a two-check failure workflow to avoid false-positives like this, as that would be best practice in similar scenarios outside of Cloudron (like liveness probes in Kubernetes, which will generally work with multiple failure points to avoid false-positives). If it's too much work I understand; I just think it'd be really helpful for these types of scenarios, so I'd love to see health checks done in a way that avoids false-positives like this one.

                  --
                  Dustin Dauncey
                  www.d19.ca

                    girish (Staff) wrote (#14):

                    @d19dotca The thing is, since we don't control external services, it's hard to tell why something failed. Did they blacklist the server IP? Was it because outbound port 25 is blocked? Was it because the service died temporarily or even permanently (as in the case of this post)?

                    At least, when I wrote the code, I didn't expect these services to go away 🙂 By now, all but 2 services remain. We started with around 5 services, 5 years ago. Anyway, I have now deployed port25check.cloudron.io and the code from the next release will use that to check connectivity. Since we don't blacklist there and will keep it running, we can be fairly certain that a failed connection means the VPS outbound port 25 is blocked. Let's see.

                      d19dotca wrote (#15):

                      @girish said in Email healthcheck notification: "Relay error: Connect to smtp.live.com timed out":

                      Thing is since we don't control external services, it's hard to tell why something failed.

                      For sure, but that's also why double-checking in the event of a failure would be best in order to avoid false-positives instead of one failure generating a ton of alerts. 😉 Logic to show that one failure would then cause Cloudron to perhaps not use it for a few hours would allow for rate-limiting or blacklisting to be resolved in time on its own, and would avoid needing to wait for an entire new release to update the list of SMTP servers as they change, etc. If one fails but one succeeds, we automatically know port 25 outbound is not blocked.

                      I have now deployed port25check.cloudron.io and the code from next release will use that to check connectivity. Since, we don't blacklist there and will keep it running, we can be fairly certain that the VPS outbound port 25 is blocked.

                      That's awesome and will add to the troubleshooting ability!! Happy to see that too.

                      Personally I'd still love to see redundancy in place, as there will certainly be the rare outage on your end too as with other services, but this will at least add a bit more under your control to help lessen the likelihood of false positives which is still a step in the right direction. If I'm banging the drum well past my allotted time on this then that's understandable as it certainly isn't major, just something I'd love to see improved further still. I'll let it go now. 😛

                      Thanks for everything you do! 🙂

                      --
                      Dustin Dauncey
                      www.d19.ca
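
The idea in the post above of not using a failed server again for a few hours can be sketched as a small cooldown list. This is only an illustration of the suggestion, not something Cloudron implements as far as this thread shows; the class name `ProbeTargets`, its methods, and the example hostnames are hypothetical.

```javascript
// Hypothetical cooldown list: a host that failed is skipped until cooldownMs has passed,
// which gives rate limits or temporary blocks time to clear on their own.
class ProbeTargets {
    constructor(hosts, cooldownMs) {
        this.hosts = hosts;
        this.cooldownMs = cooldownMs;
        this.failedAt = new Map(); // host -> timestamp (ms) of its last failure
    }

    // Record that a host failed at time `now` (defaults to the current time).
    markFailed(host, now = Date.now()) {
        this.failedAt.set(host, now);
    }

    // Return the hosts that are not currently in cooldown, in their original order.
    available(now = Date.now()) {
        return this.hosts.filter((host) => {
            const failed = this.failedAt.get(host);
            return failed === undefined || now - failed >= this.cooldownMs;
        });
    }
}
```

With a three-hour cooldown, for example, a host marked as failed would be left out of `available()` for the next three hours and then return to the rotation without needing a new release.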

                        girish (Staff) wrote (#16):

                        @d19dotca no worries 🙂 I think the issue is this: let's say we add another external service as a dependency and the connection does not work. What should we show in the UI? That outbound port 25 works or that it does not? Is it useful to have messages like "We managed to connect to port25check.cloudron.io but not to smtp.live.com" (or any of those combinations)? I suspect users will come back with the same questions/confusion as they do now. At least, the code is currently written with the assumption that connectivity (or not) is a "reliable" indicator of outbound port 25. Maybe I misunderstood what you mean by redundancy.

                          d19dotca wrote (#17):

                          @girish said in Email healthcheck notification: "Relay error: Connect to smtp.live.com timed out":

                          what should we show in the UI? That outbound port 25 works or it does not? Is it useful to have messages like "We managed to connect to port25check.cloudron.io but not to smtp.live.com" (or any of those combinations).

                          Good question! I don't actually think anything should be shown in the UI if only one SMTP test fails out of two, as that scenario would imply a false-positive.

                          So what I envision is the following (hopefully this explains it better):

                          -- Cloudron runs periodic checks against one of several SMTP servers for testing purposes.
                          ---- If the check succeeds, then wait for the next check at the 30-minute interval.
                          ---- If the check fails, then run one more test right away (or even 60 seconds later, to ride out network blips on the VPS) against a second, different SMTP server to validate the finding.
                          ------ If the second SMTP server succeeds, then ignore the initial failure and mark the check as successful. Possibly make a log entry, but nothing is needed in the UI.
                          ------ If the second SMTP server fails, log the errors with more details (mention both SMTP servers that were checked and failed). In the UI, show a message similar to "Relay error: SMTP connection tests failed. Check if port 25 (outbound) is blocked. View the Cloudron logs for more details."

                          I don't really think the exact servers need to be listed in the UI if they're already in the logs. If both SMTP servers fail, it'll be with much higher confidence that port 25 outbound is blocked and that should be the admin's focus. If they can confirm that it's not blocked, then they can use the logs to get more details and run additional tests from their server.

                          That's how I picture it anyways. 🙂 I see that as helping avoid false-positives while also providing enough details in the logs for when an issue is actually detected (and more confidently in that case too). The UI can be simplified a little to refer the admin to their logs for further details while still suggesting that port 25 may be blocked.

                          Side note: I just checked and the "troubleshooting" hyperlink at the bottom of the alert message overall leads to an incorrect spot. May need to be updated to perhaps https://docs.cloudron.io/email/#outbound-smtp or something like that.

                          --
                          Dustin Dauncey
                          www.d19.ca
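
The two-step workflow proposed in the post above can be sketched as follows. This is a minimal illustration of the suggestion, not Cloudron's implementation. The function name `checkOutboundSmtp`, the `probe` callback (any async function that resolves to `{ ok, reason }` for a host) and the result shape are all assumptions; only the UI message text is taken from the post.

```javascript
// Two-step check: a failure is only reported when a second, different server also fails.
// `servers` is [primaryHost, secondaryHost]; `probe(host)` is a caller-supplied async
// function (hypothetical) that resolves to { ok: boolean, reason?: string }.
async function checkOutboundSmtp(servers, probe, retryDelayMs = 0) {
    const [primary, secondary] = servers;

    const first = await probe(primary);
    if (first.ok) return { status: 'ok', failures: [] };

    // Optionally wait before the second attempt to ride out short network blips.
    if (retryDelayMs > 0) await new Promise((resolve) => setTimeout(resolve, retryDelayMs));

    const failures = [{ host: primary, reason: first.reason }];
    const second = await probe(secondary);

    // One failure out of two is treated as a false positive: keep it for the logs only.
    if (second.ok) return { status: 'ok', failures };

    failures.push({ host: secondary, reason: second.reason });
    return {
        status: 'failed',
        failures, // both failures, for the detailed log entry
        message: 'Relay error: SMTP connection tests failed. Check if port 25 (outbound) is blocked. View the Cloudron logs for more details.'
    };
}
```

Passing the probe in as a parameter keeps the decision logic separate from the network code, so the same function works with any connectivity test and can be exercised without touching a real SMTP server.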

                            timconsidine (App Dev) wrote (#18):

                            @girish are we supposed to be doing anything with our email dashboards? I have a number of domains which are shown as red, but when I check some of them in that domain's status panel, everything shows as green.
                            Not too worried, just not sure what we should be doing.

                              girish (Staff) wrote (#19):

                              @timconsidine yes, correct, nothing to worry here. It will be fixed in the upcoming update.

                                RoundHouse1924 wrote (#20):

                                @girish
                                https://port25check.cloudron.io/ produces an error, as the cert is only for api.cloudron.io

                                  girish (Staff) wrote (#21):

                                  @roundhouse1924 That's expected. It's not a website and is not meant to be reached via http/https. It only listens on port 25. You can try telnet port25check.cloudron.io 25.
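
If telnet is not installed on the server, roughly the same manual test can be sketched in Node.js: connect, collect whatever the server sends, and print it once the server closes the connection. The function name `readGreeting` is hypothetical, and the sketch assumes the server sends a short text reply and then closes, as the telnet session later in this thread suggests.

```javascript
const net = require('net');

// Hypothetical helper: connects to host:port, collects the text the server sends until
// it closes the connection, and resolves with that text. Rejects on error or timeout.
function readGreeting(host, port, timeoutMs) {
    return new Promise((resolve, reject) => {
        const socket = net.connect({ host, port });
        let data = '';

        socket.setEncoding('utf8');
        socket.setTimeout(timeoutMs);
        socket.on('data', (chunk) => { data += chunk; });
        // 'end' fires when the server closes its side of the connection.
        socket.once('end', () => resolve(data.trim()));
        // Destroying with an error triggers the 'error' handler below.
        socket.once('timeout', () => socket.destroy(new Error(`Connect to ${host} timed out`)));
        socket.once('error', reject);
    });
}
```

Usage would look like `readGreeting('port25check.cloudron.io', 25, 5000).then(console.log, console.error)`.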

                                    JOduMonT wrote (#22):

                                    Has anybody tried one of these delisting processes?

                                    https://sender.office.com

                                    This one seems to be specifically for live.com and ...

                                    https://support.microsoft.com/en-us/supportrequestform/8ad563e3-288e-2a61-8122-3ba03d6b8d75

                                    https://sendersupport.olc.protection.outlook.com/snds/index.aspx

                                    I did the first two. The first is pretty quick: you receive an email and validate whether the IP of your server is in their internal block list.
                                    The second is a form that is a little more elaborate: they ask for the error message and whether you have a website related to that domain.

                                      jayonrails (translator) wrote (#23):

                                      Hi,

                                      I have the same problem on my Cloudron right now:

                                      Relay error: Connect to port25check.cloudron.io timed out. Check if port 25 (outbound) is blocked

                                      Port 25 is not blocked.

                                        nebulon (Staff) wrote (#24):

                                        Can you run telnet port25check.cloudron.io 25 via SSH on your server to see if it works?

                                          jayonrails (translator) wrote (#25):

                                          Hi,

                                          it does work on my server:

                                          telnet port25check.cloudron.io 25
                                          Trying 165.227.67.76...
                                          Connected to api.cloudron.io.
                                          Escape character is '^]'.
                                          works
                                          Connection closed by foreign host.
                                          

                                          Is it important? I am using Postmark as a mail relay for all my outgoing mail, so I think it is not necessary to have port 25 open in general, because it is never used?
