Could pull, but now cannot (private reg)
-
Hi all,
I thought I'd explore putting a Grocy container together. Groovy.
I can push/pull (and even DELETE via curl) from this registry, which is hosted on my cloudron machine (docker.host.us). From my dev machine, it would appear that the docker registry is ticking over just fine.
When I build the Grocy image and push it to the registry, I can find it there via the web UI and via curl. However, when I try and do a cloudron install, I get the following error in the installation logs for the app:
Feb 05 17:06:00 box:docker downloadImage docker.host.us/jadudm/grocy:3.1.3
Feb 05 17:06:00 box:docker downloadImage: pulling image. attempt 1
Feb 05 17:06:00 box:docker pullImage: will pull docker.host.us/jadudm/grocy:3.1.3. auth: yes
Feb 05 17:06:21 box:docker downloadImage: pulling image. attempt 2
Feb 05 17:06:21 box:docker pullImage: will pull docker.host.us/jadudm/grocy:3.1.3. auth: yes
Feb 05 17:06:41 box:docker downloadImage: pulling image. attempt 3
and it ultimately fails after 10 retries. There's then... an attempt to standup extensions (based on the manifest, I imagine), and then this:
Feb 05 17:09:37 box:apptask run: app error for state pending_install: BoxError: (HTTP code 404) no such container - No such image: docker.host.us/jadudm/grocy:3.1.3
    at createSubcontainer (/home/yellowtent/box/src/docker.js:397:28)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async Object.createContainer (/home/yellowtent/box/src/docker.js:403:12)
    at async createContainer (/home/yellowtent/box/src/apptask.js:88:23)
    at async install (/home/yellowtent/box/src/apptask.js:385:5) {
  reason: 'Docker Error',
  details: {},
  nestedError: Error: (HTTP code 404) no such container - No such image: docker.host.us/jadudm/grocy:3.1.3
      at /home/yellowtent/box/node_modules/docker-modem/lib/modem.js:315:17
      at getCause (/home/yellowtent/box/node_modules/docker-modem/lib/modem.js:345:7)
      at Modem.buildPayload (/home/yellowtent/box/node_modules/docker-modem/lib/modem.js:314:5)
      at IncomingMessage.<anonymous> (/home/yellowtent/box/node_modules/docker-modem/lib/modem.js:286:14)
      at IncomingMessage.emit (events.js:412:35)
      at endReadableNT (internal/streams/readable.js:1317:12)
      at processTicksAndRejections (internal/process/task_queues.js:82:21) {
    reason: 'no such container',
    statusCode: 404,
    json: { message: 'No such image: docker.host.us/jadudm/grocy:3.1.3' }
  }
Feb 05 17:09:37 box:taskworker Task took 219.043 seconds
... cut more nested errors of the same flavor ...
What is odd to me is that I have had my private (locally hosted) container registry running for some time without any issues. I've installed other images from this registry. I don't think I'm out of disk or memory, but... I don't find the stacked RAM diagram helpful for checking the status of the cloudron host; I've killed a bunch of things just for fun/just in case RAM is an issue. It looks like I have many gigabytes of working space everywhere I need disk.
Is this an auth problem with my local registry (and why would it happen now)? The image comes down fine on my dev machine; could this be a DNS resolution problem on the Cloudron host? (Which... it's on the same network as the dev machine, so that feels odd.) I don't think it's a Cloudflare proxy issue, because (again) I've used this setup for other applications. That said, something probably changed, and that's the mystery of the moment.
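One way to rule DNS in or out is to resolve the registry hostname on both the dev machine and the Cloudron host and compare the answers. The following is only a minimal sketch; `resolve_addresses` is a hypothetical helper name, and `docker.host.us` is simply the registry name used in this post.

```python
import socket


def resolve_addresses(hostname, port=443):
    """Return the sorted, de-duplicated IP addresses that `hostname` resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# Example (run on both machines and compare the output):
#   print(resolve_addresses("docker.host.us"))
# A socket.gaierror here means the name did not resolve at all on that machine.
```

If the two machines print different addresses (for example, a proxy address on one and a LAN address on the other), that difference is worth chasing before looking at auth.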
Anyone have thoughts about how to go about debugging this? (I don't know if this is an "app development" question, but it is using the cloudron tools to attempt to locally develop an app, hence why I asked here.)
-
@robi I do.
I think the error is somewhere in Cloudron.
Feb 06 10:21:35 box:docker downloadImage: pulling image. attempt 10
Feb 06 10:21:35 box:docker pullImage: will pull docker.host.us/jadudm/grocy:1.0.0. auth: yes
I'll claim that the better behavior would be for Cloudron to fail when it cannot pull an image. Instead, it is trying to set up the extensions and run the container when the container doesn't exist.
Given that it loops through 10 (failed) attempts to pull the image, there's something going wrong with the image pull. There's nothing in the existing logs that gives me enough information to figure out why I'm getting a 404 on the image. (I can pull the image from another machine, and both the dev machine and the Cloudron host are effectively on the same network.)
I'll see if there's a way I can... I don't know. I feel like I'm missing something obvious, but I'd like a bit more info from the Cloudron logs in order to debug this.
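To make the "fail when it cannot pull" behavior concrete, here is a minimal Python sketch of a retry loop that stops the install once every attempt has failed. This is not Cloudron's code (the box code is Node.js); `pull_with_retries`, `ImagePullError`, and the `pull_once` callback are hypothetical names, and the 20-second delay is only a guess based on the spacing of the log timestamps.

```python
import time


class ImagePullError(Exception):
    """Raised when every pull attempt has failed."""


def pull_with_retries(pull_once, image, attempts=10, delay_seconds=20.0, sleep=time.sleep):
    """Call pull_once(image) up to `attempts` times; raise if none succeed."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return pull_once(image)
        except Exception as exc:
            last_error = exc
            # Log the underlying reason so a timeout can be told apart from an auth failure.
            print(f"pull attempt {attempt}/{attempts} for {image} failed: {exc!r}")
            if attempt < attempts:
                sleep(delay_seconds)
    # Surface the failure instead of falling through to container creation.
    raise ImagePullError(f"could not pull {image} after {attempts} attempts") from last_error
```

With this shape, the caller sees the pull failure (and its cause) directly, rather than a later "No such image" 404 from the container-creation step.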
EDIT: It looks like https://git.cloudron.io/cloudron/box/-/blob/master/src/apptask.js#L354 is where downloadImage is called, but I'm unclear where errors that are generated from that process are handled. It looks like a failed image download does just fall through to a continued attempt to run the container.
-
When I attempt to update my Docker registry credentials in the Settings, I get the following error:
(HTTP code 500) server error - Get https://docker.host.us/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
From the command line on my dev machine, I can
curl -X GET -u jadudm:ASDFASDF https://docker.host.us/v2/_catalog
and get a list of the images on the host. Hence why I am confused as to what may have changed under the hood that my private registry no longer works from Cloudron as an image installation source (when it did not that long ago).
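The "Client.Timeout exceeded while awaiting headers" message reads like a connection that never completed rather than a rejected password, so it can help to probe the `/v2/` endpoint from the Cloudron host itself and see which of the two it is. Below is a minimal sketch using only the Python standard library; `classify_probe` and `probe_registry` are hypothetical helper names. It relies on the Docker Registry v2 API answering `/v2/` with 200 or 401 when it is reachable.

```python
import socket
import urllib.error
import urllib.request


def classify_probe(status=None, error=None):
    """Turn an HTTP status or a network exception into a short diagnosis."""
    if error is not None:
        # URLError wraps the underlying cause in .reason; plain OSErrors have none.
        reason = getattr(error, "reason", error)
        if isinstance(reason, (socket.timeout, TimeoutError)):
            return "timeout: no response, likely a network or firewall problem"
        return f"connection error: {reason!r}"
    if status == 200:
        return "reachable: registry answered without asking for credentials"
    if status == 401:
        return "reachable: registry answered and wants credentials"
    return f"reachable: unexpected HTTP status {status}"


def probe_registry(base_url, timeout=5.0):
    """Request <base_url>/v2/ without credentials and classify the outcome."""
    url = base_url.rstrip("/") + "/v2/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify_probe(status=response.status)
    except urllib.error.HTTPError as exc:
        return classify_probe(status=exc.code)
    except (urllib.error.URLError, OSError) as exc:
        return classify_probe(error=exc)


# Example, run on the Cloudron host: print(probe_registry("https://docker.host.us"))
```

A "reachable" result from the dev machine and a "timeout" result from the Cloudron host would point at routing or firewall rules rather than the stored credentials.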
EDIT: In docker.js, https://git.cloudron.io/cloudron/box/-/blob/master/src/docker.js#L60
The comment says what it says:
const [error] = await safe(gConnection.checkAuth(config)); // this returns a 500 even for auth errors
I have the correct username/password in my Registry conf. It never changed... but it stopped working at some point in Cloudron. It works just fine elsewhere. So, I don't know if this is a network issue, or an auth issue, or what... because it looks like checkAuth doesn't give me a lot of info to debug with?
-
@girish I started using the DockerHub repo, and will circle back around to try and re-install the local Docker Registry. So, the answer is "sorta no." But I found a path to keep moving.
I'm not sure why, given no changes on my part, the local registry decided to stop working for app installs by Cloudron. (All of these services are on the same host...) There were no disk/RAM issues that I could see, so something was going on, but I couldn't get enough info to debug beyond the fact that things retried-and-failed.
-
@girish, Even with a clean re-install of the docker registry, I can't convince my Cloudron instance to authenticate against it. I get 500 errors from it consistently (but can hit the v2 URL from a browser just fine?).
I'm happy to do whatever I can to help with debugging. Perhaps this is one of those "it works everywhere else, but not on my machine" kind of things.
-
@jadudm This might be an aggressive firewall issue.
I'll update when I know for sure. However, I suspect I'm not letting :5000 through, which would allow web clients to browse the registry, but not allow me to authenticate/push from the command line.
-
@jadudm Sorry, @girish . This was "user error." I do not know how this local registry ever worked, because I would swear the ports were not open to allow it to work.
However, this is a case where 5000 was not open to me. So, I could hit the /v2/ endpoint via HTTPS, and could locally store my login credentials via docker login, and I could browse the registry... but push and pull would fail.
Still... I don't know how this worked in the first place, given that I don't think I had firewall rules open for it to succeed.
If this was a ticket, I'd mark it closed.
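For anyone who lands here with the same symptom, a quick TCP check of the ports mentioned in this thread (443 and 5000) from whichever machine is failing can confirm or rule out the firewall. This is a minimal sketch; `port_is_open` and `check_ports` are hypothetical helper names, and the port list is just the pair discussed above.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports=(443, 5000), timeout=3.0):
    """Map each port to whether a TCP connection to it succeeded."""
    return {port: port_is_open(host, port, timeout) for port in ports}


# Example: print(check_ports("docker.host.us"))
```

A successful connect only shows the port is reachable from that one machine; it says nothing about whether the registry behind it accepts the credentials.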
-
@jadudm Nope.
If the machine I'm working on is dev, and the registry is hosted on my cloudron as registry.cloudron.example
I can now:
- Browse to my private registry.
- Authenticate and push to it from the command line from dev.
- I cannot, via the Cloudron admin panel, set up registry.cloudron.example as my private registry.
I consistently get 500 errors for the registry.cloudron.example/v2/ endpoint. However, if I visit it in a browser on dev, it is easily reachable.
I am up to date on updates, I have restarted my cloudron recently, I have restarted the registry... I've even torn down and rebuilt the registry. I'm unclear why I can't set up my private registry.
Is there an invisible config file somewhere deep in the heart of Cloudron that I could go inspect the old fashioned way, and make sure some old data isn't being held on to that's... bad?
-
And, for myself, for future reference:
Invalid serverAddress: (HTTP code 500) server error - Get "https://reg.cloudron.example/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I tried manually changing the auth backend to "htpasswd" per the documentation on the Docker site, and created a valid htpasswd file... but that doesn't change anything.
I have no idea why I cannot configure my cloudron instance to use a private registry.
-
@robi Many thanks. I see where you're coming from. What confuses me... is that everything is on the same machine.
Which is why I'm (O_o), and think something really simple is going on.
- I can hit the registry from the web (meaning port 443 gets through the firewall from the outside world), and authenticate as a Cloudron-based user.
- I can hit the registry from the command line, and am authenticated as the same Cloudron user. This means port 5000 is getting through.
- I cannot convince the Cloudron admin interface to let me authenticate as that user. The Cloudron admin interface (which is on the same machine as the registry) will not authenticate; it gets the 500 error.
I'm thoroughly confused, and am uncertain where to find the debug information that I need to solve this particular problem. Hence why I'm wondering if there's some kind of config that might have... "hung around" from previous attempts?
-
A new not-what-I-was-looking-for-but-it-works-solution:
- I'm mirroring the Dockerfile/project from my local Gitea to Gitlab.
- I've set up Gitlab as my registry for the project.
(The project in this case is a Grocy package for Cloudron.)
This arguably gives me a private-enough repo for what I'm exploring, but I was hoping to be able to entirely leverage my Cloudron for this.
I'm just plain confused why, at some point in the past, my locally-hosted registry worked, and now, it does not. This is why I believe there is "crufty" config lying around, somewhere, and things therefore don't work.
(But... this makes almost no sense, because I have uninstalled and reinstalled the registry, so it is not clear how any state could persist.)
Anyway. This remains a mystery, and I would need some explicit guidance as to where I should dig around in the backend of Cloudron to figure out why things are not behaving. (It could also be that there's still some port that I don't have open, or... )
-