Network statistics report container instead of host with node_exporter
-
Hello! Just wanted to check whether this is by design (e.g. because exposing host network stats would be a security issue) or a potential bug.
After load testing, I've determined that node_exporter running in the Prometheus container is able to report CPU, memory and other core performance metrics of the host system just fine. However, network stats such as node_network_receive_bytes_total and node_network_transmit_bytes_total only report data from the container.
I tested in 2 ways. First, I ran a download from another container running on the same host (same Cloudron). My VPS host's (Hetzner) graphs reported roughly 1.5 MB/s download, while Prometheus reported only low single-digit KB/s. Second, I let my connected Grafana dashboard idle and checked it again after a long period of time, which caused a sudden spike in reported network activity. Both results lead me to believe Prometheus is only reporting activity from its own container.
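As a rough cross-check of numbers like these, the kernel's per-interface byte counters can be sampled by hand from /proc/net/dev, both on the host and inside the container. The sketch below assumes a Linux system with awk and an interface named eth0 (an assumption; substitute whatever your system actually uses). The helper names rx_bytes and rate_bps are made up for illustration.

```shell
#!/bin/sh
# Print the cumulative received-bytes counter for one interface.
# $1 = interface name
# $2 = file in /proc/net/dev format (defaults to /proc/net/dev)
rx_bytes() {
    awk -v dev="$1" '
        { sub(/^[ \t]+/, "") }        # strip the leading padding
        index($0, dev ":") == 1 {     # line starts with "<dev>:"
            sub(/^[^:]*:/, "")        # drop the interface label
            print $1                  # first field after the colon is RX bytes
            exit
        }' "${2:-/proc/net/dev}"
}

# Average bytes per second between two counter readings.
# $1 = first reading, $2 = second reading, $3 = seconds between them
rate_bps() {
    echo $(( ($2 - $1) / $3 ))
}

# Example use (eth0 is an assumed interface name):
#   a=$(rx_bytes eth0); sleep 10; b=$(rx_bytes eth0)
#   rate_bps "$a" "$b" 10
```

If the rate measured on the host is close to the provider's graphs while the rate measured inside the container stays in the KB/s range, that would support the idea that the container only sees its own interfaces.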
Could this be a potential bug, or is this done by design (If for example mapping the host's /proc and/or /sys and/or other needed directories read-only to the container were to pose a security risk)?
I've seen in other posts that mounting these directories from the host might resolve the network reporting issue when running in a container, however I don't know enough about docker at this point to know whether this would indeed work, or whether it might pose risks. Here are some examples that I've seen elsewhere:
docker run -d -v /proc/:/host/proc/:ro -v /sys/:/host/sys/:ro
Then the options passed to node_exporter would be:
node_exporter --path.sysfs=/host/sys --path.procfs=/host/proc
This is only cobbled together from other threads I've seen elsewhere, so I have no idea whether it would work, or whether it is a risk or not, or if indeed this might even disrupt metrics that already are working such as CPU metrics, etc.
Hope this all makes sense. Thanks!
-
Additional info: The official node_exporter documentation also has good information regarding running node_exporter within a Docker container.
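For reference, the Docker invocation in the node_exporter README looks roughly like the sketch below. The README may have changed since, and the wrapper function name start_host_node_exporter is made up for illustration. Note that it shares the host's network and PID namespaces rather than only mounting /proc and /sys. Whether these flags can be applied to an app container managed by Cloudron is an open question here, not something this sketch confirms.

```shell
#!/bin/sh
# Hypothetical wrapper around the invocation from the node_exporter README.
# --net=host  : share the host network namespace, so host interfaces are visible
# --pid=host  : share the host PID namespace
# -v /:/host  : mount the host root filesystem read-only inside the container
start_host_node_exporter() {
    docker run -d \
        --net="host" \
        --pid="host" \
        -v "/:/host:ro,rslave" \
        quay.io/prometheus/node-exporter:latest \
        --path.rootfs=/host
}
```

Sharing the host network namespace is the part that matters for network metrics, and it is also the part that weakens container isolation, so it is a trade-off rather than a free fix.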
-
@lcd_official this is by design. Apps are run containerized and do not have access to the host system. I am not sure how the memory usage maps though, maybe your host usage and container usage were very similar when you tested it? But the only memory and CPU visible to the container is what has been allocated to it and not the host memory and CPU.
-
@girish Hmm. That is strange. When I ran a stress test on my host system, Prometheus reported back 100% of one core utilized just as I expected, even though I was not running anything within the container. I got the same results when I ran a memory test on the host using stress-ng. I've also correlated the CPU usage with my VPS host's graphs on occasions when other containers were eating up small amounts of CPU.
I suppose it is possible that what I'm seeing is the host system stealing resources from the Prometheus container, but I'm not sure if that would appear as actual resource usage from within the container...
I suppose more testing is in order... Thanks for your response.
-
lcd_official replied to girish on Nov 16, 2023, 9:46 PM (last edited by lcd_official on Nov 16, 2023, 9:49 PM)
@girish Hi Girish! So I could totally be misreading this, but after some testing, it really does look like CPU and Mem stats do come through from the underlying host... Here is what I tried to confirm this. Let me know if I am way off base and I'm missing something crucial.
First, just base observations:
- Prometheus reports back 4GB of total system memory, much more than the 512MB I have allocated for the Prometheus app container itself.
- Most of the time, the CPU utilization of my host VPS hovers around 2-3% as reported by Prometheus data, whereas the CPU utilization of the container (according to top) is 1-2.5%.
- Graphs on my VPS provider's control panel roughly correlate to the CPU utilization observed in Prometheus data.
Second, based on stress-testing:
- Using stress on the host system to push individual CPUs or all CPUs to 100% correctly registers corresponding observations in Prometheus data.
- Using stress --vm within the Prometheus app container to take up 256MB of RAM correctly registers a ~256MB bump over the existing baseline memory observations in Prometheus data, and not the 50+% utilization that would be expected if container stats were being reported (the container has 512MB allocated to it).
- Using stress --vm out on the host system to chew up 2GB of memory correctly shows a ~2GB increase in memory utilization reported by Prometheus data.
So far CPU, Mem, and Network stats are all that I have played with, and Network is the only one so far that I can confirm positively does only report the container stats instead of the host system.
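One plausible explanation for this split, offered as a sketch rather than a confirmed diagnosis of this setup: on Linux, /proc/meminfo and /proc/stat are not namespaced, so a container normally sees host-wide values there, whereas /proc/net/dev is per network namespace, so a container with its own network namespace sees only its own interfaces. The helpers below (hypothetical names mem_total_kb and cgroup_mem_limit) compare what /proc/meminfo reports with the container's cgroup memory limit. The cgroup paths are the usual cgroup v2 and v1 locations and are an assumption about how the container is set up.

```shell
#!/bin/sh
# Print MemTotal in kB from a /proc/meminfo-style file.
# $1 = file to read (defaults to /proc/meminfo)
mem_total_kb() {
    awk '$1 == "MemTotal:" { print $2; exit }' "${1:-/proc/meminfo}"
}

# Print the memory limit applied to the current cgroup, if it can be found.
# Tries the cgroup v2 path first, then the cgroup v1 path.
# Returns non-zero when neither file is readable.
cgroup_mem_limit() {
    for f in /sys/fs/cgroup/memory.max \
             /sys/fs/cgroup/memory/memory.limit_in_bytes; do
        if [ -r "$f" ]; then
            cat "$f"
            return 0
        fi
    done
    return 1
}

# Example use inside the container:
#   echo "meminfo total: $(mem_total_kb) kB"
#   echo "cgroup limit:  $(cgroup_mem_limit || echo unknown)"
```

If the first value is around 4GB and the second reflects the 512MB allocation, that would match the observations above: the memory metrics come from host-wide /proc data, while the limit is enforced separately by the cgroup.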
Not having correct network stats isn't a deal-breaker for me (though it would be nice, but not if doing so would put the system at risk of a container breach). I thought I'd at least report my findings thus far for anyone else that stumbles upon this.
Anyway, even if I am wrong, I have had a real great time playing with this! Thanks for making it all possible!