Awesome! I'm glad Actual is published. I'll make an account on git.cloudron.io shortly, and contribute some further tests. Thank you for accepting my work!
Sydney
Posts
-
Actual - Self Hosted, Privacy Friendly Financial Planning System
-
Plausible (Analytics Platform)
Thank you for the feedback. I am very excited for Plausible to be a part of the Cloudron app store, and I wish you the best of luck in packaging Clickhouse for Cloudron.
For backing up and restoring the Clickhouse database, you may reference the `clickhouse-backup.sh` and `clickhouse-restore.sh` scripts in my repository. Should you use any of my code in the Clickhouse addon, I am happy to assign copyright to you - just send me a release form.
In the meantime, for those who are not willing to wait for @girish, feel free to use the code in my repository - it provisions a fully working installation of Plausible with a local Clickhouse db.
-
Guide for Adding Custom Nginx Directives to a Cloudron Application
Introduction
Cloudron applications are reverse-proxied using Nginx, a high-performance proxy and web server. Cloudron manages the Nginx configuration of the applications themselves, but there may be cases where the end user needs to add custom configuration directives on an ad-hoc basis for a specific application. Attempting to modify the application nginx config directly at `/etc/nginx/applications/[app-id]/[app-domain].conf` is not feasible, since these configuration files are ephemeral, and are re-written upon restarts and Cloudron upgrades.
However, I developed a workaround that allows users to add Nginx config snippets to Cloudron apps in a way that is persistent and robust against being over-written. This method uses Nginx's `include` directive, a custom bash script, and a cronjob. This guide presents an overview of this workaround, and shares the custom bash script that I use.
Disclaimer
Before proceeding, be advised that Cloudron does not support ad-hoc user modifications to application Nginx configuration. This method is a workaround, and could result in broken reverse-proxy configurations should the user create an invalid Nginx configuration file.
Method
Nginx allows us to embed configuration file snippets using the `include` directive. We'll create a file at `/etc/nginx/custom-nginx-directives.conf` containing what we wish to include, and then include it within the application nginx config using `include custom-nginx-directives.conf;`. Note that `include` allows us to specify either relative or absolute pathing. On Ubuntu Linux, relative paths are searched for starting from `/etc/nginx`.
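To make this concrete, here is a hypothetical example of what the include file might contain - a custom `location` block is the typical case (the path and alias below are made up for illustration):

```bash
# Hypothetical example: create the include file with a custom location block.
# Because the include line is inserted inside the HTTPS server block (see below),
# server-block-level directives such as 'location' are valid here.
cat > /etc/nginx/custom-nginx-directives.conf <<'EOF'
location /downloads/ {
    alias /var/www/downloads/;
    autoindex on;
}
EOF
```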
Once you have defined that file, we will use a bash script to add the line `include custom-nginx-directives.conf;` to the application nginx config. On Cloudron, the per-application nginx configurations are templated using EJS from cloudron/box/src/nginxconfig.ejs. The EJS template is complex and has many conditionals, but it has the following invariant structure (i.e. no matter what application, it will always have the following blocks):

```
map $http_upgrade $connection_upgrade {
    # [Extraneous information removed]
}

map $upstream_http_referrer_policy $hrp {
    # [Extraneous information removed]
}

# http server
server {
    # [Extraneous information removed]
}

# https server
server {
    # [Extraneous information removed]
}
```
Observe how every Cloudron app will always have two `server` blocks. The first one defines the HTTP (i.e. port 80) listener, and usually contains a redirect to HTTPS. The second one is the HTTPS listener (i.e. port 443), and contains the bulk of the application-specific logic.
For my specific use case, I needed to include custom `location` directives for my Cloudron app. This probably represents the most common use case for custom nginx configuration. Hence we need to insert our include directive at the end of the second server block, which can be done by searching for the last closing bracket of the file. It is this invariant which we may take advantage of. The following bash commands will search for the last closing bracket in the nginx file, and use `sed` to insert a line right before it:

```bash
nginx_config_file="/etc/nginx/applications/[app_id]/[app_domain].conf"
include_line="include custom-nginx-directives.conf;"

# Find the line number(s) of all the closing brackets (i.e. '}')
grep_result=$(grep --line-number '}' "$nginx_config_file")

# Grep returns output in the form line_number:match. Extract only the line numbers with 'cut'
line_numbers=$(echo "$grep_result" | cut --delimiter=':' --fields=1)

# Select the last line number using 'tail'
last_bracket_line=$(echo "$line_numbers" | tail --lines=1)

# Insert the $include_line before the last closing bracket in the Nginx configuration file.
sed -i "${last_bracket_line}i $include_line" "$nginx_config_file"
```
Bash Script
We can take the above logic and implement it in a bash script that does some additional checking and error prevention. We should first make sure that the line is not already present. Likewise, if the line is not present and we add it, we should also automatically test and reload our nginx configuration so that the changes take effect. With the above considerations in mind, we arrive at the following final script:
```bash
#!/bin/bash
#
# Copyright (c) 2024 Shen Zhou Hong
# This code is released under the MIT License. https://mit-license.org/
#
# This Bash script checks if the line defined in $include_line is present
# in the Nginx configuration file at $nginx_config_file. If the line is not
# found, it appends the line before the last closing bracket and tests the
# configuration using 'nginx -t'. If the test succeeds, it reloads Nginx with
# 'systemctl reload nginx'.
#
# This file is intended for use with Cloudron in order to add includes to
# the nginx configuration of applications in a way that is robust against being
# re-written. It should be run using a cronjob as root.
#
# Example crontab entry (*/30 specifies that it shall run every 30 minutes):
#
# */30 * * * * /path/to/add_includes.sh 2>&1
set -eEu -o pipefail

# Make sure to use absolute pathing if this script is run as a cronjob.
# Replace [app_id] and [app_domain] with actual values.
nginx_config_file="/etc/nginx/applications/[app_id]/[app_domain].conf"

# By default, Nginx's include directive takes either a relative, or an absolute
# path. On Ubuntu relative paths are defined in relation to /etc/nginx
include_line="include custom-nginx-directives.conf;"

# Check if the Nginx configuration file exists
if [ ! -f "$nginx_config_file" ]; then
    echo "Error: Nginx configuration file '$nginx_config_file' not found."
    exit 1
fi

# Check if the line is already present in the config file
if grep -q "$include_line" "$nginx_config_file"; then
    echo "Line is already present in $nginx_config_file. No changes made to file."
    exit 0
# If the include_line is not present, we will add it to the end of the file, right
# before the final closing bracket (i.e. '}').
else
    # Find the line number(s) of all the closing brackets (i.e. '}')
    grep_result=$(grep --line-number '}' "$nginx_config_file")

    # Grep returns output in the form line_number:match. Extract only the line numbers with 'cut'
    line_numbers=$(echo "$grep_result" | cut --delimiter=':' --fields=1)

    # Select the last line number using 'tail'
    last_bracket_line=$(echo "$line_numbers" | tail --lines=1)

    # Insert the $include_line before the last closing bracket in the Nginx configuration
    # file. The sed command with the 'i' operation specifies an insertion at the line
    # number defined in $last_bracket_line.
    sed -i "${last_bracket_line}i $include_line" "$nginx_config_file"
    echo "Line added successfully to $nginx_config_file."

    # Test the Nginx configuration
    if nginx -t; then
        echo "Nginx configuration test successful. Reloading Nginx..."
        systemctl reload nginx
        echo "Nginx reloaded successfully."
    else
        echo "Nginx configuration test failed. Please check the configuration manually."
        exit 1
    fi
fi
```
Link to latest version on Github Gist
Crontab
Running the script will perform the include once. However, the changes made to `/etc/nginx/applications` will inevitably be lost upon restart or platform upgrade. Hence, we will next define a crontab entry that runs the script frequently at a regular interval. Access your crontab as root via:

```bash
sudo crontab -e
```
Now add the following entry:
```
*/30 * * * * /path/to/add_includes.sh 2>&1
```
The above crontab command will run the script every 30 minutes.
Conclusion
This guide presents a method of adding persistent custom nginx directives to Cloudron applications using a bash script and a crontab. Although it is not a very sophisticated approach, it works well enough for my use case, and I hope it will be useful for other users as well.
In the future, I hope there will be a way for Cloudron to support custom Nginx directives, so that these workarounds are no longer necessary.
-
Plausible (Analytics Platform)
@girish I've released the beta app package for Plausible. Please let me know if you have any feedback: this app package is still very much a work in progress, and I hope to improve it, particularly in regards to the database backup concerns that you outlined earlier.
In particular, I hope to work with you to create a Clickhouse addon so we can support Clickhouse natively within Cloudron. I think this might be the best way forward to take care of the backups issue.
-
Plausible (Analytics Platform)
Cloudron App Package for Plausible Analytics Released
Hello everyone! Final update for the year of 2023. I have finished writing the initial documentation for my Plausible app package, and I am now releasing the git repository for public testing. You may find the link to it below:
Please note that the plausible-app package depends upon pre-built Ubuntu binaries. These unofficial binaries are built directly from the upstream source at Plausible Analytics, and are compiled automatically using Github Actions. You may find the build environment used to build these binaries below:
Please be advised that while the app package works, it has not been extensively tested. This app package is not ready for production use, and using it may incur data loss.
-
Plausible (Analytics Platform)
@girish Great questions!
Best Practices for Backing-up Clickhouse
Clickhouse is a database, and you're absolutely right that the best practice for backing up databases would be to save and restore dumps. Right now, for the minimal viable prototype, I've simply installed Clickhouse in the read-only Docker image, and then used a custom `clickhouse-config.xml` file to set its data directory to `/app/data/clickhouse/`. Right now, this seems to work, but a more mature implementation would ideally dump backups to /app/data, and then load them in upon restore. I'll have to do further research to see what the best way to do this is, so please allow me to get back to you with more information.
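To sketch what I have in mind (untested, and both the database name `plausible_events_db` and the `backups` disk are assumptions on my part), newer Clickhouse releases support native BACKUP/RESTORE statements that could target a directory under /app/data:

```bash
# Untested sketch: dump the Clickhouse database to a backup disk pointing at a
# directory under /app/data, and restore it on the other side.
# Assumes a 'backups' Disk is declared in clickhouse-config.xml, and that the
# database is named plausible_events_db (both are assumptions).
clickhouse-client --query \
  "BACKUP DATABASE plausible_events_db TO Disk('backups', 'plausible.zip')"

clickhouse-client --query \
  "RESTORE DATABASE plausible_events_db FROM Disk('backups', 'plausible.zip')"
```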
Clickhouse Multi-tenancy and Potential Addon Support
Clickhouse does support multi-tenancy. From my understanding, it is a full-featured database system that allows you to define multiple users with complex access-control lists and create multiple independent databases. It even offers a MySQL-compatible interface for legacy applications. Hence, I believe it will absolutely be possible to implement Clickhouse as a first-class Cloudron addon. It would likewise simplify the backup and restore process as well.
I am interested in implementing Clickhouse as a potential addon in the future. I actually looked into it at the start of the packaging process, but I wasn't able to make much headway because I couldn't find a lot of documentation on how to create Cloudron addons. Where could I find the source code for some of Cloudron's existing addons? If you could direct me to the source code for Cloudron's MySQL, MongoDB, and PostgreSQL addons, I am confident I can figure something out by comparing and contrasting the code.
-
Plausible (Analytics Platform)
It works! After a final afternoon learning about Supervisord and multi-process Docker containerization, I was able to create a working Cloudron app package for Plausible! It was a very challenging, but enjoyable process - a journey where I learned how to use multiple tools, such as docker-based build environments, Elixir, Clickhouse, and Supervisord. All that remains now is for me to write the documentation and end-to-end tests!
I will release a git repository with the unstable app package once I finish writing the documentation. In the meantime, I am going to do some additional testing on my own, including testing of the sendmail addon configuration. In particular, I want to make sure that the Plausible database is properly backed up and restorable, prior to releasing a git repository. I hope to get this done in the next few days!
In the meantime, if you want to help me out, please help me test Plausible! Come visit my website with your ad-blocker disabled, so I can generate some test data. I can't figure out how to work the upstream Elixir test harness, so instead of synthetic data you can provide me with some real data.
-
Plausible (Analytics Platform)
Update: I'm almost done packaging the app! This one was quite difficult! I've managed to make Clickhouse available in a secure and reasonably elegant fashion, and I've gotten to the point where I can get Plausible working within my Cloudron setup.
Right now, the only task I have left is to use supervisord to manage the Clickhouse and Plausible processes within the same Docker container. Once I complete this, I will release a repository for testing.
-
Plausible (Analytics Platform)
Plausible Analytics Packaging Attempt
I've begun an attempt at packaging Plausible for the Cloudron platform. So far, I've spent two days on it, and it's a non-trivial effort. I'll leave some notes here for those who may be considering packaging this, or similar apps:
Overview of Plausible's architecture
Plausible depends upon two databases:
- PostgreSQL, which is provided by Cloudron's Postgres addon
- Clickhouse, a high-performance DBMS commonly used in analytics.
Clickhouse must be made available to Plausible in some way, and the upstream maintainers at Plausible provide a reference self-hosting implementation using multiple docker containers and `docker-compose`. This is a turnkey solution which completely abstracts away the underlying Plausible implementation. This reference implementation depends on an Alpine Linux docker container for the Plausible binary, as well as upstream container images for PostgreSQL and Clickhouse.
Difficulties in the packaging process
There were two major challenges that I encountered in the packaging process.
Building Plausible Binaries for Cloudron's Ubuntu-derived base Image
Plausible is not a simple nodejs application. It is an Elixir application running on the Erlang VM, which uses NPM for asset management. It depends on an Elixir-based toolchain that creates compiled binaries from the Plausible source code.
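For a rough idea of what that toolchain involves, a generic Elixir release build looks something like the sketch below (the real steps live in Plausible's own Dockerfile and differ in detail):

```bash
# Generic sketch of an Elixir release build, not Plausible's exact recipe.
mix local.hex --force                    # install the Hex package manager
mix local.rebar --force                  # install rebar for Erlang dependencies
MIX_ENV=prod mix deps.get --only prod    # fetch production dependencies
MIX_ENV=prod mix release                 # compile and assemble the release binaries
```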
The Plausible team does not provide builds for their application, hence any prospective user or self-hoster that does not wish to depend upon the reference implementation must build Plausible from source.
The Plausible repository contains a `Dockerfile` which builds the application binary as a part of their own containerization process. My first attempt was to replicate the build process using Plausible's original Dockerfile, copy the build artifacts, and then create a Cloudron package using said builds.
Unfortunately, this did not work, because Plausible's original build toolchain depends on an Alpine Linux environment. Cloudron containers use a base image that is derived from Ubuntu 22.04. Hence the built binaries for Plausible were entirely incompatible with the Cloudron docker image.
Thus, for my second attempt, I set up a parallel repository that specifically created an Ubuntu 22.04-based build environment for Plausible, and used git submodules to create unofficial, Ubuntu-compatible binaries.
These binaries worked, and with a little additional effort, I was able to copy them over to my Cloudron app package.
Packaging Clickhouse for the Cloudron App Package
The second difficulty that I encountered was providing Clickhouse for the Cloudron app package. Clickhouse is a high-performance DBMS often used in the analytics space, and Plausible requires it as a dependency.
Hence, any effort to package Plausible for Cloudron requires packaging Clickhouse as well.
The major difficulty that I am encountering is that Clickhouse expects to be run under the `clickhouse` user. This causes all sorts of obscure permissions issues when combined with `gosu` and the Cloudron environment. Additionally, the Clickhouse database configuration is non-trivial, and so far I have yet to manage a working database connection between Plausible and Clickhouse.
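For context, this is the general shape of the privilege dance I am attempting in the container's start script (a sketch only - the paths reflect my own layout, and this is precisely the part that is not yet working):

```bash
# Sketch of the intended startup sequence, not working code.
# Clickhouse refuses to run as root, so ownership must be fixed up on the
# writable /app/data volume before dropping privileges with gosu.
mkdir -p /app/data/clickhouse
chown -R clickhouse:clickhouse /app/data/clickhouse
gosu clickhouse:clickhouse clickhouse-server \
    --config-file=/app/code/clickhouse-config.xml &
```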
Plausible Initialization
Finally, Plausible's binary must be run with a set of configuration options exposed as environment variables. These configuration options require initialization, and some of them are undocumented.
I hope these notes will be helpful to any future packaging attempts, as they took me quite a lot of time to figure out.
Plausible requires a `SECRET_KEY_BASE` env, which you can provision according to the documentation here. However, in addition to that, it requires a `TOTP_VAULT_KEY` env, which is completely undocumented. The application binary will crash with a segfault if it is not provided. The `TOTP_VAULT_KEY` env is a 32-byte, base64-encoded string. An example for it can be found in Plausible's `env.dev` file.
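For reference, both secrets can be generated with openssl (a sketch; any source of cryptographic randomness works):

```bash
# Generate a long random string for SECRET_KEY_BASE (64 bytes here, wrapped
# base64 output flattened with tr), and a base64-encoded 32-byte value for
# TOTP_VAULT_KEY as described above.
export SECRET_KEY_BASE=$(openssl rand -base64 64 | tr -d '\n')
export TOTP_VAULT_KEY=$(openssl rand -base64 32)
```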
Finally, on initial setup, Plausible expects certain tables to be available in the PostgreSQL database. If those tables do not exist, the binary crashes with a segfault. Plausible provides a `createdb.sh` script which initializes its database, but it does not work on Cloudron, because it expects an empty PostgreSQL server and tries to create its own database. `createdb.sh` is unable to take Cloudron's existing database (which is created automatically per-application) and simply create tables within it.
A workaround is available for this issue: running Plausible's `migrate.sh` instead. The script exits with a non-zero exit code, because there is nothing to migrate, but it does create the right tables. Thus, as a part of the application's initialization process, the packager must be careful to run the migration script in order to initialize Plausible's PostgreSQL database.
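In a Cloudron package this translates into something like the following first-run logic in `start.sh` (a sketch - the marker file and paths are my own convention, not upstream's):

```bash
# Sketch: initialize Plausible's PostgreSQL tables exactly once on first run.
# migrate.sh exits non-zero when there is nothing to migrate, so tolerate that.
if [ ! -f /app/data/.db-initialized ]; then
    /app/code/migrate.sh || true
    touch /app/data/.db-initialized
fi
```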
Summary
In conclusion, Plausible Analytics is a challenging app to package for the Cloudron environment, chiefly due to three difficulties:
- A non-trivial Erlang-based build process, which must create compatible binaries for Cloudron's Ubuntu-derived base image
- A dependency on Clickhouse, a high-performance DBMS which is difficult to configure, and has obscure permissions-related issues due to its dependency on the `clickhouse` user
- A database initialization process that is poorly documented and requires a workaround
So far, I have been able to successfully create a build toolchain that outputs compatible Ubuntu binaries. I was also able to solve the PostgreSQL database initialization issue with the aforementioned `migrate.sh` workaround, and I am even able to see the Plausible admin dashboard.
My only outstanding difficulty is with Clickhouse, as I can't seem to get it to read the `config.xml` files, nor write files to the right place.
Thus, this is a call for help. Can somebody help me get Clickhouse working within a Cloudron application package? The most difficult part of Plausible's packaging process was the build toolchain, and I was able to get that working. If we can get Clickhouse to work, I am confident that we can release an app package for Plausible in due course.
-
Actual - Self Hosted, Privacy Friendly Financial Planning System
I implemented automated testing!
A final update: I implemented automated testing for the package. The tests include:
- Installing the app, and configuring it with test credentials and data
- Logging in and out of the app
- Restarting the app
- Backing up and restoring from backup
- Moving the app to a different location (i.e. subdomain)
You can run the tests within the `test/` directory using:

```bash
USERNAME=<cloudron username> PASSWORD=<cloudron password> mocha test.js
```
At this point, I believe this package is 90% ready for inclusion into the Cloudron marketplace. Please let me know if there are any additional improvements or modifications that I can make!
-
Actual - Self Hosted, Privacy Friendly Financial Planning System
@LoudLemur Thank you for your warm thanks and holiday wishes! I packaged Actual because I wanted to use it personally. Indeed, that was why I originally created this thread on the App wishlist nearly two years ago. Ultimately, I realised the best way for me to contribute back to the Cloudron community was to package the apps I needed myself - that's what inspired me to go ahead and package this application!
As for feedback on the packaging process: I found it a little difficult and unintuitive, particularly because it was hard to find all the information that I needed - a lot of it was scattered across many places. In my opinion, the documentation for the app packaging process is rather sparse: although the different files and options are adequately documented, there is very little on best practices.
Indeed, the best source of information on app packaging was the repositories of existing applications. I spent the vast majority of my time leafing through the `Dockerfile`, `start.sh`, and `CloudronManifest.json` of apps like:
- https://git.cloudron.io/cloudron/hedgedoc-app/
- https://git.cloudron.io/cloudron/hastebin-app/
- https://git.cloudron.io/cloudron/ghost-app/

I would often compare and contrast them, or look at how @girish would accomplish a common task like provisioning a `config.json` file, in order to learn how to package my own app.
Overall, I'd say that the app packaging process is adequate, but has significant room for improvement. Perhaps this would be an opportunity for me to contribute to the documentation in the future.
Finally, as for future steps: I hope to first understand how to configure selenium tests for my existing app, so I can understand the complete, end-to-end packaging process. Once I get tests figured out, I'll move on to other apps!
-
Actual - Self Hosted, Privacy Friendly Financial Planning System
App Package Created for Actualbudget
Hello everyone!
I've created a Cloudron App Package for Actualbudget. This package was created after following the instructions in the Cloudron app packaging tutorial, cheat sheet, as well as by following best-practices observed in other, existing Cloudron app packages. If you want to give Actualbudget a try on Cloudron, it's available now at the following git repository:
I've tested the application to the fullest extent possible as an individual. Please feel free to share any feedback with me here!
@girish I believe my app package is ready for unstable status and inclusion into the Cloudron app market. Note that:
- The `Dockerfile` and `start.sh` files properly expose environment variables and configuration files, and encapsulate the upstream app data in `/app/data` in a non-naive manner.
- The app package contains a detailed and well-defined `CloudronManifest.json` file with all relevant fields, including `medialinks`, `minBoxVersion`, `icon`, and `healthCheckPath`.
- The app package contains a complete `DESCRIPTION.md`, `POSTINSTALL.md`, and `CHANGELOG`.

I believe the only thing that this application is lacking is the selenium tests. I hope to contribute these at a later date.
Edit: As of 2023-12-24, the repository also includes a complete set of application lifecycle tests, such as:
- Installing the app, and configuring it with test credentials and data
- Logging in and out of the app
- Restarting the app
- Backing up and restoring from backup
- Moving the app to a different location (i.e. subdomain)
- Updating the app*

*The only test that I was unable to implement is the app update test. From what I can tell, the app needs to be published on the Cloudron store for that. I assume you'll be able to implement that final test once the app is published.
-
Actual - Self Hosted, Privacy Friendly Financial Planning System
@girish Hi there! Congratulations on the recent 7.3.x release. Is there any information on adding Actual to the Cloudron app store? I would very much like to use it.
-
Actual - Self Hosted, Privacy Friendly Financial Planning System
@LoudLemur Thank you very much! I'll update the post to include the docker image link.
I'm glad to see so much interest in Actual. I am really looking forward to seeing it on Cloudron as well!
-
Actual - Self Hosted, Privacy Friendly Financial Planning System
It would be really great if this app could be packaged before the start of June. I'll be beginning a new job then, and it would be a good time to upgrade my budgeting workflow.
-
Actual - Self Hosted, Privacy Friendly Financial Planning System
@jdaviescoates Thank you!
I'll edit the main post to include the correct link. I am quite excited for Actual, as it is quite polished and well-made - and I hope to see it on Cloudron soon!
-
Actual - Self Hosted, Privacy Friendly Financial Planning System
Update: As of 2023-12-23, I have created a Cloudron app package for Actual. It is now ready for use; see the following:
Actual is a self hosted, open source financial planning and budgeting system. It's designed to be fast, local-first, and privacy friendly, and has a lot of powerful features for budgeting and financial planning.
The Actual application was originally a closed-source software offering by James Long, but it has been recently released as an open source application under the MIT license.
The Actual website and landing page is available at:
And their source code is available at:
https://github.com/actualbudget/actual-server
https://github.com/actualbudget/actual
Actual already comes with a Docker image:
https://github.com/actualbudget/actual-server#running-via-docker
They have an online demo that you can use to try out the features here:
I recommend that Cloudron package Actual, because it is a modern, high-quality financial planning app that adds value to a category in which Cloudron does not have many applications. Actual is fast and unusually powerful, with a great feature set and UX/UI.
-
Problems with Log Analytics with Matomo
Hey there, @FeelNiceInc. I'm glad to hear that my tutorial was helpful for you, and I'm sorry that my provided regex did not work.
I think @girish's solution is the best - by changing Cloudron's Nginx webserver to use the default `combined` log format, Matomo's log import script will automatically recognise and import the logs without needing to specify a special regex. The regex that I provided in my tutorial was specifically in order to accommodate Cloudron's idiosyncratic `combined2` log format - but otherwise it provides little benefit.
I'm not sure why the regex didn't work for you, as it is working for me. For future readers that stumble upon this thread, I would recommend going with @girish's advice, and simply changing Cloudron to use the `combined` format.
However, if you already have an archive of logs in the `combined2` format which you need to import, I recommend trying to figure out the correct regex by hand. I use a regex visualiser called RegExr, which makes it easier to craft custom regular expressions. The Regexr link to the `combined2` log format is here:
I recommend taking a few lines of your server logs, pasting them into regexr, and seeing what matches and what doesn't. The regex defines a few named capture groups, which are as follows:
- `(?P<ip>[\w*.:-]+)` - IP Address
- `(?P<date>.*?)` - Date
- `(?P<timezone>.*?)` - Timezone
- `(?P<method>\S+)` - HTTP Request Method (e.g. POST, GET)
- `(?P<path>.*?)` - HTTP Request Path (e.g. /homepage.html)
- `(?P<status>\d+)` - HTTP Request Status
- `(?P<generation_time_milli>\d*\.?\d+)` - Amount of time for the server to respond
- `(?P<referrer>.*?)` - Referrer header
- `(?P<host>[\w\-\.]*)` - Host
- `(?P<user_agent>.*?)` - User Agent (what browser, device, etc.)
All the weird things like `\s` or `.+` in between simply account for things like spaces in the log lines. Try playing around with the regex until it matches everything in your logs. The regexr website makes it all very visual and easy to understand.
I'm glad that you were able to get log analytics working. I hope this helps!
-
Guide to Setting up Log Analytics with Matomo - Automatically Import Combined2 Format Nginx Logs Using a Cronjob
About Matomo and Log Analytics
This is a guide on setting up Log Analytics with Matomo. Matomo is an open source, self hosted, privacy-friendly analytics platform that is available as a Cloudron app. Standard Matomo installations ingest data through a JavaScript tracking script that you must embed in each website on which you wish to enable analytics.
Matomo also offers Log Analytics - where instead of using a client-side JavaScript tracker, it ingests data directly from your Nginx log files (`access.log`). In comparison to using JavaScript, server-side Log Analytics has the following benefits:
- It's more privacy-friendly: instead of injecting tracking code into your website, you do passive analysis from Nginx log data only.
- It offers better performance for visitors: if your website is optimised for speed, you don't want to make another request for the analytics library, which adds to the loading time.
- It's more durable: with the popularity of ad-blockers, a lot of analytics scripts don't load at all. Log Analytics offers more accurate data.
I am using Log Analytics primarily out of privacy consideration for my website's visitors. I want to understand where my visitors come from, but in the most respectful, privacy-friendly way possible. Server-side log analytics means I won't inject any code at all, which is much friendlier in my opinion.
Overview of how log analytics works
Broadly speaking, the process for sending logs to your Matomo installation looks like this. We will be automating it using a cronjob.
1. Cloudron's Nginx webserver creates logs in `/var/log/nginx`, named `access.log`.
2. Using Matomo's `import_logs.py` script (GitHub), we send the log files to your Matomo installation URL (e.g. `https://matomo.example.com`).
3. On your Matomo docker container, an `archive` job is run, and the data is now available on the dashboard.
The biggest difficulty in this setup is step 2. As of Cloudron box version `7.0.1`, Cloudron's Nginx is configured to use a niche log format called `combined2`. This log format seems to be used only by `collectd` and nobody else, hence Matomo's `import_logs.py` script cannot parse it. We will have to use a custom regex pattern in order to allow Matomo's import script to work.
Note: According to @girish, Cloudron will revert to the default Nginx `combined` log format for the 7.1 release (source). Hence, if you are following this guide from the future, feel free to omit the custom regex pattern.
Differences between Nginx's default log format, and Cloudron's Combined2
The `combined2` log format that Cloudron uses is slightly different from Nginx's default `combined` log format. Here's a comparison of their structure:

The `combined2` format:

```
$remote_addr - [$time_local] "$request" $status $body_bytes_sent $request_time "$http_referer" "$host" "$http_user_agent"
```

The `combined` format:

```
$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
```
As you can see, the fields are different just enough that the `import_logs.py` script cannot parse it. Thankfully, we can specify a custom regex pattern using the `--log-format-regex` option.
Regex Pattern for `combined2` logs
This is the regex pattern that you need to use to parse the logs successfully:
```
(?P<ip>[\w*.:-]+)\s+\S+\s+\[(?P<date>.*?)\s+(?P<timezone>.*?)\]\s+"(?P<method>\S+)\s+(?P<path>.*?)\s+\S+"\s+(?P<status>\d+)\s+(?P<length>\S+)\s+(?P<generation_time_milli>\d*\.?\d+)\s+"(?P<referrer>.*?)"\s"(?P<host>[\w\-\.]*)"\s"(?P<user_agent>.*?)"
```
Essentially, it defines a bunch of named capture groups, such as `<date>` or `<path>`, which `import_logs.py` can understand.
Here's a nice, visual explanation of the regex format, complete with some example log data (IP addresses are fake, sourced from reserved ranges):
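If you'd rather check from the shell, you can also test the pattern against a sample line with `grep -P` (the log line below is fabricated, with an IP from a reserved documentation range):

```bash
# Sanity-check the combined2 regex against a fabricated sample log line.
echo '192.0.2.10 - [10/Jan/2022:06:25:24 +0000] "GET /index.html HTTP/2.0" 200 512 0.005 "-" "example.com" "Mozilla/5.0"' \
  | grep -qP '(?P<ip>[\w*.:-]+)\s+\S+\s+\[(?P<date>.*?)\s+(?P<timezone>.*?)\]\s+"(?P<method>\S+)\s+(?P<path>.*?)\s+\S+"\s+(?P<status>\d+)\s+(?P<length>\S+)\s+(?P<generation_time_milli>\d*\.?\d+)\s+"(?P<referrer>.*?)"\s"(?P<host>[\w\-\.]*)"\s"(?P<user_agent>.*?)"' \
  && echo "regex matches"
```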
Using the `import_logs.py` script
In order to import our server logs into Matomo, we must use their provided Python 3 import script. We can get the script from their official Github repository:
https://github.com/matomo-org/matomo-log-analytics
I will show you where to download it in a moment.
The `import_logs.py` script requires three parameters:
- `--url`: This is the URL of your Matomo installation. It must include the `https://` prefix!
- `--token-auth`: This is an API authentication token from Matomo. You must generate it from the dashboard.
- `--log-format-regex`: This tells the script to use your custom provided regex pattern, so it can understand Cloudron's `combined2` format.

Once again, if you are following this guide from the future (e.g. version 7.1 and above), you do not need to specify the `--log-format-regex`.
This is how the command should look:
```bash
python3 import_logs.py \
    --url=https://matomo.example.com \
    --token-auth=KEEP_THIS_SECRET \
    --log-format-regex='(?P<ip>[\w*.:-]+)\s+\S+\s+\[(?P<date>.*?)\s+(?P<timezone>.*?)\]\s+"(?P<method>\S+)\s+(?P<path>.*?)\s+\S+"\s+(?P<status>\d+)\s+(?P<length>\S+)\s+(?P<generation_time_milli>\d*\.?\d+)\s+"(?P<referrer>.*?)"\s"(?P<host>[\w\-\.]*)"\s"(?P<user_agent>.*?)"' \
    /var/log/nginx/access.log.1
```
Running the `import_logs.py` script
Now we are ready to get it working. We must first log in to your base Cloudron server. Cloudron does not run Nginx on a per-application basis (i.e. in every docker container), but rather runs Nginx on the base server itself. Hence all the logs are there, and we need to execute the script there.
First, login to your server using SSH:
```bash
ssh root@my.example.com
```
Next, we will download the script from Matomo, and go inside the directory that contains it.
```bash
cd ~
git clone https://github.com/matomo-org/matomo-log-analytics.git
cd matomo-log-analytics
```
Now we run the above command. Make sure to have the correct `--url` and `--token-auth` parameters, as well as the right log file, which should be `/var/log/nginx/access.log.1`:

```bash
python3 import_logs.py \
    --url=https://matomo.example.com \
    --token-auth=KEEP_THIS_SECRET \
    --log-format-regex='(?P<ip>[\w*.:-]+)\s+\S+\s+\[(?P<date>.*?)\s+(?P<timezone>.*?)\]\s+"(?P<method>\S+)\s+(?P<path>.*?)\s+\S+"\s+(?P<status>\d+)\s+(?P<length>\S+)\s+(?P<generation_time_milli>\d*\.?\d+)\s+"(?P<referrer>.*?)"\s"(?P<host>[\w\-\.]*)"\s"(?P<user_agent>.*?)"' \
    /var/log/nginx/access.log.1
```
If you are successful, you should see output that looks like this:
```
x lines parsed, x lines recorded, x records/sec (avg), x records/sec (current)
Parsing log /var/log/nginx/access.log.1...
x lines parsed, x lines recorded, x records/sec (avg), x records/sec (current)
...
Processing your log data
------------------------
In order for your logs to be processed by Matomo, you may need to run the following command:
./console core:archive --force-all-websites --url='https://matomo.example.com'
```
Now your logs should have been ingested by Matomo. If you have any additional logs, such as `access.log.2`, `access.log.3`, et cetera, this is the time to import them as well.
In order for Matomo's dashboard to update, we will have to tell it to `archive`. Thankfully, the default Matomo Cloudron installation is already configured to archive automatically every 15 minutes. If you wish to perform a manual `archive`, simply open a terminal in the Matomo docker container (you can do this from the browser) and tell it to run the `archive` cronjob.
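For reference, the manual run is the same command that the importer suggests in its output; execute it from Matomo's installation directory inside the container:

```bash
# Force an immediate archive run so the dashboard picks up the imported logs.
./console core:archive --force-all-websites --url='https://matomo.example.com'
```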
Check your Matomo dashboard now
Do you see any data? If you do not see any data, it may be because you have not set up a website in the Matomo dashboard. By default, Matomo rejects log entries that do not correspond to a website in the dashboard. If your Matomo install is brand new, this is the time for you to add your websites. Then run the import commands again.
Now your dashboard should be updated with the log analytics.
Automating Log Imports using Cronjobs
Now, we must import the files every day. The best way to do this is to put the command into a bash script, and set up a cronjob to automate it. This is the script that you can use:
```bash
#!/usr/bin/env bash
# Cron does not start from any particular directory, so use absolute paths:
# change into the cloned matomo-log-analytics directory before running the script.
cd /root/matomo-log-analytics || exit 1
python3 import_logs.py \
    --url=https://matomo.example.com \
    --token-auth=KEEP_THIS_SECRET \
    --log-format-regex='(?P<ip>[\w*.:-]+)\s+\S+\s+\[(?P<date>.*?)\s+(?P<timezone>.*?)\]\s+"(?P<method>\S+)\s+(?P<path>.*?)\s+\S+"\s+(?P<status>\d+)\s+(?P<length>\S+)\s+(?P<generation_time_milli>\d*\.?\d+)\s+"(?P<referrer>.*?)"\s"(?P<host>[\w\-\.]*)"\s"(?P<user_agent>.*?)"' \
    /var/log/nginx/access.log.1
```
Make sure that the log file is `access.log.1`. Nginx automatically rotates the log files once a day at midnight, where the log files switch like this:

```
access.log -> access.log.1 -> access.log.2.gz -> access.log.3.gz
```

Since we are running our cronjob once a day at 1:00am, we always want to get `access.log.1`, which represents "yesterday's" logs, fresh right after the rotation. If you ran the import on `access.log`, you would get an empty file, since the logs were just rotated.
Save the script somewhere like `/root/import-cronjob.sh`.
Now, all you have to do is add the cronjob to the `root` crontab. To do so, run:

```bash
crontab -e
```
Follow the on-screen instructions to choose an editor, and then add the following cronjob:
```
0 1 * * * /root/import-cronjob.sh >/dev/null 2>&1
```
This tells the server to run the script once a day at 1:00am, and to silence all output.
Save the crontab, and now your setup should be complete. Congratulations, your Matomo instance on Cloudron is now using server-side Log Analytics!
I hope this tutorial has been helpful. If you need any help, feel free to ask questions.
Keywords to aid search
Matomo, log analytics, nginx, logs, combined2, log format.