    Cloudron Forum


    Solved Problems with Log Analytics with Matomo

    Matomo
    • F
      FeelNiceInc last edited by

      Hello all,
      First of all, thanks to @Sydney for the great tutorial on log analysis.
      Unfortunately, I am still having problems implementing it.

      I installed Matomo on Cloudron, set up the site in Matomo, and now want to import the logs. This is the command I use:

      python3 import_logs.py \
      --url=https://analytics.my-site.de \
      --token-auth=my-token \
      --log-format-regex='(?P<ip>[\w*.:-]+)\s+\S+\s+[(?P<date>.*?)\s+(?P<timezone>.*?)]\s+"(?P<method>\S+)\s+(?P<path>.*?)\s+\S+"\s+(?P<status>\d+)\s+(?P<length>\S+)\s+(?P<generation_time_milli>\d*\.?\d+)\s+"(?P<referrer>.*?)"\s"(?P<host>[\w\-\.]*)"\s"(?P<user_agent>.*?)"' \
      /var/log/nginx/access.log.1
      

      The output:

      0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
      Parsing log /var/log/nginx/access.log.1...
      
      Logs import summary
      -------------------
      
          0 requests imported successfully
          0 requests were downloads
          23233 requests ignored:
              0 HTTP errors
              0 HTTP redirects
              23233 invalid log lines
              0 filtered log lines
              0 requests did not match any known site
              0 requests did not match any --hostname
              0 requests done by bots, search engines...
              0 requests to static resources (css, js, images, ico, ttf...)
              0 requests to file downloads did not match any --download-extensions
      
      Website import summary
      ----------------------
      
          0 requests imported to 0 sites
              0 sites already existed
              0 sites were created:
      
          0 distinct hostnames did not match any existing site:
      
      
      
      Performance summary
      -------------------
      
          Total time: 0 seconds
          Requests imported per second: 0.0 requests per second
      
      Processing your log data
      ------------------------
      
          In order for your logs to be processed by Matomo, you may need to run the following command:
           ./console core:archive --force-all-websites --url='https://analytics.my-site.de'
      
      

      Example from the debug log:

      Invalid line detected (line did not match): 66.249.*.* - [31/Jan/2022:21:59:34 +0000] "GET my-site.com/blog/*/*/* HTTP/1.1" 200 14007 0.438 "-" "my-site.com" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.g**gle.com/bot.html)"
      
      

      I only want to track a WordPress site, no other apps running in Cloudron.

      Version Cloudron: v7.0.4 (Ubuntu 20.04.3 LTS)
      Version Matomo: Matomo 4.6.2

      Could anyone here help me?
      Thanks in advance 🙂

      • girish
        girish Staff @FeelNiceInc last edited by

        @feelniceinc I think this is because the regexp to parse the log lines is not correct. Cloudron uses a format called "combined2" like below, so you might have to adjust that regexp accordingly:

            log_format combined2 '$remote_addr - [$time_local] '
                '"$request" $status $body_bytes_sent $request_time '
                '"$http_referer" "$host" "$http_user_agent"';
        

        That said, in the next release, we have removed the above custom format since it was causing problems when integrating with other tools (like crowdsec, iirc). As a temporary workaround, you can edit the nginx configs to say access_log /var/log/nginx/access.log combined; instead of combined2 and restart nginx to see if it parses correctly.
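To make the difference between the two formats concrete, here is a minimal Python sketch that rewrites a combined2 line into nginx's standard combined layout (`$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"`). It is only an illustration, not something from Cloudron or Matomo: the names `COMBINED2` and `combined2_to_combined` are hypothetical, the pattern assumes single spaces between fields as in the log_format above, and the conversion drops `$request_time` and `$host` because combined has no slot for them.

```python
import re
from typing import Optional

# combined2, per the log_format quoted above:
#   $remote_addr - [$time_local] "$request" $status $body_bytes_sent
#   $request_time "$http_referer" "$host" "$http_user_agent"
COMBINED2 = re.compile(
    r'(?P<addr>\S+) - \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d+) (?P<bytes>\S+) (?P<request_time>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<host>[^"]*)" "(?P<agent>[^"]*)"'
)


def combined2_to_combined(line: str) -> Optional[str]:
    """Return the line in standard combined layout, or None if it does not parse."""
    match = COMBINED2.match(line)
    if match is None:
        return None
    # combined has a $remote_user field (written as "-" here, since combined2
    # never recorded it) and no $request_time or $host fields.
    return '{addr} - - [{time}] "{request}" {status} {bytes} "{referer}" "{agent}"'.format(
        **match.groupdict()
    )


if __name__ == "__main__":
    sample = (
        '66.249.*.* - [31/Jan/2022:21:59:34 +0000] '
        '"GET my-site.com/blog/*/*/* HTTP/1.1" 200 14007 0.438 '
        '"-" "my-site.com" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.g**gle.com/bot.html)"'
    )
    print(combined2_to_combined(sample))
```

The sample line is the one from the debug log earlier in this thread. Switching nginx itself to combined, as suggested above, remains the simpler route for new logs.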

        • F
          FeelNiceInc @girish last edited by

          @girish Hey, thank you for the hint. Matomo is working now. Thank you 🙂

          • robi
            robi @FeelNiceInc last edited by

            @feelniceinc Can you post your final import command so the solution is available here?

            Life of Advanced Technology

            • F
              FeelNiceInc @robi last edited by

              @robi said in Problems with Log Analytics with Matomo:

              @feelniceinc Can you post your final import command so the solution is available here?

              Hey, my bash script for the cronjob is now:

              #!/usr/bin/env bash
              
              sudo python3 /path/to/matomo-log-analytics/import_logs.py \
              --url=https://mysite/matomo \
              --token-auth=token \
              --idsite=site_id  \
              /var/log/nginx/access.log.1
              
              
              
              • S
                Sydney last edited by

                Hey there, @FeelNiceInc. I'm glad to hear that my tutorial was helpful for you, and I'm sorry that the regex I provided did not work.

                I think @girish's solution is the best: once Cloudron's Nginx web server uses the default combined log format, Matomo's log import script will automatically recognise and import the logs without needing a special regex.

                The regex in my tutorial exists only to accommodate Cloudron's idiosyncratic combined2 log format -- otherwise it provides little benefit.

                I'm not sure why the regex didn't work for you, as it is working for me. For future readers who stumble upon this thread, I recommend going with @girish's advice and simply changing Cloudron to use the combined format.

                However, if you already have an archive of logs that are in the combined2 format which you need to import, I recommend trying to figure out the correct regex by hand. I use a regex visualiser called RegExr, which makes it easier to craft custom regular expressions.

                The Regexr link to the combined2 log format is here:

                https://regexr.com/6dlnf

                I recommend taking a few lines of your server logs, pasting them into RegExr, and seeing what matches and what doesn't. The regex defines a few named capture groups, which are as follows:

                • (?P<ip>[\w*.:-]+) IP Address
                • (?P<date>.*?) Date
                • (?P<timezone>.*?) Timezone
                • (?P<method>\S+) HTTP Request Method (e.g. POST, GET)
                • (?P<path>.*?) HTTP Request Path (e.g. /homepage.html)
                • (?P<status>\d+) HTTP Response Status Code
                • (?P<length>\S+) Response body size in bytes
                • (?P<generation_time_milli>\d*\.?\d+) Amount of time for the server to respond
                • (?P<referrer>.*?) Referrer header
                • (?P<host>[\w\-\.]*) Host
                • (?P<user_agent>.*?) User Agent (what browser, device, etc)

                The pieces in between, like \s+ or \S+, simply account for the spaces and unused fields in the log lines. Try playing around with the regex until it matches everything in your logs. The RegExr website makes it all very visual and easy to understand.
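The same check can also be run locally in Python, the language import_logs.py is written in. The sketch below is a rough aid rather than a reproduction of how the import script applies the pattern: it uses plain `re.match`, and the names `COMBINED2_REGEX`, `SAMPLE` and `check_lines` are hypothetical. It also assumes the literal `[` and `]` around the timestamp are escaped as `\[` and `\]`. In the command as shown earlier in this thread they appear unescaped, which may just be a rendering artifact, but an unescaped `[...]` would be read as a character class and no line would match.

```python
import re

# Regex from this thread, with the brackets around the timestamp escaped
# (assumption, see the note above).
COMBINED2_REGEX = (
    r'(?P<ip>[\w*.:-]+)\s+\S+\s+\[(?P<date>.*?)\s+(?P<timezone>.*?)\]\s+'
    r'"(?P<method>\S+)\s+(?P<path>.*?)\s+\S+"\s+(?P<status>\d+)\s+'
    r'(?P<length>\S+)\s+(?P<generation_time_milli>\d*\.?\d+)\s+'
    r'"(?P<referrer>.*?)"\s"(?P<host>[\w\-\.]*)"\s"(?P<user_agent>.*?)"'
)

# The anonymised line from the debug log earlier in this thread.
SAMPLE = (
    '66.249.*.* - [31/Jan/2022:21:59:34 +0000] '
    '"GET my-site.com/blog/*/*/* HTTP/1.1" 200 14007 0.438 '
    '"-" "my-site.com" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.g**gle.com/bot.html)"'
)


def check_lines(lines, pattern=COMBINED2_REGEX):
    """Split lines into parsed field dicts and the raw lines that did not match."""
    compiled = re.compile(pattern)
    matched, unmatched = [], []
    for line in lines:
        result = compiled.match(line)
        if result:
            matched.append(result.groupdict())
        else:
            unmatched.append(line)
    return matched, unmatched


if __name__ == "__main__":
    ok, bad = check_lines([SAMPLE])
    for fields in ok:
        for name, value in fields.items():
            print(f"{name:>22}: {value}")
    print(f"{len(bad)} line(s) did not match")
```

Feeding it a handful of real lines from access.log.1 shows which ones fall into the unmatched list, which is usually enough to spot the field the pattern trips over.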

                I'm glad that you were able to get log analytics working. I hope this helps!

                • F
                  FeelNiceInc @Sydney last edited by

                  Hey @sydney,
                  I tried regexr.com. The regex seems to be correct. However, I still get the error message. No idea what is wrong here.
                  But thanks for your engagement 🙂
