Problems with Log Analytics with Matomo

    FeelNiceInc wrote (#1):

    Hello all,
    first of all thanks to @Sydney for your great tutorial for log analysis.
    Unfortunately I still have problems in the implementation.

    I installed Matomo in Cloudron, set up the site in Matomo, and now I want to import the logs. This is the command I use:

    python3 import_logs.py \
    --url=https://analytics.my-site.de \
    --token-auth=my-token \
    --log-format-regex='(?P<ip>[\w*.:-]+)\s+\S+\s+\[(?P<date>.*?)\s+(?P<timezone>.*?)\]\s+"(?P<method>\S+)\s+(?P<path>.*?)\s+\S+"\s+(?P<status>\d+)\s+(?P<length>\S+)\s+(?P<generation_time_milli>\d*\.?\d+)\s+"(?P<referrer>.*?)"\s"(?P<host>[\w\-\.]*)"\s"(?P<user_agent>.*?)"' \
    /var/log/nginx/access.log.1
    

    The answer:

    0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
    Parsing log /var/log/nginx/access.log.1...
    
    Logs import summary
    -------------------
    
        0 requests imported successfully
        0 requests were downloads
        23233 requests ignored:
            0 HTTP errors
            0 HTTP redirects
            23233 invalid log lines
            0 filtered log lines
            0 requests did not match any known site
            0 requests did not match any --hostname
            0 requests done by bots, search engines...
            0 requests to static resources (css, js, images, ico, ttf...)
            0 requests to file downloads did not match any --download-extensions
    
    Website import summary
    ----------------------
    
        0 requests imported to 0 sites
            0 sites already existed
            0 sites were created:
    
        0 distinct hostnames did not match any existing site:
    
    
    
    Performance summary
    -------------------
    
        Total time: 0 seconds
        Requests imported per second: 0.0 requests per second
    
    Processing your log data
    ------------------------
    
        In order for your logs to be processed by Matomo, you may need to run the following command:
         ./console core:archive --force-all-websites --url='https://analytics.my-site.de'
    
    

    Debug Log-Example:

    Invalid line detected (line did not match): 66.249.*.* - [31/Jan/2022:21:59:34 +0000] "GET my-site.com/blog/*/*/* HTTP/1.1" 200 14007 0.438 "-" "my-site.com" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.g**gle.com/bot.html)"
    
    

    I only want to track a WordPress site, no other apps running in Cloudron.

    Version Cloudron: v7.0.4 (Ubuntu 20.04.3 LTS)
    Version Matomo: Matomo 4.6.2

    Could anyone here help me with this?
    Thanks in advance 🙂

      girish (Staff) wrote (#2):

      @feelniceinc I think this is because the regexp to parse the log lines is not correct. Cloudron uses a format called "combined2" like below, so you might have to adjust that regexp accordingly:

          log_format combined2 '$remote_addr - [$time_local] '
              '"$request" $status $body_bytes_sent $request_time '
              '"$http_referer" "$host" "$http_user_agent"';
      

      That said, in the next release, we have removed the above custom format since it was causing problems when integrating with other tools (like crowdsec, iirc). As a temporary workaround, you can edit the nginx configs to say access_log /var/log/nginx/access.log combined; instead of combined2 and restart nginx to see if it parses correctly.
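
For logs that were already written in the combined2 layout, one option is to rewrite them into the standard combined layout before importing. The following is a minimal sketch under stated assumptions, not something posted in this thread: the names `COMBINED2`, `SAMPLE_LINE` and `combined2_to_combined` are hypothetical, the pattern is derived only from the log_format definition quoted above, and the conversion discards `$request_time` and `$host` because the standard combined format has no place for them.

```python
import re

# Hypothetical pattern for the combined2 layout quoted above:
#   $remote_addr - [$time_local] "$request" $status $body_bytes_sent
#   $request_time "$http_referer" "$host" "$http_user_agent"
COMBINED2 = re.compile(
    r'(?P<addr>\S+) - \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d+) (?P<bytes>\S+) (?P<request_time>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<host>[^"]*)" "(?P<agent>[^"]*)"'
)

# Sample line copied from the debug output in the first post.
SAMPLE_LINE = (
    '66.249.*.* - [31/Jan/2022:21:59:34 +0000] '
    '"GET my-site.com/blog/*/*/* HTTP/1.1" 200 14007 0.438 "-" '
    '"my-site.com" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.g**gle.com/bot.html)"'
)


def combined2_to_combined(line):
    """Rewrite one combined2 line as a standard combined line.

    Returns None when the line does not look like combined2.
    The request time and host fields are dropped, and "-" is written
    in the $remote_user position that combined2 does not record.
    """
    match = COMBINED2.match(line)
    if match is None:
        return None
    # Extra keys in the dict (request_time, host) are simply ignored.
    return (
        '{addr} - - [{time}] "{request}" {status} {bytes} '
        '"{referer}" "{agent}"'
    ).format(**match.groupdict())


if __name__ == "__main__":
    print(combined2_to_combined(SAMPLE_LINE))
```

Whether losing the host and request time is acceptable depends on the setup. If several sites share one access log, the host field is what tells them apart, so a custom regex may be the better route in that case.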

        FeelNiceInc wrote (#3):

        @girish Hey, thank you for the hint. Matomo is working now. Thank you 🙂

          robi wrote (#4):

          @feelniceinc Can you post your final import command so the solution is available here?

            FeelNiceInc wrote (#5):

            @robi said in Problems with Log Analytics with Matomo:

            @feelniceinc Can you post your final import command so the solution is available here?

            Hey, my bash script for the cronjob is now:

            #!/usr/bin/env bash
            
            sudo python3 /path/to/matomo-log-analytics/import_logs.py \
            --url=https://mysite/matomo \
            --token-auth=token \
            --idsite=site_id  \
            /var/log/nginx/access.log.1
            
            
            
              Sydney wrote (#6):

              Hey there, @FeelNiceInc . I'm glad to hear that my tutorial was helpful for you, and I'm sorry that my provided regex did not work.

              I think @girish 's solution is the best: once Cloudron's Nginx web server uses the default combined log format, Matomo's log import script will automatically recognise and import the logs without needing a special regex.

              The regex that I provided in my tutorial was written specifically to accommodate Cloudron's idiosyncratic combined2 log format, and otherwise provides little benefit.

              I'm not sure why the regex didn't work for you, as it is working for me. For future readers who stumble upon this thread, I would recommend going with @girish 's advice and simply changing Cloudron to use the combined format.

              However, if you already have an archive of logs that are in the combined2 format which you need to import, I recommend trying to figure out the correct regex by hand. I use a regex visualiser called RegExr, which makes it easier to craft custom regular expressions.

              The Regexr link to the combined2 log format is here:

              https://regexr.com/6dlnf

              I recommend taking a few lines of your server logs and pasting them into RegExr to see what matches and what doesn't. The regex defines a set of named capture groups, which are as follows:

              • (?P<ip>[\w*.:-]+) IP Address
              • (?P<date>.*?) Date
              • (?P<timezone>.*?) Timezone
              • (?P<method>\S+) HTTP Request Method (e.g. Post, Get)
              • (?P<path>.*?) HTTP Request Path (e.g. /homepage.html)
              • (?P<status>\d+) HTTP Response Status (e.g. 200)
              • (?P<length>\S+) Response body size in bytes
              • (?P<generation_time_milli>\d*\.?\d+) Amount of time for the server to respond
              • (?P<referrer>.*?) Referrer header
              • (?P<host>[\w\-\.]*) Host
              • (?P<user_agent>.*?) User Agent (what browser, device, etc)

              All the weird things like \s or .+ in between simply account for things like spaces in the log lines. Try playing around with the Regex until it matches everything in your logs. The regexr website makes it all very visual and easy to understand.
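
As a complement to RegExr, the regex can also be checked locally. The sketch below is an illustration under assumptions rather than part of the original tutorial: it uses Python's `re` module as a stand-in for what the import script does (import_logs.py is a Python script), and the names `COMBINED2_REGEX`, `SAMPLE` and `parse_line` are hypothetical. One detail worth checking is that the literal brackets around the timestamp are escaped as `\[` and `\]`. Without the backslashes, `[...]` is read as a character class, the `date` and `timezone` groups disappear, and the sample line no longer matches.

```python
import re

# Regex for the combined2 layout, with the literal brackets around the
# timestamp escaped as \[ and \].
COMBINED2_REGEX = (
    r'(?P<ip>[\w*.:-]+)\s+\S+\s+\[(?P<date>.*?)\s+(?P<timezone>.*?)\]\s+'
    r'"(?P<method>\S+)\s+(?P<path>.*?)\s+\S+"\s+(?P<status>\d+)\s+'
    r'(?P<length>\S+)\s+(?P<generation_time_milli>\d*\.?\d+)\s+'
    r'"(?P<referrer>.*?)"\s"(?P<host>[\w\-\.]*)"\s"(?P<user_agent>.*?)"'
)

# Sample line taken from the debug output in the first post.
SAMPLE = (
    '66.249.*.* - [31/Jan/2022:21:59:34 +0000] '
    '"GET my-site.com/blog/*/*/* HTTP/1.1" 200 14007 0.438 "-" '
    '"my-site.com" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.g**gle.com/bot.html)"'
)


def parse_line(pattern, line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = re.match(pattern, line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    fields = parse_line(COMBINED2_REGEX, SAMPLE)
    if fields is None:
        print("no match")
    else:
        # Print each captured field so a wrong or empty group is easy to spot.
        for name, value in fields.items():
            print(f"{name:>22}: {value}")
```

If the line matches here but the import still reports invalid log lines, the regex may be arriving at the script in a different form than intended. On the command line, wrapping it in single quotes keeps the shell from altering the backslashes.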

              I'm glad that you were able to get log analytics working. I hope this helps!

                FeelNiceInc wrote (#7):

                Hey @sydney,
                I tried regexr.com. The regex seems to be correct. However, I still get the error message. No idea what is wrong here.
                But thanks for your engagement 🙂
