    Run s3_media_upload script

    Matrix (Synapse/Element)
    • nichu42
      nichu42 last edited by

      Hey there,

      I finally managed to have new media stored on S3 using the integrated S3 storage module.
      Now I'm puzzled about how to move the existing media to S3.

      I know this is done with the s3_media_upload script, which I found in /app/code/env/bin.

      If I run the script with the suggested command, "s3_media_upload upload /app/data/data/media_store matrix --storage-class STANDARD_IA --delete", it comes back with the following error message:

      s3_media_upload: error: Could not open 'cache.db' as sqlite DB: unable to open database file
      

      I suspect that I am overlooking an important prerequisite. Can anyone help me out?

      Best regards,

      Nic

      admin @ https://blueplanet.social
      Matrix: @nichu42:blueplanet.social

      • robi
        robi last edited by

        First, find out where it's expecting the cache file. Then see if you can specify a location for it on the command line.

        Alternatively, run it from the /app/data/ location in case it stores it right where it runs from.
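
        A quick sketch of that second idea, assuming the script writes cache.db to whatever directory it is run from (which the rest of this thread suggests):

            # cd into a writable directory first so the script can create cache.db there
            cd /app/data
            /app/code/env/bin/s3_media_upload upload /app/data/data/media_store matrix --storage-class STANDARD_IA --delete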

        Life of Advanced Technology

        • nichu42
          nichu42 @robi last edited by

          @robi
          Thank you for your response.

          I think I made a major step forward.

          The script expects a database.yaml file to be present. It needs to include the user, password, database and host entries, which can be found and copied over from the homeserver.yaml file.
          Once you have prepared this, the script will create cache.db on its own when you run the following command:

          s3_media_upload update /app/data/data/media_store 1m
          

          1m means all files that haven't been touched for a month will be used.

          Now the script is ready to upload. This can be triggered with the following command:

          s3_media_upload upload --delete /app/data/data/media_store s3_bucket_name
          

          Unfortunately, here I am stuck again. The script returns the following error message:

          botocore.exceptions.NoCredentialsError: Unable to locate credentials
          

          The script documentation states "This module uses boto3, and so the credentials should be specified", and refers to https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html#guide-configuration
          Here I am completely lost again and don't even know where to start. Is this maybe something that needs to be done from Cloudron, @girish ?

          admin @ https://blueplanet.social
          Matrix: @nichu42:blueplanet.social

          • robi
            robi @nichu42 last edited by

            @nichu42 yeah, those are S3 credentials for the bucket that need to be provided.

            see if you can find example commands for the upload step.

            Life of Advanced Technology

            • nichu42
              nichu42 @robi last edited by

              @robi
              The example command is what I stated above.
              The S3 credentials cannot be submitted on the command line.

              They are in the homeserver.yaml, but it seems that boto3 (whatever that is) doesn't read them but expects environment variables to be set. That's why I thought this might be something that needs to be done by Cloudron.

              admin @ https://blueplanet.social
              Matrix: @nichu42:blueplanet.social

              • robi
                robi @nichu42 last edited by

                @nichu42 hmm, ok.

                what's the error if you rename the .yaml file? (just to make sure that's where it's looking for the info..)

                Life of Advanced Technology

                • nichu42
                  nichu42 @robi last edited by

                  @robi homeserver.yaml is the configuration file for Synapse. It will not start without it.
                  The S3 configuration is correct: Synapse uploads new media to the bucket.
                  However, "boto3" needs different configuration as it seems.

                  admin @ https://blueplanet.social
                  Matrix: @nichu42:blueplanet.social

                  • robi
                    robi @nichu42 last edited by

                    @nichu42 right, here are the options:
                    https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html

                    Life of Advanced Technology

                    • nichu42
                      nichu42 @robi last edited by

                      @robi said in Run s3_media_upload script:

                      @nichu42 right, here are the options:
                      https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html

                      Yes, that's what I figured.
                      But I have no idea how to make any of these options work with Cloudron.
                      The file system is read-only, so I cannot put a config file where "boto3" expects it (~/.aws/credentials).

                      That's why I thought that maybe @girish has to enable the use of environment variables.

                      admin @ https://blueplanet.social
                      Matrix: @nichu42:blueplanet.social

                      • girish
                        girish Staff @nichu42 last edited by

                        @nichu42 took me a while to figure out what/where this script was. I guess it's this - https://github.com/matrix-org/synapse-s3-storage-provider/blob/main/scripts/s3_media_upload ?

                        • girish
                          girish Staff @nichu42 last edited by

                          @nichu42 you have to create a so-called database.yaml file manually as per https://github.com/matrix-org/synapse-s3-storage-provider#regular-cleanup-job

                          "database.yaml should contain the keys that would be passed to psycopg2 to connect to your database. They can be found in the contents of the database.args parameter in your homeserver.yaml."

                          From what I can make out from the code, it needs to be like this:

                          postgres:
                              user: xx
                              password: yy
                              database: zz
                              host: postgresql
                          

                          Might be worthwhile asking upstream to document this...
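
                          For reference, the database section of a postgres-backed homeserver.yaml usually looks something like this (placeholder values), so the entries above can be copied from database.args:

                              database:
                                  name: psycopg2
                                  args:
                                      user: xx
                                      password: yy
                                      database: zz
                                      host: postgresql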

                          • nichu42
                            nichu42 @girish last edited by

                            @girish Thank you for responding!

                            Yes, this thread is about the script that you have linked (https://github.com/matrix-org/synapse-s3-storage-provider#regular-cleanup-job). It is part of Cloudron's Synapse installation and can be found in /app/code/env/bin.

                            I had already created the database config as you described in your post.

                            The problem is: The script uses Boto3 (the AWS SDK for Python), which expects the S3 credentials either in the credentials file ~/.aws/credentials or in environment variables, see
                            https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html

                            Please correct me if I'm wrong, but Cloudron doesn't grant me access to either of these. That's why I mentioned you in this thread. I think you'd have to enable one of these options to make the script work.

                            admin @ https://blueplanet.social
                            Matrix: @nichu42:blueplanet.social

                            • girish
                              girish Staff @nichu42 last edited by

                              @nichu42 From the link you posted, there are a bunch of environment variables you can set - both for the credentials themselves and also for the config file. Have you tried those? Or is the question about how to use those env variables?

                              • nichu42
                                nichu42 @girish last edited by

                                @girish Yes, correct: How to set these environment variables with Cloudron?

                                admin @ https://blueplanet.social
                                Matrix: @nichu42:blueplanet.social

                                • girish
                                  girish Staff @nichu42 last edited by

                                  @nichu42 You are running this on a Web Terminal, right? You can just export foo=bar like in a normal terminal and then run the s3_media_upload script.
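
                                  For example, something like this (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the standard boto3 variable names from the docs linked above; the values are placeholders):

                                      # set the S3 credentials for boto3, then run the script in the same session
                                      export AWS_ACCESS_KEY_ID=your-access-key
                                      export AWS_SECRET_ACCESS_KEY=your-secret-key
                                      /app/code/env/bin/s3_media_upload upload --delete /app/data/data/media_store s3_bucket_name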

                                  • nichu42
                                    nichu42 @girish last edited by nichu42

                                    @girish Yay! Thank you.
                                    I am all new to this Linux game so I wasn't aware I could just set the environment variables like that.

                                    For everyone else, this is what you need to do:

                                    ──────────────────────────────

                                    1. Set up S3 with Synapse. See my post here: https://forum.cloudron.io/post/60415

                                    2. Create a database.yaml file in /app/data/configs that contains the postgres database credentials.
                                      You can find those in the existing homeserver.yaml file.

                                        user: xxx
                                        password: xxx
                                        database: xxx
                                        host: postgresql
                                    
                                    3. Create a script (e.g., s3cleanup.sh) with the following contents:

                                        #!/bin/bash
                                        # run from /app/data/configs so the script finds database.yaml and writes cache.db there
                                        cd /app/data/configs
                                        # S3 credentials, picked up by boto3 from the environment
                                        export AWS_ACCESS_KEY_ID=[your S3 compatible access key]
                                        export AWS_SECRET_ACCESS_KEY=[your S3 compatible secret access key]
                                        # mark media untouched for 1 month, then upload it to S3 and delete the local copies
                                        /app/code/env/bin/s3_media_upload update /app/data/data/media_store 1m
                                        /app/code/env/bin/s3_media_upload upload --delete --endpoint-url https://yours3storageendpoint.com /app/data/data/media_store [your s3_bucket_name]
                                    
                                    4. Run the s3cleanup.sh script (see below for an example invocation).
                                      It will look up media that hasn't been touched for 1m (= 1 month) or whatever you set above. The value needs to be an integer, followed by either m = month(s), d = day(s) or y = year(s).
                                      It will create a cache.db file that refers to the media that matches your criteria.
                                      In the second step, it will upload all files listed in cache.db to your S3 storage and delete the local copies.
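
                                    For example (assuming you saved the script as /app/data/s3cleanup.sh; the path is just an illustration):

                                        # make the script executable once, then run it from the Web Terminal
                                        chmod +x /app/data/s3cleanup.sh
                                        /app/data/s3cleanup.sh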

                                    The output looks like this:

                                    Syncing files that haven't been accessed since: 2022-12-25 14:59:14.674154
                                    Synced 603 new rows
                                    100%|████████████████████████████████████| 603/603 [00:00<00:00, 16121.24files/s]
                                    Updated 0 as deleted
                                    100%|████████████████████████████████████| 603/603 [03:25<00:00,  2.93files/s]
                                    Uploaded 603 media out of 603
                                    Uploaded 3203 files
                                    Uploaded 263.6M
                                    Deleted 603 media
                                    Deleted 3203 files
                                    Deleted 263.6M
                                    

                                    Edit: Added path /app/data/configs to script to make it work as cron job.
                                    Edit2: Added more choices for duration suffixes in 's3_media_upload update' job.
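
                                    If you schedule it as a cron job as per the first edit note, a plain crontab entry could look like this (the schedule is just an example; how you register cron jobs may depend on your setup):

                                        # every Sunday at 03:00, run the cleanup and keep a log of the output
                                        0 3 * * 0 /app/data/s3cleanup.sh >> /app/data/s3cleanup.log 2>&1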

                                    Disclaimer: This is to the best of my knowledge and understanding. It worked for me, but I accept no liability for loss of data on your server caused by my incompetence. 😉

                                    admin @ https://blueplanet.social
                                    Matrix: @nichu42:blueplanet.social
