Metadata DB, Caching Redis and Celery Executor

  • Charles_Nkdl (#1)

    Hello Cloudron,

    I'm currently using Superset for dashboards built on various data sources. Some of these queries take a lot of time and resources, and I wanted to optimize this.

    An easy way to speed things up in Superset is to change a few things in the config, which I want to do as I did in my local environment. Sadly, I'm not very familiar with the Cloudron environment, so I wanted to ask for your feedback.

    1. Replacing the SQLite metadata database

    When you initialize Superset, it creates a SQLite DB, but that limits its potential. Superset supports (and tests against) a PostgreSQL or MySQL metadata backend. I checked the Dockerfile and the superset_config in your Superset repository, and a PostgreSQL metadata DB is already set up in the config file.

    2. Same thing for Redis and Celery: the config is present in the Cloudron package repository (https://git.cloudron.io/cloudron/superset-app/-/blob/main/superset_config.py?ref_type=heads).

    Sadly, even though Redis and Celery are configured in the superset_config in the repository, it doesn't seem to work when I check the Celery logs and the Redis cache on the connection port.

    I tried fiddling with it using my local config, but I'm missing something to make it work. Do you think you could help me on this point?

    Thank you!


      • girish (Staff) (#2)

      @Charles_Nkdl said in Metadata DB, Caching Redis and Celery Executor:

      1. Replacing the SQLite metadata database

      The Cloudron package uses PostgreSQL. See https://git.cloudron.io/cloudron/superset-app/-/blob/main/superset_config.py?ref_type=heads#L11
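
      (For reference, a minimal sketch of what pointing the metadata DB at the Cloudron-provided PostgreSQL can look like, assuming the standard CLOUDRON_POSTGRESQL_URL variable from the PostgreSQL addon; the exact line used by the package is in the linked superset_config.py.)

      import os

      # Assumption: the Cloudron PostgreSQL addon exposes a ready-to-use connection
      # URI, so the Superset metadata backend can simply point at it.
      SQLALCHEMY_DATABASE_URI = os.environ["CLOUDRON_POSTGRESQL_URL"]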

      Sadly, even though Redis and Celery are configured in the superset_config in the repository, it doesn't seem to work when I check the Celery logs and the Redis cache on the connection port.

      This seems like a bug. What error do you see in the logs? We should fix the package.

      • Charles_Nkdl (#3)

        Hi Girish!
        I don't see any error; there are just no entries in the Redis cache from Superset, as if it weren't connected to cache the results from dashboards and datasets. I think some flags are missing in the superset config.
        Redis DB 0 handles the pidbox and the Celery worker queue, so not much there.
        DB 1 looks like this:
        (screenshot of Redis DB 1 contents)
        There is only Celery task metadata, and sadly I can't query Celery directly as it gives me an error:

        celery -A superset status
        Traceback (most recent call last):
          File "/app/code/superset/kombu/connection.py", line 472, in _reraise_as_library_errors
            yield
          File "/app/code/superset/kombu/connection.py", line 459, in _ensure_connection
            return retry_over_time(
          File "/app/code/superset/kombu/utils/functional.py", line 318, in retry_over_time
            return fun(*args, **kwargs)
          File "/app/code/superset/kombu/connection.py", line 934, in _connection_factory
            self._connection = self._establish_connection()
          File "/app/code/superset/kombu/connection.py", line 860, in _establish_connection
            conn = self.transport.establish_connection()
          File "/app/code/superset/kombu/transport/pyamqp.py", line 203, in establish_connection
            conn.connect()
          File "/app/code/superset/amqp/connection.py", line 324, in connect
            self.transport.connect()
          File "/app/code/superset/amqp/transport.py", line 129, in connect
            self._connect(self.host, self.port, self.connect_timeout)
          File "/app/code/superset/amqp/transport.py", line 184, in _connect
            self.sock.connect(sa)
        ConnectionRefusedError: [Errno 111] Connection refused
        
        The above exception was the direct cause of the following exception:
        
        Traceback (most recent call last):
          File "/app/code/superset/bin/celery", line 8, in <module>
            sys.exit(main())
          File "/app/code/superset/celery/__main__.py", line 15, in main
            sys.exit(_main())
          File "/app/code/superset/celery/bin/celery.py", line 236, in main
            return celery(auto_envvar_prefix="CELERY")
          File "/app/code/superset/click/core.py", line 1157, in __call__
            return self.main(*args, **kwargs)
          File "/app/code/superset/click/core.py", line 1078, in main
            rv = self.invoke(ctx)
          File "/app/code/superset/click/core.py", line 1688, in invoke
            return _process_result(sub_ctx.command.invoke(sub_ctx))
          File "/app/code/superset/click/core.py", line 1434, in invoke
            return ctx.invoke(self.callback, **ctx.params)
          File "/app/code/superset/click/core.py", line 783, in invoke
            return __callback(*args, **kwargs)
          File "/app/code/superset/click/decorators.py", line 33, in new_func
            return f(get_current_context(), *args, **kwargs)
          File "/app/code/superset/celery/bin/base.py", line 135, in caller
            return f(ctx, *args, **kwargs)
          File "/app/code/superset/celery/bin/control.py", line 133, in status
            callback=callback).ping()
          File "/app/code/superset/celery/app/control.py", line 294, in ping
            return self._request('ping')
          File "/app/code/superset/celery/app/control.py", line 106, in _request
            return self._prepare(self.app.control.broadcast(
          File "/app/code/superset/celery/app/control.py", line 776, in broadcast
            return self.mailbox(conn)._broadcast(
          File "/app/code/superset/kombu/pidbox.py", line 330, in _broadcast
            chan = channel or self.connection.default_channel
          File "/app/code/superset/kombu/connection.py", line 953, in default_channel
            self._ensure_connection(**conn_opts)
          File "/app/code/superset/kombu/connection.py", line 458, in _ensure_connection
            with ctx():
          File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
            self.gen.throw(typ, value, traceback)
          File "/app/code/superset/kombu/connection.py", line 476, in _reraise_as_library_errors
            raise ConnectionError(str(exc)) from exc
        kombu.exceptions.OperationalError: [Errno 111] Connection refused
        

        I tried overriding the cache config for another dashboard following the docs and adapting it to your environment: https://superset.apache.org/docs/configuration/cache
        Here is the result:
        (screenshot of the result)
        So I can manage to get caching on my visualizations; I think the config just lacked some flags.
        Here is my config file:

        import os
        import sys
        
        REDIS_HOST = os.environ["CLOUDRON_REDIS_HOST"]
        REDIS_PORT = os.environ["CLOUDRON_REDIS_PORT"]
        REDIS_CELERY_DB = "0"
        REDIS_RESULTS_DB = "1"
        
        
        FILTER_STATE_CACHE_CONFIG = {
            'CACHE_TYPE': 'RedisCache',
            'CACHE_DEFAULT_TIMEOUT': 86400,
            'CACHE_KEY_PREFIX': 'superset_filter_cache',
            'CACHE_REDIS_URL': f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_RESULTS_DB}"
        }
        
        EXPLORE_FORM_DATA_CACHE_CONFIG = {
            'CACHE_TYPE': 'RedisCache',
            'CACHE_DEFAULT_TIMEOUT': 86400,
            'CACHE_KEY_PREFIX': 'superset_filter_cache',
            'CACHE_REDIS_URL': f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_RESULTS_DB}"
        }
        
        CACHE_CONFIG = {
            'CACHE_TYPE': 'RedisCache',
            'CACHE_DEFAULT_TIMEOUT': 86400,
            'CACHE_KEY_PREFIX': 'superset_filter_cache',
            'CACHE_REDIS_URL': f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_RESULTS_DB}"
        }
        
        DATA_CACHE_CONFIG = {
            'CACHE_TYPE': 'RedisCache',
            'CACHE_DEFAULT_TIMEOUT': 86400,
            'CACHE_KEY_PREFIX': 'superset_filter_cache',
            'CACHE_REDIS_URL': f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_RESULTS_DB}"
        }
        
        class CeleryConfig(object):
            broker_url = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_CELERY_DB}"
            imports = (
                "superset.sql_lab",
                "superset.tasks.scheduler",
            )
            result_backend = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_CELERY_DB}"
            worker_prefetch_multiplier = 10
            task_acks_late = True
            task_annotations = {
                "sql_lab.get_sql_results": {
                    "rate_limit": "100/s",
                },
            }
        
        CELERY_CONFIG = CeleryConfig
        

        I don't know if you think it is OK...
        But sadly I couldn't get access to the Celery logs or see the task results to follow its activity.
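
        (For reference, a quick way to check what actually lands in the results DB is to list its keys from inside the app; a minimal sketch, assuming the redis Python package and the CLOUDRON_REDIS_* variables are available in the app's shell, e.g. after sourcing /app/pkg/env.sh.)

        import os
        import redis  # assumption: available, since the RedisCache backend depends on it

        r = redis.Redis(
            host=os.environ["CLOUDRON_REDIS_HOST"],
            port=int(os.environ["CLOUDRON_REDIS_PORT"]),
            password=os.environ.get("CLOUDRON_REDIS_PASSWORD"),  # assumption: set by the addon, if required
            db=1,  # the REDIS_RESULTS_DB from the config above
        )
        # List whatever the caches have written so far.
        for key in r.scan_iter("superset_filter_cache*"):
            print(key.decode())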


          • vladimir.d (#4)

          @Charles_Nkdl said in Metadata DB, Caching Redis and Celery Executor:

          There is only Celery task metadata, and sadly I can't query Celery directly as it gives me an error

          Could you try to run:

          $ gosu cloudron bash -c "source /app/pkg/env.sh && celery --app=superset.tasks.celery_app:app status"
          
          

          It gives some output for me:

          $ gosu cloudron bash -c "source /app/pkg/env.sh && celery --app=superset.tasks.celery_app:app status"
          Loaded your LOCAL configuration at [/app/pkg/superset_config.py]
          2024-07-03 13:02:37,979:INFO:superset.utils.logging_configurator:logging was configured successfully
          2024-07-03 13:02:37,982:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
          ->  celery@c67725ff-11b2-4620-b5dd-3a420ab289a2: OK
           
           1 node online.
          
          • Charles_Nkdl (#5)
            gosu cloudron bash -c "source /app/pkg/env.sh && celery --app=superset.tasks.celery_app:app status"
            Loaded your LOCAL configuration at [/app/pkg/superset_config.py]
            2024-07-04 14:43:03,737:INFO:superset.utils.logging_configurator:logging was configured successfully
            2024-07-04 14:43:03,739:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
            /app/code/superset/flask_limiter/extension.py:333: UserWarning: Using the in-memory storage for tracking rate limits as no storage was explicitly specified. This is not recommended for production use. See: https://flask-limiter.readthedocs.io#configuring-a-storage-backend for documentation about configuring the storage backend.
              warnings.warn(
            ->  celery@d4fc71d0-b0cc-4acc-a9f8-ec7dd74b5096: OK
            
            1 node online.
            

            It does give me some feedback.

            But the Redis cache results look different, as if not every item (dashboards, queries, filters ...) was correctly cached in the results.

            I won't lie, I'm not good enough to decipher these Redis entries and what exactly they correspond to, but the dashboard reload speed suggests something is still missing in the config file.

            Sadly, I don't have the expertise to pinpoint which component is missing something, as I don't have much experience with Celery or Redis... and the Superset documentation is quite obscure in places.

            Sorry
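
            (For reference, one way to tell the Celery result metadata apart from actual cache entries is to print each key's type and TTL; a minimal sketch under the same assumptions as above about the redis package and the CLOUDRON_REDIS_* variables. Keys named celery-task-meta-* are Celery results, while entries written by the Flask caches carry the configured CACHE_KEY_PREFIX.)

            import os
            import redis

            r = redis.Redis(
                host=os.environ["CLOUDRON_REDIS_HOST"],
                port=int(os.environ["CLOUDRON_REDIS_PORT"]),
                password=os.environ.get("CLOUDRON_REDIS_PASSWORD"),
                db=1,
            )
            # Print key, value type and remaining TTL so cache entries and
            # Celery task results can be distinguished at a glance.
            for key in r.scan_iter():
                print(key.decode(), r.type(key).decode(), r.ttl(key))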

            • girish (Staff) (#6)

              The new Superset package got rid of the warnings and fixed up the Redis configuration.

              • Charles_Nkdl (#7)

                Works fine, thank you for your work!

                • nebulon marked this topic as a question
                • nebulon has marked this topic as solved