Cloudron Forum

FYI size of n-gram data sets

LanguageTool
7 Posts 6 Posters 147 Views
  • #1 luckow (translator) wrote:

    EN is around 8 GB

    Pronouns: he/him | Primary language: German

  • #2 nebulon (Staff) wrote:

    Good point, we will put that in the docs.

  • #3 RazielKanos wrote:

    How can I add another language? Do I just use
    NGRAM_DATASET=("en,de")?

  • #4 luckow (translator) replied to RazielKanos:

    @RazielKanos NGRAM_DATASET=("en;de") works for me.
    Edit: sorry, that's not true 🙂

  • #5 vladimir.d replied to RazielKanos:

    @RazielKanos said in FYI size of n-gram data sets:

    How can I add another language? Do I just use
    NGRAM_DATASET=("en,de")?

    Basically, it's a Bash array variable, so you should separate the values with whitespace:

    NGRAM_DATASET=("en" "de")
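A quick way to see the difference in a Bash shell. This is only a sketch of Bash array syntax: the name `single` is hypothetical, and the loop assumes nothing about how the Cloudron package actually reads NGRAM_DATASET (the install method has since changed, per the docs).

```shell
#!/usr/bin/env bash
# A comma inside one quoted string is NOT a separator: one-element array.
single=("en,de")

# Whitespace separates array elements: two-element array.
NGRAM_DATASET=("en" "de")

echo "single: ${#single[@]} element(s)"                # prints "single: 1 element(s)"
echo "NGRAM_DATASET: ${#NGRAM_DATASET[@]} element(s)"  # prints "NGRAM_DATASET: 2 element(s)"

# Looping over "${NGRAM_DATASET[@]}" visits each language code separately.
# How the real setup script consumes the array is an assumption here.
for lang in "${NGRAM_DATASET[@]}"; do
  echo "language: $lang"
done
```

With ("en,de") a script looping over the array would see a single value, the literal string "en,de", rather than two language codes.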
    

    I'm not a German speaker, but I've heard it works very well.
    I'm just wondering how it works with two languages.

  • #6 girish (Staff) wrote:

    The warning is now in https://docs.cloudron.io/apps/languagetool/#n-grams. Also, the way to install n-grams has changed slightly.

  • #7 necrevistonnezr replied to luckow:

    @luckow said in FYI size of n-gram data sets:

    EN is around 8 GB

    This is the download size. Unpacked, it takes 14.34 GB of server space for English and 3.06 GB for German.
