FYI size of n-gram data sets
-
EN is around 8 GB
-
how can i add another language? do i just
NGRAM_DATASET=("en,de")?
-
how can i add another language? do i just
NGRAM_DATASET=("en,de")?
@RazielKanos NGRAM_DATASET=("en;de")
works for me.
Sorry, that's not true.
-
@RazielKanos said in FYI size of n-gram data sets:
how can i add another language? do i just
NGRAM_DATASET=("en,de")?
Basically, it's a bash array variable, so you should separate the values with whitespace:
NGRAM_DATASET=("en" "de")
I'm not a German speaker, but I heard it works very well.
Just wondering how it works with two languages.
-
@luckow said in FYI size of n-gram data sets:
EN is around 8 GB
This is the download size. Unpacked, it takes 14.34 GB of server space for English and 3.06 GB for German.
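For anyone who wants to check the array syntax discussed earlier in the thread, here is a minimal sketch in plain bash. It only shows how bash itself parses the two forms; the variable name comma_form and the echo messages are made up for illustration, and nothing here calls the actual install script, so how that script consumes NGRAM_DATASET is an assumption.

```bash
#!/usr/bin/env bash
# Comma form: bash treats this as ONE element, the literal string "en,de".
comma_form=("en,de")

# Whitespace form: TWO separate elements, "en" and "de".
NGRAM_DATASET=("en" "de")

echo "comma form: ${#comma_form[@]} element(s)"
echo "whitespace form: ${#NGRAM_DATASET[@]} element(s)"

# Hypothetical loop: a script iterating over the array would see each
# language code separately only with the whitespace form.
for lang in "${NGRAM_DATASET[@]}"; do
  echo "language entry: ${lang}"
done
```

The same applies to "en;de": inside the quotes it is still a single string, so bash sees one element rather than two languages.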
