<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[FYI size of n-gram data sets]]></title><description><![CDATA[<p dir="auto">EN is around 8 GB</p>
]]></description><link>https://forum.cloudron.io/topic/8542/fyi-size-of-n-gram-data-sets</link><generator>RSS for Node</generator><lastBuildDate>Thu, 12 Mar 2026 02:05:24 GMT</lastBuildDate><atom:link href="https://forum.cloudron.io/topic/8542.rss" rel="self" type="application/rss+xml"/><pubDate>Thu, 26 Jan 2023 22:29:47 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to FYI size of n-gram data sets on Tue, 07 Feb 2023 05:42:27 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/luckow" aria-label="Profile: luckow">@<bdi>luckow</bdi></a> said in <a href="/post/60913">FYI size of n-gram data sets</a>:</p>
<blockquote>
<p dir="auto">EN is around 8 GB</p>
</blockquote>
<p dir="auto">That is the download size. Unpacked, the data takes 14.34 GB of server space for English and 3.06 GB for German.</p>
<p dir="auto"><img src="/assets/uploads/files/1675748543264-2834ebac-ffaf-40e4-b4b7-3584448cb671.jpeg" alt="2834EBAC-FFAF-40E4-B4B7-3584448CB671.jpeg" class=" img-fluid img-markdown" /> <img src="/assets/uploads/files/1675748543186-49076e70-5f1c-479f-911e-d1c717771557.jpeg" alt="49076E70-5F1C-479F-911E-D1C717771557.jpeg" class=" img-fluid img-markdown" /></p>
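<p dir="auto">As a rough sketch, the unpacked sizes quoted above can be added up and compared with the free space before downloading. <code>TARGET_DIR</code> is a hypothetical placeholder, not a path from the Cloudron docs, and the GB-to-MB conversion assumes decimal units.</p>

```shell
#!/usr/bin/env bash
# Unpacked sizes reported in this thread, converted to MB (assumes 1 GB = 1000 MB).
EN_MB=14340   # English: 14.34 GB
DE_MB=3060    # German: 3.06 GB
NEEDED_MB=$((EN_MB + DE_MB))

# TARGET_DIR is a hypothetical placeholder for wherever the data gets unpacked.
TARGET_DIR="${TARGET_DIR:-/}"

# df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is free space.
AVAILABLE_MB=$(df -Pk "$TARGET_DIR" | awk 'NR==2 {print int($4 / 1024)}')

echo "needed: ${NEEDED_MB} MB, available: ${AVAILABLE_MB} MB"
if [ "$AVAILABLE_MB" -lt "$NEEDED_MB" ]; then
    echo "not enough free space for both data sets"
fi
```

<p dir="auto">The download archives need room as well while they are being unpacked, so treat the total as a lower bound.</p>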
]]></description><link>https://forum.cloudron.io/post/61594</link><guid isPermaLink="true">https://forum.cloudron.io/post/61594</guid><dc:creator><![CDATA[necrevistonnezr]]></dc:creator><pubDate>Tue, 07 Feb 2023 05:42:27 GMT</pubDate></item><item><title><![CDATA[Reply to FYI size of n-gram data sets on Mon, 06 Feb 2023 20:36:29 GMT]]></title><description><![CDATA[<p dir="auto">The warning is now in the docs at <a href="https://docs.cloudron.io/apps/languagetool/#n-grams" target="_blank" rel="noopener noreferrer nofollow ugc">https://docs.cloudron.io/apps/languagetool/#n-grams</a>. Also, the way to install n-grams has changed slightly.</p>
]]></description><link>https://forum.cloudron.io/post/61583</link><guid isPermaLink="true">https://forum.cloudron.io/post/61583</guid><dc:creator><![CDATA[girish]]></dc:creator><pubDate>Mon, 06 Feb 2023 20:36:29 GMT</pubDate></item><item><title><![CDATA[Reply to FYI size of n-gram data sets on Mon, 30 Jan 2023 10:57:20 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/razielkanos" aria-label="Profile: RazielKanos">@<bdi>RazielKanos</bdi></a> said in <a href="/post/61004">FYI size of n-gram data sets</a>:</p>
<blockquote>
<p dir="auto">how can i add another language? do i just<br />
NGRAM_DATASET=("en,de")?</p>
</blockquote>
<p dir="auto">It's a Bash array variable, so the values must be separated by whitespace, each as its own quoted element:</p>
<pre><code>NGRAM_DATASET=("en" "de")
</code></pre>
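<p dir="auto">A minimal sketch of why the separator matters, assuming the script simply loops over the array (the loop body here is a placeholder, not the actual install logic). <code>WRONG_DATASET</code> is a hypothetical name used only for comparison.</p>

```shell
#!/usr/bin/env bash
# Two elements, separated by whitespace.
NGRAM_DATASET=("en" "de")
echo "${#NGRAM_DATASET[@]}"    # prints 2

# One element: the comma is part of the string, not a separator.
WRONG_DATASET=("en,de")
echo "${#WRONG_DATASET[@]}"    # prints 1

# Iterating yields one language code per pass.
for lang in "${NGRAM_DATASET[@]}"; do
    echo "would fetch n-gram data for: ${lang}"
done
```

<p dir="auto">With <code>("en,de")</code> or <code>("en;de")</code> the loop would run once with the whole string as a single language code, which matches neither data set.</p>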
<p dir="auto">I'm not a German speaker, but I've heard it works very well.<br />
I'm just wondering how it works with two languages.</p>
]]></description><link>https://forum.cloudron.io/post/61053</link><guid isPermaLink="true">https://forum.cloudron.io/post/61053</guid><dc:creator><![CDATA[vladimir.d]]></dc:creator><pubDate>Mon, 30 Jan 2023 10:57:20 GMT</pubDate></item><item><title><![CDATA[Reply to FYI size of n-gram data sets on Sun, 29 Jan 2023 14:50:34 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/razielkanos" aria-label="Profile: RazielKanos">@<bdi>RazielKanos</bdi></a> NGRAM_DATASET=("en;de") <s>works for me.</s><br />
Sorry. Not true <img src="https://forum.cloudron.io/assets/plugins/nodebb-plugin-emoji/emoji/android/1f642.png?v=c3aa4c12b7e" class="not-responsive emoji emoji-android emoji--slightly_smiling_face" style="height:23px;width:auto;vertical-align:middle" title=":)" alt="🙂" /></p>
]]></description><link>https://forum.cloudron.io/post/61014</link><guid isPermaLink="true">https://forum.cloudron.io/post/61014</guid><dc:creator><![CDATA[luckow]]></dc:creator><pubDate>Sun, 29 Jan 2023 14:50:34 GMT</pubDate></item><item><title><![CDATA[Reply to FYI size of n-gram data sets on Sun, 29 Jan 2023 07:50:57 GMT]]></title><description><![CDATA[<p dir="auto">how can i add another language? do i just<br />
NGRAM_DATASET=("en,de")?</p>
]]></description><link>https://forum.cloudron.io/post/61004</link><guid isPermaLink="true">https://forum.cloudron.io/post/61004</guid><dc:creator><![CDATA[RazielKanos]]></dc:creator><pubDate>Sun, 29 Jan 2023 07:50:57 GMT</pubDate></item><item><title><![CDATA[Reply to FYI size of n-gram data sets on Fri, 27 Jan 2023 10:14:26 GMT]]></title><description><![CDATA[<p dir="auto">good point, we will put that in the docs</p>
]]></description><link>https://forum.cloudron.io/post/60931</link><guid isPermaLink="true">https://forum.cloudron.io/post/60931</guid><dc:creator><![CDATA[nebulon]]></dc:creator><pubDate>Fri, 27 Jan 2023 10:14:26 GMT</pubDate></item></channel></rss>