Spellchecker improvement discussion

@dnaber, could you please run the updated features extractor?
And could you also run this query: SELECT DISTINCT rule_id FROM corrections WHERE rule_id LIKE "MORFOLOGIK_%";

MORFOLOGIK_RULE_PL_PL
MORFOLOGIK_RULE_RU_RU
MORFOLOGIK_RULE_CA_ES
MORFOLOGIK_RULE_EN_US
MORFOLOGIK_RULE_UK_UA
MORFOLOGIK_RULE_IT_IT
MORFOLOGIK_RULE_ES
MORFOLOGIK_RULE_EN_GB
MORFOLOGIK_RULE_RO_RO
MORFOLOGIK_RULE_SL_SI
MORFOLOGIK_RULE_EN_AU
MORFOLOGIK_RULE_NL_NL
MORFOLOGIK_RULE_SK_SK
MORFOLOGIK_RULE_AST
MORFOLOGIK_RULE_EL_GR
MORFOLOGIK_RULE_EN_NZ
MORFOLOGIK_RULE_TL
MORFOLOGIK_RULE_BE_BY
MORFOLOGIK_RULE_EN_CA
MORFOLOGIK_RULE_BR_FR
MORFOLOGIK_RULE_EN_ZA
MORFOLOGIK_RULE_SR_EKAVIAN

Will send the result of the features extractor soon.
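A side note on the query above: in SQL LIKE patterns, "_" is itself a wildcard that matches any single character, so "MORFOLOGIK_%" is slightly looser than a literal prefix match. The sketch below uses an in-memory SQLite table as a stand-in; only the `corrections` table and `rule_id` column names come from the thread, while the sample rows (including the made-up `MORFOLOGIKXRULE`), the `!` escape character, and the choice of SQLite are assumptions for illustration. The real logs database may be a different engine.

```python
import sqlite3

# In-memory stand-in for the real logs database (assumption: only the
# `corrections` table and its `rule_id` column are taken from the thread).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE corrections (rule_id TEXT)")
sample_ids = [
    "MORFOLOGIK_RULE_EN_US",
    "MORFOLOGIK_RULE_EN_US",  # duplicate, removed by DISTINCT
    "MORFOLOGIK_RULE_PL_PL",
    "GERMAN_SPELLER_RULE",    # spelling rule without the MORFOLOGIK_ prefix
    "MORFOLOGIKXRULE",        # made-up ID, matched only via the "_" wildcard
]
conn.executemany("INSERT INTO corrections VALUES (?)", [(r,) for r in sample_ids])


def distinct_rule_ids(pattern, escaped=False):
    """Return the sorted distinct rule IDs matching a LIKE pattern.

    With escaped=True, "!" in the pattern marks the next character as literal.
    """
    sql = "SELECT DISTINCT rule_id FROM corrections WHERE rule_id LIKE ?"
    if escaped:
        sql += " ESCAPE '!'"
    return sorted(row[0] for row in conn.execute(sql, (pattern,)))


# "_" as a wildcard: also matches the made-up MORFOLOGIKXRULE.
loose = distinct_rule_ids("MORFOLOGIK_%")
# "!_" as a literal underscore: only true MORFOLOGIK_ prefixes match.
strict = distinct_rule_ids("MORFOLOGIK!_%", escaped=True)
```

For the rule IDs listed above the two variants return the same rows, so this only matters if an ID ever starts with MORFOLOGIK followed by something other than an underscore.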

The features extractor had a bug: it mostly processed only records whose language code starts with en-. Could you please run the updated features extractor?

Done, but now the result is rather small (3MB).

I’ve improved the error handling, so could you please run the updated features extractor one more time?

So there are morfologik rules that were never logged (i.e. invoked)? For example “MORFOLOGIK_RULE_DE_DE”.

Sorry, I forgot about our rule ids being inconsistent. German rules are: AUSTRIAN_GERMAN_SPELLER_RULE, GERMAN_SPELLER_RULE, SWISS_GERMAN_SPELLER_RULE

OK, and are there any other languages whose morfologik rules have IDs that don't follow the MORFOLOGIK_ naming pattern?

Not sure; please check the getId() of all classes extending SpellingCheckRule.

Ok, thanks!
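Because the rule IDs are inconsistent, a prefix test alone misses the German spellers. A minimal sketch of a filter that accepts both kinds of ID might look like this; the function and constant names are hypothetical, the exception set contains only the three German IDs quoted above, and other languages may need further entries (to be found via getId(), as suggested).

```python
# Spelling rule IDs quoted in the thread that lack the MORFOLOGIK_ prefix.
# Assumption: this set is incomplete; other languages may have more exceptions.
NON_PREFIXED_SPELLER_IDS = frozenset({
    "AUSTRIAN_GERMAN_SPELLER_RULE",
    "GERMAN_SPELLER_RULE",
    "SWISS_GERMAN_SPELLER_RULE",
})


def is_speller_rule_id(rule_id: str) -> bool:
    """True if the ID looks like a spelling rule, prefixed or listed explicitly."""
    return rule_id.startswith("MORFOLOGIK_") or rule_id in NON_PREFIXED_SPELLER_IDS


def filter_speller_records(records):
    """Keep only (rule_id, payload) pairs whose rule_id is a spelling rule."""
    return [rec for rec in records if is_speller_rule_id(rec[0])]


# Hypothetical log records: (rule_id, payload).
records = [
    ("MORFOLOGIK_RULE_EN_US", "recieved"),
    ("GERMAN_SPELLER_RULE", "Addresse"),
    ("SOME_GRAMMAR_RULE", "a apple"),  # made-up non-spelling rule ID
]
kept = filter_speller_records(records)
```

An explicit set is used instead of a suffix pattern such as "_SPELLER_RULE" because the thread does not confirm that every non-prefixed speller follows that suffix.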

@dnaber, could you please run the updated features extractor? I’ve added extraction of the %GERMAN and FR rules and increased the context window size, which is now 3.
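For readers unfamiliar with the term, a context window of size 3 can be sketched as follows. This is only an illustration under assumptions: it treats the window as up to 3 tokens on each side of the flagged word and uses whitespace tokenization, whereas the actual extractor may define both differently. The function name and the sample sentence are hypothetical.

```python
def context_window(tokens, index, size=3):
    """Return (left, right): up to `size` tokens on each side of tokens[index].

    Fewer tokens are returned near the start or end of the sentence.
    """
    if not 0 <= index < len(tokens):
        raise IndexError("index out of range")
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left, right


# Hypothetical example sentence with one misspelled word.
tokens = "I have recieved your letter and will reply soon".split()
left, right = context_window(tokens, tokens.index("recieved"))
# left  == ["I", "have"]              (only two tokens exist to the left)
# right == ["your", "letter", "and"]
```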

Running java -jar languagetool-suggestions-logs-features-extractor-1.7.jar I now get:

Exception in thread "main" java.lang.ExceptionInInitializerError
Caused by: java.lang.RuntimeException: Could not activate rules
	at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:192)
	at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:167)
	at io.github.oserikov.languagetool.Main$1.<init>(Main.java:68)
	at io.github.oserikov.languagetool.Main.<clinit>(Main.java:65)
Caused by: java.io.IOException: Cannot load or parse input stream of '/org/languagetool/rules/fr/grammar.xml'
	at org.languagetool.rules.patterns.PatternRuleLoader.getRules(PatternRuleLoader.java:76)
	at org.languagetool.Language.getPatternRules(Language.java:368)
	at org.languagetool.JLanguageTool.activateDefaultPatternRules(JLanguageTool.java:368)
	at org.languagetool.JLanguageTool.<init>(JLanguageTool.java:189)
	... 3 more
Caused by: java.lang.IllegalArgumentException: 'fr' is not a language code known to LanguageTool. Supported language codes are: be-BY, br-FR, ca-ES, de-AT, el-GR, en-AU, en-CA, en-GB, en-NZ, en-US, en-ZA, es, it, nl, pl-PL, ro-RO, ru-RU, sk-SK, sl-SI, sr, tl-PH, uk-UA. The list of languages is read from META-INF/org/languagetool/language-module.properties in the Java classpath. See http://wiki.languagetool.org/java-api for details.
	at org.languagetool.Languages.getLanguageForShortCode(Languages.java:151)
        (...)

Aww, will fix tonight, now afk.

Could you please re-download the tool? I’ve updated the release with a fix.