It is underlined in blue, and only a right-click reveals that it is unidentified.
The whole idea was to also get a setting in LT to change the colour of the underline for this case.
For example, if I could change it to pink, I would know just by looking at the document that I needed to add postags (for valid words that have no morphological information).
It is possible to take the speller and postag data and list the words that are in the speller but not in the postag database. Word frequencies could be added from a corpus. Wouldn’t that be easier?
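A minimal sketch of that comparison, assuming the speller word list, the postag word list and the corpus tokens have already been loaded into plain Python lists. The function name `untagged_words` and the sample words are made up for illustration; they are not part of LT or Hunspell.

```python
from collections import Counter


def untagged_words(speller_words, tagged_words, corpus_tokens):
    """Return (word, frequency) pairs for words the speller accepts
    but the postag data does not contain, most frequent first."""
    missing = set(speller_words) - set(tagged_words)
    # Count only corpus tokens that are in the "missing" set.
    counts = Counter(t for t in corpus_tokens if t in missing)
    # Words never seen in the corpus are kept with frequency 0.
    return sorted(((w, counts[w]) for w in missing),
                  key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy data, just to show the shape of the result.
speller = ["casa", "casas", "gato", "xpto"]
tagged = ["casa", "casas"]
corpus = "o gato viu outro gato e um xpto".split()

print(untagged_words(speller, tagged, corpus))
# [('gato', 2), ('xpto', 1)]
```

Sorting by frequency puts the words most worth tagging at the top of the list.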
You could put this rule in a separate category.
Then you can set the colour for this category to pink in the user interface.
(Tools-LanguageTool-Options-Underline Color of Category, then set the colour for this category.)
@marcoagpinto
Portuguese has a category named ‘Desenvolvimento’, which I added some time ago and which is responsible for these detections. You can find it in the Grammar tab inside the Options, which I believe you did, since that category is disabled by default.
If you wish to develop the tagger, you can use the standalone tool and change the colour of that category to whatever you wish, also in the Options, in a tab conveniently named ‘Underline Color/Cor do Sublinhado’.
I am almost done. To my surprise, the pt Hunspell is very tolerant. To begin with, it allows - to break a word, so every word consisting of valid parts separated by - is accepted. That is rather tolerant.
So you will find words like --a and a-- as valid.
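To make the effect concrete, here is a rough model of that tolerance. It is only an assumption about the observed behaviour, not Hunspell's actual algorithm: a word is accepted when every non-empty part between hyphens is in the lexicon. The name `accepted_with_hyphen_tolerance` and the tiny lexicon are hypothetical.

```python
def accepted_with_hyphen_tolerance(word, lexicon):
    """Assumed model: split on '-', ignore empty parts,
    and accept when every remaining part is a known word."""
    parts = [p for p in word.split("-") if p]
    return bool(parts) and all(p in lexicon for p in parts)


lexicon = {"a", "guarda", "chuva"}  # hypothetical mini lexicon

for candidate in ["guarda-chuva", "--a", "a--", "guarda-xpto", "--"]:
    print(candidate, accepted_with_hyphen_tolerance(candidate, lexicon))
```

Under this model both --a and a-- pass because the only real part, a, is a valid word.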
I used the words found in my collection of PT texts for the frequencies; I unmunched the Hunspell dictionary to get the maximum number of tolerated words.
I dumped the postag dictionary and used that to check if the words had a postag.
If you send me an email at info at taaltik.nl I will return the results zipped; it is about 200 MB of words and frequency numbers.
By the way… don’t be surprised if some of the words in it are incorrect. I used Hunspell -G to list the correct words; it has a bug that also lists the parts of hyphenated words that are correct as a whole, even when the part on its own is not.
If you want those gone, you can run Hunspell -L -d pt_PT on the list to remove them.
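That clean-up could be scripted roughly like this. It is a sketch under assumptions: the `hunspell` command-line tool and a `pt_PT` dictionary are installed, the list has one word per line, and the text encoding matches what Hunspell expects. The helper names `hunspell_rejected` and `drop_rejected` are made up.

```python
import subprocess


def hunspell_rejected(words, dictionary="pt_PT"):
    """Feed one word per line to `hunspell -L` and return the lines it
    prints, i.e. the ones it considers misspelled.
    Assumes the hunspell CLI and the dictionary are installed."""
    result = subprocess.run(
        ["hunspell", "-L", "-d", dictionary],
        input="\n".join(words) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def drop_rejected(words, rejected):
    """Keep the original order, minus anything in `rejected`."""
    rejected = set(rejected)
    return [w for w in words if w not in rejected]


# Usage, with hunspell available:
#   clean = drop_rejected(words, hunspell_rejected(words))
```

Keeping the filter separate from the Hunspell call means the rejected list can also come from a file produced by an earlier run.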