[pt] Portuguese rule contribution/discussion

tiagosantos · June 19, 2017, 9:44pm

Thank you.

It suggests “O meu”;
This is according to European Portuguese rules. Anyway, this is more of a style rule than a grammar rule, so you can ignore it if you prefer.

To suggest replacing “pás” with “paz”.
A new rule has been added to handle the word confusion.
[pt] one confusion pair added · languagetool-org/languagetool@5b4338b · GitHub
It has very few trigger words, so you may wish to add a few more, if you find more useful examples.

It says “pás” is informal language.
Is 2) a false positive or can it somehow be improved not to be triggered in a sentence such as the above?

Due to the priority system, the word will not be tagged as informal when misused, but a special rule for “pá” has been added as well. This is the first false positive I have seen with it, but the new rule should reduce them even further.

marcoagpinto · June 21, 2017, 8:54pm

@tiagosantos

Hello!

"
1. Informação pessoal
1.1. Nome
1.2. Morada
1.3. Telemóvel
1.4. E-mail
"

It says in the first line that the sentence starts with a number.

Most of the documents I write follow the logic above:

blah blah blah
1.1. blah
blah
2.1. blah
2.2. blah
etc.

Is it easy to improve the rule not to show the warning?

Thanks!

Kind regards,

tiagosantos · June 22, 2017, 8:04am

I could not reproduce this on my latest build, except for the second line (1.1. Name).
I have been fiddling with this rule recently because I found this type of false positive, so it should be fixed in yesterday’s build. I also found that decimal numbers triggered the rule, and that has been improved as well.

A SENT_END detection regression may be the reason behind some false positives in these situations.
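
As a rough illustration of the kind of exception discussed here, the following Python sketch treats an outline prefix such as "1.1." as a section label rather than as a number that starts a sentence. The regular expression and the function name `starts_with_plain_number` are assumptions made for this example; LanguageTool's actual rule is written in its XML rule format and may work differently.

```python
import re

# Hypothetical pattern: outline labels such as "1.", "1.1." or "2.3.4."
# followed by whitespace. Not taken from LanguageTool.
OUTLINE_PREFIX = re.compile(r"^\d+(?:\.\d+)*\.\s+")


def starts_with_plain_number(line):
    """Return True if the line starts with a number that is not an outline label."""
    stripped = line.lstrip()
    if OUTLINE_PREFIX.match(stripped):
        return False  # section label: no warning wanted
    return bool(re.match(r"\d", stripped))


for line in ["1. Informação pessoal", "1.1. Nome", "3 pessoas chegaram."]:
    print(line, "->", starts_with_plain_number(line))
```

Under these assumptions, only the last line is reported as starting with a number.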

marcoagpinto · June 22, 2017, 8:13am

Thanks, Tiago!

I will test it when the nightly is released.

marcoagpinto · June 23, 2017, 5:44am

@tiagosantos

Sorry for only testing now.
"

Informação pessoal
1.1. Nome
1.2. Morada
1.3. Telemóvel
1.4. E-mail
"

It now flags 1.1.

All the rest works fine.

marcoagpinto · June 23, 2017, 6:38am

@tiagosantos

Tiago,

“Meu irmão, descansa em pás!”

It still triggers the informal language rule.

tiagosantos · June 23, 2017, 8:09am

This is not a problem because wrongWordinContext takes priority. Every word can change the interpretation of the sentence completely (especially in disambiguation mechanisms), so we can’t safeguard against it.
Anyway, anyone can change line 20477 to
<exception postag_regexp='yes' postag='(N.|[ADP]..|V.....)[FC].*|SPS.+'/></token>
in their copy if they find this is a common error in their writing.
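
To see which tags the exception above would cover, the expression can be tried against a few sample tags. This is a sketch in Python, assuming the expression has to match the whole tag; the sample tags are illustrative EAGLES-style tags chosen for this example and are not taken from the thread.

```python
import re

# The postag expression quoted above, copied verbatim.
EXCEPTION_POSTAG = re.compile(r"(N.|[ADP]..|V.....)[FC].*|SPS.+")


def is_excepted(postag):
    """True if the whole tag matches the exception expression."""
    return EXCEPTION_POSTAG.fullmatch(postag) is not None


# Illustrative tags (assumed for this example).
samples = ["NCFS000", "NCMS000", "DA0FS0", "DA0MS0", "VMP00SF", "SPS00"]
for tag in samples:
    print(tag, is_excepted(tag))
```

With these samples, the tags that carry F in the position the expression inspects match, as does the SPS tag, while the two tags with M in that position do not.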

marcoagpinto · June 23, 2017, 9:05am

@tiagosantos

Hello!

You are going to murder me… but I found another issue… there is a dash issue in dates, please see the attached image (I tried to attach an .ODT but it is not supported… which is strange because it worked before?)

It complains about the dash in the first date: “20170326” but not in the one at the end of the page.

PS-> Tiago, I tried to paste the text here and then copy/paste into LO and it no longer gave the error, so I am sending you the ODT via e-mail.

Thanks!

Kind regards,

tiagosantos · June 23, 2017, 10:20am

No problem, errors happen. I have seen inconsistent behaviour between LT-server, LT-standalone and LT-LibreOffice before, and there is even a bug I filed related to it. The sentence segmentation method is different.
I have looked into the code already regarding that, but that is an issue that requires more time to solve than what I am willing to spend on it, at the moment.
If the error is too specific, try to just delete the segment and rewrite it. Nothing can be done regarding special characters or special formats, especially on LO. For example:

marcoagpinto · July 4, 2017, 7:55am

@tiagosantos

Tiago, I have created a new rule:
“sob o ponto de vista” > “do ponto de vista”

Feel free to improve it:
“sob o MEU/TEU/SEU/NOSSO/VOSSO ponto de vista”

I can’t remember how to add words that may or may not exist.

Thanks!

EDIT: added SEU/NOSSO above

tiagosantos · July 4, 2017, 8:14am

Good one. I didn’t know there was a debate on this one. But this should be in the “Style” category with a URL explaining it. Is this the best?

Add min='0' to the token to make it optional. In the next update I will add all those improvements. No worries.
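
The effect of an optional token can be approximated outside LanguageTool with a regular expression in which the possessive is an optional group. The Python sketch below is only an analogy for min='0'; the function name and the way the suggestion is built are assumptions for this example, not the rule that was committed.

```python
import re

# The possessive is an optional group, mirroring a token with min='0'.
# Capitalised sentence starts are deliberately not handled in this sketch.
SOB_O_PONTO = re.compile(
    r"\bsob o (?:(meu|teu|seu|nosso|vosso) )?ponto de vista\b"
)


def suggest(sentence):
    """Rewrite "sob o [possessive] ponto de vista" as "do [possessive] ponto de vista"."""
    def repl(match):
        possessive = match.group(1)  # None when the optional word is absent
        middle = possessive + " " if possessive else ""
        return "do " + middle + "ponto de vista"

    return SOB_O_PONTO.sub(repl, sentence)


print(suggest("Isto é válido sob o ponto de vista técnico."))
print(suggest("Isto é válido sob o meu ponto de vista."))
```

A single pattern covers both the bare phrase and the variants with a possessive, which is the point of making the token optional.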

marcoagpinto · July 4, 2017, 8:16am

Thank you, @tiagosantos

marcoagpinto · July 8, 2017, 11:23am

@tiagosantos

I was on Facebook and saw this mistake:
“Temos de por em prática tudo o que aprendemos!”

I was thinking about creating a rule that suggests “pôr” but I was wondering if the rule should be created to match the words “POR EM” or if there are more matches to be checked.

Could you advise?

Thanks!

tiagosantos · July 8, 2017, 6:51pm

This can be very useful.
I believe that the best approach is to add a simple rule (detect both “de por” and “por em”) and correct it after checking the regression tests. I can’t recall any exception, but they are bound to exist.

marcoagpinto · July 8, 2017, 7:29pm

Thanks!

marcoagpinto · July 8, 2017, 10:48pm

@tiagosantos

Nightly results of my “por > pôr” rule:

+Title: Beja
+Line 1, column 44, Rule ID: POR[1]
+Message: Substitua por 'pôr'.
+Suggestion: pôr
+A sua importância é atestada pelo facto de por lá passar uma das vias romanas.
+                                           ^^^                                
+

Should I remove the “de por” rule or is there a way to improve it?

The “por em” seems to be okay.
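
The difference between the two patterns can be reproduced with a naive word-pair check. The Python sketch below is an approximation written for this example (LanguageTool rules operate on analysed tokens, not on raw regular expressions, and the names used here are hypothetical); it shows why “de por” flags the Beja sentence while “por em” does not.

```python
import re

# Naive word-pair patterns, assumed for illustration only.
DE_POR = re.compile(r"\bde (por)\b")
POR_EM = re.compile(r"\b(por) em\b")


def flagged_by(sentence):
    """Return the names of the naive patterns that match the sentence."""
    hits = []
    if DE_POR.search(sentence):
        hits.append("de por")
    if POR_EM.search(sentence):
        hits.append("por em")
    return hits


mistake = "Temos de por em prática tudo o que aprendemos!"
beja = "A sua importância é atestada pelo facto de por lá passar uma das vias romanas."

print(flagged_by(mistake))  # both patterns match the real mistake
print(flagged_by(beja))     # only "de por" matches: the false positive
```

In the Beja sentence “por” is a preposition followed by “lá”, so a pattern that looks only at the preceding “de” cannot tell it apart from the verb “pôr”.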

Thanks!

marcoagpinto · July 9, 2017, 12:50pm

@tiagosantos

Thank you for fixing it.

Kind regards,

tiagosantos · July 9, 2017, 2:18pm

No worries. Best regards.

marcoagpinto · July 18, 2017, 8:31am

@tiagosantos

Hello Tiago,

I am not sure if this is a false positive.

LT flagged:
“Ele disse que quem estabelece a % de incapacidade é o médico a que vou na sexta.”

It suggests “à”.

Now I don’t know how to write the sentence.

Thanks!

tiagosantos · July 18, 2017, 9:21am

Great question.
It seems like a false positive to me, though it made me have doubts about it too, since it does make logical sense to use ‘à’ for time expressions.
After today’s regression test results (which should be very verbose due to the NO_VERB rule correction) I will try to confirm and fix it.
This false positive is very misleading and needs to be addressed.