[pt] Portuguese rule contribution/discussion

@tiagosantos

Hello!

I was looking at my “Prontuário” and it has the following rule:
“São oportunos de citar”

It says it is the same for:
oportuno.
importante.
indecente.
interessante.
perigoso.

So I ran them through LT’s text analysis, and they match the following postags:
AQ0MS0
AQ0CS0

I need your opinion on whether I can create a rule with those postags (for all possible words) plus the plural forms, and whether it should only be used with “citar” or whether I can match any verb in place of “citar”.

Thank you!

Kind regards,

There is a pattern. If there is a pattern, there is a rule.
That is the motto I follow.
It is hard to say whether it is a good rule before trying it. Try it. It seems to be a good redundancy rule.
If it has too many false positives, you can refine it or set it to default off.
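
A minimal sketch of what such a rule could look like, starting with “citar” only. It assumes the plural forms are tagged AQ0MP0 and AQ0CP0 (same scheme as the singular tags above); the rule id, name, and message wording are placeholders, and the suggestion simply drops “de”, as in the rule posted further down the thread. The default="off" attribute keeps the rule disabled until the false positives have been checked.

```xml
<!-- Sketch only: id, name and message wording are placeholders -->
<rule id="ADJ_DE_CITAR" name="Adjetivo + de + citar" default="off">
  <pattern>
    <!-- AQ0MS0 / AQ0CS0 plus the assumed plural tags AQ0MP0 / AQ0CP0 -->
    <token postag_regexp="yes" postag="AQ0[MC][SP]0"/>
    <token>de</token>
    <token>citar</token>
  </pattern>
  <message>Substitua por <suggestion>\1 \3</suggestion>.</message>
  <example correction="oportunos citar">Tais textos são <marker>oportunos de citar</marker>.</example>
</rule>
```

Matching on the postag rather than on a word list covers every adjective with those tags, which is also why it is more likely to need refining.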

@tiagosantos

Thank you, my friend!

I will add the rule today, the moment I have a chance.

:slight_smile: :slight_smile: :slight_smile:

@tiagosantos

Help!

<rule id="ADJ_DE_VERB" name="Adj + Verbo">
  <pattern>
    <token regexp="yes">importantes?|indecentes?|interessantes?|oportunos?|perigosos?</token>
    <token>de</token>
    <token postag_regexp='yes' postag='V.+'/>
  </pattern>
  <message>Substitua por <suggestion>\1 \3</suggestion>.</message>
  <example correction="oportunos citar">Tais textos são <marker>oportunos de citar</marker>.</example>
</rule>

It only works with the verb “citar”.

How do I make it work with the other verbs?

Thank you!

	<token postag_regexp='yes' postag='V.+'/>	

Like this? It seems correct.

If I write in the stand-alone:
“Tais textos são oportunos de citar.”
it flags the sentence.

If I write:
“Tais textos são oportunos de falar.”
no flagging happens.

This means I am doing something wrong, probably in the suggestion line.

EDIT: I am not sure if this rule should only be used with “citar”. “falar” doesn’t sound very good.

You are doing it right, but in this case the word acts as a noun/activity (common with infinitives). Disambiguation resolves the double meaning and removes the verb reading from “falar”.
The rule is solid, but disambiguation will act on every case where a preposition precedes the verb. Disambiguation can be improved to cover this case, but that is beyond the level of precision I am focusing on at the moment.
I believe you can make it solid using a workaround like this:

<token regexp='yes' case_sensitive='yes'>.*[aeiou]r</token>

This will match all infinitives regardless of the disambiguation. It will fail with the infinitives that require disambiguation, but you can add exceptions for the problematic ones.
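
Put together with the rule from above, the workaround could look roughly like this. It is only a sketch: the words in the exception list (lugar, favor, valor) are illustrative examples of non-infinitives ending in a vowel plus “r”, not a tested or complete list.

```xml
<!-- Sketch only: the exception list is illustrative, not exhaustive -->
<rule id="ADJ_DE_VERB" name="Adj + Verbo">
  <pattern>
    <token regexp="yes">importantes?|indecentes?|interessantes?|oportunos?|perigosos?</token>
    <token>de</token>
    <!-- matches on the surface form, so the tags left by disambiguation are not consulted -->
    <token regexp="yes" case_sensitive="yes">.*[aeiou]r<exception regexp="yes">lugar|favor|valor</exception></token>
  </pattern>
  <message>Substitua por <suggestion>\1 \3</suggestion>.</message>
  <example correction="oportunos falar">Tais textos são <marker>oportunos de falar</marker>.</example>
</rule>
```

The trade-off is that the third token no longer checks that the word is a verb at all, so the exception list has to grow as false positives turn up.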

@tiagosantos

Thanks!

If you know of a better name for the rule, feel free to change it.

@tiagosantos

Hello!

Sorry to bother you.

I found a false positive in the sentence:
“Há quem goste de coelho à caçador.”

It reports a gender issue.

I will try to add a rule tonight or tomorrow but I have been working on several tasks at the same time, so, no promises.

Thanks!

For any expression that omits words, there will be false positives. It comes from:
coelho à [moda do] caçador.

If you find a list of all dishes that have à followed by a masculine noun, including proper names, it can be solved.
It will only be useful if only the most common ones are added, like coelho à caçador; otherwise it renders the detection useless.
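
One way to whitelist a common dish is an antipattern inside the rule that reports the gender issue. The sketch below assumes that rule is an XML pattern rule; which rule it is has not been identified here, so the placement is hypothetical.

```xml
<!-- Sketch only: goes inside the (unidentified) rule that flags the gender mismatch -->
<antipattern>
  <token>coelho</token>
  <token>à</token>
  <token>caçador</token>
</antipattern>
```

Each additional dish would need its own antipattern (or a regexp on the first and last token), which is why keeping the list short matters.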

@tiagosantos

Hello!

I have created a rule today that suggests replacing the verb FAZER with PRATICAR regarding “desporto”.

But I have just tested it in Writer and it suggests “(praticar)” instead of “praticar”.

"Tens de FAZER desporto."

In the stand-alone tool it seems to work well.

What have I done wrong?

Thank you!

I have seen the rule and it seems well-designed. I will have to look into it, but it is better to add this as a bug report. Tagging and synthesizer behaviour should be consistent in both platforms.
Please detail on GitHub the steps to reproduce.

Done!

@tiagosantos

Hello Tiago,

I was wondering if you could improve the “contudo” rule:
“Tentámos contudo otimizar o código ao máximo.”

MS Word 2016 suggests two commas:
“Tentámos, contudo, otimizar o código ao máximo.”

But LanguageTool suggests only the first comma at first and, once we fix that, it suggests the second.

Thank you!

Kind regards,

Added to the TODO list.
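
For reference, a rough sketch of a rule that offers both commas in one suggestion. The id, name, and message wording are placeholders, and it has not been checked against false positives or against overlap with the existing “contudo” rule.

```xml
<!-- Sketch only: id, name and message wording are placeholders -->
<rule id="CONTUDO_ENTRE_VIRGULAS" name="Vírgulas antes e depois de 'contudo'">
  <pattern>
    <!-- any token that is neither punctuation nor the sentence start -->
    <token><exception regexp="yes">\p{P}</exception><exception postag="SENT_START"/></token>
    <token>contudo</token>
    <!-- any token that is not punctuation -->
    <token><exception regexp="yes">\p{P}</exception></token>
  </pattern>
  <message>Coloque «contudo» entre vírgulas: <suggestion>\1, \2, \3</suggestion>.</message>
  <example correction="Tentámos, contudo, otimizar"><marker>Tentámos contudo otimizar</marker> o código ao máximo.</example>
</rule>
```

Because the pattern only matches when both commas are missing, cases with one comma already in place would still be left to the existing rule.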

@tiagosantos

In my Master’s dissertation many years ago I wrote:
“Os métodos ou as técnicas de contagem permitem obter resultados fáceis, mesmo nos casos em que manualmente sejam muito morosos e de difícil contabilização, ou ainda, aqueles em que é possível obter o que se denomina de “falsos positivos”, ou seja, resultados que parecem ser válidos, mas que não o são.”

The co-supervisor replaced the final “não o são” with “o não são”.

Can I create a rule with this, or does it need more complex analysis?
“não o são” -> “o não são”

Thanks!

You can make something like:

<pattern>
  <token>não</token>
  <token>o</token>
  <token postag_regexp='yes' postag='V.*'/>
</pattern>
<message><suggestion>\2 \1 \3</suggestion></message>

And then fix the false positives, but I am not sure this is correct, to start with.
I would have written it as you did, and the various ‘próclise’ rules I have seen always place the pronoun between the negation particle and the verb.
It may be better to look for a public reference before committing the rule, or to add it as a style rule marked default="off".
If my understanding of these rules is wrong, I should also review the rules I created, although it may take a while.

PS - Unless you mean to write ‘são’ as a noun. If ‘não são’ means “not sane”, you would have to write ‘o não são’. But I am just saying that off the top of my head; I believe that is not the intended meaning in this situation.

@tiagosantos

I decided not to implement this rule because “são” can be a noun and a verb.

However, I added a rule from the “prontuário” the other night.

I would also like to thank you for having improved some of my rules.

@tiagosantos

Hello Tiago,

I was thinking about a nice rule to omit “eu”, for example:
“o que EU quero”

  1. o
  2. que
  3. eu
  4. VERB in that form

Then it would suggest 1), 2) and 4), removing 3).

Is it a good approach?

If yes, what should I suggest in the rule (grammar text)?

Thank you!

Kind regards,

Edit: Maybe also if 3) is “nós”?

I think the following approach should do it and it would eliminate possible confusions, such as:
1: “o que eu queria” -> “o que você queria” (in this case the suggestion wouldn’t be 100% accurate)

Maybe this way?

  <rulegroup id='EU_NÓS_REMOVAL' name="Remover pronome pessoal eu/nós">
    <!--      Created by Marco A.G.Pinto, Portuguese rule      -->
    <rule>
      <pattern>
        <token>o</token>
        <token>que</token>
        <token>eu</token>
        <token postag_regexp="yes" postag="VMIP1S0"></token>
      </pattern>
      <message>Pode remover o pronome pessoal: <suggestion>\1 \2 \4</suggestion>.</message>
      <example correction="O que quero"><marker>O que eu quero</marker> é casar.</example>
    </rule>
    <rule>
      <pattern>
        <token>o</token>
        <token>que</token>
        <token>nós</token>
        <token postag_regexp="yes" postag="VMIP1P0"></token>
      </pattern>
      <message>Pode remover o pronome pessoal: <suggestion>\1 \2 \4</suggestion>.</message>
      <example correction="O que queremos"><marker>O que nós queremos</marker> é casar.</example>
    </rule>
  </rulegroup>