I was looking at my “Prontuário” and it has the following rule:
“São oportunos de citar”
It says it is the same for:
oportuno.
importante.
indecente.
interessante.
perigoso.
So I ran them through LT’s text analysis, and they match the following postags:
AQ0MS0
AQ0CS0
I need your opinion on whether I can create a rule with those postags (for all possible words) plus the plural forms, and whether it should only be used with “citar” or whether I can look for any verb in place of “citar”.
There is a pattern. If there is a pattern, there is a rule.
That is the motto I follow.
It is hard to say whether it is a good rule before trying it. Try it. It seems to be a good redundancy rule.
If it has too many false positives, you can refine it or set it to default off.
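As a starting point, a minimal sketch of such a rule might look like the one below. The rule id, name and message are placeholders I made up, the plural tags AQ0MP0 and AQ0CP0 are assumed to follow the same scheme as the singular ones quoted above, and VMN0000 is assumed to be the infinitive tag. It matches any adjective with those tags, followed by “de” and an infinitive, and it is switched off by default until the false positives are known.

```xml
<!-- Sketch only: id, name and message are hypothetical placeholders -->
<rule id="ADJ_DE_INFINITIVO" name="Adjetivo + de + infinitivo" default="off">
  <pattern>
    <!-- AQ0MS0, AQ0CS0 and (assumed) their plurals AQ0MP0, AQ0CP0 -->
    <token postag="AQ0[MC][SP]0" postag_regexp="yes"/>
    <token>de</token>
    <!-- any infinitive, not only "citar" (assumed tag) -->
    <token postag="VMN0000"/>
  </pattern>
  <message>Verifique a construção «adjetivo + de + infinitivo».</message>
  <example correction="">São <marker>oportunos de citar</marker>.</example>
</rule>
```

Once the behaviour looks right, the message can be replaced with the wording from the “Prontuário” and a suggestion can be added.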
You are doing it right, but in this case the word acts as a noun/activity (usual with infinitives). Disambiguation processes its double meaning and removes the verb form in “falar”.
The rule is solid, but disambiguation will act on every case where you have a preposition before the verb. Disambiguation can be improved to cover this case, but this is still ahead of the level of precision I am focusing on for the moment.
I believe you can make it solid using a workaround like this:
This will match all infinitives without caring about the disambiguation. It will fail with the infinitives that require disambiguation, but you can add exceptions for the problematic ones.
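For illustration, one way to match infinitives by their surface form instead of by tag could be the token below. This is my own sketch and not necessarily the snippet meant above: the regular expression (words ending in -ar, -er, -ir, or -por/-pôr) and the excepted words are assumptions, and the list of exceptions would have to grow as false positives show up.

```xml
<!-- Sketch only: regexp and exception list are assumptions -->
<token regexp="yes">.+[aei]r|.*p[oô]r
  <!-- words with an infinitive-like ending that are not verbs -->
  <exception regexp="yes">mar|lugar|mulher</exception>
</token>
```

Because this token never looks at the postag, it no longer matters whether disambiguation removed the verb reading after the preposition.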
For any expression that omits a word, there will be false positives. This one comes from: coelho à [moda do] caçador.
If you find a list of all dishes that have à followed by a masculine noun, including proper names, it can be solved.
It will only be useful if only the most common ones are added, like coelho à caçador; otherwise it would render the detection useless.
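A common dish like this could be excluded with an antipattern placed inside the rule, before its pattern. The fragment below is only a sketch covering the one example mentioned here; other frequent dishes would each need a similar entry.

```xml
<!-- Sketch only: goes inside the existing <rule>, before <pattern> -->
<antipattern>
  <token>coelho</token>
  <token>à</token>
  <token>caçador</token>
</antipattern>
```

Keeping the list short and explicit, one antipattern per dish, means the rule still fires on every other “à” followed by a masculine noun.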
I have seen the rule and it seems well-designed. I will have to look into it, but it is better to file this as a bug report. Tagging and synthesizer behaviour should be consistent on both platforms.
Please detail the steps to reproduce on GitHub.
In my Master’s dissertation many years ago I wrote:
“Os métodos ou as técnicas de contagem permitem obter resultados fáceis, mesmo nos casos em que manualmente sejam muito morosos e de difícil contabilização, ou ainda, aqueles em que é possível obter o que se denomina de “falsos positivos”, ou seja, resultados que parecem ser válidos, mas que não o são.”
The co-supervisor replaced the part that was in bold (“não o são”) with “o não são”.
Can I create a rule with this, or does it need more complex analysis?
“não o são” -> “o não são”
And then fix the false positives, but I am not sure this is correct, to start with.
I would have written it like you did, and the various ‘próclise’ rules I have seen always refer to writing the pronoun between the negation particle and the verb.
It may be better to look for a public reference before committing the rule, or to add it as a style rule marked as default="off".
If my understanding of these rules is wrong, I should also review the rules I created, although it may take a while.
PS - Unless you mean to write ‘são’ as an adjective. If ‘não são’ means “not sane”, you would have to write ‘o não são’. But I am just saying that off the top of my head. I believe that is not the intended meaning in this situation.
I think the following approach should do it, and it would eliminate possible confusions such as:
1: “o que eu queria” -> “o que você queria” (in this case the suggestion wouldn’t be 100% accurate)
Maybe this way?
<rulegroup id='EU_NÓS_REMOVAL' name="Remover pronome pessoal eu/nós">
  <!-- Created by Marco A.G.Pinto, Portuguese rule -->
  <!-- "o que eu" + present indicative, first person singular -->
  <rule>
    <pattern>
      <token>o</token>
      <token>que</token>
      <token>eu</token>
      <token postag_regexp="yes" postag="VMIP1S0"></token>
    </pattern>
    <message>Pode remover o pronome pessoal: <suggestion>\1 \2 \4</suggestion>.</message>
    <example correction="O que quero"><marker>O que eu quero</marker> é casar.</example>
  </rule>
  <!-- "o que nós" + present indicative, first person plural -->
  <rule>
    <pattern>
      <token>o</token>
      <token>que</token>
      <token>nós</token>
      <token postag_regexp="yes" postag="VMIP1P0"></token>
    </pattern>
    <message>Pode remover o pronome pessoal: <suggestion>\1 \2 \4</suggestion>.</message>
    <example correction="O que queremos"><marker>O que nós queremos</marker> é casar.</example>
  </rule>
</rulegroup>