Help needed

Is anyone willing to help me with the programming part of introducing the new AI-like confusion detection? It has a lot of potential for catching common mistakes such as ‘me/mijn’, ‘u/uw’, ‘mistte/miste’, etc.

I do have a large corpus that can be split into ‘high quality’ and ‘low quality’ text.
Tokenizing works differently for Dutch than for most other languages; for example, oma’s is treated as a single token, so that could be a challenge.
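
To make the idea a bit more concrete, here is a rough Python sketch of what I have in mind. It is not the existing implementation of anything; the names (`tokenize`, `count_bigrams`, `suggest`) and the three example sentences are made up for illustration. It assumes a tokenizer that keeps an internal apostrophe inside the word (so oma’s stays one token), counts word pairs in the ‘high quality’ text, and then picks the member of a confusion pair that fits its neighbours best. A real version would need far more data and probably longer contexts.

```python
import re
from collections import Counter

# Assumption: a token is a run of letters, optionally joined by an internal
# apostrophe (straight or typographic), so "oma's" stays one token.
# Digits and single punctuation marks become separate tokens.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*|\d+|[^\w\s]")


def tokenize(text):
    """Split text into lowercase tokens, keeping words like oma's whole."""
    return TOKEN_RE.findall(text.lower())


def count_bigrams(sentences):
    """Count adjacent token pairs in the (hypothetical) high-quality corpus."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + tokenize(sentence) + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


def suggest(tokens, i, confusion_set, counts):
    """Return the word from confusion_set that fits best at position i.

    The score is the bigram count with the left neighbour plus the bigram
    count with the right neighbour. The original word is kept unless an
    alternative scores strictly higher.
    """
    padded = ["<s>"] + tokens + ["</s>"]
    j = i + 1  # position of tokens[i] inside the padded list

    def score(word):
        return counts[(padded[j - 1], word)] + counts[(word, padded[j + 1])]

    best = max(confusion_set, key=score)
    return best if score(best) > score(padded[j]) else padded[j]


# Tiny stand-in for the high-quality part of the corpus (illustration only).
high_quality = [
    "Dat is mijn fiets.",
    "Hij gaf me een boek.",
    "Mijn oma's fiets is rood.",
]
counts = count_bigrams(high_quality)

tokens = tokenize("Dat is me fiets")
print(suggest(tokens, 2, ("me", "mijn"), counts))
```

With these three sentences the sketch prefers ‘mijn’ over ‘me’ in "Dat is me fiets" and leaves ‘me’ alone in "Hij gaf me een boek". One known gap: a leading apostrophe, as in ’s ochtends, is split off as a separate token by this pattern.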