Contributing to LanguageTool

Hello everyone,

My name is Gaurav Sahu, an undergraduate student from IIT Kharagpur, India. I would like to contribute to LanguageTool.

I have a keen interest in NLP and have been working on NLP projects for some time. The main programming languages I use are Python and Ruby, so I am comfortable with libraries like TensorFlow or PyTorch, but I have no experience coding in Java. However, I am willing to learn it if required and would be grateful if anyone could guide me on how to start contributing.

I have forked and set up LanguageTool on my local machine following the instructions in development-overview. I also went through make-languagetool-better and the issues page, and #872 felt like something I could work on initially. That’s just my hunch, so kindly suggest any issue that you feel I should be working on.

PS: I was just curious, is there a way I can contribute while coding mainly in another language? (like what @gulp21 did with the neural-network rules)

Thank you. Looking forward to your response!

Hi, thanks for your interest in LT. We’re very much interested in machine learning contributions. In the end, it somehow needs to be integrated into Java, but training can happen using any language/framework. I think one of the major missing features of @gulp21’s work is that it cannot tell whether “your” or “you're” should be used (same for similar cases), because the number of tokens is different (correct me if I’m wrong).

#872 looks like a good issue to work on, if you want to focus on Java. If you want to focus on neural networks, it might still be a good idea - this way you know that the Java part of everything is set up properly, e.g. that you can run the tests etc.

@dnaber - Thank you very much for your reply.

because the number of tokens is different

Yes, you are correct. The number of tokens is different for “your” and “you're”. Just to confirm, is this what you mean: in “You’re fellow American is good at Basketball”, LT should detect that “Your” should be used instead of “You’re”, right?
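
To make the token-count point concrete, here is a minimal sketch. It assumes a simple regex tokenizer that splits contractions at the apostrophe; this is an illustration only, not LanguageTool's actual tokenizer, and the helper names `tokenize` and `swap_single_token` are hypothetical. It shows that a rule which replaces exactly one token with one other token cannot turn the wrong sentence into the right one.

```python
import re

def tokenize(text):
    """Split into word tokens and apostrophe tokens (illustrative assumption,
    not LanguageTool's real tokenizer)."""
    return re.findall(r"[A-Za-z]+|['\u2019]", text)

def swap_single_token(tokens, index, replacement):
    """Replace exactly one token, as a one-token-for-one-token rule would."""
    return tokens[:index] + [replacement] + tokens[index + 1:]

wrong = tokenize("You\u2019re fellow American is good at basketball")
right = tokenize("Your fellow American is good at basketball")

# The two sentences differ in length once tokenized.
print(len(wrong), len(right))

# A one-for-one swap leaves the apostrophe and "re" behind.
print(swap_single_token(wrong, 0, "Your"))
```

Under this assumed tokenization, the wrong sentence has two more tokens than the right one, so a fix would need to replace a span of tokens rather than a single position.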

Pardon me, I can’t think of a solution right away, but I feel there needs to be a better method of capturing context (a seq2seq model with or without an attention mechanism might help here). To give a better answer, I think I will need a better understanding of how the neural network @gulp21 implemented works.

Yes, that’s what I mean.