Feature(s) request

My desired feature list:

  • sentence-split and tokenized (language-specific) text output from the command-line LT, both for testing sentence splitting and tokenization properly on large data sets and as preprocessing for things like the AI tools
  • record-like checking results from the command-line LT (JSON?); this would make programmatic processing of the results easier.

The command-line tool already supports the --json option. Is that what you need?

Yes, but it does not combine with line-by-line checking (which itself does not seem to work on a file that has one sentence per line).

The only way I have found so far to check things line by line (i.e. sentence by sentence) is to read one line at a time in bash and start LT for every line, or to send each line to the local server from a program.
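For the local-server route, something like the following might do as a starting point. It is only a sketch: it assumes an LT HTTP server is already running on localhost port 8081 and answers POST requests on /v2/check with JSON containing a "matches" list. The function names, the default language code and the shape of the yielded records are my own choices, not anything LT prescribes.

```python
import json
import urllib.parse
import urllib.request


def build_check_request(line, language="en-US", base_url="http://localhost:8081"):
    """Build a POST request for one line of text.

    Assumption: the server exposes /v2/check and accepts the
    form-encoded parameters "language" and "text".
    """
    data = urllib.parse.urlencode({"language": language, "text": line}).encode("utf-8")
    return urllib.request.Request(base_url + "/v2/check", data=data, method="POST")


def check_lines(lines, language="en-US", base_url="http://localhost:8081"):
    """Yield one record per non-empty input line, with that line's matches."""
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # nothing to check on blank lines
        request = build_check_request(line, language, base_url)
        with urllib.request.urlopen(request) as response:
            result = json.load(response)
        # Hypothetical record layout: line number, text, and the raw matches.
        yield {"line": number, "text": line, "matches": result.get("matches", [])}
```

Used as `for record in check_lines(open("input.txt", encoding="utf-8")): print(json.dumps(record))`, this would give one JSON record per line without restarting LT each time.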

The idea was to count the number of examples per rule in grammar.xml. When a rule has fewer than 10, run LT with only that rule on an input file, make prototype examples from the LT output, and put these (commented out) back into grammar.xml for manual editing.
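The counting step could look roughly like this. It is a sketch under a few assumptions: rules are `<rule>` elements with an `id` attribute, rules inside a `<rulegroup>` may lack their own id (in which case the group's id is used and the counts are summed per group), and examples are direct `<example>` children of a rule. The function names and the "(no id)" placeholder are made up for illustration.

```python
import xml.etree.ElementTree as ET


def count_examples(root):
    """Return a dict mapping rule (or rulegroup) id to its number of examples."""
    counts = {}

    def visit(element, group_id=None):
        for child in element:
            if child.tag == "rulegroup":
                # Rules inside a group inherit the group's id if they have none.
                visit(child, child.get("id"))
            elif child.tag == "rule":
                rule_id = child.get("id") or group_id or "(no id)"
                counts[rule_id] = counts.get(rule_id, 0) + len(child.findall("example"))
            else:
                # Descend through wrappers such as categories.
                visit(child, group_id)

    visit(root)
    return counts


def rules_needing_examples(counts, minimum=10):
    """Return the sorted ids that have fewer than `minimum` examples."""
    return sorted(rule_id for rule_id, n in counts.items() if n < minimum)
```

With a real file this would be called as `rules_needing_examples(count_examples(ET.parse("grammar.xml").getroot()))`. If the file relies on entities declared in an external DTD, the plain ElementTree parser may reject it, so some preprocessing might be needed first.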