Perl scripts for NooJ-oriented linguists

While working on the DM, I wrote a few Perl scripts to deal with the dictionary. I think some of them might be useful to the NooJ community. Their functionality is briefly described below.

I would be glad to share these programs, but rather than making them directly available for download, I ask interested people to contact me. The reason is that these small programs were designed for my specific needs and tested on only one dictionary; I do not consider them good enough for a more formal release.

nooj-flx2lexc

Given a NooJ inflected dictionary, this program produces a copy in the lexc format of the Xerox Finite State Tool (XFST) (cf. http://www.stanford.edu/~laurik/fsm...), e.g. for the DM: dm-lexc_sample.txt. You can then compile the dictionary with XFST and use all the functionality of that platform.

nooj-flx2property_values

Given a NooJ inflected dictionary, this program produces, for each category, the list of property values actually used in the dictionary, e.g. for the DM (version 1.0.1): dm-property_values.txt. This is useful for spotting typing errors in property values and for building a properties.def file.
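To make the idea concrete, here is a minimal sketch in Python; it is not the Perl script itself. It assumes that each inflected entry has the shape form,lemma,CATEGORY+property+property=value, that comment lines start with "#", and that no field contains an escaped comma. The function name property_values and the sample entries are hypothetical.

```python
from collections import defaultdict


def property_values(lines):
    """Map each category to the sorted list of property values seen with it.

    Assumed entry shape (not taken from the original script):
        form,lemma,CATEGORY+prop+prop=value
    """
    seen = defaultdict(set)
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        # Split into form, lemma and the lexical information field.
        parts = line.split(",", 2)
        if len(parts) != 3:
            continue  # not an inflected entry under our assumption
        category, *props = parts[2].split("+")
        seen[category].update(p for p in props if p)
    return {cat: sorted(values) for cat, values in sorted(seen.items())}


if __name__ == "__main__":
    # Hypothetical sample entries; "S" in the last one is a deliberate typo.
    sample = [
        "tables,table,N+FLX=TABLE+f+p",
        "mange,manger,V+PR+3+s",
        "chat,chat,N+m+S",
    ]
    for category, values in property_values(sample).items():
        print(category, " ".join(values), sep="\t")
```

In such a listing, a stray value such as "S" next to "s" stands out immediately, which is how this kind of output helps to catch typing errors.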

paradigm-analysis

Given a non-inflected dictionary, this program produces a tab-separated text table with four columns:

  • paradigm name,
  • categories the paradigm applies to,
  • number of times the paradigm is used,
  • list of baseforms the paradigm applies to.

The list of baseforms may be limited to cases where it does not exceed a given size. The text file may then be opened with MS Excel or OpenOffice Calc and used as a standard spreadsheet. For example, with the DM, I obtain this Excel file (with max. 20 baseforms): dm-modeles.xls.
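The table described above could be built roughly as follows. This is a sketch in Python, not the Perl script itself. It assumes that each non-inflected entry has the shape baseform,CATEGORY+FLX=PARADIGM+other+properties, and it reads the size limit as "leave the fourth column empty when a paradigm has more baseforms than the limit". The names paradigm_table and max_baseforms are hypothetical.

```python
from collections import defaultdict


def paradigm_table(lines, max_baseforms=20):
    """Return tab-separated rows: paradigm, categories, count, baseforms.

    Assumed entry shape (not taken from the original script):
        baseform,CATEGORY+FLX=PARADIGM+other+properties
    """
    categories = defaultdict(set)   # paradigm name -> categories using it
    baseforms = defaultdict(list)   # paradigm name -> baseforms using it
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        baseform, sep, info = line.partition(",")
        if not sep:
            continue  # no comma: not a dictionary entry under our assumption
        category, *props = info.split("+")
        for prop in props:
            if prop.startswith("FLX="):
                name = prop[len("FLX="):]
                categories[name].add(category)
                baseforms[name].append(baseform)
    rows = []
    for name in sorted(baseforms):
        forms = baseforms[name]
        # List the baseforms only when there are not too many of them.
        listed = ", ".join(forms) if len(forms) <= max_baseforms else ""
        rows.append("\t".join(
            [name, " ".join(sorted(categories[name])), str(len(forms)), listed]
        ))
    return rows


if __name__ == "__main__":
    # Hypothetical sample entries.
    sample = [
        "table,N+FLX=TABLE+f",
        "chaise,N+FLX=TABLE+f",
        "aimer,V+FLX=AIMER",
    ]
    print("\n".join(paradigm_table(sample)))
```

Writing the rows to a text file gives something a spreadsheet program can open directly, with one paradigm per row.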

Downloads