Simple Natural Language Processing Tool (SiNLP)

SiNLP was developed to provide a powerful, extensible, yet simple natural language processing (NLP) tool that can be used by researchers inexperienced in NLP. It was inspired by Coh-Metrix, an advanced NLP tool that uses a comprehensive list of indices to analyze text, and was intended to be more accessible, faster, and easier to use. As an NLP tool, SiNLP offers accurate and valid analyses on various dimensions of text. However, because of its simple design, it uses fewer and simpler indices (e.g., surface features) that act as proxies for much more complex features. SiNLP also allows users to create their own word lists for automatically assessing texts.

SiNLP is still a powerful NLP tool and extremely convenient: it is a free program that can be downloaded to the user's hard drive (rather than accessed over the internet as Coh-Metrix is, which requires the user to wait in a queue of other users), and it allows for batch processing. Instructions on how to use SiNLP are available below, along with a download link.

Preset SiNLP Indices: SiNLP comes loaded with a number of preset indices. These are briefly described below.

  1. Text Structure indices offer a description of the surface characteristics of text such as number of words, sentences, and paragraphs.
  2. Vocabulary is measured by two indices: the number of unique words in a text, which assesses the quality of the vocabulary used, and the number of letters per word, which serves as a proxy for word frequency.
  3. Givenness is measured by the number of determiners and demonstratives in a text. Greater use of given information makes a text easier to comprehend.
  4. Anaphor Use is measured by the number of all pronouns and through separate indices for 1st-, 2nd-, and 3rd-person pronouns. Anaphor use is indicative of coherence.
  5. Lexical Diversity is measured by dividing the number of word types (unique words in a text) by the number of word tokens (all words in the text). Low lexical diversity indicates high cohesion.
  6. Connectives and Conjuncts are measured using a number of lists to count the number of connectives and conjuncts present in the text, both of which are strong indicators of text coherence.
  7. Future is measured using lists that contain future words and wildcard expressions (e.g., words that end in 'll) to determine the degree of future temporality present in the text. Temporality can be an indicator of situational cohesion.
  8. Syntactic Complexity is measured through the number of words per sentence, and the number of negations. Greater syntactic complexity generally indicates greater writing quality.
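
As a rough illustration of how surface indices like those above can be computed, here is a minimal Python sketch. It is not SiNLP's implementation: the function name `surface_indices`, the tokenization rule, and the naive sentence splitter are all assumptions made for this example, and SiNLP's own counts may differ.

```python
import re

def surface_indices(text):
    """Compute a few surface-level indices for one text (illustrative only)."""
    # Paragraphs: blocks of text separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Sentences: naive split on whitespace that follows '.', '!' or '?'.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Word tokens: lowercase runs of letters, allowing internal apostrophes.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    types = set(tokens)  # unique words
    n_tokens = len(tokens)
    n_letters = sum(len(t.replace("'", "")) for t in tokens)
    return {
        "n_words": n_tokens,
        "n_sentences": len(sentences),
        "n_paragraphs": len(paragraphs),
        "n_unique_words": len(types),
        "mean_letters_per_word": n_letters / n_tokens if n_tokens else 0.0,
        # Lexical diversity: word types divided by word tokens.
        "type_token_ratio": len(types) / n_tokens if n_tokens else 0.0,
        "mean_words_per_sentence": n_tokens / len(sentences) if sentences else 0.0,
    }

if __name__ == "__main__":
    sample = "The cat sat. The cat ran!\n\nIt won't stop."
    for name, value in surface_indices(sample).items():
        print(f"{name}: {value}")
```

The sentence splitter will miscount around abbreviations such as "e.g.", so this kind of sketch is only suitable for getting a feel for the indices, not for replicating published results.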

In order to use SiNLP, users first select from a list dictionary to determine what measures SiNLP will calculate. SiNLP will calculate the above listed indices automatically. Users can also create and use their own custom list dictionary (see the instructions for help).
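
The idea of a list dictionary can be sketched in a few lines of Python. The structure below is hypothetical and does not reflect SiNLP's actual file format (see the instructions for that): the list names, the entries, the `*` wildcard syntax, and the function `count_list_matches` are assumptions chosen only to show how word lists with wildcard expressions might be matched against a text.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical list dictionary: list name -> entries.
# An entry may contain '*' as a wildcard (e.g. "*'ll" matches "she'll").
LIST_DICTIONARY = {
    "future": ["will", "shall", "gonna", "*'ll"],
    "negations": ["not", "no", "never", "*n't"],
}

def count_list_matches(text, list_dictionary):
    """Count, for each list, how many word tokens match any of its entries."""
    # Same simple tokenization assumption: letters with internal apostrophes.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = {}
    for name, entries in list_dictionary.items():
        counts[name] = sum(
            1 for tok in tokens if any(fnmatchcase(tok, entry) for entry in entries)
        )
    return counts

if __name__ == "__main__":
    text = "She'll go tomorrow, but I will not. They won't ever say no."
    print(count_list_matches(text, LIST_DICTIONARY))
```

The sketch returns raw counts; whether counts should be normalized by text length is a separate analysis decision that this example does not address.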

Validation of SiNLP:

To demonstrate SiNLP's validity, Crossley and colleagues (2014) used linguistic indices reported by Coh-Metrix and SiNLP to predict the scores that human experts (teachers with at least four years of experience teaching college freshman courses) assigned to student essays written for SAT writing prompts. The study demonstrated that both systems could accurately predict the students' scores (Crossley, Allen, Kyle, & McNamara, 2014).

References/Further Reading:

Crossley, S. A., Allen, L. K., Kyle, K., & McNamara, D. S. (2014). Analyzing discourse processing using a simple natural language processing tool (SiNLP). Discourse Processes, 51, 511-534.



Click here for instructions on how to use SiNLP: [PDF]

To access SiNLP please click on the following link: [LINK]

© 2012 SoLET Lab