For Reading Purposes.
EXPLORING THE POTENTIAL OF NATURAL LANGUAGE PROCESSING AND MACHINE LEARNING IN CHILD LANGUAGE DISORDERS DIAGNOSIS
I. Background
Language development is a critical aspect of neurodevelopment in early childhood, particularly for cognitive and social development. Language disorder is one of the major developmental conditions among young children. Early detection of language disorders in children is of paramount importance for providing effective treatment and preventing further deterioration. The goal of this project is to explore the potential of natural language processing and machine learning techniques for developing a computational model that could be used for the early diagnosis of language disorders in children.
In assessing language development or diagnosing language disorders in young children, the analysis of language samples in the form of transcribed narratives is essential for uncovering the linguistic phenomena they exhibit, ranging from morphology to syntax to semantics. These phenomena could serve as indicative markers of some form of language disorder. For instance, measures of productivity and vocabulary richness (such as the mean length of utterance and the total number of different words used) and grammatical errors (such as omission of tense marking and lack of subject-verb or determiner-noun agreement) were found to be useful measures (Solorio, 2013).
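As a rough illustration of the two measures named above, here is a minimal Python sketch. It assumes each utterance is already a list of word tokens, and it computes mean length of utterance in words (clinical practice often counts morphemes instead). The function names and the sample utterances are made up for illustration and are not taken from the conti-4 data.

```python
def mean_length_of_utterance(utterances):
    """Average number of word tokens per non-empty utterance (MLU in words)."""
    lengths = [len(u) for u in utterances if u]
    return sum(lengths) / len(lengths) if lengths else 0.0


def number_of_different_words(utterances):
    """Count distinct word types, ignoring case."""
    return len({w.lower() for u in utterances for w in u})


# Invented sample narrative: three tokenised child utterances.
sample = [
    ["the", "boy", "see", "a", "frog"],
    ["frog", "jump", "away"],
    ["he", "look", "for", "the", "frog"],
]

mlu = mean_length_of_utterance(sample)
ndw = number_of_different_words(sample)
print(f"MLU (words): {mlu:.2f}, different words: {ndw}")
```

Both values could then be used as numeric features for each child alongside the grammatical-error counts.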
II. Problem Statement
It is hypothesised that various natural language processing (NLP) techniques could be applied to this kind of language sample analysis. By training a machine learning algorithm on the indicative linguistic patterns mentioned earlier, a computational model could be deployed as a screening measure in the diagnostic assessment of child language disorders. For the purpose of this project, the specific form of language disorder explored is Specific Language Impairment (SLI), a widely studied neurodevelopmental condition among young children (Leonard, 1991).
III. Main Objectives
The project aims to explore both NLP and machine learning techniques to discover indicative linguistic features that could be useful in identifying children with language disorders. The key questions to be addressed are:
Can children with language disorders be distinguished from children with normal language development through text-based classification?
Which linguistic patterns, ranging from morphology to syntax to semantics, are indicators of some form of language disorder?
Which machine learning model is best at distinguishing children with language disorders from their counterparts?
Task Requirements:
1. Preprocess the data from conti-4: clean the raw dataset into a structured form for subsequent processing and analysis.
To access the conti-4 data online, go to https://childes.psy.cmu.edu/ and, under Database, click
Transcripts and media - XML, then click Browse database. On the left side select CLINICAL-MOR, then select conti4 → SLI-narrative.
2. Perform NLP tasks (tagging and parsing).
3. Multiple feature extraction: for example n-grams (word-based or POS-based), grammar rules, and other features.
4. Basic Classification and Evaluation
5. Feature Selection and more classification
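For step 1, a minimal preprocessing sketch might look like the following. It assumes the transcripts are TalkBank-style XML in which each utterance is a `u` element with a `who` attribute and each word is a `w` element, all under the namespace `http://www.talkbank.org/ns/talkbank`. The inline sample document and the function name `child_utterances` are invented for illustration; the real conti-4 files should be inspected before relying on this layout.

```python
import xml.etree.ElementTree as ET

# Assumed TalkBank namespace; verify against the downloaded files.
NS = {"tb": "http://www.talkbank.org/ns/talkbank"}

# Invented stand-in for one transcript file.
SAMPLE_XML = """<CHAT xmlns="http://www.talkbank.org/ns/talkbank">
  <u who="INV" uID="u0"><w>what</w><w>happened</w></u>
  <u who="CHI" uID="u1"><w>The</w><w>boy</w><w>falled</w></u>
  <u who="CHI" uID="u2"><w>he</w><w>cry</w></u>
</CHAT>"""


def child_utterances(xml_text, speaker="CHI"):
    """Return the given speaker's utterances as lists of lowercased words."""
    root = ET.fromstring(xml_text)
    result = []
    for u in root.iterfind("tb:u", NS):
        if u.get("who") != speaker:
            continue  # skip investigator and other speakers
        words = [
            w.text.strip().lower()
            for w in u.iterfind("tb:w", NS)
            if w.text and w.text.strip()
        ]
        if words:
            result.append(words)
    return result


utterances = child_utterances(SAMPLE_XML)
print(utterances)
```

NLTK also provides a `CHILDESCorpusReader` class for CHILDES XML, which may be a more convenient route once the corpus is unpacked locally.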
I will be carrying out this assignment in Python 3.0, using the NLTK toolkit for language processing and a machine learning toolkit. A learning algorithm has to be implemented to train the model.
This assignment requires the expert to produce a program based on the task requirements described in the PDF file "Description Task". I would like to know the expert's approach to this assignment.
The task can be separated into two sections. The first section covers task requirements 1, 2 and 3: preprocessing, tagging, parsing, and feature extraction.
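To give an idea of the feature extraction part, here is a small sketch that counts word-based or POS-based n-grams. It assumes tagging has already produced each utterance as a list of (word, tag) pairs; the tag labels, the sample data, and the function names are hypothetical and only stand in for whatever the tagger actually outputs.

```python
from collections import Counter


def ngrams(items, n):
    """Return all contiguous n-grams of a sequence as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def ngram_features(tagged_utterances, n=2, use_pos=True):
    """Count n-grams over POS tags (or words) across all utterances."""
    counts = Counter()
    for utt in tagged_utterances:
        seq = [tag if use_pos else word for word, tag in utt]
        # Pad with boundary markers so utterance-initial and
        # utterance-final patterns are captured for n > 1.
        padded = ["<s>"] * (n - 1) + seq + ["</s>"] * (n - 1)
        counts.update(ngrams(padded, n))
    return counts


# Invented tagged sample; the tags are illustrative labels only.
tagged = [
    [("the", "DET"), ("boy", "N"), ("fall", "V")],
    [("he", "PRO"), ("cry", "V")],
]

pos_bigrams = ngram_features(tagged, n=2, use_pos=True)
word_unigrams = ngram_features(tagged, n=1, use_pos=False)
print(pos_bigrams.most_common(3))
```

The resulting counters can be used directly as sparse feature dictionaries, one per child, for the classification stage.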
I would then like the same expert to continue with the remaining tasks: basic classification and evaluation, and feature selection.
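For the classification and evaluation part, one possible starting point is sketched below: a small multinomial Naive Bayes classifier with add-one smoothing, plus precision, recall and F1 for the SLI class. This is only one candidate algorithm, not necessarily the one the task description requires. The class and function names, the labels "SLI" and "TD" (typically developing), and all feature counts are invented for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over feature-count dictionaries."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(docs, labels):
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            score = math.log(self.class_counts[label] / total_docs)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for f, count in feats.items():
                if f in self.vocab:  # ignore features unseen in training
                    smoothed = self.feature_counts[label][f] + 1
                    score += count * math.log(smoothed / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision_recall_f1(gold, predicted, positive="SLI"):
    """Precision, recall and F1 for the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented toy data: one feature-count dictionary per child.
train_docs = [
    {"omitted_tense": 3, "PRO V": 2},
    {"omitted_tense": 2, "DET N": 1},
    {"DET N": 3, "PRO V": 2},
    {"DET N": 2, "N V": 2},
]
train_labels = ["SLI", "SLI", "TD", "TD"]
test_docs = [{"omitted_tense": 2, "DET N": 1}, {"DET N": 2, "N V": 1}]
test_labels = ["SLI", "TD"]

model = NaiveBayes().fit(train_docs, train_labels)
predictions = [model.predict(d) for d in test_docs]
print(predictions, precision_recall_f1(test_labels, predictions))
```

With a real dataset, the single train/test split here would normally be replaced by cross-validation, and feature selection would be applied to the training folds only.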
Attachment:- Conti4.zip