The English - Romance NP database at the UofI Computational Semantics Lab (given in class) consists of a set of English NPs translated into five Romance languages: Spanish (Es), French (Fr), Italian (It), Portuguese (Port), and Romanian (Ro). Each NP was extracted in context from a text collection that includes works by J. Steinbeck and H.G. Wells. Here are some examples of entries in this database:
LPE-1*story#1/1 of pearl#1/2*historia de la perla*storia della perla* histoire de la perle* história da pérola*istoria perlei*TOPIC
LPE-9*wash#1/1 of light#4/2*baño de luz* scia di luce* rai de lumière* banho de luz *baie de lumina*P-W
LPE-10*turning#1/1 of twigs#1/2*búsqueda entre ramitas* rivoltare di ramoscelli* retourner des brindilles*giro dos galhos*intoarcerea ramurilor*THEME
LPE-10*bits#2/2 of wood#1/1*pedacitos de madera* pezzi di legno* morceaux de bois* pedaços de madeira*bucati de lemn*P-W
Notice that the NP instances in each of the six languages considered in the database are of either the type "N N" or the type "N P N".
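The fields are separated by stars, *. As the examples above show, they are, in order: an entry identifier, the English NP, its Spanish, Italian, French, Portuguese, and Romanian translations, and the semantic relation.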
The English NP field has the following format:
noun1#sense/i [preposition] noun2#sense/j
where i and j are each either 1 or 2, indicating the position of the semantic argument in the NP instance (note that the preposition may be missing). For example, in the NP instance chair#1/2 arm#1/1, which encodes a part-whole relation (P-W), the noun chair denotes the whole (and thus is labeled 2), and the noun arm denotes the part (and is labeled 1).
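To make the field format concrete, here is a minimal sketch that splits one English NP field into its parts. The function name describe_np and the use of "-" to mark a missing preposition are assumptions made for illustration; they are not part of the database or the assignment.

```shell
#!/bin/sh
# Hypothetical helper: print "noun1 sense1 pos1 prep noun2 sense2 pos2"
# for one English NP field such as 'bits#2/2 of wood#1/1'.
describe_np() {
  printf '%s\n' "$1" | awk '{
    n = split($0, tok, " ")            # 2 tokens: "N N"; 3 tokens: "N P N"
    prep = (n == 3) ? tok[2] : "-"     # "-" stands in for a missing preposition
    split(tok[1], a, "[#/]")           # a[1]=noun, a[2]=sense, a[3]=argument position
    split(tok[n], b, "[#/]")           # same breakdown for the last noun
    print a[1], a[2], a[3], prep, b[1], b[2], b[3]
  }'
}

describe_np 'chair#1/2 arm#1/1'        # prints: chair 1 2 - arm 1 1
describe_np 'bits#2/2 of wood#1/1'     # prints: bits 2 2 of wood 1 1
```

The sketch assumes that nouns and prepositions are single whitespace-free tokens, which holds for the examples shown above but may not hold for every entry in the file.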
Problem:
Write a shell script containing a combination of UNIX and AWK commands of your choice that will give the answer to the following questions. The shell script should be executed only once and output (to standard output, unless mentioned otherwise):
the number of English NP instances in the file;
the number of unique English NPs (consider for this the entire field $2);
the list of unique semantic relations considered in this database (store it in "sr.uniq.txt");
the frequency of each semantic relation in the database, in the format "Semantic_relation frequency" (store this in "sr-freq.txt");
the three most frequent semantic relations in the database;
the number of English NPs of the type "N P N";
the list of English prepositions (P) in the "N P N" types (store it in "preps.en");
the list of unique English prepositions found in the NP instances per semantic relation. Thus, each list will be stored in a file "preps.[sr]", where [sr] refers to the corresponding semantic relation. Do this for the following relations only: P-W and CAUSE.
the number of polysemous English nouns (a polysemous noun is a noun which has more than one sense; count only those which you know are ambiguous as given by the sense annotations in the cluvi.NPs.txt file); don't count the ambiguous words twice;
which of the five Romance languages has the most "N N" instances (i.e., those NPs which consist of only two consecutive nouns)?
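As a starting point, several of the items above can be sketched with short awk pipelines. The function names below are hypothetical. The sketch assumes one entry per line, the semantic relation as the last star-separated field with no trailing whitespace, and single-token nouns and prepositions. It runs on the four example entries only, not on the real file, and prints to standard output; redirect to the file names requested above as needed.

```shell
#!/bin/sh
# Hypothetical helpers; each takes the database file as its first argument.
count_instances()   { awk -F'[*]' 'NF > 1 { n++ } END { print n + 0 }' "$1"; }
count_unique_nps()  { awk -F'[*]' 'NF > 1 { print $2 }' "$1" | sort -u | awk 'END { print NR }'; }
# "Semantic_relation frequency", most frequent first, ties in alphabetical order.
relation_freq() {
  awk -F'[*]' 'NF > 1 { print $NF }' "$1" | sort | uniq -c |
    sort -k1,1nr -k2,2 | awk '{ print $2, $1 }'
}
# An English NP with three space-separated tokens is taken to be "N P N".
count_npn() { awk -F'[*]' 'NF > 1 && split($2, tok, " ") == 3 { n++ } END { print n + 0 }' "$1"; }
# Unique prepositions for one relation, e.g. preps_for_relation file P-W
preps_for_relation() {
  awk -F'[*]' -v rel="$2" '$NF == rel && split($2, tok, " ") == 3 { print tok[2] }' "$1" | sort -u
}

# Demo on the four example entries (a stand-in for the real input file).
sample=$(mktemp)
cat > "$sample" <<'EOF'
LPE-1*story#1/1 of pearl#1/2*historia de la perla*storia della perla* histoire de la perle* história da pérola*istoria perlei*TOPIC
LPE-9*wash#1/1 of light#4/2*baño de luz* scia di luce* rai de lumière* banho de luz *baie de lumina*P-W
LPE-10*turning#1/1 of twigs#1/2*búsqueda entre ramitas* rivoltare di ramoscelli* retourner des brindilles*giro dos galhos*intoarcerea ramurilor*THEME
LPE-10*bits#2/2 of wood#1/1*pedacitos de madera* pezzi di legno* morceaux de bois* pedaços de madeira*bucati de lemn*P-W
EOF

count_instances "$sample"              # prints 4
count_unique_nps "$sample"             # prints 4
relation_freq "$sample" | head -n 3    # P-W 2, THEME 1, TOPIC 1 (one per line)
count_npn "$sample"                    # prints 4
preps_for_relation "$sample" P-W       # prints of
rm -f "$sample"
```

The field separator is written as the bracket expression [*] so that awk treats the star as a literal character rather than as a regular-expression operator.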
Extra-credit question:
If we take "noun1#sense/1 noun2#sense/2" as the default argument position, how many English NP instances encoding P-W have the arguments in reverse order (i.e., "../2 ../1")? Print all these lines to the file "reverse-arg.p-w".
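One possible way to approach the reverse-order check is sketched below. The function name reverse_pw is hypothetical, and the sketch assumes that the relation is the last star-separated field and that the first and last tokens of the English NP field carry the "/1" and "/2" position labels. The demo uses only the two P-W example entries shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: print P-W entries whose English NP has the form "../2 ... ../1".
reverse_pw() {
  awk -F'[*]' '$NF == "P-W" {
    n = split($2, tok, " ")            # tok[1] is the first noun, tok[n] the last
    if (tok[1] ~ /\/2$/ && tok[n] ~ /\/1$/) print
  }' "$1"
}

# Demo on the two P-W example entries (a stand-in for the real input file).
sample=$(mktemp)
cat > "$sample" <<'EOF'
LPE-9*wash#1/1 of light#4/2*baño de luz* scia di luce* rai de lumière* banho de luz *baie de lumina*P-W
LPE-10*bits#2/2 of wood#1/1*pedacitos de madera* pezzi di legno* morceaux de bois* pedaços de madeira*bucati de lemn*P-W
EOF

reverse_pw "$sample" | awk 'END { print NR }'   # prints 1 (only the LPE-10 entry is reversed)
# For the real file, the matching lines could be saved with:
#   reverse_pw cluvi.NPs.txt > reverse-arg.p-w
rm -f "$sample"
```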