Objectives - learn/practice the following new concepts:
- Input data file with header record and "dirty data"
- Parallel arrays
- Direct address storage algorithm
- User menu & method-selector based on user choice
- Processing arrays as streams
- Calculating stats for a selected subset of data
- Linear search of array - single match & multiple matches
BASIC REQUIREMENTS Same as asgn 4
PROJECT OVERVIEW This project uses data about the countries of the world to answer interactive user queries ("questions"). The program has two main parts. 1st) The input file data needs to be "cleaned up" (by the program) before storing it in memory (in parallel arrays) using the "direct address storage algorithm". 2nd) The user is then presented with a menu of query options, which the program then answers. (The user can repeatedly enter queries until "DONE" is indicated.)
INPUT DATA FILE: CountriesData2000.csv (data from 2000). The 1st record is a "header record" containing the field-identifiers - skip over this record. The remaining records each contain the following fields in this order:
code - a 3-letter unique identifier for the country
id - a small integer which uniquely identifies the country
(1 through N, although the data is NOT sorted on id)
name - a variable-length string - names are unique within the data set
continent - 1 of these 7: Africa, Antarctica, Asia, Europe, North America, Oceania, South America
region - a sub-region within the continent
size - land size in square kilometers - an integer (the largest is 8-digits, so an int will suffice)
population - an integer (largest is a 10-digit number, but only 1,277,558,000, so an int will suffice)
lifeExp - lifeExpectancy - a floating point number with 1 decimal place (an integer for .0 data)
governmentType - not used in this project
shortCode - not used in this project
CLEANING THE INPUT DATA: The data from the file needs "cleaning" before it is stored in the arrays. These are the issues that need to be fixed:
- Remove this from the front of each record: INSERT INTO 'Country' VALUES (
- Remove this from the end of each record: );
- Remove the surrounding single quotes from the non-numeric fields
- NOTE: the replace method can be used for these
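A minimal sketch of how cleanData could apply the replace method to the three issues above. The class name CleanDataSketch and the sample record (including its last two field values) are made up for illustration, and the sketch assumes the quote characters in the file are plain single quotes exactly as written above.

```java
class CleanDataSketch {
    // Text to strip, copied from the assignment (assumes plain single quotes in the file)
    static final String PREFIX = "INSERT INTO 'Country' VALUES (";
    static final String SUFFIX = ");";

    // Raw data line sent in, cleaned-up data line returned
    static String cleanData(String rawLine) {
        String line = rawLine.trim();
        line = line.replace(PREFIX, "");   // remove the front of the record
        line = line.replace(SUFFIX, "");   // remove the end of the record
        line = line.replace("'", "");      // remove the single quotes around text fields
        return line;
    }

    public static void main(String[] args) {
        // Hypothetical raw record; the governmentType and shortCode values are invented
        String raw = "INSERT INTO 'Country' VALUES ('USA',229,'United States',"
                + "'North America','North America',9363520,278357000,77.1,"
                + "'Federal Republic','US');";
        System.out.println(cleanData(raw));
    }
}
```

Note that replace removes EVERY occurrence of its target, so an apostrophe inside a country name would also be removed by the last step.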
USER INPUT: User enters their choice (2 characters) after being shown the menu of options.
OUTPUT TO USER #1: Menu presented to user (a DialogBox)
Choose one of the following:
MP: Median population
MS: Median land size
ML: Median life expectancy
SN: Show country (by name)
SI: Show country (by id)
SC: Show country (by code)
CO: Show all countries in continent
DO: DONE
NOTE: User can enter cap or small letters (program uses toUpperCase method).
OUTPUT TO USER #2: The answer to the user's query (displays in Console Window). HOWEVER, the data must be clearly labeled so the user sees what his/her QUERY was (because the query was entered in a DialogBox, which has disappeared by then, and not in the Console Window).
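One possible shape for the method-selector, sketched under some assumptions: the class name MenuSelectorSketch, the method name handle and the "Invalid option" message are hypothetical, and the user's choices are simulated with an array instead of a DialogBox. In the real program each case would call the matching QueryHandler method rather than just setting a label.

```java
class MenuSelectorSketch {
    // Converts the user's 2-character choice into a labeled query heading
    static String handle(String rawChoice) {
        String option = rawChoice.trim().toUpperCase();  // accept cap or small letters
        String label;
        switch (option) {
            case "MP": label = "Median population"; break;
            case "MS": label = "Median land size"; break;
            case "ML": label = "Median life expectancy"; break;
            case "SN": label = "Show country (by name)"; break;
            case "SI": label = "Show country (by id)"; break;
            case "SC": label = "Show country (by code)"; break;
            case "CO": label = "Show all countries in continent"; break;
            case "DO": label = "DONE"; break;
            default:   label = "Invalid option"; break;  // assumption: bad input is reported
        }
        // Label the console output so the user can see what the query was
        System.out.println("QUERY " + option + ": " + label);
        return label;
    }

    public static void main(String[] args) {
        // Simulated user choices; the loop stops when DONE is chosen
        String[] choices = {"mp", "Sn", "zz", "do", "ms"};
        for (String choice : choices) {
            if (handle(choice).equals("DONE")) {
                break;
            }
        }
    }
}
```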
STORAGE: 7 parallel arrays for: code, name, continent, region, size, population, lifeExp
IMPORTANT NOTES:
1. Because "direct address storage algorithm" is used [SEE BELOW] to decide on the storage location (i.e., the index) based on the id field, it is not necessary to actually STORE the id field.
2. Country id's are between 1 and N, inclusive. No country has id 0. So NO data is stored in location [0], even though there IS a location [0] in all the 7 arrays. So any processing of the arrays uses 1 to <= n for its looping, and NOT 0 to < n.
3. If some method ever needs to sort the data, do NOT DAMAGE this main data storage - i.e., the 7 arrays of clean data. Instead (e.g., for finding the median), the method would make a COPY of the array and sort THAT array.
4. Since the program does NOT know how many input data records there will be ahead of time, the array sizes should all be "plenty big" (of size MAX_N = 300). HOWEVER, when reading in the data, keep track of n, and use n (and not the array's length) to do any processing.
5. The initial header record is NOT included in n - only the real data records.
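The notes above on n and on loop bounds can be illustrated roughly as follows. This is only a sketch: the class and method names are hypothetical, the lines are assumed to have already been read into a list (for example with Files.readAllLines), and skipping blank lines is an added assumption. The total is kept in a long because a sum of many populations can exceed the range of an int.

```java
import java.util.Arrays;
import java.util.List;

class RecordCountSketch {
    // Counts the real data records; the header record (index 0) is NOT included in n
    static int countDataRecords(List<String> lines) {
        int n = 0;
        for (int i = 1; i < lines.size(); i++) {      // start at 1 to skip the header
            if (!lines.get(i).trim().isEmpty()) {     // assumption: blank lines are ignored
                n++;
            }
        }
        return n;
    }

    // Processes locations 1 through n only; location [0] and the unused tail are skipped
    static long totalPopulation(int[] population, int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += population[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical file contents: one header line followed by two data lines
        List<String> lines = Arrays.asList("code,id,name", "record one", "record two");
        System.out.println("n = " + countDataRecords(lines));
    }
}
```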
METHODS IN Utility CLASS (in a separate .java file from main's class)
- cleanData - raw data line sent in, cleaned up data line returned
- getUserOption - menu displayed (dialog box) and user-specified option returned (capitalized, in case user entered small letter)
- findCountryIndex - does a linear search of name array to find target, and returns index where it was found OR a -1 if NOT FOUND
- findCodeIndex - does a linear search of code array to find target, and returns index where it was found OR a -1 if NOT FOUND
- getContinent - displays list of 7 continents (Africa, Antarctica, Asia, Europe, North America, Oceania, South America) with 1-7 next to them (dialog box) - user enters 1-7 for their choice - that choice is converted into the continent NAME (as a String) and that's returned to caller (i.e., Africa, Europe, . . .).
- showNameArray - displays a partial list of the name array including name and the index. Only print 1-20, then a space, then 101-120, then a space, then 220-n (the last country) - this method is called just after the data is loaded from the file into memory
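Both findCountryIndex and findCodeIndex follow the same single-match linear search pattern, which might look something like the sketch below. The class name, the shared method name findIndex and the sample data are hypothetical; ignoring upper/lower case and guarding against empty slots are assumptions, not requirements stated above.

```java
class LinearSearchSketch {
    // Searches locations 1..n of the given array; returns the index of the
    // first match, or -1 if the target is NOT FOUND
    static int findIndex(String[] data, int n, String target) {
        for (int i = 1; i <= n; i++) {
            // Assumption: the comparison ignores case; null guards an empty slot
            if (data[i] != null && data[i].equalsIgnoreCase(target)) {
                return i;   // names and codes are unique, so stop at the first match
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] name = new String[300];   // MAX_N = 300, location [0] unused
        name[1] = "Aruba";                 // hypothetical sample data
        name[2] = "Afghanistan";
        name[3] = "Angola";
        System.out.println(findIndex(name, 3, "Angola"));    // index of Angola
        System.out.println(findIndex(name, 3, "Atlantis"));  // not in the data
    }
}
```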
METHODS IN QueryHandler CLASS (in a separate .java file from main's class)
- showMedianPop - population data sent in, a COPY of array is made and sorted, then median value printed.
- showMedianSize - same idea as above - but convert land size from square kilometers to square miles to print
- showMedianLifeExp - same idea as above
- showCountryById - print data (the 8 fields) for the country in location [id]
- showCountryByName - ask user for country name (DialogBox), call findCountryIndex to get its index, and print the data (the 8 fields, including id) for the country at that index location
- showCountryByCode - ask user for country code (DialogBox), call findCodeIndex to get its index, and print the data (the 8 fields, including id) for the country at that index location
- showContinent - calls getContinent to determine user's desired continent. Searches data to find ALL matching countries in that target continent and prints the country codes, their names and their regions.
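Unlike the name and code searches, the continent search must keep going after the first match. A rough sketch of that multiple-match search, together with the 1-7 to continent NAME conversion done by getContinent, is shown here. The class and method names are hypothetical, the dialog box is left out, and collecting the matching indexes in a list (rather than printing as they are found) is just one possible design.

```java
import java.util.ArrayList;
import java.util.List;

class ContinentSketch {
    static final String[] CONTINENTS = {
        "Africa", "Antarctica", "Asia", "Europe",
        "North America", "Oceania", "South America"
    };

    // Converts the user's 1-7 choice into the continent NAME (null if out of range)
    static String continentName(int choice) {
        if (choice < 1 || choice > CONTINENTS.length) {
            return null;
        }
        return CONTINENTS[choice - 1];   // menu numbers start at 1, array starts at 0
    }

    // Linear search with multiple matches: checks ALL locations 1..n
    static List<Integer> findAllInContinent(String[] continent, int n, String target) {
        List<Integer> matches = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            if (target.equals(continent[i])) {
                matches.add(i);          // keep looking; do not stop at the first match
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] continent = new String[300];   // hypothetical sample data
        continent[1] = "Asia";
        continent[2] = "Europe";
        continent[3] = "Asia";
        System.out.println(findAllInContinent(continent, 3, continentName(3)));
    }
}
```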
DIRECT ADDRESS STORAGE ALGORITHM: This means that a country with id 25 would have its data stored in spot [25] in each of the 7 arrays. A country with id 200 would have its data stored in spot [200] in all 7 arrays.
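As a rough illustration, storing one cleaned record by direct address could look like the sketch below. The class name, the method name store and the sample line are hypothetical, and the sketch assumes the cleaned line is comma-separated with no commas inside any field.

```java
class DirectAddressSketch {
    static final int MAX_N = 300;

    // The 7 parallel arrays; the id itself is never stored
    String[] code = new String[MAX_N];
    String[] name = new String[MAX_N];
    String[] continent = new String[MAX_N];
    String[] region = new String[MAX_N];
    int[] size = new int[MAX_N];
    int[] population = new int[MAX_N];
    double[] lifeExp = new double[MAX_N];
    int n = 0;   // number of data records stored so far

    // Stores one cleaned record; the id field decides the storage location
    void store(String cleanLine) {
        String[] f = cleanLine.split(",");   // assumption: no commas inside fields
        int id = Integer.parseInt(f[1]);     // field order: code, id, name, ...
        code[id] = f[0];
        name[id] = f[2];
        continent[id] = f[3];
        region[id] = f[4];
        size[id] = Integer.parseInt(f[5]);
        population[id] = Integer.parseInt(f[6]);
        lifeExp[id] = Double.parseDouble(f[7]);   // also accepts integer text such as 77
        n++;   // governmentType and shortCode (f[8], f[9]) are not used
    }

    public static void main(String[] args) {
        DirectAddressSketch data = new DirectAddressSketch();
        // Hypothetical cleaned line; the last two field values are invented
        data.store("USA,229,United States,North America,North America,"
                + "9363520,278357000,77.1,Federal Republic,US");
        System.out.println(data.name[229] + " is stored in spot [229]");
    }
}
```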
MEDIAN: The value where half the values are > this one, and half are < this one.
COPYING/SORTING ARRAYS: Only copy the N records (from 1 to N) into the NEW array for sorting. Do NOT use the entire MAX_N (or array.length) to control copying, as most of that array is empty (or "garbage").
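A sketch of the copy-then-sort approach for the median, using Arrays.copyOfRange and Arrays.sort from java.util. The class and method names are hypothetical. Averaging the two middle values when n is even is an assumption, since the handling of that case is not spelled out above.

```java
import java.util.Arrays;

class MedianSketch {
    // Returns the median of locations 1..n without changing the original array
    static double median(int[] data, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("no data records");
        }
        // Copy ONLY locations 1..n (the end index is exclusive, so n + 1)
        int[] copy = Arrays.copyOfRange(data, 1, n + 1);
        Arrays.sort(copy);   // sorts the COPY; the main storage is untouched
        int mid = n / 2;
        if (n % 2 == 1) {
            return copy[mid];   // odd n: the single middle value
        }
        // Even n (assumption): average the two middle values; long avoids int overflow
        return ((long) copy[mid - 1] + copy[mid]) / 2.0;
    }

    public static void main(String[] args) {
        int[] population = new int[300];   // MAX_N = 300, location [0] unused
        population[1] = 50;                // hypothetical small values for illustration
        population[2] = 10;
        population[3] = 40;
        System.out.println("Median: " + median(population, 3));
    }
}
```

The same method works for land size; a version for life expectancy would take a double array instead.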
PRINTING A COUNTRY - use this format for the 8 fields
NOTE: Truncate any country name or region which is longer than 15 characters to make it only 15 characters.
229 (USA)-United States - in North America (North America region)
Population: 278,357,000  Land size: 9,363,520 sq.km.  Life expectancy: 77.1
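The format above could be produced along these lines with String.format, where the %,d specifier inserts the comma separators. This is only a sketch: the class and method names are hypothetical, the two spaces between items on the second line are an assumption, and Locale.US is used so the separators come out as commas on any machine.

```java
import java.util.Locale;

class PrintCountrySketch {
    static final int MAX_LEN = 15;

    // Cuts a name or region down to 15 characters if it is longer
    static String truncate(String s) {
        return s.length() > MAX_LEN ? s.substring(0, MAX_LEN) : s;
    }

    // Builds the two output lines for one country (all 8 fields, including id)
    static String formatCountry(int id, String code, String name, String continent,
                                String region, int size, int population, double lifeExp) {
        String line1 = String.format(Locale.US, "%d (%s)-%s - in %s (%s region)",
                id, code, truncate(name), continent, truncate(region));
        String line2 = String.format(Locale.US,
                "Population: %,d  Land size: %,d sq.km.  Life expectancy: %.1f",
                population, size, lifeExp);
        return line1 + System.lineSeparator() + line2;
    }

    public static void main(String[] args) {
        System.out.println(formatCountry(229, "USA", "United States", "North America",
                "North America", 9363520, 278357000, 77.1));
    }
}
```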