1 Introduction
In this assignment, using your AWS linux instance you will build and query an instance of a MongoDB database and an Apache Derby relational database using data concerning pedestrian traffic in Melbourne
Please read ALL the following requirements carefully before you start.
Data
The data that you are going to use in this assignment is open data from the City of Melbourne about pedestrian traffic in the Melbourne CBD. The data is described at the following website: https://data.melbourne.vic.gov.au/Transport-Movement/
Pedestrian-volume-updated-monthly-/b2ak-trbp
and the data can be explored (and downloaded) in various formats from that website. You can download a compressed copy of the data from: https://goanna.cs.rmit.edu.au/~jat/Pedestrian_volume updated_monthly_.csv.zip which can be accessed using the wget command from your AWS linux instance. Or you can download the file on another machine and use scp to copy to your AWS linux instance or use a tool (such as DBeaver or Robomongo to connect to your database and upload the data directly into your database). When you download the file you will need to unzip the file. This file is also available on blackboard.
The data and the first line contains the field names that should be used to inform how you create and name any entities in your databases.
3 Academic Integrity
This is individual assignment, which means what you submit MUST be your own original work.
So make sure you reference any sources you use (including all web resources) as all assign- ments will be checked with plagiarism-detection software.
Any student found to have plagiarised will be subject to disciplinary action in accordance with RMIT policy and procedures. Plagiarism includes submitting code that is not your own or submitting text that is not your own. Submitting a comment from someone else in your code or a sentence from someone else's report is plagiarism, and plagiarism includes submitting work from previous years. Allowing others to copy your work is also plagiarism. All plagiarism will be penalised; there are no exceptions and no excuses.
4 General Requirements and Getting Help
This section contains information about the general requirements that your assignment must meet and how to get help.
1. Your database must be set up on your AWS linux instance (as set up following the instructions in the initial practical classes in the laboratories). As some tasks require timing you should use the same AWS linux instance and versions of MongoDB and Apache Derby for all tasks.
2. Parts of this assignment will ask you to analyse your results, and to write your conclusions in a report. The report MUST be a PDF file. Submissions that do not meet this requirement will NOT be marked.
3. Your report must be well-written. Poorly written or hard to read reports will receive substantially lower marks. Your report should be appropriate to submit in a professional environment (such as including in a portfolio of your work for a prospective employer). The RMIT Study & Learning Centre employs advisors to help you improve your writing. For details, see https://www.rmit.edu.au/studyandlearningcentre.
4. All sections of this assignment are expected to show that you have thought about the problem. The most basic structuring of data and analysis will get the most basic mark.
5. The COSC2407 Database Systems course blackboard contains a discussion board for this assignment allowing a forum for students to ask questions (see below) and contribute to discussion about aspects of the assignment. If there are announcements about the assignment (including if there are any revisions to the assignment specification) these will also be made via announcements on the course blackboard. You are expected to check these on a daily basis.
6. If you have any questions about the assignment (for example to clarify requirements): (a)Please first check this assignment specification, as well the announcements and the discussion board to see if it has already been answered.
(b) If it has NOT already been answered and does NOT include your own code (includ- ing database queries), please post your question on the discussion board.
(c) Otherwise, if your question involves your own code (or is about your personal sit- uation) then discuss it in your practical class with the lab instructor or contact the lecturer (or your tutor) via email.
5 Tasks and Analysis
In this section, you will be asked to carry out several tasks using your AWS linux instance to create and query a database (using both MongoDB and Derby), with the data in the provided file and to analyse your results.
Create a file called report.pdf (various software including word processors can export as PDF). Use this file to report on the following tasks. Each task should be reported under a separate heading with the task name and description, for example for the first task use the heading: Task 1: Load the data into MongoDB.
Task 1: Load the data into MongoDB
You are required to load the data into MongoDB. In your report:
- explain how have you chosen to structure the data inserted in MongoDB
- provide details of the time taken to load the data (The mongoimport is one utility will provide such information). Please note that a naive import into a flat structure in Mongodb will not accrue you a great mark. You need to analyse the data and consider appropriate ways to structure the data and then using any scripting, programming or other tools to format the data accordingly.
- What alternative way or ways could you have organised the data when storing in Mon- goDB, and what advantages or disadvantages would these alternative designs have?
Task 2: Load the data into Derby
You are required to load the data into Derby. In your report:
- explain how have you chosen to structure the Derby relational database and give reasons.
- provide details of the time to load the data into Derby. The same criterion to think about how you structure your data applies and that this structure should reflect your understand- ing of database system theory applies.
- What alternative way or ways could you have organised the data when storing in Derby, and what advantages or disadvantages would these alternative designs have?
Task 3: Queries and timings
Using both MongoDb and Derby, run the queries below 20 times: ten times after a fresh reboot and ten times after the server has been running for a while.
Undertake the following queries (including timings when you run the queries, for comparison):
- Where is the sensor with ID number 3 located?
- For how many days in 2017 has the City of Melbourne published the sensor data of pedestrian counts?
- In 2015, what location had the most pedestrians and was the total number of pedestrians at that location across the whole year)?
- What was the busiest day in the city in 2016 (give the date)?
- What day of the week (i.e. Monday to Sunday) is the busiest?
In your report provide the queries (code), details of the answers returned by the database system and query timings. The best approach would be to detail the timings in a table along with the mean and the standard deviations of these timings.
In your report provide the queries, details of the answers returned by both database systems and query timings.
Task 4: Evaluate and critically analyse results
With reference to your experimental results above:
- Are the timing results what you would have expected to see based on your theoretical understanding of these database systems? Explain why or why not?
- How would you expect these results to have changed if you had used the alternative designs (considered in Tasks 1 and 2) for MongoDB and for Derby? Justify for answers.
- Discuss the advantages and disadvantages of using MongoDB and Derby for dataset.
Assignment - tips with setting up database
If you have designed your schema but are struggling with the commands to set up the database, the following tips from Reza Soltanpoor (one of the lab assistants) may be helpful.
NOTE: All .0.s correspond to variables which should be associated with your own values! :-)
=== MongoDB ===
mongoimport -d $DB_NAME -c $COLLECTION NAME --type csv --file $LOCATION_OF_THE_CSV_FILE.csv --headerline
mongoimport -d $DB_NAME -c $COLLECTION_NAME --type csv --file $LOCATION_OF_THE_CSV_FILE.csv --headerline --fields $LISTOF_YOUR_FIELDS
mongoimport -d $DB_NAME -c $COLLECTION_NAME --file
$LOCATION OF THE JSON FILE.json
number of records (documents) imported to the collection: db.pedestrian.find().count()
sensors in Melbourne Central:
db.pedestrian.find( f.Sensor_Name" : "Melbourne Central") ).pretty()
number of sensors in Melbourne Central: db.pedestrian.find({"SensorName":.Melbourne Central.)).count()
--- Derby (USING "ij") --- connect 'jdbc:derby:MyDbTest'; show connections;
show schemas;
// create schema $schema_name; show tables;
show tables in $SCHEMA; show tables in app;
//NOTE: you need to create the table before importing the data in it.
CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE('$SCHEMA', '$TABLE', '$FILE', '$DELIMITER', null, null, 0);
CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE('APP', 'PEDESTRIAN', 'shortPeds.csv', ',', null, null, 0);