Assignment Aim:
1. Code all requested components
2. Your written analysis of the output produced by your code
3. Aim for optimised code in terms of computational overhead.
It is not always possible to avoid loops, but you should avoid them where you can (e.g. use NumPy vectorisation as much as possible).
4. Use proper coding style
Code clarity is an important part of your submission. Thus you should choose meaningful variable names and adopt the use of comments - you don't need to comment every single line, as this will affect readability - however you should aim to comment at least each section of code.
5. Have the code run successfully when I try to run it
If you have special files not pre-supplied with the assignment, you should provide these as a final part of your submission (ask how if you're unsure). Also, **do not hardcode** your computer path directory into your program - **I should be able to open your .ipynb file and run the code successfully without editing your code.**
6. Documentation of any code limitations including, but not limited to, the requested functionalities
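To illustrate aim 3, here is a minimal sketch of replacing a Python loop with NumPy vectorisation. The array `counts` and its values are hypothetical stand-ins for feeling frequencies; the real numbers would come from the supplied data files.

```python
import numpy as np

# Hypothetical feeling counts (illustrative values only).
counts = np.array([120, 45, 0, 300, 8])

# Loop version: one Python-level iteration per element.
sizes_loop = []
for c in counts:
    sizes_loop.append(c / counts.max())

# Vectorised version: a single NumPy expression over the whole array.
sizes_vec = counts / counts.max()

# A boolean mask replaces an "if" statement inside a loop.
nonzero_counts = counts[counts > 0]
```

Both versions give the same relative sizes, but the vectorised one does the arithmetic inside NumPy rather than in the Python interpreter.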
For this assignment you are asked to write a program that will analyse and visualise mined feelings from the We Feel Fine data sets based on a default search and then user-driven searches. Note: You do NOT have to search for the phrases "I feel" and "I am feeling" as We Feel Fine have already done this work for you. We are going to analyse what they found.
There are five components in this job: user prompt, data loading, data analysis, plotting, report.
1. Prompt the user for the country which will be mined. If the user chooses not to provide this information, then assume a default search of the United States. Try to make your communication with the user as friendly as possible, that is, as unrestrictive as possible about how the user enters countries. E.g. make no distinction between upper and lower case, and accept some common abbreviations, like US or USA for United States, or UK for United Kingdom.
If an illegal value is entered (e.g. 'new transavia' for country), you can ask again or try to fix it - google for the Levenshtein distance. Then ask the user to confirm your fix or change it to the right one. If your program fails to fix the illegal value for a country name, then do not include it in the data loading routine.
You may wish to use a [text list of all countries in the world](world_countries.txt) to help define valid countries. Note that the We Feel Fine data set does not necessarily cover all of the countries in this list.
Please don't be overwhelmed by the complexity of this part: start with a basic prompt and then gradually add functionality. The suggested features are desirable but not compulsory.
2. Allow the user a maximum of 5 countries to be successfully mined, although they are also allowed to enter fewer than 5 countries. Load the corresponding data files from the folder **countries**. Successful mining occurs when the feelings for each country have been recorded and returned to your program.
3. For each feeling in the [full list of over 5000 feelings and their frequencies], determine the number of times it appears in the mined text, for each country. For any counts that are larger than 0, you will need to retain the third column of information, which is the hexadecimal equivalent of the colour of the prescribed feeling.
4. For each country, produce a plot of ellipses where each ellipse represents a feeling, has a size proportional to the frequency of its occurrence, and is coloured based on the full list of feelings referenced above. Ellipse position can be random. The code for this component is provided and explained below, however you will need to make a number of adjustments to it.
5. Run the base query of data file **World.txt** to determine the first 1500 feelings mined by We Feel Fine from anywhere in the world. We will compare these mined feelings with the chosen countries. There is a substantial hint below explaining how you need to do this.
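The country prompt in component 1 could be approached along the following lines. This is a minimal sketch, not the expected solution: the names `ALIASES`, `resolve_country` and `country_path` are hypothetical, and the one-file-per-country naming `countries/<Country>.txt` is an assumption about the supplied data. It uses `difflib.get_close_matches` from the standard library, which ranks candidates by a similarity ratio rather than by Levenshtein distance, so treat it as a stand-in for the distance the brief suggests.

```python
import difflib
import os

# Hypothetical alias table; extend it with any abbreviations you want to accept.
ALIASES = {"us": "United States", "usa": "United States", "uk": "United Kingdom"}
DEFAULT_COUNTRY = "United States"


def resolve_country(raw, valid_countries):
    """Return (country, needs_confirmation).

    country is None when no plausible match is found, in which case the
    entry should be left out of the data loading routine.
    """
    cleaned = raw.strip().lower()
    if not cleaned:
        # Empty input: fall back to the default search.
        return DEFAULT_COUNTRY, False
    if cleaned in ALIASES:
        return ALIASES[cleaned], False
    # Case-insensitive lookup that still returns the canonical spelling.
    lookup = {name.lower(): name for name in valid_countries}
    if cleaned in lookup:
        return lookup[cleaned], False
    # Fuzzy match; the 0.75 cutoff is an arbitrary choice for illustration.
    guesses = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=0.75)
    if guesses:
        return lookup[guesses[0]], True
    return None, False


def country_path(country, folder="countries"):
    # Relative path, so nothing machine-specific is hardcoded.
    # The "<Country>.txt" naming is an assumption.
    return os.path.join(folder, country + ".txt")
```

When `needs_confirmation` is `True`, the program would show the guessed name to the user (for example with `input()`) and accept it only after the user confirms.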
When your code is ready you have to choose any five countries and provide the following output:
1. The constructed path to load a data file for the country selected by the user or yourself.
2. The most popular feeling across the 5 countries you have chosen to explore plus the base query. If no feelings were mined, then report this fact.
3. A plot for each country of the ellipses generated by each country's feelings, as well as plot of the results of the base query from Step 5.
4. Assuming darker colours and blues correspond to negative feelings and lighter, happier colours correspond to positive feelings, write a short description summarising the nature of each country as being generally optimistic or pessimistic. This description is to be written by yourself (not your program, unless you want to be REALLY fancy!) and at most two paragraphs will be sufficient. For the purpose of this assignment, one paragraph is 6-8 lines long.
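The counting step and the "most popular feeling" output could be sketched as below. The layout of the data is an assumption: the feelings list is taken to be rows of (feeling, frequency, hex colour), and each country's mined data is taken to be a list of feeling words. The names `feelings_table`, `mined`, `count_feelings` and `most_popular`, and all the values shown, are hypothetical.

```python
from collections import Counter

# Hypothetical rows of the feelings list: (feeling, frequency, hex colour).
feelings_table = [
    ("happy", 5000, "#FFD700"),
    ("sad", 4200, "#1E3A8A"),
    ("tired", 3100, "#6B7280"),
]

# Hypothetical mined feelings per query (illustrative values only).
mined = {
    "Australia": ["happy", "sad", "happy", "tired"],
    "France": ["sad", "sad", "happy"],
    "World": [],
}


def count_feelings(words, table):
    """Map each feeling that occurs at least once to (count, colour)."""
    tally = Counter(words)
    return {
        name: (tally[name], colour)
        for name, _frequency, colour in table
        if tally[name] > 0
    }


def most_popular(words):
    """Return the most frequent feeling, or None if nothing was mined."""
    tally = Counter(words)
    if not tally:
        return None
    return tally.most_common(1)[0][0]
```

A `None` result from `most_popular` corresponds to the case where no feelings were mined, which the report should state explicitly.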
"You have to execute an analysis of the data and provide answers for the following research questions which are of great interest for many people in the retail industry:
1. Study the distribution of basket sizes measured by the number of items in a basket. What basket size is the most popular?
2. Study the relation between the number of items in the basket and the dollar value of the basket. Considering the different "popularity" of different basket sizes from question 1, how much money does the store get from each basket size? What kind of customers are more important - light (small baskets) or heavy (large baskets)?
3. What day of the week is the busiest for the supermarket in terms of the number of shopping trips (one basket = one shopping trip)? What day is the most profitable? For the last question please consider total revenue or total sales as a proxy for the profit.
For each question you have to provide an appropriate graph (or multiple graphs) and a brief discussion to present your findings, answer the research question and explain your graph.
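The aggregations behind the three questions could be sketched with NumPy as follows. The arrays `items`, `value` and `weekday` and their values are hypothetical (one entry per basket, with weekdays coded 0 = Monday to 6 = Sunday as an assumed convention); the real data set may be organised differently. Plotting is left out.

```python
import numpy as np

# Hypothetical per-basket data (illustrative values only).
items = np.array([1, 2, 2, 3, 2, 5, 1, 3])                     # items in each basket
value = np.array([3.0, 8.5, 7.0, 12.0, 9.5, 30.0, 2.5, 11.0])  # dollar value of each basket
weekday = np.array([0, 5, 5, 6, 5, 6, 2, 0])                   # 0 = Monday ... 6 = Sunday

# Question 1: distribution of basket sizes and the most popular size.
sizes, frequency = np.unique(items, return_counts=True)
most_popular_size = int(sizes[frequency.argmax()])

# Question 2: total revenue per basket size (index = number of items).
revenue_by_size = np.bincount(items, weights=value)

# Question 3: trips and revenue per day of the week.
trips_by_day = np.bincount(weekday, minlength=7)
revenue_by_day = np.bincount(weekday, weights=value, minlength=7)
busiest_day = int(trips_by_day.argmax())
most_profitable_day = int(revenue_by_day.argmax())
```

In this toy data the most popular basket size (2 items) is not the size that brings in the most revenue (5 items), which is the kind of contrast question 2 asks you to discuss.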
Attachment:- Statistical Programming for Data Science.rar