Guidance – Assignment

Coursework Assignment – Tutors India
February 20, 2021
Advanced Analytics Coursework
February 20, 2021

Guidance – Assignment



Introduction

Set the scenario of your Report. Very brief and concise. This assignment submission will be via Turnitin.


1. TASK1 – Data Integration (ETL) and Maintenance

1.1 Original datasets

  • Create your datasets (tables) in Apex. You will need at least two db sources for the success of this task… two databases or tables from two databases that will feed directly data to your dimensions. Tables for the original sources that are not used in SS do not need to be included at this point.
  • Do you need to grow data in your table, or you have got a sufficient number of rows… Do it only if you know how to, otherwise it does not matter.
  • Evidence code, brief discussion and screenshot that it is running successfully

1.2 Star Schema

  • Have the diagram of your SS here.
  • Create your star schema with your dimension tables and fact(s). Remember tables (or dimensions) are created using a command that refers to datasets.
  • Have you got a surrogate key(s) in your dimension? Will you implement a sequence for your surrogate keys or natural keys?
  • Consider your measures for your reports, such as Avg xyz in a location, total, min… more aggregates…
  • Decide on the Slowly Changing Dim Type you will be using… Implement it within your SS.
  • You may have done many of these tasks already in Assignment 1. Now you may review based on our feedback.
  • Evidence code, brief discussion and screenshot that it is running successfully

1.3  Staging Area

  • Are you going to use Staging area approach? You may use Tableau for some parts of the task, but note that assessment is asking for a scripts. If you going to extract and load data straight into your dimension table? If it is the later one, which could be ok for some dimensions, not for all, you’ll not do this part. Note that for a good and excellent solution, we expect you to have done this stage.

1.3.1 Extracting of data into a temp table?

  • You’ll need to move data from a dataset to another dataset, so you can “prepare” data before you enter into the SS.  E.g. Code: Insert into temp table as select x, y from original_table
  • You can merge two datasets into one – concatenated data sets.  These datasets are then cleaned in the next part. E.g. Code@ data x; set dataset1, dataset2, dataset3; run;
  • Evidence code, brief discussion and screenshot that it is running successfully
  1. Cleaning of data
  2. Show here what data were missing data, delete unnecessary data, check if there is  incorrect format data?
  3. Produce Error log, if any, needed.
  4. Evidence this by presenting code, brief discussion and screenshot that it is running successfully in Apex.
  1. Transforming of data
  2. Would you need to transform data? When? Why? How? Would you need to have a table will hold error data? Null values, how will this be stored? Mearged fields?  Data type change? Have records stored in UPPER case in SS. Design this ETL script. Have all considerations in a table.
  3. Evidence code, brief discussion and screenshot that it is running successfully
  1. Loading data into SS
  2. This is cleaned and transformed data!
  3. Clearly show the code for each dataset (dimension) population.
  4. Define surrogate keys for dim: e.g. Code: data x_dim; set x_dim; counter=_N_; run;
  5. You will need to consider

How will you populate the FACT table? What calculations of measures have been made? Plan it! Have a table to show your plan and if necessary assumptions you’ve made.

How will you populate your Time Dim?

NOTE: you may have chosen to populate dimension directely from the original sources. In any case you will need this part only.

  • Evidence code, brief discussion and screenshot that it is running successfully
1.3.5 Ongoing Data Load (Data Maintenance (SCD)) implemented and evidenced as part to the solution.

2.TASK 2 – OLAP

  • Decide on the technology to use to support the OLAP (pivot table) investigation – Access and Excel or Tableu?
  • Decide on the model to use for to support your OLAP. A detailed table or a star schema model (fact and dimensions).
  • Design the OLAP investigations (dimensions – eg area: town, city, county and time: week, month, year)
  • Create a query in Access (or other software) which can be imported into excel and provide a useful pivot table.
  • Create screen shots of the data analysis provided in the pivot table (OLAP as far as this exercise is concerned).
  • This exercise is about understanding, you shouldn’t need to significantly rework anything, you will probably provide just a sub-set of the data.
  • Use literature to support your choices and arguments.
  • Note: you do not need to evidence the steps undertaken to e.g. Pivot table was created.

3.TASK 3 – Findings and Reflection

Check marking scheme and reflect upon your findings, methodology, processes, methods and possibly the architecture that you will find most appropriate to implement for your case study organisation.

When your evaluate the current and potential social and ethical issues for the case study organisation, consider some of the theories that will underpin your discussion, such as PAPA framework, Theory of Stakeholders, etc…