Course Code | COMP9313
Course Title | Big Data Management
Units of Credit | 6
Course Website |
Handbook Entry | http://www.handbook.unsw.edu.au/postgraduate/courses/current/COMP9313.html
The formal pre-requisites for this course are Data Structures and Algorithms (COMP9024 or COMP1927 or COMP2521) and Database Systems (COMP9311 or COMP3311).
We assume the following knowledge from these courses:
- an understanding of data structures and algorithms that enables efficient and scalable management of massive amounts of data;
- experience with the relational data model and the SQL query language;
- solid programming skills in Python.
The learning foci in this course are primarily lectures (theoretical knowledge) and projects (practical knowledge). The course emphasises problem solving in real applications.
Students will learn the main course content through lectures. Labs are available to help students with the projects and with real applications.
Note that there will be no physical lectures this trimester. Instead, pre-recorded lecture videos will be used.
This course aims to introduce the concepts behind Big Data, the core technologies used to manage large-scale data sets, and a range of technologies for developing solutions to large-scale data analytics problems. This course is part of the series of advanced database courses. Other advanced database courses include:
The course is designed to be practical. As such, real-life examples of big data issues and applications will also be used throughout the course.
Students successfully completing this course will be able to:
- describe the important characteristics of Big Data;
- understand key concerns in the management of Big Data;
- develop an appropriate storage structure for a Big Data repository;
- utilise the MapReduce paradigm and the Spark platform to manipulate Big Data;
- develop efficient solutions for analytical problems involving Big Data.
See the course homepage (http://www.cse.unsw.edu.au/~cs9313/20T2/) for up-to-date information on the course timetable, course staff and course schedule.
The assessment will have the following components:
- Written assignment (20%): this component helps you review the concepts introduced in lectures.
- Programming project 1 (25%) and project 2 (25%): this component gives you the opportunity to apply big data technologies to solve real problems.
- Written final exam (30%): this component assesses the factual and knowledge-level learning outcomes.
The final mark is calculated as:
Final Mark = 0.2*Assn + 0.25*Proj1 + 0.25*Proj2 + 0.3*FinalExam
Note: There is no double pass in this trimester.
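The weighted sum above can be checked with a short Python sketch. It assumes every component is marked out of 100 (the outline does not state the scale), and `final_mark` is a hypothetical helper for illustration, not an official calculator. Because there is no double pass this trimester, the sketch applies no separate hurdle to any individual component; it only computes the weighted sum.

```python
import math

# Weights copied from the outline's formula; they should sum to 1.
WEIGHTS = {"assn": 0.2, "proj1": 0.25, "proj2": 0.25, "final_exam": 0.3}
assert math.isclose(sum(WEIGHTS.values()), 1.0)


def final_mark(assn: float, proj1: float, proj2: float, final_exam: float) -> float:
    """Weighted final mark, assuming every component is marked out of 100."""
    marks = {"assn": assn, "proj1": proj1, "proj2": proj2, "final_exam": final_exam}
    for name, mark in marks.items():
        # Reject marks outside the assumed 0..100 scale.
        if not 0 <= mark <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {mark}")
    # No per-component hurdle: the result is just the weighted sum.
    return sum(WEIGHTS[name] * mark for name, mark in marks.items())


print(final_mark(80, 70, 90, 60))  # prints 74.0
```

For example, marks of 80, 70, 90 and 60 contribute 16 + 17.5 + 22.5 + 18 = 74.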
Grading Criteria: The grading criteria for each assessment will be detailed in the specification.
Late submission: Assignments/projects submitted late are subject to late penalties, which are specified in the assignment/project specifications.
Assignment submission: The submission procedure is described in the assignment specification, which will be linked from this page once it becomes available. Generally, assignments are submitted electronically using the give program, which runs on the School’s computer systems (in labs and on servers). Details are in the assignment specifications.
You should check your school e-mail frequently in case of announcements relating to this course. We assume that you read emails sent to your CSE account by the next working day during teaching sessions.
Copying assignments is unacceptable. Assignments will be checked. The penalties for copying range from receiving no marks for the assignment, through receiving a mark of 00 FL for the course, to expulsion from UNSW (for repeat offenders). Allowing someone to copy your work counts as plagiarism, even if you can prove that it is your work.
There are several online sources to help you understand what plagiarism is and how it is dealt with at UNSW:
Make sure that you read and understand these. Ignorance is not accepted as an excuse for plagiarism. In particular, you are responsible for ensuring that your assignment files are not accessible to anyone but you, by setting the correct permissions on your CSE directory and on your code repository, if you use one. Note also that plagiarism includes paying or asking another person to do a piece of work for you and then submitting it as your own.
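On the CSE servers, directory permissions are normally set with `chmod`. The following is a minimal Python sketch of the same idea, assuming a Unix-like file system; `make_private` is a hypothetical helper written for illustration, not a School-provided tool. It clears the group and "other" permission bits on a directory tree (similar to `chmod -R go-rwx`) and leaves the owner's permissions untouched. It does not cover permissions on a hosted code repository, which are configured through the hosting service.

```python
import os
import stat


def make_private(root: str) -> None:
    """Remove group and other access from root and everything beneath it.

    Hypothetical helper; assumes a Unix-like file system. Owner bits are
    kept as they are; only the group/other rwx bits are cleared.
    """
    group_other = stat.S_IRWXG | stat.S_IRWXO
    paths = [root]
    # Collect every directory and file under root (symlinked dirs are not followed).
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    for path in paths:
        if os.path.islink(path):
            continue  # chmod would follow the link, so skip symlinks
        mode = stat.S_IMODE(os.stat(path).st_mode)
        os.chmod(path, mode & ~group_other)
```

For example, `make_private(os.path.expanduser("~/comp9313"))` would restrict a (hypothetical) course directory so that only its owner can read, write or enter it.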
UNSW has an ongoing commitment to fostering a culture of learning informed by academic integrity. All UNSW staff and students have a responsibility to adhere to this principle of academic integrity. Plagiarism undermines academic integrity and is not tolerated at UNSW. Plagiarism at UNSW is defined as using the words or ideas of others and passing them off as your own.
If you haven’t done so yet, please take the time to read the full text of
The pages below describe the policies and procedures in more detail:
There is no prescribed textbook for this course. However, the following resources are relevant to the topics we will cover:
- Hadoop: The Definitive Guide, 4th edition. Tom White. O’Reilly Media.
- Data-Intensive Text Processing with MapReduce. Jimmy Lin and Chris Dyer. University of Maryland, College Park.
- Mining of Massive Datasets, 3rd edition. Jure Leskovec, Anand Rajaraman and Jeff Ullman. Cambridge University Press.
- Learning PySpark. Tomasz Drabas and Denny Lee. Packt Publishing.
This course is evaluated each session using the myExperience system. Students are also encouraged to give informal feedback during the session and to let the lecturer in charge know of any problems as soon as they arise. Suggestions will be welcomed and considered seriously and constructively, and every reasonable effort will be made to address them. Student feedback via the myExperience system helps us improve future offerings of this course.
Yifang Sun