Data Science Project

  • Course: Data Science Project 2-COMP-4448-2
  • Class time: 04:00 PM - 05:50 PM | Engineering & Computer Science | Room 300
  • Instructor: Pooran Singh Negi, pooran.negi@gmail.com
  • Office: 470
  • Office Hours: Mon, Wed, 2:00 p.m. - 4:00 p.m. Email for 1-on-1 help.

Books

other books

- Think Bayes

optional material

Course Description

This syllabus is subject to change at the discretion of the instructor.

Students will work through individual or team projects, applying coursework to the full data life cycle within a particular domain. The course also focuses on software engineering best practices and reproducible research.

Syllabus

  • Every Thursday in class, students give a 10-minute progress presentation on their assigned or agreed project work.
  • Each class spends 30-60 minutes on Python for data science topics. Each student chooses a data science topic and presents it to fellow students with the help of a Python notebook.

See below for some tentative topics.

  • Answer questions and help students with their projects
  • Go over additional topics on student demand

Topics

Student | Topic | Date
Pooran | Basics of numpy | 03-29-2018
Marnie | Getting Started with pandas | 04-03-2018
Pooran | Code organisation and debugging | 04-05-2018
Rosy | Data Loading, Storage, and File Formats | 04-10-2018
Moran | Data Cleaning and Preparation | 04-12-2018
Jingru Ma | Data Wrangling: Join, Combine, and Reshape | 04-17-2018
Eyerusalem | Plotting and Visualization | 04-19-2018
Mayuri | Data Aggregation and Group Operations | 04-24-2018

Topics not yet assigned:

  • More pandas topics from Wes McKinney's book, maybe starting from chapter 4
  • Python libraries for visualization: Bokeh, matplotlib
  • Any data science topic (different statistical tests, distributions, models, measures; will be more specific)
  • Python library sklearn for machine learning

Grading

There will be a final project and some coding homework assignments.

Each student has to do a project to fulfill the course requirements. There will be a final presentation of the work done during the current quarter. You will be required to submit a final report.

Component | Weight
Coding homework | 25%
Final project presentation, 15 minutes, May 31 in class | 25%
Final project report, due May 31 (please refer to the final report format above for submission guidelines) | 50%

Grade range: [('A', >=95), ('A_minus', >=90), ('B_plus', >=85), ('B', >=80), ('B_minus', >=75), ('C_plus', >=70), ('C', >=65), ('C_minus', >=60), ('D_plus', >=55), ('D', >=50), ('D_minus', >=45), ('F', <45)]
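
As an illustration only, the grade range above can be read as a lookup from a numeric score to a letter. The sketch below assumes scores are percentages on a 0-100 scale and that the total is a plain weighted sum of the three graded components; the names GRADE_CUTOFFS, letter_grade, and weighted_score are hypothetical and not part of any course tool.

```python
# Cutoffs copied from the grade range in this syllabus, highest first.
# Assumption: scores are percentages on a 0-100 scale.
GRADE_CUTOFFS = [
    ("A", 95), ("A_minus", 90), ("B_plus", 85), ("B", 80), ("B_minus", 75),
    ("C_plus", 70), ("C", 65), ("C_minus", 60), ("D_plus", 55), ("D", 50),
    ("D_minus", 45),
]


def letter_grade(score):
    """Return the first letter whose cutoff the score meets, else 'F'."""
    for letter, cutoff in GRADE_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


def weighted_score(homework, presentation, report):
    """Combine component scores with the 25% / 25% / 50% weights.

    Assumption: the final score is a simple weighted sum of the components.
    """
    return 0.25 * homework + 0.25 * presentation + 0.50 * report


total = weighted_score(homework=80, presentation=90, report=100)
print(total, letter_grade(total))
```

Because the cutoffs are listed from highest to lowest, the first match is the best grade the score qualifies for.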

Please respect the DU Honor Code: Honor Yourself, Honor the Code.

Projects

If you are not doing an internship or independent study, we have identified some projects for you. If you are not continuing your current project, select one by March 29th according to your preference and let us know.

You can choose your own dataset.

Software

R

How to set up R and RStudio

Resources about R

Book

There are a lot of good books on R and data science.

Other resources

Python

Please install the Anaconda data science platform for Python 3.6 before coming to class on Tuesday. See the YouTube link Installing Anaconda, Jupyter Notebook. You can also go to my python for reproducible research GitHub repository and start by running the pythonBasic.ipynb notebook. I will go over the basics of Python and Jupyter notebook.

Python learning resources

data analysis tools in python

Homeworks

No late homework will be accepted.

HW no | Description and link | Due date
1 | Finish assigned numpy notebook exercises |
2 | pandas and python | April 10, 11:59 p.m.
3 | Finish GitHub and project activity as assigned in Canvas hw3 | April 9, 11:59 p.m.

Course Activity

Date | Reading/Coding Assignments | Class activity
03-27-2018
  Reading/Coding Assignments:
  - Finish exercise from https://github.com/QCaudron/pydata_pandas
  - Please install the required Python software as mentioned in the Software, Python section.
  Class activity: went over Jupyter notebook.
  - Run jupyter notebook from the folder where your notebook resides to start working on the notebook.
  - Ctrl-Enter runs the cell without creating a new cell.
  - Alt-Enter runs the cell and creates a new cell.
  - Change the cell type to Markdown to write markdown text. Write math between $ $ symbols.
  - Ctrl-S saves the content of the notebook.
  - To close the notebook, just close the browser tab. To stop the server, press Ctrl-C.
  - Please remember the handy command-line commands cd, ls, pwd, cp, mv, mkdir.
03-29-2018
  Reading/Coding Assignments: exercise numpy notebook; numpy notebook partial solution
  Class activity: covered creation, indexing, slicing, linear algebra, and array-oriented programming for n-dimensional arrays. Also remember the any and all methods for boolean arrays.
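
For reference, the numpy topics from the 03-29-2018 class can be sketched in a few lines. This is a minimal illustration with made-up values, not the class notebook itself; all variable names are hypothetical.

```python
import numpy as np

# Creation: a 3x4 array holding the integers 0..11 (illustrative data).
a = np.arange(12).reshape(3, 4)

# Indexing and slicing: one row, then a 2x2 block (rows 0-1, columns 1-2).
second_row = a[1]
block = a[:2, 1:3]

# Array-oriented programming: operate on the whole array, no explicit loop.
doubled = a * 2

# Linear algebra: matrix product of a with its transpose gives a 3x3 matrix.
gram = a @ a.T

# Boolean arrays: any() asks "is at least one True?", all() asks "are all True?".
mask = a > 5
print(mask.any(), mask.all())
print(mask.all(axis=1))  # checked row by row
```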
04-02-2018
  Reading/Coding Assignments: pandas notebook
  Class activity:
  - To get the slide dropdown in Jupyter notebook, click View (menu bar at the top) -> Cell Toolbar -> Slideshow. Slide selection will become active. You can then create a slide, subslide, fragment, etc. from the slide type dropdown of each cell.
  - Click Cell -> All Output -> Clear to clear the current output. Please do this before submitting a pull request to me for a notebook addition.
  - To install the slideshow extension, follow the Rise Slideshow Extension link or run conda install -c damianavila82 rise from the Anaconda installation.
  - Check "Runs and visualizes your python code" in the Python resource section for a visual representation of code.
  - We also finalized the schedule for notebook presentations. Check the Topics section above for the current schedule.
04-05-2018
  Reading/Coding Assignments: pandas, debugging, organising code, etc.; above notebook as run in class
  Class activity: learned apply and applymap on pandas DataFrames and Python's built-in map and functools.reduce, along with debugging.
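
The functions from the 04-05-2018 class can be compared side by side. The sketch below uses a small made-up DataFrame and is not the notebook run in class. One assumption to note: newer pandas releases (2.1 and later) rename DataFrame.applymap to DataFrame.map, so the code picks whichever the installed version provides.

```python
from functools import reduce

import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# apply: call a function once per column (default) or once per row (axis=1).
col_range = df.apply(lambda col: col.max() - col.min())
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

# applymap: call a function on every single element of the DataFrame.
# Assumption: use DataFrame.map where it exists, else fall back to applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap
squared = elementwise(lambda x: x ** 2)

# Built-in map: apply a function to each item of a plain Python iterable.
doubled = list(map(lambda x: 2 * x, [1, 2, 3]))

# functools.reduce: fold an iterable into one value, here a running sum.
total = reduce(lambda acc, x: acc + x, [1, 2, 3], 0)

print(col_range.tolist(), row_sums.tolist(), doubled, total)
```

In short, apply works on whole rows or columns, applymap works on individual cells, and map and reduce do the same jobs for ordinary Python sequences.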