# | Resource |
---|---|
1 | The current version of the syllabus |
2 | Welcome video |
3 | What should you do the first week of the course |
4 | Instructor: Ron Zacharski, ron.zacharski@gmail.com, 575.680.4041 |
5 | Experience Point Sheet |
6 | The FIU Deep Learning Slack workspace |
7 | The Lab Submission Form |
As you will read in the details below, this class is a programming-intensive course where you work at your own pace. Historically, about ⅓ of the students get an A, ⅓ an F, and ⅓ between an A and an F. What separates the 'A' students from the 'F' ones is that the 'A' students keep a regular schedule and consistently submit their work. If they have a question or need help debugging, they message me on Slack. They are not necessarily the most proficient programmers or the best at math. The attribute that best defines them is self-discipline.
Data mining applications, data preparation, data reduction and various data mining techniques such as association, clustering, classification, anomaly detection.
This course provides an introduction to practical machine learning tools for data mining with an emphasis on XGBoost and Deep Learning.
Prerequisite Course: COP 3530 Data Structures
Corequisite Course: COP 4710 Database Management
Note: While very little material from either of these courses will be used in this course, these prerequisites give you a level of programming maturity that is required.
This class is asynchronous, meaning there is no mandatory real-time interaction. You will be working through the Inquiryum Machine Learning Fundamentals Course. You can watch the videos anytime you want. You can play them at a faster speed, rewatch them, or pause them. You can work on the course material in 20-minute blocks throughout a day, or devote a large contiguous block of time once per week. When you need help, you can use the FIU Deep Learning Slack workspace to get assistance from me or your classmates.
The advantage of this approach is that it gives you great flexibility in when you work on the material and for how long. And, as described below under mastery learning, it allows you to work at your own pace.
Slack Office Hours: Tuesdays and Wednesdays 11am-2pm ET
I will be at my laptop on the Slack channel Tuesdays and Wednesdays from 11am until 2pm ET. This means that if you message me, I will respond within 5 minutes unless I am helping another student. My next level of availability is Tuesdays and Wednesdays from 2pm to 4pm and Thursdays from 11am until 2pm ET. My average response time during those periods is 30 minutes. Feel free to message me outside of those times, but my response delay might be significant. I often turn off Slack notifications at midnight. There may be times from Friday through Sunday when I don't have cell or wifi coverage and will not be able to receive your message. There may also be other times when I don't have cell coverage. In those cases I will post a message on Slack beforehand. The reason is that while I am based in Santa Fe, I often go off exploring the Southwest in my van and sometimes lose cell phone coverage. If your question is better addressed over Zoom, we can arrange a meeting time through Slack. I also encourage those in class to help others (see my honor code policy below).
The above hours may be subject to change if other times benefit more students. These changes will be announced in the Slack channel.
Students will gain hands-on experience with the following algorithms and libraries, learning when and how to apply them to problems in data mining:
- NumPy, pandas, scikit-learn
- entropy and decision trees
- bagging and pasting
- random forest
- XGBoost
- deep learning basics
- Convolutional Neural Networks (CNN)
- clustering
- working with text
Students should be able to
- architect a scalable ML pipeline
- run ML jobs on a GPU using Jupyter Notebooks in Colab
- evaluate different ML models
- determine the best ML algorithm to use for an application
- reduce the dimensionality of a dataset
- develop different linear models to solve classification problems
- communicate effectively about ML applications (terminology)
Students should be able to
- apply decision tree algorithms to create a classifier
- use random forest techniques
- combine a number of weak classifiers into a strong one by using boosting.
- effectively use the XGBoost algorithm
Students should be able to
- build a simple deep learning system for image classification
- build CNNs for computer vision
- pre-process text datasets into a form usable for classification
- build a CNN for text classification
- adjust hyperparameters to improve performance
The majority of the effort in the course goes into labs and projects, which have different levels of expected knowledge and independence.
Labs
- In the form of Jupyter Notebook tutorials, which provide detailed explanations and sample executable code.
- You are to:
  - write a small amount of code to complete the task
  - answer any non-coding questions the Notebook may ask.
Projects
- Follow examples shown in the course videos and in the labs.
- Build off of concepts and skills you learned completing the labs.
- The project definition provides:
  - a dataset
  - a short problem description
- You are to:
  - design and create the machine learning algorithm used to solve the problem
  - write the code in a Jupyter Notebook
  - test and evaluate your solution
  - save your notebook to GitHub.
Traditional classes are based on time-based learning. You spend a specific amount of time on a topic and then you move on to the next topic. For example, in a traditional intro course on Python programming you might cover for loops in week 5, take a quiz on them, and then move on to Python dictionaries in week 6. Suppose you got a 75% on that quiz in week 5. That means that you did not learn 25% of the material. Then perhaps in week 10 you take a test on list comprehensions and get an 80% (you did not master 20% of the material). These gaps in your mastery start adding up, and eventually, either in some future class or on the job, you hit a wall because your current task requires skills in areas that you failed to master.
This class doesn't work like that.
In contrast to time-based learning, in mastery learning you stay on a topic until you master it, working at your own pace. This online class is based on this approach. As I mentioned, the lectures are a set of videos (mostly screencasts) that you can watch at any time. If the material is easy for you, you can speed up the videos and watch them at 1.5x speed. If you find the material challenging, you can rewatch the videos, google for more information, or interact with other learners on the Slack channel.
Obviously, the work-at-your-own-pace approach will collide with the end of the semester and there will be some material that you will not cover. The course is designed so that the essential core information is presented first, to enable you to develop solid foundational skills with no gaps.
This course is work-at-your-own-pace. Other courses you might be taking have fixed deadlines. So, for example, you might have a gnarly project for a programming class due this week and a big operating systems project due next week. It is likely that you will work on those projects, since they have immediate deadlines, and ignore this course. It is human nature. Just block out a regular time each week to work on the course and you will do fine.
Order | Lesson |
---|---|
1 | JumpStart |
2 | Labs |
3 | Projects |
Again, the class is work-at-your-own-pace, but I provide a suggested schedule below.
Week | Date | Unit | Topics | labs and projects |
---|---|---|---|---|
1 | 9 Jan | Intro | Intro to class & Quickstart to ML | Quickstart lab |
2 | 16 Jan | basics | Numpy, Pandas | Numpy & Pandas labs |
3 | 23 Jan | basics | kNN sklearn | sklearn lab |
4 | 30 Jan | basics | entropy and decision trees | decision tree lab |
5 | 6 Feb | basics | one-hot encoding, cross-validation, hyperparameters | working with data lab |
6 | 13 Feb | basics | Regression & Clustering | regression and clustering labs |
7 | 20 Feb | XGBoost | Intro to boosting, bagging & pasting | bagging and pasting lab |
8 | 27 Feb | XGBoost | random forest, patches, XGBoost | XGBoost lab, First Project |
9 | 6 Mar | DNN | our first neural network - classifying images | a first look at deep learning lab |
10 | 13 Mar | DNN | Neural Network anatomy & classification | -- |
11 | 20 Mar | DNN | Introduction to Convolutional Neural Networks (CNN) | CNN lab |
12 | 27 Mar | DNN | project work | Projects 2 & 3 |
13 | 3 Apr | DNN | CNNs and text classification | NLP & Embeddings lab |
14 | 10 Apr | DNN | CNN and text classification cont'd | Amazon Reviews Project |
15 | 17 Apr | RL | Generative AI | GAN lab |
16 | 24 Apr | FINALS WEEK | FINISH PROJECTS | -- |
Deadlines will be announced in the Slack channel.
While the free Colab account is the minimum requirement, for the last 6 weeks of the class it may be beneficial to subscribe to Google Colab Pro for $9.99/mo.
Laptop
Inquiryum’s Machine Learning Fundamentals Course
No purchases of books or equipment are required.
Slack is a work chat application that many tech companies use. We are going to be using Slack in a number of ways. First, all my announcements for the class will be in Slack. Second, if you have a particular programming question, you can ask it in a general channel and hopefully you will quickly get an answer or suggestion from either me or fellow learners.
Twice per week one of our Slackbots will ask you three questions:
- What have you accomplished since the last class?
- What are you working on now?
- What is holding you back?
Failure to do the Slack check-in will result in the following deduction of points:
number of missed check-ins | points deducted |
---|---|
1 | 0 |
2 | 10 |
3 | 25 |
4 | 100 |
5 | 250 |
You will be responsible for logging into Slack on Tuesdays and Fridays to answer these questions. When you initially sign in to Slack, make sure to join the scrum channel.
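The deduction table above can be read as a simple lookup. The sketch below is only an illustration under two assumptions that the syllabus does not state: that each row gives the total deduction for that many missed check-ins (not an amount added per miss), and that zero missed check-ins costs nothing. The function name `checkin_deduction` is hypothetical, and because the table stops at 5, the sketch refuses to guess beyond that.

```python
# Total points deducted, keyed by number of missed check-ins.
# Values are copied from the table; reading them as totals is an assumption.
CHECKIN_DEDUCTIONS = {1: 0, 2: 10, 3: 25, 4: 100, 5: 250}


def checkin_deduction(missed):
    """Return the points deducted for a given number of missed check-ins."""
    if missed < 0:
        raise ValueError("number of missed check-ins cannot be negative")
    if missed == 0:
        # Assumption: no missed check-ins means no deduction.
        return 0
    if missed not in CHECKIN_DEDUCTIONS:
        # The table does not cover more than 5 missed check-ins.
        raise ValueError("the syllabus table does not cover this many missed check-ins")
    return CHECKIN_DEDUCTIONS[missed]


if __name__ == "__main__":
    for n in range(6):
        print(n, "missed ->", checkin_deduction(n), "points deducted")
```

Under this reading, the first missed check-in is free and the penalty grows quickly after the third.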
Grading is based on a method developed by Professor Lee Sheldon at Indiana University. It is based on obtaining experience points (XP). The number of XP determines what level you are at. You start the class at Level Zero and with 0 XP. The level you obtain at the end of the semester determines your final grade. Here is the chart:
Level | XP | Grade |
---|---|---|
Zero | 0 | F |
One | 550 | D |
Two | 740 | C |
Three | 800 | C+ |
Four | 840 | B- |
Five | 871 | B |
Six | 914 | B+ |
Seven | 950 | A- |
Eight | 990 | A |
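To make the chart concrete, here is a minimal sketch of how an XP total maps to a level and grade. It assumes that the XP column is the minimum needed to reach each level, and that a total pushed below zero by deductions still counts as Level Zero. The function name `grade_for_xp` is hypothetical and is not part of any course tooling.

```python
from bisect import bisect_right

# (minimum XP, level, grade), copied from the chart.
LEVELS = [
    (0, "Zero", "F"),
    (550, "One", "D"),
    (740, "Two", "C"),
    (800, "Three", "C+"),
    (840, "Four", "B-"),
    (871, "Five", "B"),
    (914, "Six", "B+"),
    (950, "Seven", "A-"),
    (990, "Eight", "A"),
]


def grade_for_xp(xp):
    """Return (level, grade) for an XP total.

    Assumption: each XP value in the chart is the minimum for that level.
    """
    thresholds = [minimum for minimum, _, _ in LEVELS]
    # Index of the highest threshold that is <= xp; clamp negatives to Level Zero.
    index = max(bisect_right(thresholds, xp) - 1, 0)
    _, level, grade = LEVELS[index]
    return level, grade


if __name__ == "__main__":
    for xp in (0, 549, 550, 870, 871, 990):
        print(xp, grade_for_xp(xp))
```

For example, under this assumption 870 XP is still Level Four (B-), and one more point reaches Level Five (B).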
Here are the ways of earning XP:
- There will be around 15 labs. On average, each will be worth 30 XP.
- There are 4-5 machine learning projects. On average, they are each worth 150 XP.
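As a rough back-of-the-envelope check, the averages above can be added up. The sketch below takes "around 15 labs" as exactly 15 and uses the average values as if every lab and project were worth that amount, so the totals are estimates only. The function name `estimated_total_xp` is hypothetical.

```python
# Approximate figures taken from the list above.
NUM_LABS = 15          # "around 15 labs"
XP_PER_LAB = 30        # average XP per lab
XP_PER_PROJECT = 150   # average XP per project


def estimated_total_xp(num_projects):
    """Estimate the XP available from labs plus a given number of projects."""
    return NUM_LABS * XP_PER_LAB + num_projects * XP_PER_PROJECT


if __name__ == "__main__":
    for projects in (4, 5):
        print(projects, "projects ->", estimated_total_xp(projects), "XP")
```

Under these assumptions the labs contribute about 450 XP, and the estimated total is about 1050 XP with four projects or 1200 XP with five.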
The Office of Disability Resources has been designated by the college as the primary office to guide, counsel, and assist students with disabilities. If you receive services through the Office of Disability Resources and require accommodations for this class, make an appointment with me as soon as possible to discuss your approved accommodation needs. Bring your accommodation letter and a copy of our class syllabus to the appointment. I will hold any information you share with me in strictest confidence unless you give me permission to do otherwise.
If you have not made contact with the Office of Disability Resources and have reasonable accommodation needs (note-taking assistance, extended time for tests, etc.), I will be happy to refer you. The office will require appropriate documentation of disability.
Florida International University's faculty are committed to supporting students and upholding the University’s Policy on Sexual Harassment and Sexual Misconduct. Under Title IX and this Policy, discrimination based upon sex or gender is prohibited. If you experience an incident of sex- or gender-based discrimination, we encourage you to report it. While you may talk to me, understand that as a “Responsible Employee” of the University, I MUST report to FIU's Title IX Coordinator what you share. If you wish to speak to someone confidentially, please contact the confidential resources described on the [FIU Title IX webpage](https://dei.fiu.edu/crca/title-ix). They can connect you with support services and help you explore your options. You may also seek assistance from FIU’s Title IX Coordinator.
The general policy for any computer science class is
1. You must write all programs yourself (without help from others or from websites), unless specified. You are not to communicate to others in any way about your assignments. You are also not to get code for your projects from Google, StackOverflow, Chegg, YouTube, or any other website unless permitted in writing.
2. Do not share your code with other students, either this semester, or in any future semester. Remember that giving unauthorized help violates the Honor Code just as much as receiving unauthorized help does.
3. Do not post your code or class materials anywhere. You may not upload your solutions to any publicly-available website, post part of your solution on StackOverflow or any similar site, or post assignments/notes/etc from the course, even if they were instructor-authored materials.
4. Explicitly cite any sources you use.
5. Do not look at solutions from previous semesters. Professors evolve and reuse assignments over many years in order to perfect them. If someone does leave their code (or other materials) lying around from a previous offering of the course, you may not look at them when completing your own.
6. Be prepared to explain anything you submit. Your instructor may, at any time, call you in to his/her office to explain any part of your program. You will be expected to convincingly walk him/her through your code, demonstrating your thought process behind it. If you cannot, this may be considered an Honor Code violation.
7. When in doubt, ask your instructor what constitutes plagiarism. If you’re not sure whether you need to cite a source for a quotation in a paper, or list the URL of a website from which you got some code, ask. If you do not ask, and the instructor deems it to be unauthorized help, this may be considered an Honor Code violation.
From The University of Mary Washington Computer Science Department Honor Code Policy
My amendments to this general policy are as follows (the numbers correspond to the numbers in the policy):
1. I am more flexible than the policy "you are not to communicate to others in any way about your assignments." My rule of thumb is: what would a responsible adult do on the job? If you had a deadline on the job at a startup and didn't know how to do something, the responsible thing wouldn't be to sit at your workstation getting more and more frustrated and depressed and missing the deadline. The responsible person would get whatever help was necessary to complete the task. On the other hand, a responsible person wouldn't let someone else do all the work and present it as their own. That would be a violation of this policy.
2. Regarding "Remember that giving unauthorized help violates the Honor Code just as much as receiving unauthorized help does": again, I refer to the 'responsible adult' mentioned above. I would like people to help each other, yet still do the work to learn the material. Sharing a complete assignment violates this point, but helping a person debug one cell of a notebook is fine.
3. Sadly, the rule against posting your code contradicts what you want to do in your professional life. In your professional life, you want to post solutions to things you figured out as a way of helping people in the community. In fact, we are going to be using some material people posted in this class. However, to prevent plagiarism, you will only post your material to a private GitHub repository. Sorry.
4. You should acknowledge in writing, in your submission, the people who helped you. For example, "Ann Mulkern helped me with the code to divide the dataset into training and testing sets."
5. All the rest of the conditions of the computer science policy hold as is.
During the first week of class you will need to fill out the Avatar Form for your avatar name, pseudonym, whatever. This is the name that will appear on the Experience Point Google Spreadsheet that will be viewable by everyone in the class. If you wish to remain anonymous, don’t share your avatar name with anyone. To further protect the anonymity of those who wish to remain anonymous, the spreadsheet will also be populated by fictitious avatar names.