For your first CS Build Week, you'll implement a few landmark algorithms that are widely used in data science. You've probably heard of these algorithms before. You might even have an idea of how they work. But to really solidify your understanding of them, you're going to implement them yourself and then use your implementation just as you would if you'd imported the algorithm from a data science or machine learning library.
Drop into your DS cohort channel if you have questions that are beyond the reach of the CS TLs and/or instructor.
For the first part of this Build Week project, you'll pick one of the following algorithms to implement:
- K-Means Clustering
- K-Nearest Neighbors
- Decision Tree Learning
- Naive Bayes Classifier
- DBSCAN Clustering
Your algorithm, implemented as a Python class, should have the following methods: `fit` and `predict`. You are only allowed to use base Python, `numpy`, and `scipy` for the implementation of the core algorithm of your choice. (For visualization and analysis, you can use other libraries.) You may reference any outside materials that you need, but copying and pasting another open-source implementation is strictly prohibited.
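To illustrate the required interface, here is a minimal sketch of one of the listed algorithms, K-Nearest Neighbors, written with only `numpy` and the standard library. The class name `KNearestNeighbors`, the parameter `k`, Euclidean distance, and simple majority voting are assumptions made for this example; your own design choices may differ.

```python
import numpy as np
from collections import Counter


class KNearestNeighbors:
    """Minimal k-NN classifier with a fit/predict interface.

    Hypothetical class for illustration: Euclidean distance and
    majority voting are assumptions, not project requirements.
    """

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # k-NN is a "lazy" learner: training only stores the data.
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)
        # Returning self allows chaining, as scikit-learn estimators do.
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Pairwise Euclidean distances via broadcasting: shape (n_test, n_train).
        dists = np.linalg.norm(X[:, None, :] - self.X_train[None, :, :], axis=2)
        # Indices of the k closest training points for each test point,
        # ordered from nearest to farthest.
        nearest = np.argsort(dists, axis=1)[:, : self.k]
        # Majority vote over the neighbors' labels; on a tie, Counter keeps
        # first-seen order, so the label of the nearer neighbor wins.
        return np.array(
            [Counter(self.y_train[idx]).most_common(1)[0][0] for idx in nearest]
        )
```

Because `fit` returns `self`, calls can be chained, for example `KNearestNeighbors(k=3).fit(X_train, y_train).predict(X_test)`, which mirrors how scikit-learn estimators are typically used.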
You'll then use your implementation on an appropriate dataset and compare its results against those of the corresponding implementation in the scikit-learn (`sklearn`) library.
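One possible way to quantify the comparison is sketched below. For classifiers, a simple measure is the fraction of test points on which your model and the scikit-learn model (for K-Nearest Neighbors, `sklearn.neighbors.KNeighborsClassifier`) predict the same label. For clustering algorithms such as K-Means or DBSCAN, cluster IDs are arbitrary, so comparing label arrays element by element can understate agreement; a pair-counting measure such as the Rand index is unaffected by how clusters are numbered. The helper names `agreement_rate` and `rand_index` are hypothetical; scikit-learn also provides `sklearn.metrics.adjusted_rand_score`, a chance-adjusted variant.

```python
import numpy as np


def agreement_rate(pred_a, pred_b):
    """Fraction of samples on which two classifiers predict the same label
    (hypothetical helper)."""
    pred_a, pred_b = np.asarray(pred_a), np.asarray(pred_b)
    return float(np.mean(pred_a == pred_b))


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two clusterings agree: both put the
    pair in the same cluster, or both split it. Independent of how cluster IDs
    are numbered (hypothetical helper)."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    # Boolean matrices: entry (i, j) is True if samples i and j share a cluster.
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    # Count each unordered pair once: upper triangle, excluding the diagonal.
    iu = np.triu_indices(len(a), k=1)
    return float(np.mean(same_a[iu] == same_b[iu]))
```

For example, `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0 because the two clusterings group the points identically even though the IDs are swapped, whereas `agreement_rate` on the same arrays would be 0.0.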
For the second part of this Build Week project, you'll write a HOW-TO blog entry that describes the algorithm, explains how to implement it, and shows what it's useful for.
Your target audience should be other developers who haven't seen the algorithm before.
There's no length limit, but a reader should be able to begin implementing the algorithm from the information presented.
Post your entry to any blog site, either your own or a platform like Medium.