
This is the design archive for CSCD01, a 4th-year CS course at the University of Toronto, offered during the Winter 2021 term.

CSCD01H3: Engineering Large Software Systems. An introduction to the theory and practice of large-scale software system design, development, and deployment. Project management; advanced UML; requirements engineering; verification and validation; software architecture; performance modeling and analysis; formal methods in software engineering.

Our Goal

Our team aims to contribute to scikit-learn by addressing the following five issues from its GitHub repository:

Easy level        Medium level      Hard feature
#18837, #11209    #597, #14257      #18968

Our Strengths

  • #18837: This issue focuses on fixing problems with a default keyword argument (kwarg) in most functions in scipy.linalg. We are comfortable debugging code, so we expect little difficulty inspecting the codebase to identify the modules that need changes.

  • #11209: This issue involves implementing an imputation transformer that fills in missing values in a matrix. We find the description of the imputation strategy and its parameters straightforward.

  • #597: We have taken algorithms courses that give us the background to reason clearly about this issue, and we are familiar with the data structures it mentions.

  • #14257: This feature request aims to combine TimeSeriesSplit (which provides train indices) with the group awareness of other cross-validation strategies. We chose this feature because existing cross-validation split strategies can serve as useful references for developing and testing.

  • #18968: This new feature will allow NaN values to pass through OrdinalEncoder, which currently cannot be fitted on data containing NaNs. We selected this feature because it is likely to raise issues similar to those in #597, which we have already examined.
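
To make the idea behind #14257 more concrete, here is a minimal pure-Python sketch of a group-aware time series split. It is only an illustration under our own assumptions, not scikit-learn's API or the design requested in the issue: the function name `group_time_series_split` is hypothetical, groups are assumed to appear in chronological order, and the fold sizing simply applies an expanding-window scheme to groups instead of individual samples.

```python
def group_time_series_split(groups, n_splits=3):
    """Yield (train_indices, test_indices) pairs.

    Hypothetical helper, not part of scikit-learn. Assumes `groups` lists one
    group label per sample and that groups appear in chronological order.
    Training windows expand over time, and no group is ever split between
    the training and test sets.
    """
    if n_splits < 1:
        raise ValueError("n_splits must be at least 1")

    # Unique group labels in order of first appearance (dicts keep insertion order).
    ordered_groups = list(dict.fromkeys(groups))
    n_groups = len(ordered_groups)
    if n_groups < n_splits + 1:
        raise ValueError("need at least n_splits + 1 distinct groups")

    # Number of groups in each test fold; the earliest groups are train-only.
    test_size = n_groups // (n_splits + 1)
    first_test = n_groups - n_splits * test_size

    for start in range(first_test, n_groups, test_size):
        train_groups = set(ordered_groups[:start])
        test_groups = set(ordered_groups[start:start + test_size])
        train = [i for i, g in enumerate(groups) if g in train_groups]
        test = [i for i, g in enumerate(groups) if g in test_groups]
        yield train, test


if __name__ == "__main__":
    # Example: eight samples belonging to four consecutive groups.
    sample_groups = ["a", "a", "b", "b", "b", "c", "d", "d"]
    for train_idx, test_idx in group_time_series_split(sample_groups, n_splits=3):
        print("train:", train_idx, "test:", test_idx)
```

Each test fold contains whole groups that come strictly after every training group, which is the property the feature request combines from time-ordered splitting and group-aware cross-validation. A real implementation would also need to follow scikit-learn's splitter interface and handle groups that are not contiguous in time.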


Zhifei Song songzhif

Kevin Zhu zhukevi6

Yifei Gao gaoyife5

Chunang Xu xuchunan

Xinzheng Xu xuxinzhe

Shu-Shian Wang wangs314