
Primary language: Python | License: MIT

CSV-Analyser


Welcome to Group 7's repository for homework 2 & 3 of the Fall 2022 Software Engineering course!

This project reads and analyzes CSV files. Based on example source code written in Lua, we implemented the functions listed below in Python. To support these functions, we defined five classes, each with the methods described below.

Installation

git clone https://github.com/yzhu27/CSVAnalyser.git
cd ./CSVAnalyser
python ./main.py -e ALL

Note: run main.py directly from the repository's root directory.

Functions

Read CSV

  • Read the input file line by line, split each line on the given separator, and store the fields in a dictionary.
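The reading step can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper names `coerce` and `read_csv`, the comma default, the rule that the first line holds the column names, and the conversion of numeric-looking fields to numbers are all assumptions.

```python
import io
from typing import Iterator, TextIO, Union

Cell = Union[int, float, str]


def coerce(text: str) -> Cell:
    """Hypothetical helper: turn a raw field into an int, a float, or a stripped string."""
    text = text.strip()
    for kind in (int, float):
        try:
            return kind(text)
        except ValueError:
            pass
    return text


def read_csv(stream: TextIO, sep: str = ",") -> Iterator[dict]:
    """Yield one dict per data line, keyed by the names on the first (header) line."""
    header = None
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = [coerce(f) for f in line.split(sep)]
        if header is None:
            header = [str(f) for f in fields]  # first non-blank line = column names
        else:
            yield dict(zip(header, fields))


# Usage with a made-up in-memory sample instead of a real file.
for row in read_csv(io.StringIO("A,b\n1,x\n2.5,y\n")):
    print(row)
# {'A': 1, 'b': 'x'}
# {'A': 2.5, 'b': 'y'}
```

Because `read_csv` is a generator, rows are produced one at a time, so large files do not have to fit in memory at once.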

CLI

  • Update options from the command line. Running with "-h" prints the help string.
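One way to wire up such a command line in Python is the standard `argparse` module, which generates the `-h` help text automatically. Only `-e` comes from the run command above; `--file`, `-S/--separator`, their defaults, and the function name `cli` are hypothetical, and the project may parse its options differently.

```python
import argparse


def cli(argv=None) -> dict:
    """Parse command-line options into a plain settings dict (hypothetical options)."""
    parser = argparse.ArgumentParser(
        description="CSV-Analyser: summarise the columns of a CSV file.")
    # "-e" appears in the README's run command; the other options are illustrative.
    parser.add_argument("-e", "--eg", default="nothing",
                        help="start-up example to run, e.g. ALL")
    parser.add_argument("-f", "--file", default="data/auto93.csv",
                        help="CSV file to read")
    parser.add_argument("-S", "--separator", default=",",
                        help="field separator")
    # argparse adds -h/--help itself: it prints the help text and exits.
    return vars(parser.parse_args(argv))


# Usage: pass an explicit argument list (None would read sys.argv instead).
settings = cli(["-e", "ALL"])
print(settings["eg"], settings["file"])
```

Returning a plain dict keeps the rest of the code independent of `argparse`.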

Generate Statistical Summaries

  • This function summarizes column data. Each column is either numeric (denoted by a name starting with an upper-case letter) or symbolic (denoted by a name starting with a lower-case letter), and different statistics are used to describe each type.
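The naming rule can be expressed as a small dispatch that decides which summary each column gets. In this sketch, `kind` and `columns_by_kind` are hypothetical helpers, and treating any name that does not start with an upper-case letter (including an empty name) as symbolic is an assumption.

```python
def kind(name: str) -> str:
    """Return "Num" for a name starting with an upper-case letter, otherwise "Sym"."""
    return "Num" if name[:1].isupper() else "Sym"


def columns_by_kind(header: list, rows: list) -> dict:
    """Group each column's values under its kind, keyed by column name."""
    out = {"Num": {}, "Sym": {}}
    for i, name in enumerate(header):
        out[kind(name)][name] = [row[i] for row in rows]
    return out


# Usage with made-up column names and rows.
print(columns_by_kind(["Weight", "origin"], [[3504, "us"], [2130, "jp"]]))
# {'Num': {'Weight': [3504, 2130]}, 'Sym': {'origin': ['us', 'jp']}}
```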

Classes

Cols

  • Record column names and variables, distinguishing dependent from independent variables by the leading letters of the column names.
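A minimal sketch of a Cols-style container follows. The README does not list the exact markers that separate dependent from independent variables, so this sketch assumes a hypothetical suffix convention: a trailing `+`, `-` or `!` marks a dependent (goal) column, and a trailing `X` marks a column to skip. Only the two marked lines in `__post_init__` would need to change if the project uses a different rule; the `Column` class is also hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    at: int        # position of the column in the header
    name: str
    numeric: bool  # leading upper-case letter => numeric


@dataclass
class Cols:
    names: list
    all: list = field(default_factory=list)  # every column
    x: list = field(default_factory=list)    # independent columns
    y: list = field(default_factory=list)    # dependent (goal) columns

    def __post_init__(self):
        for at, name in enumerate(self.names):
            col = Column(at, name, name[:1].isupper())
            self.all.append(col)
            if name.endswith("X"):              # assumed marker: skip this column
                continue
            if name[-1:] in ("+", "-", "!"):    # assumed markers: goal column
                self.y.append(col)
            else:
                self.x.append(col)


# Usage with made-up column names.
cols = Cols(["Size", "Weight-", "Speed+", "origin", "IdX"])
print([c.name for c in cols.x], [c.name for c in cols.y])
# ['Size', 'origin'] ['Weight-', 'Speed+']
```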

Rows

  • Record data by row.

Num

  • The Num class computes features of numeric data. It provides the methods add, mid, and div: mid is the middle value (median) of the sorted data, and div is the standard deviation of the column's numbers.
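A minimal sketch of such a class, using Python's `statistics` module. It assumes the values are kept in memory, that `div` is the sample standard deviation (`statistics.stdev`, which divides by n - 1), and that for an even number of values `mid` averages the two middle values, as `statistics.median` does; the project may use different conventions.

```python
from statistics import median, stdev


class Num:
    """Summarise a numeric column: mid = median, div = sample standard deviation."""

    def __init__(self):
        self.n = 0          # number of values seen
        self._values = []   # all values, kept so the median can be computed

    def add(self, x: float) -> None:
        self.n += 1
        self._values.append(x)

    def mid(self) -> float:
        return median(self._values)  # raises StatisticsError if no values were added

    def div(self) -> float:
        # stdev needs at least two values; report 0.0 spread otherwise
        return stdev(self._values) if self.n > 1 else 0.0


num = Num()
for x in [2, 4, 4, 4, 5, 5, 7, 9]:
    num.add(x)
print(num.mid(), round(num.div(), 3))  # 4.5 2.138
```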

Sym

  • The Sym class computes features of symbolic data. It provides the methods add, mid, and div: mid is the most common symbol (the mode), and div is the entropy of the symbols.
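A comparable sketch for symbolic columns, assuming entropy is measured in bits (base-2 logarithm) and that a tie for the most common symbol goes to the symbol seen first, which is how `Counter.most_common` orders equal counts. The entropy is -sum(p * log2(p)) over the relative frequency p of each symbol.

```python
import math
from collections import Counter


class Sym:
    """Summarise a symbolic column: mid = mode, div = entropy in bits."""

    def __init__(self):
        self.n = 0              # number of symbols seen
        self.counts = Counter() # symbol -> frequency

    def add(self, s: str) -> None:
        self.n += 1
        self.counts[s] += 1

    def mid(self) -> str:
        return self.counts.most_common(1)[0][0]  # raises IndexError if empty

    def div(self) -> float:
        return -sum(c / self.n * math.log2(c / self.n) for c in self.counts.values())


sym = Sym()
for s in ["a", "a", "a", "a", "b", "b", "c"]:
    sym.add(s)
print(sym.mid(), round(sym.div(), 3))  # a 1.379
```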

Data

Test

The test cases are based on the Lua source code and use the data file https://github.com/yzhu27/CSVAnalyser/blob/main/data/auto93.csv. Test coverage is as follows:

Coverage

Coverage Report

File               Stmts   Miss   Cover
src
   Cols.py            20      1     95%
   Data.py            30      2     93%
   Num.py             52      0    100%
   Row.py              6      0    100%
   Sym.py             27      0    100%
   __init__.py         0      0    100%
   csv.py             12      0    100%
   the.py             27     15     44%
   utils.py           38      5     87%
TOTAL                212     23     89%