If using the asdf version manager, install the following by issuing asdf install at the project root:
- Python 3
- Node.js
- PostgreSQL (use pg_ctl start to start)
- Redis (use redis-server to start)
- Django
- Django REST Framework
- Pandas
- Pytest
- React
- React Router
- TypeScript
- Webpack
- Material UI
- Jest
Issue bin/setup from the project root to install dependencies. The setup script assumes the minimal requirements above are installed and running and will:
- Create and activate a virtualenv at the project root
- Install Python dependencies
- Create a .env file to set development-mode config variables
- Create and migrate the database
- Install npm dependencies
- Build assets
- Start the development server at http://127.0.0.1:8000
To tear down setup, use bin/setup --down.
The project contains the following apps:
- core: container for project settings and middleware
- api: a CSV file browser API
- client: a React web client
GET / service root (serves a bare template that loads React app)
GET /api/ api-root (browseable API docs)
GET /api/csv-files/ list csv files
POST /api/csv-files/ create a csv file
GET /api/csv-files/:id/ get a csv file
DELETE /api/csv-files/:id/ delete a csv file
GET /uploads/<path> csv download
A CsvFile model is used to store uploaded CSVs and prevent the storage of duplicates.
# api/models.py L10-15 (bbfe0fd9)
class CsvFile(models.Model):
    name = models.CharField(max_length=256)
    digest = models.CharField(max_length=128, unique=True)
    file = models.FileField()
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
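The unique digest column is what rejects a second upload of the same content. How the project computes the digest is not shown here; the following is a minimal sketch that assumes a SHA-512 hex digest (which happens to be 128 characters, matching max_length=128). The helper name file_digest and the chunk size are hypothetical.

```python
import hashlib
import io


def file_digest(fileobj, chunk_size=65536):
    """Return a hex digest of a binary file-like object, read in chunks.

    Assumption: SHA-512 is used; its hex digest is 128 characters long.
    """
    hasher = hashlib.sha512()
    # Read fixed-size chunks so large uploads are never fully held in memory.
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()


# Identical content yields identical digests, so a unique constraint on the
# digest column would reject the second upload.
first = file_digest(io.BytesIO(b"year,value\n2019,1\n"))
second = file_digest(io.BytesIO(b"year,value\n2019,1\n"))
other = file_digest(io.BytesIO(b"year,value\n2020,1\n"))
```

With a digest like this, a duplicate could be detected either by querying for an existing row with the same digest before saving or by handling the integrity error raised by the unique constraint.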
When viewing details for a given CSV, Pandas is used to clean the data and perform aggregations efficiently.
# api/services.py L8-22 (890c78ce)
def parse_csv(csv_file):
    """
    Load the given `csv_file` into a Pandas DataFrame, clean data, calculate
    statistics. Return a CsvData object.
    """
    csv_df = pd.read_csv(csv_file.file)
    csv_table = cleaned_data_table(csv_df)
    count_by_year = count_by_year_table(csv_df)
    aggregations = {
        'count_by_year': {
            'name': 'Count by year',
            'data': count_by_year,
        }
    }
    return CsvData(table=csv_table, aggregations=aggregations)
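The body of count_by_year_table is not shown above. As a rough illustration of the kind of aggregation it might perform, the sketch below counts rows per calendar year. The column name "date", the choice to skip unparseable values, and the dict return shape are all assumptions, not the project's actual implementation.

```python
import io

import pandas as pd


def count_by_year_table(csv_df, date_column="date"):
    """Count rows per calendar year (hypothetical implementation).

    Assumption: the CSV has a date-like column called `date_column`.
    """
    # Values that cannot be parsed as dates become NaT instead of raising.
    years = pd.to_datetime(csv_df[date_column], errors="coerce").dt.year
    counts = years.dropna().astype(int).value_counts().sort_index()
    # Convert to plain ints so the result is JSON-serializable.
    return {int(year): int(n) for year, n in counts.items()}


sample = io.StringIO(
    "date,value\n"
    "2019-01-05,1\n"
    "2019-07-11,2\n"
    "2020-03-02,3\n"
    "not a date,4\n"
)
result = count_by_year_table(pd.read_csv(sample))
```

For the sample data, the row with the unparseable date is dropped and the remaining rows are grouped into 2019 and 2020.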
Results are cached to prevent inefficient recomputation:
# api/views.py L48-61 (f8b30f3e)
def retrieve(self, request, pk=None, format=None):
    try:
        data = cache.get(f'csv:{pk}')
        if not data:
            # . . .
            cache.set(f'csv:{pk}', data)
        return Response(data)
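The view above follows the cache-aside pattern: look up the key, compute on a miss, then store the result. The sketch below reproduces that flow without Django so it can be run on its own. DictCache is a hypothetical stand-in for Django's cache object, and expensive_parse is a hypothetical placeholder for the elided parsing step. One difference from the excerpt: the sketch checks `is None` rather than truthiness, so an empty but valid result is also served from the cache.

```python
class DictCache:
    """Hypothetical in-memory stand-in exposing get/set like Django's cache."""

    def __init__(self):
        self._store = {}

    def get(self, key, default=None):
        return self._store.get(key, default)

    def set(self, key, value):
        self._store[key] = value


cache = DictCache()
calls = []  # records every time the expensive step actually runs


def expensive_parse(pk):
    """Hypothetical placeholder for loading and aggregating a CSV."""
    calls.append(pk)
    return {"id": pk, "rows": 3}


def retrieve_data(pk):
    key = f"csv:{pk}"
    data = cache.get(key)
    if data is None:  # cache miss: compute once, then store
        data = expensive_parse(pk)
        cache.set(key, data)
    return data


first = retrieve_data(1)
second = retrieve_data(1)  # served from the cache, no recomputation
```

After both calls, calls contains a single entry, which shows that the second request did not repeat the parse.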
In the client app, AppRouter handles navigation to the main views, CsvList and CsvDetail:
// client/src/components/AppRouter/index.tsx L9-12 (8d1eb8aa)
<Switch>
  <Route path='/csv/:id' component={CsvDetail} />
  <Route path='/' component={CsvList} />
</Switch>
State is managed with useState and useCallback hooks.
// client/src/components/CsvList/index.tsx L32-46 (8d1eb8aa)
export default function CsvList() {
  const classes = useStyles()
  const [csvList, setCsvList] = useState([])
  const [isUploaderOpen, setIsUploaderOpen] = useState(false)
  const fetchCsvList = useCallback(async () => {
    try {
      const resp = await api.csvList()
      setCsvList(resp)
    } catch {
      setCsvList([])
    }
  }, [])
  useEffect(() => { fetchCsvList() }, [fetchCsvList])
- Deployment
- Automated testing
- Smarter cookie parsing
- Multiple file upload handling