See tracebacks.md and widgets.md for answers to the other challenges.
Create a web application that
- can ingest the attached example file (see modified file at: census_2009b.txt) and any other flat file with a viable target column.
- validates Benford's assertion based on the '7_2009' column in the supplied file
- Outputs back to the user a graph of the observed distribution of numbers with an overlay of the expected distribution of numbers.
- The output should also inform the user of whether the observed data matches the expected data distribution.
- The delivered package should contain a docker file that allows us to docker run the application...
- Bonus points for automated tests.
- Stretch challenge: persist the uploaded information to a database so a user can come to the application and browse through datasets uploaded by other users. No user authentication/user management is required here
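As a rough illustration of the check described above, the expected share of each leading digit d under Benford's law is log10(1 + 1/d). The sketch below computes that expected distribution and an observed one from a list of raw values. The function names are hypothetical and are not taken from this app's services.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(value):
    """Return the first non-zero digit found in a value, or None if there is none."""
    for ch in str(value):
        if ch.isdigit() and ch != "0":
            return int(ch)
    return None


def observed_distribution(values):
    """Share of each leading digit 1-9 among the values that have one."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    counts = Counter(digits)
    total = len(digits)
    return {d: (counts[d] / total if total else 0.0) for d in range(1, 10)}
```

Values with no usable digit (for example an empty cell) are skipped rather than counted, which is one possible design choice among several.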
The .env file found in the app/env folder has most of the environment variables you need to set up already filled in.
There are also hard-coded values for logging in the app/config folder in config.py.
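For readers unfamiliar with dict-based logging configuration, here is a minimal sketch of what such a hard-coded config can look like using the standard library's `logging.config.dictConfig`. The format string, handler, and logger name are assumptions for illustration, not the values in this project's config.py.

```python
import logging
import logging.config

# Hypothetical values: the real config.py may use a different format or handlers.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "default": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "default"},
    },
    "root": {"level": "INFO", "handlers": ["console"]},
}


def configure_logging(config=LOGGING_CONFIG):
    """Apply the dict-based config and return a named logger."""
    logging.config.dictConfig(config)
    return logging.getLogger("app")
```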
Possible improvements
- move hard-coded values into one or two files (secrets in one file, public values in another)
- enable initialization of the Flask app via a factory method to allow flexible environments
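The factory idea in the last item could be sketched as follows. This is a minimal sketch, not how run.py is currently written; the config classes and the `/ping` route are hypothetical.

```python
from flask import Flask


class DevConfig:
    """Hypothetical settings for local development."""
    DEBUG = True
    TESTING = False


class UnitTestConfig:
    """Hypothetical settings for automated tests."""
    DEBUG = False
    TESTING = True


def create_app(config_object=DevConfig):
    """Build and return a new Flask app configured from config_object."""
    app = Flask(__name__)
    app.config.from_object(config_object)

    @app.route("/ping")
    def ping():
        return "pong"

    return app
```

Because each call returns a fresh app, tests can build one with `create_app(UnitTestConfig)` without touching the development settings.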
Save this directory somewhere, unzip if necessary.
If this is your first time running it, cd into the app folder, then run:
docker-compose --env-file=./env/.env up --build
otherwise you can run:
docker-compose --env-file=./env/.env up
If the above was successful, open your browser to localhost:6050
The first time you run the app, you need to create the tables using the button provided. This sends a GET to /admin/db, and you should get a green alert bar if it is successful.
Access pgAdmin at localhost:5050.
Check the env/.env file for the pgAdmin user email and password (l7db_user@l7db.com and l7db_password).
Connect to the server by clicking on Server (on the left), then give it a name. The host should be host.docker.internal with port 5432; use the postgres user and password from the .env file (l7db_user, l7db_password).
Future/Possible improvements
Future improvements would include using Alembic or other libraries for real migrations and setup.
Also on the homepage is a button for running tests (/admin/tests).
This runs a limited number of unit tests created to fit the POC nature of this project. The output that would normally be viewable in the console will appear as a string in the page.
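One way to capture console-style test output as a string is to point `unittest.TextTestRunner` at an in-memory stream. The sketch below shows the idea. The test case and helper name are hypothetical, and the app's actual /admin/tests handler may work differently.

```python
import io
import unittest


class ExampleTests(unittest.TestCase):
    """Stand-in test case used only for this illustration."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)


def run_tests_as_text(test_case):
    """Run one TestCase class and return (success flag, runner output as a string)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    stream = io.StringIO()
    result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)
    return result.wasSuccessful(), stream.getvalue()
```

The returned text can then be passed to a template and shown in the page.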
Future/Possible improvements
As this app only has a limited number of tests, the first step would be increasing coverage.
After that, testing could be expanded with other testing libraries such as Selenium for UI testing, API/endpoint testing, etc., as well as by adding missing class, method, and module docs along with examples so that tests can be run using doctest (see example below).
    def example_generator(n):
        """Generators have a ``Yields`` section instead of a ``Returns`` section.

        Args:
            n (int): The upper limit of the range to generate, from 0 to `n` - 1.

        Yields:
            int: The next number in the range of 0 to `n` - 1.

        Examples:
            Examples should be written in doctest format, and should illustrate how
            to use the function.

            >>> print([i for i in example_generator(4)])
            [0, 1, 2, 3]

        """
        # Body added so the doctest above can actually run.
        for i in range(n):
            yield i
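To show how a docstring written in this style turns into a test, here is a minimal sketch that runs the examples in one function's docstring with the standard `doctest` module. Both `first_digit` and `run_doctests` are hypothetical helpers, not part of this app.

```python
import doctest


def first_digit(number):
    """Return the leading digit of a positive integer.

    >>> first_digit(2009)
    2
    >>> first_digit(75)
    7
    """
    return int(str(number)[0])


def run_doctests(func):
    """Run the examples in func's docstring and return (failed, attempted)."""
    # Parse the docstring into a DocTest, exposing func under its own name.
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    results = doctest.DocTestRunner(verbose=False).run(test)
    return results.failed, results.attempted
```

For a whole module, calling `doctest.testmod()` from that module does the same job without a helper.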
If you go to create/upload from the homepage, you can upload a file and choose to save it to the database; the instructions are at the top.
This was implemented as simply as possible for this POC as user-submitted file uploads can get complicated.
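Reading the target column out of an uploaded flat file could look roughly like the sketch below. It assumes a delimited text file with a header row and a tab delimiter by default. The function name and the delimiter are assumptions, not the app's actual reader.

```python
import csv
import io


def read_target_column(text, column, delimiter="\t"):
    """Return the non-empty values found under `column` in delimited text."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"column {column!r} not found")
    # Short rows yield None for missing cells, so guard before stripping.
    return [row[column].strip() for row in reader if (row[column] or "").strip()]
```

Raising early when the column is missing gives the user a clear message instead of an empty graph.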
To try it out, you can upload one of the two test files in /app/static/resources.
Once you upload data, you should see two Plotly line graphs and a results table breaking down frequency counts, percentages, and expected results (to run the Benford's test).
Additionally, there are 3 tests to inform the user whether the data matches the expected distribution.
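This README does not name the three tests, so the following is only an illustration of one common choice: a chi-square goodness-of-fit test against the Benford proportions. It assumes a 5% significance level and 8 degrees of freedom (nine digits minus one). The function names are hypothetical.

```python
import math

# Chi-square critical value for 8 degrees of freedom at the 5% significance level.
CHI_SQUARE_CRITICAL_DF8_P05 = 15.507


def chi_square_statistic(observed_counts):
    """observed_counts maps each digit 1-9 to how often it led a value."""
    total = sum(observed_counts.get(d, 0) for d in range(1, 10))
    if total == 0:
        raise ValueError("no observations")
    stat = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)  # Benford expected count
        stat += (observed_counts.get(d, 0) - expected) ** 2 / expected
    return stat


def matches_benford(observed_counts):
    """True when the statistic stays below the critical value."""
    return chi_square_statistic(observed_counts) < CHI_SQUARE_CRITICAL_DF8_P05
```

A statistic below the critical value means the data give no evidence against Benford's law at that level. It does not prove the data follow it.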
Once either the /view/jobs or /view/data page returns over 1000 results, there is some rudimentary pagination by offset in place. You can paginate with the arrows or update the request parameter.
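Offset-based pagination of this kind can be sketched as below. The page size of 1000 and the key names are assumptions for illustration, so check the routes in run.py for the real parameter name.

```python
def paginate(total_rows, offset=0, page_size=1000):
    """Return the clamped offset plus the offsets for the previous/next arrows."""
    # Keep the offset inside the result set, even for hand-edited parameters.
    offset = max(0, min(offset, max(total_rows - 1, 0)))
    prev_offset = max(offset - page_size, 0) if offset > 0 else None
    next_offset = offset + page_size if offset + page_size < total_rows else None
    return {"offset": offset, "limit": page_size, "prev": prev_offset, "next": next_offset}
```

A `None` value for `prev` or `next` tells the template to disable the matching arrow.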
This project uses networked Docker containers and volumes for storing data.
Flask is a lightweight, Python-based WSGI (Web Server Gateway Interface) web application framework. Flask depends on a few other Pallets projects that together handle web pages, web forms, routing requests, etc.
The database and SQL operations are managed with SQLAlchemy. SQLAlchemy is a library that consists of an SQL toolkit and an ORM.
The database (stored in a Docker volume) is PostgreSQL. One of the containers is running pgAdmin to enable better database maintenance, viewing, management, etc.
The pages are rendered with Jinja. Jinja uses curly-brace notation to insert data into a page dynamically. You will find the templates in the templates folder. The render_template calls (see below) are found mostly in run.py.
<!-- my template file-->
<h1>Welcome! <small>My message to you: {{ page_data.hello_world }}</small></h1>
The front end currently uses Bootstrap 5 and Font Awesome along with other ancillary JavaScript libraries. It also uses Plotly to render graphs.
    parent_folder/
        .gitignore -- unused, but nice to have for GitHub
        README.md
        tracebacks.md
        widgets.md
        app/
            __init__.py
            run.py
            docker-compose.yml
            Dockerfile
            config/
            data_access/
            env/
            models/
            services/
            static/
            templates/
            tests/
            util/
Found in the main app folder, the run.py file contains the code closest to the front end.
The allowed URLs (routes) are defined here, as well as any custom Jinja template functions.
This is also where the app itself is created and app-specific configuration is set.
The config folder houses a few small functions for verifying that the required database config variables have been provided in the env file, and a dict with a mostly hard-coded logging config that sets the output format, optional output file, etc.
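A check for required database settings can be as small as the sketch below. The variable names are hypothetical placeholders, since the real names live in the env file.

```python
import os

# Hypothetical names: substitute the variables actually defined in env/.env.
REQUIRED_DB_VARS = ("POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DB")


def missing_db_settings(environ=None, required=REQUIRED_DB_VARS):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]
```

Failing fast at startup when this list is non-empty avoids a harder-to-read connection error later.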
data_access/db_dto contains a handful of functions for interacting with the database. These are called by methods in run.py.
The models folder contains some SQLAlchemy mixins (utility functions to allow for easier introspection, validation, etc.) and model definitions for the tables in the database. If you need to update the table definitions or add new tables, this is where you do it.
The services folder contains most of the functionality of the app, such as interacting directly with the database, calculating the Benford's values and differences, reading input files, and creating Plotly graphs to visualize the output.
The static folder holds basic webpage assets such as images, JavaScript, and CSS libraries.
The templates folder holds Jinja templates for the main views along with partials (specific snippets to render things like a table, a header, etc.).
The tests folder holds simple unit tests based on some of the services.
The util folder is a small library with functions that, in theory, could be more widely used across the app, though there are only a few in common.py.