Open Source License Compliance Reporting System

Primary language: JavaScript.  License: GNU General Public License v2.0 (GPL-2.0).

			License Reporting Prototype


Purpose and Background
----------------------

As part of an open source license compliance program, we have developed
software to explore techniques that can accomplish license scanning
for large volumes of open source software, in a production setting.
The results of this license scanning should facilitate review of the
licenses being redistributed as well as reporting license content to
customers.  This project contains the data structures and code necessary
to manage a prototype open source license reporting service.


Operational Overview
--------------------

First, it's helpful to understand that there are two distinct portions
of oslcrs:

  1. oslcrs includes a set of database tables that maintain a "manifest"
     of open source packages.  Each manifest is tied to the "release"
     or "version" of a product, and the manifest includes the name,
     version, and release (NVR) of each open source package.  These
     release manifests may be organized around individual containers,
     organized into products, and finally product families.

  2. oslcrs maintains a library of open source packages that have been
     scanned for license and copyright content.  Additionally, certain
     package metadata is also captured and maintained.

If a customer requests a report or we desire to examine the license
content of the release of a product, the above two items are essentially
"joined", in a database sense.  The specific release manifest dictates
the exact set of packages involved, then the package metadata, license,
and copyright results from each manifested package are combined to
produce a report for the release.
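
The "join" described above can be sketched in a few lines of Python.
This is only an illustration using in-memory dictionaries.  The names
(manifest, scanned_library, build_report) and the sample packages are
made up for this example and do not reflect the actual oslcrs tables
or code.

```python
# Hypothetical stand-ins for the two portions of oslcrs.
# A manifest lists the (name, version, release) of each package in a release.
manifest = [
    ("alpha", "1.0", "1"),
    ("beta", "2.3", "4"),
]

# The library of scanned packages, keyed by NVR.  All values are made up.
scanned_library = {
    ("alpha", "1.0", "1"): {
        "licenses": ["MIT"],
        "copyrights": ["Copyright (c) Example Author"],
    },
    ("beta", "2.3", "4"): {
        "licenses": ["Apache-2.0"],
        "copyrights": ["Copyright (c) Example Project"],
    },
}


def build_report(manifest, library):
    """Combine scan results for every manifested package found in the library."""
    report = []
    for nvr in manifest:
        scan = library.get(nvr)
        if scan is None:
            continue  # no scan data yet, so the package is left out
        name, version, release = nvr
        report.append(
            {"name": name, "version": version, "release": release, **scan}
        )
    return report


report = build_report(manifest, scanned_library)
for row in report:
    print(row["name"], row["version"], row["release"], ", ".join(row["licenses"]))
```

The manifest alone decides which packages appear, so the same function
would serve a custom "manifest" that matches no product release.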

As an aside, this structure allows us to report on custom sets of
packages, sets that might not correspond to a particular release of
a product.  All that is necessary is for us to create a "manifest"
for the desired report, then query the report for that "manifest".

Finally, be aware that there's an additional piece of complexity in
oslcrs.  We envisioned the eventual system needing to be loosely coupled
with respect to the above two portions of oslcrs.  We expected that users
might want to submit an individual package for scanning and view the
license/copyright results for that package, even when the package is not
part of a product release.  So portion #2 above needs to operate
independently.  Similarly, we envisioned receiving a release manifest
(#1) for a product before all the packages in that manifest have
been scanned.  So oslcrs needed to be designed to accept manifests that
include versions of packages that have yet to be scanned.

One additional note: Although the web UI is coming along in the area
of exploring packages and licenses, we haven't put any effort into
loading data through a UI.  Right now, all data is supplied through
specifically formatted json data structures.  We did this because it
was the easiest way to get started, and it ended up being a simple way
of interfacing oslcrs to other tools.  If you're just kicking the tires,
this will be an impediment.  Hopefully, this situation will change in
a future release, or some kind soul will contribute some good UI code.

With the above as background, let's consider one use case.  We might
add some other use cases in an updated version of this README.

CASE #1: Customer Requests a Report on a Specific Release of a Product

Here, the first task is to obtain a list of packages that are associated
with that product's release.  Obtaining this list is not within the
scope of these tools.  Each product might do things differently, so
you will likely need to contact the product team to get
the list of packages in that specific release.  (A sister repository,
component-registry, is an example of a tool being used to collect the
list of packages associated with product releases.)

Assuming that you have a list of packages, one per line, the "manifest"
script can help convert your list into a json structure that can be
imported into oslcrs.
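
As a rough illustration of this kind of conversion, the Python sketch
below turns a one-per-line package list into a json structure.  It is
not the "manifest" script itself.  It assumes each line has the form
name-version-release, and the json keys ("packages", "name", "version",
"release") are hypothetical; the format oslcrs actually imports may
differ.

```python
import json


def parse_nvr(line):
    """Split 'name-version-release' on its last two hyphens."""
    # Names may contain hyphens, so split from the right.
    # A line with fewer than two hyphens raises ValueError.
    name, version, release = line.strip().rsplit("-", 2)
    return {"name": name, "version": version, "release": release}


def packages_to_manifest(lines):
    """Convert an iterable of NVR lines into a list of package records."""
    return [parse_nvr(line) for line in lines if line.strip()]


# Made-up sample input; blank lines are skipped.
sample = ["alpha-1.0-1", "lib-beta-2.3-4.el9", ""]
manifest_json = json.dumps({"packages": packages_to_manifest(sample)}, indent=2)
print(manifest_json)
```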

If oslcrs doesn't already have an entry for this product, you'll need to
add that as well.  As of this writing, there's no tool for this task, and
it's probably easiest to grab a copy of the oslcrs/examples/product.json
file, modify it with a text editor, and import that into oslcrs.
Whenever you edit json files, be careful with quotes.  (The "jq" tool
can help validate your json structures.)
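
If "jq" is not at hand, the Python standard library can perform the same
basic syntax check.  The sketch below is a minimal example; check_json
is a hypothetical helper name and the sample strings are made up.  It
only confirms that the text is well-formed json, not that it matches
what oslcrs expects.

```python
import json


def check_json(text):
    """Return (True, None) if text parses as json, else (False, message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, None


good = '{"name": "Example Product"}'
bad = "{'name': 'Example Product'}"  # single quotes are not valid json

print(check_json(good))
print(check_json(bad))
```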

Once the manifest is imported, oslcrs can immediately produce a report for
the packages it already knows about.  But at this point, you will probably
find that additional packages need to be scanned.  The packages that need
scan data show up as "Missing Packages" on the "Release Details" page.
Clicking into this "Missing Packages Report" and using the "json format"
link will allow you to capture this list of packages that need analysis.
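
Conceptually, the "Missing Packages" list is the set of manifest entries
that have no scan data yet.  The Python sketch below illustrates that
idea with made-up data.  The function name and the json keys are
hypothetical and are not taken from the oslcrs code or its actual
"json format" output.

```python
import json

# Made-up manifest for one release, as (name, version, release) tuples.
manifest = [
    ("alpha", "1.0", "1"),
    ("beta", "2.3", "4"),
    ("gamma", "0.9", "2"),
]

# NVRs that already have scan results.
scanned = {("alpha", "1.0", "1"), ("beta", "2.3", "4")}


def missing_packages(manifest, scanned):
    """Return manifest entries with no scan data, preserving manifest order."""
    return [nvr for nvr in manifest if nvr not in scanned]


missing = missing_packages(manifest, scanned)
print(json.dumps(
    [{"name": n, "version": v, "release": r} for n, v, r in missing],
    indent=2,
))
```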


Software Overview
-----------------

Within this project, a "database" subdirectory contains the SQL statements
and other tools necessary to create and maintain the database that backs
the system.  See the README there for more details.

The main application is "oslcrs.py".  It is a Flask application.
Other files may be documented here as needed.


Setting Up the Application
--------------------------

In a directory of your choice, create a new subdirectory that will
hold the application and all its dependencies.  For example:

  mkdir oslcrs-app

Check out the current oslcrs code from GitHub, for example:

  cd oslcrs-app
  git clone git@github.com:RedHatProductSecurity/oslcrs.git

This will create a new subdirectory called "oslcrs" containing the main
application code.  Within that new subdirectory is a shell script that
will set up the application and any dependencies:

  cd oslcrs; ./setup

The application needs access to a PostgreSQL database that contains the
application data.  The "setup" script deliberately does not create this
database; otherwise your old analysis data would get wiped every time
you do an update.  Instead, cd into the "database" subdirectory and see
the instructions there in order to create your first oslcrs database.

Since the database itself is outside the control of the setup script, the
application needs to know certain database parameters: database host,
database user, etc.  These will need to be configured by you.  The "setup"
script will create a template file to set these environment variables,
but you'll need to adjust the values to suit your database configuration.
The "setup" script will leave you some instructions.
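
To illustrate how an application can pick up such settings, here is a
minimal Python sketch that reads database parameters from environment
variables and fails early if one is unset.  The variable names (DB_HOST,
DB_NAME, DB_USER, DB_PASSWORD, DB_PORT) are assumptions made for this
example; use the names defined in the template that "setup" generates.

```python
import os

# Hypothetical variable names; the generated template defines the real ones.
REQUIRED = ("DB_HOST", "DB_NAME", "DB_USER", "DB_PASSWORD")


def load_db_settings(environ=os.environ):
    """Collect database settings, failing early if any are unset or empty."""
    missing = [key for key in REQUIRED if not environ.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    settings = {key: environ[key] for key in REQUIRED}
    # 5432 is the default PostgreSQL port.
    settings["DB_PORT"] = int(environ.get("DB_PORT", "5432"))
    return settings


# Made-up values, passed explicitly so the example does not depend on
# the real environment.
example_env = {
    "DB_HOST": "localhost",
    "DB_NAME": "oslcrs",
    "DB_USER": "oslcrs",
    "DB_PASSWORD": "secret",
}
print(load_db_settings(example_env)["DB_PORT"])
```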


Running the Application
-----------------------

In order to run locally, you just need to run the oslcrs script.  Since
the application may need to run unattended for long periods of time, you
might want to use "nohup", or otherwise arrange for it to run as a service.

There is a local file, oslcrs.service, that can be used from within
/etc/init.d/ in order to automatically start oslcrs on a machine.
To use this file, you will probably need to move it to /etc/init.d/,
name it "oslcrs", and fix its permissions.  The new service will then
need to be registered.  The steps to do this are specific to the
particular flavor of Linux you're running and are beyond the scope of
this README.