relaton/relaton-nist

Refactor NIST PubID to use pubid-nist gem

Opened this issue · 13 comments

mico commented

@andrew2net @ronaldtse What is the format for input data for RelatonNist::NistBibliography.search or RelatonNist::NistBibliography.get?
It's not new NIST PubID at the moment, should it be?

In the specs I found input could be:

NISTIR 8200
SP 500-304
SP 800-189(PD)
SP 800-67r1
NIST SP 800-67 Rev. 1
NIST SP 800-57pt1r4
NIST IR 8011v4
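
To make the shapes in this list concrete, here is a minimal sketch that splits each sample into components. It assumes only the patterns listed above; `LEGACY_ID` and `parse_legacy_id` are hypothetical names, not part of relaton-nist or pubid-nist, and the real gems may handle many more series and suffixes.

```ruby
# Hypothetical pattern covering only the sample inputs listed in this comment.
LEGACY_ID = %r{
  \A
  (?:NIST\s+)?                              # optional publisher prefix
  (?<series>NISTIR|IR|SP)\s+                # series seen in the samples
  (?<number>\d+(?:-\d+)?)                   # report number
  (?:pt(?<part>\d+))?                       # optional part suffix
  (?:v(?<volume>\d+))?                      # optional volume suffix
  (?:\s*(?:r|Rev\.\s*)(?<revision>\d+))?    # optional revision, short or long form
  (?:\s*\((?<stage>[^)]+)\))?               # optional stage in parentheses
  \z
}x

# Returns a hash of the components that are present, or nil when the
# input does not match any of the sample shapes.
def parse_legacy_id(text)
  match = LEGACY_ID.match(text.strip)
  return nil unless match

  match.named_captures.compact.transform_keys(&:to_sym)
end

p parse_legacy_id("NIST SP 800-57pt1r4")
p parse_legacy_id("SP 800-189(PD)")
```

A bare number such as "8200" does not match this pattern, so the sketch returns nil for it.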

In README.adoc I found that you can pass only the number:

RelatonNist::NistBibliography.get("8200", "2018", {})
[relaton-nist] ("8200") fetching...
[relaton-nist] ("8200") found NISTIR 8200
=> #<RelatonNist::NistBibliographicItem:0x007fc06aa2b480

Should we keep all of this working the same way? (Then I need to write a parser from this format to NIST PubID.)

Also, I'm struggling to find a specification for the docidentifier format in the JSON data from
https://csrc.nist.gov/CSRC/media/feeds/metanorma/pubs-export.zip (relaton-nist uses it to return bibliographic data).
Is it the old NIST PubID specification? Where can I get it?
For example, a docidentifier there can look like SP 800-189 (Draft). As @ronaldtse mentioned before (metanorma/pubid-nist#15 (comment)), a Draft in the old NIST PubID version is a Final Public Draft in the new version.

Depending on what we decide to accept as input (the new NIST PubID, the old format, or something else), there are several ways to convert the input to a docidentifier for searching the JSON data.
How it works now:
SP 800-189(PD) -> SP 800-189 (Draft)

With new NIST PubID as input:
NIST SP 800-189(FPD) -> SP 800-189 (Draft)

With updated data:
SP 800-189(PD) -> SP 800-189 (Final Public Draft)

With new NIST PubID as input and updated data:
SP 800-189(FPD) -> SP 800-189 (Final Public Draft)
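
As a rough illustration of the "how it works now" case only, the conversion can be sketched as a lookup from the stage code in the request to the label used in the pubs-export docidentifier. `STAGE_LABELS` and `to_csrc_docidentifier` are hypothetical names; only the PD -> Draft pair comes from this comment, and the other three scenarios are deliberately left out because they are still open questions.

```ruby
# Stage code in the request => label in the pubs-export docidentifier.
# Only this one pair is taken from the comment above.
STAGE_LABELS = { "PD" => "Draft" }.freeze

# Rewrites a trailing "(CODE)" into " (Label)"; unknown codes and
# references without a stage are returned unchanged.
def to_csrc_docidentifier(reference)
  reference.sub(/\s*\((\w+)\)\z/) do |matched|
    label = STAGE_LABELS[Regexp.last_match(1)]
    label ? " (#{label})" : matched
  end
end

puts to_csrc_docidentifier("SP 800-189(PD)")
```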

mico commented

BTW, @ronaldtse what is "Retired Draft" in the new NIST PubID?
We use the same code to find "Retired Draft" as for "Draft":

SP 800-189(PD) -> SP 800-189 (Draft)
SP 800-80(PD) -> SP 800-80 (Retired Draft)

Why?

@andrew2net @ronaldtse What is the format for input data for RelatonNist::NistBibliography.search or RelatonNist::NistBibliography.get? It's not new NIST PubID at the moment, should it be?

It should support the old format, and also support PubID.

In the specs I found input could be:

NISTIR 8200
SP 500-304
SP 800-189(PD)
SP 800-67r1
NIST SP 800-67 Rev. 1
NIST SP 800-57pt1r4
NIST IR 8011v4

In README.adoc I found that you can pass only the number:

RelatonNist::NistBibliography.get("8200", "2018", {})
[relaton-nist] ("8200") fetching...
[relaton-nist] ("8200") found NISTIR 8200
=> #<RelatonNist::NistBibliographicItem:0x007fc06aa2b480

The README only provides one sample, it's not representative of the patterns supported.

Should we keep all of this working the same way? (Then I need to write a parser from this format to NIST PubID.)

Yes.

Also, I'm struggling to find a specification for the docidentifier format in the JSON data from https://csrc.nist.gov/CSRC/media/feeds/metanorma/pubs-export.zip (relaton-nist uses it to return bibliographic data). Is it the old NIST PubID specification? Where can I get it? For example, a docidentifier there can look like SP 800-189 (Draft). As @ronaldtse mentioned before (metanorma/nist-pubid#15 (comment)), a Draft in the old NIST PubID version is a Final Public Draft in the new version.

pubs-export.zip uses pre-PubID identifiers. The NIST PubID is not yet active at NIST.

We will still need to support all documents in the pubs-export.zip

Depending on what we decide to accept as input (the new NIST PubID, the old format, or something else), there are several ways to convert the input to a docidentifier for searching the JSON data. How it works now: SP 800-189(PD) -> SP 800-189 (Draft)

With new NIST PubID as input: NIST SP 800-189(FPD) -> SP 800-189 (Draft)

I'm not sure how this works. How can "FPD" => "Draft"? They are different things.

With updated data: SP 800-189(PD) -> SP 800-189 (Final Public Draft)

Why is "PD" => "FPD"?

With new NIST PubID as input and updated data: SP 800-189(FPD) -> SP 800-189 (Final Public Draft)

"FPD" => "Final Public Draft" is only for the longer form of PubID output, right?

In any case, we need to take any (PubID + legacy identifier) input, and produce only PubID output.

BTW, @ronaldtse what is "Retired Draft" in the new NIST PubID? We use the same code to find "Retired Draft" as for "Draft":

SP 800-189(PD) -> SP 800-189 (Draft)
SP 800-80(PD) -> SP 800-80 (Retired Draft)

Why?

  • PD is Public Draft
  • Retired Draft is a draft in any form (PD, WD, PRD...) that has been "retired". Such a draft exists in its original form. For example, if SP 800-80 was a "PD" and that "PD" got retired without a next stage, it would still exist as "SP 800-80(PD)".

I don't fully understand the question. What do you mean by "find"?

relaton-nist uses 2 datasets:

  • NIST Tech Pubs
  • NIST CSRC pubs-export

mico commented

It should support the old format, and also support PubID.

Is a specification of the old format available anywhere?

The README only provides one sample, it's not representative of the patterns supported.

Should we support search by partial data? (just "8200" instead of "NISTIR 8200")

I'm not sure how this works. How can "FPD" => "Draft"? They are different things.

You mentioned here (metanorma/pubid-nist#15 (comment)) that "PD" is something like "FPD". When I have "(PD)" in the original request, I should look for "(Draft)" in NIST CSRC pubs-export's docidentifier.

I don't fully understand the question. What do you mean by "find"?

RelatonNist::NistBibliography.get("SP 800-189(PD)", nil, {}) returns document with docidentifier SP 800-189 (Draft)
RelatonNist::NistBibliography.get("SP 800-80(FPD)", nil, {}) returns document with docidentifier SP 800-80 (Retired Draft)

mico commented

Neither the NIST Tech Pubs nor the NIST CSRC pubs-export dataset contains any identifiers with draft stages like IPD/FPD/2PD, only "Draft", which is "(PD)".
Some data is also missing; for example, NIST SP(2PD) 1800-13B (https://www.nccoe.nist.gov/sites/default/files/legacy-files/psfr-mobile-sso-nist-sp1800-13b-draft-v2.pdf) is not there.
It seems no documents there use the new NIST PubID.

So I need either a separate parser for this, or a legacy parser and converter (PD -> Draft) added to nist-pubid.

I doubt it's the right moment to use the NIST PubID parser in relaton-nist while the datasets contain no publications with NIST PubIDs.

@ronaldtse Any thoughts on that?

Is a specification of the old format available anywhere?

There is no particular specification, just a convention. Check the Relaton-NIST code, the pubs-export.zip file and the NIST Tech Pubs XML file for the patterns used.

Should we support search by partial data? (just "8200" instead of "NISTIR 8200")

Probably not. We should support variants though, e.g. "NISTIR 8200" and "NIST IR 8200".
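
One way to treat such variants as equal is to reduce both spellings to a single canonical key before comparing. The sketch below is only an illustration: `canonical_reference` is a hypothetical helper, and choosing "NISTIR" as the canonical spelling is an assumption, not something decided in this thread.

```ruby
# Collapses whitespace and maps the "NIST IR" spelling to "NISTIR".
# The canonical spelling chosen here is an assumption for illustration.
def canonical_reference(reference)
  reference.strip.gsub(/\s+/, " ").sub(/\ANIST ?IR\b/, "NISTIR")
end

puts canonical_reference("NIST IR 8200") == canonical_reference("NISTIR 8200")
```

A partial input such as "8200" keeps its own key, so it does not match "NISTIR 8200", in line with the answer above.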

You mentioned here (metanorma/pubid-nist#15 (comment)) that "PD" is something like "FPD". When I have "(PD)" in the original request, I should look for "(Draft)" in NIST CSRC pubs-export's docidentifier.

This is a major confusion that I need to clarify:

  1. NIST PubID is NOT YET in use. You will NOT find any official PubID because the scheme is NOT IN USE YET. This work we're doing here is facilitating this migration to happen.
  2. The point of the nist-pubid gem is not only to parse NIST PubIDs. It is to parse ANY NIST document identifier and translate the old document identifier into NIST PubID.
  3. Therefore, the relaton-nist gem is to use the nist-pubid gem to handle both cases:
  • Now: none of the data sources are in the new NIST PubID format
  • Later: data sources are all in the new NIST PubID format

Does this explain all the questions above?

"PD" is something like "FPD". When I have "(PD)" in the original request, I should look for "(Draft)" in NIST CSRC pubs-export's docidentifier

What I meant by saying that "PD" (Public Draft) is "like" "FPD" (Final Public Draft) is this:

  1. NIST CSRC (a department of NIST) uses multiple stages for a document. i.e. WD, PRD, IPD, 2PD, ... FPD, Published.
  2. Some NIST labs/departments use fewer draft stages. i.e. WD, PD, Published
  3. When I say some "PD" is like "FPD", I mean that this is the last development stage before official publication.

When I have "(PD)" in the original request, I should look for "(Draft)" in NIST CSRC pubs-export's docidentifier

Yes.

Notice that right now, both of these public data sources

  • DO NOT CONTAIN INTERNAL STAGES. i.e. you will not find any "non-public" drafts in these data sources
  • Hence in the current (old) document identifier format, NIST CSRC stages are NOT INCLUDED

RelatonNist::NistBibliography.get("SP 800-189(PD)", nil, {}) returns document with docidentifier SP 800-189 (Draft)
RelatonNist::NistBibliography.get("SP 800-80(FPD)", nil, {}) returns document with docidentifier SP 800-80 (Retired Draft)

In the case of "SP 800-80(FPD)":

Again, right now, all the "Drafts" in the data sources are "PD"s (Public Drafts). There are NO FPDs, IPDs, 2PDs, etc.
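
The two lookups quoted above behave as if any stage code in the request matches whichever draft label the dataset carries. A minimal sketch of that behaviour follows, assuming records are plain hashes with a `:docidentifier` key. `find_draft`, `DRAFT_LABELS` and the record shape are hypothetical, inferred from the two calls reported in this thread rather than from the relaton-nist code.

```ruby
# Labels that the current data sources use for drafts, per the two
# results reported in this thread.
DRAFT_LABELS = ["Draft", "Retired Draft"].freeze

# Finds the first record whose docidentifier is the requested base
# identifier followed by one of the draft labels. Returns nil when the
# request carries no stage or nothing matches.
def find_draft(records, reference)
  return nil unless reference.match?(/\([^)]*\)\z/)

  base = reference.sub(/\s*\([^)]*\)\z/, "")
  candidates = DRAFT_LABELS.map { |label| "#{base} (#{label})" }
  records.find { |record| candidates.include?(record[:docidentifier]) }
end

# Sample records using only identifiers mentioned in this thread.
records = [
  { docidentifier: "SP 800-189 (Draft)" },
  { docidentifier: "SP 800-80 (Retired Draft)" },
]

p find_draft(records, "SP 800-189(PD)")
p find_draft(records, "SP 800-80(FPD)")
```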

NOTE: RelatonNist::NistBibliography.get("SP 800-189(PD)", nil, {}) would certainly be easier if it was just RelatonNist::NistBibliography.get("SP 800-189(PD)").

Related to this: #62 (comment)

@mico the prefixes like ISO, NIST, etc. are used in the relaton gem to route requests to the appropriate gem (relaton-iso, relaton-nist, etc.). relaton-nist ignores the NIST prefix in references.

@ronaldtse currently we download the CSRC file to the local computer and search through it. If we start using pubid-nist as an ID parser, it will slow down the search because Parslet is quite slow. Maybe we need to transform the CSRC file into a data repository with an index, similar to the other relaton-data-* repos, don't we?

@andrew2net I agree that we should have a relaton-data-nist repo. Let's create it based on both CSRC and NIST-Tech-Pubs content. Thanks!
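
The idea of an index can be sketched as follows: each docidentifier is processed once when the index is built, so a search becomes a hash lookup instead of parsing every record again. This is a sketch under stated assumptions, not the actual relaton-data-* index format. `index_key`, `build_index`, `lookup` and the record shape are hypothetical, and `index_key` is a cheap stand-in for the slower parser.

```ruby
# Stand-in for an expensive identifier parser; hypothetical, not pubid-nist.
def index_key(docidentifier)
  docidentifier.strip.gsub(/\s+/, " ").upcase
end

# Runs the key function once per record and groups records by key.
def build_index(records)
  records.group_by { |record| index_key(record[:docidentifier]) }
end

# A search processes only the incoming reference, then does a hash lookup.
def lookup(index, reference)
  index.fetch(index_key(reference), [])
end

# Sample records from both data sources; the :source key is illustrative.
records = [
  { docidentifier: "SP 800-189 (Draft)", source: :csrc },
  { docidentifier: "NISTIR 8200", source: :tech_pubs },
]

index = build_index(records)
p lookup(index, "nistir  8200")
```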

@mico @andrew2net is this task ready? Thanks!

@ronaldtse the issue blocks this.