Metadata Proposal for Docs
ovflowd opened this issue · 84 comments
FYI: This description is outdated! (Needs an update.)
At our Collaborator Summit 2022 edition, we discussed a series of proposals for changing the way we structure the metadata of our API docs. This proposal will eventually deprecate specific changes proposed here.
Within this issue, we will refer to the proposal as the "API metadata proposal" for all further references.
The API Metadata Proposal
Proposal Demo: https://github.com/ovflowd/node-doc-proposal
Introduction
What is this proposal about? Our API docs currently face a few issues, ranging from maintainability and organization to the extra tooling required to make them work. These are, namely, the following:
- The current infrastructure for doc generation is non-standard and not easy for newcomers to contribute to or update, as it does complex AST manipulation with `unified`, making it harder to debug, update, or change how things are done.
- We use a specifically crafted remark plugin (and ESLint config) to make some non-conforming rules work. Ultimately, the ESLint plugin does not ensure that certain things are valid Markdown either.
- Our API docs use non-conforming Markdown, which is incompatible with standard parsers. As most Markdown parsers and linters become stricter, parsing will eventually fail (and already does for specific parsers such as MDX). Our inline YAML snippets, for example, are also not validated; hence, some have invalid YAML syntax.
- We require our infrastructure to interpolate content from Markdown and guess what is being done, for example, to get the Stability Index, the heading level, or whether a section refers to a class or a method.
- Some Markdown files are way too big. This outright makes the build process complex, and some pages become massive for the Web, which is unreasonable for metered internet connections.
- Not to mention that from a maintainability standpoint, this is unfeasible.
- This proposal will also achieve better-generated doc metadata that can be used by projects such as TypeScript
- This proposal will also allow internationalisation, as the metadata is separated from the actual Markdown files.
There are many other issues with the current API docs, from the non-standard conventions needed to make the rules work, to maintaining those files, to creating sustainable docs that are inclusive for newcomers and well detailed.
The Proposal
This proposal, at its core, boils down to 4 simple changes:
- All the actual API structure/metadata gets extracted to dedicated YAML files
- Each YAML file has its corresponding Markdown file
  - E.g., `doc/api/modules/fs/promises.metadata.yml` has `doc/api/modules/fs/promises.en.content.md`
- The folder structure for API docs gets updated in a tree fashion for the modules
- Each class has its YAML and Markdown file
- TL;DR: files are broken down into their minimal section (being a class)
- Markdown file is responsible for:
- Descriptions
- Introductions
- Examples
- References
- Real-world usages
Re-structuring the existing file directory
In this proposal, the tree of files gets updated by adopting a node approach (pun intended) for how we structure the files of our API docs and how we name them.
Notably, these are the significant changes:
- The top-level folders are categorized by the nature of their files; for example, anything related to a Node.js module will reside within `modules`. Globals, for example, will reside within `globals`.
  - There's no concrete list of all the possible top-level folders for now; for example, "About this documentation," "How to install Node.js," or other kinds of general documentation related to Node.js would probably not fit in any of these folders. A suggestion would be a `misc` folder, but this is open for debate, as this is not a crucial point.
- The second level of folders, in the case of `modules`, is the name of the `module` (top-level) import. For example, "File Systems" would be `fs`, resulting in `doc/api/modules/fs`.
  - Any other level of sub-directories would be a sub-namespace of the module. For example, `node:fs/promises` would be `doc/api/modules/node/fs/promises`.
- Finally, the last level would be the name of a class, e.g., `doc/api/modules/node/fs/promises/file-handle.yaml`, whereas for the `promises` import itself, it would be `doc/api/modules/node/fs/promises.yaml`.
  - You will notice that in the first case `promises` is a folder and in the second a YAML file; that's because we're following a node approach, just like a binary tree.
Accomplishing this change
This can be done quickly by an automated script that breaks the files down and generates the new ones. Using a script to walk the tree and create this node approach would, in the best scenario, work for all the current files existing in our `doc/api` and, in the worst-case scenario, for 98% of the files, based on the consistency of adoption and how closely modules follow these patterns.
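As an illustration only (not the actual converter), a minimal sketch of such a splitting script could look like the following. The paths and the `## Class:` heading heuristic are assumptions for the example; the real tool would also need to extract the YAML metadata blocks:

```js
import { readFile, mkdir, writeFile } from 'node:fs/promises';
import { basename, join } from 'node:path';

// Split a legacy API doc into one Markdown file per top-level class section.
// Assumed heuristic: a new section starts at every "## Class: ..." heading.
async function splitDocFile(sourcePath, outDir) {
  const source = await readFile(sourcePath, 'utf8');
  const moduleName = basename(sourcePath, '.md');
  const sections = source.split(/^(?=## Class: )/m);

  await mkdir(join(outDir, moduleName), { recursive: true });

  // The first chunk is the module introduction; the rest are class sections.
  await writeFile(join(outDir, moduleName, 'index.en.content.md'), sections[0]);

  for (const section of sections.slice(1)) {
    const className = section.match(/^## Class: `?([\w.]+)`?/)[1];
    const fileName = `${className.toLowerCase().replace(/\./g, '-')}.en.content.md`;
    await writeFile(join(outDir, moduleName, fileName), section);
  }
}

await splitDocFile('doc/api/fs.md', 'out/modules');
```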
Extracting the metadata
As mentioned before, the Markdown files should be free of the actual metadata, only containing the description, introduction (when needed), examples (both for CJS and MJS), more in-depth details of when a class/method should be used, and external references that might be useful.
Extracting the metadata allows our contributors and maintainers to focus on writing quality documentation and not get lost in the specificities of the metadata.
What happens with the extracted metadata?
It will be added to a dedicated YAML file containing all the metadata of, for example, a particular class. (We created a new tooling infrastructure that would facilitate this being done.)
The metadata structure will be explained in another section below.
The extraction and categorization process can be automated for all modules and classes, reducing (and erasing) the manual work needed to adopt this proposal.
Enforcing the Adoption of best practices
The actual content of the Markdown files will be "enforced" by documentation reviewers and WGs for specific Node.js parts, possibly through the adoption of this PR.
The Metadata (YAML) schema
Similarly to the existing YAML schema, it would be structured like this:
```yaml
name: 'api/modules/crypto/certificate'
source: "lib/crypto.js"
stability: stable
tags:
  - "certificates"
  - "digital certificates"
history:
  - type: added
    versions: [v0.11.8]
methods:
  - name: exportChallenge
    stability: deprecated
    static: true
    history:
      - type: added
        versions: [v9.0.0]
        pullRequest: "https://github.com/nodejs/node/pull/35093"
        details: "crypto.certificate.method.exportChallenge.history.[0].details"
    params:
      - name: spkac
        optional: false
        types:
          - String
          - ArrayBuffer
          - Buffer
          - TypedArray
          - DataView
      - name: encoding
        details: "crypto.certificate.method.exportChallenge.params.[1].details"
        optional: true
        types:
          - String
        defaults:
          - "UTF-8"
    returns:
      - type: Buffer
        details: "crypto.certificate.method.exportChallenge.returns.[0].details"
constants:
  - name: S_IXUSR
    import: "fs.constants.S_IXUSR"
```
The structure above makes it easy to structure and organise the metadata of each method available within a class, and to quickly describe the types, return types, parameters, and history of a method, a class, or anything related.
I18n and ICU on YAML files
The structure is also i18n-friendly, as precise text details that should not be defined within the Markdown file can easily be referenced using the ICU format. These details can be accessed in files that sit at the same level as a specific module. For the example above, `doc/api/modules/node/fs/promises.en.i18n.json` contains entries that follow the ICU format, such as:
```json
{
  "fs.promises.tags": ["writing files", "creating files", "file systems"],
  "fs.promises.method.lchmod.returns.[0].details": "The lchmod method returns a Boolean when specific parameters are ....",
  ...
}
```
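To make the indirection concrete, here is a minimal sketch (an assumption about how the tooling could resolve these references, not a finalized API):

```js
import { readFile } from 'node:fs/promises';

// Resolve a Lang ID (e.g., a `details` field from the YAML metadata)
// against the i18n JSON file that sits next to the module's metadata.
async function resolveLangId(i18nPath, langId) {
  const entries = JSON.parse(await readFile(i18nPath, 'utf8'));
  return entries[langId] ?? null; // null: untranslated; callers may fall back to English
}

const details = await resolveLangId(
  'doc/api/modules/node/fs/promises.en.i18n.json',
  'fs.promises.method.lchmod.returns.[0].details',
);
```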
Specification Table
The tables below demonstrate the entire proposed YAML schema.
Note: All the properties of type `Enum` will have their possible values discussed in the future, as this is just a high-level specification proposal.
Top Level Properties
| Field | Optional | Type | Description |
|---|---|---|---|
| `name` | No | `String` | The Heading ID identifier for that module; it should usually be the path of the module in the doc folder. |
| `import` | No | `String` | The canonical import of the module (i.e., the string used to import this class/module). This will be used to generate the CJS/MJS import usages. |
| `stability` | No | `Enum` | The stability of a given module. It follows the widely adopted "Stability Index" from our existing docs. |
| `tags` | Yes | `Lang ID` | A translation ID for tags used to identify this module or help users find it with search engines. |
| `history` | Yes | `Array<History>` | An array of history entries to record the notable historical changes of that module. |
| `methods` | Yes | `Array<Method>` | The methods of that class/module. |
| `constants` | Yes | `Array<Constant>` | The constants of that class/module. |
| `source` | Yes | `String` | The path to the source of that class/module. |
History
| Field | Optional | Type | Description |
|---|---|---|---|
| `type` | No | `Enum` | The type of the change. |
| `pullRequest` | Yes | `String` | An optional pull request link for the related landed change. |
| `issue` | Yes | `String` | An optional issue link for the related landed change. |
| `details` | Yes | `Lang ID` | A translation ID for extra short details of that change. Actual details should usually link to a PR or issue. |
| `versions` | Yes | `Array<String>` | An array containing the versions this change initially impacted. |
| `when` | Yes | `String` | A date string following ISO 8601 (https://en.wikipedia.org/wiki/ISO_8601). |
Method
| Field | Optional | Type | Description |
|---|---|---|---|
| `name` | No | `String` | The Heading ID identifier for the method. It should also reflect the actual name that is imported. |
| `stability` | No | `Enum` | The stability of a given method. It follows the widely adopted "Stability Index" from our existing docs. |
| `tags` | Yes | `Lang ID` | A translation ID for tags used to identify this method or help users find it with search engines. |
| `history` | Yes | `Array<History>` | An array of history entries to record the notable historical changes of that method. |
| `returns` | No | `Array<ReturnType\|Enum>` | An array containing the return types of the method. |
| `params` | Yes | `Array<MethodParam>` | An array containing the parameters of the method. |
MethodParam
| Field | Optional | Type | Description |
|---|---|---|---|
| `name` | No | `String` | The name of the parameter of the method. |
| `optional` | No | `Boolean` | Whether the parameter is optional. |
| `defaults` | Yes | `Array<ParameterDefault>` | An array containing the default values of the parameter. |
| `types` | No | `Array<ParameterType\|Enum>` | An array containing the types of the parameter. |
ReturnType, ParameterType, ParameterDefault
| Field | Optional | Type | Description |
|---|---|---|---|
| `details` | Yes | `Lang ID` | A translation ID for the details of this return type. |
| `type` | No | `Enum` | The type of the return type. |
Incorporating the Metadata within the Markdown files
As each Class has numerous methods (possibly constants) and more, the parser needs to know where to attach the data within the final generated result when, for example, building for the web.
This can be done quickly by using Markdown-compatible heading IDs:
```markdown
# File Systems {#api/modules/node/fs/promises}

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Quisque non tellus orci ac. Maecenas accumsan lacus vel facilisis volutpat est velit egestas. Placerat in egestas erat imperdiet sed euismod. Egestas maecenas pharetra convallis posuere morbi leo urna molestie at. Ultricies mi eget mauris pharetra et ultrices neque ornare aenean. Sodales ut etiam sit amet nisl purus in. Nunc pulvinar sapien et ligula ullamcorper malesuada. Pulvinar neque laoreet suspendisse interdum. Lectus proin nibh nisl condimentum id. Habitant morbi tristique senectus et netus et malesuada fames ac. Nulla porttitor massa id neque aliquam vestibulum morbi.

## Method: LCHMOD {#lchmod}

Curabitur gravida arcu ac tortor dignissim convallis. Urna id volutpat lacus laoreet non curabitur. Sem integer vitae justo eget. Amet purus gravida quis blandit. Posuere urna nec tincidunt praesent semper feugiat nibh sed pulvinar. Nunc eget lorem dolor sed viverra ipsum nunc. Dignissim cras tincidunt lobortis feugiat. Maecenas pharetra convallis posuere morbi leo. Volutpat lacus laoreet non curabitur gravida arcu. Leo a diam sollicitudin tempor id.

....
```
The parser would map each YAML entry's `name` field to the associated heading ID, allowing you to write the heading text as you wish while keeping the heading ID intact.
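A minimal sketch of that mapping step (an illustration, not the proposed parser; it assumes the `{#id}` suffix convention shown above):

```js
// Extract `{#id}` suffixes from headings so metadata can be attached by ID.
// Returns a map from heading ID to the heading's text and level.
function extractHeadingIds(markdown) {
  const headings = new Map();
  const pattern = /^(#{1,6})\s+(.*?)\s*\{#([^}]+)\}\s*$/gm;
  for (const [, hashes, text, id] of markdown.matchAll(pattern)) {
    headings.set(id, { text, level: hashes.length });
  }
  return headings;
}

const sample = '# File Systems {#api/modules/node/fs/promises}\n\n## Method: LCHMOD {#lchmod}';
const headings = extractHeadingIds(sample);
console.log(headings.get('lchmod')); // { text: 'Method: LCHMOD', level: 2 }
```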
Naming for Markdown files
To ensure that we have a 1:1 mapping between YAML and Markdown, the Markdown files should reside in the same folder as the YAML ones and have the same name, the only difference being that the Markdown files have the `.md` extension in lowercase and are suffixed by their language, e.g. `.en.md`.
Note: By default, the Markdown files will use the `.en.md` extension.
The Build Process
Generating the final result in a tangible, readable format for humans and IDEs is no easy feat.
The new tooling build process would consist of two different outputs:
- Generating JSON files from the YAML metadata (see the sketch after this list).
  - These are namely used for JSDoc or IDE scanning/IntelliSense, such as TypeScript (cc @nodejs/typescript)
- Generating MDX buffers that our websites can use.
  - MDX is a JSX-in-Markdown format that allows us to insert React components within our codebase.
  - The idea here is, during the build process, to generate a buffer that is the combination of the plain Markdown + the React components used to render the metadata.
  - This is more tooling aimed at the end-users of the documentation and is also helpful for previewing the documentation. This must be discussed in a separate issue to address topics such as:
    - Where the tooling should reside
    - How to generate documentation previews containing just the documentation (not the whole website) and also allow generating docs only for what you changed (e.g., generating previews of a specific file)
    - How the files would be categorized
    - How the links for the files, and the redirects from the old API schema to the new one, would work
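As a rough illustration of the first output (assuming the `js-yaml` package and the `.metadata.yml` naming from this proposal), the YAML-to-JSON step could be as small as:

```js
import { readFile, writeFile } from 'node:fs/promises';
import yaml from 'js-yaml'; // assumed dependency of the new tooling

// Convert one metadata file into the JSON output consumed by IDE tooling.
async function metadataToJson(yamlPath) {
  const metadata = yaml.load(await readFile(yamlPath, 'utf8'));
  const jsonPath = yamlPath.replace(/\.metadata\.yml$/, '.metadata.json');
  await writeFile(jsonPath, JSON.stringify(metadata, null, 2));
  return metadata;
}

await metadataToJson('doc/api/modules/fs/promises.metadata.yml');
```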
Example of the file structure
An essential factor in easing the visualization of how this proposal would change the current folder structure is to show an example of how it would look with all the changes applied. The snippet below is such an illustration.
Note: The root directory below would be `doc/api`.
```text
├── api
│   ├── en.navigation.md
│   ├── documentation.en.content.mdx
│   ├── modules
│   │   ├── en.navigation.md
│   │   ├── fs
│   │   │   ├── en.navigation.md
│   │   │   ├── index.metadata.yml
│   │   │   ├── index.en.content.md
│   │   │   ├── promises.metadata.yml
│   │   │   ├── promises.en.content.md
│   │   │   └── ...
│   │   ├── streams
│   │   ├── crypto
│   │   │   ├── en.navigation.md
│   │   │   ├── webcrypto.metadata.yml
│   │   │   ├── webcrypto.en.i18n.json
│   │   │   └── webcrypto.en.content.md
│   │   └── ...
│   ├── globals
│   ├── others
│   ├── packages.en.content.md
│   └── ...
└── ...
```
The Navigation (Markdown) Schema
Navigating through the API docs is as essential as displaying the content correctly. The idea here is to allow each `module` to define its Navigation entries and then generate the whole Navigation by aggregating all the navigation files.
Book of Rules for the Navigation System
- The Navigation file is made in Markdown and has a reserved name (`navigation.md`)
- A navigation file can be on any sub-level of any directory
- Navigation files are not imported automatically
- The build tools specify the main Navigation file (e.g.: `build-docs --navigation-entry=doc/api/v18/navigation.md`)
- The order of items is respected as-is
- Each item can be either a:
  - Heading without a link
  - Heading referring to an entry (YAML file)
  - Heading referring to another Navigation file (to import the entries there)
- The cool part is that Navigation items can be anything you want, not limited to something generated.

Note: The Navigation source would be in Markdown, using a Markdown list format with a maximum of X indentation levels.
The Schema of Navigation
The code snippets below show examples of the schema and how the final output would be generated.
File: `doc/api/v18/en.navigation.md`

```markdown
* [About this Documentation](documentation.en.content.md)
* [Modules](modules/en.navigation.md)
* Some Header
  * Sub-Levels Supported
    * To a certain-max-level
* [An External Link](https://nodejs.org/certification)
```

File: `doc/api/v18/modules/en.navigation.md`

```markdown
* [File System](fs/en.navigation.md)
* [Streams](streams/en.navigation.md)
```

File: `doc/api/v18/modules/fs/en.navigation.md`

```markdown
* [About File Systems](fs.en.content.md)
* [File System Promises](promises.en.content.md)
* ....
```

Example output in Markdown

```markdown
* [About this Documentation](documentation.en.content.md)
* Modules
  * File System
    * [About File Systems](fs.en.content.md)
    * [File System Promises](promises.en.content.md)
  * Streams
  * ....
* Some Header
  * Sub-Levels Supported
    * To a certain-max-level
* [An External Link](https://nodejs.org/certification)
```
It is essential to mention that the final output of the Navigation would be Markdown, which the build tools can use to generate output in MDX, plain HTML, or JSON.
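A hedged sketch of that aggregation step (assuming the conventions above, where a link target ending in `.navigation.md` means "import that file's entries one level deeper"):

```js
import { readFile } from 'node:fs/promises';
import { dirname, join } from 'node:path';

// Recursively inline navigation files: a list item linking to another
// `*.navigation.md` file is replaced by a plain heading plus that file's
// items, indented one level deeper.
async function buildNavigation(navPath, depth = 0) {
  const lines = (await readFile(navPath, 'utf8')).split('\n');
  const output = [];
  for (const line of lines) {
    const match = line.match(/^(\s*)\* \[(.+)\]\((.+\.navigation\.md)\)\s*$/);
    if (!match) {
      if (line.trim()) output.push('  '.repeat(depth) + line);
      continue;
    }
    const [, indent, title, target] = match;
    output.push(`${'  '.repeat(depth)}${indent}* ${title}`);
    output.push(await buildNavigation(join(dirname(navPath), target), depth + 1));
  }
  return output.join('\n');
}

console.log(await buildNavigation('doc/api/v18/en.navigation.md'));
```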
Conclusion
As explained before, the proposal has several benefits and would significantly add to our codebase. Knowing that the benefits vary from tooling, build process, maintainability, adoption, and ease of documentation to translations and even more, this proposal is fated to succeed! Also, all the items explained here can be automated, ensuring a smooth transition process.
@nodejs/tsc @nodejs/documentation
We'll need to make sure this process doesn't add any work for releasers. (I don't think it would, but writing it here just in case.)
This will also be a good opportunity hopefully to fix our version picker quirks, at least for future versions of Node.js.
I like this a lot, although of course we'll see what kinds of unforeseen practical problems (if any) arise in the course of implementation.
I wonder if 20.x and forward is more realistic than 18.x and forward. I wouldn't complain if we got this working sooner than 20.x though.
Can we try to determine which parts of this can be done incrementally and which need to happen all-at-once? I'm trying to understand how many steps are involved here. (And if it's one big step, that's OK, but of course we'll want to automate everything because keeping the docs in sync with the current version will be an annoying problem otherwise.)
Is the idea that this would work on the current nodejs.org as well as on nodejs.dev or is the vision here that the nodejs.dev tech/build stack replaces what's on nodejs.org and that's a prerequisite for this to work?
> Is the idea that this would work on the current nodejs.org as well as on nodejs.dev or is the vision here that the nodejs.dev tech/build stack replaces what's on nodejs.org and that's a prerequisite for this to work?
In theory, it could also work on nodejs.org. Entering the topic of "The build process": if we outsource the tooling created on the nodejs.dev repo (which should be pretty much independent of whatever static-framework stuff you use), then yes. A few tweaks would be needed, but in the end, we could reuse the HTML generation part of the existing `nodejs/node/tools/doc`.

For nodejs.dev, no extra steps are needed; yet, I would like to outsource the tooling.
> Can we try to determine which parts of this can be done incrementally and which need to happen all-at-once? I'm trying to understand how many steps are involved here.
I foresee 4 major steps:
- Reach a consensus on the properties of the YAML (the schema)
- Reach a consensus on the tooling and where it should reside
- Outsource and update the tooling to extract the content correctly (this would be a one-time change)
  - This would generate the YAML and MD files with the correct directory tree; it would require some changes to the tooling we made on nodejs.dev, but it's not a complicated change.
- Update the final tooling to also generate HTML and JSON.

That's it. Basically, the migration itself can be mass-applied safely.
> I wonder if 20.x and forward is more realistic than 18.x and forward. I wouldn't complain if we got this working sooner than 20.x though.
Indeed. I was trying to think about retroactively updating as far back as v18, as v18 is the first version of the API docs that is the most Markdown-conforming. (I'm referring to the v18 git tree; also, on that tree, it seems all the doc pages follow the current doc specs, at least for the metadata, hence why migrating at once would be seamless.)
Proposal Updates
I'm going to update the main proposal, adding the following missing sections:
- How Navigation would be structured and generated (The order of each item, their titles and stuff)
- Example of a folder structure with all files
Really great proposal! A lot of topics are covered, which is great, as this gives a good overview of everything that will require some work.
Good choice not to address all the subjects here, as it would be too long, but good thinking mentioning them (tooling, i18n...), which will make it easy to link the PRs.
Just a few questions:
- Versioning docs: do we keep all the versions accessible on the website? How do we easily update across multiple versions? Docs for odd versions, just even ones, or all?
- Build process: should it include a way to generate the docs from source? To generate part of the docs? The whole docs? PDF too? (Maybe better to discuss that when we talk about the tooling?)
- I am not against YAML, but why not use JSON directly instead of YAML? Is there something technical blocking us from that, or is it DX-related?
- Maybe on the tooling part, we should add/ensure full compliance of the docs? A way to test whether a heading ID exists, for example.

Following @Trott's comments, I would agree that v20 would be the best time to land it; it will be tight for the versions before that. But do we want to provide retroactive docs for versions before v20? If yes, which ones? Should we have all the LTS lines covered?
A lot of questions from my side :)
> Build process: should it include a way to generate the docs from source? To generate part of the docs? The whole docs? PDF too? (Maybe better to discuss that when we talk about the tooling?)
As I mentioned before, the build tools will allow you to build just a subset of files if you want. I don't think HTML, PDF, and JSON generation should be part of the core of the tooling, but they could be added on top of it, such as:
```js
import docTooling from '...';

const result = docTooling.generateDocs();

return myPdfLibrary...
```
We could add all kinds of output generation on top, but the core tooling is responsible for creating a JavaScript object tree with the "metadata" and content aggregated. Initially, the idea is for it to be a JSX buffer (MDX), but we could also just return the result as a JavaScript object with the metadata and content, and then have a plugin that generates MDX, just as we would have, for example, for HTML, PDF, or JSON...
E.g. (of the object) for the `promises` module:
```js
{
  "promises": {
    ... all the metadata fields,
    details: "the content from the Markdown file",
  }
}
```
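A sketch of what such a plugin layer could look like (hypothetical names; the proposal does not pin down this API):

```js
// Hypothetical plugin contract: the core tooling produces a metadata+content
// tree, and each output format is a plugin that maps the tree to a string.
const htmlPlugin = {
  name: 'html',
  render(tree) {
    return Object.entries(tree)
      .map(([name, entry]) => `<section id="${name}">${entry.details}</section>`)
      .join('\n');
  },
};

function generateDocs(tree, plugins) {
  return new Map(plugins.map((plugin) => [plugin.name, plugin.render(tree)]));
}

const outputs = generateDocs(
  { promises: { details: 'the content from the Markdown file' } },
  [htmlPlugin],
);
console.log(outputs.get('html'));
```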
> Versioning docs: do we keep all the versions accessible on the website? How do we easily update across multiple versions? Docs for odd versions, just even ones, or all?
This is not a responsibility for this proposal.
> I am not against YAML, but why not use JSON directly instead of YAML? Is there something technical blocking us from that, or is it DX-related?
YAML is more accessible to write than JSON and easier to read. There is also less overhead during the transition period. JSON is just a JavaScript object and is not really human-friendly (to a certain point) (IMHO).
> Maybe on the tooling part, we should add/ensure full compliance of the docs? A way to test whether a heading ID exists, for example.
If it is not compliant, it wouldn't even build (it would give an error), but this should not be a responsibility of the tooling; it could be part of the build process by using tools such as remark and ESLint, for example.
> YAML is more accessible to write than JSON and easier to read
I think that's debatable; YAML can be very hard for humans as well (e.g., multiline string syntax is non-intuitive, and the type guessing means one sometimes mistakes a string for a number, etc.). Other markup languages, such as TOML or JSON, do not have those problems. I'm not saying those are deal-breakers for using YAML, or that we should not consider YAML for this use case, but I think we should not disregard the problems of that syntax.
> (e.g., multiline string syntax is non-intuitive, and the type guessing means one sometimes mistakes a string for a number, etc.)
Gladly, none of those apply to our schema 😛
> Other markup languages, such as e.g. TOML or JSON, do not have those problems. I'm not saying those are deal breakers for using YAML, or that we should not consider YAML for this use-case, but I think we should not disregard the syntax problems.
Every markup language has its pros and cons. I just personally (please take it with a grain of salt) believe that, in this case, the pros of using YAML win out.
Thanks for the comprehensive proposal!
I think this:

> Example of a folder structure with all files

will definitely help me understand/consume what you are suggesting.
It seems like the "move the YAML to a separate file" part can happen pretty much at any time as long as someone is willing to update the relevant tooling. Would it be beneficial to do this right away so that there's one less structural change to make the rest of this proposal happen?
> It seems like the "move the YAML to a separate file" part can happen pretty much at any time
Hmm, given the way the YAML is structured right now in the Markdown, there would possibly be no benefit in extracting it alone. The proposed YAML structure needs to be implemented at least to a certain degree first.
I also think I got tasked with making a demo repository with example contents 🤔
@ovflowd we had discussed an example of what the directory would look like for a single API; is that what you meant about a demo repository with example contents?
Yup, pretty much!
I had a meeting with @mhdawson, and here's the execution plan for this proposal:
- Write a tool to convert the old doc format (the files from `doc/api`) to the proposal format described here. This can pretty much be reused from here.
  - The tooling can be updated so that, instead of outputting an MDX file, it gathers the data and performs all the operations to output the metadata in YAML, split the Markdown files, and create the new folder structure.
- Update the GitHub Actions workflows to introduce a new linting step that always runs the new tooling in a "staging/dry-run" fashion but fails if anything is invalid (see the workflow sketch after this list). This is useful for enforcing that any new changes to the doc files conform with the doc standards.
- Introduce new core tooling for parsing the new doc file format, plus sub-modules (plugins) to generate output in numerous formats such as:
- HTML (Plain HTML output to mimic current doc generation)
- JSON (As the current JSON format)
- MDX (For the new Website)
- Sniff test to check if the generated HTML files, MDX files and JSON files work correctly and test if the tooling is working.
- Switchover to the new docs format by making a big-bang PR (runs the converter) with all the file changes.
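For illustration, the dry-run linting step from the second item could look roughly like this as a workflow. The tool name and flags below are placeholders, not a real CLI:

```yaml
# Hypothetical workflow: fail the PR if changed docs don't conform
# to the new doc standards (tool name and flags are placeholders).
name: lint-api-docs
on:
  pull_request:
    paths: ['doc/api/**']
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: 18
      - run: npx node-api-tool --dry-run doc/api/
```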
Original source: https://docs.google.com/document/d/1pRa7mqfW79Hc_gDoQCmjjVZ_q9dyc2i7spUzpZ1CW5k
@mhdawson I'm going to proceed with the demo (example) (mentioned here #166 (comment)) very possibly during December.
Following the discussion during the last Next-10 meeting, it could be great to create another meeting/discussion channel and only keep a status update during the Next-10 meeting.
This topic being really complex and having a lot of impact, it would take over and "block" other global topics. What do you think, @ovflowd? Also, because you are leading this initiative, when would be the best time for you? (We can discuss it on Slack; it could be easier.)
Once the demo is in place, I'll get a presentation to the TSC onto the TSC agenda, likely at a meeting in Jan.
> @ovflowd? Also, because you are leading this initiative, when would be the best time for you? (We can discuss it on Slack; it could be easier.)
Hmm, let's talk about this on the next Next-10 meeting so we can get in sync about this! :D
OK great, but I don't see what it brings compared to the docs on nodejs.dev, except more files to manage?
> OK great, but I don't see what it brings compared to the docs on nodejs.dev, except more files to manage?
I don't want to sound rude, but I think you lack the context behind this proposal 🤔
The API Docs you see on https://nodejs.dev are generated through a script that processes the source API Documentation files. This proposal aims to address several long-standing issues from those files that are the source of the documentation.
And to answer your question: yes, there are more files to manage. The pros and cons are all outlined in the proposal.
What I meant was: if we wrote the docs (on nodejs/node) like on nodejs.dev, wouldn't it be easier?
And you're not rude at all
> What I meant was: if we wrote the docs (on nodejs/node) like on nodejs.dev, wouldn't it be easier?
Nope, it wouldn't be easier at all. The current files on nodejs.dev are "generated" ones, meaning that they're generated to be compatible with a technology we use called MDX. Think of them as the output of a build system. They're no improvement at all for the developer experience of the average Node.js contributor.
I didn't realize it was an MDX file.
So I endorse your idea!
> OK great, but I don't see what it brings compared to the docs on nodejs.dev?
Anything that requires core developers to have to go to a different repo to see what doc changes will look like is a dealbreaker. Anything that requires more work for core developers to validate documentation changes than they do right now is a dealbreaker.
So, if you're suggesting "move the nodejs.dev documentation generation process to core, and then core devs can run `make doc-only` like they do now and see what the website will look like", then sure, that's a possibility.
But if you're suggesting that the website have a different process to generate docs than core, and that the docs on the website look different from core unless core devs take an additional step, that's not going to work.
@sheplu @mhdawson here's the repository containing an "example" of how the metadata proposal would look like https://github.com/ovflowd/node-doc-proposal
@nodejs/crowdin-managers What do you think of this change? How will it impact Crowdin?
@AugustinMauroy this has nothing to do with Crowdin...
@ovflowd The question was whether the structure modification will work with the Crowdin tool.
I repeat myself, this has nothing to do with Crowdin.
Crowdin is not even used for the Node.js API docs. And I don't see an easy way of implementing it, nor whether we should for the time being. Also, the Crowdin managers (the people you pinged) only manage the instance.
For your information, Node.js has a Crowdin project for the API docs, but the GitHub integration was broken.
We might have a "group" inside Crowdin, but API Docs were never integrated with Crowdin. I'm quite sure about that, but of course, I could be wrong. Still, this is off-topic @AugustinMauroy, pretty please, let's stay on-topic here.
Has any thought been given as to how we handle the switchover/migration? In particular, how will this affect porting stuff between main and any versions of Node.js on the new system, and LTS/older versions of Node.js on the old one? For example, presently, when we merge something into LTS, the release commit from the LTS release is cherry-picked to `main`, and that (generally) takes care of updating the "added in" metadata.
@richardlau it was written in one of the comments: #166 (comment)
> In particular, how will this affect porting stuff between main and any versions of Node.js on the new system, and LTS/older versions of Node.js on the old one?
As we spoke about, including in Next-10 meetings, the metadata proposal applies only to new versions of Node.js; it is not going to be ported to old versions of the docs (as this is pointless).
> For example, presently when we merge something into LTS the release commit from the LTS release is cherry-picked to main and that (generally) takes care of updating the "added in" metadata.
The idea is to release this proposal on the next LTS version. I'm not sure I got exactly what you're asking here, so it would be nice if you could explain it better :)
@ovflowd I mean that we frequently port things between releases and the main branch. Maybe examples will make this clearer, e.g.:

- Backports, e.g. nodejs/node#44976. This is taking commits from `main` and backporting them to older versions.
- Forward ports, e.g. nodejs/node@a14244c is an example of a release commit for an LTS release being cherry-picked onto `main` to add the changelogs and update the doc metadata.

I really want to minimise any additional work releasers have to do.
If the metadata is now in different formats between the branches being picked from and to, that's extra work to convert between the formats.
> Backports, e.g. nodejs/node#44976. This is taking commits from main and backporting them to older versions.
Well, in this case, the docs of the change on `main`, when backported through cherry-pick, will of course need a during-cherry-pick edit (like when you do an interactive rebase).
It is the pain of transitioning from one standard to another, and of the docs being coupled to the commit of the change itself. I imagine this will not happen often, and as we move forward, all the "backported" and "forward-ported" versions will use the new metadata proposal.
This is another reason why we want to release this together with a semver-major, like v20. Yes, if we need to backport or forward-port things to/from v18, we will need to edit the cherry-pick in-flight, or possibly have a separate commit for the docs.
> If the metadata is now in different formats between the branches being picked from and to, that's extra work to convert between the formats.
I agree, but this is a short term issue as far as I can see.
@ovflowd I'm sorry but I'm afraid this might be a major blocker to the proposal (I apologize but I should have caught it during your presentation on Wednesday).
> I agree, but this is a short term issue as far as I can see.
v18 would be the last LTS version containing the previous docs, and it goes end-of-life in April 2025, granted that this proposal lands in time for v20. Unfortunately, I don't believe this to be a short-term issue.
I'm happy to take some time to chat about it, show how fundamental the backports are to the LTS release lines in our current release model and brainstorm ways to improve that migration story.
Hey @ruyadorno, I don't think this will be an issue at all. The way I see it, to make backports easy and feasible, the tooling that converts the old (current) API doc format to the new format (from this proposal) should also be able to convert back to the old format.
I'm thinking of something like this:
```bash
# generates from the old format to the new one, writing all the generated files to the out directory
node-api-tool -c node/doc/api/buffers.md -o out/

# generates from the new format back to the old format
node-api-tool -b -c out/module/buffers -o out_old_format/
```
It's just an example, but this could at least automate backporting to the old doc format. Note that forward-porting is not an issue because the proposal already aims to have tooling for transforming the old format into the new one.
What do you think?
I believe the workflow we need to preserve for backporting is the ability to `git cherry-pick` any commit that touches documentation on `main` back to prior release lines. We rely on automation and scripts that do this for us. If the proposal results in us hitting a conflict each time we backport a documentation change from `main`, and having to manually apply the diff to a different file in the tree, that would be a significant amount of added work for releasers. With potentially ~150 commits per current release, many of which touch documentation (particularly the semver minors), it's a lot of effort to manually apply those changes. And as @ruyadorno mentions, that divergence would need to be handled until the EOL of Node.js 18.

(Sorry, my understanding is limited, but I believe changing the directory structure would impact our ability to `git cherry-pick` back from `main` cleanly.)
From the little I've dug in, the proposal brings some great benefits (appreciate your efforts, @ovflowd!).
Perhaps there's some Git magic/mapping or automation we can create to mitigate that in our tooling, but we'd need to prove it out and have it ready to go. An alternative may(?) be to manually backport/land the proposed new structure to all active release lines at the same time... but that would involve a lot of additional effort and coordination.
> Perhaps there's some Git magic/mapping or automation we can create to mitigate that in our tooling, but we'd need to prove it out and have it ready to go. An alternative may(?) be to manually backport/land the proposed new structure to all active release lines at the same time... but that would involve a lot of additional effort and coordination.
Well, thanks for your insights! Really appreciate it. Here are some ideas we can try to plan out:
- I believe that migrating previous versions of the Node.js API docs to the new format can be done, but it depends on how far back we want to go with backporting the changes. Afaik, v16 is the minimum version where all the API Markdown files consistently follow the current (old) format. v14 already has files not following the format, and things get messier the further back we go.
- If we don't want to migrate older versions, we can still add the "CLI" tool I mentioned to the backporting workflow. If we have a workflow that backports doc files, we can make it (bash script? JS script?) execute the CLI while doing an interactive cherry-pick. This means, of course, that the original commit hash for the cherry-pick will differ, but it would require zero manual work.

Let me know what you think :)
> I believe that migrating previous versions of the Node.js API docs to the new format can be done, but it depends on how far back we want to go with backporting the changes. Afaik, v16 is the minimum version where all the API Markdown files consistently follow the current (old) format. v14 already has files not following the format, and things get messier the further back we go.
I was just thinking about the timelines; this may be a reasonable option. At the point when Node.js 20 is released, the release lines may be in a state where it's manageable to only backport the proposal to Node.js 18:
- Node.js 19 - likely to have no more releases after April, EOL by June 2023.
- Node.js 18 - still in active development, will have regular backports.
- Node.js 16 - maintenance, EOL in September 2023.
- Node.js 14 - likely to have no more releases, EOL in April 2023.

Maintenance releases are typically very small (10-20 commits), so it might be a manageable amount of work to handle the divergence for Node.js 16 for the 5 months until it's EOL in September 2023. Perhaps backporting this proposal only as far back as Node.js 18 is a feasible option.
> If we don't want to migrate older versions, we can still add the "CLI" tool I mentioned to the backporting workflow. If we have a workflow that backports doc files, we can make it (bash script? JS script?) execute the CLI while doing an interactive cherry-pick. This means, of course, that the original commit hash for the cherry-pick will differ, but it would require zero manual work.
I think I'd need to think about this in more detail and maybe trial it out, but yeah, something like this may work so long as we can keep a handle on the individual/logical commits.
(cc: @nodejs/releasers, perhaps @targos has thoughts)
I agree that we'll need a solution for Node.js 18, and could possibly manage without one for the remainder of Node.js 16 (assuming the change lands for Node.js 20).
I think the best would be to backport the refactor to Node.js 18 (not necessarily at the same time as v20, but we should schedule a release for it). I agree that we don't have to care too much about v14 and v16.
Could we have two versions of the same documentation available? Keeping v18 in its current state (to help the release team keep their tools and processes) while we work on it, and having a v18-beta with the new way of handling the documentation?
Not sure if this would require a lot more work, but it could be used as a way to slowly release this without bad impact.
> Could we have two versions of the same documentation available? Keeping v18 in its current state (to help the release team keep their tools and processes)
Hmmm... Is there any reason? Once we have the new tooling, things should be seamless.
> Not sure if this would require a lot more work, but it could be used as a way to slowly release this without bad impact.
I'm not sure about the benefits of maintaining two versions.
> I think the best would be to backport the refactor to Node.js 18 (not necessarily at the same time as v20, but we should schedule a release for it). I agree that we don't have to care too much about v14 and v16.
@targos thanks for the feedback, that sounds appropriate (v18 only)
@mhdawson do we have any progress with the TSC? Can we present the proposal as it is, or do you believe more modifications are needed?
@ovflowd I was just waiting to hear from you that you are ready. Let me know which of the upcoming TSC meetings on https://calendar.google.com/calendar/u/0/embed?src=nodejs.org_nr77ama8p7d7f9ajrpnu506c98@group.calendar.google.com work for you and we can get it set up.
> @ovflowd I was just waiting to hear from you that you are ready.
Just out of curiosity, was I supposed to prepare some sort of material, like slides? Because if yes, I completely missed that 😅
I'm fine with the next meeting on the 25th :)
> Just out of curiosity, was I supposed to prepare some sort of material, like slides? Because if yes, I completely missed that 😅
The key part is that you can explain the proposal, along with the key issues, in a way the TSC members will be able to consume in the limited amount of time that will be available.
Okay, let me figure out some slides then. I'll let you know once I have material ready :)
@ovflowd ack. It is looking like we are going to not have a public section of the TSC meeting this week anyway.
> The current infrastructure for doc generation is non-standard and not easy to contribute/update for newcomers
Is the proposed method standard? Is this particular YAML schema or this API generation mechanism used elsewhere? I like the idea in general, but I find it hard to believe that custom tooling around Markdown and YAML schemas and JSON is going to be more newcomer friendly.
> This can be done quickly by using Markdown-compatible heading IDs
Neither classic Markdown nor CommonMark nor GitHub Flavored Markdown appear to mention heading IDs in their respective specs. Since one of the goals is to move away from non-standard Markdown, which Markdown spec are you relying on?
> Is the proposed method standard? Is this particular YAML schema or this API generation mechanism used elsewhere?
Maybe "standard" wasn't the best choice of words, but you can see that the current tooling is difficult to read/understand and hard to extend. The idea is to create tooling that is easy to extend and well-documented.
> but I find it hard to believe that custom tooling around Markdown and YAML schemas is going to be more newcomer friendly.
The proposed solution is newcomer friendly. The YAML for the metadata is well-defined and easy to understand; of course, I expect iterations to happen on the YAML with feedback over time.
I would also like to mention that the proposed solution allows the following:
- Internationalisation to be done quickly, as the Markdown files only contain the actual human-readable text, without mixing in custom syntax, metadata, internal information, or anything of the sort. This is also 100% supported by the i18n platform we use, Crowdin.
- For newcomers contributing to metadata changes, the YAML is self-explanatory and easy to understand and will have complete documentation of what each field means and does.
> Neither classic Markdown nor CommonMark nor GitHub Flavored Markdown appear to mention heading IDs in their respective specs. Since one of the goals is to move away from non-standard Markdown, which Markdown spec are you relying on?
They're not part of the spec, as they're not strictly Markdown features, but they are part of what many Markdown processors support. In other words, headings alone do not have "anchor" links per the CommonMark specification. But in the "web" world, it is widespread for headings to have anchor links, and every Markdown processor I know of supports the interpretation of custom heading IDs.
It is important to mention that custom heading IDs do not violate the CommonMark specification (they're valid); they're just "part of the heading" for all ordinary purposes. But Markdown processors will use them as custom heading IDs.
Oh! I now recall what I meant by "standard" in "The current infrastructure for doc generation is non-standard and not easy to contribute/update for newcomers", @tniessen.
The current API docs use many non-standard CommonMark features that would crash any Markdown processor. There's a lot of AST manipulation in the doc-to-HTML generation tooling, which is hacky.
You might see that https://nodejs.dev/api/ currently has API docs, which is achievable because I created a "translation" layer from the invalid, non-standard "syntax" that the current docs use to valid, conforming CommonMark. The translation layer is available here, and there are inline documentation blocks that explain what is happening.
There's also a script that I've created that automatically syncs the API docs source to the translation layer available here.
Feedback from TSC meeting (nodejs/TSC#1341).
- @mcollina, @ChALkeR: translations of API docs are hard to keep up to date and it generally does not work, might want to do it for a small subset of languages.
- @mcollina: 3+ separate files for each API are more difficult to maintain than a single markdown file. It also does not make it easier to contribute.
- @tniessen: Questionable if markdown plus YAML plus JSON is really more newcomer-friendly than a single markdown file.
- @tniessen: We already export JSON from the parsed docs. If the goal is to provide a better source of TypeScript definitions, why do we use YAML instead of TypeScript for parameters etc.?
- @BethGriggs: We need to plan how this affects cherry-picking across release lines. Is being discussed in the issue.
- @mcollina: This proposal simplifies/standardises the processing of the docs at the expense of the collaborator experience.
- @mcollina: Static HTML output is a requirement and should be uploaded along with the release. MDX is not sufficient. Much of the added flexibility does not benefit us in practice.
- @tniessen: MDX would only be used by nodejs.org if we use Next.js. Static HTML docs would still be available as before.
- @mcollina: Anchors must remain the same.
- @mcollina: API docs should not be embedded in main website. Availability, bugs, etc. are more likely issues if docs are not separate, static documents without SSR. Dynamic website would be a nightmare. Collaborator workload should be reduced, not increased. Some proposed changes are non-goals.
- @mhdawson: Maybe we should backport this proposal far enough so that we don't regularly need to manually backport doc changes. Otherwise, we can never change the doc structure.
- @richardlau: No definite solution for backporting, but is being discussed in this issue. Focus on active/LTS release lines, not maintenance.
I think this proposal has two fundamental problems:
- it introduces the concept of docs translations. Translations are usually forgotten and not updated, and they make maintenance harder. They also increase the work of maintainers for no user benefit.
- splitting into multiple files will make it harder for collaborators to keep the metadata and docs in sync. Moving from having to edit one file to having to edit several is a regression.
> @tniessen: Questionable if markdown plus YAML plus JSON is really more newcomer-friendly than a single markdown file.
Note that the JSON would be optional, for ICU translations, and the YAML only needed if you have metadata associated with that Markdown file. In most cases, I do not foresee both a Markdown file and a YAML file needing to be updated.
It's friendlier to have the metadata separated in YAML. It's easy to read, update, and work with.
> We already export JSON from the parsed docs. If the goal is to provide a better source of TypeScript definitions, why do we use YAML instead of TypeScript for parameters etc.?
Embedding TypeScript types sounds like a hassle. The YAML is a schema used both for documentation generation and for other purposes, such as TypeScript/IDEs.
> MDX would only be used by nodejs.org if we use Next.js. Static HTML docs would still be available as before.
HTML output will always be available, regardless of MDX.
> @mhdawson: Maybe we should backport this proposal far enough so that we don't regularly need to manually backport doc changes. Otherwise, we can never change the doc structure.
Yup, we could run the conversion tool to update all the docs from v16, v18, v19... to the new format so that if we ever need to backport changes with cherry-picking, we don't need to convert the new format into the old format through the CLI.
> it introduces the concept of docs translations. Translations are usually forgotten and not updated, and they make maintenance harder. They also increase the work of maintainers for no user benefit.
It doesn't. It allows translations to happen as we can use Crowdin for translations, but in essence, the tooling is not designed for translations, but it could easily support them.
> splitting into multiple files will make it harder for collaborators to keep the metadata and docs in sync. Moving from having to edit one file to having to edit several is a regression.
It depends. Currently, metadata is split across different areas with no defined format. For example, we have all sorts of ways to define the return types of a function, the parameters, and all sorts of ways to describe them. It feels tough to navigate the docs. Maybe long-term contributors, such as you, have the hang of it, but I see the issue for newcomers (including myself) 😅
Note that with the new folder structure, each Markdown file will be relatively small (same for the metadata). And with YAML, we can support IntelliSense and auto-complete.
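For example (an assumption about how this could be wired up, not a committed deliverable), publishing a JSON Schema for the metadata format would let editors validate and auto-complete the YAML files via the common yaml-language-server modeline:

```yaml
# Hypothetical: point the editor's YAML language server at a published schema.
# yaml-language-server: $schema=https://nodejs.org/schemas/api-metadata.json
name: 'api/modules/crypto/certificate'
stability: stable
```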
Hey folks 👋 after a lot of thought on the topics pointed out by the TSC, I've come to propose the following changes to the current proposal:
Reconsider supporting Internationalisation for now.
Internationalisation and Accessibility are strong pillars of this project. And while we initially entertained the thought of supporting multiple-language API documentation, it became apparent that keeping it meaningful would be challenging.
- Most online translation tools can already quickly "translate" the pages
- API documentation changes fast. And supporting outdated translated documentation would bring more harm than benefit
- Some technical wording can be hard to get localised and, if not done by experienced people, can have a non-intended meaning.
- API documentation is highly technical; if we want to educate the next generation of developers across language barriers, let's do so by curating or forwarding our audience to the already incredible courses and content created by our community.
Whilst dropping i18n, for now, is a hard decision, we're still going to strive towards a tooling and build system for the new API documentation that is highly hackable. It should be doable if we ever want to support the localisation of our docs.
A single file to win them all.
Whilst i18n would initially have forced us to separate the metadata from the Markdown file, we can now simplify our stack, our tooling, and the current proposal, reducing the footprint created by the new folder structure. This means we can now embed the metadata (schema) as frontmatter (graymatter) in our Markdown files.
This still supports all the unique features of having a new standardised schema for the metadata of our documentation without breaking the CommonMark spec. It would also work with any third-party Markdown parser that supports frontmatter, such as remark.
This is a foundational step in the right direction, allowing us to use highly visible and largely maintained packages behind our tooling.
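As a hypothetical illustration of that change (the field names reuse the schema from this proposal, but the exact shape is still up for discussion), a single Markdown file could then look like:

```markdown
---
name: 'api/modules/crypto/certificate'
source: 'lib/crypto.js'
stability: stable
methods:
  - name: exportChallenge
    stability: deprecated
---

# Certificate {#api/modules/crypto/certificate}

The class description lives here, as plain CommonMark.
```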
Branch PRs and Living-Docs
With the adoption of frontmatter and (still) the proposal of removing all the technical parts (such as method descriptors, method parameters, inline YAML, et cetera) and moving them to our standardised schema, reading our API docs directly through the GitHub UI could become a hassle.
For that, @mcollina suggested that we introduce branch previews through Vercel (if we ever adopt Vercel) and a living version of our API docs (as many other frameworks and tools have), which is a non-semantically-versioned "version" of our API docs that is up-to-date with `main`.
These are the significant changes to the proposal. I want to get the @nodejs/tsc and @nodejs/next-10 opinions on this matter so I might update the description of the Issue and the Demo repository and move the proposal to its next step.
Thank you all for the feedback!
Putting back on the agenda so that @ovflowd can present the updated proposal at the TSC meeting on the 29th.
Thanks for the presentation during today's TSC meeting, @ovflowd. Given that the issue description is outdated, we should probably re-evaluate the benefits and drawbacks of the current proposal.
That being said, it should not be the TSC, nor should it be the website team, making this decision. It should be those who maintain the API documentation. A lot of the discussion on this topic has been speculation, and I'm afraid none of us can endorse or reject or get a feeling for the proposed structure and processes until we have some kind of implementation. Do you have a plan for introducing some of the proposed features in a way that allows us to see if it actually makes maintaining the docs easier?
I think multiple orthogonal aspects are part of the most recent proposal that can be considered separately:
- Moving as much metadata as possible to YAML blocks at the beginning of Markdown documents.
- Splitting Markdown documents into smaller ones.
- Providing some kind of preview because the new documents might be less readable through GitHub UI than the current ones.
I'll just leave a few more questions here, feel free to skip any that you feel are not relevant or have already been discussed:
-
I'm sure we are not the first project trying to improve their doc structure and workflows. Have we considered how this proposal compares to existing approaches?
-
The current infrastructure for doc generation is non-standard and not easy to contribute/update
Could you elaborate on what the new infrastructure would be, and how it would be more newcomer-friendly? I imagine it would still be non-standard.
-
Our API docs use non-conforming Markdown, which is incompatible and not standard.
What exactly would the requirements for the new format be, given that it also does not match any existing Markdown specification? I assume it should be syntactically valid CommonMark (which technically includes header IDs, which are recognized by some popular tools, even if not part of the spec) plus frontmatter (again, not part of the spec, but recognized by some popular tools).
- > - Some Markdown files are way too big. This outright makes the build process complex, and some pages become massive for the Web, being unreasonable for metered internet connections.
  > - Not to mention that from a maintainability standpoint, this is unfeasible.

  Extremely long pages might indeed be an issue for metered internet connections. I don't know why this negatively affects the build process, but I'll take your word for it.

  As a user of Node.js, being able to CTRL+F through an entire subsystem at once is incredibly valuable. The same is true when I update documentation through the GitHub UI or through vim or so. There are many module-level factory functions that then return instances of some class, and these being within the same document seems valuable to me.

  Could you elaborate on the maintainability standpoint?
- > This proposal will also achieve better-generated doc metadata that can be used by projects such as TypeScript

  Have we heard from any such project if they intend to use it?
- Regarding the metadata itself (which is now the main aspect of the proposal), you mentioned that maintaining TypeScript types instead of this custom metadata would be a hassle. Then again, if this metadata is supposed to be accurate and potentially used to derive types from it, then it must be at least as expressive as the subset of relevant TypeScript types. I am curious how we are going to represent complicated function signatures. For example, `tls.connect()` accepts a variety of options; some are inherited from other APIs, some are overridden. In TypeScript, this can be represented easily, and I am curious how we intend to represent it in this custom metadata format without effectively replicating TypeScript types.
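To make the shape of that problem concrete, here is a simplified sketch (deliberately not the real lib.d.ts definitions, and the names are illustrative) of how TypeScript expresses options inherited from another API and then extended or overridden:

```ts
// Simplified sketch; these are NOT the real Node.js type definitions.
interface SocketConnectOptions {
  host?: string;
  port?: number;
  timeout?: number;
}

// tls.connect()-style options: everything from the socket layer is inherited,
// while TLS-specific options are added on top.
interface TlsConnectOptions extends SocketConnectOptions {
  rejectUnauthorized?: boolean;
  servername?: string;
}

declare function connect(options: TlsConnectOptions): void;
```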
@tniessen I haven't forgotten about you! Just haven't had time to go through your text yet 🙇
Hey @tniessen, I'm a little bit busy with other things right now, but I talked with a few Core Collaborators about some of the changes I'm planning to make to the API Docs Proposal. I will give you a detailed answer to your questions soon!
But TL;DR:
> That being said, it should not be the TSC nor should it be the website team making this decision. It should be those who maintain the API documentation. A lot of the discussion on this topic has been speculation, and I'm afraid none of us can endorse or reject or get a feeling of the proposed structure and processes until we have some kind of implementation. Do you have a plan for introducing some of the proposed features in a way that allows us to see if it actually makes maintaining the docs easier?
Yes, there's a development plan for gradual adoption of the tooling, followed by testing/experimenting with the new folder structure, navigation structure, and actual changes to the Markdown files.
> I'm sure we are not the first project trying to improve their doc structure and workflows. Have we considered how this proposal compares to existing approaches?
Other projects use API Docs structures that are far different from ours, such that adopting a similar model would be unfeasible without radical changes close to a full rewrite of the API Docs (current structure et cetera).
> Could you elaborate on what the new infrastructure would be, and how it would be more newcomer-friendly? I imagine it would still be non-standard.
The current API Doc tooling is, at best, complex. It mixes a few different technologies and does a lot of manual interpolation, regexes, manipulation and tree modifications (AST).
The new tooling aims to use standard libraries without extra hacks, and to follow clean code and best practices to keep the codebase maintainable and hackable.
It would become newcomer-friendly by being highly documented, simple, and well-structured, following best practices. I know this is a vague definition, but it's honestly what I got.
> What exactly would the requirements for the new format be, given that it also does not match any existing Markdown specification? I assume it should be syntactically valid CommonMark (which technically includes header IDs, which are recognized by some popular tools, even if not part of the spec) plus frontmatter (again, not part of the spec, but recognized by some popular tools).
Indeed, by using Graymatter we're deviating from the official spec, but the goal is for the contents of the Markdown itself to stay within the specification. We currently have outright invalid CommonMark/Markdown within the Markdown itself.
With the metadata in Graymatter, we de-pollute the Markdown and make it spec-conforming.
> Extremely long pages might indeed be an issue for metered internet connections. I don't know why this negatively affects the build process, but I'll take your word for it.
It's more about the tooling involved: how each Markdown compiler manages memory and how the actual bundling of each page happens.
The actual concerns that I want to address:
- We want to break down the content into its smallest unit (e.g. classes).
- It makes content easier to find and makes it easier to keep references from where you are
- Even with an editor such as Vim, you can easily have different files from the same module open and search across them
- The tooling can still generate aggregated versions of a module (appending all directory files into one file). TL;DR: this is more of a UX approach and depends on how we implement Navigation, Search and other aspects of the new API Docs pages.
- This process itself is detached from the tooling, and the tooling aims to be backwards compatible with the current HTML template we generate on nodejs.org/api
- This also applies to metered connections and even service workers; it's not the tooling that will be responsible for deciding these aspects but the consumers of the tooling, as the tooling aims to be a "pipe" that outputs metadata and allows consumers to generate outputs as they wish (JSON, fancy MDX, regular MDX with everything together, HTML pages, etc.); see the sketch after this list.
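A minimal sketch of what that "pipe" contract could look like (all names here are hypothetical, not an agreed API):

```ts
// Hypothetical shapes only: the smallest unit the tooling emits, plus
// pluggable generators that consumers compose for their own output format.
interface ApiDocUnit {
  kind: "module" | "class" | "method" | "global" | "constant";
  heading: string;
  markdown: string;                   // spec-conforming CommonMark body
  metadata: Record<string, unknown>;  // parsed Frontmatter + type info
}

type OutputGenerator<T> = (units: ApiDocUnit[]) => T;

// A consumer decides whether to emit JSON, aggregated pages, etc.:
const toJson: OutputGenerator<string> = (units) =>
  JSON.stringify(units, null, 2);

const toAggregatedMarkdown: OutputGenerator<string> = (units) =>
  units.map((u) => u.markdown).join("\n\n");
```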
> Could you elaborate on the maintainability standpoint?
Smaller files that are well indexed and organised (folder structure) are easier to maintain. This is, of course, non-scientific and subjective. Still, contributors would have an easier time finding/editing/updating/maintaining content in smaller files than in gigantic ones.
> Have we heard from any such project if they intend to use it?
Microsoft's TypeScript team was interested when we discussed this at Collab Summit 2022 in Dublin.
> Regarding the metadata itself (which is now the main aspect of the proposal), you mentioned that maintaining TypeScript types instead of this custom metadata would be a hassle. Then again, if this metadata is supposed to be accurate and potentially used to derive types from it, then it must be at least as expressive as the subset of relevant TypeScript types. I am curious how we are going to represent complicated function signatures. For example, `tls.connect()` accepts a variety of options; some are inherited from other APIs, some are overridden. In TypeScript, this can be represented easily, and I am curious how we intend to represent it in this custom metadata format without effectively replicating TypeScript types.
I pondered a lot about this and had the pleasure of chatting with some Core Collaborators/TSC members to get their feelings on the latest changes I intend to incorporate based on your comment.
The latest revision for the API Metadata Proposal would include the following:
- Instead of having API-specific metadata (such as method parameters, return options, aka the actual type of each class, global, method) at the top of the Markdown file as Graymatter, it would be an inline TypeScript code block.
- For each heading (class, global, constant, module, method) definition, an immediate code block would follow, containing the type definition of whatever the heading describes.
- For example, `http.request` (https://nodejs.org/api/http.html#httprequestoptions-callback) would be followed by this code block:

```ts
interface RequestObject {
  // Controls [Agent](https://nodejs.org/api/http.html#class-httpagent) behavior.
  // Possible values:
  //   undefined (default): use [http.globalAgent](https://nodejs.org/api/http.html#httpglobalagent)
  //     for this host and port.
  //   Agent object: explicitly use the passed-in Agent.
  //   false: causes a new Agent with default values to be used.
  agent: http.Agent | boolean;
}

interface Request {
  url: URL | string;
  options: Object;
}
```
- These code blocks would be generated by the migration tooling, which would infer and generate the TypeScript code blocks as closely as possible to the text's description.
- For the actual tooling, we will use a library that converts TypeScript types into JSON objects, and those get forwarded to the "pipe" as part of the metadata of that method (see the sketch after this list)
- Other metadata, such as "source link", "history table", "when added", etc., will still be represented at the top (Graymatter).
- This addresses concerns such as a whole new schema being hard to adopt: since TypeScript is well documented, contributors will have an easier time updating a class method's "API metadata", as they will only write/update a TypeScript code block.
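As one possible shape for that conversion step (a sketch using the TypeScript compiler API directly; the actual library choice is still open), turning such a code block into plain JSON-ready metadata could look like this:

```ts
import ts from "typescript";

// Sketch: parse a TypeScript code block lifted from a doc page and emit a
// plain object mapping each interface to its members and their type text.
function typeBlockToJson(block: string): Record<string, Record<string, string>> {
  const file = ts.createSourceFile("block.ts", block, ts.ScriptTarget.Latest, true);
  const out: Record<string, Record<string, string>> = {};
  file.forEachChild((node) => {
    if (ts.isInterfaceDeclaration(node)) {
      const members: Record<string, string> = {};
      for (const member of node.members) {
        if (ts.isPropertySignature(member) && member.type) {
          members[member.name.getText(file)] = member.type.getText(file);
        }
      }
      out[node.name.text] = members;
    }
  });
  return out;
}

// typeBlockToJson("interface Request { url: URL | string; options: Object; }")
// -> { Request: { url: "URL | string", options: "Object" } }
```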
@ovflowd I like the idea of using TypeScript, but I'm having a bit of trouble mapping it to the example. In the case of `options`, would we not be able to do much better than just `Object`?
@ovflowd it's been a few months so I don't completely remember the context, but I think I meant to ask whether we could be more specific than just `Object`: instead, an object with a list of properties, each of which is of type X.
@ovflowd I've added the tsc-agenda. The next meeting is at 9 ET on Wednesday, June 14th, so if you can make that time we can plan to have you present an update then.
> @ovflowd it's been a few months so I don't completely remember the context, but I think I meant to ask whether we could be more specific than just `Object`: instead, an object with a list of properties, each of which is of type X.

I'd say that's what the types are about? To describe the actual methods and properties of each class? 🤔 The example above is just a sample snippet taken from one of our API docs.
Also, @mhdawson thanks, I'll attempt to attend this week's TSC meeting then. (June 14th)
cc @nodejs/tsc for anyone that didn't join the meeting today to give a last round of feedback, as in the meeting we agreed to move this proposal to its next stage:
- This issue description will get updated with the gist of the latest agreed information
- The demo repository will also get updated with the latest agreed info
- A discussion will be opened on nodejs/collaborators with an easy-to-understand, summarised version of this proposal with a clear description, goals, next steps, what it means for a collaborator to 👍 the proposal, etc.
Please 👎 if you still have strong disagreements with the proposal (please read the latest comments, as the body of the issue has not been updated yet)
I'll leave one more day to see if we get any rejection, but it seems that so far the TSC is OK with this.
If we don't get any rejection by the end of Saturday (UTC), this proposal is moving to its next step, as mentioned above.
@ovflowd Do you consider the issue description to be up-to-date? I'm asking because in your previous comment you said it was not updated yet, but it still hasn't been updated according to GitHub. I'd like to be sure I review the right version of the proposal.
Hey @targos, as I mentioned, the issue body/description is not updated yet. The next step is to update it and the demos.
> I'd like to be sure I review the right version of the proposal.
On a side note, any TSC member can still object to the proposal during the next stages, just like any other collaborator. We are only looking for consensus that the proposal is ready to be discussed with the broader collaborator base at this point :)
Exactly as Tobias described 🙃
Removing agenda tag at suggestion of Claudio, will re-add once there is something to discuss again.
cc @nodejs/next-10 as the Node.js Website Redesign is virtually done, we're now focusing on transforming it into a Monorepo. Then I'll start working on the fundamental redesign of the API Docs Website. This would also include a revamp of the current build tooling.
Note that these changes won't touch any of the source Markdown files. At the moment, the changes mentioned above will be an intermediate step that would allow us to implement the "Metadata Proposal for Docs".
The stages at the moment are:
- Transform Node.js Website into a Monorepo with a shared package for the UI Components
- Update the Node.js API Docs build tooling, inspired by https://github.com/nodejs/nodejs.dev/blob/main/util-node/getApiDocsData.js, which will generate MDX at build time and output it into HTML
- The idea here is that all the pages will remain the same and be static.
- We will be using some utilities from the Node.js Website repository, which will (see the sketch after this list):
- Grab the original Markdown source with the YAML metadata
- Transform it into MDX at build time (in-memory)
- Compile and render the MDX into JSX (in-memory)
- Use ReactDOM to transform the JSX into HTML
- Generate the HTML files by appending the rendered HTML to the HTML templates
- In this initial iteration, only the tooling will be updated; the styles will remain the same. This is pretty much what we did on the Node.js website: We first migrated the infrastructure/tooling into something new.
- Note that we won't be using Next.js, just plain React for starters. We might switch to a Framework if needed.
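A rough sketch of the in-memory part of that pipeline (assuming @mdx-js/mdx and react-dom/server as the underlying packages; the final choices may differ):

```ts
import { evaluate } from "@mdx-js/mdx";
import * as runtime from "react/jsx-runtime";
import { createElement } from "react";
import { renderToStaticMarkup } from "react-dom/server";

// In-memory pipeline: MDX source -> compiled React component -> static HTML.
// The resulting HTML string would then be appended into the existing
// nodejs.org/api HTML templates.
async function mdxToHtml(mdxSource: string): Promise<string> {
  const { default: Content } = await evaluate(mdxSource, { ...runtime });
  return renderToStaticMarkup(createElement(Content));
}
```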