gohugoio/hugo

Build pages from data source

regisphilibert opened this issue · 83 comments

Currently Hugo handles internal and external data sources with getJSON/getCSV, which is great for using a data source in a template.

But Hugo cannot take a data set of items and build a page for each of them, plus the related list pages, the way it does from files in the content directory.

Here is a first attempt at speccing this important step in the roadmap.

As a user, I can only see the configuration aspect of the task.

I don’t see many configuration-related issues beyond the mapping of the key/values collected from the data source and the obvious external or internal endpoint of the data set. The following are suggestions for how users could manage those configurations, followed by a code block example.

Endpoint/URL/Local file

Depending on the use case, one or several URLs/paths may be needed.

For many projects, not every page type (post, page, etc.) will be built from the same source. The type could be defined from a data source key or as a source parameter.

I suppose there could be other parameters per source.

Front Matter mapping

Users must be able to map the keys from the data source to Hugo’s commonly used Front Matter variables (title, permalink, slug, taxonomies, etc.).
Every key not referenced in the mapping configuration could be stored as-is as user-defined Front Matter available in the .Params object, but this should not be the default, as there may be far too many.
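To make the mapping idea concrete, here is a rough sketch (not actual Hugo code; all names are illustrative) of how such a key map could be applied to one raw data item, with dotted paths on either side resolving into nested maps:

```go
package main

import (
	"fmt"
	"strings"
)

// applyMapping copies values from a raw data item into a front-matter map,
// following a user-defined mapping of front-matter keys to source keys.
// Dotted keys like "Params.location.city" denote nested maps.
func applyMapping(item map[string]any, mapping map[string]string) map[string]any {
	fm := map[string]any{}
	for fmKey, srcKey := range mapping {
		if v, ok := lookup(item, srcKey); ok {
			set(fm, fmKey, v)
		}
	}
	return fm
}

// lookup walks a dotted path like "post_meta.city" through nested maps.
func lookup(m map[string]any, path string) (any, bool) {
	var cur any = m
	for _, p := range strings.Split(path, ".") {
		mm, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		if cur, ok = mm[p]; !ok {
			return nil, false
		}
	}
	return cur, true
}

// set writes a value at a dotted path, creating intermediate maps.
func set(m map[string]any, path string, v any) {
	parts := strings.Split(path, ".")
	for _, p := range parts[:len(parts)-1] {
		next, ok := m[p].(map[string]any)
		if !ok {
			next = map[string]any{}
			m[p] = next
		}
		m = next
	}
	m[parts[len(parts)-1]] = v
}

func main() {
	item := map[string]any{
		"post_title": "Hello",
		"post_meta":  map[string]any{"city": "Montreal"},
	}
	mapping := map[string]string{
		"Title":                "post_title",
		"Params.location.city": "post_meta.city",
	}
	fmt.Println(applyMapping(item, mapping))
}
```

Unmapped keys would simply be skipped unless something like the `grabAllFrontMatter` toggle below is set.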

Example:

This is a realtor agency, a branch of a bigger one.

Their pages are built from Hugo's local Markdown files.

They have an old WordPress site with more than 100 blog posts they did not want to convert to Markdown, so they load those posts from a local data file on top of Hugo's own Markdown posts.

They use a third-party service to create job posts when they need to fill a new position, but they want to host those job listings on their own site. Their jobs are served by https://ourjobs.com/api/client/george-and-son/jobs.json

The most important part of the website is their realty listings. They add their listings to their parent company's website, whose API in turn serves them at https://api.mtl-realtors/listings/?branch=george-and-son&status=available

Configuration

title: George and Son (A MTL Realtors Agency)

dataSources:
  - source: data/old_site_posts.json
    contentPath: blog
    mapping: 
      Title: post_title
      Date: post_date
      Type: post_type
      Content: post_content
      Params.location.city: post_meta.city
      Params.location.country: post_meta.country

  - source: https://ourjobs.com/api/client/george-and-son/jobs.json
    contentPath: jobs
    mapping: 
      Title: job_label
      Content: job_description

  - source: https://api.mtl-realtors/listings/?branch=george-and-son&status=available
    contentPath: listings/:Type/
    grabAllFrontMatter: true
    mapping: 
      Type: amenity_kind
      Title: name
      Content: description
      Params.neighbourhood: geo.neighbour
      Params.city: geo.city

Content structure

This results in a "shadow" content structure. Solid-line dirs/files are local, while dashed ones are remote.

content
├── _index.md
├── about.md
├── contact.md
├── blog
│     ├─── happy-halloween.md
│     ├─── merry-christmas.md
│     ├- - nice-summer
│     └- - hello-world
├- -listings
│     ├- - appartment
│     │   ├- - opportunity-studio
│     │   ├- - mile-end-condo
│     │   └- - downtown-tower-1
│     └- - house
│         └- - cottage-green
└- - jobs
      ├- - marketing-director
      └- - accountant-internship
bep commented

Thanks for starting this discussion. I suspect we have to go some rounds on this to get to where we want.

Yes, we need field mapping. But when I thought about this problem, I imagined something more than a 1:1 mapping between an article with a permalink and some content in Hugo. I have thought about it as content adapters. I think it even helps to think of the current filesystem as a filesystem Hugo content adapter.

So, if this is how it looks on disk:

content
├── _index.md
├── blog
│   └── first-post
│       ├── index.md
│       └── sunset.jpg
└── logo.png

What would the above look like if the data source was JSON or XML? Or even WordPress?

It should, of course, be possible to set the URL "per post" (like it is in content files), but it should also be possible to be part of the content section tree with permalink config per section, translations etc.. So, when you have 1 content dir + some other data sources, it ends up as one merged view.

As most data sources are usually a flat list of items, I suppose building the content directory structure will require some more mapping.

There are the type and section keys to be used, as well as maybe others, to help position the item in the content structure.
There could also be a url source parameter designed the same way as the global config one, except it would take one of the mapped keys as a pattern (I'll update my example after this):

url: /:Section/:Title/

I suppose there is no way around having many source configuration params/mappings that Hugo may need in order to best adapt the data source to the desired structure, maybe even some pattern/regex/glob, like the url suggestion above.
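As a sketch of how such a pattern could be expanded (purely illustrative; `expandURL` and `slugify` are hypothetical helpers, not Hugo functions):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// expandURL replaces :Key placeholders in a permalink pattern with
// values from the mapped front matter, slugified for use in a path.
func expandURL(pattern string, fm map[string]string) string {
	re := regexp.MustCompile(`:(\w+)`)
	return re.ReplaceAllStringFunc(pattern, func(ph string) string {
		return slugify(fm[ph[1:]]) // ph[1:] strips the leading colon
	})
}

// slugify is a deliberately naive stand-in for a real slug function.
func slugify(s string) string {
	s = strings.ToLower(strings.TrimSpace(s))
	return strings.ReplaceAll(s, " ", "-")
}

func main() {
	fm := map[string]string{"Section": "Listings", "Title": "Mile End Condo"}
	fmt.Println(expandURL("/:Section/:Title/", fm)) // → /listings/mile-end-condo/
}
```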

As for the default structure: if there is no configured data source with a type parameter of blog, then Hugo will build it from content; the rest would be built from the data source (supposing we have a Page Bundle toggle and media mapping). See this real content merged with the data source's "phantom" structure:

content
├── _index.md
├── blog
│   └── first-post
│       ├── index.md
│       └── sunset.jpg
├ - - - recipe  (from data source)
│        └- - - first-recipe
│               ├ - - - index                
│               └ - - - cake-frosting.jpg
└── logo.png

@bep now I understand more fully what you meant (I think). The config needs to tell Hugo how to model the content structure so it can build its pages from that.
In a sense we are not building pages from a data source; we are building a content structure from both local content and remote data sources, which Hugo will then interpret and build pages from.

To reflect this, I added a better project example to the description to illustrate both the configuration possibilities and the resulting "content" structure.
This is a project we can add to in order to maybe better spec what this feature should achieve.

bep commented

@regisphilibert I have been thinking about this, and I think the challenge with all of this isn't the mapping (we can experiment until we get a "working and good looking scheme"), but more the practical workflow -- esp. how to handle state/updates.

  • As an editor, I would love it if my site (including content) was as static as possible at commit time (v1.3.0 of Hugo Times is this).
  • That is, if I, the editor, looked at the Netlify preview on GitHub and pushed merge, I would be sadly disappointed if I then ended up with something completely different.
  • I think this is an often overlooked quality of static sites: Versioned content.

I understand that in a dynamic world with JS APIs etc., the above will not be entirely true, always. But it should be a core requirement whenever possible.

A person in another thread mentioned GatsbyJS's create-source-plugin.

I don't think their approach of emulating the file system is a good match for Hugo, but I'm more curious about how they pull in data.

Ensure local data is synced with its source and 100% accurate. If your source allows you to add an updatedSince query (or something similar) you can store the last time you fetched data using setPluginStatus.

This is me guessing a little, but if I commit my GatsbyJS with some create-source-plugin sources to GitHub and build on Netlify, those sources will be pulled completely on every build (which I guess also is sub-optimal in the performance department). I suspect setPluginStatus is a local thing and the updatedSince is a way to speed up local development.

Given the above assumptions, the Gatsby approach does not meet the "static content" criteria above. I'm not sure how they can assure that the data is "100% accurate", but the important part here is that you have no way of knowing if the source has changed.

So, I was tinkering with these ideas:

  1. Adding an sqlite3 database as a "build cache"
  2. Adding a "prepare step" (somehow) that exports the non-file content sources out into a merge-friendly text format (i.e. consistent ordering etc.)

The output of 2) is what we use to build the final site.

There are probably some practical holes in the above. But there are several upsides. sqlite3 has some very interesting features (which could enable more cool stuff), so if you wanted to make that the "master", you could probably edit your data directly in the DB, and you could probably drop the "flat file format" and put your DB into source control ... This is me thinking out loud a little.

That is, if I, the editor, looked at the Netlify preview on GitHub and pushed merge, I would be sadly disappointed if I then ended up with something completely different

I'm not sure about this. And I apologize in advance if my limited understanding of the technology/features at hand biases my view.

I guess most of the use cases for this will be using Contentful, the WordPress REST API, or Firebase to manage your content, and letting Hugo build the site from this remote source plus maybe a few other ones (remote and local).
In this use case, the editor will not see Markdown and probably not the Netlify preview or that merge button, but only the Contentful or WordPress dashboard, and will create/edit their content from there.
When a new page is published out of the draft zone, the editor will expect it to be visible on the site with little regard for the repo status. On bigger sites where several editors work at the same time, Hugo's build speed will help make sure the website can be "refreshed" often in order to keep up with content editing.

But this does not change the fact that we need caching and the ability to tell the difference between the cached source and the remote one efficiently.

To handle the "when" (by this I mean the decision between calling the remote source or using the cached one), I was thinking of a per-source setting indicating how often it should be checked.
If the setting is one hour, Hugo would check the cached source's timestamp and, if older than one hour, call the remote. It would then use and cache the remote source only if it differs from the cached one (maybe using a hash to compare cached vs. remote?).

I'm not sure I understand the process described with sqlite3. Would this mean having a database inside Hugo? 🤔

bep commented

My talk about "database etc." clutters the discussion. This process cannot be stateless/dumb, was my main point. With 10 remote resources, some of them possibly out of your control, you (or I) would want some kind of control over:

  1. If it should be published.
  2. Possibly also when it should be published.

None of the above allows for a simple "pull and push". So, if you do your builds on a CI server (Netlify), but do your editing on your local PC, that state must be handled somehow so Netlify knows ... what. Note that the answer to 1) and 2) could possibly be to "publish everything, always", if that's your cup of tea.

Note that the answer to 1) and 2) could possibly be to "publish everything, always", if that's your cup of tea.

Yeah, maybe some people want that or will default to it, but offering more control is definitely a must-have, I think.

So, if you do your builds on a CI server (Netlify), but do your editing on your local PC, that state must be handled somehow so Netlify knows ...

True, but I didn't really see it as Hugo's business. In my mind, a CI pipeline would have to be put in place above Hugo.
So when the source is edited (using Contentful or another service), the CI is notified and can run something like hugo --fetch-source="contentful".

Or simple cron jobs (I don't know what to call those in the modern JAMstack) could be set up so the website is built every hour with hugo --fetch-source="contentful" and every day with hugo --fetch-source="events,weather".

bep commented

OK, I'm possibly overthinking it (at least for a v1 of this). But for the above to work at speed and for big sites, you need a proper cache you can depend on. I notice the GatsbyJS WordPress plugin saying that "this should work for any number of posts", but if you want this to work for your 10K WP blog, you really need to avoid pulling down everything all the time. I will investigate this vs Netlify and CircleCI.

but if you want this to work for your 10K WP blog, you really need to avoid pulling down everything all the time

Yes. Time is of the essence!
I can't imagine how long Gatsby would take to build a 10K WP blog, considering it already takes 18s to build the hello-world starter kit.

And this is precisely why big content projects want to turn to Hugo.

After spending some time playing with the friendly competition and its data source solutions,
it becomes apparent that one of the biggest challenges of the current issue (now that Front Matter mapping will be taken care of by #5455) will be how the user can define everything Hugo needs to know in order to

  1. efficiently connect to a remote or local data source,
  2. retrieve the desired data,
  3. and merge it into its content system (path etc...).

3 will be unique to each project and potentially to each source.
On the other hand, 1 and 2 will be, for the most part, constant for many data sources, like the WordPress API or Contentful.
For example, for a source of type WordPress REST API, Hugo will always use the same set of endpoints, plus a few custom ones potentially added by the user.
It will also systematically use the same parameter to fetch paginated items.

We could group the settings of 1 and 2 into one Data Source Type (DST).
Then, in line with Output Formats and MediaTypes, any newly defined Data Source could use X or Y Data Source Type.

This way any DST could be potentially:

  • Reusable within one project without repeating the same lengthy settings (e.g. 2 different WordPress APIs for one website)
  • Shared among users as setting files.
  • Built-in

Rough example of DataSourceType/DataSources settings:

DataSourceTypes:
  - name: wordpress
    endpoint_base: wp-json/v2/
    endpoints: ['posts', 'page', 'listings']
    pagination: true
    pagination_param: page=:page
    [...]

DataSources:
  - source: https://api.wordpress.blog.com/
    type: wordpress
    contentPath: blog/
    [...]
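To illustrate, the `pagination_param: page=:page` setting above could drive a fetch loop along these lines (a hypothetical sketch; the HTTP call is stubbed out so only the loop's shape is shown):

```go
package main

import (
	"fmt"
	"strings"
)

// fetchAll walks a paginated endpoint by substituting the page number
// into the configured pagination parameter until a page comes back empty.
// fetchPage is injected so the loop itself can be exercised offline.
func fetchAll(base, endpoint, paginationParam string, fetchPage func(url string) []string) []string {
	var items []string
	for page := 1; ; page++ {
		param := strings.ReplaceAll(paginationParam, ":page", fmt.Sprint(page))
		url := fmt.Sprintf("%s%s?%s", base, endpoint, param)
		batch := fetchPage(url)
		if len(batch) == 0 {
			break // an empty page marks the end of the collection
		}
		items = append(items, batch...)
	}
	return items
}

func main() {
	// Fake two pages of results to show the loop's behavior.
	pages := map[string][]string{
		"https://api.wordpress.blog.com/wp-json/v2/posts?page=1": {"a", "b"},
		"https://api.wordpress.blog.com/wp-json/v2/posts?page=2": {"c"},
	}
	fake := func(url string) []string { return pages[url] }
	got := fetchAll("https://api.wordpress.blog.com/", "wp-json/v2/posts", "page=:page", fake)
	fmt.Println(got) // [a b c]
}
```

A built-in `wordpress` DST would ship defaults like these, and a user-defined DST would only override what differs.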

I wanted to throw this into the discussion because it's a demonstration of how I generated temporary .md files from two merged sets of JSON data (Google Sheets API). These .md files are only generated and used during compilation and are not saved into the repository.

https://www.bryanklein.com/blog/hugo-python-gsheets-oh-my/

This is a fairly simple script, but you can see that I needed to filter the data source and map the 2 source JSON data sets to front matter parameters per page.

@bwklein Thanks for this very informative input but... this belongs in a "tips and tricks" thread in the discourse which could mention this issue. Not the other way around :)

PS: This really belongs there, people would love to read this I'm sure.

@regisphilibert I'm looking for exactly the same thing!
Having JSON parts in a page is simple, but generating posts from JSON... Headless CMS to Hugo 👍

It goes without saying that the “DataSourceTypes” mentioned above could be distributed as Hugo Components (theme components).

Will having some logic in .md files also be a feature on the roadmap?

For example:
{{ $jsonResponse := getJSON "https://example.com/xyz/blabla" }}
{{ range $jsonResponse.data }}
---
title: "{{ .id | default "default title" }}"
---
{{- end -}}

So this way, the title can be dynamic for routes, permalinks, etc.

suzel commented

👍

So this thing has been bugging me for a while now (especially as I wanted to make it easier for contributors on my website to add content without manually creating folders & files).
So I wrote a temporary solution: kidsil/hugo-data-to-pages

It's a JS wrapper that generates pages from data (and it cleans up after itself!).
I'm currently using this to generate pages from YAML files and it seems to be working perfectly.
Would definitely appreciate some feedback.

Cheers

With #6041, it seems convenient that any data source could be assigned to a "directory" in the file system.

If this route is chosen, then in order to add the remote "jobs" source from this issue's example, we could simply add a content/jobs/_index.md to the project and handle any config/FM/data source info from there.

Using the same example as above:

# config.yaml
DataSourceTypes:
  - name: wordpress
    endpoint_base: wp-json/v2/
    endpoints: ['posts', 'page', 'listings']
    pagination: true
    pagination_param: page=:page
    [...]
# content/jobs/_index.md
DataSource:
    source: https://api.wordpress.blog.com/
    type: wordpress
cascade:
   permalinks: /our-job-offers/:title

I am looking into Hugo and this would be the killer feature allowing me to reach my goal. I am trying to integrate Doxygen content into a Hugo managed website.
My current workaround is in two steps using external scripts:

  1. generate YAML data from the Doxygen XML
  2. generate md pages under content/ from the above YAML

Here I do not look at the first step that is unrelated to this issue. Only at the second one.

Ideally, I would like to be able to drop my YAML file in the data/ folder and use templates to specify how to actually present this to the user.

This issue and #4485 seem very close from my newcomer's point of view. Is there a subtle difference I missed? If so, which feature request would match the workflow I described?

I really wish this would be bumped up in priority. Would be an absolute game-changer

@jbigot and @chris-79 I'm also eagerly waiting for this functionality. You might want to keep an eye on issue #6310, which is the current proposal for building pages from data. If you read through the updates, there have been quite a few changes in the background preparing for this.

I really wish this would be bumped up in priority. Would be an absolute game-changer

I agree, if this was possible we would literally be able to connect Hugo to every headless CMS source, e.g. Craft, WordPress, Contentful, etc.! Relying on forestry.io has been a pain.

This would help teams and organizations to be able to work in their native cms to organize content in any way possible. :)

@regisphilibert The Paginator object can convert an index page into multiple pages. One simple solution to have pages from data might be extend this to be able to paginate not just a list of pages but an arbitrary list. Then with page size 1, we have created one page per entry in a JSON.

We might need enhancements to the Paginator object to be able to customize the front matter of the generated pages, but I think enhancements like those can be incremental. I don't know the internals of Hugo well enough to say how difficult the task is, but as a user, this would fall in line with the current way of using Hugo and we would not need to learn a new concept.

This feature would be a game changer for Hugo. I hate having to use Netlify CMS to achieve what I want; Strapi would be my go-to CMS.

fwiw, I ended up making my own scripts to process custom JSON output from our Wordpress.com site.

It's not pretty code, and may not follow correct Go conventions (as I'm super new to it), but it's working for us.

Hope this helps someone until we get official support within Hugo :)

Articles

For building Markdown files in a flat, single directory for one of our Hugo sites.

Vuepress-Pages

For building files into hierarchical directories based on the "link" key. Yes, this is for VuePress, but it should work just fine with Hugo with minor tweaking (like changing /README.md to /_index.md).

Example Yarn build command (runs in Netlify):

{
  "scripts": {
    "build": "curl -s -H 'Cache-Control: no-cache' \"https://example.com/wp-json/custom-routes/v1/internal-resources\" --output pages.json && curl -s -H 'Cache-Control: no-cache' -sL -o main.go \"https://gitlab.com/snippets/*******/raw\" && go run main.go && vuepress build docs"
  }
}

My proposal for this feature is that it should be three features:

  • Define a format for content files in JSON. E.g. you'd have sources/myfile.json with a newline-delimited JSON stream of objects with keys "filepath": "stories/whatever.md" and "body": "whatever", and that gets represented as though it's a bunch of files in the file system.

  • Allow STDOUT plugins that have a line in Hugo's conf like plugin-cmd = 'myscript.py', and those can spit out JSON in the format above. It gets rerun at some interval, either per request or per time duration.

  • FCGI gateway that works like the plugin except as requests come in, they get passed to the app which can respond with JSON.

@carlmjohnson Could you clarify what you mean when you mention "requests" here? As Hugo is a static site generator, it's unclear what per-request behavior would mean.

Good question. Per request means using hugo serve. For example, if you want to be able to create a live preview, you could have hugo serve running somewhere and the plugin intercepts the request, generates some content, and then passes that to Hugo to build the actual page.

I noticed that @bep starred https://github.com/natefinch/pie on Github the other day. I think that approach (using JSON-RPC over STDIN) would work for plugins. Apparently LSP plugins like Gopls etc. work on a similar basis.

Let me talk through some sites I have built with Hugo and how I did them and how I wish I could have done them to give more background for this.

I made one site that listed a lot (several hundred) of candidates for local elections. We emailed the candidates and had them fill out a Survey Monkey survey. We cleaned up the data into a big Google Sheet. I wrote a script that downloaded the Google Sheet and turned it into a JSON file for each candidate, which we put into the content directory as a .md file (because content files have to end in .md). Code here: https://github.com/baltimore-sun-data/voter-guide-2018

I more recently did a similar site that lists expert sources in Pennsylvania. Again, survey, Google Sheet, JSON files in the content directory. Code here: https://github.com/spotlightpa/sourcesdb

Writing the script to download a Google Sheet and turn it into a bunch of small JSON files was not a big task for me, but it was some amount of work, and I imagine a lot of people who know HTML/CSS might not be comfortable doing it. It would have been easier if I could have dropped a single CSV/JSON into the repo, instead of many small files. It would be even better if I could write a plugin that automatically connects to Google Sheets when you run hugo or hugo serve.

I have another site where we schedule stories to publish at a certain time. We do this by saving the stories in a database and then a task runs periodically to see if any stories are ready to publish, and if so, it adds them to Github at the proper location, which in turn triggers Netlify to deploy the site. It would be nice if I could just tell Hugo to get the stories out of my database, and then to do a scheduled publish, I would just trigger a Netlify build at the proper time (no middle step with Github saving the content).

@carlmjohnson I'm trying something similar (without much luck for the past few weeks). It's a personal project: I am trying to build a repository of assignments and work done by students at our college. I was planning on having a front-end Google Form for user-submitted content where each row is a new submission. After this, we set up a script to read that as a JSON file and hopefully break it into individual content files, which can be pushed to GitHub, which triggers a Netlify deploy.

Unfortunately, my experience with Python (which I'm assuming you also used?) so far has been limited to NLP and not so much in this area. Your last website implementation seems most ideal to me, where the Sheets document could act as the database and I only build new content pages if there are new rows. I don't mind doing a manual trigger for connecting to Sheets every now and then.

I tried going through the repos you linked, but I can't seem to find the code that does the conversion from Sheets to JSON in the content folder. Would you be able to help me out? If it is okay with you and as time permits, I would like to take your help with this, since you seem to have done it before.

@bwklein This is perfect! Thank you so much for sharing it. I've been poking around with it for the past half hour, the pipeline looks wonderful. Would it be okay if I could reach out to you via some channel, since I had some questions and I don't want to clutter this thread? I'm very, very new to this and I would appreciate your help.

Here is another use case for a plugin architecture. This Hugo reader wants his em-dashes handled a certain way: https://www.sidewalken.com/non-breaking-em-dashes/ There could be some kind of custom template function plugin to do this.

@bep Is this feature still scheduled? I see it's marked for the 0.78 milestone but we are on 0.80. Thank you for the great job!

Here is a one-liner that splits a JSON array into different Markdown pages.
The filename is based on the value of the first parameter in each element.
The script depends on jq and it should work in any UNIX environment:

jq -cr '.[] | .filename, .' *.json | awk 'NR%2{f=$0".md";next} {print >f;close(f)}'

Optional step: Script to convert Markdown files to Hugo Page Bundles

Obviously this little script does not address the scope of this issue.

But I am posting it here because it can help those who need a quick and simple way to split a data file and consume its contents as Markdown content files in Hugo, while we wait for a native way to handle Pages from Data.

JLKM commented

Direct import from a database into Hugo would IMO not only be a game-changer. It would also make Hugo absolutely central (almost inevitable...) in the Jamstack world.

Just a couple of sources I came across, for inspiration/motivation.

  1. Daptin Database API seems to be able to Create and Build HUGO static sites - repo
  2. Cyrill Schumacher has developed his own Built-in SQL support for GoHugo.io

Since this thread has become one of workarounds, it's worth mentioning that Sanity.io, an excellent editor that uses an interesting open-source query language, GROQ (developed by the same team), has posted a how-to on pulling data from their API into Hugo using Netlify plugins: https://www.sanity.io/guides/sanity-and-hugo-with-netlify-plugins

Hello, we are also searching for some kind of workaround to generate "pages" from JSON files.
Currently, it's easy to parse the data on a specific page, but we are not able to generate lots of pages.
We (with @DavidVergison) are heading toward a Go script that will generate *.md files with correct Front Matter.
The question is: is it better to put everything necessary in the *.md, or only the ID, and add Hugo processing based on this ID (retrieved from the JSON data)?

@Frackher if you generate the pages at build time from the JSON files, then you don't need to store anything in the repository for those files, you just use the script to write the markdown files into the appropriate 'content' location before running the Hugo command to build the site.

Yeah, put the data in front matter. I'd rather write {{ .Title }} than {{ index site.Data.pages .Params.id "title" }}.

I'm doing just what @bwklein said: grabbing the data from a headless CMS in a build script (in Netlify) and using a .go script to split the JSON into individual .md files.

In the process, I discovered that you don't even have to put the data in YAML format in the generated page files. I have .md files with just JSON inside like this:

{
  "id": 42920,
  "link": "covid-19-updates-and-resources",
  "title": "COVID-19 Updates and Resources",
  "image": "https://example.com/AMU4Wgn.png",
  "page_blocks": [
    {
      "section": {
        "section_content": "The College of Education has created this page to provide our faculty, staff, and students with the latest updates and resources on the coronavirus (COVID-19) pandemic. Stay informed by visiting this page regularly.\r\n\r\nThe main source of updates and information is the [University's COVID-19 page](https://www.uga.edu/coronavirus/info.php), so please check it regularly. Also, [University Libraries](https://guides.libs.uga.edu/COVID19/) includes library resources for online classes."
      }
    }
  ]
}

@chris-79 the only downside to content in front matter is that shortcodes don't work there; they have to be in the "Content" section of the file.

@bwklein Yeah, true 👍

@chris-79

Since this issue has become about workarounds, please share the Go script to split the JSON; otherwise, for general discussion, please use the Hugo Forums.

I am using Hugo with Directus 9 https://directus.io/ and Netlify.
It's not that hard to create separate .md files from the JSON data for Hugo.
With Netlify's onPreBuild hook it's easy to set up: you can use it to fetch the JSON and create the .md files before the Hugo build starts.
Same solution as chris-79 mentioned on 03/24.
Here is my code; maybe someone will find it useful.

module.exports = {
  onPreBuild: async ({ utils }) => {
    const fs = require("fs-extra");
    const fetch = require("node-fetch");

    // Remove previously generated posts.
    fs.readdir("./content/blog/", (err, files) => {
      if (err) console.log(err);
      else {
        files.forEach((file) => {
          console.log(`Deleting: ${file}`);
          fs.unlink(`content/blog/${file}`, (err) => {
            if (err) throw err;
          });
        });
      }
    });

    //fs.remove('./public/blog');

    try {
      await fetch("https://URL/")
        .then((response) => {
          console.log("response: ", response);
          return response.json();
        })
        .then((data) => {
          // One .md file per entry, named after its slug.
          for (let value of Object.values(data.data)) {
            let filename = "./content/blog/" + value.slug + ".md";

            fs.writeFile(filename, JSON.stringify(value), function (err) {
              if (err) return console.log(err);
              console.log(`file ${filename} written`);
            });
          }
        });
    } catch (error) {
      utils.build.failBuild("Failure message", { error });
    }
  },
};
bbarr commented

Would this problem be at all simplified by removing the need to deal with remote data? Say there was a way to auto-sync a remote data source (Sanity, WordPress, etc.) to your Hugo project's /data directory via git or something. Then, if Hugo could just provide a way to output pages based on data files, a CI tool could rebuild/deploy whenever external data was updated and pushed to the GitHub repo. Am I at all understanding where the complexity lies here?

@bbarr many of us are already doing that; the hope here is that we can set up direct connections to content sources and build pages without downloading everything into the local content folder before the build. Hugo already has the ability to pull data from remote sources, but only to generate content within pages that are represented in the content directory, not to build pages.

Then, if Hugo could just provide a way to output pages based on data files, a CI tool could rebuild/deploy whenever external data was updated and pushed to github repo

This issue is supposed to be in two phases.

  1. Build from data (generating pages from a data file)
  2. Build from data directly from a remote source (generating pages from a remote source, API etc...)

An issue raised in 2018 still isn't implemented and we're almost in 2022?!

I'm working on an ecommerce store that pulls products from Shopify via the API and generates pages. NextJS can do this, but I was hoping to come back to Hugo. The problem is Hugo doesn't support it, and I can't understand why it's taken so long to implement. Is Hugo's architecture that messed up that this can't be implemented?

@JR2Media Wow, what an impressively entitled and condescending attitude. Comments like these are actively harmful in OSS as they sap the spirits of maintainers and encourage burnout.

This comment is also factually wrong. It says absolutely nothing about the quality of the Hugo codebase that this hasn't been implemented yet. Why hasn't it been implemented? I'll suggest a few possible reasons.

  1. Hugo is maintained by a very small group of people
  2. A feature like this would take a ton of work and require lots of planning and discussion. If that hasn't transpired yet, it's because other things have been prioritized.
  3. As it stands, it is already possible to use other tools to cover this functionality. I've done so myself. Having built-ins in Hugo would be nice, but probably better to prioritize Hugo's core.

Get bent.

This has been an issue since 2013! #140 (comment)

@lucperkins The point is this has been asked for since 2013. The developers don't listen to what the users want; people will eventually drop the tool. That's probably why NextJS has now taken over. If you can't adapt to the changes and the requests of the users, expect them to leave.

An issue from 2013, hasn't been implemented. Says a lot about the Hugo developers really.

@JR2Media Last time I checked, the project is OSS. Perhaps you'd like to take a crack at it?

@lucperkins that lame old response again. It's the classic rebuttal given when a user questions why a feature hasn't been implemented.

Luckily I'm a Go developer. Yes most certainly I would implement this if I knew:

  1. That it wasn't going to get rejected (as others' attempts have)
  2. I can be briefed on the architecture either by Bep, or other developers who know the system better than I do.

Can we have a voice chat to discuss the architecture and ways of possibly implementing this?

Anyone looking at the https://github.com/gohugoio/hugo/graphs/contributors page can see that Hugo is basically a labor of love by @bep. It's very entitled to flame him, when he's been adding tons of features that expand Hugo over the years. As a long time Hugo user, I would never want to go back to the Hugo of 2018 or 2013. It's particularly weird to flame him now, since he just released v0.90, which adds support for remote resources.

Truth be told I love Hugo, used it since 2015. I've had to use NextJS for the last year and it's alright, but I'm not a fan of being stuck with React to develop a static site. I've been developing static sites with NextJS, pulling data from Shopify and generating the product pages. Something like this has to exist in Hugo because of the current trend toward "headless CMS" platforms.

Without it, developers are forced to use alternatives. When they learn an alternative such as NextJS, they likely won't come back. And it's a shame, because Hugo is an otherwise awesome SSG.

Such an implementation would be a game changer.

@carlmjohnson definitely not flaming @bep. I've used Hugo for a long time too. I checked back today to see if this was possible yet so I could look at using Hugo over NextJS.

Can we have a voice chat to discuss the architecture and ways of possibly implementing this?

Any ticket is a good start for a discussion.

Luckily I'm a Go developer. Yes most certainly I would implement this if I knew...

Usually you simply comment on a ticket with your offer to help and then ask your questions (rather than starting with a negative comment). Maybe it's not too late (all help is welcome); you could start fresh on this ticket: #6310

#6310 sounds quite complex; I think we could utilize most of the existing functions if we allowed users to add a page as follows:

.Site.AddPage MARKDOWN FRONTMATTER

Example:

{{ with resources.Get "https://api.example.com/pages" }}
    {{ range . }}
        {{ .Site.AddPage .Content (dict
            "date" .Date
            "draft" "false"
            "title" .Title
            "description" "This is frontmatter"
            ) }}
    {{ end }}
{{ end }}

If I understand it correctly, the Site.AddPage function could create a markdown file in the virtual /content directory based on the parameters.

Do you think this would be possible and a good approach @bep?

bep commented

This needs to wait until after what I'm in the process of doing (which is a little more than I can write down in a simple comment here).

well said

@bep I suspect this feature will become increasingly important as a major HUGO based CMS (Forestry.io) has announced an imminent EOL and many developers will be seeking (scrambling for) an alternative to Forestry.

Forestry's successor product Tina.io has announced per-site pricing in their Discord, and let's just say it feels very premium. I know that Forestry is not the only game in town, but it is certainly a large part of our HUGO ecosystem. In the Forestry Slack there are over 1000 developers in the HUGO channel.

So anyway, myself, I am awaiting this feature with much anticipation, as I evaluate other CMS options for my clients. I really like the idea of adapting Directus.io with a HUGO extension which my team could develop and offer in the Directus marketplace. It would be an excellent model for a HUGO CMS, and this feature of mapping data sources to pages would be integral.

It's such an important feature, I would perhaps even offer to sponsor it! Do we have any new thoughts about what is feasible? I wish I had the Go development experience in my shop to contribute more meaningfully. If a feature sponsorship is possible, I am here to discuss.

With gratitude for everyone's efforts!

CloudCannon and FrontMatter CMS are viable alternatives to Forestry.io and do not require data driven pages. Having said that, there are other situations where data based pages would be a significant advantage for Hugo over other systems like SvelteKit.

@bwklein I appreciate your thoughts. I am desperately seeking the next great CMS for HUGO. I thought Forestry was close. It would be great if there were a no compromises solution. FrontMatter CMS is hard to give to non-technical users due to VSC requirements, and I have found CloudCannon to be somewhat mixed. Menu management, for instance. In this use case, I want to achieve ultimate usability for small business clients.

But I don't want to derail the discussion. Data-driven pages are something that I think could create massive possibilities. They could achieve a best-of-both-worlds flexibility: adapting to both Git-backed, document-driven SSG workflows and leveraging APIs creatively. @vanbroup's solution just leapt off the page to me as something inspired. Finding this thread and seeing that the feature is in discussion gives me great hope and optimism. I hope it can be included in a near-term 2022 release. Again, I would sponsor this.

I would sponsor it too. I use Strapi for a lot of APIs and for changing text in Vue.js web apps. I'm also using it to manage the landing page text and images in Hugo. I would love to have this feature for blog posts.

I can't believe nobody has solved this yet. It's absurd. Where is the sponsorship money going?! I'm calling @bep out as a noob. I'm only responding to this because I keep getting email notifications. Folks, go with NextJS, or Gatsby. Hugo is now just a turd, that can't keep up with what the community wants.

I await the "Oh but it's open source you can contribute if you want ...". Many have tried. Bep rejects the PRs.

Hugo, go to sleep, for you are dead. I await the hate, and downvotes. I'm calling @bep out for what he is. An imposter.

Ps. Check out Zola if you're into Rust. https://github.com/getzola/zola ... it's very similar to Hugo and solves the problem everybody has had since at least 2018.

Edit: Wooohooo that's a lot of thumbs down. Keep them coming, I'm aiming for a record. Most downvotes on Github for speaking facts.

So you call out the author/maintainer/contributor of a project:

  • that you're not sponsoring
  • that you're not maintaining or otherwise helping
  • have 0 public github activity (aka have never been in the place of maintaining/contributing open source software)
  • the 1-2 issues that you've closed on other repos have SUPER minimal info and when you fix them the answer is "fixed it"
  • YOU ARE A COMPANY THAT USES THIS SOFTWARE

In general, what you do is like calling a random person on the phone and nagging that they're a "noob" because they didn't have the calling tune you liked.

You don't get the whole open source idea, on a whole other level. You're light-years away from how things work around here. Be nicer. Nobody has to do ANYTHING. Heck, they could delete this repo just because they feel like it.

Hello, dear colleagues!

For many years I have been using Wordpress to create websites, without using themes: only the Wordpress platform, my own PHP on the server, and JS on the client.

Recently, I made one site on HUGO, and I really liked this concept - speed, reliability, simplicity.

However, the customer didn't like editing the .MD files on disk to make changes to the site. He wants to go into the admin panel on the site and edit pages and posts using the input boxes and "save" button.

Can you please tell me if you can now (or in the near future) use Wordpress (or another CMS) as a headless CMS for HUGO?

It is IMPORTANT that the CMS should not be a service on a vendor's site or depend on external services (like Netlify), but should be a completely independent and free CMS, installed on the customer's VIRTUAL hosting (Wordpress meets these conditions).

Thank you!

@geshov I'm not sure this is the place for your question. I can answer though, because I use both Hugo and Wordpress. You're going to need to use headless Wordpress. You can then pull the content from Wordpress and save it as markdown files for Hugo. I don't know if there's an off-the-shelf solution for this; I have written my own tools to do it.
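The tools mentioned above aren't published, but the general shape of such a pre-build script can be sketched. A minimal Python sketch, assuming a standard WordPress REST API endpoint and field layout (the site URL, output directory, and front matter mapping here are illustrative assumptions, not the commenter's actual code):

```python
import json
import pathlib
import urllib.request

# Placeholder site; a standard WordPress install exposes posts at
# /wp-json/wp/v2/posts (per_page caps at 100, so large sites need paging).
API_URL = "https://example.com/wp-json/wp/v2/posts?per_page=100"

def fetch_posts(url=API_URL):
    """Fetch posts from the WordPress REST API as a list of dicts."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def write_markdown(posts, out_dir="content/blog"):
    """Write one Hugo page per post, using JSON front matter."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for post in posts:
        front_matter = {
            "title": post["title"]["rendered"],
            "date": post["date"],
            "slug": post["slug"],
        }
        # Hugo accepts a JSON object as front matter. The body stays HTML;
        # rendering raw HTML may require markup.goldmark.renderer.unsafe.
        page = (
            json.dumps(front_matter, indent=2)
            + "\n\n"
            + post["content"]["rendered"]
            + "\n"
        )
        (out / f"{post['slug']}.md").write_text(page, encoding="utf-8")
```

Run before `hugo` in CI (e.g. as a build command prefix) so the generated files exist when Hugo builds; a webhook from the CMS can trigger the rebuild.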

James, thank you for your prompt reply!

I'll think about your suggestion, but that's not exactly what I asked.

My question wasn't about that:

You can then pull the content from Wordpress and save them as markdown files for Hugo

It was about this:

Build pages from data source (title of this topic)

More specifically:

But Hugo cannot, using a data set of items, build a page for each of them plus related list pages like it does from the content directory files.

These are quotes from the first post of this thread, so that's where my question is most relevant.

Thank you!

Please head over to https://discourse.gohugo.io/ and ask your question. I will reply to you there.

@geshov I have an example of using Notion as hugo backend, if it is of any help: https://whynot.fail/coding/notion-blog/

knutov commented

What is the current progress of this feature?

I have a similar problem: generating multiple pages from a CSV file, with one page per line of the original CSV.

@knutov
I ended up just writing a small python script that generates the pages for me. The main problem is that it doesn't run automatically when the data changes, like it would have if it were a built-in Hugo feature.
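For the CSV case above, such a script can stay very small. A hedged Python sketch of the one-page-per-row approach (the column names `slug`, `title`, and `body` and the output path are assumptions about the CSV layout, not the commenter's actual script):

```python
import csv
import json  # json.dumps gives a YAML-safe quoted scalar for the title
import pathlib

def pages_from_csv(csv_path, out_dir="content/items"):
    """Write one Hugo markdown page per CSV row.

    Assumes the CSV has 'slug', 'title', and 'body' columns;
    any other columns are ignored in this sketch.
    """
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            page = (
                "---\n"
                f"title: {json.dumps(row['title'])}\n"
                "---\n\n"
                f"{row['body']}\n"
            )
            (out / f"{row['slug']}.md").write_text(page, encoding="utf-8")
```

As noted, the catch is that nothing reruns this when the CSV changes; it has to be wired into the build step (a Makefile target or CI command ahead of `hugo`).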

geshov commented

@knutov I ended up just writing a small python script that generates the pages for me.

and eventually I switched to another SSG that can do this out of the box.

As for what exactly I'm using now: I can answer in a private message so that it is not regarded here as advertising.

In my mind, the amazing Regis has solved this, and for my company it's not an issue any more. I give the strongest recommendation to Regis's article on the subject on The New Dynamic. It's the "monster spotting" example.

https://www.thenewdynamic.com/article/toward-using-a-headless-cms-with-hugo-part-2-building-from-remote-api/

My team implemented this on 50+ sites. It's no more complicated than Forestry ever was. We can set up a modest site in about a day or less. It's massively flexible, which we value, and provides both a "built" solution in line with the use case of HUGO as well as a strong CMS-agnostic integration path.

We trigger builds either from our CMS on change, using a Netlify build hook, or via a manual "publish" action which curls the webhook. Many clients seem to prefer this idea of "wait and publish all at once" as opposed to every change being instant.

Anyway. Regis solved it. It's not a problem any more. I promise, just follow his lead. This is the path.

+1 for the amazing @regisphilibert

Before searching for documentation on how to do this, my expectation was that any support would be incredibly basic/rudimentary and I would probably be dealing with some special index file that explodes data into a directory of markdown files.

It took me a couple of readings of that "headless CMS" solution, but now I understand that it's just using hugo to generate content for hugo; it's literally "code generating code," which is the same thing we're doing with smaller "stub generator" scripts. We just have to add logic to read hugo configuration data, which hugo has built in. [edit: It appears that hugo-forum-topic-45433 from @jmooring (below) is a simplified version of this.]

To be honest, the hardest part of reading (and understanding) that post was my assumption that it was showcasing a new hugo feature that would just work out of the box. The existing options still seem "more enjoyable," but the solution here is actually pretty elegant and seems to answer a lot of existing questions, particularly with regard to mapping data.

Is it possible to "add a new phase" to hugo execution that processes templates in a _stubs/ (or _prebuild/) directory using a format such as (_stubs/albums.md):

FS_FLUSH: "/jazz_albums/*.md"
{{ range $.Site.Data.jazz.albums }}
FS_PATH: "/jazz_albums/{{ .ID }}.md"
---
title: {{ .ID }}
layout: album-demo
---
{{ .Summary  }}
<!--more-->
{{ end }}

I'm sure the "FS_" flags are wildly inaccurate, but what about the rest of that model?

For anyone else stumbling across this, I created a simple example a couple of weeks ago:

git clone --single-branch -b hugo-forum-topic-45433 https://github.com/jmooring/hugo-testing hugo-forum-topic-45433
cd hugo-forum-topic-45433
rm -rf prebuild/public && hugo -s prebuild && hugo server

Reference https://discourse.gohugo.io/t/45433

Maybe good to link this issue on the Hugo roadmap page?
And link it to #6310 as well.

In my mind, the amazing Regis has solved this, and for my company it's not an issue any more. [...] Anyway. Regis solved it. It's not a problem any more. I promise, just follow his lead. This is the path.

Used this solution as well and it worked perfectly!
I am using it together with Directus and Netlify... love it!
Thank you!

Regis' solution with a "nested" Hugo project is quite amazing and saved me lots of time. 🙏 Properly integrating pages generated from data into Hugo would be great, because running two instances of Hugo during development is very error-prone. 😅

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.