microlinkhq/mql

Nested query results

lirbank opened this issue · 1 comment

Context: I'm using mql via the https://microlink.io/ service (Node.js API).

I'm looking for a way to put custom query results in a nested structure.

E.g., instead of getting a result like:

{
  title: '',
  description: '',
  lang: 'en',
  author: null,
  publisher: '',
  ...
  someCustomData: '',
  moreCustomData: ''
}

It'd be great to nest the custom data, something like this:

{
  title: '',
  description: '',
  lang: 'en',
  author: null,
  publisher: '',
  ...
  myData: {
    someData: '',
    moreData: ''
  }
}

On https://microlink.io/docs/mql/getting-started/overview there is an example that displays nesting of queries, like this:

const mql = require('@microlink/mql')

const twitter = (username) =>
  mql(`https://twitter.com/${username}`, {
    data: {
      stats: {
        selector: '.ProfileNav-list',
        attr: {
          tweets: {
            selector: '.ProfileNav-item--tweets .ProfileNav-value',
            attr: 'data-count',
          },
          followings: {
            selector: '.ProfileNav-item--following .ProfileNav-value',
            attr: 'data-count',
          },
          favorites: {
            selector: '.ProfileNav-item--favorites .ProfileNav-value',
            attr: 'data-count',
          },
        },
      },
    },
  })

And the result is nested under stats. Since it seems the nested queries iterate over the parent selector, I did this:

{
  data: {
    // This works
    jsonld: {
      selectorAll: 'script[type="application/ld+json"]',
      attr: 'html',
    },
    // This seems to be solid
    myTitle: {
      selector: 'title',
      attr: 'text',
    },
    meta: {
      selector: '*',
      attr: {

        // This mostly works, but sometimes the result is padded with a bunch of CSS selectors, etc.
        title: {
          selector: 'title',
          attr: 'text',
        },
        ogTitle: {
          selector: 'meta[property="og:title"]:not([content=""])',
          attr: 'content',
        },
        twitterTitle: {
          selector: 'meta[name="twitter:title"]:not([content=""])',
          attr: 'content',
        },
        description: {
          selector: 'meta[name="description"]:not([content=""])',
          attr: 'content',
        },
        ogDescription: {
          selector: 'meta[property="og:description"]:not([content=""])',
          attr: 'content',
        },
        twitterDescription: {
          selector: 'meta[name="twitter:description"]:not([content=""])',
          attr: 'content',
        },

        // This always returns null, while putting it at the top level works (see above)
        jsonld: {
          selectorAll: 'script[type="application/ld+json"]',
          attr: 'html',
        },
      },
    },
  },
}

But as you can see in the comments above, there are some issues with this approach.

  1. Is there a better way to do this?
  2. Why is the JSON-LD selector always empty when used in the nested way?

If this is not the way to do this, I'll rename this issue to a feature request and just use top-level queries for now.
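
As a rough sketch of that fallback: the custom fields could be queried at the top level and then grouped client-side once the response arrives. The helper name `nestCustomData` and the default key `myData` are hypothetical and not part of MQL; the sample object only mimics the flat result shape shown earlier.

```javascript
// Hypothetical helper (not part of MQL): moves the listed custom keys
// of a flat result object under a single nested key.
const nestCustomData = (data, customKeys, nestedKey = 'myData') => {
  const result = {}
  const nested = {}
  for (const [key, value] of Object.entries(data)) {
    if (customKeys.includes(key)) nested[key] = value
    else result[key] = value
  }
  result[nestedKey] = nested
  return result
}

// Sample flat result; the values are made up for illustration.
const flat = { title: '', lang: 'en', someCustomData: 'a', moreCustomData: 'b' }
const nestedResult = nestCustomData(flat, ['someCustomData', 'moreCustomData'])
console.log(nestedResult)
```

This leaves the standard fields untouched and only regroups the keys named in the second argument.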

Big thanks for an awesome product!

Hello,

That's related more to the target URL than to MQL itself.

Specifically, it looks like the adblock, which is enabled by default, is interfering with the expected response.

I just disabled it; you can see a live demo here:
https://runkit.com/kikobeats/mql-x-meetup
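
A minimal sketch of what disabling the adblock might look like in the query, assuming the Microlink API `adblock` parameter is passed alongside `data`. The `fetchJsonLd` wrapper is a hypothetical name, and the linked demo remains the reference version.

```javascript
// Query options: `adblock: false` turns off the adblock that is
// enabled by default (assumed here to be the Microlink `adblock` parameter).
const options = {
  adblock: false,
  data: {
    meta: {
      selector: '*',
      attr: {
        jsonld: {
          selectorAll: 'script[type="application/ld+json"]',
          attr: 'html'
        }
      }
    }
  }
}

// Hypothetical wrapper; needs @microlink/mql installed and network access.
const fetchJsonLd = async (url) => {
  const mql = require('@microlink/mql')
  const { data } = await mql(url, options)
  return data.meta
}
```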

🙂