warp-contracts/warp

feat: allow to pass interaction transaction input in transaction "data" field

ppedziwiatr opened this issue · 27 comments

I don't know how feasible this would be but one way to make this feature extremely versatile would be to allow having a manifest in the data field (as described at https://github.com/ArweaveTeam/arweave/blob/master/doc/path-manifest-schema.md) and require one of the files listed to be named, e.g., input.json. The sequencer would know that this file would be where to grab the interaction input.

Being able to specify other arbitrary files alongside the interaction input would allow for NFT minting, for example, to have a tx that includes the mint interaction input and any files attached to the NFT (images, music files, or basically anything else).

And this way, all these files could be uploaded separately as bundle txs, so the Sequencer would only have to manage the upload of the manifest file, which should, in most cases, be pretty lightweight.
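
A minimal sketch of how the sequencer side could pick the interaction input out of such a manifest, assuming the reserved file name is input.json as suggested above. The PathManifest type follows the linked path-manifest schema; INPUT_PATH and findInputTxId are hypothetical names used only for illustration, not part of Warp.

```typescript
// Shape of an Arweave path manifest, following the path-manifest schema linked above.
interface PathManifest {
  manifest: "arweave/paths";
  version: string;
  index?: { path: string };
  paths: Record<string, { id: string } | undefined>;
}

// Hypothetical reserved path; the comment above only suggests "input.json" as an example.
const INPUT_PATH = "input.json";

// Returns the tx id stored under the reserved path, or undefined if the manifest has none.
function findInputTxId(manifest: PathManifest): string | undefined {
  return manifest.paths[INPUT_PATH]?.id;
}

// Example manifest for an NFT mint: the interaction input plus one attached file.
const mintManifest: PathManifest = {
  manifest: "arweave/paths",
  version: "0.1.0",
  paths: {
    "input.json": { id: "<inputTxId>" },
    "cover.png": { id: "<imageTxId>" },
  },
};

const inputTxId = findInputTxId(mintManifest);
```

A manifest without the reserved entry simply yields undefined, so the caller can decide whether to reject the tx or treat it as a plain manifest.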

That's a neat idea, but one issue I have with this flow is that (if I understand correctly) it would now be the users' responsibility to post the input transaction to Bundlr first (and then post the complete manifest to our sequencer), so they would need to have a Bundlr account created, funded, etc.
One of the ideas behind the sequencer (apart from making the interactions instantly available) is to reduce the friction for the end users - all they need is an Arweave wallet (because we still need to identify them somehow in the contract's code), but they don't need to have any ARs, Bundlr accounts, etc.

Ok, how about this "hybrid" approach:

  1. User first sends all the non-input transactions to Bundlr (for most users/protocols this will be an optional step - in such a case they need their own funded Bundlr account, etc.)
  2. User sends the input transaction (already signed by his/her own wallet) AND a manifest file (optional - again - most users probably don't need this and simply want to send the input transaction) to Warp Sequencer
  3. Warp Sequencer does its sequence-related trickery AND
    1. creates binary data that consists of both the input transaction and the (optional) manifest (in a format similar to an ANS-104 bundle and data-items - i.e. the first "data-item" would be the input, the second (optionally) the manifest)
    2. sends this data to Bundlr
    3. caches/indexes on its side the original input transaction AND the manifest file (to make both instantly accessible) (and obviously the txId generated by Bundlr itself)

I'm not sure I'm getting point 3: would the Sequencer create a new tx with data containing the original tx's input tag plus the manifest file? What happens to the first tx? Also, would this data field be interoperable with the existing tooling (SonAR, Viewblock, ...)? Ideally the manifest file should be very easy to discover, given the tx id, so that the files that it refers to are also easily discoverable.

Also, maybe I'm missing something but I'm not sure whether the idea of allowing a manifest to contain both the input and arbitrary data is really incompatible with the current flow. The writeInteraction method could have an additional optional parameter that would be a map of file path/name -> tx ids, and it would itself create the manifest containing everything in this map plus the input under a reserved path (e.g. /input.json). This way there is only one tx, no need to manage data items, and Viewblock could (I'm not sure if they're already doing it) easily display the content of the data, which would just be an application/x.arweave-manifest+json.
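
To make the idea concrete, here is a rough sketch of the manifest generation such an optional parameter could trigger. It is not the Warp SDK's API: buildManifest, its parameters and the exact reserved path ("input.json" without the leading slash) are assumptions made for illustration.

```typescript
interface PathManifest {
  manifest: "arweave/paths";
  version: string;
  paths: Record<string, { id: string }>;
}

// Hypothetical reserved path for the interaction input.
const INPUT_PATH = "input.json";

// Builds a manifest from the input tx id and an optional map of path -> tx id.
function buildManifest(
  inputTxId: string,
  includeData: Record<string, string> = {}
): PathManifest {
  // The reserved path must not be overridden by user-supplied files.
  if (Object.prototype.hasOwnProperty.call(includeData, INPUT_PATH)) {
    throw new Error(`"${INPUT_PATH}" is reserved for the interaction input`);
  }
  const paths: Record<string, { id: string }> = {
    [INPUT_PATH]: { id: inputTxId },
  };
  for (const [path, id] of Object.entries(includeData)) {
    paths[path] = { id };
  }
  return { manifest: "arweave/paths", version: "0.1.0", paths };
}

// Usage: one attached file next to the input.
const manifest = buildManifest("<inputTxId>", {
  "arbitrary/path/overtherainbow.flac": "<data1TxId>",
});
```

Rejecting a user-supplied input.json keeps the reserved entry unambiguous; whether to reject or silently overwrite is a design choice the thread does not settle.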

OK, to sum up your idea in simple steps:

  1. user posts all the additional transactions to Arweave (directly, via bundlr, whatever)
  2. user posts input transaction to Warp Sequencer - and an optional map file path/name -> tx
  3. Warp Sequencer assigns the sequence and posts the input tx to Bundlr (as a "data" of the data-item)
  4. Warp Sequencer generates a manifest file that contains:
    • txId of the data-item that contains the original input tx (from point 3.)
    • all the items sent in the optional parameter of the writeInteraction
  5. Warp Sequencer sends to Bundlr a tx that contains the manifest.

Yes I think that could match what I was thinking.

For point 3, when you say assigns the sequence, are you talking about setting the sortKey? If yes, shouldn't this be set on the tx of the manifest rather than the input one?

Ideally the manifest tx should be considered as the "interaction" tx, the input one just being a piece of data of this interaction. The manifest tx would be the one showing up in contract.readState().cachedValue.validity for example, or when checking the interaction on SonAR/Viewblock. This way the arbitrary data referenced in the manifest would be as much a part of the interaction as the input itself.

side-note: this implementation should be made compatible with the "nested-bundles" example presented here ArweaveTeam/arweave-standards#22 (comment)

Ok, I get the idea. In my ideal world I would prefer the original tx (created and signed by the user) to be considered as the SmartWeave interaction (not the "manifest tx") - because that is what the client is creating and signing with his/her wallet.

The owner of the transaction (from the SmartWeave protocol perspective) must be set to the original input tx owner.

So here's another issue - on the one hand we want to use the manifest tx id (e.g. in the validity report, etc.), on the other hand we cannot use its owner/signature, because that will always be set to the Warp's jwk that is used to create and sign the "manifest" transaction.

I'll need to think a bit more about this idea...

Ok so I have been trying to wrap my head around ANS-666 and refresh my mind on ANS-104. Taking inspiration from the structure you described (ArweaveTeam/arweave-standards#22 (comment)):

Bundlr-Bundle -> Data-Item -> Warp-Bundle -> User-Data-Item (the contract interaction tx)

I see two scenarios regarding linking arbitrary data to a contract interaction.

1. Using a User-Bundle

1.a. Tx structure

Here User-Data-Item could instead itself be a bundle, let's call it a User-Bundle. It could contain multiple User-Data-Items, among which would be the interaction tx. It would look like this:

Bundlr-Bundle -> Data-Item -> Warp-Bundle -> User-Bundle -> |
                                                            | -> User-Data-Item 1: The contract interaction tx
                                                            | -> User-Data-Item 2: Arbitrary data, e.g. a music file
                                                            | -> User-Data-Item 3: More arbitrary data, e.g. an image file
                                                            | -> [...]

1.b. Creation flow

  1. User calls contract.writeInteraction(input, { includeData: [{ path: "local/path/to/overtherainbow.flac", contentType: "audio/flac"}, { path: "local/path/to/cat.png", contentType: "image/png" }]})
  2. Warp locally creates the data items for the input, overtherainbow.flac and cat.png and signs them with the user's wallet
  3. Warp locally creates the User-Bundle with the previous data items and signs it with the user's wallet
  4. Warp sends the User-Bundle to the Sequencer, and the Sequencer assigns the sequence to it

The downside of this method is that Warp would have to take care of uploading possibly very large and numerous files, which would take a toll on its infrastructure and would also end up costing the gateway's operator a lot of ARs if there is no way for users to pay for the tx fees.

2. Using a manifest as the User-Data-Item

2.a. Tx structure

Here the data field of User-Data-Item, instead of containing the input, could contain a manifest as described here https://github.com/ArweaveTeam/arweave/blob/master/doc/path-manifest-schema.md. It would look like this:

Bundlr-Bundle -> Data-Item -> Warp-Bundle -> User-Data-Item -> | (inside the manifest)
                                                               | -> input.json: The contract interaction tx
                                                               | -> arbitrary/path/overtherainbow.flac: Arbitrary data, e.g. a music file
                                                               | -> another-random/cat.png: More arbitrary data, e.g. an image file
                                                               | -> [...]

2.b. Creation flow

  1. User has previously created & uploaded (via the Bundlr Network for example) two arbitrary data txs (let's call their respective ids data1TxId and data2TxId)
  2. User calls contract.writeInteraction(input, { includeData: { "arbitrary/path/overtherainbow.flac": "<data1TxId>", "another-random/path/cat.png": "<data2TxId>" }})
  3. Warp locally creates the interaction tx using provided input argument and signs it with user's wallet
  4. Warp locally generates a manifest file using the provided includeData argument and the tx id of the tx created at point 3 (let's call this tx id inputTxId). The manifest would look like:
{
  "manifest": "arweave/paths",
  "version": "0.1.0",
  // The index could optionally point to the inputTxId
  // "index": {
  //   "path": "input.json"
  // },
  "paths": {
    "input.json": {
      "id": "<inputTxId>"
    },
    "arbitrary/path/overtherainbow.flac": {
      "id": "<data1TxId>"
    },
    "another-random/path/cat.png": {
      "id": "<data2TxId>"
    }
  }
}
  5. Warp locally creates the manifest tx using point 4 and signs it with user's wallet
  6. Warp normally sends the manifest tx to the Sequencer, and the Sequencer assigns the sequence to it

This scenario is probably advantageous for the Sequencer as it wouldn't have to upload the data; that would already be taken care of by the user. It also gives the user more freedom as to how they want to upload their data to the blockchain.


Both of these scenarios would be optional; Warp/the Sequencer would have to be able to detect whether the input is simply located inside the User-Data-Item or whether it's referenced in the User-Data-Item's manifest / inside the User-Bundle.
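
As a rough illustration of that detection for the manifest scenario only, the sketch below distinguishes the two cases by the data item's content type (application/x.arweave-manifest+json, as mentioned earlier in the thread). Using the content type as the discriminator, and the names resolveInput and ResolvedInput, are assumptions for this example; the User-Bundle case is not covered.

```typescript
interface PathManifest {
  manifest: "arweave/paths";
  version: string;
  paths: Record<string, { id: string } | undefined>;
}

const MANIFEST_CONTENT_TYPE = "application/x.arweave-manifest+json";
// Hypothetical reserved path for the interaction input.
const INPUT_PATH = "input.json";

// Either the input is carried inline, or the manifest tells us where to fetch it.
type ResolvedInput =
  | { kind: "inline"; input: unknown }
  | { kind: "manifest"; inputTxId: string };

function resolveInput(contentType: string | undefined, data: string): ResolvedInput {
  if (contentType !== MANIFEST_CONTENT_TYPE) {
    // Plain interaction: the data field holds the JSON input itself.
    return { kind: "inline", input: JSON.parse(data) };
  }
  // Manifest: the input lives in another tx referenced under the reserved path.
  const manifest = JSON.parse(data) as PathManifest;
  const entry = manifest.paths?.[INPUT_PATH];
  if (!entry) {
    throw new Error(`manifest has no "${INPUT_PATH}" entry`);
  }
  return { kind: "manifest", inputTxId: entry.id };
}
```

With a discriminated union the caller is forced to handle both cases explicitly, for example by loading the referenced tx before evaluating the contract.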

Regarding your last comment, the manifest or the bundle would be signed locally using the user's wallet, so there shouldn't be an issue here, except if I'm missing something.

Unfortunately number 1. is out of the question, for the exact reason that you've mentioned: we really want our GW to stay as lightweight and fast as possible.
I understand that manually uploading first might be a bit cumbersome for the user - but we could probably mitigate this by adding some helper methods to the SDK, with an API like

await warp
  .connectBundlrAccount()
  .uploadFile()
  .uploadFile()
  .uploadFile()
  .writeInteraction()

that would automatically keep track of all the posted data-item ids and add them to the writeInteraction method call.

I like the 2nd approach the most and that's what we will probably implement - but I believe it needs some more analysis.

E.g. one thing that I don't fully understand in the proposed flow is who is actually sending the input tx to bundlr - i.e. the transaction created in point 3. of your flow.

If all the Warp Sequencer gets is the manifest file with the tx-ids, the Warp Sequencer will first need to load the input tx metadata - owner, tags, etc. (by the <inputTxId>) - and its data (that now may contain the actual input to the contract) and index/cache it locally.

I'm not sure if it's even possible right now - in case of data-items, the gateway and Bundlr nodes offer only the "data" endpoint (e.g. arweave.net/{txId}).
The arweave.net/tx/{txId} does not work for data-items.

Also - if we force users to send the input data item to Bundlr - again, they will probably need to have the Bundlr account funded - which is also out of the question.

So, the only way I see it right now is:

  1. user uploads arbitrary files
  2. user creates and signs "input" data-item.
  3. Warp SDK generates a manifest file from 1 and 2 and sends to sequencer:
    • the manifest file
    • the original input data-item
  4. The sequencer assigns the sequence and creates its nested bundle, that contains two data-items:
    • a data-item with the manifest file
    • a data-item with the original input tx

and then sends this whole nested bundle to Bundlr.

Just putting in my two cents:
We do agree that the 2nd approach proposed by Noomly is a better one as it doesn't require passing a large volume of data through sequencer (which is not meant to serve this purpose).
However, adding a manifest to every transaction may be overkill.
In general we need to handle situations where we may have an extra input payload (more than could fit in the "input" tag) or extra content (files) attached to the transactions. Let us look at different scenarios:

  1. Basic transaction: no content, no extra input
  2. Input heavy transaction: no content, extra input
    -> we may put the input directly in the data field and mark the transaction with a tag data-type="extra-input"
    The extra input will be fully indexed and available during the contract code evaluation
    Therefore, there will be a limit for the extra input size.
  3. Transaction with extra content: single content, no extra input
    -> we may put the content directly in the data field and mark the transaction with a tag data-type="extra-content"
    The extra content won't be indexed.
  4. Transaction with both extra input and content, or with multiple content files
    -> we need to create separate bundles for extra input and content files, create a manifest file and mark the transaction with a tag data-type="manifest"

So, point 4 is basically the solution we've been discussing. I'm just proposing a shortcut for less complex scenarios where we could fit everything into a single tx.
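
The four scenarios above can be read as a small decision table. The sketch below is only an illustration of that mapping: the data-type values come from the list, while chooseDataType and the InteractionShape fields are hypothetical names.

```typescript
// Tag values proposed in the list above; undefined means no data-type tag is needed.
type DataType = "extra-input" | "extra-content" | "manifest";

interface InteractionShape {
  hasExtraInput: boolean;   // input too large for the "input" tag
  contentFileCount: number; // number of attached content files
}

function chooseDataType(shape: InteractionShape): DataType | undefined {
  const { hasExtraInput, contentFileCount } = shape;
  // 1. Basic transaction: no content, no extra input.
  if (!hasExtraInput && contentFileCount === 0) return undefined;
  // 2. Input heavy transaction: the input goes straight into the data field.
  if (hasExtraInput && contentFileCount === 0) return "extra-input";
  // 3. Single content file and no extra input: the content goes into the data field.
  if (!hasExtraInput && contentFileCount === 1) return "extra-content";
  // 4. Everything else needs separate uploads plus a manifest.
  return "manifest";
}
```

For example, an interaction with extra input and one attached file falls through to "manifest", which is the general case discussed earlier in the thread.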

Yup, but I believe both Akord and Pianity (the first real life users of this feature) fall into point 4 - so there's a question whether it makes sense to implement the simpler versions - since in real life, only the 4th (at least now) will be used (plus it handles the 1, 2, 3 cases as well).

Btw. re. 3 - I'm not sure if pushing large amounts of data through a sequencer is a good thing (even if it won't be effectively indexed/stored by our gw) - i.e. it may quickly drain the sequencer's memory. I also believe that this feature is already handled by the "true" AtomicNFT feature that @asiaziola is currently implementing (warp-contracts/gateway#124 "register atomic contract").

Also - we need to define some 'realistic' limits for the input itself. As far as I remember, Akord needs a few kilobytes...
So maybe we could start with sth like 100kB? @jakub-wojciechowski , can you estimate what would be a useful limit for sth like NLP?

It's still much more than 2048 bytes (effectively less, cause 2048 is for all tags, not only the input tag)
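
One way to picture the two limits side by side is a helper that decides where an input would have to go. This is a sketch under assumptions: the 2048-byte budget is counted here as the sum of tag name and value bytes, "100kB" is taken as 100 * 1024 bytes, and placeInput is a hypothetical helper, not an existing SDK function.

```typescript
interface Tag {
  name: string;
  value: string;
}

const MAX_TAGS_BYTES = 2048;             // shared by all tags, not only the input tag
const MAX_DATA_INPUT_BYTES = 100 * 1024; // proposed starting limit, assumed to mean 100 KiB

const encoder = new TextEncoder();
const byteLength = (s: string): number => encoder.encode(s).length;

// Assumption: the tag budget is the sum of name and value bytes over all tags.
function tagsByteLength(tags: Tag[]): number {
  return tags.reduce((sum, t) => sum + byteLength(t.name) + byteLength(t.value), 0);
}

type Placement = "tag" | "data";

function placeInput(input: unknown, otherTags: Tag[]): Placement {
  const json = JSON.stringify(input);
  const withInput = [...otherTags, { name: "Input", value: json }];
  if (tagsByteLength(withInput) <= MAX_TAGS_BYTES) return "tag";
  if (byteLength(json) <= MAX_DATA_INPUT_BYTES) return "data";
  throw new Error("input exceeds the data field limit");
}
```

Because the other tags eat into the same 2048 bytes, the usable size for the input tag shrinks as more tags are added, which is why the effective tag limit is lower than 2048.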

On the other hand - it would be cool to offer sth that is "significantly" better than (for example) limits on Ethereum, Solana, etc. (
https://ethereum.stackexchange.com/questions/1106/is-there-a-limit-for-transaction-size
https://blog.bitmex.com/ethereums-new-1mb-blocksize-limit/
https://solana.wiki/docs/solidity-guide/transactions/#data
).

The other thing is data encoding - I wonder if the input stored in the data field shouldn't be by default encoded with sth like msgpack.

As for the limits, the 100kB allowance from Bundlr seems like a pragmatic approach.
I totally agree with msgpack.

I also agree that we may skip the 3rd variant (direct content storage). However, I still see value in the 2nd variant of extending the payload in the data field without the overhead of separate bundles and manifest. As far as I remember that feature was exactly what Akord asked for.

The 2nd variant would be ideal for Akord, we could then merge the input JSON directly into the contract state during contract evaluation. As @jakub-wojciechowski says, it would keep things simpler.

Then of course we can also use the 4th variant, I believe it would remain similar to what we are doing now: merging the current JSON data state outside the contract, bundling it as a data item and referencing it in the contract state.

In case of point 4. - it would be up to our infra and execution engine to supply the input (stored in this case as a part of the manifest) directly into the contract (without having to manually load it by txid in the contract code).

In other words - in case of a manifest file - our infra "extracts" the input content and indexes it internally - and supplies it for the state evaluation (just like for the point 2.). From the contract state evaluation perspective - it would work exactly the same as for point 2.

Having two completely different formats (point 2 and point 4) means, to me, more complicated code - but we will need to analyse this a bit more.

IMO point 2. is simply a special case for point 4.

Ok, that makes sense, thank you for the explanation @ppedziwiatr :)

And the winner is...
[image]

(description coming soon)

@noomly @wkolod
To describe this magnificent picture you see above in more detail - as you may have heard, both the Arweave gateway and Bundlr now support nested bundles https://github.com/ArweaveTeam/arweave-standards/blob/master/ans/ANS-104.md#31-nested-bundle and we believe it could solve the problem we are discussing.

So we would have our original transaction as it is right now, with the input in the data field of the tx (with the size limitation of 100kb). Apart from that, the user could pass other txs which should be linked to the input transaction (indicating each one's id and content type) - so it would cover the need of attaching some files, e.g. pngs or mp3s.

Warp would then be responsible for creating two data items - one with the input transaction and a second one with a manifest containing the paths to the input and all attached txs. The Sequencer would then attach the sequence and create a nested bundle containing the two data items. The original tx id and the nested bundle id would be returned from the writeInteraction method. So having the nested bundle id you would have access to both - the transaction and the manifest.

I believe this would handle both of your use cases @noomly @wkolod ? Can you see any bottlenecks in this approach?

This sounds like it's very close to the second solution I described here: #178 (comment) so I think it should be good! I have a few questions regarding your description of this beautiful piece of imagery you guys drew @asiaziola.

1.

[...] user could pass other txs which should be linked to the input transaction (indicating its id and content type) [...]

What would be the use of providing the content type at this stage? It should be included in the data transaction anyway. I think the two pieces of data required from the user at this point would be the data tx id & its path, as I described in my previous comment linked above at 2.b.2.

2.

Original tx id and nested bundle id would be returned from writeInteraction method.

By "original tx id" do you mean the interaction transaction? By interaction transaction I mean the transaction that has the Contract tag and optionally the Input tag, which if not present should be contained in the data field of the transaction.

3.

Which transactions exactly are the sequencer tags attached to? Is it the interaction transaction? My concern is that if that's the case then the manifest transaction wouldn't really represent the "authority" of the interaction and could easily be discarded. What I mean by this is that, for example, when looking through the interactions of a contract on SonAR, if I try to look into the details of a particular interaction, how will I know that this interaction has anything else (images, music or any kind of arbitrary data) attached to it?


Basically, are there actually any differences between the solution you guys have in mind and tried to describe in #178 (comment) and the solution number 2 I described a few months ago? If there are, do you mind trying to go through them in detail so we can make sure there wouldn't be any surprises?

Btw thank you for this, it's going to be yet another great step into making the ecosystem better. Also, sorry for being a bit repetitive, just want to make sure everything is extra clear.

Thanks for the response :)

ad. 1
you're right, we'll implement it according to your suggestion

ad. 2
yes, original tx id is the id of the interaction transaction

ad. 3
sequencer tags will be attached to the bundle containing two data items - the data item with the interaction transaction and the data item with a manifest. When writing an interaction, two ids will be returned - again the id of the interaction transaction and bundlr_tx_id, which in this case will be the id of the nested bundle containing the two data items. So e.g. on SonAR - for now the bundlr tx id will be a link to Viewblock where you'll be able to see what's happening underneath (what's in the manifest).

I guess the main difference is that this diagram:

Bundlr-Bundle -> Data-Item -> Warp-Bundle -> User-Data-Item -> | (inside the manifest)
                                                               | -> input.json: The contract interaction tx
                                                               | -> arbitrary/path/overtherainbow.flac: Arbitrary data, e.g. a music file
                                                               | -> another-random/cat.png: More arbitrary data, e.g. an image file
                                                               | -> [...]

...does not include creating the main interaction transaction. In our approach the Warp-Bundle will contain two data items:

  1. the interaction transaction (the transaction 'visible' by the contract's functions)
  2. the data-item with manifest.

My main question to you is will you need the information about additional manifest in the contract itself? I guess for now when executing the contract it will have the information about the main interaction transaction - either through the input tag or data field of the transaction but we will not have any info about the manifest itself. Most probably we could provide access to the information about it but I'm wondering if you need it.

Looks like it should be ok then, thank you for taking the time to answer my points!

My main question to you is will you need the information about additional manifest in the contract itself?

We don't have any planned features that would require having access to information about the files attached to the interaction; although it might be interesting to have that in the future!

We're starting development of this feature :)
Couple of additional notes regarding the subject:

  1. The user will send the interaction input, and it will be the Warp SDK's responsibility to decide whether the input should be placed in the tag (by default) or in the data field (if the interaction size exceeds the size limit for tags).
  2. Optionally, the user can send ids of the additional pieces which should be attached to the interaction transaction (files like png, mp3 etc.) - each can be either a transaction or a data item, it doesn't really matter.
  3. By default the nested bundle will contain one data item with the interaction. If some additional file ids are specified, a second data item with a manifest pointing to these additional txs/data items will be placed inside the nested bundle. The manifest will also point to the original interaction data item id; this id will be validated by the sequencer so we're sure that the manifest is pointing to the correct interaction.
  4. The size limit for the interaction sent by the Warp SDK to the sequencer will be 100kb. We are considering encoding the interaction with a tool like msgpack or compressing it with gzip and storing it that way in the database, so essentially the size limit could be even bigger. Of course this 100kb is a total size limit for the whole nested bundle, so the size limit for the data field itself will be a bit smaller.
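
Point 3 can be sketched roughly as follows. This is not the sequencer's actual code: the Submission shape, the planNestedBundle name and the "input.json" path used for the manifest's pointer to the interaction are all assumptions made for illustration.

```typescript
interface PathManifest {
  manifest: "arweave/paths";
  version: string;
  paths: Record<string, { id: string } | undefined>;
}

// Hypothetical path under which the manifest points back to the interaction.
const INPUT_PATH = "input.json";

interface Submission {
  interactionId: string;   // id of the signed interaction data item
  manifest?: PathManifest; // present only when additional file ids were attached
}

// True if the manifest's reserved entry references the given interaction.
function manifestPointsTo(manifest: PathManifest, interactionId: string): boolean {
  return manifest.paths[INPUT_PATH]?.id === interactionId;
}

// Returns which data items would go into the nested bundle, validating the manifest first.
function planNestedBundle(sub: Submission): Array<"interaction" | "manifest"> {
  if (!sub.manifest) return ["interaction"];
  if (!manifestPointsTo(sub.manifest, sub.interactionId)) {
    throw new Error("manifest does not point to the submitted interaction");
  }
  return ["interaction", "manifest"];
}
```

Rejecting a mismatched manifest up front means a nested bundle can never pair an interaction with a manifest that was built for a different one.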

Stage 1 - preparing a minimal arbundles library version (the original one adds ~600kb to the resulting SDK bundle 😮) - is done, right? @asiaziola warp-arbundles adds ~9kb, right?

Yes, sir. Looks like it's done.
I'll write some tests and start working on the interaction data item implementation in the client.