CartoDB/cartoframes

Explicit "publish" behavior


The Map.publish method creates an API key every time the user visualizes private tables or local data. For example, if you render a DataFrame and then publish the map, the data is uploaded to a table in CARTO and a custom API key specific to that table is created. This behavior is too implicit, and may not be what the user wants. Why upload the data to CARTO? Why create an API key? Why not publish the same visualization as in the notebook directly, with the data compressed in the HTML file?
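A minimal sketch of that implicit flow as described above (assuming a local GeoDataFrame `gdf` and account credentials `creds`):

from cartoframes.viz import Layer, Map

# Render a map from a local GeoDataFrame in the notebook
my_map = Map(Layer(gdf))

# Publishing it currently uploads the data to a table in CARTO with a
# generated name and implicitly creates a custom API key for that table
my_map.publish('my_map', '1234', credentials=creds)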

With that in mind, we can take two actions towards a more explicit "publish" method:

  • If the map contains private datasets, notify the user of the possible options in an Exception:
Your map contains private datasets. In order to publish it, you can:
  - Make the datasets public using `update_privacy_table(table_name, 'public')`.
  - Call the `publish` method with the param `create_api_key=True`,
    which will create a custom API key for that map.
  • Publish DataFrame maps without uploading them to CARTO so the user has more control over the tables and API keys created. If the user wants to upload the DataFrame, they can use `to_carto(df, 'my_table')` and then publish a map with the publish method using 'my_table' instead of a randomly generated name (see the sketch below).
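A sketch of that explicit workflow (again assuming a local GeoDataFrame `gdf` and account credentials `creds`):

from cartoframes import to_carto
from cartoframes.viz import Layer, Map

# Explicit upload: the user chooses the table name, so nothing is created behind their back
to_carto(gdf, 'my_table', credentials=creds)

# Publish using the uploaded table instead of a randomly generated name
my_map = Map(Layer('my_table', credentials=creds))
my_map.publish('my_map', '1234', credentials=creds)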

These two changes are easy to implement, and they only affect CARTOframes. However, we need to be aligned before doing it.

cc @cmongut @alasarr @oleurud

First of all, I agree with your concern and your proposal, but at the same time, I am afraid of doing it now.

Some details:

  • if we add a `create_api_key` param, we should also add a way to pass a specific API key (see the sketch below)
  • in the GeoDataFrame case, we must ensure we are not sharing the master API key with the map (I think we are doing it in the notebook, so it would be shared too)
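For instance, the signature being discussed could look like this (just a sketch; the param names are not final):

def publish(self, name, password, credentials=None,
            create_api_key=False,  # create a custom API key for private datasets
            api_key=None):         # ...or pass a specific, pre-created API key
    ...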

OK. Thanks. I'll create a PoC in a branch to evaluate the improvement.

  • I agree with adding an `api_key` param to allow the user to pass a custom API key.
  • We should not provide the credentials when visualizing a GeoDataFrame; I'll check that.

If the map contains private datasets notify the user of the possible options in an Exception:

I wouldn't raise an exception and make users add another param. I would show the information in a log instead.

Publish DataFrame maps without uploading them to CARTO so the user has more control over the tables and API keys created. If the user wants to upload the DataFrame, they can use `to_carto(df, 'my_table')` and then publish a map with the publish method using 'my_table' instead of a randomly generated name.

It seems the way to go, they should choose the name of the tables to be able to reuse them later.

I wouldn't raise an exception and make users add another param. I would show the information in a log instead.

This is similar to `if_exists`. By default, publishing fails if the table name already exists, and the user is informed by an Exception about using `if_exists='replace'`.

If a user wants to publish a map with a private dataset, we cannot show only a warning message because there is no default behavior for that case. That's why we need to raise an Exception.
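In other words, the check would be something like this (a sketch with a hypothetical `get_private_tables` helper, using the `maps_api_key` param adopted later in this thread):

def publish(self, name, password, credentials=None, maps_api_key=None):
    # hypothetical helper: collect the tables in the map that are not public
    private_tables = get_private_tables(self.layers)
    if private_tables and maps_api_key is None:
        # there is no sensible default here, so fail loudly instead of warning
        raise ValueError(
            'Your map contains private datasets. In order to publish it, you can:\n'
            "- Make the datasets public using update_privacy_table(table_name, 'public').\n"
            '- Pass a maps_api_key valid for those datasets.'
        )
    ...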

It seems the way to go, they should choose the name of the tables to be able to reuse them later.

Yes. That's exactly the point.

Let's ask the users :)

@giuliacarella, @andy-esch could you provide your view here?

@cmongut What should the default behavior be when a user wants to publish a map with a private dataset, like `map.publish('my_map', '1234', creds)`?

I agree that you should be able to publish without uploading to CARTO. What would be the advantages of uploading it to CARTO before publishing? Better performance with large datasets?

What would be the advantages of uploading it to CARTO before publishing? Better performance with large datasets?

You are not sharing your data; at least, you are not sharing it explicitly (because, working with MVT, you are sharing your data one way or another).

Also, the performance would be better (the HTML is smaller and the data is requested on the fly), but the previous point is more important IMO.

Hi @giuliacarella. With this proposal, the user has control over the data upload:

  • You can publish a GeoDataFrame without uploading data.
  • If you want to upload it, first upload your GeoDataFrame with `to_carto` and then run `publish` using the name of the table you provided to `to_carto`. This is more explicit, so the user understands better what happens.

@Jesus89 thanks! I've spoken to @cmongut and now I understand the issue better.

I understand that the problem is that to publish you need an API key which, if not supplied, would be automatically generated every time the publish method is used, consuming the user's "API credits".

I don't like generating the API key manually at all; what about linking an API key to the name of the visualisation? This would reduce the number of generated API keys without forcing users to modify their code or create the key manually. And every time the publish method is used, a warning could be issued informing the user of the number of "API credits" left.
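Something like this, perhaps (a rough sketch; both helpers are hypothetical and assume the Maps API allows looking up keys by name):

def get_or_create_maps_api_key(viz_name, tables, credentials):
    # Derive a stable key name from the visualization name, so republishing
    # 'my_map' reuses the same key instead of creating a new one each time
    key_name = 'cartoframes_publish_{}'.format(viz_name)
    existing = find_api_key(key_name, credentials)  # hypothetical lookup
    if existing is not None:
        return existing
    return create_api_key(key_name, tables, credentials)  # hypothetical creation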

After an offline meeting, we have decided to do the following:

  • Do not upload GeoDataFrames to CARTO by default (avoiding the creation of random tables and API keys)
  • Create API keys for private datasets by default, with a unique name, so already created API keys can be reused.
  • Add a maps_api_key param to the publish method to allow the user to pass an API key for private datasets.
from cartoframes.viz import Layer, Map

# my_creds: account Credentials; gdf: a local GeoDataFrame
myMap = Map([
    Layer(gdf),                                    # local data, embedded in the HTML
    Layer('public_table', credentials=my_creds),   # public dataset
    Layer('private_table', credentials=my_creds)   # private dataset, needs a Maps API key
])

>>> myMap.publish('my_map', '1234', credentials=my_creds)
Maps API key "bZN8gW3gugxgpTy7M9KkQg" is used for non public datasets ['private_table']
{...}
>>> myMap.publish('my_map', '1234', credentials=my_creds, maps_api_key='bZN8gW3gugxgpTy7M9KkQg')
{...}

You can test this in the `1485-explicit-publish` branch.

I have mixed feelings about embedding a GeoDataFrame in the HTML doc. These dataframes, especially when represented as GeoJSON, can be quite large and have poor performance when rendered on a cartoframes map. Will there be a dataframe size restriction? E.g., if the size is > 10MB, should it fail and advise the user to upload the data to their CARTO account first?

Another thing that's unclear to me is what happens if a user wants to update an existing map.

Other than that, I think I feel comfortable with the workflows proposed above. I'm a big fan of throwing informative errors to help users. And I'm a big fan of preventing account bloat by not putting up datasets unless the user explicitly gives them a name, etc.

Thanks for the feedback @andy-esch.

I have mixed feelings about embedding a GeoDataFrame in the HTML doc. These dataframes, especially when represented as GeoJSON, can be quite large and have poor performance when rendered on a cartoframes map. Will there be a dataframe size restriction? E.g., if the size is > 10MB, should it fail and advise the user to upload the data to their CARTO account first?

There is a limit on the size of the published HTML files. Right now it's quite low (100KB) and it's actually causing some issues (https://github.com/CartoDB/support/issues/2343). We all agree on increasing the limit to 10 MB and showing a clear Exception when you exceed it, advising you to upload your data to CARTO and publish using the table.
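The guard could look roughly like this (a sketch; the constant and message are assumptions, not the shipped code):

MAX_PUBLISH_SIZE = 10 * 1024 * 1024  # proposed 10 MB limit

def check_html_size(html):
    size = len(html.encode('utf-8'))
    if size > MAX_PUBLISH_SIZE:
        raise ValueError(
            'The map exceeds the 10 MB publish limit ({} bytes). Upload your '
            'data to CARTO with to_carto and publish using the table.'.format(size)
        )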

Another thing that's unclear to me is what happens if a user wants to update an existing map.

If the user wants to update an existing map, they can publish using the same name and `if_exists='replace'`, added in #1483.
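For example, following the snippet above:

# Overwrite the previously published 'my_map' instead of failing
myMap.publish('my_map', '1234', credentials=my_creds, if_exists='replace')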

Other than that, I think I feel comfortable with the workflows proposed above. I'm a big fan of throwing informative errors to help users. And I'm a big fan of preventing account bloat by not putting up datasets unless the user explicitly gives them a name, etc.

👍