G-Research/spark-dgraph-connector

Create a Colab/notebook example

Closed this issue · 21 comments

I'm trying to get the connector working in a Colab notebook. I've got this far, but cannot figure out how to bind the connector package in order to make use of the dgraph calls (apologies in advance, pyspark noob here).

I'll happily share this notebook source once this is sorted out.

Colab Notebook:

!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q https://archive.apache.org/dist/spark/spark-3.3.2/spark-3.3.2-bin-hadoop3.tgz
!tar xzf spark-3.3.2-bin-hadoop3.tgz
!pip install -q findspark

#new frame
import os
import findspark

os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.3.2-bin-hadoop3"
findspark.init()

#new frame
import pyspark.sql
from pyspark.sql import SparkSession
from pyspark.sql import DataFrame

conf = pyspark.SparkConf()
conf.set("spark.jars.packages", "uk.co.gresearch.spark:spark-dgraph-connector_2.12:0.9.0-3.3")
conf.set("spark.ui.port", "4050")

spark = SparkSession.builder\
        .master("local")\
        .appName("Colab")\
        .config(conf=conf)\
        .getOrCreate()

endpoint = "localhost:9080"
df = spark.read.dgraph.nodes(endpoint)

Error:

AttributeError: 'DataFrameReader' object has no attribute 'dgraph'

It seems like there's some magic needed to get the package jars into the spark context, but I'm outta ideas.

@EnricoMi Hopefully the issue I'm having is a no-brainer for you. As I said, I'm happy to share the final notebook with everyone here and/or on the Dgraph repo.

At no point in your code are you importing the connector:

from gresearch.spark.dgraph.connector import *
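With the connector jar resolved via spark.jars.packages as above, that import adds the dgraph reader to spark.read; roughly, the read cell then becomes:

# the connector jar must already be on the classpath (via spark.jars.packages above)
from gresearch.spark.dgraph.connector import *

endpoint = "localhost:9080"             # your alpha's gRPC endpoint
df = spark.read.dgraph.nodes(endpoint)
df.show()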

@EnricoMi OK, thanks! Getting further along...

I'm getting an error when it's trying to get the state from the group. I saw your thread about the /state endpoint changing in v21. I tried the snapshot version you mentioned in #144, but both that and 0.9.0-3.3 return the same error:

Py4JJavaError: An error occurred while calling o518.nodes.
: java.lang.RuntimeException: Could not retrieve cluster state from Dgraph alphas (x.x.x.x:9080)
	at uk.co.gresearch.spark.dgraph.connector.ClusterStateProvider.$anonfun$getClusterState$5(ClusterStateProvider.scala:32)

I've tried with Dgraph v22 and v21. What's the latest version that worked with your connector?

Can you provide the full error? There should be some cause for this RuntimeException. Are you sure the Dgraph server is reachable from the Spark driver and worker nodes? Can you see anyone connecting to Dgraph in the Dgraph log?

Here's the full error. The cluster is reachable, and I think the connector is connecting (if I put in a bad port, for instance, it returns a SchemaProvider.getSchema error).

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
[<ipython-input-33-e31922b6e4e9>](https://localhost:8080/#) in <cell line: 21>()
     19 
     20 endpoint = "<snip>:9080"
---> 21 df = spark.read.dgraph.nodes(endpoint)
     22 #df = spark._jvm.uk.co.gresearch.spark.dgraph.connector.package.dgraph.nodes(endpoint)
     23 

[/tmp/spark-35dff170-5bc9-445c-8c77-d660d1c9e089/userFiles-c1cddd28-8d83-472a-95bf-55b93f215dff/uk.co.gresearch.spark_spark-dgraph-connector_2.12-0.9.0-3.3.jar/gresearch/spark/dgraph/connector/__init__.py](https://localhost:8080/#) in nodes(self, target, *targets)
     74 
     75     def nodes(self, target, *targets) -> DataFrame:
---> 76         jdf = self._reader.nodes(target, self._toSeq(targets))
     77         return DataFrame(jdf, self._spark)
     78 

[/content/spark-3.3.2-bin-hadoop3/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py](https://localhost:8080/#) in __call__(self, *args)
   1319 
   1320         answer = self.gateway_client.send_command(command)
-> 1321         return_value = get_return_value(
   1322             answer, self.gateway_client, self.target_id, self.name)
   1323 

[/content/spark-3.3.2-bin-hadoop3/python/pyspark/sql/utils.py](https://localhost:8080/#) in deco(*a, **kw)
    188     def deco(*a: Any, **kw: Any) -> Any:
    189         try:
--> 190             return f(*a, **kw)
    191         except Py4JJavaError as e:
    192             converted = convert_exception(e.java_exception)

[/content/spark-3.3.2-bin-hadoop3/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py](https://localhost:8080/#) in get_return_value(answer, gateway_client, target_id, name)
    324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325             if answer[1] == REFERENCE_TYPE:
--> 326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
    328                     format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o518.nodes.
: java.lang.RuntimeException: Could not retrieve cluster state from Dgraph alphas (<snip>:9080)
	at uk.co.gresearch.spark.dgraph.connector.ClusterStateProvider.$anonfun$getClusterState$5(ClusterStateProvider.scala:32)
	at scala.Option.getOrElse(Option.scala:189)
	at uk.co.gresearch.spark.dgraph.connector.ClusterStateProvider.getClusterState(ClusterStateProvider.scala:32)
	at uk.co.gresearch.spark.dgraph.connector.ClusterStateProvider.getClusterState$(ClusterStateProvider.scala:25)
	at uk.co.gresearch.spark.dgraph.connector.sources.NodeSource.getClusterState(NodeSource.scala:33)
	at uk.co.gresearch.spark.dgraph.connector.sources.NodeSource.getTable(NodeSource.scala:89)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:92)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:140)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:209)
	at scala.Option.flatMap(Option.scala:271)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:207)
	at uk.co.gresearch.spark.dgraph.connector.DgraphReader.nodes(DgraphReader.scala:59)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.lang.Thread.run(Thread.java:750)

Also, I can curl port 6080, which is the Zero's HTTP state endpoint, and it's working as expected.

Is there some error logged before those exceptions? Do you get any logging at all? It looks like you are using the Python API?

Try to fetch the following URLs:

http://TARGET:8080/state
https://TARGET:8080/state

Note that when the target host and port given to spark.read.dgraph.nodes is "TARGET:9080", use TARGET:8080 above (port minus 1000).
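If you want to check this from the notebook itself, here is a minimal sketch (assuming Python's requests is available and TARGET is the host you pass to the connector):

import requests

# hedged sketch: build the /state URL from the connector target (gRPC port minus 1000)
target = "TARGET:9080"
host, grpc_port = target.split(":")
state_url = f"http://{host}:{int(grpc_port) - 1000}/state"
print(requests.get(state_url, timeout=10).text[:500])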

You could also add

spark.sparkContext.setLogLevel("INFO")

and watch out for a log line like

INFO DefaultSource: retrieved cluster state from localhost:9080 with 3717 bytes in 0.002s

OK, added that log level and found the error in the logs:

{"pid":7,"type":"jupyter","level":40,"msg":"java.lang.NumberFormatException: For input string: "18446055125930680484"","time":"2023-04-01T18:06:00.767Z","v":0}

{"pid":7,"type":"jupyter","level":40,"msg":"23/04/01 18:06:00 ERROR DefaultSource: retrieving state from http://198.211.110.95:8080/state failed: For input string: "18446055125930680484"","time":"2023-04-01T18:06:00.780Z","v":0}

The very large number is the maxNodes attribute in the license entry for the cluster.

Here's the full log output for completeness:

{"pid":7,"type":"jupyter","level":40,"msg":"\tcom.google.code.gson#gson;2.9.0 from central in [default]","time":"2023-04-01T18:04:59.929Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tcom.lihaoyi#geny_2.12;0.6.10 from central in [default]","time":"2023-04-01T18:04:59.930Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tcom.lihaoyi#requests_2.12;0.7.1 from central in [default]","time":"2023-04-01T18:04:59.930Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\torg.apache.commons#commons-lang3;3.12.0 from central in [default]","time":"2023-04-01T18:04:59.930Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tuk.co.gresearch.dgraph#dgraph4j-shaded;21.12.0-0 from central in [default]","time":"2023-04-01T18:04:59.931Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tuk.co.gresearch.spark#spark-dgraph-connector_2.12;0.9.0-3.3 from central in [default]","time":"2023-04-01T18:04:59.931Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\t---------------------------------------------------------------------","time":"2023-04-01T18:04:59.931Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\t---------------------------------------------------------------------","time":"2023-04-01T18:04:59.932Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\t---------------------------------------------------------------------","time":"2023-04-01T18:04:59.933Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":":: retrieving :: org.apache.spark#spark-submit-parent-b267e7e5-5f92-4950-a2e6-e688d9dc209c","time":"2023-04-01T18:04:59.938Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tconfs: [default]","time":"2023-04-01T18:04:59.939Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\t6 artifacts copied, 0 already retrieved (12723kB/44ms)","time":"2023-04-01T18:04:59.983Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"23/04/01 18:05:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable","time":"2023-04-01T18:05:00.336Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"Setting default log level to \"","time":"2023-04-01T18:05:00.733Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"WARN","time":"2023-04-01T18:05:00.734Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\".","time":"2023-04-01T18:05:00.735Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).","time":"2023-04-01T18:05:00.736Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"23/04/01 18:05:04 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.","time":"2023-04-01T18:05:04.125Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"23/04/01 18:05:04 INFO SharedState: Warehouse path is 'file:/content/spark-warehouse'.","time":"2023-04-01T18:05:04.146Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"SLF4J: Failed to load class \"org.slf4j.impl.StaticLoggerBinder\".","time":"2023-04-01T18:05:07.822Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"SLF4J: Defaulting to no-operation (NOP) logger implementation","time":"2023-04-01T18:05:07.822Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.","time":"2023-04-01T18:05:07.822Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"23/04/01 18:06:00 INFO DefaultSource: retrieved cluster state from 198.211.110.95:9080 with 5124 bytes in 0.399s","time":"2023-04-01T18:06:00.752Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"23/04/01 18:06:00 ERROR DefaultSource: failed to parse cluster state json: {\"counter\":\"1156\",\"groups\":{\"1\":{\"members\":{\"9\":{\"id\":\"9\",\"groupId\":1,\"addr\":\"localhost:7080\",\"leader\":true,\"amDead\":false,\"lastUpdate\":\"1680372002\",\"learner\":false,\"clusterInfoOnly\":false,\"forceGroupId\":false}},\"tablets\":{\"\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000actor.film\":{\"groupId\":1,\"predicate\":\"\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000actor.film\",\"force\":false,\"onDiskBytes\":\"0\",\"remove\":false,\"readOnly\":false,\"moveTs\":\"0\",\"uncompressedBytes\":\"0\"},\"\\u0000\\u0000\\u0000\\u0000\\u[…]compressedBytes\":\"0\"}},\"snapshotTs\":\"783\",\"checksum\":\"6760848057398901375\",\"checkpointTs\":\"0\"}},\"zeros\":{\"1\":{\"id\":\"1\",\"groupId\":0,\"addr\":\"localhost:5080\",\"leader\":true,\"amDead\":false,\"lastUpdate\":\"0\",\"learner\":false,\"clusterInfoOnly\":false,\"forceGroupId\":false}},\"maxUID\":\"18446055125930680484\",\"maxTxnTs\":\"70000\",\"maxNsID\":\"0\",\"maxRaftId\":\"9\",\"removed\":[],\"cid\":\"b598edca-10c9-4622-ad2e-3b91c1d8bffb\",\"license\":{\"user\":\"\",\"maxNodes\":\"18446744073709551615\",\"expiryTs\":\"1682712273\",\"enabled\":true}}","time":"2023-04-01T18:06:00.767Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"java.lang.NumberFormatException: For input string: \"18446055125930680484\"","time":"2023-04-01T18:06:00.767Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)","time":"2023-04-01T18:06:00.768Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat java.lang.Long.parseLong(Long.java:592)","time":"2023-04-01T18:06:00.768Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat java.lang.Long.parseLong(Long.java:631)","time":"2023-04-01T18:06:00.769Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat com.google.gson.JsonPrimitive.getAsLong(JsonPrimitive.java:238)","time":"2023-04-01T18:06:00.769Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat uk.co.gresearch.spark.dgraph.connector.ClusterState$.fromJson(ClusterState.scala:45)","time":"2023-04-01T18:06:00.769Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat uk.co.gresearch.spark.dgraph.connector.ClusterStateProvider.getClusterState(ClusterStateProvider.scala:58)","time":"2023-04-01T18:06:00.770Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat uk.co.gresearch.spark.dgraph.connector.ClusterStateProvider.getClusterState$(ClusterStateProvider.scala:43)","time":"2023-04-01T18:06:00.770Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat uk.co.gresearch.spark.dgraph.connector.sources.NodeSource.getClusterState(NodeSource.scala:33)","time":"2023-04-01T18:06:00.771Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat uk.co.gresearch.spark.dgraph.connector.ClusterStateProvider.$anonfun$getClusterState$1(ClusterStateProvider.scala:26)","time":"2023-04-01T18:06:00.771Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat scala.collection.immutable.List.map(List.scala:293)","time":"2023-04-01T18:06:00.771Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat uk.co.gresearch.spark.dgraph.connector.ClusterStateProvider.getClusterState(ClusterStateProvider.scala:26)","time":"2023-04-01T18:06:00.772Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat uk.co.gresearch.spark.dgraph.connector.ClusterStateProvider.getClusterState$(ClusterStateProvider.scala:25)","time":"2023-04-01T18:06:00.772Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat uk.co.gresearch.spark.dgraph.connector.sources.NodeSource.getClusterState(NodeSource.scala:33)","time":"2023-04-01T18:06:00.772Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat uk.co.gresearch.spark.dgraph.connector.sources.NodeSource.getTable(NodeSource.scala:89)","time":"2023-04-01T18:06:00.773Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:92)","time":"2023-04-01T18:06:00.773Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:140)","time":"2023-04-01T18:06:00.774Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:209)","time":"2023-04-01T18:06:00.774Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat scala.Option.flatMap(Option.scala:271)","time":"2023-04-01T18:06:00.774Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:207)","time":"2023-04-01T18:06:00.775Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat uk.co.gresearch.spark.dgraph.connector.DgraphReader.nodes(DgraphReader.scala:59)","time":"2023-04-01T18:06:00.775Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)","time":"2023-04-01T18:06:00.775Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)","time":"2023-04-01T18:06:00.776Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)","time":"2023-04-01T18:06:00.776Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat java.lang.reflect.Method.invoke(Method.java:498)","time":"2023-04-01T18:06:00.778Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)","time":"2023-04-01T18:06:00.778Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)","time":"2023-04-01T18:06:00.778Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat py4j.Gateway.invoke(Gateway.java:282)","time":"2023-04-01T18:06:00.778Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)","time":"2023-04-01T18:06:00.778Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat py4j.commands.CallCommand.execute(CallCommand.java:79)","time":"2023-04-01T18:06:00.778Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)","time":"2023-04-01T18:06:00.779Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat py4j.ClientServerConnection.run(ClientServerConnection.java:106)","time":"2023-04-01T18:06:00.779Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"\tat java.lang.Thread.run(Thread.java:750)","time":"2023-04-01T18:06:00.779Z","v":0}
{"pid":7,"type":"jupyter","level":40,"msg":"23/04/01 18:06:00 ERROR DefaultSource: retrieving state from http://198.211.110.95:8080/state failed: For input string: \"18446055125930680484\"","time":"2023-04-01T18:06:00.780Z","v":0}

The number 18446055125930680484 does not fit into a Long, so it can be interpreted as the largest possible number. This number is used to partition the data, as an upper limit on the number of nodes. Being the largest possible number, there is effectively no upper limit any more. I think the partitioning has to be modified to work without such an upper limit.
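Just to make the overflow explicit (Java's Long is a signed 64-bit integer), a quick check:

LONG_MAX = 2**63 - 1              # Java's Long.MAX_VALUE = 9223372036854775807
max_uid = 18446055125930680484    # the value from your cluster state
print(max_uid > LONG_MAX)         # True, hence the NumberFormatException in Long.parseLong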

Right, but this maxNodes is part of the license restriction and pertains to the count of cluster nodes (Zeros and Alphas), not the number of graph nodes. Also, it's been present and set to that large uint64 value for many releases (including v20), so I wonder how this was working previously?

Here's a snippet of the state from a v20.11.0 cluster:

  "license":{
    "user":"",
    "maxNodes":"18446744073709551615",
    "expiryTs":"1683045659",
    "enabled":true
  }

A little more detail: I found this large integer in one of your tests...

The error says it is failing on

"maxUID":"18446055125930680484"

not

"maxNodes":"18446744073709551615"

The license is not used by the connector, but the maxUID is: it is used to partition the space of existing UIDs. With such a large number, partitioning is no longer possible (or is useless).

I have created #216, which treats a maxUID larger than a Long as infinity, which is then ignored. This should make the connector work again, but partitioning will be poor. It will work for small graphs.

In a separate but more extensive PR, I will try to make UID partitioning work for that case.
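To illustrate the problem, here is a rough sketch of the idea (not the connector's actual partitioning code): splitting the uid space up to maxUID into equal ranges gives sensible partitions for realistic values, but with a maxUID near the uint64 maximum each range spans quintillions of uids.

# illustrative only, not the connector's implementation
def uid_ranges(max_uid, n):
    """Split the uid space [1, max_uid] into n contiguous ranges."""
    step = max_uid // n
    return [(i * step + 1, max_uid if i == n - 1 else (i + 1) * step) for i in range(n)]

print(uid_ranges(10_000, 4))                # [(1, 2500), (2501, 5000), (5001, 7500), (7501, 10000)]
print(uid_ranges(18446055125930680484, 4))  # each range covers roughly 4.6e18 uids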

Ah, good on you, thanks. I didn't realize that maxUID had gotten that large.

I retried on a different cluster (one that I'm managing on my own network) and got beyond this maxUID error. However, new issues arise.

<snip>
endpoint = "10.0.1.251:9080"
df = spark.read.dgraph.nodes(endpoint)
print("count", df.count())

Fails with:

23/04/07 14:05:26 WARN ManagedChannelImpl: [Channel<25>: (alpha:9080)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host alpha, cause=java.lang.RuntimeException: java.net.UnknownHostException: alpha: nodename nor servname provided, or not known
	at io.dgraph.dgraph4j.shaded.io.grpc.internal.DnsNameResolver.resolveAddresses(DnsNameResolver.java:223)
	at io.dgraph.dgraph4j.shaded.io.grpc.internal.DnsNameResolver.doResolve(DnsNameResolver.java:282)
	at io.dgraph.dgraph4j.shaded.io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:318)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.net.UnknownHostException: alpha: nodename nor servname provided, or not known
	at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
	at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)

Seems like you (or maybe the default in dgraph4j) are relying on the hostname alpha. I'm running the cluster on another machine. I added the line 10.0.1.251 alpha to my /etc/hosts and it gets beyond that. Did I overlook a configuration option for setting the Zero and Alpha addresses?

I'm now dealing with a "Query failed because the result is too large" issue, but I see that you have documented ways to deal with that, which I am now exploring.

"I didn't realize that maxUID had gotten that large."

What kind of cluster was that, the Dgraph cloud service? That is a pretty unlikely maxUID.

The alpha nodes connect to the zero nodes and send their hostname:port, which is handed to the client and used to connect to the alphas. So the client has to be able to resolve the alphas' hostnames.

Alternatively, you can start the alpha nodes with --my IP:PORT, which is sent to the zero nodes and thus used by the clients.
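For example (the IP is your alpha host from above; the ports are Dgraph's defaults and just an assumption about your setup), starting the alpha with an explicitly advertised address could look like this:

# hedged example: advertise a routable address instead of the machine's own hostname
# (7080 is the alpha's default internal port, 5080 the zero's default gRPC port)
dgraph alpha --my=10.0.1.251:7080 --zero=10.0.1.251:5080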

@matthewmcneely what kind of cluster did you use to reach that large maxUID?

@EnricoMi Yeah, that's odd. It was a fresh cluster onto which I loaded the 1M movie dataset. Can't understand how the uid level got so high. Weird. Anyway, that's not your issue. Feel free to close this ticket.

Well, it is an issue if this happens commonly. The connector as it is now would not be able to read from such a Dgraph cluster.

If I could reproduce this, that would be very helpful. If you'd be happy to share the steps you used to load the dataset that led to those uid levels, that would be great.

@EnricoMi I didn't write down the steps, but to the best of my memory, I launched a new EC2 instance and followed these instructions: https://dgraph.io/tour/moredata/1/

I'm nearly 100% certain that it was a clean instance.

Thanks for the input.