amundsen-io/amundsen

Amundsen is unable to import MYSQL data

MalavikaN1 opened this issue · 6 comments

Expected Behavior

I changed the connection string in https://github.com/amundsen-io/amundsen/blob/main/databuilder/example/scripts/sample_mysql_loader.py to load locally hosted MySQL data into Amundsen.
Changes made in the above file:

import pymysql
pymysql.install_as_MySQLdb()
def connection_string():
    user = 'root'
    password = 'root'
    host = 'localhost'
    port = '3307'
    db = 'test_db'
    return "mysql+pymysql://%s:%s@%s:%s/%s" % (user, password, host, port, db)
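
A side note on the connection string: if the MySQL password ever contains characters such as `@`, `:` or `/`, interpolating it directly into the URL yields a string that is parsed incorrectly. A minimal sketch of an escaped variant, using only the standard library; the function name `build_connection_string` and its parameters are illustrative and not part of the Amundsen sample script:

```python
from urllib.parse import quote


def build_connection_string(user, password, host, port, db):
    """Return a mysql+pymysql URL with the credentials percent-encoded."""
    # safe='' makes quote() escape every reserved character, including
    # '@', ':' and '/', which would otherwise be read as URL delimiters.
    return "mysql+pymysql://%s:%s@%s:%s/%s" % (
        quote(user, safe=''), quote(password, safe=''), host, port, db)


# Same values as in the snippet above; nothing needs escaping here.
print(build_connection_string('root', 'root', 'localhost', '3307', 'test_db'))
```

With the values from this issue the result is identical to the original string, so this is not the cause of the error reported here; it only matters for credentials that contain reserved characters.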

Current Behavior

When I run the Python file, I get the following error:
ERROR:neo4j:Failed to write data to connection IPv4Address(('127.0.0.1', 7687)) (IPv4Address(('127.0.0.1', 7687))).

I tried loading the sample data by running the https://github.com/amundsen-io/amundsen/blob/main/databuilder/example/scripts/sample_data_loader.py file and it worked.

Possible Solution

Fix: adding the line below to job_config in sample_mysql_loader.py resolved the issue.
f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_ENCRYPTED}': False
So now the code looks like this:

job_config = ConfigFactory.from_dict({
        f'extractor.mysql_metadata.{MysqlMetadataExtractor.WHERE_CLAUSE_SUFFIX_KEY}': where_clause_suffix,
        f'extractor.mysql_metadata.{MysqlMetadataExtractor.USE_CATALOG_AS_CLUSTER_NAME}': True,
        f'extractor.mysql_metadata.extractor.sqlalchemy.{SQLAlchemyExtractor.CONN_STRING}': connection_string(),
        f'loader.filesystem_csv_neo4j.{FsNeo4jCSVLoader.NODE_DIR_PATH}': node_files_folder,
        f'loader.filesystem_csv_neo4j.{FsNeo4jCSVLoader.RELATION_DIR_PATH}': relationship_files_folder,
        f'publisher.neo4j.{neo4j_csv_publisher.NODE_FILES_DIR}': node_files_folder,
        f'publisher.neo4j.{neo4j_csv_publisher.RELATION_FILES_DIR}': relationship_files_folder,
        f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_END_POINT_KEY}': neo4j_endpoint,
        f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_USER}': neo4j_user,
        f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_PASSWORD}': neo4j_password,
        f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_ENCRYPTED}': False,
        f'publisher.neo4j.{neo4j_csv_publisher.JOB_PUBLISH_TAG}': 'unique_tag',  # should use unique tag here like {ds}
    })
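
Hard-coding False suits a local Neo4j instance without TLS, but the same script may then fail to connect to a server that requires encryption. One possible way to keep the flag adjustable is to read it from an environment variable. In this sketch the variable name `NEO4J_ENCRYPTED` and the helper `env_flag` are hypothetical, not something databuilder provides:

```python
import os


def env_flag(name, default=False):
    """Interpret an environment variable as a boolean."""
    raw = os.environ.get(name)
    if raw is None:
        # Variable not set: fall back to the caller's default.
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Hypothetical variable name; when unset, the connection stays unencrypted.
neo4j_encrypted = env_flag("NEO4J_ENCRYPTED", default=False)

# The corresponding job_config entry would then read:
# f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_ENCRYPTED}': neo4j_encrypted,
```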

I am not totally sure this will help you but it worked for me.
Since "from 4.0 onwards, the default encryption setting is off" (see "Configure SSL Policy for Bolt server and HTTPS server"), I suggest setting encrypted to False in the default security configuration:

default_security_conf = {'trust': neo4j.TRUST_ALL_CERTIFICATES, 'encrypted': False}

Make this change in both databuilder/databuilder/publisher/neo4j_csv_publisher.py and databuilder/databuilder/extractor/neo4j_extractor.py.

Try it out and let me know!

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale commented

This issue has been automatically closed for inactivity. If you still wish to make these changes, please open a new pull request or reopen this one.
