SharePoint Pipeline
rajasekhar-gundala opened this issue · 14 comments
Hi Team,
Thank you very much for the project.
I deployed Dataplane using docker-compose. Is there a sample pipeline configuration? I want to create a SharePoint pipeline to pull data from SharePoint Online.
I will try to find some time tomorrow to put together an example you can use.
@saul-data, Thank you very much.
@rajaseg I haven't finished yet, but these recipes should help you get started.
Register an Azure app: https://recipes.dataplane.app/office-365/register-an-azure-app
Authenticate with Graph API: https://recipes.dataplane.app/office-365/sharepoint-api
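For readers who want a rough picture before opening the recipes, here is a minimal sketch of the flow they describe: request an app-only token from the Microsoft identity platform, then look up a SharePoint site through Microsoft Graph. It is not taken from the recipes. It uses only the Python standard library instead of the requests package, and the function names, the tenant, client and site values are all hypothetical placeholders. It also assumes the Azure app has already been granted application permissions for Graph.

```python
import json
import urllib.parse
import urllib.request

# Scope that asks for whatever application permissions the Azure app was granted.
GRAPH_SCOPE = "https://graph.microsoft.com/.default"


def build_token_request(tenant_id, client_id, client_secret):
    """Build the OAuth2 client-credentials request (hypothetical helper)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": GRAPH_SCOPE,
    }).encode("utf-8")
    return urllib.request.Request(url, data=form, method="POST")


def build_site_request(access_token, hostname, site_path):
    """Build a Graph request that resolves a site by hostname and path."""
    path = site_path.strip("/")
    url = f"https://graph.microsoft.com/v1.0/sites/{hostname}:/{path}"
    headers = {"Authorization": f"Bearer {access_token}"}
    return urllib.request.Request(url, headers=headers)


def fetch_json(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example wiring (placeholders, needs network access and real credentials):
#   token = fetch_json(build_token_request(TENANT, CLIENT_ID, SECRET))["access_token"]
#   site = fetch_json(build_site_request(token, "contoso.sharepoint.com", "sites/marketing"))
```

Keeping request construction separate from the network call makes it easy to inspect the URL and form body before anything is sent.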
@saul-data, I tried to install requests==2.28.1 as mentioned in the SharePoint API recipe, but no luck. It stays in a Running state.
Please find the below image for reference.
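One way to tell whether the package actually landed on a worker is to ask Python itself from a pipeline step. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the step runs under the same interpreter that pip installed into.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pin_satisfied(package, pinned):
    """Check whether the installed version equals the pinned version string."""
    return installed_version(package) == pinned


if __name__ == "__main__":
    # Report on the pin from the recipe without failing if it is absent.
    print("requests installed:", installed_version("requests"))
    print("pin requests==2.28.1 satisfied:", pin_satisfied("requests", "2.28.1"))
```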
@rajaseg thanks. Are you running Dataplane with the docker-compose setup? If so, and you have Docker Desktop, can you show that all the services are running? Which operating system are you on? It may be a networking issue in docker-compose. I'll make a note: I think we need to create a single Docker image for a simpler setup.
@saul-data, Yes, I am running Dataplane using docker-compose in a Docker Swarm environment (Ubuntu 20.04.4 LTS). It's behind a Caddy proxy. Please find the docker-compose file below.
version: '3.7'
services:
  postgres:
    image: timescale/timescaledb:2.5.1-pg14
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: "Password123!"
      POSTGRES_DB: "dataplane"
    volumes:
      - /mnt/dataplanedb:/var/lib/postgresql/data
    healthcheck:
      test: [ "CMD", "pg_isready", "-U", "postgres" ]
      interval: 30s
      retries: 5
    networks:
      - caddy
    deploy:
      placement:
        constraints: [node.role == manager]
      replicas: 1
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
  nats:
    image: nats:2.7.4-scratch
    command:
      - "--cluster"
      - "nats://0.0.0.0:6222"
      - "--http_port"
      - "8222"
      - "--port"
      - "4222"
    networks:
      - caddy
    deploy:
      placement:
        constraints: [node.role == worker]
      replicas: 1
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
  nats-r:
    image: nats:2.7.4-scratch
    command:
      - "--cluster"
      - "nats://0.0.0.0:6222"
      - "--http_port"
      - "8222"
      - "--port"
      - "4222"
      - "--routes"
      - "nats://nats:6222"
    networks:
      - caddy
    deploy:
      placement:
        constraints: [node.role == worker]
      replicas: 2
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
  dataplane:
    image: dataplane/dataplane:latest
    ports:
      - "9001:9001"
    volumes:
      - /mnt/dataplane:/dataplane/code-files/
    environment:
      DP_CODE_FOLDER: "/dataplane/code-files/"
      secret_db_host: postgres
      secret_db_user: postgres
      secret_db_pwd: "Manimmahps123!"
      secret_db_ssl: "disable"
      secret_db_port: "5432"
      secret_db_database: "dataplane"
      secret_jwt_secret: "45feb20d-06c7-42a5-bdc5-cdfa60b1b39d"
      secret_encryption_key: "Lt1nk%Nabcd5S&2VvMpCOiKMXUFo!H9P"
      DP_DATABASE: "timescaledb"
      DP_PORT: "9001"
      DP_NATS: "nats://nats:4222, nats://nats-r_1:4222, nats://nats-r_2:4222"
      DP_MODE: "development"
      DP_DEBUG: "true"
      DP_DB_DEBUG: "false"
      DP_MQ_DEBUG: "false"
      DP_METRIC_DEBUG: "false"
      DP_SCHEDULER_DEBUG: "false"
      DP_CLEANTASKS_DAYS: "60"
      DP_REMOVELOGS_DAYS: "60"
    networks:
      - caddy
    deploy:
      placement:
        constraints: [node.role == manager]
      replicas: 1
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
  dataplane-worker:
    image: dataplane/dataplane-worker-python:latest
    volumes:
      - /mnt/dataplane:/dataplane/code-files/
    environment:
      DP_CODE_FOLDER: "/dataplane/code-files/"
      secret_db_host: postgres
      secret_db_user: postgres
      secret_db_pwd: "Password123!"
      secret_db_ssl: "disable"
      secret_db_port: "5432"
      secret_db_database: "dataplane"
      secret_jwt_secret: "45feb20d-06c7-42a5-bdc5-cdfa60b1b39d"
      DP_DATABASE: "timescaledb"
      DP_NATS: "nats://nats:4222, nats://nats-r_1:4222, nats://nats-r_2:4222"
      DP_MODE: "development"
      DP_DEBUG: "true"
      DP_DB_DEBUG: "false"
      DP_MQ_DEBUG: "false"
      DP_METRIC_DEBUG: "false"
      DP_SCHEDULER_DEBUG: "false"
      DP_WORKER_HEARTBEAT_SECONDS: "1"
      DP_WORKER_GROUP: "python_1"
      DP_WORKER_CMD: "/bin/sh"
      DP_WORKER_TYPE: "container"
      DP_WORKER_LB: "roundrobin"
      DP_WORKER_ENV: "Development"
      DP_WORKER_PORT: "9005"
      DP_WORKER_LANGUAGES: "Python"
      DP_WORKER_LOAD_PACKAGES: "Python"
    networks:
      - caddy
    deploy:
      placement:
        constraints: [node.role == worker]
      replicas: 3
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
volumes:
  dataplanedb:
    driver: "local"
  dataplane:
    driver: "local"
networks:
  caddy:
    external: true
All the services are running fine. Please find the below image for your reference.
Oh I see, I think that proxy is preventing the pip packages from installing. Are you able to whitelist pypi.org and files.pythonhosted.org? The pip package installation will be trying to connect to those hosts (pypi.python.org is the older hostname and now redirects to pypi.org). You may be able to see it in the dataplane-worker-python logs.
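A quick way to test the egress theory from inside a worker container is to try opening a TCP connection to the hosts pip contacts by default. This is a rough sketch with a hypothetical helper name. It only checks that a connection on port 443 can be opened, so a pass does not prove that pip will succeed behind a filtering HTTP proxy, but a failure is a strong hint that outbound traffic is blocked.

```python
import socket

# Hosts pip contacts by default: the package index and the file host.
PIP_HOSTS = ("pypi.org", "files.pythonhosted.org")


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


if __name__ == "__main__":
    for host in PIP_HOSTS:
        status = "reachable" if can_reach(host) else "NOT reachable"
        print(f"{host}:443 is {status}")
```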
@saul-data, I'm not sure how to whitelist the pip packages in Caddy. When I try to view the logs, a blank page opens. Please find the screenshot for reference.
Would it be possible to turn off the Caddy proxy and see if that works? The logs for the pipeline only show pipeline runs, not worker logs, and the pipeline can run on any worker.
To see the worker logs, you need to read them from the Docker container: docker logs [OPTIONS] CONTAINER
https://docs.docker.com/engine/reference/commandline/logs/
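If you would rather collect the worker logs from a script, the docker logs call can be wrapped roughly as below. The helper names and the container name in the usage comment are hypothetical, and the sketch assumes the docker CLI is on the PATH of the host where the worker container runs. Only the --tail and --since options of docker logs are used.

```python
import subprocess


def docker_logs_command(container, tail=200, since=None):
    """Build the argument list for `docker logs` on one container."""
    cmd = ["docker", "logs", "--tail", str(tail)]
    if since:
        # Accepts a timestamp or a relative duration such as "10m".
        cmd += ["--since", since]
    cmd.append(container)
    return cmd


def read_worker_logs(container, tail=200, since=None):
    """Run `docker logs` and return the container's stdout and stderr text."""
    result = subprocess.run(
        docker_logs_command(container, tail=tail, since=since),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout + result.stderr


# Example (placeholder container name, requires a running Docker daemon):
#   print(read_worker_logs("dataplane_dataplane-worker.1.abc123", tail=100))
```

With three worker replicas, the same call has to be repeated per container, since a pipeline can run on any worker.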
I think the best way to get going is to turn off that proxy, because it looks like it might be preventing egress traffic. If it blocks external traffic, you won't be able to use any external API.
It may be easier to troubleshoot or chat about this on Discord if you like. Here is our link: https://discord.gg/Ztu4ASNky8 If you want, I can help you set up a hosted version that you can use.
@rajaseg - I just discovered an issue with pip install on the workers, so this may have nothing to do with your proxy. Are you able to send me the logs from one of the workers using docker logs? Do they show this error message?
Defaulting to user installation because normal site-packages is not writeable
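pip prints that line when it cannot write to the interpreter's site-packages directory and falls back to a per-user install. The sketch below shows one way to check both things on a worker: whether a captured log contains the message, and whether the site-packages directories are writable for the current user. The helper names are hypothetical, and whether this is the root cause on the Dataplane workers is not established in this thread.

```python
import os
import site

# Message pip emits when it falls back to a per-user install.
USER_INSTALL_NOTICE = (
    "Defaulting to user installation because normal site-packages is not writeable"
)


def log_shows_user_install_fallback(log_text):
    """Return True if the pip fallback notice appears in the log text."""
    return USER_INSTALL_NOTICE in log_text


def site_packages_writability():
    """Map each global site-packages directory to whether it is writable."""
    return {path: os.access(path, os.W_OK) for path in site.getsitepackages()}


if __name__ == "__main__":
    for path, writable in site_packages_writability().items():
        print(f"{path}: {'writable' if writable else 'not writable'}")
    # Directory pip uses for the per-user fallback install.
    print("user site-packages:", site.getusersitepackages())
```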
@saul-data, Sorry, I went out to pick up my kids from school. Let me know your availability to connect through Discord.
@rajaseg I am available now, if you are.
@saul-data, Joined now.