Unable to reach port 4317
niraj8241 opened this issue · 2 comments
Introduction
I installed the AWS OTel Collector using the Helm chart provided in this repository. Metrics reach CloudWatch and logs appear in CloudWatch Logs as well, which is a good sign that the collector is working.
Issues
The installation went fine apart from a couple of hiccups. I then tried instrumenting a sample application, but its pod is unable to connect to the collector service on port 4317. Logs below:
My OTel Helm release looks like this:
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: aws-otel
  namespace: infra
spec:
  releaseName: aws-otel
  interval: 5m
  chart:
    spec:
      chart: adot-exporter-for-eks-on-ec2
      sourceRef:
        kind: HelmRepository
        name: aws-otel
        namespace: infra
  values:
    nameOverride: aws-otel
    clusterName: dev
    awsRegion: "us-west-2"
    adotCollector:
      image:
        name: "aws-otel-collector"
        repository: "amazon/aws-otel-collector"
        tag: "v0.29.0"
        daemonSetPullPolicy: "IfNotPresent"
        sidecarPullPolicy: "Always"
      daemonSet:
        enabled: true
        daemonSetName: "adot-collector-daemonset"
        createNamespace: false
        namespace: "infra"
        clusterRoleName: "dataos-core-dev-adot-collector-role"
        clusterRoleBindingName: "adot-collector-role-binding"
        command:
          - "/awscollector"
          - "--config=/conf/adot-config.yaml"
        resources:
          limits:
            cpu: "200m"
            memory: "200Mi"
          requests:
            cpu: "200m"
            memory: "200Mi"
        receivers:
          otlp:
            protocols:
              grpc:
                endpoint: 0.0.0.0:4317
              http:
                endpoint: 0.0.0.0:4318
        exporters:
          awsxray:
            region: us-west-2
        processors:
          memory_limiter:
            limit_mib: 100
            check_interval: 5s
        extensions:
          sigv4auth:
            assume_role:
              arn: "arn:aws:iam::xxxxxxxxxx:role/adot-collector-sa"
              sts_region: "us-west-2"
        cwexporters:
          namespace: "ContainerInsights"
          logGroupName: "aws-otel"
          logStreamName: "InputNodeName"
          enabled: true
          dimensionRollupOption: "NoDimensionRollup"
          parseJsonEncodedAttrValues: [ "Sources", "kubernetes" ]
          metricDeclarations: |
            # node metrics
            - dimensions: [[NodeName, InstanceId, ClusterName]]
              metric_name_selectors:
                - node_cpu_utilization
                - node_memory_utilization
                - node_network_total_bytes
                - node_cpu_reserved_capacity
                - node_memory_reserved_capacity
                - node_number_of_running_pods
                - node_number_of_running_containers
            - dimensions: [[ClusterName]]
              metric_name_selectors:
                - node_cpu_utilization
                - node_memory_utilization
                - node_network_total_bytes
                - node_cpu_reserved_capacity
                - node_memory_reserved_capacity
                - node_number_of_running_pods
                - node_number_of_running_containers
                - node_cpu_usage_total
                - node_cpu_limit
                - node_memory_working_set
                - node_memory_limit
            # pod metrics
            - dimensions: [[PodName, Namespace, ClusterName], [Service, Namespace, ClusterName], [Namespace, ClusterName], [ClusterName]]
              metric_name_selectors:
                - pod_cpu_utilization
                - pod_memory_utilization
                - pod_network_rx_bytes
                - pod_network_tx_bytes
                - pod_cpu_utilization_over_pod_limit
                - pod_memory_utilization_over_pod_limit
            - dimensions: [[PodName, Namespace, ClusterName], [ClusterName]]
              metric_name_selectors:
                - pod_cpu_reserved_capacity
                - pod_memory_reserved_capacity
            - dimensions: [[PodName, Namespace, ClusterName]]
              metric_name_selectors:
                - pod_number_of_container_restarts
            # cluster metrics
            - dimensions: [[ClusterName]]
              metric_name_selectors:
                - cluster_node_count
                - cluster_failed_node_count
            # service metrics
            - dimensions: [[Service, Namespace, ClusterName], [ClusterName]]
              metric_name_selectors:
                - service_number_of_running_pods
            # node fs metrics
            - dimensions: [[NodeName, InstanceId, ClusterName], [ClusterName]]
              metric_name_selectors:
                - node_filesystem_utilization
            # namespace metrics
            - dimensions: [[Namespace, ClusterName], [ClusterName]]
              metric_name_selectors:
                - namespace_number_of_running_pods
        service:
          pipelines:
            traces:
              processors:
                - memory_limiter
              receivers:
                - otlp
              exporters:
                - awsxray
          metrics:
            receivers: [ "awscontainerinsightreceiver"]
            processors: [ "batch/metrics" ]
            exporters: [ "awsemf"]
          extensions: [ "sigv4auth" ]
```
Further Troubleshooting
- I deployed a test pod and ran nmap to verify whether the service is actually listening, but nmap returns nothing. Screenshot below.
- I also tried telnet and got a connection refused.
Additional Issues
Apart from the above issue, there are a couple of things I do not understand:
- How do I get an IAM role created through the Helm chart? I had to create the IRSA role manually and then reference it in `sigv4auth`.
- How do I choose the type of installation (DaemonSet, StatefulSet, or sidecar)?
- Is the Helm chart the preferred way to install the collector, or should I install it another way? I ask because I see no mention of Helm charts in the documentation.
- How do I expose the collector's ports as a Service?
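To make the last question concrete, this is roughly the kind of Service I have in mind. It is only a sketch and is not something the chart generates: the Service name and the selector label are assumptions, and the selector in particular would have to match whatever labels the collector DaemonSet pods actually carry.

```yaml
# Sketch only: names and labels below are hypothetical, not taken from the chart.
apiVersion: v1
kind: Service
metadata:
  name: adot-collector            # hypothetical Service name
  namespace: infra                # namespace used by the HelmRelease above
spec:
  type: ClusterIP
  selector:
    # Hypothetical label. It must match the labels set on the collector pods;
    # check them with: kubectl get pods -n infra --show-labels
    name: adot-collector-daemonset
  ports:
    - name: otlp-grpc             # OTLP over gRPC
      protocol: TCP
      port: 4317
      targetPort: 4317
    - name: otlp-http             # OTLP over HTTP
      protocol: TCP
      port: 4318
      targetPort: 4318
```

With a Service like this, and assuming the default cluster domain, an application would address the collector as `adot-collector.infra.svc.cluster.local:4317`. The connection would still be refused if the collector container is not actually listening on that port.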
Any help would be greatly appreciated, as I cannot find anything online about this problem.
This issue is stale because it has been open 90 days with no activity. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled
This issue was closed because it has been marked as stale for 30 days with no activity.