[EKS] [request]: Nodegroup should support tagging ASGs
bhops opened this issue · 122 comments
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Tell us about your request
It would be great if we could pass tags to the underlying ASGs (and tell the ASGs to propagate tags) that are created from the managed node groups for EKS so that the underlying instances/volumes are tagged appropriately.
Which service(s) is this request for?
EKS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Currently, managed node groups for EKS do not support propagating tags to the ASGs (and therefore the instances) created by the node group. This leads to EC2 instances that are not tagged according to our requirements for tracking cost and resource ownership.
Passing the managed node group tags to the launch template's "Instance tags" will automatically apply them to both the EC2 instances and their volumes. If there are challenges doing that, creating a separate "Custom Tags" section in the EKS managed node group configuration page would also be helpful.
Workaround to add custom tags to worker nodes using an EKS managed node group:
- Create a managed worker node group in the EKS console (set minimum & desired count to 1).
- EKS creates an ASG in the background. You will find the ASG information in the node group details in the EKS console. Select the ASG associated with the managed worker node group > Tags > add your custom tags for EC2. Note: make sure to check "Tag New Instances" while creating the new tags.
- Terminate the newly launched EC2 instance that came up without tags.
- Scale up the managed node group as required.
- After completing the above steps, the EKS managed node group will tag new EC2 instances with the custom tags.
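For anyone who wants to script that console workaround rather than clicking through it, here is a minimal sketch using boto3. The cluster name, node group name, and tag set are placeholders you would replace with your own; it looks up the ASG(s) behind the node group and tags them with propagation enabled, which is essentially steps 2 and 5 above:

import boto3

# Example inputs: adjust to your own cluster, node group, and tag set.
CLUSTER = "my-cluster"
NODEGROUP = "my-nodegroup"
TAGS = {"team": "platform", "cost-center": "1234"}

eks = boto3.client("eks")
autoscaling = boto3.client("autoscaling")

# Find the ASG(s) EKS created behind the managed node group.
resources = eks.describe_nodegroup(
    clusterName=CLUSTER, nodegroupName=NODEGROUP
)["nodegroup"]["resources"]

for asg in resources["autoScalingGroups"]:
    # Equivalent of ticking "Tag New Instances" in the console.
    autoscaling.create_or_update_tags(
        Tags=[
            {
                "ResourceId": asg["name"],
                "ResourceType": "auto-scaling-group",
                "Key": key,
                "Value": value,
                "PropagateAtLaunch": True,
            }
            for key, value in TAGS.items()
        ]
    )

Existing instances won't pick the tags up retroactively; as the workaround notes, you still need to cycle the instances (or tag them directly) for the tags to show up on already-running nodes.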
This is a crucial feature that is missing, and it is the only reason our department is not moving from manual ASGs to node groups.
Yes, this is a very important change. We also cannot use node groups because of the need for tags. It is bad practice to be forced into semi-automated infrastructure as code.
Any update?
Any updates? Can you open-source node groups so the community can contribute?
Any updates?
I don't think it's a duplicate. This one is for an API feature to add tags to the ASG created by the API, and also to be able to set the flag on the ASG that propagates tags outwards; so it's only an API change to implement the same thing done manually in the workaround above.
#374 is for the EKS Cluster object itself to support propagating tags down, in the way ASGs already do. I imagine #374 would partially work by propagating tags to ASGs, and then turning on ASG tag propagation, rather than duplicating the behaviour.
Team: Having this functionality available will enable customers to use Cluster Autoscaler's capacity autodiscovery feature instead of forcing them to maintain manual capacity mappings on the command line.
The documentation there isn't super clear (see kubernetes/autoscaler#3198 for documentation updates), but advertising capacity resources to Cluster Autoscaler via ASG tags will make the use of multiple heterogeneous Auto Scaling Groups much easier for customers.
@otterley While Managed Nodegroup doesn't support customer provided tags for ASGs today, we do add the necessary tags for CAS auto discovery to the ASG, i.e. k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/<CLUSTER NAME>.
@rtripat Understood. Perhaps I wasn't clear, but I was specifically referring to the ability to autodiscover specific capacity dimensions of an ASG such as cpu, memory, ephemeral storage, GPU, etc.
Until this feature is ready, I've had success with creating a CloudWatch rule on the EC2 "pending" state that invokes a Lambda, which takes the instance_id passed in through the event, checks whether the instance is part of a managed node group, and then adds the appropriate tags. I'm doing this all through Terraform as part of spinning up the EKS cluster.
Obviously it would be much easier with a tags option! 👍
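A rough sketch of that kind of Lambda, for anyone who wants to reproduce the approach. It assumes an EventBridge/CloudWatch rule on the EC2 instance state-change event for the "pending" state; the cluster name and extra tags are placeholders, and the membership check relies on the eks:cluster-name tag that managed node groups apply to their instances (check what tags your nodes actually carry before depending on it):

import boto3

ec2 = boto3.client("ec2")

# Placeholder values: replace with your own cluster name and tag set.
CLUSTER_NAME = "my-cluster"
EXTRA_TAGS = [{"Key": "team", "Value": "platform"}]


def lambda_handler(event, context):
    # EC2 state-change events carry the instance id in the detail block.
    instance_id = event["detail"]["instance-id"]

    # Look at the tags already on the instance to decide whether it belongs
    # to one of this cluster's managed node groups.
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    tags = {t["Key"]: t["Value"] for t in reservations[0]["Instances"][0].get("Tags", [])}

    is_cluster_node = tags.get("eks:cluster-name") == CLUSTER_NAME
    if is_cluster_node:
        # Add the tags that the managed node group couldn't propagate for us.
        ec2.create_tags(Resources=[instance_id], Tags=EXTRA_TAGS)

    return {"instance": instance_id, "tagged": is_cluster_node}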
It would also be great to be able to tag the launch templates, with the option to propagate those tags to instances and volumes or not.
Is there some kind of best practice on tagging ASG vs tagging LT? It seems to me that tagging LT offers more flexibility (like the ability to tag the volumes).
https://docs.aws.amazon.com/autoscaling/ec2/userguide/autoscaling-tagging.html touches upon the overlap in tag propagation between ASGs and Launch Templates.
That's precisely the documentation page I had in mind when asking about best practices ;-) This page explains the overlap but there are no clear pros and cons of the two tagging approaches. But it seems to me that LT offers more flexibility and that ASG tags should be used only when necessary (like for the cluster autoscaler discovery tags).
There's a related discussion about tagging ASGs and LTs for non-managed Nodegroups at eksctl-io/eksctl#1603. My understanding from there is that tagging LTs and enabling propagation would be sufficient, but there might be use-cases where the ASG needs to have the tag too; it wouldn't then need to also support propagation.
The difference observed in that ticket is that the ASG propagation applies the tags after launch, while LT propagation applies the tags as part of the launch.
Yes, I create my non-managed node groups using Terraform and put the tags on the LT with propagation to instances and volumes. The only tags I needed to put on ASG are the cluster autoscaler related tags. But propagation is not needed for these tags.
We need this feature too; it impacts cost calculation if we have to add the tags to the ASG manually later.
We have EKS deployed as a new part of our stacks in prod through preprod, stage and dev (alongside a very large ECS deployment in each environment). It is very annoying that the instances are not tagged for cost allocation.
+1, cost calculations are really important.
I would also like to see custom names or name prefixes for the autoscaling groups. The auto-generated uuid naming really slows down management of larger clusters.
With managed node groups support for launch templates, you can now add tags to the EC2 instances created as part of your node groups. See EKS docs for details.
I will leave this issue open for a little while, as I want to get some more feedback. The issue as originally opened asks for tags on ASGs, but I suspect most of you ultimately care about tags on EC2 instances, not the ASGs. Please leave any comments if you still have a need for tags on the ASGs themselves. Our vision is we handle any of these ASG tags for you, for example when we implement scale to 0 #724, we'll automatically add the required tags to the ASG.
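For reference, wiring a custom launch template into a managed node group at the API level looks roughly like the sketch below. All names, subnets, and ARNs are placeholders; the launch template (created separately) is where the EC2 instance tags live, while the node group's own tags parameter only tags the node group object itself:

import boto3

eks = boto3.client("eks")

# Placeholder names/ARNs: substitute your own cluster, subnets, and role.
eks.create_nodegroup(
    clusterName="my-cluster",
    nodegroupName="tagged-workers",
    subnets=["subnet-aaaa", "subnet-bbbb"],
    nodeRole="arn:aws:iam::111122223333:role/eks-node-role",
    scalingConfig={"minSize": 1, "maxSize": 3, "desiredSize": 2},
    # The referenced launch template carries the instance/volume tags;
    # "version" is a string and should usually be pinned explicitly.
    launchTemplate={"name": "eks-workers-example", "version": "1"},
)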
I will leave this issue open for a little while, as I want to get some more feedback. The issue as originally opened asks for tags on ASGs, but I suspect most of you ultimately care about tags on EC2 instances, not the ASGs.
As the original issue creator, I can confirm that being able to tag the underlying EC2 instances was indeed the intent of the original ask. Though others may have had other reasons for wanting ASGs to be tagged.
Thank you to the EKS team for implementing this!
Tags on the ASG are crucial if the ASG scales to zero. The cluster autoscaler for example will use the ASG tags if they exist. Without a way to propagate tags to the ASGs, we either have to run with unnecessary hosts or we have to bootstrap the ASGs directly.
@dindurthy As I mentioned above, "Our vision is we handle any of these ASG tags for you, for example when we implement scale to 0 #724, we'll automatically add the required tags to the ASG."
@mikestef9 While it is an admirable goal to handle the ASG tags automatically, it seems unlikely you will be able to do it quickly or easily. There are tags for node labels, node taints, and node resources, and it is unlikely EKS will be aware of all of them because of the various ways they can be created. At the moment it appears I cannot even get tags to propagate from the launch template to elastic GPUs (it seems EKS makes a copy of the launch template rather than use it directly, and the copy disables the "tag elastic graphics" setting), which makes me wary of trusting automatic behavior. I would rather you implement direct ASG tagging (or at least copying Launch Template tags to the ASG) first and see about automation later.
Did the tags use to work? I thought they did, but all my nodegroups no longer have the tags specified in the eksctl configuration. I need to get these tags back in, as they are used for cost reporting.
The docs say the feature is there now, but it does not work. Is it a tf 0.13.x feature, as I am still on 0.12.x (don't want to move to 0.13 yet)? It would be nice if the EC2 worker instances had a meaningful name, rather than just a hyphen.
tags = {
  "Name" = "eks-${var.cluster-name}-1"
}
I really want tags on ASGs.
I attach a target group to my ASG (we don't use type: LoadBalancer; we use NodePort). Since we have multiple ASGs for different purposes, we need to be able to identify them in order to attach the target group.
We really need to pass the node group tags to EC2 instances and volumes; otherwise we have to query the instances of the EKS node group and tag them outside of the automated process.
Tags can be propagated to EC2 instances, but let's say I need my EC2 instances to be tagged node-01, node-02, node-03, and so on. That is not happening, because the ASG is what triggers the node launches, not the launch template. This is very important.
I need to tag the Autoscaling Group itself, not EC2.
I want to monitor the desired capacity of the AutoscalingGroup in Datadog, and I need to be able to set arbitrary tags on the AutoscalingGroup itself in order to be able to use it comfortably.
Autoscaling Groups created by Managed NodeGroups do not output metrics to CloudWatch, which is another issue, but tagging is still important
It's kind of frustrating not to be able to tag our node group instances programmatically. In my case, I'm using Terraform and already tried tags and additional_tags, and neither one propagated the tags to the ASG or to the instances themselves. Our main goal with these tags is cost allocation, so it would be extremely helpful.
Please leave any comments if you still have a need for tags on the ASGs themselves.
+1
Is @ravvereddy's use-case (of having per-node tags generated by ASG) actually supported?
I don't see anything in the docs hinting that there is some kind of templating for tags propagated from ASGs, so it seems feature-wise that ASG tagging doesn't bring anything more for node tagging than Launch Templates do.
I think it'd be particularly helpful to know if there are any use-cases for ASG tags propagating to nodes that aren't covered by Launch Template instance tags. I'd assume the latter can cover cost-allocation tracking or metric identification for EC2 instances, for example.
If not, then this question becomes simpler as then we have a clear "best practice" for tagging EC2 instances (Launch Templates, which already works), and this ticket can focus on the remaining needs for ASG-specific tags.
Tagging of ASGs themselves is still needed for Cluster Autoscaler scale-to-zero (#724 should cover the specifics of that use-case, I hope, as they do not require instance propagation) and resource ownership identification on accounts shared between teams, which is the use-case I've had in the past. My studio has graduated to multiple accounts under AWS Organizations, so that use-case has fallen off my radar now.
Tagging of the ASGs themselves is handy for some other stuff we want to run, e.g. https://github.com/AutoSpotting/AutoSpotting requires a tag on the ASG for it to do its thing.
I have created a custom resource which tags the ASG and propagates to EC2 instances. Our cluster was created as below:
### EKS control plane ###
Cluster:
  Type: AWS::EKS::Cluster
  Properties:
    Name: !Sub ${EKSClusterName}-${Environment}
    Version: !Sub ${KubernetesVersion}
    RoleArn: !GetAtt ClusterRole.Arn
    ResourcesVpcConfig:
      SecurityGroupIds:
        - !Ref ClusterControlPlaneSecurityGroup
      SubnetIds:
        - Fn::ImportValue: !Sub ${VpcStackName}-${Environment}-private-a
        - Fn::ImportValue: !Sub ${VpcStackName}-${Environment}-private-b
        - Fn::ImportValue: !Sub ${VpcStackName}-${Environment}-private-c
The node group was created like this:
### Create EKS managed node group ###
Nodegroup:
  DependsOn: Cluster
  Type: 'AWS::EKS::Nodegroup'
  Properties:
    NodegroupName: !Sub ${EKSClusterName}-node-${Environment}
    ClusterName: !Ref Cluster
    InstanceTypes:
      - !Ref NodeInstanceType
    DiskSize: !Ref NodeVolumeSize
    RemoteAccess:
      Ec2SshKey: !Sub ${EKSClusterName}-${Environment}
      SourceSecurityGroups:
        - !Ref NodeSecurityGroup
    NodeRole: !GetAtt NodeInstanceRole.Arn
    ScalingConfig:
      MinSize: !Ref NodeGroupMinSize
      MaxSize: !Ref NodeGroupMaxSize
      DesiredSize: !If [IsNotProd, 1, !Ref NodeGroupDesiredCapacity]
    Labels:
      type: !Ref Environment
    Subnets:
      - Fn::ImportValue: !Sub ${VpcStackName}-${Environment}-private-a
      - Fn::ImportValue: !Sub ${VpcStackName}-${Environment}-private-b
      - Fn::ImportValue: !Sub ${VpcStackName}-${Environment}-private-c
Then we tag the ASG with the custom resource (the tag name is "Name" and our tag value is the cluster name):
### Tag resources ###
AsgTaggingRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal:
            Service:
              - lambda.amazonaws.com
          Action:
            - sts:AssumeRole
    Path: "/"
    Policies:
      - PolicyName: lambda-logging
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - logs:CreateLogGroup
                - logs:CreateLogStream
                - logs:PutLogEvents
              Resource: arn:aws:logs:*:*:*
      - PolicyName: lambda-tagging
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - autoscaling:CreateOrUpdateTags
              Resource:
                - '*'
            - Effect: Allow
              Action:
                - eks:DescribeNodegroup
              Resource: '*'

AsgTagging:
  Type: Custom::AsgTagging
  Properties:
    ServiceToken: !GetAtt AsgTaggingFunction.Arn
    AsgId: !GetAtt Nodegroup.NodegroupName # The node group name

AsgTaggingFunction:
  Type: AWS::Lambda::Function
  Properties:
    Runtime: python3.7
    Handler: index.lambda_handler
    MemorySize: 128
    Role: !GetAtt AsgTaggingRole.Arn
    Timeout: 120
    Environment:
      Variables:
        TAG_KEY: Name
        TAG_VALUE: !Ref Cluster # The EKS cluster name
        EKS_CLUSTER: !Ref Cluster # The EKS cluster name
        NODE_GROUP: !GetAtt Nodegroup.NodegroupName # The node group name
    Code:
      ZipFile: |
        import boto3
        from botocore.exceptions import ClientError
        import os
        import cfnresponse

        def lambda_handler(event, context):
            print("Event:", event)
            data = {}
            tag_key = os.getenv('TAG_KEY')
            tag_value = os.getenv('TAG_VALUE')
            eks_cluster = os.getenv('EKS_CLUSTER')
            node_group = os.getenv('NODE_GROUP')
            try:
                eks = boto3.client('eks')
                # Retrieve the autoscaling group name behind the node group
                asg = eks.describe_nodegroup(clusterName=eks_cluster, nodegroupName=node_group)['nodegroup']['resources']['autoScalingGroups'][0]['name']
            except Exception as e:
                print(e)
            try:
                client = boto3.client('autoscaling')
                if event['RequestType'] == 'Create':
                    res = client.create_or_update_tags(
                        Tags=[
                            {
                                'Key': tag_key,
                                'PropagateAtLaunch': True,
                                'ResourceId': asg,
                                'ResourceType': 'auto-scaling-group',
                                'Value': tag_value,
                            }
                        ],
                    )
                    data["Reason"] = "The ASG " + asg + " has been tagged."
                    cfnresponse.send(event, context, cfnresponse.SUCCESS, data)
                elif event['RequestType'] == 'Update':
                    res = client.create_or_update_tags(
                        Tags=[
                            {
                                'Key': tag_key,
                                'PropagateAtLaunch': True,
                                'ResourceId': asg,
                                'ResourceType': 'auto-scaling-group',
                                'Value': tag_value,
                            }
                        ],
                    )
                    data["Reason"] = "The ASG " + asg + " has been tagged."
                    cfnresponse.send(event, context, cfnresponse.SUCCESS, data)
                elif event['RequestType'] == 'Delete':
                    data["Reason"] = "Resource deleted"
                    cfnresponse.send(event, context, cfnresponse.SUCCESS, data)
                else:
                    data["Reason"] = "Unknown operation: " + event['RequestType']
                    cfnresponse.send(event, context, cfnresponse.FAILED, data, "")
            except Exception as e:
                data["Reason"] = "Cannot " + event['RequestType'] + " Resource: " + str(e)
                cfnresponse.send(event, context, cfnresponse.FAILED, data, "")
I hope this can help.
Anything missing mandatory tags is considered non-compliant in my organization. EKS ASGs get deleted when the compliance scan kicks in. We really need to have the tags propagated from the managed node group to these ASGs.
Same here: using the launch template to propagate tags to EC2 instances is not enough. We also need to tag the ASG itself to be compliant with our organization's policy, otherwise it will be scaled down to 0.
Why is this issue controversial? Many companies need tags on ASGs for cost and compliance reasons.
I'm not sure what you're seeing as controversial in this ticket?
- The originally-described use-case for this feature has been resolved elsewhere, per #608 (comment)
- ASG tags for use by cluster autoscaler scale-to-zero is being resolved in #724 by adding support to CA to not depend on the ASG tags
- Other uses of ASG tags have been described in this ticket, as requested in #608 (comment)
I don't see anyone saying that this should not happen, or otherwise introducing controversy?
EKS ver == 1.20
As a workaround, I have to launch a node group, create a custom launch template with resource tags based on the node group's template, delete the existing node group, and then re-create the node group with the customized template to apply the resource tags.
Why this issue is controversial I do not know.
AWS has always had Auto Scaling Groups, and has always had resource tags. We just want to take advantage of them. AWS doesn't need to develop anything additional.
Why can't managed services take advantage of these features?
Am I making a strange request?
The Auto Scaling Group names automatically generated by managed node groups are indistinguishable to humans. Without resource tags, you won't be able to comfortably tell them apart.
Additional contexts:
#608 (comment)
Trying to manage SPOT/OD node groups as managed node groups.
To be able to scale from zero in such a scenario I need to tag the ASGs according to this doc.
Without the option to add custom tags, I'm unable to make this work with managed node groups.
@TBBle this is related to scale-from-zero with Cluster Autoscaler (#724), but according to the docs it is needed more urgently to have Cluster Autoscaler work correctly with labelled and tainted nodes.
In my mind there are three ways to solve this.
- Support copying all managed node group tags to the ASG
- Support copying all tags with a specific prefix or prefixes to the ASG
- Automate creating the k8s.io/cluster-autoscaler/node-template/label/ and k8s.io/cluster-autoscaler/node-template/taint/ tags on the ASG
Option 1 would match the status quo for un-managed node groups, option 2 would limit the scope of the tags, and option 3 would actually make managed node groups a better solution than their un-managed counterparts.
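Until something like option 3 exists, the node-template tags can be added by hand or by a small script against the ASG behind the node group. A sketch of that with boto3, where the cluster/node group names and the label/taint values are placeholders you would replace with your own:

import boto3

CLUSTER = "my-cluster"   # placeholder: your cluster name
NODEGROUP = "gpu"        # placeholder: your node group name

eks = boto3.client("eks")
autoscaling = boto3.client("autoscaling")

asgs = eks.describe_nodegroup(clusterName=CLUSTER, nodegroupName=NODEGROUP)[
    "nodegroup"]["resources"]["autoScalingGroups"]

# Cluster Autoscaler reads these at scale-from-zero to build a node template.
template_tags = {
    "k8s.io/cluster-autoscaler/node-template/label/workload": "gpu",
    "k8s.io/cluster-autoscaler/node-template/taint/dedicated": "gpu:NoSchedule",
}

for asg in asgs:
    autoscaling.create_or_update_tags(
        Tags=[
            {
                "ResourceId": asg["name"],
                "ResourceType": "auto-scaling-group",
                "Key": key,
                "Value": value,
                # Propagation isn't required; CA only reads the tag off the ASG.
                "PropagateAtLaunch": False,
            }
            for key, value in template_tags.items()
        ]
    )

Note that anything applied this way lives outside the managed node group definition, so it can be lost when the node group (and its ASG) is replaced.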
The long-term solution chosen by AWS is none of those (instead, Cluster Autoscaler reads Managed Nodegroups metadata directly to learn the labels and taints), but in #724 you'll see examples and workarounds implementing your approaches, and that would be the place to make your case that scale-from-zero can't wait for the implementation of the CA feature, but should be handled by some kind of ASG tag automation as you have described.
@otterley While Managed Nodegroup doesn't support customer provided tags for ASGs today, we do add the necessary tags for CAS auto discovery to the ASG, i.e. k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/<CLUSTER NAME>.
@rtripat I wouldn't say that all necessary tags are added…
I'm trying to use the autoscaler in an architecture-mixed EKS cluster (ARM + x86) and it just doesn't work, because I have a GitLab Runner running on an ARM node which spins up x86 Pods. The autoscaler is totally unaware of the nodeSelector kubernetes.io/arch=amd64 I set for GitLab Runner, and it can't scale the 0-node x86 node groups up from zero.
I followed the docs and added the tags in Terraform (k8s.io/cluster-autoscaler/node-template/label/kubernetes.io/arch=amd64 to be specific); they were added to the node groups and… well… it doesn't work, because the ASGs didn't get those tags. Adding them to the ASGs manually makes the autoscaler work properly in this scenario.
But handling OS and arch should be fully automatic. C'mon, EKS management costs A LOT and it's unable to export basic Kubernetes labels… :/
As noted in #724, hashicorp/terraform-provider-aws#20674 is pending release which will allow you to add the tags to the ASG that's implicitly created by the managed node group (assuming you're using terraform to create those node pools or are able to otherwise find out the ASG name).
It's a lot more work than if it would happen automatically, but it's at least possible now.
@daenney will EKS terraform module be also updated to fix this? https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest
@morsik No idea, I don't use that module and am not one of the maintainers. I'd suggest raising that on their issue tracker: https://github.com/terraform-aws-modules/terraform-aws-eks/issues.
Passing the managed node group tags to the launch template's "Instance tags" will automatically apply them to both the EC2 instances and their volumes. If there are challenges doing that, creating a separate "Custom Tags" section in the EKS managed node group configuration page would also be helpful.
Can you please help identify where the "Custom Tags" section exists?
You haven't mentioned what tool you're using, if any, but at the EC2 API level, that was probably referring to TagSpecifications in the RequestLaunchTemplateData object used with the CreateLaunchTemplate and CreateLaunchTemplateVersion APIs.
That's what's used for terraform-provider-aws's launch template implementation, and eksctl's managed and unmanaged node group implementations.
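If you're calling the EC2 API directly, a minimal sketch of what that looks like with boto3 is below. The template name and tag values are just examples, and the other launch template settings (AMI, instance type, user data, and so on) are omitted for brevity:

import boto3

ec2 = boto3.client("ec2")

# TagSpecifications inside LaunchTemplateData is what tags the resources
# created at launch (instances and their volumes); a top-level
# TagSpecifications argument would only tag the launch template object itself.
ec2.create_launch_template(
    LaunchTemplateName="eks-workers-example",  # example name
    LaunchTemplateData={
        "TagSpecifications": [
            {
                "ResourceType": "instance",
                "Tags": [{"Key": "cost-center", "Value": "1234"}],
            },
            {
                "ResourceType": "volume",
                "Tags": [{"Key": "cost-center", "Value": "1234"}],
            },
        ],
    },
)

The resulting template can then be referenced from the managed node group's launch template setting so the tags end up on the nodes it launches.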
The issue with aws_autoscaling_group_tag is that, in order to be able to loop over the ASGs, they have to exist. So one can't write modules to create node groups and tag the auto scaling groups in one go. It will always require a --target apply due to a Terraform limitation.
The only viable approach is to:
- Have the node group resource propagate the tags
- Have the CA read the tags from node group instead of the auto scaling group
I understand from the thread that option 2 is preferred. But I can't find any PR or issue regarding this in the CA. Is AWS working on this?
Why is this issue controversial? Many companies need tags on ASGs for cost and compliance reasons.
Same here. ASG tagging is mandatory in my company for cost and compliance.
I find it useful to use the launch template configuration for the Terraform EKS module.
This way I'm able to tag the underlying instances. Tagging the instances in my case is more valuable than tagging the ASG itself.
See the TF example here.
I'll say it again: I don't want to monitor EC2, I want to monitor the AutoscalingGroup itself.
#608 (comment)
Is this even still being looked at and worked on? For us ASG tagging is a compliance issue. I know there are workarounds using lambda and such, however our enterprise has a ton of accounts and having this be the default behavior of nodegroups would make our lives so much simpler. Doesn't seem like a big ask.
It's almost two years for just setting a simple tag... which turns out to be critical for compliance, security, scaling node groups...
I don't think it's even been "worked on", it's still in the 'Researching' stage, and probably lost a lot of momentum when it was thought that the value was only for cluster-autoscaler, or for tags on ASGs only to propagate to nodes, both of which are being addressed elsewhere. Or at least that's my impression from #608 (comment).
Disappointingly, since @mikestef9 asked in that comment a year ago for more ASG-tagging use-cases, we haven't heard any follow-up (or even acknowledgement) on the various use-cases that have been shared here.
@TBBle there are a lot of ASG tagging use cases in this thread before the @mikestef9 comment that launch template tagging won't address, and there are also a lot of ASG tagging use cases called out in #724 when discussing the tags for CA.
Looking back, the only comment I can see that requested something other than node-tagging via tag propagation, or Cluster Autoscaler tags for scale-from-zero, was #608 (comment) 10 days before @mikestef9's comment, and well after the development work that delivered Managed Node Group Launch Template support would have happened to address the needs called out in this ticket.
A quick skim doesn't show anything in #724 up to that date that isn't specifically about CA scale-from-zero, and if there is such a thing, overlooking it in the context of the rest of the discussion would be pretty easy.
Even the original feature request of this ticket was tagging ASGs for propagation to nodes and volumes.
@TBBle I was sure I'd read a number of requests about compliance policies requiring ASGs to be tagged, but both of these issue threads are so long that it's hard to parse the data out of them; that would be the use-case I'd be putting forward.
I'm still of the opinion that something as trivial as cascading tags from MNGs to ASGs should have been implemented as part of the MVP. If there is a better way to handle the same use cases it needs to actually exist and then it can be adopted by virtue and not as the only solution (or no solution).
The threads are long, but the vast majority of activity was after the comment from @mikestef9 that we were talking about, in August 2020. It's about 20 comments in, compared to the fortyish that have come after that. And #724 spent most of its time before August 2020 talking about actually setting minSize to 0 (and directly modifying the ASG to work around this); the tagging discussion only came up there once, in July 2020, until people were referred to there from here by that comment.
Anyway, looking at "should have been" isn't very valuable here. This is containers-roadmap, not containers-if-we-could-turn-back-time. What's interesting is what is going to be done; currently, that's an empty space, and the rest of the discussion just makes it more likely that the actual use-cases will be lost in the noise and nothing will continue to happen. -_-
(If I could turn back time on this ticket, I'd have opened a different ticket for tagging ASGs for their own sake back when I subscribed to this ticket; at the time I didn't realise that the node-group-tag-propagation part of the request was going to be the need that was fulfilled. Similarly I regret not leaving a comment for my at-the-time use-case way back then too)
@TBBle @stevehipwell I created a separate issue to track the request for ASG tagging (for compliance and cost reasons). This is quite important for many organizations to adopt EKS.
The EBS CSI controller needs the topology.ebs.csi.aws.com/zone tag. The PVs it creates use nodeAffinity and expect the nodes to have topology.ebs.csi.aws.com/zone, so when the ASGs have 0 instances, Cluster Autoscaler needs the ASGs to carry those tags.
Node groups manage ASGs; node groups should be able to tag ASGs.
aws-node-termination-handler Queue mode needs ASGs to be tagged with Key=aws-node-termination-handler/managed. The ability to propagate tags from the MNG to the ASG would make this easier for users.
@nkk1 There's a proposal/discussion for Cluster Autoscaler to assume/populate topology.ebs.csi.aws.com/zone on the scale-from-zero template using the AWS backend. Since the label is auto-added by the CSI controller, I think that'll be less surprising for the user, and frankly it will probably land sooner than ASG user-defined tagging support.
If CA doesn't accept that proposal, then user ASG tagging will be the only way to make scale-from-zero work correctly with Managed Node Groups and the EBS CSI controller, since the effort to migrate away from that label seems to have failed.
@askulkarni2 I think the guidance is to not use NTH with MNGs, @bwagner5 can probably shed a bit more light here.
@stevehipwell @askulkarni2 That is correct. NTH is not needed when using managed node groups for handling terminations. MNG already gracefully handle terminations due to ASG scale-down, spot interruptions, and capacity rebalance.
@bwagner5 fantastic! Thanks for the insight. And thanks @stevehipwell for bringing it to attention.
This is really crucial for several reasons; the following are some of the issues we're experiencing due to this limitation.
- Cost analysis issues due to missing tags on the ASG and its EC2 instances
- Running dedicated node groups for specific workloads requires a couple of cluster-autoscaler tags for properly scaling node groups based on labels and taints.
So I really hope there will be a proper solution instead of just workarounds.
This is a really important feature!
Let's say we need to tag the ASG to make Cluster Autoscaler work like a charm, track cost, resource ownership, ...
Using AWS managed node groups:
- we can set labels and taints but not the ASG tags; in Terraform this is easily solved using something like this:
resource "aws_autoscaling_group_tag" "tag_cpu_ng" {
autoscaling_group_name = aws_eks_node_group.cpu_ng.resources[0].autoscaling_groups[0].name
tag {
key = "k8s.io/cluster-autoscaler/node-template/taint/X"
value = "NoSchedule"
propagate_at_launch = true
}
}
And using CloudFormation?
Using self-managed node groups:
- we can create the ASG and set the tags, but how can we set the taints and labels? Using the bootstrap.sh?
Managed ASG tagging is now implemented in eksctl with eksctl-io/eksctl#5002 and should land in the next release.
I am waiting for this feature to be natively supported by AWS.
While eksctl is an excellent solution, it is not suitable for every Infrastructure as Code context, and this needs continued support to be available in solutions such as CloudFormation and Terraform.
@andre-lx's answer is spot on. Using the "aws_autoscaling_group_tag" resource worked for me, but it only worked for new nodes, so I just cycled out my existing nodes one by one and the new nodes were all tagged as they should be. For instance, this is my setup for creating a node group and creating an aws_autoscaling_group_tag that sets the "Name" tag, which shows up in EC2.
resource "aws_eks_node_group" "nodes_group" {
cluster_name = aws_eks_cluster.eks_cluster.name
node_role_arn = aws_iam_role.eks_assume_role.arn
subnet_ids = var.subnet_ids
###########
# Optional
ami_type = "AL2_x86_64"
disk_size = 60
instance_types = ["m6i.xlarge"]
node_group_name = "worker"
version = var.kubenetes_version
scaling_config {
desired_size = 2
max_size = 4
min_size = 1
}
update_config {
max_unavailable = 2
}
# Ensure that IAM Role permissions are created before and deleted after EKS Node Group handling.
# Otherwise, EKS will not be able to properly delete EC2 Instances and Elastic Network Interfaces.
depends_on = [
aws_iam_role_policy_attachment.EKS-AmazonEKSWorkerNodePolicy,
aws_iam_role_policy_attachment.EKS-AmazonEKS_CNI_Policy,
aws_iam_role_policy_attachment.EKS-AmazonEC2ContainerRegistryReadOnly,
]
}
#EKS can't directly set the "Name" tag, so we use the autoscaling_group_tag resource.
resource "aws_autoscaling_group_tag" "nodes_group" {
for_each = toset(
[for asg in flatten(
[for resources in aws_eks_node_group.nodes_group.resources : resources.autoscaling_groups]
) : asg.name]
)
autoscaling_group_name = each.value
tag {
key = "Name"
value = "eks_node_group"
propagate_at_launch = true
}
}
In all honesty, Terraform should at least explicitly state in the documentation that the tags argument doesn't work for setting the "Name" tag, as that is a key tag that lots of companies use to organize instances and manage billing. Personally I think not having the tags parameter override the "Name" tag is a bug, but I'd at least settle for better documentation that describes this workaround.
Can you at least update the documentation so other people don't have to waste as much time on this?
Pretty please?
Guys, the resource responsible for EC2 tags is the "Resource tags" section of the launch template. I have the same problem, and I have verified that it's not possible to add tags to the launch template automatically generated by Terraform. To create the tags for this resource, we need to provision our own launch template.
However, I managed to at least tag EC2 with the "default tags" from my repository. Here is the code used:
data "aws_default_tags" "default_tags" {}
# Add 1 Tag for the 1 or more node groups
resource "aws_autoscaling_group_tag" "tag_aws_node_termination_handler" {
for_each = toset([
aws_eks_node_group.node_group_name1.resources[0].autoscaling_groups[0].name,
aws_eks_node_group.node_group_name2.resources[0].autoscaling_groups[0].name
])
autoscaling_group_name = each.value
tag {
key = "aws-node-termination-handler/managed"
value = "PropagateAtLaunch=true"
propagate_at_launch = true
}
}
# Add tags for 1 node group
resource "aws_autoscaling_group_tag" "default_tags_asg_high" {
count = length(keys(data.aws_default_tags.default_tags.tags))
autoscaling_group_name = aws_eks_node_group.node_group_name.resources[0].autoscaling_groups[0].name
tag {
key = keys(data.aws_default_tags.default_tags.tags)[count.index]
value = values(data.aws_default_tags.default_tags.tags)[count.index]
propagate_at_launch = true
}
}
I just came here to say this issue has been open for 900 days and is the 3rd most 👍'd in the project.
Just found this, again, in conversation with our AWS AM. Kind of a dupe of #724.
As @SlevinWasAlreadyTaken mentions, this is now available in eksctl thanks to his hard work in eksctl-io/eksctl#5002.
Some bash to work around it is in #724 (comment).
There is some Terraform in that comment chain too.
Edit: I don't work for AWS, complain to your account manager. I wholeheartedly agree this is ridiculous.
Hey guys.
Here at YData we built a solution to solve this issue until the official release. We created an AWS Lambda that can tag the EKS node groups' ASGs with common tags or specific tags per node group, using CloudFormation.
This can be used to add tags for labels and taints (cluster-autoscaler) or any other tags that help with cost tracking, resource ownership, and so on.
Feel free to test it and give us your review.
https://github.com/ydataai/aws-asg-tags-lambda
You can use it directly in the template as:
ASGTagLambdaFunction:
  Type: AWS::Lambda::Function
  Properties:
    Role: !GetAtt ASGTagLambdaExecutionRole.Arn
    PackageType: Image
    Code:
      ImageUri: !Ref EcrImageUri
    Architectures:
      - x86_64
    MemorySize: 1024
    Timeout: 300

ASGTagLambdaInvoke:
  Type: AWS::CloudFormation::CustomResource
  DependsOn: ASGTagLambdaFunction
  Version: "1.0"
  Properties:
    ServiceToken: !GetAtt ASGTagLambdaFunction.Arn
    StackID: !Ref AWS::StackId
    AccountID: !Ref AWS::AccountId
    Region: !Ref AWS::Region
    ClusterName: "the EKS cluster name" # !Ref EKSCluster
    CommonTags:
      - Name: "ENVIRONMENT"
        Value: "dev"
        PropagateAtLaunch: true
    NodePools:
      - Name: "system-nodepool" # !GetAtt YourNodeGroup.NodegroupName
        Tags:
          - Name: 'k8s.io/cluster-autoscaler/node-template/taint/TAINT'
            Value: 'NoSchedule'
            PropagateAtLaunch: true
          - Name: 'k8s.io/cluster-autoscaler/node-template/label/LABEL'
            Value: 'LABEL_VALUE'
            PropagateAtLaunch: true
      - Name: "another-pool"
Here at YData we built a solution to solve this issue till the official release.
Each of us already has such a solution.
We do not want to do that in the future, so we are requesting official support for such a solution.
AWS has always had Auto Scaling Groups, and has always had resource tags. We just want to take advantage of them. AWS doesn't need to develop anything additional.
Why can't managed services take advantage of these features?
Am I making a strange request?
I wanted to share a recent launch from the EKS team that might be of interest to folks following this issue. Earlier this week we released Cluster-level Cost Allocation Tagging:
With this launch, all EC2 instances which join an EKS cluster are automatically tagged with an AWS-generated cost allocation tag [containing the EKS cluster name]. Any EC2 instance used in an EKS cluster will be tagged automatically without any additional action required, regardless of whether they are provisioned using EKS managed node groups, Karpenter, or directly via EC2. This tag can be used to allocate EC2 costs to individual EKS clusters through AWS Billing and Cost Management tools...
While this feature won't help to propagate customer-defined tags down to the EC2 instances in an EKS cluster, for those of you who are looking for better cost allocation across multiple EKS clusters, this feature will reduce the work required.
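For those wanting to consume that automatic tag programmatically, below is a sketch of pulling per-cluster EC2 cost with the Cost Explorer API. It assumes the cost allocation tag has been activated in the Billing console and that the tag key is aws:eks:cluster-name as per the announcement; the date range is just an example:

import boto3

ce = boto3.client("ce")  # Cost Explorer

# The tag only appears in Cost Explorer data after it has been activated
# as a cost allocation tag in the Billing console.
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2023-01-01", "End": "2023-02-01"},  # example range
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "TAG", "Key": "aws:eks:cluster-name"}],
)

for group in response["ResultsByTime"][0]["Groups"]:
    cluster = group["Keys"][0]
    cost = group["Metrics"]["UnblendedCost"]["Amount"]
    print(cluster, cost)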
While this feature won't help to propagate customer-defined tags down to the EC2 instances in an EKS cluster, for those of you who are looking for better cost allocation across multiple EKS clusters, this feature will reduce the work required.
Thanks for sharing this update, I am sure some will find it useful. I don't want to shoot the messenger, and I know you are just trying to help, but this really is attacking the problem from the wrong end, is it not? 🤷‍♂️
EC2 instances that are not a part of a managed node group are already easily taggable with customer-defined tags and managed node groups already had this tag that would be passed through for cost-allocation. From my perspective this change does little to reduce work required if you want to use managed node groups.
I only know this because of having to write code for a previous employer to process this tag to retrieve the customer defined tag values so costs could be allocated in the same way as everything else. Luckily we had well structured cluster names that included the values required. However, it is brittle and the processing broke a few times when cluster name structure was changed for operational reasons (eg to add blue/green support to our automation for EKS upgrades).
Please consider following through on this issue (getting close to 3 years now). We always get told by our account managers, TAMs and SAs to use tags, it would be nice if tagging actually worked for cost allocation for all resources, EKS or otherwise. Thanks.
Can someone please help us understand why this is not getting any traction with this much attention? It appears to still be in the "researching" phase. I just ran into this when trying to scale from 0 on managed node groups using the Terraform EKS module.
Honestly you should all move to Karpenter.
Another vote for karpenter
Honestly you should all move to Karpenter.
Honestly, that has to be one of the least helpful suggestions I have seen in a while. We are talking about managed node groups, which are not going to magically get tagged if you install Karpenter in the cluster and start launching nodes (you also need running nodes for Karpenter's controller to run on, so it's a bit of a chicken-and-egg problem). Karpenter is a nice tool for sure, but it is not a solution to this issue.
you also need running nodes for Karpenter's controller to run on
Customers can run Karpenter on Fargate (managed compute). This helps eliminate the bootstrapping problem. However, resource tagging is not yet available for Fargate on EKS. If Karpenter is the only thing running on Fargate, this might be acceptable for cost-allocation purposes.
Karpenter has some nice features but is a bit more complex than most people probably need, since it is geared for clusters with larger workloads that can benefit from more advanced scheduling and continuous resource optimization. One thing I don't like is Karpenter's requirement to know specific node group info vs just using static tags with the cluster name. Thanks for the suggestion, but for those that want to continue to use CA it would be nice to see some tags on the Auto Scaling Groups to solve this.
Going off topic here, but curious about this
One thing I don't like is Karpenter's requirement to know specific node group info vs just using static tags with the cluster name.
What do you mean?
Karpenter doesn't care about anything other than pending pods to be allocated.
Or you can go overboard and create very scoped provisioners per team or deployment label
I might be wrong, but I was just looking at their install documentation and noticed they wanted "--set aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-${CLUSTER_NAME}" in the Helm chart; I'm guessing there may be a way to just use IRSA instead. I also see there is an application of a manifest after you run the Helm chart, which is a bit odd, but these are just installation-related issues, and off topic as you said.
@flowinh2o I think aws.defaultInstanceProfile is the instance profile for the nodes being created, so they can join the cluster.
Running dedicated node groups for specific workloads requires a couple of cluster-autoscaler tags for properly scaling node groups based on labels and taints.
Exactly our problem also. Details here.
This looks like such a must-have feature that's super simple to implement if you just allow node group tags to propagate to the ASG...
Any updates on this? Also needed here
Also having a problem with this. I wouldn't expect, in a service called Managed Node Groups, that I would have to work around tag propagation issues. It seems vendors such as Weaveworks have implemented their own workarounds; sadly there is no such workaround in Terraform.
This issue is pretty fundamental - would like it fixed please :)
It seems vendors such as Weaveworks have implemented their own workarounds; sadly there is no such workaround in Terraform.
@RogerWatkins-Anaplan it's easy enough to do this with Terraform, and I think there are links in some of the comments above on how to do it. That said, you wouldn't expect to need to do this for a first-party vendor solution.
Is anyone working on this? This is a must-have for us to implement our project.