ros-infrastructure/ros_buildfarm_config

update sync limits

dirk-thomas opened this issue · 12 comments

@wjwwood Jade needs significantly higher sync limits:

Why does it need to be updated? What should it be updated to? Is there any documentation which tells me these things?

The configuration options are documented here: https://github.com/ros-infrastructure/ros_buildfarm/blob/master/doc/configuration_options.rst

Currently Jade contains almost 1000 released packages. Even if a very large portion of them fail to build, the packages are still synced automatically to the testing repo (as long as at least 140 Debian packages are OK). The sync limit is meant to prevent breaking the state of the testing repo in that case: it only allows the automatic sync to happen if "enough" packages have built successfully. The limit needs to be raised over time as more packages are released.

I guess @tfoote aimed to address this with #11 but didn't follow up with cherry-picking the increased thresholds to the production branch.

Thanks, I know what the config is for, but I don't know where to change it (not the file location which you linked to, but which branch and what needs to be done to make the farm take it). For example, I don't know the difference between the production branch and the master branch. I'm assuming that I need to put it on the production branch, but I wouldn't have guessed that. That's the kind of documentation I need if I'm expected to update it as the ROS boss for Jade.

So, I'm still left wondering, what should it be changed to? (700?) Should this just be a percentage and not a fixed number? It seems destined to fail if we have to just remember to check it periodically and there are no notifications or naturally scaling settings.

The problem of missing documentation describing the branches is ticketed in #23 and will hopefully be addressed soon.

Imo a number as a limit and/or a list of crucial packages is sufficient. The amount of work to revisit this threshold after a few months as a maintainer of a ROS distribution seems tiny compared to other regular tasks. If you think it is worth automating this or changing the configuration to allow percentages, please feel free to add support for either of those.

It's still not clear to me: What should it be changed to (I suggested 700) and how do I change it?

I understand that we've ticketed the need for documentation, but if we're going to skip that documentation, and you then ask me to do it, you need to tell me how. It's not efficient for me to reverse engineer the setup and figure it out on my own.

It's not about the level of effort, I'm sure that's small, but my point is that if we just have to remember to do it, then realistically it's likely to be missed. We either need a notification, or a note in the ROS kick-off instructions to set a reminder (calendar or something), or a setting that scales better. I can set up a reminder for myself to check it, but I need to know how often I should do so, what I need to check (what is a good way to determine if it needs to be changed, and why should it be changed), and how to change it. If you can answer some of those things then I can help you guys rather than delegating the task.

It's the level below which the buildfarm will not sync. This means that if fewer than that number of packages build successfully, it will not sync to testing, because something went wrong with a release or the buildfarm had a major failure. This is to keep the testing repository from getting wiped out if there's a bad release of a low-level package. We have had a few instances of that, and it breaks anyone who is using the testing repository until the problem is fixed or reverted and a full rebuild is completed.
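To make the gate described above concrete, here is a minimal Python sketch of the check. It is not the actual ros_buildfarm code: the function name `should_sync` and its parameters are hypothetical, and the optional list of required packages follows the "list of crucial packages" idea mentioned earlier in this thread.

```python
def should_sync(built_packages, min_package_count, required_packages=()):
    """Decide whether a sync to the testing repo should be allowed.

    Hypothetical helper for illustration only, not the ros_buildfarm API.
    Returns a (allowed, reason) tuple.
    """
    built = set(built_packages)

    # Gate 1: enough packages must have built successfully.
    if len(built) < min_package_count:
        return False, (
            f"only {len(built)} packages built, "
            f"need at least {min_package_count}"
        )

    # Gate 2: every crucial package must be among the built ones.
    missing = sorted(set(required_packages) - built)
    if missing:
        return False, "missing required packages: " + ", ".join(missing)

    return True, "ok"
```

For example, `should_sync(["catkin", "roscpp"], 2, ["catkin"])` allows the sync, while the same call with a limit of 3 refuses it and keeps the previous state of the testing repo.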

Typically I try to maintain it at about 90% of the number of successfully building packages. That means that, without changing it, we know the number of packages in testing will not drop below that level. It's not really a kickoff checklist item but an ongoing maintenance item. As new packages get released this number should ratchet upward.
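The 90% rule of thumb and the "ratchet upward" behaviour can be sketched as follows. This is only an illustration of the arithmetic: the function names are hypothetical, the 90% figure is the rough target quoted above, and nothing here is part of ros_buildfarm.

```python
def suggest_sync_limit(successful_count, percent=90):
    """Return `percent` % of the successfully building packages, rounded down.

    Integer arithmetic is used so the result is exact.
    """
    return successful_count * percent // 100


def ratchet_sync_limit(current_limit, successful_count, percent=90):
    """Raise the limit when more packages build, but never lower it."""
    return max(current_limit, suggest_sync_limit(successful_count, percent))


# With 1000 successfully building packages, a limit of 140 would be raised to 900.
new_limit = ratchet_sync_limit(140, 1000)
```

A temporary drop in successful builds leaves the limit unchanged, which is the point: the sync is then blocked until the regression is fixed.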

I suggest bringing over the numbers from #11. Note that the higher limits caught a regression on Saucy on the test farm: http://54.183.26.131:8080/job/Irel_sync-packages-to-testing_saucy_amd64/ All of the ecl packages are failing to build due to using too high a version of cmake.

Guys, I understand what it's for (that's actually documented in the REP). All I wanted is a suggestion on what it should be now and a bullet list of what to do in order to change it. @tfoote suggested taking from #11 so I'll probably do that, and he also explained how he arrives at that number, which is very useful. So, thanks. As for the instructions, I haven't heard anything, and I want it written here so we can at least point people here until we have documentation. So, let me try guessing and you all correct me:

  • Open a PR against the production branch and get it merged
  • Do something to the production farm so it takes the config changes (I have no idea about this)
  • Check the settings took effect (I also have no idea about this)
  • ???

@wjwwood Sorry I was trying to catch up and missed your request for help on the process. You just have to change the config file.

These numbers get baked into the sync jobs, such as http://build.ros.org:8080/job/Irel_sync-packages-to-testing_saucy_amd64/, which get updated automatically every night. For example, see this log line (in this case the job was not changed because its config was the same):

23:17:19 Skipped job 'Irel_sync-packages-to-testing_trusty_amd64' because the config is the same

Perfect, I created #27 and if you guys could review/merge it I'll check on the farm that it took.

What is the status on this ticket? Can it be closed?

I'd say so since #28 and #27 were merged?

lgtm