tmobile/monarch

Failed retrieving actual LRP groups

Opened this issue · 7 comments

I am getting these errors when I run the exp.json file to block traffic to my app:
"Failed retrieving actual LRP groups from diego_cell/xxxxxx"
"App discovery failed because no application instances could be found!"

Could you ssh into a diego cell and run the command cfdot actual-lrp-groups? My suspicion is that cfdot is not installed.

Hi Eadword,

I ssh'd into a diego cell and ran the command. It worked for me and I can see the output.

In the output, I can see a few "connection refused" log lines, and I am not sure whether they are related to Monarch.

Thanks,
Biswa

If cfdot exits with a non-zero exit code, the discovery will fail. Try running with the lrp-group-fix branch. When I wrote the failure case, I had been having issues with auth, so I wanted to make it clear that the call had failed. I had not run into a case where it may partially fail but still be okay.

Depending on the nature of these errors, you may only find out about some of the locations where an app is hosted, which will cause some experiments to fail. So if this "fixes" the issue, make sure that if you expect it to find 20 app instances, it actually finds 20 app instances.
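As a rough sketch of that sanity check: the helper names and the sample output below are hypothetical and not part of monarch, and it assumes cfdot prints one JSON object per line with any error messages interleaved as plain text.

```python
import json


def parse_lrp_output(raw_output):
    """Split raw output into parsed JSON objects and lines that could not be parsed."""
    groups, skipped = [], []
    for line in raw_output.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            # Not JSON, e.g. an interleaved error message: keep it for inspection.
            skipped.append(line)
            continue
        if isinstance(obj, dict):
            groups.append(obj)
        else:
            skipped.append(line)
    return groups, skipped


def found_expected_instances(groups, expected):
    """Return True only when exactly the expected number of entries was found."""
    return len(groups) == expected


# Hypothetical sample: two parseable entries and one error line in between.
sample = (
    '{"instance": {"index": 0}}\n'
    'dial tcp: connection refused\n'
    '{"instance": {"index": 1}}\n'
)
groups, skipped = parse_lrp_output(sample)
if not found_expected_instances(groups, expected=3):
    print(f"WARNING: expected 3 entries, found {len(groups)}; skipped lines: {skipped}")
```

Here the check fails on purpose, because one of the three expected entries was replaced by an error line.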

Hi Eadword,

When I was running the master branch, I was getting these errors:
"Failed retrieving actual LRP groups from diego_cell/xxxz"
"No Application instances found for Appxxxxx"
"failed: chaoslib.exceptions.ActivityFailed: Error discovering app!"

Now, when I tried the lrp-group-fix branch, I got the following warning and error:
"May have failed retrieving actual LRP groups from diego_cell/xxxz"
"failed: TypeError: 'NoneType' object is not iterable"

It did discover the app, though, and I no longer got the "application instance not found" error.

Thanks,
Biswojeet

I am going to need some more information to debug this.

  1. Can you send the output you see when you run cfdot actual-lrp-groups? If there is any information you want to keep private, please replace it inline with something of the same form. You can also truncate it after an example or two (including an error message). I just want to see what these errors look like and make sure your version of cfdot is not producing a different object structure.

  2. The cfdot version you have installed in the diego cells.

  3. A line number or full stack trace of what you saw on the lrp-group-fix branch. If chaostoolkit did not provide one, run the following (replacing values as appropriate) in a Python shell after logging in with boshcli and cfcli.

from monarch.pcf.app import App
a = App.discover("myorg", "myspace", "myappname")
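If the shell output is hard to copy, a small wrapper can capture the full stack trace as text. This is only a sketch: run_with_traceback and the failing stand-in function are hypothetical and not part of monarch.

```python
import traceback


def run_with_traceback(func, *args, **kwargs):
    """Call func; return (result, None) on success or (None, traceback_text) on error."""
    try:
        return func(*args, **kwargs), None
    except Exception:
        return None, traceback.format_exc()


def _broken_discovery():
    # Stand-in for a failing call: iterating over None raises a TypeError.
    groups = None
    return [group for group in groups]


result, trace = run_with_traceback(_broken_discovery)
if trace is not None:
    print(trace)
```

With monarch installed, the same wrapper could be called as run_with_traceback(App.discover, "myorg", "myspace", "myappname").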

Some additional information: it found the app because that part just scans cf for apps using the cfcli, but it cannot find the instances because that part scans through the output of cfdot to find where they are actually hosted. This is all part of the discovery step and is not lazy; i.e. even if you don't need all the information for the experiment you are running, all of it is going to be discovered.
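To illustrate the instance lookup, here is a sketch and not monarch's actual code. It assumes each parsed cfdot entry is a dict with an "instance" (or "evacuating") object carrying "process_guid", "cell_id" and "index", and that the process GUID starts with the app GUID. The function name and sample values are made up.

```python
def find_app_instances(groups, app_guid):
    """Return (cell_id, index) pairs for entries whose process_guid starts with app_guid."""
    instances = []
    for group in groups or []:  # tolerate None so a failed lookup yields no instances
        inst = group.get("instance") or group.get("evacuating")
        if not inst:
            continue
        if inst.get("process_guid", "").startswith(app_guid):
            instances.append((inst.get("cell_id"), inst.get("index")))
    return instances


# Hypothetical parsed entries, in the assumed shape.
sample_groups = [
    {"instance": {"process_guid": "app-1111-ver-aaaa", "cell_id": "cell-a", "index": 0}},
    {"instance": {"process_guid": "app-2222-ver-bbbb", "cell_id": "cell-b", "index": 0}},
    {"evacuating": {"process_guid": "app-1111-ver-aaaa", "cell_id": "cell-c", "index": 1}},
]
hosts = find_app_instances(sample_groups, "app-1111")
print(hosts)
```

An empty result here corresponds to the "no application instances could be found" case, even though the app itself was found through cf.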

@biswajeet619 In addition to the questions above, could you also tell us which version of PCF (Pivotal Cloud Foundry) you are testing this on?

Hi @karunchennuri ,

The AppsMan version is 2.6.6 and the OpsMan version is 2.6.11-build.210.

Thanks,
Biswojeet