Get Inferences Question

brad302 opened this issue 5 years ago · 10 comments

Ok, I've tried to bundle my questions; hopefully I don't think of any others later. Again, I do appreciate you taking the time to help me.

So call me crazy, but in order to get a better handle on how the components of the camera hang together, I've decided to rebuild the MS Python SDK in C#. So far it's been relatively straightforward: the Postman collection and the Python SDK outline the payloads for configuring the camera, so that wasn't a big stretch.

The one thing I've been unable to crack so far is the inferences component. It's also more difficult because the code itself needs to run directly on the device, so I can't test it here on my development machine. So my questions are:

1. I see that the inferences are being obtained from GStreamer (gst-launch-1.0). I had a look online at it, but it's all a bit over my head and probably something I don't need to know a whole heap about; I just need to know how the MS SDK invokes it. My question, though, is whether it's a core component that ships with the camera, because as far as I could see it isn't delivered as part of the getting started module. If I do a search, though, the gst-launch-1.0 file is in a Docker container.
2. Based on the command in the SDK, I thought I'd try running the gst-launch-1.0 command directly on the device using ADB, but it doesn't accept it. I'm no Linux expert. Is it possible for me to obtain output from gst-launch-1.0 using ADB? If so, how do I get that to work? I also tried through SSH, but I don't have root access to the device, so I'm a bit stuck in that regard.
3. Is there any reason why the get inferences method isn't a REST API like the other endpoints? Who created the REST endpoints for the camera, MS or the hardware manufacturer? I ask because if it were a REST endpoint, I'd just follow the same approach as the other methods, easy!
4. I had a look for the SDK associated with the camera REST APIs, but nothing has been forthcoming. Is that a Qualcomm SDK? If so, do you happen to have a link?!?
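The first two questions describe a pattern where the SDK runs gst-launch-1.0 and reads what it prints. A minimal sketch of that pattern follows, assuming the SDK spawns the command as a child process and consumes its standard output line by line. `stream_lines` is a hypothetical helper, not an SDK function, and the SDK's actual gst-launch-1.0 arguments are not reproduced here; a harmless Python command stands in for it so the sketch runs anywhere.

```python
import subprocess
import sys
from typing import Iterator, List


def stream_lines(argv: List[str]) -> Iterator[str]:
    """Run argv as a child process and yield its stdout one line at a time.

    Hypothetical helper illustrating the pattern only; it is not part of the
    Vision AI Dev Kit SDK and does not reproduce the SDK's real command line.
    """
    # Popen's context manager closes the pipe and waits for the child on exit.
    with subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    ) as proc:
        try:
            for line in proc.stdout:
                yield line.rstrip("\n")
        finally:
            # If the caller stops iterating early, stop the child process too.
            if proc.poll() is None:
                proc.terminate()


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. On a PC with GStreamer
    # installed you would pass ["gst-launch-1.0", ...] here instead.
    demo = [sys.executable, "-c", "print('line 1'); print('line 2')"]
    for line in stream_lines(demo):
        print(line)
```

For a C# port, the equivalent building block is `System.Diagnostics.Process` with `ProcessStartInfo.RedirectStandardOutput` set to `true`.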

If I can get over that hurdle, I reckon I'm on the home stretch.

Keen to get your thoughts.

Answer 1 · 2020-01-23T08:01:27.000Z

@Devinwong Can you kindly suggest

Answer 2 · 2020-01-23T08:30:21.000Z

@PuneetRahejaMS @Devinwong I actually spoke to @initmahesh over another medium and he was going to pick this up and reply but of course I’m happy for anyone to answer.

Answer 3 · 2020-01-27T20:52:15.000Z

Hi @brad302 what was @initmahesh response to this?

Answer 4 · 2020-01-27T21:08:05.000Z

No reply as of yet @jkubicka. He did ask me to raise it as an issue though because he said the answers will benefit a lot of others. He was going to try and get to the answer Friday but obviously never got around to it. He also said you guys did ask for a REST API for the inferences but you were given a system command instead. So yeah, definitely keen to understand a bit more.

Answer 5 · 2020-01-27T22:22:59.000Z

Your understanding is correct: you need GStreamer to decode the stream, and we only have a Python sample for this :(.
To decode the results in C#, you need to port the logic in the Qualcomm SDK from Python to C#. To learn how, follow these steps:

1. Open the QCM SDK documentation here:
   https://github.com/microsoft/vision-ai-developer-kit/blob/master/camera-sdk/sdk_api_docs/index.html
2. Then try this sample.
3. Use the getting started guide to install GStreamer.
4. Then run the basic tutorial on your PC, setting IP to the camera's IP address, to get the stream decoded by GStreamer.
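As a rough illustration of the last step above (decoding the camera's stream with GStreamer on the PC), the sketch below builds a gst-launch-1.0 pipeline for an RTSP stream. It rests on assumptions: the stream is H.264 over RTSP, and the URL is whatever the SDK reports as `camera_client.preview_url`. The helper names `build_preview_pipeline` and `gst_launch_argv` are hypothetical, and the example URL is a placeholder. The elements themselves (rtspsrc, rtph264depay, h264parse, avdec_h264 from gst-libav, videoconvert, autovideosink) are standard GStreamer plugins.

```python
import shlex
from typing import List


def build_preview_pipeline(rtsp_url: str, latency_ms: int = 200) -> str:
    """Return a gst-launch-1.0 pipeline description that decodes RTSP H.264.

    Assumes the camera serves H.264 over RTSP; swap the depayloader and
    decoder if the stream uses another codec.
    """
    return (
        f"rtspsrc location={rtsp_url} latency={latency_ms} "
        "! rtph264depay ! h264parse ! avdec_h264 "
        "! videoconvert ! autovideosink"
    )


def gst_launch_argv(pipeline: str) -> List[str]:
    """Split a pipeline description into an argv list suitable for subprocess."""
    return ["gst-launch-1.0", *shlex.split(pipeline)]


if __name__ == "__main__":
    # Placeholder URL: substitute the value printed by camera_client.preview_url.
    url = "rtsp://192.0.2.10/stream"
    print(" ".join(gst_launch_argv(build_preview_pipeline(url))))
```

Keeping the pipeline as one description string means the same text can be pasted after `gst-launch-1.0` in a shell or passed, split into arguments, to a child process.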

Now you basically have to implement the logic below (tutorial 4) in C#.

This is the API:
get_inferences()
Inference generator for the application.

This inference generator gives inferences from the VA metadata stream.

Yields
AiCameraInference (AiCameraInference class object) – This AiCameraInference object is yielded from VideoInferenceIterator.start()

Raises
EOFError – If the preview is not started, or if the VAM is not started.
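If you are porting this to C#, it helps to pin down the shape of what `get_inferences()` yields. The sketch below models only the fields the sample code reads (timestamp, plus per-object id, label, confidence, and position x/y/width/height). The class names other than AiCameraInference, the field types, and `fake_get_inferences` are assumptions for illustration, not the SDK's definitions.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Position:
    # Bounding box; whether values are pixels or normalized is an assumption to verify.
    x: float
    y: float
    width: float
    height: float


@dataclass
class DetectedObject:
    # Hypothetical name for the items in AiCameraInference.objects.
    id: str
    label: str
    confidence: float
    position: Position


@dataclass
class AiCameraInference:
    # Only the attributes the sample reads; the real class may have more.
    timestamp: Optional[str]
    objects: List[DetectedObject] = field(default_factory=list)


def fake_get_inferences(preview_on: bool, vam_on: bool,
                        frames: List[AiCameraInference]) -> Iterator[AiCameraInference]:
    """Hypothetical stand-in mirroring the documented contract.

    Yields inferences, raising EOFError if the preview or VAM is not started.
    Because this is a generator, the EOFError surfaces on the first next(),
    not when the function is called.
    """
    if not preview_on or not vam_on:
        raise EOFError("preview and VAM must be started first")
    yield from frames


if __name__ == "__main__":
    box = Position(x=10, y=20, width=100, height=50)
    frame = AiCameraInference(timestamp="t0",
                              objects=[DetectedObject("1", "person", 0.87, box)])
    for result in fake_get_inferences(True, True, [frame]):
        for obj in result.objects:
            print(obj.label, obj.confidence, obj.position)
```

Each dataclass maps naturally onto a small C# class or record, and a stand-in like `fake_get_inferences` lets you exercise the consuming code without a camera.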

Here is the sample code in Python:

```python
import argparse
import socket
import sys

from sdk.camera import CameraClient


def getWlanIp():
    # Find the local IP used for outbound traffic. connect() on a UDP socket
    # only selects a route, so the address doesn't even have to be reachable.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(('10.255.255.255', 1))
        IP = s.getsockname()[0]
        if IP.split('.')[0] == '172':
            print("Ip address detected is :: " + IP)
            IP = '127.0.0.1'
            print("Ip address changed to :: " + IP + " to avoid docker interface")
        print("Ip address detected is :: " + IP)
    except OSError:
        IP = '127.0.0.1'
    finally:
        s.close()
    return IP


def main(protocol=None):
    print("\nPython %s\n" % sys.version)
    parser = argparse.ArgumentParser()
    parser.add_argument('--ip', help='ip address of the camera', default='127.0.0.1')
    parser.add_argument('--username', help='username of the camera', default='admin')
    parser.add_argument('--password', help='password of the camera', default='admin')
    args = parser.parse_args()
    ip_addr = args.ip
    username = args.username
    password = args.password

    with CameraClient.connect(ip_address=ip_addr, username=username, password=password) as camera_client:
        # Start the video preview stream and print its URL.
        print(camera_client.configure_preview(resolution="1080P", display_out=1))
        camera_client.set_preview_state("on")
        print(camera_client.preview_url)

        # Start video analytics (VAM); inferences come from its metadata stream.
        camera_client.set_analytics_state("on")
        print(camera_client.vam_url)

        # Overlay inference results on the preview.
        camera_client.configure_overlay("inference")
        camera_client.set_overlay_state("on")

        try:
            with camera_client.get_inferences() as results:
                print_inferences(results)
        except (KeyboardInterrupt, EOFError):
            # Ctrl+C, or the preview/VAM stream is not running.
            print("Stopping")


def print_inferences(results=None):
    print("")
    for result in results:
        if result is not None and result.objects is not None and len(result.objects):
            timestamp = result.timestamp
            if timestamp:
                print("timestamp={}".format(timestamp))
            else:
                print("timestamp= " + "None")
            for obj in result.objects:
                print("id={}".format(obj.id))
                print("label={}".format(obj.label))
                print("confidence={}".format(obj.confidence))
                x = obj.position.x
                y = obj.position.y
                w = obj.position.width
                h = obj.position.height
                print("Position(x,y,w,h)=({},{},{},{})".format(x, y, w, h))
                print("")
        else:
            print("No results")


if __name__ == '__main__':
    main()
```

Answer 6 · 2020-01-28T00:03:59.000Z

@initmahesh, thanks man.

Let me just stew on that for a bit, the one thing I overlooked was the installation instructions for gstreamer. I've looked through the SDK doco but managed to miss that somehow.

As for the sample programs, yep, I'd already been all over those; they helped me get my C# REST API calls going.

Answer 7 · 2020-01-28T00:19:12.000Z

@initmahesh you have no idea how exciting it was to see this ...! I'm away!

Answer 8 · 2020-01-28T00:45:27.000Z

I am surprised; I thought you would need a remote session! GREAT WORK!!!

Answer 9 · 2020-01-28T00:56:05.000Z

@initmahesh yeah, the above is from my workstation, and the camera is sitting next to me, not physically connected but on the same network ... if that's what you class as a "remote session", anyway. I have to say, though, the output is bloody disgusting, but so be it; I can work with it.

Thanks again for the kickstart. I realised I was so close to answering my own questions, but sometimes you just need another person to give you something that gets you going again.

Answer 10 · 2020-01-29T04:29:59.000Z

Closing this issue as it's resolved.