jbuehl/solaredge

Invalid/incomplete JSON

Closed this issue · 8 comments

First, a big "thank you" for all of the work done to get this project to where it is. As far as monitoring my own SolarEdge install, I'm sitting on the shoulders of giants because of you.

Background: My SolarEdge site has three inverters, each with three strings of solar panels and optimizers (124 total across the three SE10000 inverters). Because the inverters have revenue-grade meters built into them, the installers report they could not chain them together via the RS-485 bus; instead, each has its own Ethernet connection to a dedicated subnet. A Linux-based router connects that subnet to the Internet, and that's where I have tcpdump capturing all TCP traffic to/from the solar energy subnet.

I captured the three key files via RS-232, and I process the pcap with seextract.py before feeding the result to semonitor.py three times, once per inverter, with the appropriate key file as an argument each time.

My problems may be related (hence one GitHub issue) or they might be two independent issues.

  1. I have incomplete data in the JSON files created by semonitor.py. Looking at the pcap file in Wireshark, I see data regularly since midnight. And I see updates every 15 minutes on the SolarEdge monitoring portal. But the data in the JSON files is irregular at best.

  2. The JSON created by semonitor.py appears to be invalid, according to both the Perl JSON modules and Firefox's built-in JSON viewer. It looks like semonitor.py is producing each line in a JSON-ish format, but the file itself is not standard JSON. To fix this, I have to do the following (sketched in code after the list):

  • Add quote marks around NaN values
  • Add a comma at the end of each line (except the last)
  • Add a [ at the beginning of the file and a ] at the end of the file.
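Here is that cleanup as a rough Python sketch (file names are placeholders): it parses each line, maps NaN to null while parsing, and re-writes everything as a single JSON array.

import json

# Turn the newline-delimited semonitor.py output into one valid JSON array.
records = []
with open("inverter.json") as f:                    # placeholder file name
    for line in f:
        if line.strip():
            # parse_constant is called for NaN/Infinity tokens; returning
            # None makes them serialize as null in the combined output.
            records.append(json.loads(line, parse_constant=lambda c: None))

with open("inverter-array.json", "w") as out:       # placeholder file name
    json.dump(records, out, indent=2)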

3JSONs+pcap.zip

It is normal for the rate of the data to be irregular. There are messages output by the inverters on a regular basis, but they don't all contain data that goes to the JSON file. The inverter only outputs data when there is something to report, so you will see more activity when the sun is shining and almost none at night.

The JSON output is created with the Python json library. The file consists of a series of JSON strings separated by newlines (\n). It was never intended that the entire output file should be a single valid JSON string, which is what would require the extra commas and brackets that you refer to in your second and third points.

Apparently NaN is not part of the JSON spec, but it is part of JavaScript. The Python json library implements that extension and will encode that value unless it is explicitly told not to. NaN without the quotes should be valid if it is being consumed by JavaScript.
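In other words, each line is a complete JSON string on its own and can be parsed independently. A minimal sketch of reading the output file that way (the file name is a placeholder):

import json

# Read the semonitor.py output as newline-delimited JSON.
# json.loads accepts NaN by default, matching what the json library wrote.
with open("out.json") as f:
    for line in f:
        if line.strip():
            record = json.loads(line)   # one JSON string per line
            print(record)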

Thanks @jbuehl

I think "regular" was probably a bad choice of words on my part. Processing today's pcap gives me 454 lines for one inverter, 175 lines for the second and only 8 lines for the last. All three are fed from panels on the same roof and produced power for approx. 14 hours today according to the SolarEdge portal.

For that third inverter, which has data for every 15 minutes for all 14 hours on the monitoring portal and produced the most power of the three inverters today, I only get JSON strings for 00:49:52, 00:54:52, 00:59:52, 01:04:51, 01:09:51, 01:14:51, 21:46:47, and 22:04:54. (I grabbed the PCAP at 23:00 so maybe there would be a few more night readings in there if I let it go.)

So, while it might not be "regular," I would expect a more even distribution of data points among the three inverters and more than 8 usable messages over the course of 23 hours. Any thoughts on what I should look at to troubleshoot this?

As far as the JSON output... Hmm. Interesting that the Python folks would choose JavaScript-style output over per-the-spec JSON as the default. Thanks for the info. I'll experiment with using simplejson instead of the json library, with the ignore_nan=True flag, which emits null instead of NaN.
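A quick sketch of the difference for anyone who lands here later (assuming simplejson is installed; the record below is made up):

import json
import simplejson

# Made-up example record with a NaN value in it.
data = {"Temp": float("nan"), "Eday": 12345.6}

print(json.dumps(data))                          # {"Temp": NaN, "Eday": 12345.6}  -- JS-style extension
print(simplejson.dumps(data, ignore_nan=True))   # {"Temp": null, "Eday": 12345.6} -- spec-compliant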

BTW, I did check the pcap in Wireshark. All three inverters are talking to SolarEdge at 217.68.149.103.

What is the command you are using to process the PCAP file? You should be piping the data through seextract.py before semonitor.py; if you aren't, data can be lost. Also, include the -d and -v options on semonitor.py to show any error messages.

python seextract.py yyyymmdd.pcap | python semonitor.py -d stdout -vv -o yyyymmdd.json

I've been using tcpdump -C 5 -i eth2 -U tcp -w YYYYMMDD.pcap to do the capture and then processing it with the following shell script.

#!/bin/sh
# Process one day's capture: extract the SolarEdge data once,
# then decode it once per inverter with that inverter's key file.

PCAP=$1

BINDIR=/home/apu/solaredge
KEYDIR=/home/apu/keys
OUTDIR=/home/apu

# Extract the SolarEdge traffic from the pcap into a single data file
${BINDIR}/seextract.py -o ${OUTDIR}/out.dat ${PCAP}

# Decode that data file three times, once per inverter key
for INVERTER in 7d112e73 7d112e7e 7d112e84
do
	echo ${INVERTER}
	${BINDIR}/semonitor.py -k ${KEYDIR}/${INVERTER}.key \
		-d ${OUTDIR}/${INVERTER}.log -vv \
		-o ${OUTDIR}/${INVERTER}.json ${OUTDIR}/out.dat
done

I'm not seeing any obvious differences in the log files from one inverter to the next despite the significant differences in the quantity of JSON output.

JSON & log files: Edit: Files deleted.

P.S. As expected, there does not appear to be any difference between seextract.py writing to a temp file and then processing that temp file three times, vs. running seextract.py three times and piping the result directly to semonitor.py each time. (Besides the obvious IO error when the pipe runs out of data.) I ran it both ways just to check.

Got it!

The pipe was actually the clue, even though it didn't make a difference in my previous workflow (above). I thought seextract.py was extracting all SolarEdge data from the PCAP. I didn't realize that it only expects to be reading data from one inverter at a time.

If I pre-process YYYYMMDD.pcap using tcpdump -r YYYYMMDD.pcap -w inverter1.pcap "host 1.1.1.1" three times (once per inverter) and then run seextract/semonitor on those three PCAP files, I get 603 lines for the first inverter, 591 for the second and 605 for the third.
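For completeness, the same per-inverter split can also be sketched in Python using scapy (not part of this project; the IP addresses below are placeholders for the inverters' actual addresses on the solar subnet):

from scapy.all import IP, rdpcap, wrpcap

# Split one day's capture into one pcap per inverter, keyed by IP address.
# Equivalent to running the tcpdump "host ..." pre-filter three times.
INVERTERS = {
    "7d112e73": "192.168.2.11",   # placeholder IP
    "7d112e7e": "192.168.2.12",   # placeholder IP
    "7d112e84": "192.168.2.13",   # placeholder IP
}

packets = rdpcap("YYYYMMDD.pcap")
for serial, addr in INVERTERS.items():
    subset = [p for p in packets if IP in p and addr in (p[IP].src, p[IP].dst)]
    wrpcap(serial + ".pcap", subset)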

Yes, seextract.py assumes that there is only one inverter sending data to SolarEdge. It would have created one TCP stream containing messages to and from all three inverters, which semonitor.py couldn't make sense of.

I have an installation with two inverters, but they are connected together with RS-485 and only one of them communicates with the server and sends the data for both inverters.

Thanks, @jbuehl. I'm closing this issue because it was ultimately my misunderstanding of what seextract.py expected as input, not a bug in the code. That said, since my system is designed differently than yours, if there is data I can collect to help further the project, please let me know.