andy-shev/linux

boot failure on SparkFun blocks (base or console)

ecc1 opened this issue · 37 comments

ecc1 commented

I cloned https://github.com/andy-shev/linux and checked out the eds branch.
The tree is at e72aad722abd0a639a3ef4c42d8e59f4adc50d9e

On my desktop (Debian amd64, running stretch), I did:
$ make i386_defconfig
$ make ARCH=i386 -j4 bzImage
I copied bzImage to the /boot partition (/dev/mmcblk0p7) on my Edison.
I rebooted and entered U-Boot. I changed the bootargs as shown in the attached file and booted bzImage. Console output from the boot is attached, as is my kernel .config file.
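For anyone trying to reproduce this, the build steps above can be sketched as a small script. This is only a sketch: the `run` and `build_edison_kernel` helper names, the `DRY_RUN` variable, and the default `linux` checkout directory are made up for illustration; the commands themselves are the ones listed above.

```shell
# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Clone the tree, check out the eds branch, and build a 32-bit bzImage.
# $1: checkout directory (hypothetical default: ./linux)
build_edison_kernel() {
    src=${1:-linux}
    run git clone https://github.com/andy-shev/linux "$src" || return 1
    # To pin the exact tree reported above, check out that sha1 instead.
    run git -C "$src" checkout eds || return 1
    run make -C "$src" i386_defconfig || return 1
    run make -C "$src" ARCH=i386 -j4 bzImage || return 1
}
```

Calling `DRY_RUN=1 build_edison_kernel` prints the four commands without running them. After a successful build the image is at `arch/x86/boot/bzImage` inside the checkout.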

Thanks for your time in looking at this.

uboot-settings.txt
console-output.txt
kernel-config.txt

The very latest i386 build (eds branch) works for us. I have just compiled it, and here is the boot log. Wireless and external MMC should be working with Andy's patches; I am going to test them as well.

http://pastebin.com/YRuXwXD1

ecc1 commented

On Thu, Aug 25, 2016 at 10:12:46AM -0700, Rahul Atlury wrote:

The very latest i386 works for us. I have just compiled it and here is the paste
log...wireless and external mmc should be working with patches from Andy...I am
going to test it...

http://pastebin.com/YRuXwXD1

Can you tell me which tree you're using (sha1 of HEAD) and post a copy
of your kernel .config? I'd like to try to reproduce your success
exactly.

Eric Cooper e c c @ c m u . e d u

Eric,

Please use the following 7z file (remove the .pdf extension first). It contains Andy's patches plus the config.
i386-config-edison-mainline-with-p-andy.7z.pdf

Just check out the latest eds branch. I will check the sha1 for you later.

ecc1 commented

I've now tried those patches, with your exact .config and with the (patched) i386_defconfig, with gcc-6 and gcc-5, and all still result in error messages like this:

[ 3.779816] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0xe642f98297, max_idle_ns: 881590439301 ns
[ 3.999823] i2c-designware-pci 0000:00:08.0: Can't set power state D0: -16
[ 4.006813] clocksource: Switched to clocksource tsc
[ 4.009310] i2c-designware-pci 0000:00:08.0: Unknown Synopsys component type: 0xffffffff
...
[ 9.391730] sdhci-pci 0000:00:01.0: SDHCI controller found [8086:1190] (rev 1)
[ 10.271649] sdhci-pci 0000:00:01.0: Can't set power state D0: -16
[ 10.379438] mmc0: Reset 0x1 never completed.
[ 10.383762] sdhci: =========== REGISTER DUMP (mmc0)===========
[ 10.389651] sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff
[ 10.395534] sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff
[ 10.401417] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff
[ 10.407308] sdhci: Present: 0xffffffff | Host ctl: 0x000000ff
[ 10.413193] sdhci: Power: 0x000000ff | Blk gap: 0x000000ff
[ 10.419074] sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff
[ 10.424953] sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff
[ 10.430834] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff
[ 10.436716] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff
[ 10.442597] sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff
[ 10.448478] sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff
[ 10.454358] sdhci: Host ctl2: 0x0000ffff
[ 10.458325] sdhci: ===========================================
[ 10.464210] mmc0: Unknown controller version (255). You may experience problems.
[ 10.471737] mmc0: Invalid maximum block size, assuming 512 bytes
[ 10.577872] mmc0: Reset 0x1 never completed.
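One observation on the dump above: every register reads back as all ones (0xff, 0xffff, or 0xffffffff depending on width). That is typically what a read returns when the PCI device is not responding, which would fit the "Can't set power state D0" failure just before it. A rough way to check a saved console log for this pattern is sketched below. It assumes a grep that supports `-o` (GNU and BSD grep do), and `count_all_ones` is just an illustrative name.

```shell
# Count sdhci register-dump values that read back as all ones.
# $1: path to a saved console log
count_all_ones() {
    grep 'sdhci:' "$1" \
        | grep -o '0x[0-9a-fA-F]*' \
        | grep -c -E '^0x0*(ff|ffff|ffffffff)$'
}
```

A count equal to the number of registers in the dump suggests the controller was never reachable. A genuine 8-bit value of 0xff would also be counted, so treat the result as a hint rather than proof.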

ecc1 commented

I'm probably doing something stupid; would you mind pasting the commands you used to configure and compile the kernel?

Yes. Before that, can you please flash and test this release:
https://github.com/atlury/Intel-Edison-OS-Images/releases/download/AlpineLinux-3.4.3-Mainline-Kernel/Alpine-Edison-Distro-Mainline-4.8-RC3.7z

  1. Extract
  2. run flashall.bat --recovery first and then plug in the board
  3. immediately after that run flashall.bat (without disconnecting or re-plugging the board)

Meanwhile I will prep a guide for you.

To disable the watchdog timer, run a script something like:
while [ 1 ] ; do sleep 1; echo V > /dev/watchdog; echo V > /dev/watchdog0; done
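A slightly more defensive variant of that loop, as a sketch: it only writes to watchdog nodes that actually exist, so a missing /dev/watchdog0 does not end up created as a regular file by the redirect. The `pet_watchdogs` name is made up here. On drivers that support the watchdog "magic close" feature, writing `V` just before the device is closed tells the driver to stop the timer instead of leaving it running.

```shell
# Write the magic character 'V' to each watchdog node that exists.
# Arguments: one or more device paths.
pet_watchdogs() {
    for node in "$@"; do
        if [ -e "$node" ]; then
            echo V > "$node"
        fi
    done
}

# Usage, mirroring the loop above:
#   while true; do pet_watchdogs /dev/watchdog /dev/watchdog0; sleep 1; done
```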

The login is root with no password.

ecc1 commented

I'd rather not reflash if I can avoid it, but if I hit a complete brick wall I will. I notice a newer U-Boot in your boot log; is there anything it could be doing differently?

My system also disables watchdog in /etc/rc.local, but of course it's not getting that far with this kernel.

Yes, my apologies; I re-uploaded it. Please share your Hangouts contact and I will help you compile. We use a customized version of U-Boot suited to that particular rootfs. You can then share a tutorial here.

ecc1 commented

I tried the kernel from your Alpine image and got the same errors. Finally I realized that all of these tests had been with my Edison module on a SparkFun base block. I moved the Edison module to an Intel mini-breakout board and it works fine.

I have no idea what difference the base board should make. Andy, please let me know if there's anything you'd like me to try to pinpoint and debug the difference in behavior.

Rahul, thanks very much for your time and patience.

That's strange. I am also using only SparkFun boards: a Base Block with a microSD card block.

ecc1 commented

I have a Pi Block with an SPI device connected; I'll try disconnecting that to see if that's affecting the boot.

ecc1 commented

So it also boots correctly on just the Sparkfun base block. But when also connected to a Sparkfun Pi Block, wired to an SPI device, it always produces the i2c and other errors in my original boot log.

ecc1 commented

Another data point: the kernel also boots OK on the mini-breakout board wired to the same SPI device in the same way (SPI lines plus a couple of GPIOs for interrupts).

Hi Eric,
My priority is to get WiFi (I have a new build to test it) and the external SD card working. After that, we can look at the SPI issue.

htot commented

@ecc1 ,

I'm on Linux edison 4.8.0-rc6+ and have approx the same console-output as your first message, but my kernel boots.

I'm interested if you have been able to connect anything to the usb. In my case not:
root@edison:~# lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci-hcd/1p, 480M

I have a smsc95xx connected (which is actually a 4-port hub with ethernet port), but nothing happens (kernel module is not loaded, when manually modprobed still doesn't work).

I noted that the original kernel (edison 3.10.14) loads another driver (dwc3-host instead of xhci-hcd), and am wondering if that might be the cause.

root@edison:~# lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=dwc3-host/1p, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=dwc3-host/1p, 480M
|__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/5p, 480M
|__ Port 1: Dev 3, If 0, Class=Vendor Specific Class, Driver=smsc95xx, 480M
|__ Port 3: Dev 4, If 0, Class=Vendor Specific Class, Driver=ftdi_sio, 12M
|__ Port 3: Dev 4, If 1, Class=Vendor Specific Class, Driver=ftdi_sio, 12M
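To compare the two kernels quickly, the root-hub driver can be pulled out of the `lsusb -t` output with a one-line filter. This is a sketch; `root_hub_drivers` is an illustrative name, and the pattern assumes the `Driver=<name>/<ports>p` layout shown in the output above.

```shell
# Read `lsusb -t` output on stdin and print each root-hub driver once.
root_hub_drivers() {
    sed -n 's|.*Class=root_hub, Driver=\([^/]*\)/.*|\1|p' | sort -u
}

# Usage:
#   lsusb -t | root_hub_drivers
```

Fed the two listings above, this prints `xhci-hcd` for the 4.8.0-rc6+ kernel and `dwc3-host` for the 3.10.14 kernel.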

My Edison is on an Intel Edison Arduino kit.

ecc1 commented

On Sun, Sep 18, 2016 at 09:21:32AM -0700, Ferry wrote:

I'm on Linux edison 4.8.0-rc6+ and have approx the same console-output as your
first message, but my kernel boots.

I'm interested if you have been able to connect anything to the usb.

I haven't tried. The new kernel isn't useful for me until it handles
both WiFi and Bluetooth reliably, so I'm just waiting on the sidelines
until that seems close.

My Edison is on a Intel Edison Arduino kit.

I have only mini-breakout and Sparkfun boards.

htot commented

That's why I want to get eth working using the smsc95xx. But it seems now I need to get usb working first?

Guys, I will check what I can do as soon as I have a time slot for this.

@htot can you check whether the switch is set to the correct side? The USB host and peripheral ports are mutually exclusive, so I'm just wondering whether you have the switch in the right position.

@htot lspci shows dwc3-pci as a driver in my case and I can see devices connected to a host. Also, please check what Felipe suggested.

htot commented

lsusb commands above were done with the switch in the same position.

I'll rebuild the kernel and collect the relevant boot log from journalctl then check back here

@ecc1 Do I understand correctly that you connected the Base board and the Pi board together to the Edison?

ecc1 commented

On Tue, Oct 25, 2016 at 05:01:43AM -0700, Andy Shevchenko wrote:

@ecc1 Do I get correctly that you connected Base board and PI board together to
Edison?

Yes, I was using a Sparkfun base board stacked with the Pi block for Edison.
The SPI lines, Vcc, GND, and a couple of GPIO lines were connected to a radio
module.

Looking at the schematics, I conclude that they are mutually exclusive: you have to use either one or the other.

ecc1 commented

No, they are intended to be "stacked" one on the other. Using the base block gives you (for example) console and OTG ports that are not available on the Pi Block. And I am successfully using such a stack with the 3.10 kernel that Intel distributed with the device.

I see. Unfortunately I'm out of ideas; only one is left: measure the power consumption and the actual current per rail in both cases (old and new kernel). It might be that you need an additional power source. Since I have no such boards, I can't help any further. And please clarify the subject: which setup reproduces the issue, a) the base block, b) the Pi block, or c) both attached?

@ecc1, today I saw a similar issue (power failure for devices) once, when at some point during Yocto shutdown I switched the USB ID switch from device to host mode (cables: USB UART, USB device, power, all left untouched). All subsequent boots failed the same way you described. What I did to fix it: a) unplugged the USB device cable, b) unplugged the power cable, c) toggled the switch a couple of times and left it in host mode, d) reattached the cables. So it seems some voltage regulator on the board can get into a state where one of the power lines is not set properly.

ecc1 commented

On Fri, Nov 18, 2016 at 10:32:50AM -0800, Andy Shevchenko wrote:

@ecc1, I saw once today similar issue (with power failure for devices) when I at
some point of Yocto shutdown switched USB ID switcher to the host mode from
device (cables - USB uart, USB device, power - left untouched). All
consequential boots failed the same way you described. What I did to fix it: a)
unplugged USB device cable, b) unplugged power cable, c) turned switch couple of
times and left it in host mode, d) attached cables back. So, seems like some
voltage regulator on the board might go to condition where one of the power
lines is not set properly.

Interesting. I haven't tried recently, but is there any U-Boot
command that would reset the power system before booting the kernel?
(And neither the Sparkfun nor Intel mini-breakout boards have any hardware
switch for USB host mode.)

Eric Cooper e c c @ c m u . e d u

Okay, I got a SparkFun console block and I can reproduce the issue! I will look at it as soon as I have enough time.

Can someone (Eric @ecc1?) confirm that the following code fixes the issue? (It's a broken patch; one needs to apply it manually.)

--- a/drivers/watchdog/intel-mid_wdt.c
+++ b/drivers/watchdog/intel-mid_wdt.c
@@ -157,6 +157,7 @@ static int mid_wdt_probe(struct platform_device *pdev)
                return ret;
        }

+       intel_scu_ipc_update_register(0x4b, 8, 8);
        dev_info(&pdev->dev, "Intel MID watchdog device probed\n");

        return 0;

ecc1 commented

No, this does not fix the issue (booting your i386_defconfig on a Sparkfun Base Block):

[    2.765801] i2c-designware-pci 0000:00:08.0: Can't set power state D0: -16
[    2.774777] i2c-designware-pci 0000:00:08.0: Unknown Synopsys component type: 0xffffffff
[    3.653669] i2c-designware-pci 0000:00:08.1: Can't set power state D0: -16
[    3.662735] i2c-designware-pci 0000:00:08.1: Unknown Synopsys component type: 0xffffffff
[    4.541685] i2c-designware-pci 0000:00:08.2: Can't set power state D0: -16
[    4.550694] i2c-designware-pci 0000:00:08.2: Unknown Synopsys component type: 0xffffffff
[    5.428969] i2c-designware-pci 0000:00:08.3: Can't set power state D0: -16
[    5.437668] i2c-designware-pci 0000:00:08.3: Unknown Synopsys component type: 0xffffffff
[    6.319730] i2c-designware-pci 0000:00:09.0: Can't set power state D0: -16
[    6.328663] i2c-designware-pci 0000:00:09.0: Unknown Synopsys component type: 0xffffffff
[    7.209869] i2c-designware-pci 0000:00:09.1: Can't set power state D0: -16
[    7.218674] i2c-designware-pci 0000:00:09.1: Unknown Synopsys component type: 0xffffffff
[    8.100348] i2c-designware-pci 0000:00:09.2: Can't set power state D0: -16
[    8.108683] i2c-designware-pci 0000:00:09.2: Unknown Synopsys component type: 0xffffffff
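The same failure repeats for all seven i2c-designware functions listed above (the -16 is -EBUSY). When going through a longer log, a short filter like the following can summarise which devices failed to reach D0. This is a sketch only; `failed_d0_devices` is an illustrative name, and the pattern assumes the message format shown above.

```shell
# Print "driver pci-address" once for every device that failed to enter D0.
# $1: path to a saved dmesg or console log
failed_d0_devices() {
    sed -n "s/^.*] \([^ ]*\) \([0-9a-f:.]*\): Can't set power state D0.*/\1 \2/p" "$1" \
        | sort -u
}
```

Running it over a complete dmesg would also pick up sdhci-pci or any other driver that logs the same message.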

I don't have an initrd set up to load mmc yet, but I assume that would fail -- it did when it was compiled into the kernel, with these errors:

[    8.379836] sdhci: Secure Digital Host Controller Interface driver
[    8.386077] sdhci: Copyright(c) Pierre Ossman
[    8.390552] sdhci-pci 0000:00:01.0: SDHCI controller found [8086:1190] (rev 1)
[    9.270737] sdhci-pci 0000:00:01.0: Can't set power state D0: -16
[    9.378419] mmc0: Reset 0x1 never completed.
[    9.382743] sdhci: =========== REGISTER DUMP (mmc0)===========
[    9.388632] sdhci: Sys addr: 0xffffffff | Version:  0x0000ffff
[    9.394514] sdhci: Blk size: 0x0000ffff | Blk cnt:  0x0000ffff
[    9.400400] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff
[    9.406295] sdhci: Present:  0xffffffff | Host ctl: 0x000000ff
[    9.412181] sdhci: Power:    0x000000ff | Blk gap:  0x000000ff
[    9.418063] sdhci: Wake-up:  0x000000ff | Clock:    0x0000ffff
[    9.423946] sdhci: Timeout:  0x000000ff | Int stat: 0xffffffff
[    9.429827] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff
[    9.435706] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff
[    9.441589] sdhci: Caps:     0xffffffff | Caps_1:   0xffffffff
[    9.447469] sdhci: Cmd:      0x0000ffff | Max curr: 0xffffffff
[    9.453347] sdhci: Host ctl2: 0x0000ffff
[    9.457315] sdhci: ===========================================
[    9.463200] mmc0: Unknown controller version (255). You may experience problems.
[    9.470699] mmc0: Invalid maximum block size, assuming 512 bytes

I have just sent you a patch. Please apply it on top of my eds branch and try again. In any case, send me back the full dmesg log, not just parts of it.

Does it still appear?

On Tue, Oct 25, 2016 at 05:01:43AM -0700, Andy Shevchenko wrote:

@ecc1 Do I get correctly that you connected Base board and PI board together to
Edison?

Yes, I was using a Sparkfun base board stacked with the Pi block for Edison.
The SPI lines, Vcc, GND, and a couple of GPIO lines were connected to a radio
module.

Did you have a chance to find the root cause?

ecc1 commented

LOL Intel discontinued the product and I've long since moved on to a viable platform instead

@ecc1, this repository was never the official one, and yes, I understand your point. I hope everything works for you on that platform and thanks for the bug report, maybe it would be useful to someone.