hypriot/flash

Can't make HypriotOS 1.12.1 work. So many different problems

rafaeleyng opened this issue · 15 comments

TLDR:

  • I've spent 2 full days trying to use HypriotOS. I've had so many problems that I can't give a summary, so please read the full issue.
  • After all of this, I'm still unable to run HypriotOS.
  • I've encountered a lot of missing, outdated, obscure, and contradictory documentation. Please read the full issue carefully; I raise these points in one of three ways: as questions, by explicitly saying that the docs could explain a specific point, or by guessing how something should work (instead of getting clear information from the docs).
  • I would really like to be able to run this on my Raspberry Pi devices since it seems to be a perfect fit for what I need to do.
  • I'm willing to help to improve the documentation if you think my observations are valid. But I can only do so after understanding the enormous confusion that I'm facing with this setup. I've put around 6 hours of work just to write this issue with every detail I could get.
  • I would like to make it very explicit (with all due respect for the project): using HypriotOS and the flash script was not easy at all in my experience (unlike what is said here).

Hello there

First and foremost, thank you for the project, it seems to really provide a good solution for the problem, especially for less experienced developers like me. If the following text ends up sounding harsh, it is not directed to the project, or to the developers, but to my own frustration.

It's been a recurring theme for me. Whatever I try to learn or do, I end up finding a tool that "makes everything really easy". More often than not, I end up not being able to use or understand the tool, and left with the doubt: either it isn't that easy, or I'm just very, very incapable.

I've spent the last 2 days trying to use Hypriot, and failing miserably. I'm so confused about how this is supposed to work that I can't even open a specific issue about a specific question, so I'll try to lay out the problems I've had in this single issue and we can work from there.

Hopefully my questions can point out some obscure parts of the documentation, and so help more people who are in my situation.

Note: I've read several materials before creating this issue. To cite some:

I hope that makes it clear that this is not just a lazy "help me, it is not working" without any effort or context.


what devices I have

  • MacBook (where I'm flashing the SD cards etc)
  • Raspberry Pi 2 Model B (with a wi-fi stick)
  • Raspberry Pi Zero W

what I'm trying to achieve

I want to flash an image on my devices with HypriotOS v1.12.1, configuring:

  • hostname
  • wifi settings
  • SSH keys

To make it clearer: I want to put an SD card in my Raspberry Pi devices and boot them with these 3 things configured, so that I don't have to plug a keyboard or Ethernet cable into my devices.


what I've tried

with the Raspberry Pi 2 Model B

First I've tried a bunch of stuff that I can't remember right now. Then I started all over again, with baby steps (doing the most basic setup, then trying to change one thing at a time), as follows.

first attempt - no config

Here I'm doing basically what is described here.

The blog post is about HypriotOS 1.11 (I'm actually using 1.12). Not sure whether that makes any difference here.

Steps:

  • flash the image without any extra configuration or file, using the flash script: flash -f ~/cluster-images/hypriotos-rpi-v1.12.1.img
  • connect the Pi via Ethernet to my Ethernet switch, via cable
  • access the Pi via SSH with the default hostname: ssh pirate@black-pearl.local, and password hypriot.

This works ✅, but is not what I need, because I'm still accessing with the default hostname, through an Ethernet cable, and with a password in SSH.

second attempt - just the hostname

Everything like the first, but setting a hostname.

Here comes the first confusion. There seems to be more than 1 way to do this, without any explanation of priorities or use cases.

  • flash has the --hostname option.

  • https://github.com/hypriot/flash#cloud-init shows with the configuration hostname: black-pearl and mentions the files for the options --userdata and --metadata (I've learned that these files are related to cloud-init, but I could not find anywhere that explained satisfactorily the difference between both and which fields are accepted in any one of the files). The documentation linked states:

    With HypriotOS v1.7.0 and higher the options --userdata and --metadata can be used to copy both cloud-init config files into the FAT partition.

    and follows with an example file, but does not mention whether that is the --userdata file or the --metadata file. I could guess it is the --userdata file (100% just a guess, even after researching), but I think it would be so much easier if the README explained that.

hostname - first way

So I'll make a guess and go with the --hostname.

Steps:

  • flash --hostname testing-hostname -f ~/cluster-images/hypriotos-rpi-v1.12.1.img
  • connect the Pi via Ethernet to my Ethernet switch, via cable
  • access the Pi via SSH with the configured hostname: ssh pirate@testing-hostname.local, and password hypriot.

This works ✅, but has the same shortcomings as the first attempt.

hostname - second way

Now I try to change the hostname through the cloud-init approach. I create a cloud-init.yml file with:

hostname: testing-hostname-2
manage_etc_hosts: true
package_upgrade: false

Steps:

  • flash --userdata ./cloud-init.yml -f ~/cluster-images/hypriotos-rpi-v1.12.1.img
  • connect the Pi via Ethernet to my Ethernet switch, via cable
  • access the Pi via SSH with the configured hostname: ssh pirate@testing-hostname-2.local, and password hypriot.

This does not work ❌. I try to connect via SSH, and it simply hangs. The hostname was not changed.

hostname - third way

After some more digging, I find in a sample the systemctl restart avahi-daemon command (not present in the example in the README). I update the cloud-init.yml, adding it to the file:

hostname: testing-hostname-3
manage_etc_hosts: true
package_upgrade: false

runcmd:
  - 'systemctl restart avahi-daemon'

Again, this does not work ❌.

hostname - fourth way

Out of despair, I copy the full example from https://github.com/hypriot/flash/blob/master/sample/wifi-user-data.yml (and of course make my adjustments). Note: I've also copied the wifi configuration, but I'm still working with the Ethernet cable plugged in.

hostname: testing-hostname-4
manage_etc_hosts: true

users:
  - name: pirate
    gecos: "Hypriot Pirate"
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    groups: users,docker,video
    plain_text_passwd: hypriot
    lock_passwd: false
    ssh_pwauth: true
    chpasswd: { expire: false }

package_upgrade: false

write_files:
  - content: |
      allow-hotplug wlan0
      iface wlan0 inet dhcp
      wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf
      iface default inet dhcp
    path: /etc/network/interfaces.d/wlan0
  - content: |
      country=BR
      ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
      update_config=1
      network={
      ssid="---"
      psk="---"
      proto=RSN
      key_mgmt=WPA-PSK
      pairwise=CCMP
      auth_alg=OPEN
      }
    path: /etc/wpa_supplicant/wpa_supplicant.conf

runcmd:
  - 'systemctl restart avahi-daemon'
  - 'ifup wlan0'

This works ✅. And by that, I mean: this did set the hostname (I'm not mentioning wifi so far).

I don't understand why. It seems that I've only added user information and wifi configuration, and that somehow makes the hostname work. I just give up investigating the hostname at this point.

third attempt - hostname and wifi

Here I must say: the docs were really, really confusing.

My first reference was https://blog.hypriot.com/post/releasing-HypriotOS-1-11/#flash-with-wi-fi-settings-for-pi-zero-pi-3-pi-4, which recommends using the flash option -u to configure wifi.

Then I go to https://github.com/hypriot/flash and search for "wifi".

First result important for me:

copy an optional config.txt file into the boot partition of the SD image (eg. to enable onboard WiFi)

Ok, so new information, there is a file called config.txt. Should I just use the -u option, or is this config.txt file needed as well?

Second result important for me:

--ssid|-s Set WiFi SSID for this SD image
--password|-p Set WiFI password for this SD image

How does that relate to -u and to config.txt? Should they be combined? Does one override the other?

Third result important for me:

Onboard WiFi
The options --userdata and --bootconf must be used to disable UART and enable onboard WiFi for Raspberry Pi 3 and Pi 0. For external WiFi sticks you do not need to specify the -bootconf option.

Ok, so I'm in a Raspberry Pi 2 Model B, with a wifi stick. So I think I don't need the --bootconf option, and therefore, I don't need a config.txt in this case.

There is an example (that I assume because the previous text is for Pi 3 and Pi 0, actually needs the --bootconf option):

flash --userdata sample/wlan-user-data.yaml --bootconf sample/no-uart-config.txt hypriotos-rpi-v1.12.0.img

Ok, so I think I should use this example and just omit the --bootconf option. I go to the sample folder and there is no wlan-user-data.yaml file there. There is a wifi-user-data.yml file, and (another guess) I think it is the right sample. The file has a really confusing comment:

The current version of cloud-init in the Hypriot rpi-64 is 0.7.9

Hypriot rpi-64? What is this? Since I'm using a Raspberry Pi 2 Model B (a 32-bit CPU), I think this does not apply to me. Then I check https://blog.hypriot.com/downloads/ to see if I've downloaded the correct version. There I find nothing mentioning either "32 bits" or "64 bits". Not knowing what to do with this, my best bet is to ignore this information.

So I copy the wifi-user-data.yml file, set in it my wifi SSID, wifi password and country code.

Steps:

  • flash -f --userdata ./cloud-init.yml ~/cluster-images/hypriotos-rpi-v1.12.1.img
  • this time I don't connect the Pi to my Ethernet cable, because I'm trying to use only the wifi
  • access the Pi via SSH with the configured hostname: ssh pirate@testing-wifi.local, and password hypriot.

This is essentially what I've done in the last way that I've described trying to set the hostname, above. And this does set the hostname but does not configure my wifi.

This does not work ❌.

At this point, since I no longer have the Ethernet cable plugged in, I plug a display and a keyboard into my Raspberry Pi 2 Model B, and with ifconfig I confirm that I don't have a wifi connection.

In the startup scripts, I see something very intriguing: Cloud-init v 18.3 running 'init-local'.

Ok. Up to this point, I've seen several mentions of the cloud-init version in use being 0.7.9. Here are some:

  • https://blog.hypriot.com/post/cloud-init-cloud-on-hypriot-x64/ (which is also linked in the flash README)

    It should be noted, that at this time, the cloud-init version available for Debian distribution is 0.7.9,

  • https://github.com/hypriot/flash

    Please have a look at the sample folder, our guest blog post Bootstrapping a Cloud with Cloud-Init and HypriotOS or at the cloud-init documentation how to do more things like using SSH keys, running additional commands, etc.
    This links to the previous item (so reinforces the idea that the version is 0.7.9, and links to the 0.7.9 documentation).

  • the flash script itself, as I currently write this (flash/flash, line 61 in aa8107c):

    See http://cloudinit.readthedocs.io/en/0.7.9/ for more details.

  • the README at the samples folder: https://github.com/hypriot/flash/blob/master/sample/README.md

    Beginning with HypriotOS 1.7.0 we have switched to cloud-init

  • besides all that, 3 of the 5 samples contain the following comment:

    The current version of cloud-init in the Hypriot rpi-64 is 0.7.9
    When dealing with cloud-init, it is SUPER important to know the version
    I have wasted many hours creating servers to find out the module I was trying to use wasn't in the cloud-init version I had
    Documentation: http://cloudinit.readthedocs.io/en/0.7.9/index.html
    To be honest, the "When dealing with cloud-init, it is SUPER important to know the version" comment seems very ironic in my situation.

So I start looking for places that would mention the cloud-init 18.3:

This was very confusing. I wonder whether this change from 0.7.9 to 18.3 could explain some of my problems (since I could be using the configuration for 0.7.9 and actually running on 18.3).
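To make the version question concrete, here is a minimal sketch that pulls the cloud-init version out of a boot log line like the one quoted above, so it can be compared with the version the samples were written for. The function name is hypothetical, and the pattern only assumes the "Cloud-init v 18.3 running ..." shape seen on screen (it also accepts "v." with a dot).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read boot log lines on stdin and print the cloud-init
# version found in lines shaped like:
#   Cloud-init v 18.3 running 'init-local'
# Both "v 18.3" and "v. 18.3" are accepted. Lines without a version print nothing.
cloud_init_version_from_log() {
  sed -n -E 's/.*[Cc]loud-init v\.? ?([0-9]+(\.[0-9]+)*).*/\1/p'
}
```

For example, piping the line from the boot screen into `cloud_init_version_from_log` gives the string to compare against the 0.7.9 mentioned in the samples. On a booted system, `cloud-init --version` should report the same information directly.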

At this point, I don't know how else I can try to make my Raspberry Pi 2 Model B work with HypriotOS.

with the Raspberry Pi Zero

What could I do here? Spend several hours flashing images just to try to set the hostname? No, I skipped right to the wifi configuration.

Following the instructions on https://blog.hypriot.com/faq/#how-can-i-boot-a-raspberry-pi-zero:

flash --userdata wifi.yaml hypriotos-rpi-v1.10.0.img.zip

Using the exact config in that example (of course filling in my SSID, password, and country code), I've got on the boot: "Failed to start Raise network interfaces. See 'systemctl status networking.service' for details".

This caused my wifi not to work.

The following image (sorry to show pictures instead of text, but it was on a Pi Zero without any network connectivity) shows the specific information about the error, which seems to relate somehow to an eth0 interface (in a Pi Zero, which shouldn't have it in the first place).

1

More information about the error:

2

Checking my interfaces gets me this (notice that there is an eth0 interface, although I have no idea why):

3

Here another inconsistency in the doc emerges:

Long story short, whether or not I configured the UART setting, wifi didn't work on my Raspberry Pi Zero, with the error shown in the picture above.
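Since the details above only exist as pictures, a small sketch of how the same information could be captured as text on a device without network access may help. The function name and output path are hypothetical; the two log paths are the usual cloud-init locations and are an assumption for HypriotOS. Commands or logs that are not present are simply skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gather network and cloud-init diagnostics into one text
# file. Every step is optional, so it also runs on systems that lack
# systemctl, ifconfig, or the cloud-init logs.
collect_diagnostics() {
  local out="$1" log
  : > "$out" || return 1          # create or truncate the output file
  if command -v systemctl >/dev/null 2>&1; then
    echo "== systemctl status networking.service ==" >> "$out"
    systemctl status networking.service --no-pager >> "$out" 2>&1 || true
  fi
  if command -v ifconfig >/dev/null 2>&1; then
    echo "== ifconfig -a ==" >> "$out"
    ifconfig -a >> "$out" 2>&1 || true
  fi
  # Assumed default cloud-init log locations.
  for log in /var/log/cloud-init.log /var/log/cloud-init-output.log; do
    if [ -r "$log" ]; then
      echo "== tail of $log ==" >> "$out"
      tail -n 50 "$log" >> "$out"
    fi
  done
  return 0
}
```

Writing the file somewhere on the boot partition (for example `collect_diagnostics /boot/diagnostics.txt`, run as root) would let it be read later by putting the SD card back into the Mac; that path is only a suggestion.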


other observations about the docs

I'll just list some other points where I've found the docs confusing.

Hey @rafaeleyng, thank you very much for your detailed report. You really tried hard, and the time you put into getting it to work and writing this feedback is extraordinary.

I haven't read all scenarios yet as I want to give you some quick tips that might help. But be assured I'll read all and reply with more comments later.

So, the basic tests worked fine, but as soon as you created your own cloud-init.yaml it started to fail.

One of the nasty things about cloud-init is that this yaml file must contain a first line containing

#cloud-config

Thanks to your feedback I realise that this should be made more visible in our docs.
If that line is missing, cloud-init does not recognise this file and then, of course, none of the configurations are applied and won't work.

Maybe this little piece is the only thing that unblocks you.

Here is a link to the cloud-init FAQ https://cloudinit.readthedocs.io/en/latest/topics/faq.html#how-can-i-debug-my-user-data

Two of the most common issues with user data, that also happens to be cloud-config is:

  1. Incorrectly formatted YAML
  2. First line does not contain #cloud-config

It's hard to add a yaml check into flash bash script, but at least we could check if the first line has the comment.

I just found out that on macOS we already validate the YAML file. So the missing comment becomes the number one problem. I've opened #178 to add a check for the comment in the first line of the user-data file.
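A check like the one described could look roughly like the sketch below. This is not the code from #178, just an illustration with a hypothetical function name. It is deliberately strict: the first line must be exactly `#cloud-config`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the actual flash implementation): succeed only if
# the first line of the given user-data file is exactly "#cloud-config".
has_cloud_config_header() {
  local file="$1" first_line=""
  # A missing or unreadable file counts as a failure.
  [ -r "$file" ] || return 1
  # Read only the first line; tolerate a file with no trailing newline.
  IFS= read -r first_line < "$file" || [ -n "$first_line" ] || return 1
  [ "$first_line" = "#cloud-config" ]
}
```

It could be run before flashing, for example `has_cloud_config_header ./cloud-init.yml || echo "first line must be #cloud-config" >&2`.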

Hello, I upgraded my Mac to Catalina and updated flash to its latest version, and I have the exact same problem described in this issue with my Raspberry Pi 3 and 4, even with the #cloud-config comment. Before that, everything was working well: I had used the cloud-init config and flash several times without any issue to report.

Hey @StefanScherer, thanks a lot for the quick response. Let's work together and fix this 😄 .

I'll just like to point out that, even if we manage to solve this problem, the main issue I see here is the obscurity/lack of consistency of the docs. If the docs were clearer, I might have ended with the same problem, but I sure wouldn't have to try a hundred combinations of factors and play the guessing game. And I think my original post helps to point out several of these inconsistencies.


One of the nasty things about cloud-init is that this yaml file must contain a first line containing

I'd take a step back here.

As somebody who doesn't know about cloud-init (just found out that it existed 2 days ago), this is one of the most obscure parts of all of this. Which file are we even referring to? userdata or metadata? What is the difference between them? What should I put in which one of them? All of this is very implicit, especially in the cloud-init docs (which do not include comprehensive documentation of both files, what all the valid configurations are, etc).

image

None of these menus actually explains what it is. The first menu, "User-Data Formats", talks a lot about file formats and shows some Python code, and I can't see the relationship between all of that and the yaml file we are talking about here.

The second menu "Cloud config examples" seems better, it does look like the yaml we are talking about (I still don't see the relationship of this with the User-Data Formats). And then it follows with a long list of examples that should supposedly just work, and only benefit people who already know what that actually does.

I'll not dive into the "metadata" part, whose documentation is even more obscure and confusing. But my point remains.

Let's talk about the issue at hand now.


#cloud-config

So, unfortunately, that was not my problem. I removed the comments in my examples shown in the original post, but in most of them (not all) I've actually copied examples from:

in which they all have the #cloud-config comment.


To be 100% sure, I have just tested it again:

  • Raspberry Pi 2 Model B (so, no onboard wifi, but using a wifi stick). Still not sure whether this makes any difference (see my main post for all the confusion about needing an extra step for devices with onboard wifi)
  • this exact configuration for --userdata (except for the password):
    #cloud-config
    
    # Set your hostname here, the manage_etc_hosts will update the hosts file entries as well
    hostname: black-pearl
    manage_etc_hosts: true
    
    # You could modify this for your own user information
    users:
      - name: pirate
        gecos: "Hypriot Pirate"
        sudo: ALL=(ALL) NOPASSWD:ALL
        shell: /bin/bash
        groups: users,docker,video
        plain_text_passwd: hypriot
        lock_passwd: false
        ssh_pwauth: true
        chpasswd: { expire: false }
    
    package_upgrade: false
    
    # # WiFi connect to HotSpot
    # # - use `wpa_passphrase SSID PASSWORD` to encrypt the psk
    write_files:
      - content: |
          allow-hotplug wlan0
          iface wlan0 inet dhcp
          wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf
          iface default inet dhcp
        path: /etc/network/interfaces.d/wlan0
      - content: |
          country=BR
          ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
          update_config=1
          network={
          ssid="TP-Link_11FE_5G"
          psk="--------"
          proto=RSN
          key_mgmt=WPA-PSK
          pairwise=CCMP
          auth_alg=OPEN
          }
        path: /etc/wpa_supplicant/wpa_supplicant.conf
    
    # These commands will be ran once on first boot only
    runcmd:
      # Pickup the hostname changes
      - 'systemctl restart avahi-daemon'
    
      # Activate WiFi interface
      - 'ifup wlan0'

Some pictures of what I've got (note that these 3 pictures are from the Raspberry Pi 2 Model B, while in my original post they were all about the Raspberry Pi Zero, I don't have any pictures of the initial tests with the Pi 2 to compare).

Networking service:
service

ifconfig:
ifconfig

wpa supplicant:
wpa

Then I've retested the same configuration with the Pi Zero, and my results were the same as in the original post (including the pictures).

Update: I've tested on the Pi Zero both with and without --bootconf with the enable_uart=0 config (as a reminder, the documentation about that is one of the questions I raise in my original post).
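For reference, the README rule quoted in the original post (onboard WiFi needs both --userdata and --bootconf, an external WiFi stick only needs --userdata) can be written down as a dry-run helper. This is only a sketch of that reading of the docs: the function name and the "onboard"/"stick" labels are made up, and it prints the command instead of running flash.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print (do not execute) a flash command line.
# wifi_kind is "onboard" (e.g. Pi 3, Pi Zero W) or "stick" (external WiFi stick).
flash_wifi_command() {
  local wifi_kind="$1" userdata="$2" image="$3"
  case "$wifi_kind" in
    onboard)
      # Per the README quote: --bootconf is used to disable UART for onboard WiFi.
      echo "flash --userdata $userdata --bootconf sample/no-uart-config.txt $image"
      ;;
    stick)
      echo "flash --userdata $userdata $image"
      ;;
    *)
      echo "unknown wifi kind: $wifi_kind" >&2
      return 1
      ;;
  esac
}
```

So `flash_wifi_command stick ./cloud-init.yml hypriotos-rpi-v1.12.1.img` prints the command without --bootconf, which is the variant tried on the Pi 2 in this thread.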


@YasserAntonio has a really good point. I've had several problems with several applications after upgrading OS X to Catalina.

Might be related: balena-io/etcher#2997

Note: I've also tried to run the flash script with sudo, getting the same results.

More information about notarization:

I just finished trying to set up the Raspberry Pi 2 Model B again, but with the Vagrant setup, in a Linux VM. I was able to write the image to the card and added --userdata with the same configuration that I used in #177 (comment). I got the same results for all the commands.

@rafaeleyng Thanks for your detailed report. Things evolved over time, and the FAQ and docs were adapted by different people as well. With that said, sorry for the inconsistency.

For clarification of your hostname - fourth way section

This works ✅. And by that, I mean: this did set the hostname (I'm not mentioning wifi so far).

I don't understand why. It seems that I've only added user information and wifi configuration, and that somehow makes the hostname work. I just give up investigating the hostname at this point.

When you add a cloud-init.yaml file without a user block like:

# You could modify this for your own user information
users:
  - name: pirate
    gecos: "Hypriot Pirate"
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    groups: users,docker,video
    plain_text_passwd: hypriot
    lock_passwd: false
    ssh_pwauth: true
    chpasswd: { expire: false }

... then no user gets created so you cannot login.

That's why we added a default cloud-init userdata file if no specific userdata file is provided.
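Given that explanation, a quick pre-flight check for the users block might look like the sketch below. The function name is hypothetical, and this is a plain text match for a top-level `users:` line rather than a real YAML parse, so treat it as a rough guard only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the user-data file has a top-level "users:"
# key. Indented or commented-out "users:" lines do not count.
has_users_block() {
  local file="$1"
  [ -r "$file" ] || return 1
  grep -q '^users:' "$file"
}
```

For example, `has_users_block ./cloud-init.yml || echo "warning: no users block, nobody will be able to log in" >&2` would have flagged the files from the second and third hostname attempts.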

I have also struggled with the cloud-init docs in the beginning but if they are insufficient we should maybe open an issue or even better a PR there?

Hey @firecyberice. I ended up giving up on Hypriot some time after this issue was created. Since I wasn't able to make things work, I'm not comfortable making a PR (I don't understand the problem and the solution, so I can't fix it).

Hello guys, just an easy workaround: when flashing, don't pass cloud-init.yml to flash with the --userdata option. Just use flash to copy the Hypriot image onto the SD card. After flashing, remount the SD card, copy your cloud-init.yml to the root of the SD card, and rename it "user-data" (it must be named this way, without an extension). It will then work at boot.
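The copy-and-rename step of this workaround can be sketched as a small function. The function name is hypothetical, and the mount point of the SD card's boot partition is something you have to look up on your own machine; nothing here is taken from the flash script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a cloud-init file onto the mounted boot partition
# under the exact name "user-data" (no extension), as the workaround describes.
install_user_data() {
  local src="$1" boot_dir="$2"
  [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
  [ -d "$boot_dir" ] || { echo "boot partition not mounted at: $boot_dir" >&2; return 1; }
  cp "$src" "$boot_dir/user-data"
}
```

Usage would be something like `install_user_data ./cloud-init.yml /Volumes/HypriotOS`, where `/Volumes/HypriotOS` is only an assumed macOS mount point for the card.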

@rafaeleyng I feel you
Was just about to run into the same problems... half a year later. Docs as inconsistent as ever.
What you describe matches what I experienced so closely that I'm starting to believe that this is the destiny of the silent majority of first-time users.
Well, and opening PRs... You would have to have at least some moments of success to feel motivated enough.

I did the following steps to change the hostname after running Hypriot 1.12.3 for the first time, and it worked for me!

  1. Change the hostname value in /boot/user-data.

  2. Type the following commands:

    • sudo cloud-init clean
    • sudo cloud-init clean --logs
    • sudo cloud-init clean --reboot

After this the hostname should have changed.
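Step 1 can also be done non-interactively. The sketch below rewrites the top-level `hostname:` line of a user-data file and keeps a backup. The function name is hypothetical, and it assumes the new name contains only characters valid in a hostname (no `/` or `&`, which would confuse sed).

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the top-level "hostname:" line of a user-data
# file in place, leaving the original next to it as <file>.bak.
set_user_data_hostname() {
  local file="$1" new_name="$2"
  [ -n "$new_name" ] || { echo "new hostname must not be empty" >&2; return 1; }
  grep -q '^hostname:' "$file" || { echo "no hostname: line in $file" >&2; return 1; }
  # -i.bak works with both GNU sed (Linux) and BSD sed (macOS).
  sed -i.bak "s/^hostname:.*/hostname: ${new_name}/" "$file"
}
```

On the Pi this would be run as root against /boot/user-data, followed by the `cloud-init clean` commands listed in step 2.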

I made wifi work! read this https://kerneldriver.wordpress.com/2012/10/21/configuring-wpa2-using-wpa_supplicant-on-the-raspberry-pi/

From what I can read of this github issue, and the resultant conversation, nothing has changed, and the wifi part is only 1 very small part of the whole issue.
I too have NOT been able to change the username or the whole thing breaks again, and I have reflashed, and tried many different methods to get the default user to be something else.
ANY CHANGES that I make to the initial user-config break all ability to login, even though I have not removed or added ANY spaces whatsoever.

I did the following steps to change the hostname after running Hypriot 1.12.3 for the first time, and it worked for me!

  1. Change the hostname value in /boot/user-data.

  2. Type the following commands:

    • sudo cloud-init clean
    • sudo cloud-init clean --logs
    • sudo cloud-init clean --reboot

After this the hostname should have changed.

I have three pi's running Hypriotos, and am testing this RIGHT now.
Boot process is taking a bit longer.
Well, it did not undo my naming schema, however, I will have to re-reflash them again to know. For now, I will do it on the master only.
Can confirm that this whole issue, and the documentation still suck.