Doorbot 2 failed
Opened this issue · 31 comments
Doorbot 2 failed yesterday morning; when I got in it was repeatedly rebooting. It seems like the SD card holder on the Pi was broken. Ultimately I just replaced the Pi with another Pi3 and that worked. We still need to reassign its IP address so that it's what we expect for doorbot2.
I tried setting up a Pi4 as a replacement. There were various problems. i2c is not enabled by default on the Pi; this needs to be enabled through raspi-config, although it's quite possible that this can be done by the Imager software too these days. I then needed to recompile the C program that we use for reading RFID cards so that it would run on the 64 bit Pi OS. This appeared to work but then was reading the first part of the fob's ID as zeroes. I couldn't get past this in the time I had, so that was when I switched to a Pi3. Fobs with shorter IDs were all zeros; here are some examples with the actual identifying parts replaced by "na":
- 00000000nanana
- 00000000
So the doorbot is working. It could do with a new power supply with a longer cable, but apart from that it is probably fine. It might be a plan to either have spare Pi 3 devices or figure out how to get it working on a Pi 4 or other hardware (and then have some spares of those!)
FWIW you don't need your C programs that read/write directly to memory as root anymore, you can just use the `gpioset` program from `libgpiod`, the kernel handles all of this now. As for enabling i2c, you just need to mess with the config.txt, but in my NixOS based images I just have a function which generates a Pi image with all of this sorted out.
`gpioset --mode=time --sec=10 gpiochip0 25=1` will open the door by triggering the relay, holding line 25 high for the `--sec` duration (10 seconds as written).
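If the existing Ruby code were to call `gpioset` rather than write to GPIO itself, a minimal sketch might look like the following. It assumes the libgpiod 1.x option syntax shown in the command above (newer libgpiod releases changed `gpioset`'s options, so check `gpioset --help` on the target image). The method names `gpioset_args` and `open_door` are hypothetical, not part of any existing doorbot code.

```ruby
# Build the argument list for the libgpiod 1.x gpioset call quoted above.
# The chip name and line number come from that command; the helper names
# are illustrative only.
def gpioset_args(seconds:, chip: "gpiochip0", line: 25)
  unless seconds.is_a?(Integer) && seconds.positive?
    raise ArgumentError, "seconds must be a positive integer"
  end

  ["gpioset", "--mode=time", "--sec=#{seconds}", chip, "#{line}=1"]
end

# Hold the relay line high for `seconds`, then let gpioset exit.
# Returns true on success, false on a non-zero exit status, and nil if
# gpioset could not be executed at all (for example, not installed).
def open_door(seconds:)
  system(*gpioset_args(seconds: seconds))
end
```

Passing the arguments to `system` as separate strings avoids going through a shell, so nothing in the argument list is interpreted as shell syntax.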
And there's a much better rfid-reader C program that does not depend on bcm2835 specifics like yours, here: https://gist.github.com/guyzmo/10118958
https://github.com/MatthewCroughan/raspberrypi-nixos-example/
I'm happy to make this with you, if you're in, but as I've already tried to interact with yourself and @amcewen multiple times with no response on the issue of maintaining the doorbots, I think it's more likely that you are both still uninterested in real maintenance and refactor.
I'll bring this up at futuregazing.
Worth noting that if you use mainline Linux, rather than the Raspberry Pi Foundation's Linux, you don't need to mess with config.txt.
@MatthewCroughan I don't quite know what "real maintenance" is. @johnmckerrell and I have been the only people I've seen do any actual work to fix any issues we've had with the doorbots of late.
Your role seems to be to pop up and tell us that we're doing it wrong; and then pitch how a different software system would have prevented the problem, despite this issue and all the ones we've had of late being hardware issues.
I am uninterested in replacing the software system that's understood and has decades of run time with one that won't solve the maintenance issues we have and is likely (because it's a new system) to have bugs that we'll need to work out.
I'm also tired of having to explain this each time there's an issue with the doorbots.
It's good to know that you have zero interest in learning about the code I've created and tested. As long as that has been made clear, I have no issue ignoring future issues and trying to get the attention of two directors that have no interest in change or contribution to a critical piece of infrastructure.
I tried to get your attention, as well as John's. John murmured about some google docs functionality that wasn't fully explained in detail, then he went home to do his paying job, and you never really interacted, as you're busy with your own urgent work.
I will bring it up at futuregazing, since I think it's fairly easy to make the case that you are both in denial about your ability to maintain this system over time.
Neither of your attitudes inspires contribution.
You won't receive another comment from me about any issues relating to the doorbot, or any infrastructure in DoES for that matter. Let it continue to rot.
> @johnmckerrell and I have been the only people I've seen do any actual work to fix any issues we've had with the doorbots of late.
Can I further just point out how condescending that statement is, when @pkharvey and I have spent hours each time these issues crop up since 2020, documenting circuit misconfiguration in the entrance doorbot and then sending it in private emails because you didn't want details to leak because you're not confident enough in your own system to have anything be public? And how I have been the one to document and understand doorbot issues since 2020?
In fact, the last time the doorbot died, I directly replaced it with one of my own Pi3s, reverse engineered it, and re-implemented its functionality in a 3 line shell script that uses modern kernel APIs instead of 10 year old ones. It takes a special kind of gatekeeping that you seem to be very good at, to make the argument that only you and John are the ones doing work.
> I am uninterested in replacing the software system that's understood and has decades of run time with one that won't solve the maintenance issues we have and is likely (because it's a new system) to have bugs that we'll need to work out
Yes, because the current bugs that we know of now, and that you are not interested in fixing by using a new system, are so desirable to keep.
There are tons of pie in the sky issues about things you will never achieve unless you actually spend some time on it. And that's what I mean by "real maintenance", which you are both incapable of because you don't have any time to work on it. And whenever somebody else like myself comes to offer their time, expertise and life-force to work on it, you become a gatekeeper, making excuses about why we can't update anything.
If you think you can do this all on your own, then go ahead.
Your gatekeeping and preciousness over your decade old code got me thinking about a quote from Grace Hopper's lecture:
> I think the saddest phrase I ever hear in a computer installation is that horrible one: "but we've always done it that way".
@johnmckerrell I found the Pi3 I contributed to the doorbot last time on your desk. I re-soldered the SD card cage back on and tested it working again, I'll take the Pi3 back home.
@MatthewCroughan, from Joel Spolsky's "Things You Should Never Do, Part I":
> The single worst strategic mistake that any software company can make is to rewrite the code from scratch.
As you admit, your "3 line shell script" missed "some google docs functionality", and perhaps other things?
That said, you have been offering to help for years so yes in that time we should have been able to find some way to let you, perhaps by breaking down the problem or giving more documentation of the requirements. Your suggestions in your original comment were just what I was thinking of asking you about and are appreciated.
We don't currently use C to open the door; we simply write to `/sys/class/gpio/gpio25/value` from the Ruby code. Yes, we could switch to gpioset, which uses setuid to gain the permissions I think?
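For readers unfamiliar with the sysfs approach mentioned here, a minimal sketch of what such a write can look like is below. This is not the actual logcards code. It assumes the GPIO line has already been exported and configured as an output, that the relay is active-high (as the `25=1` in the earlier `gpioset` command suggests), and the method name `pulse_relay` and the hold time are illustrative.

```ruby
# Path taken from the comment above; everything else here is illustrative.
GPIO_VALUE_PATH = "/sys/class/gpio/gpio25/value"

# Drive the line high, wait, then always drive it low again so the door
# is not left unlocked if the sleep is interrupted.
def pulse_relay(path: GPIO_VALUE_PATH, hold_seconds: 1)
  File.write(path, "1")
  begin
    sleep hold_seconds
  ensure
    File.write(path, "0")
  end
end
```

Taking the path as a parameter means the same method can be exercised against an ordinary temporary file on a development machine, without a Pi attached.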
I honestly thought there would be a simple library we would be able to use for the RFID scanning these days but if it's just a different arbitrary C file from the internet then fair enough! It would be good to confirm that works both on a Pi3 and also on a Pi4 (and whether the compiled binary will work fine on both or if we need one per device) and then we can switch over and have some spare hardware ready to go in case of future problems.
I'm glad you managed to get your Pi fixed and working again, holding the SD card in place didn't work so I feared it was more than just that.
@johnmckerrell Your quote assumes your code is so valuable that it's worth keeping, and not chock full of bugs already. DoES isn't a software company either, so it's misplaced IMO.
> We don't currently use C to open the door
So you're not using https://github.com/museuminabox/simple_sl030_rfid_reader from https://github.com/DoESLiverpool/doorbot-setup/blob/894027a321c957f05b39f48ad0813059f66b2867/build-rcapp.yml#L12 ?
The way that works is by using `bcm2835.h` from http://www.airspayce.com/mikem/bcm2835 which has the following C code:
```c
int bcm2835_i2c_begin(void)
{
    uint16_t cdiv;

    if (bcm2835_bsc0 == MAP_FAILED
        || bcm2835_bsc1 == MAP_FAILED)
        return 0; /* bcm2835_init() failed, or not root */

#ifdef I2C_V1
    volatile uint32_t* paddr = bcm2835_bsc0 + BCM2835_BSC_DIV/4;
    /* Set the I2C/BSC0 pins to the Alt 0 function to enable I2C access on them */
    bcm2835_gpio_fsel(RPI_GPIO_P1_03, BCM2835_GPIO_FSEL_ALT0); /* SDA */
    bcm2835_gpio_fsel(RPI_GPIO_P1_05, BCM2835_GPIO_FSEL_ALT0); /* SCL */
    /* ... the excerpt stops here ... */
```
This doesn't co-operate with the kernel at all, and crashes horribly if 2 things use i2c at once. It also doesn't expect the device-tree to be set up, which on modern systems, it will be. Nowadays in Linux, you don't need to do anything like this, but 10 years ago, you probably did.
> It would be good to confirm that works both on a Pi3 and also on a Pi4
Yup, already did last time the doorbot was down and made a basic NixOS image for pi3/4 that re-implemented that functionality.
> simply writing to /sys/class/gpio/gpio25/value
`/sys/class/gpio` has been deprecated in the Linux kernel since 2015, and on newer releases of whatever distro you use you probably won't see it for much longer. Now you use a character device at `/dev/gpiochipX` and something like `libgpiod`'s `gpioset` command to interface with it.
FWIW when I reverse-engineered the doorbot2 wiring, I figured out that both your code and mine have to poll in a `while true` loop, since the SL030 is on an old version of the firmware and not wired up properly, so we can't make use of its "falling-edge" feature to detect when a tag has been presented. I read the manual here, and they don't host the firmware file online anymore from what I can find: http://www.stronglink-rfid.com/download/SL030_User_Manual_V3.1.pdf
From my logs, the SL030 returned this
Length: 12
Command: 0xF0
status is 0
Status: Operation succeed
Data: 53;4C;30;33;30;2D;33;2E;35;
Which looks like this value from the manual
Which is..
```js
// Buffer.from replaces the deprecated Buffer(...) call form
Buffer.from("534c3033302d332e35", "hex").toString("ascii")
// => 'SL030-3.5'
```
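The same decoding can be done straight from the `Data:` line of the log, without first stripping the separators by hand. The sketch below assumes the semicolon-separated hex format shown in the log above; the method name `decode_sl030_data` is hypothetical.

```ruby
# Turn a "53;4C;30;..." field from the reader log into an ASCII string.
# Empty fragments (such as the one after a trailing ";") are ignored.
def decode_sl030_data(field)
  bytes = field.split(";").reject(&:empty?).map { |hex| hex.to_i(16) }
  bytes.pack("C*").force_encoding("US-ASCII")
end

puts decode_sl030_data("53;4C;30;33;30;2D;33;2E;35;") # prints SL030-3.5
```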
I recall now, there is no V3.5 manual hosted by stronglink-rfid to specify what changed. The firmware version is therefore undocumented, and I couldn't get falling-edge working last time, so I assumed either misconfiguration in the wiring, or that v3.5 of the firmware screwed something up, and I didn't want to modify the firmware version on the SL030 in case I damaged the doorbot.
I disagree that just because we're not a software company means that the quote doesn't have value.
> We don't currently use C to open the door
There I was specifically talking about the door opening action, yes we definitely use C to read the RFIDs
> Yup, already did last time the doorbot was down and made a basic NixOS image for pi3/4 that re-implemented that functionality.
Awesome!
> means that the quote doesn't have value.
I think the quote has value, yes, I just don't think it's being applied in the right context. I would expect it to apply to big, monolithic (usually corporate) software that has been around for a long time, where tons of bugs have been solved, and that is not worth re-implementing due to these solved bugs. Or something like FreeCAD/OpenCascade which has decades of use-cases encoded in it. Not this minimal code which reads a value from a sensor and makes a pin go high.
doorbot2 has been running my NixOS based setup for the past 10 hours and admitted everyone during the futuregazing setup, which shows that it works and is worth looking at by either of you at some point.
When the existing setup fails, we can fall back on this as a known working system, or not.
I've reverted it back to your normal doorbot, all I did was swap the SD cards.
I'm not seeing the ability to log visits locally, or log visits by hot deskers through the Google Form, so this is not currently a drop-in replacement. Note that the logging of hot deskers to Google also has a local cache in case of network downtime, and it only occurs on certain doorbots. Fortunately this wouldn't have been a problem yesterday as we only log these during weekdays, but please do not swap the hardware out without asking again (unless you did talk to someone offline of course).
It's still very interesting though. Can you confirm what we would do when we've added fobs, I guess we have to update the hashes in flake.lock, is that manual or does nix do that for you? Then you run the deploy command mentioned in the readme? Do you have to commit changes to git first or is having them locally enough? Would be good to have that documented in a bit more detail ultimately.
> I'm not seeing the ability to log visits locally, or log visits by hot deskers through the Google Form
I know. Where is this code currently implemented? I couldn't find it in the existing deployment. I intend to sit down with you, whenever you're free, and implement this. FWIW, is it possible we could use something that isn't Google? This is an example of how the refactoring/rewriting could allow us to become less dependent on third parties, as outlined in #18.
Obviously I will use Google if there is no other option, but I think this is a good first step.
> Can you confirm what we would do when we've added fobs, I guess we have to update the hashes in flake.lock
That's one approach, and works fine, yes. You can just run `nix flake lock --update-input doorbots-config` and this will update the `flake.lock` to refer to the new revision of the `doorbots-config` repo, and then you can deploy.
But that is only because the Nix code does not reside in the same repository as the `_config.yml` database; otherwise you wouldn't need to update the input(s), as the file would reside in the same git repository, with no need to fetch it from anywhere or update a lock.
In addition, you can have a public GitHub repo, rather than a private one, if you just encrypt/decrypt the `_config.yml` on the fly. I use yubikeys/fido2 to perform encryption/decryption of my secrets, then the ssh host keys on the remote machine are capable of decrypting those secrets anywhere in the filesystem via a nice thing called `age`. `age` re-uses SSH keys for encryption/decryption, meaning all the existing SSH infrastructure can be used to encrypt/decrypt secrets. This is what I do for all of my own secrets in my personal NixOS config. You can see that here: https://github.com/MatthewCroughan/nixcfg/tree/master/secrets. There are many approaches to that, such as using `sops`, etc; it's a matter of personal taste.
My understanding around #18 is that it is less about "DoES should host its own cloud infrastructure" and more "DoES should be the owners of the accounts for its own cloud infrastructure", if that makes sense? i.e. the post talks about various individuals owning various accounts, which is not ideal when someone goes on holiday, etc. So using Google is fine (to a degree) so long as the accounts are owned by DoES, as is currently the case. Our plan for replacing this would really be https://github.com/doesliverpool/optimism but it's a way away from being able to do that yet.
The software that runs on the Pi currently is https://github.com/DoESLiverpool/logcards and you'll note it does more intensive parsing of the YAML file, as sometimes people have multiple fobs and so one card ID can reference another.
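To illustrate the "one card ID can reference another" requirement for anyone re-implementing it, here is a minimal sketch. The YAML layout, the `alias_of` key, and the card IDs are all hypothetical: the real `_config.yml` schema is not shown in this thread, so treat this as an illustration of the idea rather than a description of what logcards does.

```ruby
require "yaml"

# Hypothetical layout: a second fob points at the entry for the first.
SAMPLE_CONFIG = <<~YAML
  "04a1b2c3":
    name: Example Member
  "04d4e5f6":
    alias_of: "04a1b2c3"
YAML

# Follow alias_of links until an entry with real details is found.
# Returns nil for unknown cards and for alias loops.
def resolve_card(cards, card_id, seen = [])
  entry = cards[card_id]
  return nil if entry.nil? || seen.include?(card_id)

  target = entry["alias_of"]
  target ? resolve_card(cards, target, seen + [card_id]) : entry
end

cards = YAML.safe_load(SAMPLE_CONFIG)
puts resolve_card(cards, "04d4e5f6")["name"] # prints Example Member
```

Tracking the IDs already visited keeps a mistyped config with two fobs pointing at each other from recursing forever.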
For completeness the current method to provision a Pi is using Ansible with this repo: https://github.com/DoESLiverpool/doorbot-setup
It is disappointing to me on a personal level that DoES doesn't want to own the infrastructure, only the accounts. But I'll work with that, and still re-implement the missing functionality with Google.
FWIW, I did study this repo twice in the last 8 months in order to re-implement everything in Nix. But I didn't spot anything relating to Google, can you please point it out more clearly, instead of just linking to the doorbot-setup repo? https://github.com/search?q=repo%3ADoESLiverpool%2Fdoorbot-setup%20google&type=code
> Our plan for replacing this would really be https://github.com/doesliverpool/optimism but it's a way away from being able to do that yet.
That's our plan for having more control over this and would be "hosted" ourselves (on a cloud hosting provider most likely).
The Google Spreadsheet part is not in the setup repo, it's in the one I mentioned a few lines above - the log cards repo, this line specifically:
> (on a cloud hosting provider)
So you still won't own it.
> Our plan for replacing this would really be https://github.com/doesliverpool/optimism but it's a way away from being able to do that yet.
I don't see a clear plan for getting optimism done. The last time it received a commit was 2 years ago, like most of the issues on the issue tracker regarding the doorbot. But here you have someone (me) interested in getting things done quickly, and you keep placing made-up blocker after made-up blocker in the way. Enough is enough. Let's get something done.
What precisely is the purpose of posting to Google docs? I'm interested in throwing 300 lines of Ruby away, and replacing it with 10-20 lines of something else.
My understanding based on what you've written so far is that all your code does is:
- Log hotdeskers who enter the space
- During weekdays only
- Has a local cache in case of network downtime
- Only occurs on certain doorbots
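The first three requirements in the list above could be sketched roughly as follows. This is not how logcards implements them: the class name `HotdeskLog`, the spool-file format, and the `submit` callable are assumptions made for illustration, and the "only on certain doorbots" rule is left to whoever decides whether to construct the logger at all.

```ruby
require "date"
require "json"

# Weekday-only hot desk logging with a local spool file, so entries made
# while the network is down are retried on the next successful flush.
class HotdeskLog
  # submit: any callable that takes an entry hash and returns a truthy
  # value on success (hypothetical interface).
  def initialize(spool_path:, submit:)
    @spool_path = spool_path
    @submit = submit
  end

  # Returns false if the visit was skipped (weekend), true if it was
  # accepted (delivered now, or queued for later).
  def record(name, on: Date.today)
    return false if on.saturday? || on.sunday?

    File.open(@spool_path, "a") do |file|
      file.puts(JSON.generate("name" => name, "date" => on.iso8601))
    end
    flush
    true
  end

  # Deliver queued entries in order, stopping at the first failure and
  # keeping everything from that point on in the spool.
  def flush
    return unless File.exist?(@spool_path)

    pending = File.readlines(@spool_path, chomp: true).reject(&:empty?)
    remaining = pending.drop_while { |line| deliver(JSON.parse(line)) }
    File.write(@spool_path, remaining.map { |line| "#{line}\n" }.join)
  end

  private

  def deliver(entry)
    @submit.call(entry)
  rescue StandardError
    false
  end
end
```

Writing to the spool before attempting delivery means a crash between the two steps leaves the entry queued rather than lost, at the cost of a possible duplicate submission.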
My proposal for making this less complicated is to use an MQTT broker and hash the doorbot IDs, so it's not public who entered, just that a hotdesker entered. When a hotdesker enters, it submits this hash to the broker. Now you can write a tiny script to listen to this broker, and do what you want with that information, which includes parsing it and placing an entry on google docs. This also means we aren't tied to any specific google docs functionality or API via HTTP, and things can be swapped around in the future.
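A minimal sketch of the publishing side of this proposal is below, assuming the IDs being hashed are fob IDs. It swaps the bare hash for a keyed hash (HMAC-SHA256), because fob IDs are short enough that an unkeyed digest could be reversed by trying every possible ID. The topic name, payload shape, method names, and the `publisher` callable are all hypothetical; `publisher` stands in for whichever MQTT client is chosen, which keeps the sketch independent of any particular library.

```ruby
require "json"
require "openssl"
require "time"

# Build the event for one hot desker entry. The fob ID itself never
# appears in the payload, only its keyed hash.
def hotdesk_event(fob_id, secret:, at: Time.now)
  digest = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), secret, fob_id)
  { "visitor" => digest, "at" => at.utc.iso8601 }
end

# Hand the JSON-encoded event to the publisher callable, which receives
# (topic, payload) and is expected to send it to the broker.
def publish_hotdesk_event(fob_id, secret:, publisher:, topic: "doorbot/hotdesk")
  publisher.call(topic, JSON.generate(hotdesk_event(fob_id, secret: secret)))
end
```

A listener that holds the same secret can recompute the keyed hash for each known fob to map events back to people, while anyone else on the broker only sees that some hot desker entered.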
I'm happy to write this, implement this, and help you understand anything I write. When are you next in DoES?
Yes, there is a lot of concern trolling going on. That is right. I'm going to try on this issue for about 2 weeks before I need to go to Europe for conferences.
If we cannot make progress, I'm thinking about leaving the space altogether, and avoiding getting nothing for the 20% increase in desk prices.
From Google, concern trolling is defined as:
> the action or practice of disingenuously expressing concern about an issue in order to undermine or derail genuine discussion.
@seanspotatobusiness that's a nice idea but I'm happy that it's not necessary right now.
@MatthewCroughan I have tried to be civil throughout this discussion, there is at least one real, specific, requirement that your solution does not perform. You have been told this before. On this thread I feel that you have only grudgingly listened to my concerns and have been rude and aggressive repeatedly. I will admit that you are not the person who raised the temperature initially but I personally have tried to stick to facts and specific requirements in my messages (and also praise!)
I haven't had chance to reply to yesterday's message for multiple reasons but my preference would be not to transmit the IDs or a hash thereof on a "public" (even on our internal network) MQTT broker. I'm sure we've discussed in the past that I quite liked the idea of event passing within the pi to pass this work to separate processes and use MQTT for more anonymous events if necessary (but this isn't a current requirement).
If you're happy to implement that or something along those lines then that would be great, or of course suggest an alternative (although I've tried not to be too specific other than the "not on a publicly accessible broker").
It would be nice if someone else could also review logcards to give a second opinion and confirm that we wouldn't lose any other useful functionality (I'm happy that anything around Slack, kindles or ringtones is not currently useful, although I think I removed most of that anyway).
Having now spent almost my entire lunch hour thinking about this when I should have been driving to the hospital I'm going to have to leave it there.
> there is at least one real, specific, requirement that your solution does not perform.
Yes, and I've proposed above how this would be done, and asked when you would next be in. But there is no response to that.
If you're not going to be in, then tell me that. If Adrian is not available, then he should tell me that as well.
Tonight is hack the space night according to the calendar. Will I make any progress on this, or will I travel to DoES pointlessly, and make no progress tonight?
https://doesliverpool.com/calendar/
I want to get this done in a reasonable timespan (2 weeks). If that's not possible due to your own lack of time, or others' lack of time, since apparently in this space you need extra special permission, I'd like you to stop dangling a carrot in front of my face, and simply tell me no. I think you have forgotten the past 3 years. This has not come from nowhere.
> my preference would be not to transmit the IDs or a hash thereof on a "public" (even on our internal network) MQTT broker.
MQTT is capable of authentication, the same way HTTP is. Are you happy if authentication is performed? I'm confused otherwise.
The extra fun functionality you reference (logcards?) hasn't been in use since Gostins, which is part of what I want to restore. But can't until you actually give me the permission to hack on this stuff, which is why I've been so frustrated for the past 3 years, being unable to get anything from either you or Adrian on this.
> I imagine you can develop that without my presence
I need you to show me the credentials needed and what the form looks like, then I can figure it out, yes.
There are no credentials needed actually, you simply put the name and the number of days (which is dependent on the time of day) into the URL which was listed in the logcards repo:
https://docs.google.com/forms/d/1eW3ebkEZcoQ7AvsLoZmL5Ju7eQbw8xABXQm3ggPJ-v4/formResponse?entry.1000001=#{URI.escape(name)}&entry.1000002=#{numberOfDays}&entry.1000002.other_option_response=&submit=Submit")
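Anyone porting this should note that `URI.escape` was removed in Ruby 3.0, so the interpolation above will not run unchanged on a current Ruby. A minimal sketch of building the same request URL with `URI.encode_www_form` follows; the field names and form URL are copied from the line above, while the method name `hotdesk_form_uri` is hypothetical.

```ruby
require "uri"

FORM_URL = "https://docs.google.com/forms/d/1eW3ebkEZcoQ7AvsLoZmL5Ju7eQbw8xABXQm3ggPJ-v4/formResponse"

# Build the submission URL for one hot desk visit. encode_www_form
# percent-encodes each value (spaces become "+"), so names containing
# "&", "=" or non-ASCII characters cannot break the query string.
def hotdesk_form_uri(name, number_of_days)
  uri = URI(FORM_URL)
  uri.query = URI.encode_www_form(
    "entry.1000001" => name,
    "entry.1000002" => number_of_days,
    "entry.1000002.other_option_response" => "",
    "submit" => "Submit"
  )
  uri
end

puts hotdesk_form_uri("Ada Lovelace", 1)
```

The resulting URI object can be passed to `Net::HTTP.get_response` from the standard library to make the GET request.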
It seems we're even just using GET and it works. You can see the form here though if you're particularly interested:
I went in for hack the space last night, but nobody was there. So I just got some beer and hung out with @pkharvey
We obviously still need to update the network config to give `doorbot2` the right IP address, but as a workaround because @JackiePease wanted to update the keys, I've updated `/etc/hosts` on `doorbot1` to have the current IP address for `doorbot2`.