Host can write 720k images but not 1.44MB images
hmb001 opened this issue · 35 comments
Problem:
FF reads files perfectly well from a 1.44MB (standard PC format) disk image. When writing a file to the image, the Gotek display shows track changes and indicates that a write is occurring. The system reports no errors during the write. However, if the disk is ejected and then reinserted, no file has been written. The same happens when running format: the Gotek display shows writing to every track and side, and the entire format finishes without error. After the format completes and the disk is ejected and reinserted, the original data files remain on the disk, indicating that nothing was actually written during the format. Incidentally, the problem is the same when using an HFE rather than an IMG image file.
Noted that FF works perfectly fine when reading/writing to a 5.25" DSDD IMG file.
Background info:
- Gotek/FF installed in older PC clone that supports all standard PC drive standards (5.25" DD/HD and 3.5" DD/HD), and known to work with a real 3.5" drive
- only jumper installed is for DS1. System does drive the motor-on line, but I understand Gotek/FF ignores this signal
- latest model Gotek AT32F435
- issue is same with FF v3.41 and 4.6a
- same issue with a wide range of USB flash drives, from very small capacity to a brand new 32GB. Yes, all formatted as FAT32 and can be written to without difficulty by a wide range of PCs and Macs.
Changes I have made in FF.CFG with no effect:
- interface: Shugart vs IBMPC
- pin 02 set to NC, and pin 34 set to RDY (as required on the system it is installed in)
- track change and write drain: have tried both instant and realtime
- index suppression: have tried yes and no
This "seems" to be a simple problem - the FF firmware when emulating a 1.44M floppy is simply unable to perform a write operation to the USB flash drive.
That's an interesting problem, so if you write/format a HD HFE image file, it is not written at all but left pristine?
I would at least expect it to be corrupted I think. It could also be interesting to format such an image and attach it here for inspection.
I will test this myself against Greaseweazle just to confirm I can write a 1m44 IMG or HFE. Tomorrow.
The image remains pristine. That is how I know this appears to be a fundamental problem, ie nothing gets written to the disk image on the USB flash drive.
What makes you certain that the Gotek recognises a write is occurring? Does it flash W in the display?
May also be worth installing logfile firmware and grabbing a FFLOG.TXT. Is the HFE left so pristine that say an md5sum will match before and after?
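The before/after comparison suggested above can be scripted on any machine that can read the USB stick. This is only a minimal sketch: `file_digest` is a hypothetical helper name, and MD5 is used simply because md5sum was mentioned; any hash would do for detecting a change.

```python
import hashlib


def file_digest(path, algo="md5", chunk_size=65536):
    """Return the hex digest of a file, read in chunks to keep memory use low."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def image_unchanged(digest_before, path):
    """True if the image at `path` still hashes to `digest_before`."""
    return file_digest(path) == digest_before
```

Typical use would be to record `file_digest("high_density.hfe")` before inserting the stick into the Gotek, run the format, then call `image_unchanged(...)` with the recorded digest afterwards.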
Yes, the display shows the Gotek is recognizing a write. During a file write, the track indicator changes and the "W" appears as each track is written. During a format, the track and side indicators increment in pace with the screen display, and the "W" flashes as each track gets written to.
The USB flash drive is not left "completely pristine", since FF does write its CFG file to it indicating the most recent file image it mounted.
As I mentioned, read/write of a 360K 5.25" disk image works just fine.
I should also mention that an older pre-AT32F435 Gotek version, running FF version 3.21, worked perfectly fine in the same system.
Yes it makes sense that IMAGE_A.CFG is written. It is interesting whether the HFE image is left bit-for-bit identical.
It would be interesting to see whether even a few bytes managed to get written to the HFE file. That is something I could check - though it is likely to be quite small given the files on the image all still seem to be quite readable after a "format" of the image.
A bit more information:
- the system will format/read/write a 3.5" DD image without problems
- I put a scope on the floppy write data line, which shows a 500K data rate when formatting a 3.5" HD floppy, and a 250K data rate when formatting a 3.5" DD floppy, as expected.
- I ran a format on the unformatted "high_density.hfe" image provided on the FF web site, and it does not appear as though any changes were made to it after running format.
And an additional data point - I hooked up an old Gotek (has the original STM microcontroller) running an older version of the HxC firmware. No problem writing to a 3.5" HD floppy image on that emulator.
I noticed the problem I am having is exactly the same as the one in "file 851" that someone else reported here in Nov/23.
Perhaps you can install the logfile alt firmware and gather FFLOG.TXT after writing to an IMG and after writing to an HFE? Use latest v3.x.
The error would be consistent with the MCU seeing no flux transitions on the write pin for some reason. That would be weird though since it works perfectly at half rate and the rate is hardly fast.
There are two versions of the logfile firmware with no documentation explaining the difference:
FF_Gotek-logfile-3.41.upd
flashfloppy-logfile-3.41.upd
Which one do I install?
Ah yes, the latter. Actually you can copy both and only the appropriate one would be used. But flashfloppy-logfile-3.41.upd is the one for AT32F435.
Attached are two log files:
FFLOG1 was during formatting of the 1m44.img image.
FFLOG2 was during formatting of the high_density.hfe image.
One difference (possibly attributable to the slightly different firmware version?): in my previous attempts the verify pass following format showed no errors, whereas after both of these formats the majority of tracks showed verify errors.
Yes that is interesting as a couple of lines in FFLOG1.TXT do look similar to #851
IMG Bad Wr.Off: -12034
IMG DAM Unknown
The HFE does appear that it should be written to. Lots of this in FFLOG2.TXT:
Write 0-4095 (4096)... 5078 us
Write 4096-8191 (4096)... 5351 us
Write 8192-12287 (4096)... 5380 us
Write 12288-16383 (4096)... 5306 us
Write 16384-20479 (4096)... 5021 us
Write 20480-24575 (4096)... 5301 us
Write 24576-28671 (4096)... 5385 us
Write 28672-32767 (4096)... 15618 us
Write 32768-36863 (4096)... 6932 us
Write 36864-40959 (4096)... 5418 us
Write 40960-45055 (4096)... 5235 us
Write 45056-49151 (4096)... 5224 us
It's hard to fathom, therefore, that the HFE image remains pristine. Can you attach your resulting HFE image?
A quick test of writing a 1.44MB random IMG to a '435 Gotek running v3.41, using both HFE and IMG images, works 100% for me. So I don't know what's happening in your case: some electrical incompatibility, a marginal signal, or maybe even a defective '435 MCU? It's hard to know because of course I have not seen this problem myself.
Your HFE image file may shed some light. Or not. Even if it contains the bad written data it's a bitcell format not a flux format. A logic analyser or scope trace of WDATA could be interesting, actually.
EDIT: Another thing to test: check that the resistance from Gotek pin 22 to +5v is 1kohm. That would confirm the Gotek has the correct pull up on WDATA.
Attached is a copy of the 1m44.img and high_density.hfe files after format.
I could provide a scope tracing of the WDATA signal. As I mentioned previously, its bit rate frequency and timing appear spot on, but happy to have you look at it.
I have this same problem with two new Goteks with Flashfloppy installed, both recently purchased. Yes, I considered the signal might be "marginal" however it would be disappointing if it turns out older model Goteks with older versions of Flashfloppy and HxC are more tolerant of the write signal than the newest Gotek and firmware versions.
Thanks for the files. The IMG is indeed pristine. This is unsurprising if the write stream appears as garbage (for reasons undetermined) to the FF firmware: no sectors decoded, therefore no sectors written to IMG.
The HFE is definitely not pristine, however it is garbage. Loaded into HxC tools I can see bit bands at 1us, 2us, 3us, 4us. The 1us bit band should not be present. Interestingly the visual pattern looks identical across tracks. It's definitely not random garbage.
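For reference, MFM never places two flux pulses in adjacent bitcells and never leaves more than three empty bitcells between pulses, so legal pulse spacings are 2, 3 or 4 bitcells. A small sketch of the resulting bands (the function name is made up for illustration):

```python
def mfm_flux_bands_us(data_rate_kbps):
    """Legal pulse-to-pulse spacings, in microseconds, for an MFM data rate."""
    cell_us = 1000.0 / (2 * data_rate_kbps)  # MFM uses two bitcells per data bit
    return [n * cell_us for n in (2, 3, 4)]


print(mfm_flux_bands_us(500))  # HD: 1us bitcell
print(mfm_flux_bands_us(250))  # DD: 2us bitcell
```

At the HD rate this gives 2us, 3us and 4us, which is why a 1us band in the image points to misdecoded pulses rather than valid data.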
A scope trace to show voltage levels and edge rates could be interesting. Also a logic analyser type longer trace that shows edge timings over a decent time period (the kind of thing you get from those cheap Saleae clones, for example). And please measure the pullup resistance on WDATA at the Gotek (pin 22) to +5v.
I could also supply an alt firmware which logs at least some flux measurements during writes. Either to FFLOG.TXT or to serial line if you have a USB-TTL serial adapter or similar. This would have similar information to a LA trace, but also would tell us what the firmware is actually seeing "coming in" on that IO pin.
One option here could be a slow slew rate plus noise introducing extra fluxes. It would seem weird this would affect HD rate only though, and apparently so deterministically.
Thanks. The falling edges on write data don't look very well locked to 500kHz but perhaps that is partly a lack of precision in the LA. It ought probably to be good enough anyway. I guess we need to find out what the firmware thinks. Can you gather serial traces? Or do you want to gather FFLOG.TXT (rather more restricted in what we can gather as it's limited by a small in-RAM log buffer)?
EDIT: For example, look at the falling edges either side of 518106us. They look to be pretty close to 1.5us rather than 2.0us would you say?
EDIT2: even across a sequence of several pulses, it feels like the clock rate is higher than 500kHz (or 1MHz MFM code rate). Perhaps approaching 10% higher? That would be a long way off for a crystal-clocked disk controller!
I had a closer look at the WDATA trace above. What follows is a fairly technical analysis.
I think the close-together adjacent pulses are two MFM clocks apart, while the far-apart adjacent pulses are four MFM clocks apart. That is, the given trace would decode to bitcells as 1010001010101000101000101. That's 24 MFM clocks in total, and the time between first and last pulses is 24us. So the MFM clock is actually about 1us, as it should be (1MHz MFM clock).
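The clock count can be checked mechanically from the bitcell string. A minimal sketch (the helper name is hypothetical):

```python
def pulse_gaps(bitcells):
    """Spacing, in MFM clocks, between consecutive '1' bitcells (flux pulses)."""
    positions = [i for i, b in enumerate(bitcells) if b == "1"]
    return [b - a for a, b in zip(positions, positions[1:])]


trace = "1010001010101000101000101"
gaps = pulse_gaps(trace)
print(gaps)       # spacing of each adjacent pulse pair, in clocks
print(sum(gaps))  # clocks between the first and last pulse
```

For this string the gaps are 2, 4, 2, 2, 2, 4, 2, 4, 2, which sum to 24 clocks over the 24us span, hence roughly 1us per clock.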
But let's look at the bit timing in the far-apart pulses: fairly solidly 4.5us apart (4.0us + 500ns).
Now look at the train of four close-together pulses: gaps are about 1.75us, 2.0us, 1.75us (that's -250ns, 0ns, -250ns from expected).
Now look at the following singleton close-together pulses (at around +518106us): timing is pretty close to 1.50us (that's -500ns from expected).
So perhaps the problem here is aggressive write precompensation (https://github.com/keirf/greaseweazle/wiki/Write-Precompensation). This is where close together pulses get written even closer together, to counteract the tendency for them to spread apart when superimposed on the magnetic media.
I wonder if this host is writing with 250ns write precompensation? This will push a pulse towards its closer neighbour by 250ns. A single short pulse gets pushed 250ns from each side, shortening it by a total of 500ns.
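This hypothesis can be checked against the trace with a simplified model. The sketch below assumes the rule exactly as described here (each pulse moves 250ns towards whichever neighbour is closer, and stays put if both are equally far); a real controller derives precomp from bit patterns, and the treatment of the first and last pulse (left unshifted) is an assumption made only because their outer neighbours are unknown.

```python
def precomp_gaps_ns(bitcells, cell_ns=1000, precomp_ns=250):
    """Pulse-to-pulse gaps (ns) after shifting each pulse towards its closer neighbour."""
    pos = [i for i, b in enumerate(bitcells) if b == "1"]
    times = []
    for k, p in enumerate(pos):
        t = p * cell_ns
        if 0 < k < len(pos) - 1:  # end pulses: neighbours unknown, leave unshifted
            before = p - pos[k - 1]
            after = pos[k + 1] - p
            if before < after:
                t -= precomp_ns  # closer neighbour is earlier: write early
            elif after < before:
                t += precomp_ns  # closer neighbour is later: write late
        times.append(t)
    return [b - a for a, b in zip(times, times[1:])]


print(precomp_gaps_ns("1010001010101000101000101"))
```

Under these assumptions the model yields 4500ns for the far-apart pairs, 1750/2000/1750ns for the train of four, and 1500ns for the isolated close pair, matching the measurements above.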
Following this analysis, I am happy to furnish you with debug/log firmware if you can gather serial traces. Alternatively I can investigate writing with aggressive precomp from Greaseweazle myself, and see if I can repro your issue. I am confident the issue can be fixed, anyhow.
The issue is that I reset my "bit clock" on every pulse, and count MFM clocks to the following pulse. 1.5us is right on the edge of being considered MFM 11 vs 101. It should be the latter but FF is probably guessing the former. The fix will be to carry over some bit-time credit (or deficit) from the previous pulse gap (say 50%). Easily implemented!
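The idea can be sketched outside the firmware as follows. This is not the FlashFloppy code: `decode_gaps` and its `carry` parameter are invented for illustration, and the gap values are the precompensated timings from the trace with about 50ns of jitter assumed, so that they fall just on the wrong side of the decision boundary.

```python
def decode_gaps(gaps_ns, cell_ns=1000, carry=0.0):
    """Convert pulse gaps to MFM clock counts, optionally carrying phase error forward."""
    counts = []
    credit = 0.0  # fraction of the previous gap's timing error applied to this gap
    for gap in gaps_ns:
        adjusted = gap + credit
        n = max(1, int((adjusted + cell_ns / 2) // cell_ns))  # nearest whole clock count
        counts.append(n)
        credit = carry * (adjusted - n * cell_ns)
    return counts


gaps = [1750, 4550, 1450, 4550]      # intended clock counts: 2, 4, 2, 4
print(decode_gaps(gaps))             # bit clock reset on every pulse
print(decode_gaps(gaps, carry=0.5))  # carry 50% of the error to the next gap
```

With no carry the 1450ns gap rounds to one clock (MFM 11) and the 4550ns gaps round to five; carrying half the error forward recovers 2, 4, 2, 4, because a gap stretched by precomp is always followed by one that is shortened.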
Prior to reading your last couple of messages I had come to the same conclusion - that precomp being applied to the write data signal cannot be handled by the current version of Flashfloppy. My floppy controller uses an original 8272 floppy controller chip, and yes this controller is designed to apply 250ns of write precompensation using a simple shift register circuit external to the 8272. This design is to accommodate requirements of older drives (ie 8" drives), and fortunately many newer drives are still able to function despite this write precomp having been applied. Likely the HxC firmware, and older versions of Flashfloppy running on older Gotek versions, are more tolerant of the changes in pulse timing introduced by precomp, whereas the current version of Flashfloppy is not.
Knowing this... do you still need a serial trace of the entire bit stream being sent to the Gotek? I'd have to figure out whether my logic analyzer (a DS Logic U3Pro16) is capable of capturing a bit stream like that.
Wow 8272, that is very vintage. I don't think I need any logging, I can reproduce this using Greaseweazle and supply test firmware to you in a few days.
I doubt that 250ns precomp with a 1MHz write clock has ever worked reliably with FlashFloppy. I assume that HxC either gets lucky or more likely implements something along the lines of what I'm about to do.
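A rough calculation shows why the write clock rate matters. It assumes the same fixed 250ns precomp is applied at both rates, which is an assumption here (the thread only establishes it for this controller's shift-register circuit), and the function name is illustrative.

```python
def shortest_gap_in_cells(cell_ns, precomp_ns=250):
    """Worst-case length, in bitcells, of a nominal 2-cell gap squeezed from both sides."""
    return (2 * cell_ns - 2 * precomp_ns) / cell_ns


print(shortest_gap_in_cells(1000))  # HD, 1us bitcell
print(shortest_gap_in_cells(2000))  # DD, 2us bitcell
```

At the HD rate the shortest gap lands at exactly 1.5 cells, the midpoint between one and two clocks, whereas at the DD rate it is 1.75 cells, comfortably closer to two. That would be consistent with DD images writing fine on the same host.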
With a (temporary) patch to the drive controller circuit board to disable write precompensation, I have confirmed this is indeed the problem. The controller is able to successfully read/write/format a 1.44MB disk image with write precomp turned off.
Unfortunately I cannot leave the drive controller permanently patched this way, since I have a couple of 8" and 5.25" floppy drives also connected to it that are dependent on write precomp for proper operation. I'm hoping you might be able to come up with a method within the Flashfloppy firmware to compensate for up to 250ns of write precomp in the write data stream.
I'm sure I can, the data stream is marginal and only needs a nudge to decode correctly. Sit tight.
I haven't tested this myself yet, but I have a prototype fix built for testing here https://github.com/keirf/flashfloppy/actions/runs/7477276551
You need to be logged in to GitHub to see the Artifact link at the bottom of the linked webpage. Download that zip, extract the non-debug zip contained within, and flash the upd file. You will see that your Gotek reports a commit hash as the version number, showing that you are correctly updated to a non-release firmware.
I'm happy to report your "prototype fix" appears to have solved the problem.
Using this firmware version I was able to successfully:
- format a 1.44M IMG and HFE disk image
- copy (with verify) and read back a disk full of files to both IMG and HFE disk image files
That's good news. You can continue to use that firmware until the next release which will include the fix. Meanwhile I need to do my own testing, get the fix into main branch, and decide when to do a release. Probably that will be soon, since v3.41 was released six months ago and I have a few things already queued up.
The fix is tweaked and released in new firmware v3.42 -- please test!
Unfortunately, firmware version 3.42 seems to have a significant problem. Any attempt to mount, read, or write to a 1.44M disk image results in an error - either "not ready" or it is unable to detect there is a disk in the drive. With firmware version 3.41 my system was at least able to mount and read 1.44M disk images.
For now, I have reverted back to yesterday’s “prototype fix”.
So even ignoring writes, 3.42 cannot reliably read an existing good IMG file?
This is hard to fix because there are no differences between 3.42 and prototype fix, except in write handling (very minor) and unrelated areas: ADF image handling and QD image handling. There is no code change at all that would affect reading of IMG files.
flashfloppy$ git diff origin/issue-861..v3.42
diff --git a/RELEASE_NOTES b/RELEASE_NOTES
index 6baadbd..746fb87 100644
--- a/RELEASE_NOTES
+++ b/RELEASE_NOTES
<elided>
diff --git a/examples/Host/GRiD/IMG.CFG b/examples/Host/GRiD/IMG.CFG
new file mode 100644
index 0000000..cceefed
--- /dev/null
+++ b/examples/Host/GRiD/IMG.CFG
<elided>
diff --git a/src/floppy_generic.c b/src/floppy_generic.c
index 5c4e9ee..b9fdcd7 100644
--- a/src/floppy_generic.c
+++ b/src/floppy_generic.c
@@ -674,7 +674,7 @@ static void IRQ_wdata_dma(void)
bc_dat = image->write_bc_window;
for (cons = dma_wr->cons; cons != prod; cons = (cons+1) & buf_mask) {
next = dma_wr->buf[cons];
- curr = (uint16_t)(next - prev) - (cell >> 1);
+ curr = (int16_t)(next - prev) - (cell >> 1);
if (unlikely(curr < 0)) {
/* Runt flux, much shorter than bitcell clock. Merge it forward. */
continue;
@@ -687,7 +687,7 @@ static void IRQ_wdata_dma(void)
bc_buf[((bc_prod-1) / 32) & bc_bufmask] = htobe32(bc_dat);
}
curr += cell >> 1; /* remove the 1/2-cell bias */
- prev -= curr >> 3; /* de-jitter/precomp: carry 1/8 of phase error */
+ prev -= curr >> 2; /* de-jitter/precomp: carry 1/4 of phase error */
bc_dat = (bc_dat << 1) | 1;
bc_prod++;
switch (sync) {
diff --git a/src/image/adf.c b/src/image/adf.c
index 80fb1df..016611a 100644
--- a/src/image/adf.c
+++ b/src/image/adf.c
@@ -41,6 +41,7 @@ static bool_t adf_open(struct image *im)
im->tracklen_bc = DD_TRACKLEN_BC;
im->ticks_per_cell = ((sampleclk_stk(im->stk_per_rev) * 16u)
/ im->tracklen_bc);
+ im->write_bc_ticks = im->ticks_per_cell / 16u;
im->nr_cyls = f_size(&im->fp) / (2 * 11 * 512);
diff --git a/src/image/qd.c b/src/image/qd.c
index a6ef853..758a326 100644
--- a/src/image/qd.c
+++ b/src/image/qd.c
@@ -33,7 +33,7 @@ static bool_t qd_open(struct image *im)
im->qd.tb = 1;
im->nr_cyls = 1;
im->nr_sides = 1;
- im->write_bc_ticks = sampleclk_us(4) + 66; /* 4.917us */
+ im->write_bc_ticks = sampleclk_ns(4917); /* 4.917us */
im->ticks_per_cell = im->write_bc_ticks;
im->sync = SYNC_none;
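As an aside on the cast change in the floppy_generic.c hunk above: the two casts differ only when the 16-bit difference `next - prev` has its top bit set. The sketch below emulates C's 16-bit conversions in Python. It assumes `curr` is a signed type wider than 16 bits, and the scenario in which the phase adjustment leaves `prev` slightly ahead of `next` is a guess at the motivation, not something the diff states; the tick values are arbitrary.

```python
def as_uint16(x):
    """Emulate a C (uint16_t) conversion."""
    return x & 0xFFFF


def as_int16(x):
    """Emulate a C (int16_t) conversion on a two's-complement machine."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def biased_gap(next_t, prev_t, cell, signed):
    """Gap minus the half-cell bias, using either the signed or unsigned cast."""
    diff = next_t - prev_t
    gap = as_int16(diff) if signed else as_uint16(diff)
    return gap - (cell >> 1)


# Hypothetical timer ticks: prev has been nudged 10 ticks past next.
print(biased_gap(100, 110, 72, signed=False))  # large positive: looks like a very long gap
print(biased_gap(100, 110, 72, signed=True))   # negative: caught by the curr < 0 runt check
```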
Version 3.42 ... actually does work perfectly.
User error on my part. I switched back and forth between versions this evening, and every time 3.42 was installed it wouldn't work. What I think I was doing wrong - though why it only happened with version 3.42 and not the provisional version I cannot explain - is that I may not have been allowing for sufficient time for the new firmware and FF to "settle" before accessing an image. Perhaps I should have just rebooted the entire system after upgrading the FF firmware, and before trying to use the new version. Anyway, all is well and sorry for causing you the unnecessary concern.
No worries, thanks for testing.