swharden/pyABF

explore support for variable-length sweeps

swharden opened this issue · 3 comments

Currently pyABF is hard-coded to expect fixed-length sweeps. Andrew emailed me some ABFs with variable-length sweeps, which pyABF has trouble with. ABF info: 2020_06_16_0000.md and 2020_06_16_0001.md

pyABF calculates sweep length like this, which is correct for fixed-length sweeps but incorrect for variable-length sweeps:

pyABF/src/pyabf/abf.py

Lines 344 to 345 in fcf0c7d

self.sweepPointCount = int(
    self.dataPointCount / self.sweepCount / self.channelCount)
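To make the failure concrete, here is a minimal sketch that applies the same formula to three sweeps of unequal length. The lengths (3540, 70040, and 16040 samples) are the ones that turn up in the synch array later in this thread; treating their sum as dataPointCount for a single-channel file is an assumption made for illustration.

```python
# Sweep lengths in samples, taken from the synch array values found later in this thread
sweep_lengths = [3540, 70040, 16040]

# Assumption: one channel, and the data section holds exactly these samples
data_point_count = sum(sweep_lengths)
sweep_count = len(sweep_lengths)
channel_count = 1

# Same arithmetic as abf.py: a single length shared by every sweep
sweep_point_count = int(data_point_count / sweep_count / channel_count)

print(sweep_point_count)                   # the average length, truncated
print(sweep_point_count in sweep_lengths)  # False: no sweep actually has this length
```

The formula returns the average sweep length, so every sweep after the first would be sliced at the wrong position.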

Variable-length sweeps are described by the SynchArraySection, which pyABF locates in the file but doesn't use.

self.SynchArraySection = readStruct(fb, "IIi", 316)

For 2020_06_16_0000.md, the header map says SynchArraySection = [362, 8, 3].

  • 362 times 512 = 185,344, which is a byte position near the end of the file.
  • 8 means each synch array contains 8 bytes
  • 3 means there are 3 synch arrays
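The arithmetic above can be sketched as follows. `read_struct` is a hypothetical stand-in for pyABF's `readStruct`, the 512-byte block size comes from the multiplication above, and little-endian byte order is an assumption. The header bytes are faked in memory so the sketch runs without an ABF file.

```python
import io
import struct

BLOCK_SIZE = 512  # section positions are given in 512-byte blocks


def read_struct(fb, fmt, seek):
    """Hypothetical helper: unpack `fmt` starting at byte position `seek`."""
    fb.seek(seek)
    size = struct.calcsize(fmt)
    return list(struct.unpack(fmt, fb.read(size)))


# Fake header holding the values reported for 2020_06_16_0000 at byte 316
fake_header = bytearray(512)
fake_header[316:328] = struct.pack("<IIi", 362, 8, 3)
fb = io.BytesIO(bytes(fake_header))

block_index, entry_bytes, entry_count = read_struct(fb, "<IIi", 316)
byte_offset = block_index * BLOCK_SIZE

print(byte_offset, entry_bytes, entry_count)  # 185344 8 3
```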

Let's poke around with a hex editor at this address and see what we have
image

I'm guessing each array contains 2 values, each a UInt32. This would make 3 arrays:

  • 14479, 3540
  • 44979, 70040
  • 147479, 16040
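A quick way to check the two-UInt32 guess is to write out what the first record would look like on disk and decode it again. This is a sketch assuming little-endian storage; the bytes are computed from the values above, not copied from the screenshot.

```python
import struct

# 14479 == 0x0000388F and 3540 == 0x00000DD4, stored least-significant byte first
first_record = bytes.fromhex("8f380000" "d40d0000")

start, length = struct.unpack("<II", first_record)
print(start, length)  # 14479 3540
```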

Let's compare to ClampFit

image

Those values look like start times and durations:

  • sweep 1: starts at 1.4479 s (lasts 0.3540 s)
  • sweep 2: starts at 4.4979 s (lasts 7.0040 s)
  • sweep 3: starts at 14.7479 s (lasts 1.6040 s)
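The conversion behind these bullets, as a sketch. The factor of 10,000 per second is inferred from the numbers above (14479 maps to 1.4479 s); that both the start-time unit and the sample rate of this file work out to that value is an assumption.

```python
# (start, length) pairs decoded from the synch array
records = [(14479, 3540), (44979, 70040), (147479, 16040)]

# Assumption: starts are in 100 µs ticks and the data were sampled at 10 kHz
TICKS_PER_SECOND = 10_000
SAMPLE_RATE_HZ = 10_000

sweeps = [
    (start / TICKS_PER_SECOND, length / SAMPLE_RATE_HZ)
    for start, length in records
]

for number, (start_sec, duration_sec) in enumerate(sweeps, start=1):
    print(f"sweep {number}: starts at {start_sec:.4f} s (lasts {duration_sec:.4f} s)")
```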

EDIT: it seems sweep 2 start time doesn't match up. What the heck is this?

... not sure how to best modify pyABF to support this yet (fixed sweep lengths are baked in pretty deep) but at least now we know how to get all the information we need

Awesome, this information dates back to ABF1 files so we have some documentation:
https://swharden.com/pyabf/abf1-file-format.md.html#the-abf-synch-section

The ABF Synch Section

The ABF Synch array is an important array that stores the start time and length
of each portion of the data if the data are not part of a continuous gap-free
acquisition. The data section might contain equal length or variable length
sweeps of data. The Synch Array contains a record to indicate the start time
and length of every sweep or Event in the data file. The ABF reading routines
automatically decode the Synch Array when providing information about the data.

A Synch array is created and used in the following acquisition modes:
ABF_VARLENEVENTS, ABF_FIXLENEVENTS & ABF_HIGHSPEEDOSC. The acquisition modes
ABF_GAPFREEFILE and ABF_WAVEFORMFILE do not always use a Synch array.

| Offset | Header Entry Name | Type | Description |
|--------|-------------------|------|-------------|
| 0 | lStart | long | Start time of sweep in fSynchTimeUnit units. |
| 4 | lLength | long | Length of the sweep in multiplexed samples. |
I got this working in concept...

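import numpy as np
import matplotlib.pyplot as plt
import pyabf

# THIS EXAMPLE ASSUMES A SINGLE CHANNEL
# DATA_FOLDER is the folder containing the sample ABF files (defined elsewhere)
filePath = DATA_FOLDER + "/2020_06_16_0000.abf"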
abf = pyabf.ABF(filePath)

sweepYs = []
sweepXs = []
with open(filePath, 'rb') as fb:
    fb.seek(abf.dataByteStart)
    for sweepIndex in abf.sweepList:
        firstPoint = abf._syncArraySection.lStart[sweepIndex]
        pointCount = abf._syncArraySection.lLength[sweepIndex]
        sweepY = np.fromfile(fb, dtype=abf._dtype, count=pointCount)
        sweepY = np.multiply(sweepY, abf._dataGain)
        sweepY = np.add(sweepY, abf._dataOffset)
        sweepYs.append(sweepY)
        offsetSec = firstPoint / abf.dataRate
        sweepX = np.arange(len(sweepY)) / abf.dataRate + offsetSec
        sweepXs.append(sweepX)

plt.figure()
for i in abf.sweepList:
    plt.plot(sweepXs[i], sweepYs[i])
plt.show()

image
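The example above reads one channel. Since lLength is given in multiplexed samples, a multi-channel file would presumably need each sweep's samples split by channel. Here is a minimal sketch of that de-interleaving step, assuming samples are stored channel-interleaved (ch0, ch1, ch0, ch1, ...). The helper name and the data are made up for illustration; this is not code from pyABF.

```python
def deinterleave(samples, channel_count):
    """Split a flat, channel-interleaved sequence into one list per channel."""
    if len(samples) % channel_count != 0:
        raise ValueError("sample count is not a multiple of the channel count")
    return [list(samples[ch::channel_count]) for ch in range(channel_count)]


# Made-up data: two channels, three points each
multiplexed = [10, -1, 11, -2, 12, -3]
channels = deinterleave(multiplexed, channel_count=2)
print(channels)  # [[10, 11, 12], [-1, -2, -3]]
```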

Implementing this in the core pyABF library will require extreme care to ensure the existing behavior for fixed-length sweeps remains unmodified. The ABF reading function is definitely one I do not want to break 😅 I'm happy I'm protected by thousands of automated tests, but still, to avoid headaches I'll only move on this when I'm ready to approach this very carefully.

Got it all figured out, merged in, and it's now live on PyPI (pyabf 2.2.6)

pip install --upgrade pyabf

The length of sweepY may now vary from sweep to sweep, and if absoluteTime is True then sweepX returns each sweep's actual times in the recording. With variable-length recordings this means there may be gaps between sweeps:

import pyabf
import matplotlib.pyplot as plt

abf = pyabf.ABF("2020_06_16_0000.abf")

for sweepIndex in abf.sweepList:
    abf.setSweep(sweepIndex, absoluteTime=True)
    plt.plot(abf.sweepX, abf.sweepY)
plt.show()

image
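The size of those gaps can be estimated from the synch array values earlier in this thread. This sketch takes the values at face value (sweep 2's start time did not match ClampFit, as noted above) and assumes the start times and lengths share a unit of 1/10,000 s.

```python
# (start, length) per sweep, both assumed to be in units of 1/10,000 s
records = [(14479, 3540), (44979, 70040), (147479, 16040)]
UNITS_PER_SECOND = 10_000

gaps_sec = []
for (start, length), (next_start, _) in zip(records, records[1:]):
    gap_units = next_start - (start + length)  # unrecorded time between consecutive sweeps
    gaps_sec.append(gap_units / UNITS_PER_SECOND)

print(gaps_sec)
```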