BEL-Public/mffpy

ENH: Modify XML.from_file or other method to return category names

Closed this issue · 14 comments

XML.from_file is of limited use: it can check whether a category exists, but it returns little other information. We need the following abilities:

  1. Get list of all categories and print them:
    cats = XML.from_file("myfile.xml")
    for aCat in cats:
        print(aCat)

  2. Get a specific category via the reader:
    myFile = mffpy.Reader("someEEG.mff")
    print(myFile.categories)
    aCategoryOfInterest = myFile.categories[0]

Per the e-mail from @pmolfese this morning, our goal here is to enhance XML_files.py with an additional method for the Categories class that returns an iterable list of categories from the MFF in question.

@dhorkin - amending my statement above. Given the email @pmolfese sent and his description above, I'm wondering whether we should instead expand reader.py with a @cached_property for the categories of a file, similar to the epochs property currently there.
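
For illustration only, here is a minimal sketch of the cached-property idea using `functools.cached_property` from the standard library (Python 3.8+). `SketchReader` and its `loader` argument are hypothetical names made up for this example; this is not mffpy's actual Reader and does not parse any XML.

```python
from functools import cached_property


class SketchReader:
    """Hypothetical stand-in for a reader; not mffpy's actual Reader."""

    def __init__(self, loader):
        # `loader` is any zero-argument callable returning category names
        # (an assumption standing in for parsing categories.xml).
        self._loader = loader
        self.load_count = 0

    @cached_property
    def categories(self):
        # Runs once per instance; later accesses reuse the stored result.
        self.load_count += 1
        return list(self._loader())


reader = SketchReader(lambda: ["Auditory/Left", "Auditory/Right"])
first = reader.categories
second = reader.categories  # served from the cache, loader is not called again
```

The point of caching is that the file is only parsed the first time `categories` is accessed, which is presumably why the existing epochs property is written the same way.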

It appears that the Category class in xml_files.py currently has a method that returns a dictionary of all categories in the associated MFF file, and it looks like the main reader just needs to be expanded a bit further to accommodate more of what an MFF can do.

Having it in Reader directly would be better. I think the shortest path to most workflows is: 1) open file, 2) see what type of file it is (avg, continuous, epoched), 3) if avg/epoched, get category names, 4) get data for each category if asked - if epoched that would mean a multi-argument function with either epoch number, or epoch count + category name.

@pmolfese - I have a potential start to this functionality committed up to the "category_list_fix6" branch. We'll be testing on our end, but if you want to give it a shot - you can let me know to what degree I have correctly interpreted your desired functionality. :)

@ephathaway - how about you, @dhorkin and I toss some ideas around on how to test this feature in practice so the branch isn't floating out in the ether for too long.

This is still on my list... this week is fairly condensed but happy to chime in soon.

It's close... I'm not sure I get the entirety of the logic but building it into Reader makes sense. I want to do this:

fo = mffpy.Reader("NASA_001m_6x25_1HDT.ave.mff")
fo.set_unit('EEG', 'uV')

read_epochs = fo.epochs

for anEpoch in read_epochs:
    print(anEpoch.name)

And get the name of each epoch from the categories segmentation.

Hi @pmolfese, my apologies that this issue has fallen through the cracks. We have had some shifting of responsibilities for the maintainers of this repository.

We are still interested in getting this feature implemented and wanted to reach out to see if your use case is still the same. If I am understanding correctly, you would like the categories and epochs methods for the Reader class to be combined so that we can access the category name for each epoch with something like epoch.name?

Yes. I could see two use cases - when data is averaged it would be nice to have it print names, but also retrieve epochs specific to the condition (whether data is averaged or not). We can do this in MNE:

epochs_left = epochs['Auditory/Left']

We also have the ability to do things like:
evoked_left = epochs['Auditory/Left'].average(picks=picks) #ERP of this condition instead of individual trials

I could also see use cases for getting epochs by number. So fetch the first epoch or the first 10 epochs, etc.
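
To make the two access patterns concrete, here is a rough sketch of a container that can be indexed by position, by slice, or by condition name. `Epoch` and `EpochList` are hypothetical names for this example only; they are neither MNE's nor mffpy's classes, and the name lookup is a plain exact match.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Epoch:
    # Hypothetical minimal epoch record; real epochs would carry timing data too.
    name: str
    index: int


class EpochList:
    """Hypothetical container indexable by position, slice, or name."""

    def __init__(self, epochs: List[Epoch]):
        self._epochs = list(epochs)

    def __len__(self):
        return len(self._epochs)

    def __getitem__(self, key: Union[int, slice, str]):
        if isinstance(key, str):
            # Exact-match lookup by condition name.
            matches = [e for e in self._epochs if e.name == key]
            if not matches:
                raise KeyError(key)
            return matches
        # int -> a single Epoch, slice -> a list of Epochs
        return self._epochs[key]


names = ["Auditory/Left", "Auditory/Right", "Auditory/Left"]
epochs = EpochList([Epoch(n, i) for i, n in enumerate(names)])
left = epochs["Auditory/Left"]  # all epochs of one condition
first_two = epochs[:2]          # the first two epochs
```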

Being able to index epochs by condition for both segmented and averaged MFFs would be very useful. We will need to figure out a way to determine whether an MFF file is segmented or averaged. This info is contained in the history.xml, but we haven't implemented read capability for these XMLs yet.

Side note: I am actually working on adding read capability for averaged MFF files to MNE and I use mffpy as a dependency. Implementing indexing epochs by condition for averaged files would make this a lot easier.

Glad to help! I'll go out on a limb and say that I think MNE is a good standard to try to emulate when necessary, but of course importing mffpy into MNE to get the functionality would also be great. Just within that (and this is probably a separate issue): read continuous, read segmented, read averaged, show history, etc. would all be super useful.

For sure. The ultimate goal is to be able to read and write all different flavors of MFF. Here is a link to the issue if you are interested.

As far as getting categories into the Reader class for mffpy, I will work on getting this implemented in the next couple of weeks. Looks like Wayne already started working on a branch so it shouldn't take too much to get that going.

@pmolfese I submitted PR #33 to address this issue. I got around determining whether the MFF file is segmented or averaged by populating the Epoch.name field with the category names present in categories.xml (provided that there are equal numbers of categories and epochs). You can then query epochs by name with Reader.epochs_by_name("name").

Let me know if this satisfies your requirements.
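
As a rough sketch of the idea (not the code in the PR): names are copied onto epochs only when the counts match, and otherwise the epochs are left unnamed. `Epoch`, `assign_names` and `select_by_name` are hypothetical names for this example, and pairing categories with epochs in order is an assumption made here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Epoch:
    # Hypothetical minimal epoch record for this sketch.
    index: int
    name: Optional[str] = None


def assign_names(epochs: List[Epoch], category_names: List[str]) -> bool:
    """Copy names onto epochs only when the counts match."""
    if len(epochs) != len(category_names):
        return False  # ambiguous mapping: leave names unset
    # Assumption: categories and epochs are paired in order.
    for epoch, name in zip(epochs, category_names):
        epoch.name = name
    return True


def select_by_name(epochs: List[Epoch], name: str) -> List[Epoch]:
    """Return every epoch whose name equals `name`."""
    return [e for e in epochs if e.name == name]


epochs = [Epoch(0), Epoch(1)]
named = assign_names(epochs, ["Auditory/Left", "Auditory/Right"])
unnamed = assign_names([Epoch(0)], ["A", "B"])  # counts differ
```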

@pmolfese we merged PR #33 with a few changes. You can now index epochs through the Reader class with either Reader.epochs[idx] or Reader.epochs['category name'].