# AAC-Corpora-Collecting

Research into how we can collect corpora/message histories from AAC systems.

## Windows OS Based AAC Software

### Grid 3
- Download a demo: https://thinksmartbox.com/product/grid-3/
- The prediction engine is SwiftKey (you can see the .NET SDK files and the learned.json).
- Preferences and history are stored under C:\Users\Public\Documents\Smartbox\Grid 3\; the phrase history is at C:\Users\Public\Documents\Smartbox\Grid 3\Users\UserName\langCode\Phrases\history.sqlite, where UserName is whatever user name has been set up in Grid (usually there will only be one) and langCode is e.g. en-GB.
NB: The C:\Users\Public\Documents\ location is defined in the Registry under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Shell Folders, value "Common Documents", so it can be resolved rather than hard-coded, e.g. in a batch file:

    FOR /F "tokens=3*" %%A IN ('REG.EXE QUERY "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Shell Folders" /V "Common Documents" 2^>NUL ^| FIND "REG_SZ"') DO SET CommonDocs=%%B
- Data is in the PhraseHistory table; its PhraseId matches the Id column of the Phrases table. Each item has a timestamp, so you can do counts on when and how many phrases are in each history. See this SQLite database for what the history data looks like (and the query sketch below).

**Update:** Now underway in this PR: Baton-donation/app#3
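To get a feel for the data, here is a minimal query sketch (Python, Windows-only because of winreg). It resolves Common Documents from the registry, as the batch snippet above does, and joins PhraseHistory to Phrases as described; the Text and Timestamp column names are assumptions, so check them against the actual schema first.

```python
import sqlite3
import winreg
from pathlib import Path


def common_documents() -> Path:
    """Read the Common Documents path from the registry (same value as the batch one-liner above)."""
    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE,
        r"SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Shell Folders",
    ) as key:
        value, _ = winreg.QueryValueEx(key, "Common Documents")
    return Path(value)


def grid3_phrase_history(user_name: str, lang_code: str = "en-GB"):
    """Return (text, timestamp) rows from Grid 3's history.sqlite.

    PhraseHistory.PhraseId joining to Phrases.Id is per the note above;
    the Text and Timestamp column names are assumptions, so inspect the schema.
    """
    db = (common_documents() / "Smartbox" / "Grid 3" / "Users"
          / user_name / lang_code / "Phrases" / "history.sqlite")
    con = sqlite3.connect(str(db))
    try:
        return con.execute(
            "SELECT p.Text, h.Timestamp "
            "FROM PhraseHistory AS h JOIN Phrases AS p ON p.Id = h.PhraseId "
            "ORDER BY h.Timestamp"
        ).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for text, timestamp in grid3_phrase_history("UserName"):
        print(timestamp, text)
```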
### TobiiDynavox - Communicator

- Download at https://uk.tobiidynavox.com/pages/communicator-5-product-support?tab=1 - e.g. https://download.mytobiidynavox.com/Communicator/software/5.5.7/TobiiDynavox_CommunicatorSuite_Installer_5.5.7.12303_en-GB.exe
- Settings are at C:\Users\userName\AppData\Roaming\Tobii Dynavox\Communicator\5\Users\User 1\Settings\VocabularyUsage
- Can't figure this out; it seems to only update on exiting Communicator.
- The prediction engine is SwiftKey (you can see the .NET SDK files and the learned.json).
- It looks like it's not possible to "read" a SwiftKey learned.lm file, which is a bummer: https://support.swiftkey.com/hc/en-us/community/posts/115002963989-How-do-I-see-a-library-of-what-SwiftKey-has-learned-
- So we do have some .phr files in "C:\Users\wwade\AppData\Roaming\Tobii Dynavox\Communicator\5\Users\User 1\Phrases". For example there is one called "Speech history.phr", and its raw content looks like this (a string-extraction sketch follows below):

ÿþÿ+H e l l o h o w a r e y o u I t h i n k i t ' s o v e r c o o k e d . @B� ÿþÿ�H e l l o h o w a r e y o u @B� ÿþÿ�H e l l o @B� ÿþÿ H e l l o , m y n a m e i s t h i s l o w c a r b ? @B� ÿþÿ�H i , t h i s i s d e l i c i o u s ! @B�

See the file with this name in the repo. There are other files. Note: these are phrases that a user (or Tobii) predefines; they might not actually use them.
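The content above looks like UTF-16-LE text with some binary framing between the entries. As a rough sketch (this is a guess at the format, not documentation), you can decode it permissively and keep the readable runs:

```python
import re
from pathlib import Path


def phrases_from_phr(path):
    """Loosely extract readable strings from a Communicator .phr file.

    Assumption: the file is largely UTF-16-LE text with binary framing bytes
    between entries (which is what the raw dump above looks like), so we just
    decode permissively and keep runs of plain printable characters.
    """
    raw = Path(path).read_bytes()
    text = raw.decode("utf-16-le", errors="ignore")
    return [m.group(0).strip()
            for m in re.finditer(r"[A-Za-z0-9 .,'!?;:-]{3,}", text)]


if __name__ == "__main__":
    phr = Path(r"C:\Users\wwade\AppData\Roaming\Tobii Dynavox\Communicator\5\Users\User 1\Phrases\Speech history.phr")
    for phrase in phrases_from_phr(phr):
        print(phrase)
```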
### TobiiDynavox - Snap+Core

- https://uk.tobiidynavox.com/products/snap-core-first
- A UWP app, so it is sandboxed. Data can be found under %LOCALAPPDATA%\Packages\ (a sketch for locating the package folder follows below).
- Uses SwiftKey, at least on Windows.
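The exact package folder name isn't noted here, so as a small sketch (assuming "TobiiDynavox" appears in the folder name, which is a guess) you can just list candidate package folders and look inside their LocalState, which is where UWP apps normally keep their data:

```python
import os
from pathlib import Path

# List candidate Snap Core First package folders. "TobiiDynavox" in the glob
# is an assumption; adjust it once the real package folder name is known.
packages = Path(os.environ["LOCALAPPDATA"]) / "Packages"
for pkg in packages.glob("*TobiiDynavox*"):
    print(pkg.name)
    local_state = pkg / "LocalState"  # the usual per-package data folder for a UWP app
    if local_state.is_dir():
        for item in sorted(local_state.iterdir()):
            print("   ", item.name)
```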
### PRC NuVoice

- Uses LAM (Language Activity Monitoring).
- PRC defined this format many years ago. More on its structure here: https://aacinstitute.org/language-sample-collection-in-aac/ and https://eric.ed.gov/?id=ED441300
- It's really not designed for this kind of collection; it logs more or less every hit, so it is going to take quite a lot of parsing to be useful. The excerpt below shows "hello world" being typed, corrected and then spoken. To reconstruct utterances you would have to look for SPE commands and what preceded them until you hit a DEL or CLEAR command (a parser sketch follows the excerpt).
- See LAM-example.txt in this repo for a full example.
- See https://github.com/CoughDrop/coughdrop/blob/master/lib/stats.rb#L1217 for a parser that @whitmer started. It needs testing and iterating.

An excerpt:

    ### CAUTION ###
    The following data represents personal communication.
    Please respect privacy accordingly.
    Automatic Data Logging NuVoice PASS
    Version 2.17 2020-09-12
    Prentke Romich Company
    *[YY-MM-DD=21-05-10]*
    11:43:25.017 RECORD ON
    11:43:30.697 LOC =I5
    11:43:32.217 LOC =G4 [ ]
    11:43:32.218 SPE "h"
    11:43:32.937 LOC =C3 [ ]
    11:43:32.938 SPE "e"
    11:43:32.952 LOC =C3 [ ]
    11:43:32.953 SPE "e"
    11:43:35.657 LOC =J5 [ ]
    11:43:36.903 LOC =J4 [ ]
    11:43:36.905 SPE "l"
    11:43:37.097 LOC =J4 [ ]
    11:43:37.098 SPE "l"
    11:43:37.577 LOC =I3 [ ]
    11:43:37.577 SPE "o"
    11:43:38.297 LOC =E6 [ ]
    11:43:38.306 SPE " "
    11:43:41.737 LOC =B3 [ ]
    11:43:41.738 SPE "w"
    11:43:41.769 LOC =B3 [ ]
    11:43:41.770 SPE "w"
    11:43:42.537 LOC =I3 [ ]
    11:43:42.538 SPE "o"
    11:43:43.177 LOC =D3 [ ]
    11:43:43.178 SPE "r"
    11:43:44.137 LOC =J4 [ ]
    11:43:44.138 SPE "l"
    11:43:44.777 LOC =D4 [ ]
    11:43:44.778 SPE "d"
    11:43:49.097 LOC =J5 [ ]
    11:43:49.257 LOC =J5 [ ]
    11:43:49.417 LOC =J5 [ ]
    11:43:49.657 LOC =J5 [ ]
    11:43:50.057 LOC =J5 [ ]
    11:43:50.937 LOC =I3 [ ]
    11:43:50.938 SPE "o"
    11:43:51.648 LOC =D3 [ ]
    11:43:51.650 SPE "r"
    11:43:52.457 LOC =J4 [ ]
    11:43:52.458 SPE "l"
    11:43:53.257 LOC =D4 [ ]
    11:43:53.258 SPE "d"
    11:43:53.897 LOC =E6 [ ]
    11:43:53.906 SPE " "
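As a starting point for that parsing, here is a minimal sketch for the line format shown above. It just concatenates SPE payloads and treats CLEAR as an utterance boundary; it does not yet rewind on DEL commands or on delete-key LOC hits, which is exactly the harder part described above.

```python
import re
from pathlib import Path

# "HH:MM:SS.mmm COMMAND rest-of-line", as in the excerpt above.
LINE_RE = re.compile(r'^(\d{2}:\d{2}:\d{2}\.\d{3})\s+(\S+)\s*(.*)$')


def spoken_text(path):
    """Very rough reconstruction of spoken text from a LAM log.

    Concatenates SPE payloads and flushes on CLEAR (an assumed boundary).
    Corrections (DEL, repeated delete-key LOC events) are NOT handled yet,
    so the excerpt above comes out with its duplicated letters intact.
    """
    utterances, buffer = [], []
    for line in Path(path).read_text(errors="replace").splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skips the caution/header block and the *[YY-MM-DD=...]* line
        _ts, command, args = m.groups()
        if command == "SPE":
            payload = re.fullmatch(r'"(.*)"', args.strip())
            if payload:
                buffer.append(payload.group(1))
        elif command == "CLEAR" and buffer:
            utterances.append("".join(buffer))
            buffer = []
    if buffer:
        utterances.append("".join(buffer))
    return utterances


if __name__ == "__main__":
    print(spoken_text("LAM-example.txt"))
```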
## iOS only

### Proloquo4Text

- Something may be in this backup: https://www.assistiveware.com/support/proloquo4text/protect-customizations/export-to-dropbox
## MultiPlatform

### CoughDrop

From Brian: "Extracting just strings from an obl file should be pretty easy. It's just JSON and you'd do the following pseudo-code:

    session['events'].each do |event|
      if event['label']
        # only button events have labels
      end
    end

obla files would have gibberish in the label field for any words that weren't considered "core" for that user."
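The same thing in Python, hedged: whether an .obl file holds a single session object or a list of sessions is an assumption here, and "example.obl" is just a placeholder file name.

```python
import json


def labels_from_obl(path):
    """Extract just the button labels from an .obl log file.

    Per Brian's pseudo-code above, only button events carry a 'label'.
    The top-level shape (one session vs. a list of sessions) is assumed.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    sessions = data if isinstance(data, list) else [data]
    labels = []
    for session in sessions:
        for event in session.get("events", []):
            if event.get("label"):  # only button events have labels
                labels.append(event["label"])
    return labels


if __name__ == "__main__":
    print(labels_from_obl("example.obl"))  # placeholder file name
```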
### AAC Speech Assistant

- iOS: Message history can be exported as a plain text file.
- Android: You can't export message history.

### Predictable

??