AAC-Corpora-Collecting

Research into how we can collect corpora and message histories from AAC systems.

Primary language: Python. License: MIT.

Windows OS Based AAC Software

Grid 3

  • Download a demo: https://thinksmartbox.com/product/grid-3/

  • The prediction engine is SwiftKey (you can see the .NET SDK files and learned.json).

  • Stores the phrase history in C:\Users\Public\Documents\Smartbox\Grid 3\Users\UserName\langCode\Phrases\history.sqlite

    UserName is whichever user has been set up in Grid 3; usually there is only one. langCode is the language code, e.g. en-gb.

NB: The C:\Users\Public\Documents\ directory is defined in the Registry, under the value "Common Documents" of HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Shell Folders. In a batch file it can be read with:

FOR /F "tokens=3*" %%A IN ('REG.EXE QUERY "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Shell Folders" /V "Common Documents" 2^>NUL ^| FIND "REG_SZ"') DO SET CommonDocs=%%B
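The same lookup can be sketched in Python. This is a minimal sketch, not code from the repo: the function names `common_documents_dir` and `grid3_history_path` are hypothetical, the fallback is simply the default directory mentioned above, and the standard-library `winreg` module is only available on Windows.

```python
from pathlib import PureWindowsPath

try:
    import winreg  # standard library, Windows only
except ImportError:
    winreg = None

SHELL_FOLDERS = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Shell Folders"
DEFAULT_COMMON_DOCS = r"C:\Users\Public\Documents"  # assumed fallback


def common_documents_dir() -> str:
    """Return the Common Documents folder, or the usual default if the lookup fails."""
    if winreg is None:
        return DEFAULT_COMMON_DOCS
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, SHELL_FOLDERS) as key:
            value, _value_type = winreg.QueryValueEx(key, "Common Documents")
            return value
    except OSError:
        return DEFAULT_COMMON_DOCS


def grid3_history_path(user_name: str, lang_code: str, common_docs: str = "") -> PureWindowsPath:
    """Build the expected location of a Grid 3 user's history.sqlite."""
    base = PureWindowsPath(common_docs or common_documents_dir())
    return (base / "Smartbox" / "Grid 3" / "Users" / user_name
            / lang_code / "Phrases" / "history.sqlite")
```

For example, `grid3_history_path("UserName", "en-gb")` gives the path for one user and language; the result is only a candidate location and still needs an existence check on the target machine.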

  • The data is in the PhraseHistory table; its Phraseid column is matched against the id column of the Phrases table. Each item has a timestamp, so you can count when and how many phrases appear in each history. See this SQLite database for what the history data looks like.
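The join described above can be sketched with Python's built-in `sqlite3` module. The table names and the `PhraseId`/`Id` match come from the notes above (SQLite column names are case-insensitive); the column names `Text` and `Timestamp`, and the function name `phrase_counts`, are assumptions for illustration and should be checked against the real schema (for example with `.schema` in the sqlite3 shell).

```python
import sqlite3


def phrase_counts(db_path: str) -> list[tuple]:
    """Return (text, times_used, first_used, last_used) per phrase, most used first."""
    query = """
        SELECT p.Text, COUNT(*) AS uses, MIN(h.Timestamp), MAX(h.Timestamp)
        FROM PhraseHistory AS h
        JOIN Phrases AS p ON p.Id = h.PhraseId   -- Text and Timestamp are assumed names
        GROUP BY p.Id
        ORDER BY uses DESC, p.Text
    """
    con = sqlite3.connect(db_path)
    try:
        return con.execute(query).fetchall()
    finally:
        con.close()
```

Timestamps are returned exactly as stored, since their format is not stated here. It is probably safest to run this against a copy of history.sqlite rather than the live file.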

Update: now underway in PR Baton-donation/app#3

TobiiDynavox - Communicator

For example, we have a file called "Speech history.phr".

Opened as text, its contents look like this:

 ÿþÿ+H e l l o   h o w   a r e   y o u   I   t h i n k   i t ' s   o v e r   c o o k e d . @B� ÿþÿ�H e l l o   h o w   a r e   y o u @B� ÿþÿ�H e l l o @B� ÿþÿ H e l l o ,   m y   n a m e   i s   t h i s   l o w   c a r b ? @B� ÿþÿ�H i ,   t h i s   i s   d e l i c i o u s ! @B� 

See the file with this name in the repo; there are other files as well. Note that these are phrases predefined by the user or by Tobii, so the user might never actually use them.
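The sample above is consistent with each phrase being stored as the bytes FF FE FF (shown as "ÿþÿ"), then a one-byte character count, then that many UTF-16LE characters: the first phrase is 43 characters long and its count byte is "+" (0x2B = 43). That layout is inferred from this one sample, not from any Tobii documentation, so the following is only a rough sketch. The function name `extract_phrases` is hypothetical, and phrases of 255 characters or more are skipped because the sample does not show how they are encoded.

```python
MARKER = b"\xff\xfe\xff"  # appears before every phrase in the sample


def extract_phrases(data: bytes) -> list[str]:
    """Pull out length-prefixed UTF-16LE strings that follow MARKER (assumed layout)."""
    phrases = []
    pos = data.find(MARKER)
    while pos != -1:
        length_at = pos + len(MARKER)
        if length_at >= len(data):
            break
        n_chars = data[length_at]          # assumed: one-byte character count
        start = length_at + 1
        end = start + 2 * n_chars          # two bytes per UTF-16LE character
        if n_chars != 0xFF and end <= len(data):
            phrases.append(data[start:end].decode("utf-16-le", errors="replace"))
            pos = data.find(MARKER, end)
        else:
            # Long or truncated record: skip it and look for the next marker.
            pos = data.find(MARKER, pos + 1)
    return phrases
```

Usage would be along the lines of `extract_phrases(open("Speech history.phr", "rb").read())`; the file must be read in binary mode.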

TobiiDynavox - Snap+Core

PRC NuVoice

        ### CAUTION ###
The following data represents personal communication.
Please respect privacy accordingly.

Automatic Data Logging NuVoice PASS
Version 2.17 2020-09-12
Prentke Romich Company

*[YY-MM-DD=21-05-10]*
11:43:25.017 RECORD ON
11:43:30.697 LOC =I5
11:43:32.217 LOC =G4 [ ]
11:43:32.218 SPE "h"
11:43:32.937 LOC =C3 [ ]
11:43:32.938 SPE "e"
11:43:32.952 LOC =C3 [ ]
11:43:32.953 SPE "e"
11:43:35.657 LOC =J5 [ ]
11:43:36.903 LOC =J4 [ ]
11:43:36.905 SPE "l"
11:43:37.097 LOC =J4 [ ]
11:43:37.098 SPE "l"
11:43:37.577 LOC =I3 [ ]
11:43:37.577 SPE "o"
11:43:38.297 LOC =E6 [ ]
11:43:38.306 SPE " "
11:43:41.737 LOC =B3 [ ]
11:43:41.738 SPE "w"
11:43:41.769 LOC =B3 [ ]
11:43:41.770 SPE "w"
11:43:42.537 LOC =I3 [ ]
11:43:42.538 SPE "o"
11:43:43.177 LOC =D3 [ ]
11:43:43.178 SPE "r"
11:43:44.137 LOC =J4 [ ]
11:43:44.138 SPE "l"
11:43:44.777 LOC =D4 [ ]
11:43:44.778 SPE "d"
11:43:49.097 LOC =J5 [ ]
11:43:49.257 LOC =J5 [ ]
11:43:49.417 LOC =J5 [ ]
11:43:49.657 LOC =J5 [ ]
11:43:50.057 LOC =J5 [ ]
11:43:50.937 LOC =I3 [ ]
11:43:50.938 SPE "o"
11:43:51.648 LOC =D3 [ ]
11:43:51.650 SPE "r"
11:43:52.457 LOC =J4 [ ]
11:43:52.458 SPE "l"
11:43:53.257 LOC =D4 [ ]
11:43:53.258 SPE "d"
11:43:53.897 LOC =E6 [ ]
11:43:53.906 SPE " "
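A log in this shape can be reduced to its output events with a small parser. This is a minimal sketch based only on the excerpt above: it assumes the `*[YY-MM-DD=...]*` header gives a two-digit year in the 2000s, that `SPE` lines carry the text that was produced, and that `LOC` lines can be ignored for this purpose. The function name `parse_spe_events` is hypothetical.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"\*\[YY-MM-DD=(\d{2})-(\d{2})-(\d{2})\]\*")
SPE_RE = re.compile(r'^(\d{2}):(\d{2}):(\d{2})\.(\d{3}) SPE "(.*)"\s*$')


def parse_spe_events(lines):
    """Yield (datetime, text) for each SPE line, dated by the latest date header."""
    year = month = day = None
    for line in lines:
        line = line.strip()
        header = DATE_RE.search(line)
        if header:
            yy, month, day = (int(g) for g in header.groups())
            year = 2000 + yy  # assumption: two-digit years are 20YY
            continue
        event = SPE_RE.match(line)
        if event and year is not None:
            hour, minute, second, millis = (int(g) for g in event.groups()[:4])
            yield datetime(year, month, day, hour, minute, second, millis * 1000), event.group(5)
```

Joining the text parts, e.g. `"".join(text for _, text in parse_spe_events(open(path)))`, gives the raw character stream exactly as logged, including repeated letters; any de-duplication or handling of deletions would need more knowledge of the log format than the excerpt provides.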

iOS only

Proloquo4Text

MultiPlatform

CoughDrop

From Brian:

"Extracting just strings from an obl file should be pretty easy. it's just JSON and you'd do the following pseudo-code:

  session['events'].each |event|
    if event['label']
      // only button events have labels
    end
  end
end

obla files would have gibberish in the label field for any words that weren't considered "core" for that user"
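Brian's pseudo-code translates fairly directly to Python. The sketch below is an illustration, not CoughDrop's own code: the `events` and `label` keys come from the quote, while the top-level `sessions` list and the function names are assumptions that should be checked against a real .obl file.

```python
import json


def extract_labels(log: dict) -> list[str]:
    """Collect the label of every event that has one, in file order."""
    labels = []
    for session in log.get("sessions", []):   # assumed top-level key
        for event in session.get("events", []):
            label = event.get("label")
            if label:  # only button events have labels
                labels.append(label)
    return labels


def extract_labels_from_file(path: str) -> list[str]:
    """Load an .obl file (plain JSON) and return its button labels."""
    with open(path, encoding="utf-8") as f:
        return extract_labels(json.load(f))
```

As the quote notes, for .obla files the labels of non-core words would come back as gibberish, so this is only useful for recovering text from .obl files.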

AAC Speech Assistant

iOS

Message History can be exported as a plain text file

Android

You can't export the message history

Predictable

??