Can't read root directory with `StorageStore`
Closed this issue · 13 comments
With the localStorage storage backend, built on top of SyncStore, we can't read the root directory.
This is because of the asymmetric serialization/deserialization of Uint8Array data into a string (a minimal repro follows the list below):
- serialization SyncStore.js:L40: data.toString() // "1,2,3,4..."
- deserialization SyncStore.js:L31: encode(data); // TextEncoder asciiToUint8
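A minimal repro of the asymmetry, standalone and outside of SyncStore (variable names are mine):
const data = new Uint8Array([1, 2, 3, 4]);
// serialization: Uint8Array.prototype.toString() joins the values with commas
const stored = data.toString();                    // "1,2,3,4"
// deserialization: TextEncoder encodes the characters of that string, not the original bytes
const restored = new TextEncoder().encode(stored); // Uint8Array [49, 44, 50, 44, 51, 44, 52]
console.log(restored.length === data.length);      // false: 7 !== 4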
stack:
h ApiError.js:99
FileError ApiError.js:65
ENOTDIR ApiError.js:80
getDirListing SyncStore.js:364
readdirSync SyncStore.js:251
BrowserFS filesystem.js:503
BrowserFS filesystem.js:9
E filesystem.js:5
readdir filesystem.js:502
Note: strings are stored as UTF-16, that's about 2 bytes per char.
Simple fix: (storage size = data length * 7) (about 2.57+1 ASCII chars per data byte)
- serialization SyncStore.js:L40:
data.toString() // "1,2,3,4..."
- deserialization SyncStore.js:L31:
Uint8Array.from(data.split(','), Number)
Base64 fix: (storage size = data length * 2.7) (about 1.33 ASCII chars per data byte)
- serialization:
window.btoa(Array.from(data, x=>String.fromCharCode(x)).join(''))
- deserialization:
Uint8Array.from(window.atob(data), x=>x.charCodeAt(0))
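A self-contained sketch of that base64 round trip (the function names are mine, for illustration):
// Serialize a Uint8Array into a base64 string.
function base64Encode(data) {
  return window.btoa(Array.from(data, x => String.fromCharCode(x)).join(''));
}
// Deserialize a base64 string back into a Uint8Array.
function base64Decode(string) {
  return Uint8Array.from(window.atob(string), x => x.charCodeAt(0));
}
// base64Decode(base64Encode(data)) gives back the original bytes.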
Utf16 fix: (storage size = data length * 1) (needs to handle odd data length)
itoc = (x)=>String.fromCodePoint(x>>11!==27?x:x|0x10000)
ctoi = (x)=>x.codePointAt(0)
// Uint16Array will discard the high bit
- serialization:
Array.from(new Uint16Array(data.buffer), itoc).join('')
- deserialization:
new Uint8Array(Uint16Array.from(data, ctoi).buffer)
With utf16, to handle odd data length:
- on serialization: (append the last odd byte and a count of the extra data bytes; be careful with LE/BE)
...new Uint16Array(data.buffer, 0, length/2 |0)... // discard odd last byte
- if (length%2), then append String.fromCodePoint(data[length-1] * 0x101, 0x303)
- else append String.fromCodePoint(0x202)
- on deserialization: (remove the extra bytes)
- len = data.length, extra = data[len -1]
- return new Uint8Array(data.buffer, 0, len - extra)
When manipulating a buffer, the byteOffset is usually 0.
But what if the byte offset is not 0, and especially if it is odd...
Then do ...new Uint16Array(data.buffer, offset - offset%2, (offset%2 + length)/2)...
and carry {o2: offset%2, e2: (offset%2 + length)%2 + 2} into the extra char:
- extraCodePoint = ((o2<<2) + (e2)) * 0x101
- len = data.length, extra = data[len -1] & 3, offset = data[len -1] >> 2
- return new Uint8Array(data.buffer, offset, len - extra - offset)
To avoid extra bugs, we could slice the buffer instead:
- return new Uint8Array(data.buffer.slice(offset, len - extra))
function utf16Encode(data8) {
const itoc = (x)=> String.fromCodePoint((x>>11 !== 27) ? x : x|0x10000);
  const btoc = (x)=> itoc(x * 0x101); // encode the byte twice to work around endianness
let {length, buffer, byteOffset, byteLength} = data8 ;
const pfx = byteOffset % 2 ; // extra start byte to embed
const sfx = (byteOffset + byteLength) % 2 ; // either odd and padding end bytes
byteOffset -= pfx ; // embed start byte
byteLength += pfx - sfx ; // ... and remove odd end byte
const data16 = new Uint16Array(buffer, byteOffset, byteLength/2);
const dataChars = Array.from(data16, itoc);
const lastChar = btoc(data8[length - 1]); // odd end byte
const extraChar = btoc(4*pfx + sfx + 2); // padding bytes count
if (sfx) dataChars.push(lastChar);
dataChars.push(extraChar);
return dataChars.join('');
}
function utf16Decode(string) {
const ctoi = (x)=> x.codePointAt(0); // Uint16Array will trunc extra high bit
const data16 = Uint16Array.from(string, ctoi);
const data8 = new Uint8Array(data16.buffer);
let {length, buffer, byteOffset, byteLength, byteEnd} = data8 ;
  const pfx = data8[length - 1] / 4 |0; // embedded extra start byte
const sfx = data8[length - 1] % 4 |0; // duplicate odd end byte, and extra word
byteOffset += pfx ;
byteLength -= pfx + sfx ;
byteEnd = byteOffset + byteLength ;
return new Uint8Array(buffer, byteOffset, byteLength); // performance
// return new Uint8Array(buffer.slice(byteOffset, byteEnd)); // compatibility
}
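A usage sketch against localStorage (the key name is mine, for illustration):
// Store and reload binary inode data through localStorage.
const inodeData = new Uint8Array([0, 16, 0, 0, 0xA4, 0x41, 0xD8, 0xFF]);
localStorage.setItem('/inode', utf16Encode(inodeData));   // ~1 UTF-16 char per 2 data bytes
const reloaded = utf16Decode(localStorage.getItem('/inode'));
console.log(reloaded.length === inodeData.length
  && reloaded.every((v, i) => v === inodeData[i]));       // true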
Thanks for the report. We'd appreciate a PR!
The encoding/decoding of Uint8Arrays was fixed in 2054a8e, which was not published. I went ahead and published all of the changes on core up until now (core@0.1.0), and I've updated all of the backends to use the new core version as well as published those changes.
Your issue with encoding/decoding should be fixed in v0.0.5.
As for UTF16: I don't think BFS supports using UTF16 with backends, since some (e.g. zip) do not support that encoding, and all the backends must implement the internal BFS API.
Does this resolve your issue?
The encoding/decoding of Uint8Array in core is about serializing a directory object to Uint8Array inode data.
Here it is about encoding Uint8Array inode data into a string.
I also use your core fix in my code and still have this bug, so that fix doesn't resolve this issue, which is specific to fs-dom SyncStore.js.
And UTF16 is a specific concern of localStorage. (Most people don't think about it because the implementation details are not really publicized, but it is a problem for large files or large filesystems.)
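A rough way to see the UTF-16 cost, assuming the usual model where localStorage counts about 2 bytes per stored code unit:
// Approximate localStorage footprint of one entry, in bytes.
function approxStorageBytes(key, value) {
  return 2 * (key.length + value.length); // each UTF-16 code unit ~ 2 bytes
}
// Comma-separated decimal costs about 2 * 3.57 ~ 7 quota bytes per data byte,
// base64 about 2 * 1.33 ~ 2.7, and the UTF-16 packing above about 1.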
I have to code a test suite for my code proposition, and then integrate it into my code for a live test.
After that I could do a git clone and a pull request, it is simple.
(Maybe one or two weeks, depending on my planning.)
About perf/compat, I think anybody switching between TypedArray and ArrayBuffer should learn about the {byteOffset, byteLength} concern, so I prefer the performance solution (and I have updated my osjs adapter accordingly; it is not so terrible):
((length===byteLength ? uint8array.buffer : uint8array.buffer.slice(byteOffset, byteEnd)))
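A minimal sketch of that concern (the variable names are mine):
// A typed-array view does not necessarily cover its whole underlying buffer.
const ab = new ArrayBuffer(16);
const view = new Uint8Array(ab, 4, 8);       // byteOffset = 4, byteLength = 8
// view.buffer is the whole 16-byte buffer, so copy only the window when they differ:
const exact = (view.byteLength === view.buffer.byteLength)
  ? view.buffer
  : view.buffer.slice(view.byteOffset, view.byteOffset + view.byteLength);
console.log(view.buffer.byteLength, exact.byteLength);    // 16 8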
I saw your core fix 2 weeks ago, and I thought you had not published it because some other fixes were still to be tested. Thank you.
Test:
/**
* intData: ... < 128
* surrogateData: 0 or D8-DF
* mixedData: 0...255 including D8-DF
* randomData: any
*/
surrogateData = // 0 or D8-DF
[ '',
'D800', '00D9',
'DADB', '00DCDE00',
'D8D9DA00', '00DBDCDE',
'D8D9DADB', '00DCDEDFD800',
'D8D9DADBDC00','00DEDFD8D9DA',
'D8D9DADBDCDE','00DFD8D9DADBDC00',
''
].join('0000'+'0000')
.match(/../g).map(x=>parseInt(x,16))
intData = Array.from(surrogateData, (v,i)=>i);
mixedData = Array.from(surrogateData, (v,i)=>v||i*17);
randomData = Array.from(surrogateData, (v,i)=>v||Math.random()*256|0);
assert=(test,msg)=>{
if (test) return;
debugger;
throw new Error(`Assertion failed: ${msg}`)
}
test=(data)=>{
buffer = Uint8Array.from(data).buffer;
len = data.length;
nbSurrogate = new Uint16Array(buffer).reduce(
(a,v)=>((v>>11===27)?a+1:a), 0
);
for(i=0;i<8;i++)
for(j=0;j<8;j++)
{
input = new Uint8Array(buffer,i,len-i-j);
serial = utf16Encode(input)
output = utf16Decode(serial)
assert(`${input}`===`${output}`, 'equality')
// extra bytes:
// - counter: 2 bytes
// - padding: 2 bytes
// - surrogate for copied last byte: 2 bytes
// - other surrogate: 2*nbSurrogate
assert(2*serial.length >= input.length+2, 'too short')
assert(2*serial.length <= input.length+6+2*nbSurrogate, 'too long')
}
}
test(intData);
test(surrogateData);
test(mixedData);
test(randomData);
"succes"
Edit: the utf16Encode/utf16Decode above have been fixed
==> the test now passes
To be exhaustive, the serialized data should be written into localStorage on a first pass;
then, after closing and reopening the browser, we should reload the data to finish the test....
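A sketch of that two-phase check (the key name is mine; run once, restart the browser, then run again):
const KEY = 'utf16-roundtrip-test';
const expected = Uint8Array.from(randomData);
if (localStorage.getItem(KEY) === null) {
  // first pass: serialize and persist
  localStorage.setItem(KEY, utf16Encode(expected));
  console.log('phase 1 done: restart the browser and run again');
} else {
  // second pass: reload and compare
  const reloaded = utf16Decode(localStorage.getItem(KEY));
  console.log(`${reloaded}` === `${expected}` ? 'persisted round trip OK' : 'FAILED');
}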
TextEncoder/TextDecoder are about Unicode strings, and don't apply to arbitrary binary data.
If you decode raw data, TextDecoder will either throw or correct bad Unicode sequences;
and then in turn, TextEncoder will encode different data.
Try that:
input = new Uint8Array([192,192,192]);
string = new TextDecoder().decode(input);
output = new TextEncoder().encode(string);
({input, output, string})
Output: Uint8Array(9) [ 239, 191, 189, … ]
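The same input decoded with the fatal option (a standard TextDecoder option) throws instead of substituting replacement characters:
try {
  new TextDecoder('utf-8', { fatal: true }).decode(new Uint8Array([192, 192, 192]));
} catch (e) {
  console.log(e.name); // "TypeError": 0xC0 is not a valid UTF-8 sequence
}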
TextEncoder round trips only 7-bit ASCII or otherwise valid UTF-8, not raw binary.
And my code (x>>11 !== 27) ? ... : ...
is there to avoid emitting such invalid UTF-16 code units (lone surrogates).
My apologies, I fixed the usage of Uint8Array.toString() in the core but completely missed it being used in StorageStore.put. I've made the fix in 99e92c7. Does v0.0.6 fix the issue?
No, still Error: Error: ENOTDIR: File is not a directory., '/'
but the inode data is completely different:
previous inode mode: 12332 / 0x302c
new inode mode: 49135 / 0xbfef
stored inode data : "\u0000\u0010\u0000\u0000�A\u0000���\u0001�xB\u0000���\u0001�xB\u0000���\u0001�xB\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000de4547d9-906a-42e4-9db6-c804a84d1b00"
id of deserialized inode: "xB\u0000���\u0001�xB\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000de4547d9-906a-42e4-9db6-c804a84d1b00"
As you can see, the ASCII data (the node id appended at the end of the data buffer) is well encoded, but the raw binary (the inode numbers) results in different and too large (>38) output data.
This should be fixed by 792cefe on the core. I will close the issue after I've updated all of the DOM backends to use the latest internal API (probably 0.2.0).
That should work with 'hex' or 'base64' encoding, but storage is inefficient (it will consume 2.7x more storage than raw binary data).
This was an issue with the core that was resolved.