CDMI Extensions - S3 Exports
dslik opened this issue · 11 comments
Extend the CDMI export functionality to allow a CDMI client to specify that a given container within a CDMI namespace should be exported (made available) as a bucket for access by the S3 cloud data access protocols. This is analogous to existing export functionality for CIFS and NFS exports.
Changes to the spec involve:
- Indicating that the export type is S3
- Adding a new "Bucket" section to the Exports clause
- Defining what parameters need to be added to define a bucket export
- Documenting mapping behaviours between bucket key/value semantics and CDMI hierarchical semantics
- Investigate if any additional information is needed to express permission mapping
Also see proposed extension #248 which addresses a data mapping issue between buckets and CDMI's hierarchical data model
Will also have to look to see if there are any namespace issues (plus security changes)
Requirements:
- As a cloud storage manager, I want to be able to export all or part of CDMI namespaces via the S3 protocol, so that S3 clients can access and store data.
- As a cloud storage manager, I want users to be able to export all or part of CDMI namespaces via the S3 protocol, so that their S3 clients can access and store data.
- As a cloud storage manager, I want to be able to configure and restrict how S3 clients are able to access data within a CDMI namespace, so that I can meet security and data integrity objectives.
- As a cloud storage user, I want to be able to use standard S3 tools to access cloud data, so that I have access to a wide variety of tools and workflows.
- As a cloud storage user, I want to be able to have access to the same data via file system and S3 interfaces, so that I can use both file system and object storage tools and utilities.
Notes:
- Read only flag?
- Two ways to access buckets: via a bucket name at the end of a URL, or via a bucket name in a domain name. Do we need to allow specifying which type (or both) is used?
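To make the two addressing styles in the last note concrete, here is a minimal Python sketch that builds both URL forms for one bucket. The endpoint host `s3.example.com`, the bucket name, and the helper `bucket_urls` are hypothetical and chosen only for illustration; an actual export would use whatever endpoints the CDMI server advertises.

```python
from urllib.parse import quote

def bucket_urls(endpoint_host: str, bucket: str, key: str = "") -> dict:
    """Return both addressing styles for one bucket (illustrative helper)."""
    # Keep "/" unescaped so the key still reads as a prefix/delimiter path.
    encoded_key = quote(key, safe="/")
    return {
        # Bucket name at the start of the URL path.
        "path_style": f"https://{endpoint_host}/{bucket}/{encoded_key}",
        # Bucket name as the leftmost label of the host name.
        "virtual_hosted_style": f"https://{bucket}.{endpoint_host}/{encoded_key}",
    }

urls = bucket_urls("s3.example.com", "my-bucket", "reports/2024 q1.txt")
for style, url in urls.items():
    print(style, url)
```

An export definition could then carry a flag saying which of the two forms (or both) the server should answer on.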
From 2024-01-26 TWG meeting:
- Need to address differences in operations permitted between file system protocols and S3, e.g. partial updates.
- Need to address differences in permitted resource names (e.g. S3 allows an object named "//./*", but a filesystem does not).
There is also wide variability of permitted object names across different S3 implementations.
To do:
- Add requirements for permission mapping (e.g. read-only)
- Add requirement for specifying where the bucket is accessible (via which URIs)
I checked a few OSes. It seems Ubuntu bash shells and Python do support UNC paths using the asterisk as a wildcard.
The Windows cmd prompt does not like UNC paths.
Windows cmd:
cd \\.\
'\\.\'
CMD does not support UNC paths as current directories.
dir \\.\
The filename, directory name, or volume label syntax is incorrect.
>dir \\.\*
The system cannot find the path specified.
Windows Powershell:
PS C:\Users\garym> dir //./
dir : Cannot find path '//./' because it does not exist.
At line:1 char:1
+ dir //./
+ ~~~~~~~~
+ CategoryInfo : ObjectNotFound: (//./:String) [Get-ChildItem], ItemNotFoundException
+ FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetChildItemCommand
PS C:\Users\garym> dir //./*
Get-ChildItem : Cannot retrieve the dynamic parameters for the cmdlet. Object reference not set to an instance of an
object.
At line:1 char:1
+ dir //./*
+ ~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [Get-ChildItem], ParameterBindingException
+ FullyQualifiedErrorId : GetDynamicParametersException,Microsoft.PowerShell.Commands.GetChildItemCommand
Ubuntu on Windows Bash Shell:
root@EARTH:~# cd //./
root@EARTH://# cd //./
root@EARTH://# ls //./
Docker dev init lib64 media proc sbin sys var
bin etc lib libx32 mnt root snap tmp
boot home lib32 lost+found opt run srv usr
root@EARTH://# ls //./* |more
//./init
//./Docker:
host
//./bin:
NF
VGAuthService
X11
....
Ubuntu 20 bash shell:
garym@pro:~$ cd //./
garym@pro://$ ^C
garym@pro:~$ ls //./
bin core home lib64 media proc sbin swap.img usr
boot dev lib libx32 mnt root snap sys var
cdrom etc lib32 lost+found opt run srv tmp
garym@pro://$ ls //./* |more
//./core
//./swap.img
...
garym@pro://$ python
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> print (os.listdir ('//./'))
['opt', 'snap', 'lib64', 'etc', 'var', 'tmp', 'lost+found', 'boot', 'sys', 'run', 'home', 'lib', 'root', 'libx32', 'swap.img', 'srv', 'core', 'media', 'usr', 'dev', 'lib32', 'mnt', 'bin', 'proc', 'cdrom', 'sbin']
>>> arr = next(os.walk('//./'))[2]
>>> print (arr)
['swap.img', 'core']
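One possible reading of the bash and Python results above is that POSIX path handling collapses `//./` to the root directory rather than treating it as a name. The sketch below uses `posixpath.normpath`, which is only a lexical approximation of what the kernel does during path resolution, so treat it as an illustration and not as proof.

```python
import posixpath

# posixpath is used explicitly so the result is the same on any host OS.
for path in ("//./", "///./", "/./"):
    print(repr(path), "->", repr(posixpath.normpath(path)))

# '//./'  -> '//'   (POSIX lets exactly two leading slashes be preserved)
# '///./' -> '/'
# '/./'   -> '/'
```

If that reading is right, an S3 key such as "//./*" cannot be stored under its literal name on such a file system, since "/" and "." segments are consumed by path resolution instead of being kept as part of the name.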
In CDMI, a "path based namespace" is defined as:
A root path (which is by default "/"), plus "one or more container names that are separated by forward slashes (“/”) and that end with a forward slash (“/”)", plus an optional data object name, plus an optional "?" if the path is a link.
We place no restrictions except as documented in section 5.5.6, that "/" and "?" shall not be permitted in an object name.
A trailing question mark in a CDMI path refers to a link. This is stripped out by most web libraries, as RFC 3986 says this is the separator between the path and the query parameters.
S3 does allow for object names to include "/" and "?", so we will need to define how these are mapped.
So do we need to handle file names containing "?" by percent-encoding them? Percent-encoding would also be used for "/", "*", etc.
From RFC 3986, section 2.2, the reserved characters in URIs that are percent-encoded are:
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
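As a quick illustration of what percent-encoding these reserved characters looks like, here is a short Python sketch using the standard library's `urllib.parse.quote`. The helper name `encode_name` is hypothetical; passing `safe=""` is a choice made here so that "/" is encoded as well, which is not the function's default.

```python
from urllib.parse import quote, unquote

GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="

def encode_name(name: str) -> str:
    """Percent-encode every character except RFC 3986 unreserved ones."""
    return quote(name, safe="")

# Show the encoded form of each reserved character.
for ch in GEN_DELIMS + SUB_DELIMS:
    print(ch, encode_name(ch))

# The S3 object name from the TWG notes survives a round trip.
encoded = encode_name("//./*")
print(encoded, "->", unquote(encoded))
```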
Gary M. to explore mapping requirements from S3 object names to CDMI object paths.
- Also, how should we handle escaping issues, e.g. \n, \u1234, etc? Yes, we need to investigate this as well.
It looks like the bash shell will send the '\.' character sequence to the file system using escape patterns. The file system does create the directories.
Make Directory:
garym@yocto:$ mkdir \$ mkdir \\
garym@yocto:
garym@yocto:$ mkdir \\.$ mkdir \\.\
garym@yocto:
List Directory (No Path Specifiers)
garym@yocto:~$ ls -al
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 .
drwxr-xr-x 3 root root 4096 Jul 26 2022 ..
drwxrwxr-x 2 garym garym 4096 Feb 8 19:20 ''
drwxrwxr-x 2 garym garym 4096 Feb 8 19:21 '\'
drwxrwxr-x 2 garym garym 4096 Feb 8 19:21 '\.'
drwxrwxr-x 2 garym garym 4096 Feb 8 19:22 '\.'
List Directory (Path Specifiers)
garym@yocto:~$ ls -al \
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:20 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..
garym@iyocto:~$ ls -al \\
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:21 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..
garym@yocto:~$ ls -al \\.
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:21 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..
garym@yocto:~$ ls -al \\.\
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:22 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..
List Directory (Wildcard Path Specifiers)
garym@halevaiyocto:~$ ls -al \*
'':
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:20 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..
'\':
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:21 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..
'\.':
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:21 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..
'\.':
total 8
drwxrwxr-x 2 garym garym 4096 Feb 8 19:22 .
drwxr-xr-x 33 garym garym 4096 Feb 8 19:22 ..
Change Directory:
garym@yocto:$ cd \\./\.$
garym@yocto:
Create File:
garym@yocto:$ sudo touch \\$ sudo touch \\.
garym@yocto:
garym@yocto:$ sudo touch \\.\$ ls -l \*
garym@yocto:
-rw-r--r-- 1 root root 0 Feb 8 19:46 ''
-rw-r--r-- 1 root root 0 Feb 8 19:46 '\'
-rw-r--r-- 1 root root 0 Feb 8 19:46 '\.'
-rw-r--r-- 1 root root 0 Feb 8 19:46 '\.'
After a review of the above, we have determined that we will need to define a reversible mapping between the allowable S3 object naming restrictions and common file system naming restrictions. E.g. for an S3 object named "/*?/", etc.
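One way such a reversible mapping could look is sketched below in Python. This is not the mapping the extension defines; it is an assumption-laden example that flattens each S3 key into a single name by percent-encoding, so that "/" and "?" never appear in the result. The function names `s3_key_to_name` and `name_to_s3_key` are hypothetical.

```python
from urllib.parse import quote, unquote

def s3_key_to_name(key: str) -> str:
    """Map an S3 key to a single name containing no "/" or "?" (sketch)."""
    # "%" itself is encoded too, which is what keeps the mapping reversible.
    name = quote(key, safe="")
    # "." and ".." are legal S3 keys but reserved names in a file system.
    if name in (".", ".."):
        name = name.replace(".", "%2E")
    return name

def name_to_s3_key(name: str) -> str:
    """Inverse of s3_key_to_name."""
    return unquote(name)

for key in ("/*?/", "//./*", ".", "..", "a%2Fb", "a b"):
    name = s3_key_to_name(key)
    print(repr(key), "->", repr(name), "->", repr(name_to_s3_key(name)))
```

This sketch deliberately ignores hierarchy (every key becomes one flat name) and name-length limits, since encoding can triple the length of a key. Both would need to be addressed in a real mapping.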
Proposed capabilities:
Cloud storage systemwide capabilities - Add to section 12.2.7, Table 124
cdmi_containers - "If present and "true", the CDMI server supports container objects."
cdmi_dataobjects_as_containers - "If present and "true", the CDMI server supports accessing data objects as container objects."
cdmi_containers_as_dataobjects - "If present and "true", the CDMI server supports accessing container objects as data objects."
Data Object Capability - Add to section 12.2.10, Table 127
cdmi_as_container - If present and “true”, this capability indicates that the CDMI server shall support the ability to access the data object as a container.
Container Capability - Add to section 12.2.11, Table 128
cdmi_as_dataobject - If present and “true”, this capability indicates that the CDMI server shall support the ability to access the container as a data object.
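A client-side check against these proposed capabilities might look like the following Python sketch. The capability names come from the proposal above; the dictionaries stand in for parsed capability objects, and the rule that both the system-wide and the per-object capability must be "true" is an assumption made for this example, not something the proposal states.

```python
def supports(capabilities: dict, name: str) -> bool:
    """CDMI capability values are strings, so compare against "true"."""
    return capabilities.get(name) == "true"

def can_access_as_container(system_caps: dict, object_caps: dict) -> bool:
    # Assumption: require both the system-wide and the object capability.
    return (supports(system_caps, "cdmi_dataobjects_as_containers")
            and supports(object_caps, "cdmi_as_container"))

# Hypothetical capability values for illustration only.
system_caps = {"cdmi_containers": "true",
               "cdmi_dataobjects_as_containers": "true"}
object_caps = {"cdmi_as_container": "true"}

print(can_access_as_container(system_caps, object_caps))
print(can_access_as_container(system_caps, {}))
```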
An open issue we need to discuss: S3 permits the following two separate objects to coexist in a bucket: "a", and "a/", each with a separate value. In CDMI, these would be the same object. Do we need to have a GET as container for "a" return the container object representation for "a/"?
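To get a feel for how often this would matter in an existing bucket, a sketch like the one below could scan a key listing for pairs that differ only by a trailing "/". The function name `find_collisions` and the sample keys are hypothetical.

```python
from collections import defaultdict

def find_collisions(keys):
    """Group S3 keys that would name the same CDMI object (sketch)."""
    groups = defaultdict(list)
    for key in keys:
        # Strip at most one trailing "/" so "a" and "a/" share a group.
        base = key[:-1] if key.endswith("/") else key
        groups[base].append(key)
    return {base: sorted(ks) for base, ks in groups.items() if len(ks) > 1}

print(find_collisions(["a", "a/", "b", "c/", "a/x"]))
```

With the sample listing, only "a" and "a/" are reported; "a/x" is a child of "a/" and does not collide.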
Latest draft extension: https://github.com/SNIA/CDMI-spec/blob/main/cdmi_extensions/s3_exports/s3_exports_2.0.0.pdf
AWS S3 Constraints
Data Model:
- The Amazon S3 data model is a flat structure
- You create a bucket, and the bucket stores objects.
- There is no hierarchy of subbuckets or subfolders.
- You can infer logical hierarchy using key name prefixes and delimiters as the Amazon S3 console does.
AWS S3 has three (3) sets of rules/constraints placed on object stores:
- S3 bucket names
- S3 Object Names
- S3 Key Names
Bucket Naming Constraints:
- Bucket names must be between 3 (min) and 63 (max) characters long.
- Bucket names can consist only of lowercase letters, numbers, dots (.), and hyphens (-).
- Bucket names must begin and end with a letter or number.
- Bucket names must not contain two adjacent periods.
- Bucket names must not be formatted as an IP address (for example, 192.168.5.4).
- Bucket names must not start with the prefix xn--.
- Bucket names must not start with the prefix sthree- or the prefix sthree-configurator.
- Bucket names must not end with the suffix -s3alias. This suffix is reserved for access point alias names. For more information, see Using a bucket-style alias for your S3 bucket access point.
- Bucket names must not end with the suffix --ol-s3. This suffix is reserved for Object Lambda Access Point alias names. For more information, see How to use a bucket-style alias for your S3 bucket Object Lambda Access Point.
- Bucket names must be unique across all AWS accounts in all the AWS Regions within a partition. A partition is a grouping of Regions. AWS currently has three partitions: aws (Standard Regions), aws-cn (China Regions), and aws-us-gov (AWS GovCloud (US)).
- A bucket name cannot be used by another AWS account in the same partition until the bucket is deleted.
- Buckets used with Amazon S3 Transfer Acceleration can't have dots (.) in their names. For more information about Transfer Acceleration, see Configuring fast, secure file transfers using Amazon S3 Transfer Acceleration.
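The syntactic rules in this list can be checked mechanically. Below is a Python sketch of such a check, which a CDMI server might run before accepting a container name as a bucket name. The function name `bucket_name_problems` is hypothetical, and the sketch does not cover the uniqueness or Transfer Acceleration rules, which cannot be decided from the name alone.

```python
import re

ALNUM = set("abcdefghijklmnopqrstuvwxyz0123456789")

def bucket_name_problems(name: str) -> list:
    """Return the list of naming rules that a candidate name violates."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("length must be 3-63 characters")
    if not re.fullmatch(r"[a-z0-9.-]*", name):
        problems.append("only lowercase letters, numbers, dots, hyphens")
    if name and (name[0] not in ALNUM or name[-1] not in ALNUM):
        problems.append("must begin and end with a letter or number")
    if ".." in name:
        problems.append("must not contain two adjacent periods")
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", name):
        problems.append("must not be formatted as an IP address")
    # "sthree-" also covers the longer "sthree-configurator" prefix.
    if name.startswith(("xn--", "sthree-")):
        problems.append("must not start with a reserved prefix")
    if name.endswith(("-s3alias", "--ol-s3")):
        problems.append("must not end with a reserved suffix")
    return problems

for candidate in ("my-bucket.logs", "192.168.5.4", "My_Bucket", "xn--abc"):
    print(candidate, bucket_name_problems(candidate))
```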
Directory bucket naming Constraints
Directory bucket names must:
- Be unique within the chosen AWS Region and Availability Zone.
- Be between 3 and 63 characters long, including the suffix.
- Consist only of lowercase letters, numbers and hyphens (-).
- Begin and end with a letter or number.
- Include the following suffix: --azid--x-s3.
Object key naming Constraints
- The object key name is a sequence of Unicode characters with UTF-8 encoding of up to 1,024 bytes long. Object key names are case sensitive.
- Object key names with the value "soap" aren't supported for virtual-hosted-style requests. For object key name values where "soap" is used, a path-style URL must be used instead.
- The console uses the key name prefixes (Development/, Finance/, and Private/) and delimiter ('/') to present a folder structure.
- Amazon S3 supports buckets and objects, and there is no hierarchy. However, by using prefixes and delimiters in an object key name, the Amazon S3 console and the AWS SDKs can infer hierarchy and introduce the concept of folders.
- The Amazon S3 console implements folder object creation by creating a zero-byte object with the folder prefix and delimiter value as the key. These folder objects don't appear in the console. Otherwise they behave like any other objects and can be viewed and manipulated through the REST API, AWS CLI, and AWS SDKs.
- Objects with key names ending with period(s) "." downloaded using the Amazon S3 console will have the period(s) "." removed from the key name of the downloaded object.
- To download an object with the key name ending in period(s) "." retained in the downloaded object, you must use the AWS Command Line Interface (AWS CLI), AWS SDKs, or REST API.
- Objects with a prefix of "./" must be uploaded or downloaded with the AWS Command Line Interface (AWS CLI), AWS SDKs, or REST API. You cannot use the Amazon S3 console.
- Objects with a prefix of "../" cannot be uploaded using the AWS Command Line Interface (AWS CLI) or Amazon S3 console.
- Safe characters:
The following character sets are generally safe for use in key names:
- Alphanumeric characters: 0-9, a-z, A-Z
- Special characters: Exclamation point (!), Hyphen (-), Underscore (_), Period (.), Asterisk (*), Single quote ('), Open parenthesis ((), Close parenthesis ())
- Characters that might require special handling:
The following characters in a key name might require additional code handling and likely need to be URL encoded or referenced as HEX. Some of these are non-printable characters that your browser might not handle, which also requires special handling:
- Ampersand ("&")
- Dollar ("$")
- ASCII character ranges 00–1F hex (0–31 decimal) and 7F (127 decimal)
- 'At' symbol ("@")
- Equals ("=")
- Semicolon (";")
- Forward slash ("/")
- Colon (":")
- Plus ("+")
- Space – Significant sequences of spaces might be lost in some uses (especially multiple spaces)
- Comma (",")
- Question mark ("?")
Characters to avoid:
- Backslash ("\")
- Left curly brace ("{")
- Non-printable ASCII characters (128–255 decimal characters)
- Caret ("^")
- Right curly brace ("}")
- Percent character ("%")
- Grave accent / back tick ("`")
- Right square bracket ("]")
- Quotation marks
- 'Greater Than' symbol (">")
- Left square bracket ("[")
- Tilde ("~")
- 'Less Than' symbol ("<")
- 'Pound' character ("#")
- Vertical bar / pipe ("|")
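The three character groups above, together with the 1,024-byte limit, could be turned into a simple key classifier, sketched here in Python. The function name `classify_key` and the report layout are hypothetical. Treating "Quotation marks" as the double quote and treating code points 128 to 255 as the "avoid" range are interpretations made for this sketch.

```python
SAFE_SPECIALS = set("!-_.*'()")
SPECIAL_HANDLING = set("&$@=;/:+ ,?")
AVOID = set('\\{}^%`]"><[~#|')

def classify_key(key: str) -> dict:
    """Report which characters in a key fall outside the safe set."""
    special, avoid, unlisted = set(), set(), set()
    for ch in key:
        code = ord(ch)
        if (ch.isascii() and ch.isalnum()) or ch in SAFE_SPECIALS:
            continue
        if ch in SPECIAL_HANDLING or code < 0x20 or code == 0x7F:
            special.add(ch)
        elif ch in AVOID or 0x80 <= code <= 0xFF:
            avoid.add(ch)
        else:
            unlisted.add(ch)  # not mentioned in any of the three lists
    return {
        # The limit applies to the UTF-8 encoded length, not the char count.
        "too_long": len(key.encode("utf-8")) > 1024,
        "special_handling": sorted(special),
        "avoid": sorted(avoid),
        "unlisted": sorted(unlisted),
    }

print(classify_key("Development/report 1.txt"))
print(classify_key("a{b}"))
```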
XML related object key constraints
As specified by the XML standard on end-of-line handling, all XML text is normalized such that single carriage returns (ASCII code 13) and carriage returns immediately followed by a line feed (ASCII code 10) are replaced by a single line feed character. To ensure the correct parsing of object keys in XML requests, carriage returns and other special characters must be replaced with their equivalent XML entity code when they are inserted within XML tags. The following is a list of such special characters and their equivalent entity codes:
- ' as &apos;
- " as &quot;
- & as &amp;
- < as &lt;
- > as &gt;
- \r as &#13; or &#x0D;
- \n as &#10; or &#x0A;
The following example illustrates the use of an XML entity code as a substitution for a carriage return. This DeleteObjects request deletes an object with the key parameter: /some/prefix/objectwith\rcarriagereturn (where the \r is the carriage return).
<Delete xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Object>
<Key>/some/prefix/objectwith&#13;carriagereturn</Key>
</Object>
</Delete>
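A gateway that builds such request bodies would need to apply these substitutions to every key. The following Python sketch shows one way to do it. The helper name `escape_key_for_xml` is hypothetical, and the mapping covers only the characters named in the list above (using the decimal form for \r and \n).

```python
XML_ENTITIES = {
    "&": "&amp;",
    "'": "&apos;",
    '"': "&quot;",
    "<": "&lt;",
    ">": "&gt;",
    "\r": "&#13;",
    "\n": "&#10;",
}

def escape_key_for_xml(key: str) -> str:
    """Replace special characters one at a time, so "&" is never re-escaped."""
    return "".join(XML_ENTITIES.get(ch, ch) for ch in key)

key = "/some/prefix/objectwith\rcarriagereturn"
body = (
    '<Delete xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
    f"<Object><Key>{escape_key_for_xml(key)}</Key></Object>"
    "</Delete>"
)
print(body)
```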