Crayon: Preprocessing tags update

Question

Opened this issue 4 years ago · 14 comments

This affects files and folders in the assets directory with these formats

name.crayon_img
name.crayon_img.gz
name.TEXTURE_FORMAT.crayon_spritesheet
name.TEXTURE_FORMAT.png
name.crayon_anim
(And later on #190)

The current method is a bit hard to read and involves renaming files. I suggest this instead.

In every folder we check for a crayon_info.txt file. If present we read that to understand how to modify the files within. It would be divided into sections by platform and each line under a platform will be in the format of
name command parameters

Here's an example

DREAMCAST:
My_Spritesheet DTEX [Format ID (This is a dreamcast)] [Other parameters such as --c for compression]

PC:
My_Spritesheet BITMAP [Pixel format (ARGB8888 by default)] //This will convert the PNG into an uncompressed bitmap with a valid header

ALL:
My_Romdisk ROMDISK [--GZ (If you want to GZ compress it)]

In this case both My_Spritesheet and My_Romdisk are folders. On Dreamcast we only make the DTEX texture, but on PC we make a normal uncompressed bitmap. On both/all platforms we would turn My_Romdisk into a romdisk (That could be compressed).

Answer 1 · 2020-06-28T13:30:35.000Z

Add some commands that take a PNG and make the

VMU LCD image
Dreamcast Savefile Icon
Dreamcast Eyecatcher Icon

All of these would be output as a .bin. That way, when I later support other systems, I can easily convert the PNG to whatever binary format I need. (Eg. I have one icon loader, but on DC the binary is in the ARGB4444 format, on GameCube it would be in a different format, and on say PS2 the PNG is converted into a 1-frame 3D model of the icon.)

Answer 2 · 2020-11-18T14:26:11.000Z

The stuff from the 28th of June comment, issue #254 and issue #190 will NOT be a part of this issue and will be completed at a later date. To summarise, these are the parts and an overview of their behavior

name is either a file name or a folder name of something in the current directory
command is the kind of command we want to run. We can choose from the following
- ROMDISK (Formerly "crayon_img")
  - Folders only
  - Will call the "genromfs" program in KOS to make a romdisk out of the folder
  - If the parameters part contains "--gz" then also gz compress it. This will replace "crayon_img.gz"
- DTEX (Formerly TEXTURE_FORMAT / TEXTURE_FORMAT.crayon_spritesheet)
  - Both folders and image files
  - For files, it will run the "texconv" program with the given parameters. Only "format" is required, any other parameter is optional
  - For folders, it first runs the "texturepacker" program on the folder to make a png spritesheet, then it runs the "texconv" program on that png with the given parameters. Note that the parameters are entirely for the "texconv" program and not for the "texturepacker" program.
- ANIMATION (Formerly the "name.crayon_anim" file)
  - Files only, and specifically only files in a folder that we ran DTEX on
  - Parameters: --width / --w, --height / --h, --frames / --f. All 3 parameters are required and would create the name.crayon_anim file we currently have. The file is just one line with 3 numbers: ("%d %d %d", WIDTH, HEIGHT, FRAMES)
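
As a rough illustration of the ANIMATION command described above, here is a minimal Python sketch that parses the three required parameters and builds the one-line content of name.crayon_anim. The function name `build_anim_line` is hypothetical, and using argparse here is an assumption, not something the issue has settled on.

```python
import argparse


def build_anim_line(params):
    """Parse ANIMATION parameters and return the one-line .crayon_anim content.

    `params` is the already-tokenised parameter list from a rule line.
    """
    # allow_abbrev=False so that only the exact long/short spellings are accepted.
    parser = argparse.ArgumentParser(prog="ANIMATION", allow_abbrev=False)
    parser.add_argument("--width", "--w", type=int, required=True)
    parser.add_argument("--height", "--h", type=int, required=True)
    parser.add_argument("--frames", "--f", type=int, required=True)
    args = parser.parse_args(params)
    # Same layout as the existing file: "%d %d %d" with WIDTH, HEIGHT, FRAMES.
    return "%d %d %d" % (args.width, args.height, args.frames)
```

If any of the three parameters is missing, argparse prints a usage message and exits with a non-zero status, which roughly matches the "all 3 are required" behaviour.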

Also instead of this file being called crayon_info.txt, maybe it should be called .crayon_asset_info? What do you think @Namdrib ?

Answer 3 · 2020-11-18T14:42:00.000Z

That clears it up quite a bit

Question: is it always going to be the format:

DREAMCAST:
Blah blah blah

PC:
Blah blah blah

ALL:
Blah blah blah

To be clear, this is asking whether it will always have the Dreamcast, PC and ALL rules, and always in that order

Or can it come with any combination (eg just dreamcast and all, or just pc, or just all, etc) and in any order?

And how much input validation should the script be doing? Eg should it just throw an error and exit if the format is wrong, etc.

Answer 4 · 2020-11-18T15:10:49.000Z

It's OK if the file only has some platforms, in whatever order. Something like this is also fine

ALL:
Blah blah blah

DREAMCAST:
Blah blah blah

The script would read line-by-line. It will ignore blank lines and when it comes across a PLATFORM: line it will either do or ignore the commands under it depending on what platform you are building. If it comes across a Blah blah blah line before the first platform line then it throws an error. Eg:


Blah blah blah

DREAMCAST:
Blah blah blah

The definition of a Blah blah blah line is we tokenise it into 3 parts (name command parameters). If name isn't a valid file/folder name, command isn't a valid command or there's just too few parts (Eg. the line is just my_file.png. No command, therefore invalid) then we throw an error.

Don't worry about checking if parameters is valid, since the programs called should do that job for us. Just make sure to check those programs' return values for erroneous behaviour.
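
The line-by-line behaviour described above could be sketched roughly like this in Python. This is only an illustration: `parse_info` and `VALID_COMMANDS` are hypothetical names, the command list is taken from the earlier comment, splitting on whitespace ignores the "names with spaces" problem, and the check that name is a real file/folder is left out.

```python
# Assumed command set, taken from the overview earlier in this issue.
VALID_COMMANDS = {"ROMDISK", "DTEX", "ANIMATION"}


def parse_info(text, platform):
    """Return the (name, command, parameters) rules that apply to `platform`."""
    rules = []
    active = None  # None until the first "PLATFORM:" header has been seen
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        parts = line.split()
        if len(parts) == 1 and line.endswith(":"):
            # A lone "PLATFORM:" token switches the current section.
            active = line[:-1] in ("ALL", platform)
            continue
        if active is None:
            raise ValueError(f"line {number}: rule before the first platform line")
        if len(parts) < 2:
            raise ValueError(f"line {number}: expected 'name command parameters'")
        name, command, params = parts[0], parts[1], parts[2:]
        if command not in VALID_COMMANDS:
            raise ValueError(f"line {number}: unknown command {command!r}")
        if active:
            rules.append((name, command, params))
    return rules
```

Note that this sketch validates every rule line, including those in sections for other platforms, so a typo under PC: is still reported during a Dreamcast build. Whether that is desirable is a design choice.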

Answer 5 · 2020-11-19T13:21:51.000Z

Since this has been assigned to @Namdrib I thought I might throw a few spanners into the works -- what if:

a file name has spaces
a file name starts with PC: etc.
the file name is an absolute path or points into the parent directory
two rules are specified for the same file
- in the same section
- in ALL as well as the current platform
  - when ALL comes first vs platform first
a rule is specified for the output of another one
- before the rule that produces its input

Answer 6 · 2020-11-19T13:48:29.000Z

Those are all good points I didn't consider.

a file name that has spaces. Yeah, this gets messy. Windows prevents certain characters in a file name, but unix doesn't really do that. One solution is to not allow space-d names, or we could (optionally) surround the name in quotes. So all of the below will correctly be read as one name:

hello
"hello"
"hello world"

That would mean we'd need to check if the first non-whitespace character is a " and if so find the next " so we know where the name ends, then tokenise. @Namdrib can choose if we'd want to support something like "hello nice \"house\""
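
A minimal Python sketch of that quote check, assuming escaped quotes inside the name are not supported (`split_name` is a hypothetical helper, not existing code):

```python
def split_name(line):
    """Split a rule line into (name, rest); the name may be wrapped in double quotes."""
    line = line.lstrip()
    if line.startswith('"'):
        # The name runs up to the next double quote; \" escapes are NOT handled here.
        end = line.find('"', 1)
        if end == -1:
            raise ValueError("unterminated quote in name")
        return line[1:end], line[end + 1:].strip()
    parts = line.split(None, 1)
    if not parts:
        raise ValueError("empty rule line")
    rest = parts[1].strip() if len(parts) > 1 else ""
    return parts[0], rest
```

The rest of the line (command and parameters) can then be tokenised on whitespace as before.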

a file name starts with PC: etc. We can split a line on spaces, and if there's only one part (ie. the PLATFORM:) then we know it's a change-platform command. If it's something like PC: ROMDISK then it will search for a folder named PC: and, if present, make a romdisk out of it.

the file name is an absolute path or points into the parent directory. No absolute paths. If the user puts in an absolute path then it will be treated the same as a file not existing. Pointing into a directory other than the current one isn't allowed, since the info file should only look at files in its own folder. So the name should be a file/folder in the current directory and nowhere else. (NOTE: Some commands can affect files in child dirs, but the name part would still only reference a file in the current folder)
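
One way to express the "current directory only" restriction in Python, as a sketch (`is_local_name` is a hypothetical helper; the script would still need to check that the file/folder actually exists):

```python
import os


def is_local_name(name):
    """True if `name` can only refer to an entry directly inside the current folder."""
    if not name or name in (".", ".."):
        return False
    if os.path.isabs(name):
        return False  # absolute paths are treated like a missing file
    # A name containing a path separator would point outside this folder's entries.
    return os.path.basename(name) == name
```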

two rules are specified for the same file. I'll assume "in the same section" means twice under the same "PLATFORM", and that this question refers to two rules with the same name and command but potentially different parameters (having two rules for the same name with different commands can be valid, depending on the user's choice). The intended behaviour should be to throw an error/warning if it observes two rules with the same name and command. This would require one pass to make a list/map of rules, a check for repeated name and command fields, and only then processing. NOTE: it's fine to have a rule for, say, the "DREAMCAST" platform and one for "PC" that have the same name and command, since those two commands are never run together.
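
The "one pass to build a map, then check" idea might look like this in Python. It is a sketch only: `find_duplicates` is a hypothetical name, and it assumes the input is the list of rules that will actually run for the current build, so a rule under ALL and the same rule under the current platform would also be flagged.

```python
def find_duplicates(rules):
    """Return the (name, command) pairs that appear more than once.

    `rules` is an iterable of (name, command, parameters) tuples.
    """
    seen = set()
    duplicates = []
    for name, command, _params in rules:
        key = (name, command)
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

The caller can then decide whether a non-empty result is an error or just a warning.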

a rule is specified for the output of another one. I don't really get this one. You can't make a rule that depends on the output of a previous rule, if that's what you're asking.

Answer 7 · 2020-11-20T07:29:44.000Z

In every folder we check for a crayon_info.txt file

Is the preprocessing supposed to recursively descend into every directory in the assets directory?
Or just search the top-level assets directories?

Answer 8 · 2020-11-20T09:07:53.000Z

The first one, recursively descend. For the most part the logic of preprocess.sh will be the same for preprocess.py, minus the preprocessing tags stuff in this issue.

Answer 9 · 2020-11-20T09:19:29.000Z

Also, suppose there's a file/folder in the current directory (let's say its name is a) and either no crayon_info.txt is present or the crayon_info.txt just doesn't mention a. If a is a file, the script should just copy it (preferably symlink it, to save on wasted storage space) from the assets path straight into the processed path. If a is a folder, the script should make a folder of the same name in the processed path.
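
A possible Python sketch of that default behaviour for a single entry (`pass_through` is a hypothetical name; falling back to a plain copy when symlinking fails is an assumption on my part):

```python
import os
import shutil


def pass_through(src, dst):
    """Mirror an unmentioned asset into the processed path.

    Folders are recreated, files are symlinked if possible and copied otherwise.
    Returns which action was taken.
    """
    if os.path.isdir(src):
        os.makedirs(dst, exist_ok=True)
        return "mkdir"
    try:
        # Link to the absolute source path so the link works from any directory.
        os.symlink(os.path.abspath(src), dst)
        return "symlink"
    except (OSError, NotImplementedError):
        # E.g. no symlink privilege on Windows, or dst already exists.
        shutil.copy2(src, dst)
        return "copy"
```

The recursive descent itself would call something like this for every entry that has no rule.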

Answer 10 · 2020-11-21T02:40:02.000Z

What if you delegate tokenising and escape processing to the shell, and to simplify things, include the platform in the rule. It could look like this:

dtex --dreamcast My_Spritesheet 12345 --c
bitmap --pc My_Spritesheet ARGB8888
romdisk "hello nice \"house\"" --GZ

If the list gets long and they need sections, they can use comments and blank lines.
Then you can process the file like this, or using GNU parallel if you want faster builds. Note that the xargs method handles quotes but does not work with shell escapes.

The preprocess script will be run, in effect, like this:

preprocess dtex --dreamcast My_Spritesheet 12345 --c
preprocess bitmap --pc My_Spritesheet ARGB8888
preprocess romdisk "hello nice \"house\"" --GZ

Now preprocess is only responsible for interpreting one rule, provided as its argument list. Python's argparse module makes handling subcommands and options with parameters pretty easy.
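
For illustration, a minimal argparse sketch of that single-rule interface. The subcommand and flag names come from the example lines above; `build_parser` and `parse_rule` are hypothetical, and passing everything after the name through untouched (for texconv etc.) is an assumption.

```python
import argparse


def build_parser():
    """Build a parser with one subcommand per rule type."""
    parser = argparse.ArgumentParser(prog="preprocess", allow_abbrev=False)
    sub = parser.add_subparsers(dest="command", required=True)
    for command in ("dtex", "bitmap", "romdisk"):
        rule = sub.add_parser(command, allow_abbrev=False)
        # Platform flags; giving neither could be taken to mean "all platforms".
        rule.add_argument("--dreamcast", action="store_true")
        rule.add_argument("--pc", action="store_true")
        rule.add_argument("name")
    return parser


def parse_rule(argv):
    """Return (args, tool_params); tool_params are left for the external tool."""
    # parse_known_args collects arguments the parser does not recognise
    # instead of treating them as errors.
    return build_parser().parse_known_args(argv)
```

For example, `parse_rule(["dtex", "--dreamcast", "My_Spritesheet", "12345", "--c"])` yields the name My_Spritesheet with the dreamcast flag set, and leaves 12345 and --c as tool parameters.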

If requiring a copy subcommand (supporting file globs) to copy or link files/directories is too verbose, you could have the preprocess script output the files it processed and have the main program copy anything that wasn't processed. But on the other hand, it might be useful to allow files to only be copied on certain platforms.

Answer 11 · 2020-11-21T04:07:57.000Z

In order for that to work there'd have to be 2 scripts. The one that recursively checks all folders for the info file, then the script you mention that executes the commands. Also don't forget the current preprocess script has a -noRM parameter. This will prevent deletion of intermediate files. That means we'd have to insert that into the command for the 2nd script so it can handle that itself. I don't see any real advantage of using this new method over the current one. I think formatting the info file, like you said in the first example, is better than my idea since we can just take the line as-is and not have to bother with the platform header, but I think it would work fine as 1 script rather than 2 scripts.

I don't exactly get that copy section, other than GLOB-ing to do the same command on multiple files, that's a good idea.

Answer 12 · 2020-11-21T06:32:41.000Z

I don't exactly get that copy section ...

In order to copy files that weren't preprocessed, either the main program needs to figure out which files were mentioned in the file and exclude them, or the file needs to explicitly request to copy those files. If you go with the second method, you could also specify the platform in the copy command to only include certain files on specific platforms (e.g. shaders).

Hmm, if maintaining two scripts will be a problem, you could use the standard shlex module to parse the line into a list of strings. That might be a better solution after all. I might have a go to see how it turns out.
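
A small sketch of the shlex approach, in case it helps. `read_rules` is a hypothetical name, and treating `#` as the comment marker is an assumption, since the comment syntax hasn't been decided in this thread.

```python
import shlex


def read_rules(text):
    """Turn the contents of an info file into a list of argument lists."""
    rules = []
    for line in text.splitlines():
        # shlex handles double quotes and \" escapes like a POSIX shell would;
        # comments=True drops everything after an unquoted '#'.
        tokens = shlex.split(line, comments=True)
        if tokens:  # blank and comment-only lines produce no tokens
            rules.append(tokens)
    return rules
```

Each resulting list can then be handed to the single-rule handler as its argument list, all within one script.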

Answer 13 · 2020-11-21T08:28:37.000Z

Things are copied by default, but you did remind me. I can't remember if preprocess.sh has this or not, but there should be an ignore command that prevents the file from being copied. So any file without commands is plain old copied, any file with a non-ignore command gets processed and its output is copied, and ignore will skip the file/folder.
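
That three-way decision could be sketched like this in Python (`plan_action` is a hypothetical name, and spelling the command as "IGNORE" is an assumption):

```python
def plan_action(entry, commands_by_name):
    """Decide what to do with one file/folder in the current directory.

    `commands_by_name` maps a name to the list of commands given for it.
    """
    commands = commands_by_name.get(entry, [])
    if not commands:
        return "copy"  # no rule at all: plain copy
    if "IGNORE" in commands:
        return "skip"  # explicitly ignored: neither processed nor copied
    return "process"  # at least one real command: process, then copy the output
```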

Having 2 scripts isn't a problem, I just don't see any advantage aside from "Separating the functionality" and the parallelism part (If we did go the 2-script model, the parallelism part would be its own issue). I'm not opposed to the 2-script model, just that I don't see a big advantage. I'm fine with either model, but I think @Namdrib 's choice will be the most important since he'll be the one implementing it.

Answer 14 · 2020-11-23T13:01:55.000Z

we can just take the line as-is and not have to bother with the platform header

I like that part. It certainly simplifies the required parsing logic