
MapleStoryDetectionSampleGenerator

Generate machine-learning object detection samples from MapleStory in different formats (TFRecord, Darknet, COCO, ...).

Performance

This generator can generate arbitrarily many annotated samples. All bounding boxes are precisely annotated based on rendering coordinates.

With YOLOv4 and ~5000 samples, it can achieve 99.8% mAP on the test set.

Requirement

  • Visual Studio 2019 v16.8 or above with .NET workload installed
  • .NET 5.0 SDK (5.0.0 or above)

Build

  1. Clone this repository with submodules by
    git clone --recursive git@github.com:charlescao460/MapleStoryDetectionSampleGenerator.git.
    Note that --recursive is necessary.
  2. Build WzComparerR2/WzComparerR2.sln (submodule MUST be built first)
  3. Build MapleStoryDetectionSampleGenerator.sln
  4. Run MapleStory.MachineLearningSampleGenerator\bin\Release\net5.0-windows\WzComparerR2.exe once. Running WzComparerR2.exe generates Setting.config, which is required by our MapRender.

Run

(Assuming assemblies are built with Release configuration. Debug configuration is similar)

  1. Cd into executable directory: cd MapleStory.MachineLearningSampleGenerator\bin\Release\net5.0-windows
  2. Use WzComparerR2.exe to find the desired map you want to sample. Assuming 993134200.img is the map you want in Limina.
  3. Prepare your player PNGs in a directory.
    Since WzComparerR2 does not support Avatar inside MapRender, player images must be drawn in a post-processing step. Player images should be transparent PNGs containing only the player's appearance. You can create these PNGs with Photoshop or save them from WzComparerR2's Avatar plugin. Assume .\players is the directory containing all images.
  4. Run .\MapleStory.MachineLearningSampleGenerator.exe -m 993134200 -x 5 -y 5 -f coco -o ".\output" --post --players ".\players"
    This runs the sampler on map 993134200.img, stepping every 5 pixels in X and every 5 pixels in Y, outputting COCO format, and drawing players in the post-processor.
    You can run .\MapleStory.MachineLearningSampleGenerator.exe --help for a usage hint. You can also take a look at the entry point Program.cs

Note

  • Since NPCs look like players, including them without annotation could have a negative effect on the model. If you want to hide all NPCs from generated samples, simply change WzComparerR2.MapRender/MapData.cs to prevent any NPC data from being loaded into the map render.

Output Formats

Tensorflow TFRecord

According to the official TensorFlow documentation, the output .tfrecord file contains multiple tf.train.Example records in a single file, each stored in the following format:

uint64 length
uint32 masked_crc32_of_length
byte   data[length]
uint32 masked_crc32_of_data

And

masked_crc = ((crc >> 15) | (crc << 17)) + 0xa282ead8ul

Each tf.train.Example is generated by protobuf-net according to TensorFlow's example.proto.
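The framing above can be sketched in Python. This is a minimal illustration of the record layout and CRC masking, not code from this repository; the pure-Python CRC-32C and the `write_record` helper are assumptions for the sketch:

```python
import struct

def _crc32c_table():
    """Build the lookup table for CRC-32C (Castagnoli, reflected polynomial)."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ 0x82F63B78 if c & 1 else c >> 1
        table.append(c)
    return table

_TABLE = _crc32c_table()

def crc32c(data: bytes) -> int:
    """Plain table-driven CRC-32C over the given bytes."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

def masked_crc(data: bytes) -> int:
    """Mask the CRC as TFRecord requires: rotate right by 15 bits, then add a constant."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF

def write_record(stream, data: bytes) -> None:
    """Write one TFRecord-framed record: length, masked CRC of length, data, masked CRC of data."""
    length = struct.pack('<Q', len(data))       # uint64, little-endian
    stream.write(length)
    stream.write(struct.pack('<I', masked_crc(length)))
    stream.write(data)
    stream.write(struct.pack('<I', masked_crc(data)))
```

In practice a library such as crc32c or TensorFlow itself would handle this; the sketch only makes the byte layout concrete.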

Darknet

Output directory structure:

data/
|---obj/
|   |---1.jpg
|   |---1.txt
|   |---......
|---obj.data
|---obj.names
|---test.txt
|---train.txt

obj.data contains

classes=2
train=data/train.txt
valid=data/test.txt
names=data/obj.names
backup = backup/

And obj.names contains the class names, one per line. test.txt and train.txt list the samples for testing/training with a ratio of 5:95 (5% of the images in obj/ are used for testing).
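The per-image .txt files follow the standard Darknet label convention: one line per object, `<class> <x_center> <y_center> <width> <height>`, with all coordinates normalized to [0, 1]. A minimal sketch for converting such a line back to pixel coordinates (the function name is illustrative, not part of this repository):

```python
def darknet_to_pixels(line: str, img_w: int, img_h: int):
    """Convert one Darknet label line to (class_id, x_min, y_min, width, height) in pixels."""
    cls, xc, yc, w, h = line.split()
    xc, yc = float(xc) * img_w, float(yc) * img_h   # box center in pixels
    w, h = float(w) * img_w, float(h) * img_h       # box size in pixels
    return int(cls), xc - w / 2, yc - h / 2, w, h
```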

COCO

Output directory structure:

coco/
|---train2017/
|   |---1.jpg
|   |---2.jpg
|   |---......
|---val2017/
|   |---1000.jpg
|   |---1001.jpg
|   |---......
|---annotations/
|   |---instances_train2017.json
|   |---instances_val2017.json

The COCO json is defined as follows:

{
  "info": {
    "description": "MapleStory 993134100.img Object Detection Samples - Training",
    "url": "https://github.com/charlescao460/MapleStoryDetectionSampleGenerator",
    "version": "1.0",
    "year": 2021,
    "contributor": "CSR"
  },
  "licenses": [
    {
      "url": "https://github.com/charlescao460/MapleStoryDetectionSampleGenerator/blob/master/LICENSE",
      "id": 1,
      "name": "MIT License"
    }
  ],
  "images": [
    {
      "license": 1,
      "file_name": "30a892e1-7f3d-4c65-bdd1-9d28f1ae5187.jpg",
      "coco_url": "",
      "height": 768,
      "width": 1366,
      "flickr_url": "",
      "id": 1
    },
    ...],
  "categories": [
    {
      "supercategory": "element",
      "id": 1,
      "name": "Mob"
    },
    {
      "supercategory": "element",
      "id": 2,
      "name": "Player"
    }
  ],
  "annotations": [
    {
      "segmentation": [
        [
          524,
          429,
          664,
          429,
          664,
          578,
          524,
          578
        ]
      ],
      "area": 20860,
      "iscrowd": 0,
      "image_id": 1,
      "bbox": [
        524,
        429,
        140,
        149
      ],
      "category_id": 1,
      "id": 1
    },
    ...]

Note that segmentation covers the same area as bbox does. No real segmentation masks are implemented; each segmentation polygon is simply the rectangle of its bounding box.
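Because each segmentation polygon is just the rectangle of its bbox, the polygon can be reconstructed from the bbox alone. A small sketch (the helper name is illustrative, not part of this repository):

```python
def bbox_to_polygon(bbox):
    """Expand a COCO [x, y, width, height] bbox into the 4-corner rectangle polygon used in these samples."""
    x, y, w, h = bbox
    # Corners in the same order as the annotation sample above: TL, TR, BR, BL.
    return [x, y, x + w, y, x + w, y + h, x, y + h]
```

Applied to the sample annotation, the bbox `[524, 429, 140, 149]` yields exactly the segmentation polygon shown above.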