A dataset of mineral images and labels for machine learning purposes. The dataset is created from this website http://www.minerals.net/ . The dataset consists of around 4,000 images of mineral specimens with labels. There are numerous labels included such as mineral name, crystal system, chemical groups, rock types and fracture. A complete list of labels is found in mineral.py
. To generate this dataset run mineral_net.py
and this will scrape all the images and mineral labels as well as generate tfrecords for them. The images are saved in data
and the tfrecords are saved in tfrecords
. When generating records there is a test train split of 80 percent. Here are a few images for show.
I have also included a few scripts to train and test neural networks in predicting labels from images. This proved an extremely difficult task due to the small dataset size. I made a few attempts to overcome this by using a network pretrained on Cifar however this did not help enough. In my opinion this problem is definitely solvable but it will require a larger training set and better transfer learning techniques. I believe that using a well proven network such as Inception 3 or 4 that is trained on Imagenet would be a good step at getting transfer learning working. I also think that the dataset could be expanded and refined heavily by scraping images from Ebay or other cites that sell minerals. Right now I don't really have the time to tackle these problems though.