Feature Request: Support for Additional File Formats in Data Preview
pingdoom opened this issue · 1 comment
Hello CSGHub Team,
I hope this message finds you well. I've been exploring the CSGHub platform and I'm impressed with its capabilities, especially in managing large model assets and datasets. It's evident that a lot of thought and effort has gone into making CSGHub a comprehensive asset management platform.
One area where I believe CSGHub could be enhanced is in the support for additional file formats in the dataset preview functionality. Currently, CSGHub provides excellent support for previewing datasets in common formats. However, as datasets become increasingly complex and diverse, the need to support additional formats becomes apparent.
Enhancement Request:
I would like to request the addition of support for the following file formats in the dataset preview functionality:
- HDF5 (.h5)
- Apache Parquet (.parquet)
- Avro (.avro)
These formats are widely used in the data science and machine learning communities for storing large, complex datasets. Supporting these formats would significantly enhance the usability of CSGHub for a broader audience and facilitate more efficient data exploration and management.
Justification:
- HDF5: Widely used in academia and industry for storing large datasets, especially in the fields of physics, astronomy, and bioinformatics.
- Apache Parquet: Offers efficient data compression and encoding schemes. It's heavily adopted in data engineering pipelines and supports schema evolution.
- Avro: A row-based storage format that's ideal for data serialization. It's commonly used in data streaming architectures.
Potential Implementation:
While I understand that adding support for these formats might require considerable effort, perhaps starting with HDF5, given its widespread use, could be a beneficial first step. Utilizing existing open-source libraries for reading these formats could also streamline the implementation process.
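As a rough illustration of a first step, a preview feature could identify the three formats by their leading signature bytes before handing the file to a format-specific reader. The sketch below is only an assumption about how that might look, not CSGHub code: the names `MAGIC_SIGNATURES`, `detect_format`, and `detect_file_format` are hypothetical, and the actual row preview would still need one of the existing open-source readers.

```python
from typing import Optional

# Leading signature bytes as defined by each format's public specification.
# The mapping name and keys are hypothetical, chosen for this sketch.
MAGIC_SIGNATURES = {
    "hdf5": b"\x89HDF\r\n\x1a\n",  # HDF5 superblock signature
    "parquet": b"PAR1",            # Parquet files begin (and end) with PAR1
    "avro": b"Obj\x01",            # Avro object container file header
}


def detect_format(header: bytes) -> Optional[str]:
    """Return the format name if the header starts with a known signature."""
    for name, magic in MAGIC_SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def detect_file_format(path: str) -> Optional[str]:
    """Read the first 8 bytes of a file and detect its format.

    Simplification: an HDF5 file with a user block may place its signature
    at offset 512, 1024, and so on; this sketch checks offset 0 only.
    """
    with open(path, "rb") as f:
        return detect_format(f.read(8))
```

Checking the bytes rather than the file extension avoids trusting a mislabeled upload, which seems relevant to the security side of any preview feature.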
I believe that extending dataset preview capabilities to include these formats would make CSGHub even more versatile and valuable to the data science and machine learning communities.
Thank you for considering this enhancement request. I'm looking forward to seeing how CSGHub continues to evolve and meet the needs of its users.
@pingdoom Thanks for raising this and for providing more information and justification for previewing these data formats. Dataset preview is a key feature for us, and many requirements are coming in. Making these datasets previewable on CSGHub is on our roadmap, and we are working on the dataset viewer to handle them. There are more things to consider, including security, performance, and usability. We look forward to receiving more feedback from you.
Have a nice day!