Pinned Repositories
ISSAI_SAIDA_Kazakh_ASR
The first industrial-scale open-source Kazakh speech corpus. The KSC2 corpus subsumes two previously introduced corpora, KSC and KazakhTTS2, and supplements them with additional data from other sources. KSC2 contains around 1.2k hours of high-quality transcribed data comprising over 600k utterances.
kaz-image-captioning
ExpansionNet v2 model trained on the COCO dataset with captions translated into Kazakh
Kazakh_TTS
An expanded version of the previously released Kazakh text-to-speech (KazakhTTS) synthesis corpus. In KazakhTTS2, the overall size has increased from 93 hours to 271 hours, the number of speakers has risen from two to five (three female and two male), and the topic coverage has been diversified.
KazEmoTTS
An open-source Kazakh Emotional Text-to-Speech Dataset
KazNERD
An open-source Kazakh named entity recognition dataset (KazNERD), annotation guidelines, and baseline NER models.
SpeakingFaces
A large-scale publicly-available visual-thermal-audio dataset designed to encourage research in the general areas of user authentication, facial recognition, speech recognition, and human-computer interaction.
TFW
TFW: Annotated Thermal Faces in the Wild Dataset
thermal-facial-landmarks-detection
SF-TL54: Thermal Facial Landmark Dataset with Visual Pairs.
TurkicASR
A multilingual ASR model that can recognize ten Turkic languages: Azerbaijani, Bashkir, Chuvash, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Uyghur, and Uzbek.
TurkicTTS
A multilingual text-to-speech synthesis system for ten lower-resourced Turkic languages: Azerbaijani, Bashkir, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Turkmen, Uyghur, and Uzbek.
ISSAI's Repositories
IS2AI/TurkicASR
A multilingual ASR model that can recognize ten Turkic languages: Azerbaijani, Bashkir, Chuvash, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Uyghur, and Uzbek.
IS2AI/TurkicTTS
A multilingual text-to-speech synthesis system for ten lower-resourced Turkic languages: Azerbaijani, Bashkir, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Turkmen, Uyghur, and Uzbek.
IS2AI/thermal-facial-landmarks-detection
SF-TL54: Thermal Facial Landmark Dataset with Visual Pairs.
IS2AI/kaz-image-captioning
ExpansionNet v2 model trained on the COCO dataset with captions translated into Kazakh
IS2AI/KazEmoTTS
An open-source Kazakh Emotional Text-to-Speech Dataset
IS2AI/telegram-bot-chatgpt
Telegram bot to interact with ChatGPT via voice messages
IS2AI/Central-Asian-Food-Dataset
42 food classes from Kazakh national and Central Asian cuisine
IS2AI/trimodal_person_verification
This repository contains code and data for "On the Multimodal Person Verification Using Audio-Visual-Thermal Data"
IS2AI/faces-in-event-streams
This repo contains code and instructions for the detection of faces in event streams
IS2AI/Kazakh-Speech-Commands-Dataset
Kazakh Speech Commands Dataset
IS2AI/OpenThermalPose
An Open-Source Annotated Thermal Human Pose Dataset and Initial YOLOv8-Pose Baselines
IS2AI/Soyle
IS2AI/AnyFace
Input-Agnostic Face Detection
IS2AI/KazParC
An open-source parallel corpus for machine translation across Kazakh, English, Russian, and Turkish
IS2AI/KazQAD
An open-source Kazakh Question Answering Dataset
IS2AI/Column-Design-Optimization
Column design optimization
IS2AI/KazSAnDRA
An open-source Kazakh Sentiment Analysis Dataset of Reviews and Attitudes (KazSAnDRA) and baseline sentiment classification models
IS2AI/city-identification
This repo contains a dataset and models for city classification
IS2AI/city-sustainability-indexes
This repo contains code and models for detecting city sustainability indexes
IS2AI/COHI-O365
The most diverse synthetic fisheye dataset in terms of the number of images, labels, and classes, with source code and models, as well as a real test dataset for benchmarking.
IS2AI/Common-Objects-in-Hemispherical-Images-Dataset
39 classes of objects sampled from the MS COCO dataset captured with a hemispherical/fisheye camera
IS2AI/TatarTTS
TatarTTS: An Open-Source Text-to-Speech Synthesis Dataset for the Tatar Language
IS2AI/Vision-Language-Models-for-Activity-Recognition-and-Abnormality-Detection-for-Elderly
PrismerZ, a vision-language model (VLM) for recognizing emergency and non-emergency situations via vision and language transformers. PrismerZ focuses on understanding contextual information and performs image captioning and visual question answering tasks.
IS2AI/docker-flask-api-template
A Docker Flask API template with GPU support. As an example, the project includes an X-ray disease classifier.
IS2AI/talk-llm
Talk with ChatGPT
IS2AI/Enhancing-Ambient-Assisted-Living-with-Multi-Modal-Vision-and-Language-Models
This project aims to detect abnormal behaviour or emergency cases using a vision-language model (VLM), a large language model (LLM), a human detection model, and text-to-speech (TTS) and speech-to-text (STT) models. The framework can detect subtle signs of an emergency and actively interact with the user to make an accurate decision.
IS2AI/HPE-depth-fisheye
This project uses synthetic data created with NVIDIA Omniverse to train a camera-view-invariant, multi-pose human pose estimation (HPE) model for depth and fisheye cameras.
IS2AI/serge
A web interface for chatting with Alpaca through llama.cpp. Fully dockerized, with an easy-to-use API.
IS2AI/TatarSCR
An Open-Source Speech Commands Dataset for the Tatar Language
IS2AI/unified_multimodal_transformer