Speech Assessment API in FastAPI with Hugging Face 🤗

Lark API Readme

Built with: Next.js, React, TypeScript, Python, Prisma, Tailwind CSS, Redis

What is it?

  • Lark API is a speech assessment REST API built with Next.js in TypeScript.
  • It provides accuracy scores, speech-to-text transcription, and a projected IELTS pronunciation band.
  • It lets English learning apps and websites assess users' pronunciation and give real-time feedback on it.

How does it work?

The Machine Learning part:

  • Lark uses Meta's Wav2Vec2 model to analyze the speech sample.
  • It converts the speech to its phonetic transcription (S2P) using zero-shot cross-lingual recognition.
  • After recognizing the phonetics of the speech, it compares them with the ideal pronunciation of the transcribed speech using the Jaro-Winkler string similarity algorithm.
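The comparison step can be illustrated with a small, self-contained Jaro-Winkler implementation. This is only a sketch of the general algorithm, not necessarily the code Lark runs: the function names `jaro` and `jaroWinkler` and the prefix scale of 0.1 (the commonly used default) are assumptions, and how Lark turns the similarity into an accuracy score is not stated here.

```typescript
// Jaro similarity: rewards characters that match within a sliding range
// and penalises matched characters that appear in a different order.
function jaro(left: string, right: string): number {
  if (left === right) return 1;
  const a = Array.from(left); // split by code point, not UTF-16 unit
  const b = Array.from(right);
  if (a.length === 0 || b.length === 0) return 0;

  const range = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched: boolean[] = new Array<boolean>(a.length).fill(false);
  const bMatched: boolean[] = new Array<boolean>(b.length).fill(false);

  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const lo = Math.max(0, i - range);
    const hi = Math.min(b.length - 1, i + range);
    for (let j = lo; j <= hi; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;

  // Walk both strings in order and count matched characters that disagree.
  let k = 0;
  let outOfOrder = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a[i] !== b[k]) outOfOrder++;
    k++;
  }
  const transpositions = outOfOrder / 2;

  return (
    (matches / a.length +
      matches / b.length +
      (matches - transpositions) / matches) / 3
  );
}

// Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 characters.
function jaroWinkler(left: string, right: string, prefixScale = 0.1): number {
  const base = jaro(left, right);
  const a = Array.from(left);
  const b = Array.from(right);
  const maxPrefix = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < maxPrefix && a[prefix] === b[prefix]) prefix++;
  return base + prefix * prefixScale * (1 - base);
}

// Hypothetical usage: compare a recognised phoneme string with a reference one.
const similarity = jaroWinkler("hɛloʊ", "həloʊ");
console.log(similarity.toFixed(3));
```

A score of 1 means the recognised phonemes are identical to the reference, and lower values indicate a larger mismatch. Because Jaro-Winkler weights the start of the string more heavily, errors early in an utterance lower the score more than errors near the end.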

The Backend API part:

  • The API is written entirely in Next.js using pages-based routing.
  • I used next-auth for user authentication via GitHub and for persisting sessions.
  • I used Redis to rate-limit the API based on the caller's IP address.
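IP-based rate limiting of this kind is often implemented as a fixed-window counter. The sketch below shows that idea only. It does not talk to Redis: an in-memory `Map` stands in for it so the example runs on its own. The names `CounterStore`, `InMemoryStore`, and `isAllowed`, the key prefix, and the choice of a fixed window are all assumptions, since neither the limiter algorithm nor the Redis client Lark uses is stated here.

```typescript
// Minimal store contract: increment a per-key counter that expires after ttlMs.
// With Redis, the same behaviour is usually built from INCR plus EXPIRE.
interface CounterStore {
  incr(key: string, ttlMs: number, nowMs: number): number;
}

// In-memory stand-in for Redis (assumption: used here only for illustration).
class InMemoryStore implements CounterStore {
  private entries = new Map<string, { count: number; expiresAt: number }>();

  incr(key: string, ttlMs: number, nowMs: number): number {
    const entry = this.entries.get(key);
    if (entry === undefined || entry.expiresAt <= nowMs) {
      // No active window for this key: start a new one.
      this.entries.set(key, { count: 1, expiresAt: nowMs + ttlMs });
      return 1;
    }
    entry.count += 1;
    return entry.count;
  }
}

// Returns true while the caller is within `limit` requests per window.
function isAllowed(
  store: CounterStore,
  ip: string,
  limit: number,
  windowMs: number,
  nowMs: number = Date.now()
): boolean {
  const count = store.incr(`ratelimit:${ip}`, windowMs, nowMs);
  return count <= limit;
}

// Hypothetical usage: allow 2 requests per second per IP.
const store = new InMemoryStore();
console.log(isAllowed(store, "203.0.113.7", 2, 1000, 0)); // first request
console.log(isAllowed(store, "203.0.113.7", 2, 1000, 10)); // second request
console.log(isAllowed(store, "203.0.113.7", 2, 1000, 20)); // third request, over the limit
```

In an API route, a handler would call something like `isAllowed` with the request's IP and respond with HTTP 429 when it returns false. A fixed window is simple but allows short bursts at window boundaries; a sliding window or token bucket smooths that out at the cost of more state.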

The Frontend part:

  • The frontend is written with Next.js in TypeScript.
  • I chose Tailwind CSS as the CSS framework for this project.
  • Material UI is used for the tables and icons.

The Database part:

  • I used the Prisma ORM on top of a PlanetScale database, which is a serverless MySQL database.
  • Here is the UML diagram for the database:
[Image: UML diagram of the database]

ML Models used

References