Aman Desai
Projects

ReSign: Real-time Sign Language Recognition

Python · TensorFlow · React

Motivation

This was a weekend hackathon project built at DiamondHacks 2024. The idea was to take a webcam stream, recognize sign-language gestures, and show the recognized signs as text in a web app.

The useful constraint was latency. A sign-recognition model can look reasonable in a notebook, but the product only works if the webcam feed, model inference, frontend updates, and text output all stay responsive together.

Approach

The backend used OpenCV and MediaPipe Holistic to process webcam frames and extract pose and hand landmarks. Those keypoints were stacked into a short sequence and fed into an LSTM model built with Keras and TensorFlow.
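The windowing step can be sketched roughly as below. This is an illustration, not the code from the repo. It assumes each frame's landmarks have already been pulled out of MediaPipe Holistic as plain arrays: 33 pose landmarks with (x, y, z, visibility) and 21 landmarks per hand with (x, y, z). The names flatten_frame and KeypointWindow, the zero-fill for undetected parts, and the resulting 258-value feature vector are assumptions made for the example; the 30-frame window matches the one used in the project.

```python
from collections import deque
from typing import Optional

import numpy as np

# Landmark layouts produced by MediaPipe Holistic (pose and hands only).
POSE_SHAPE = (33, 4)   # x, y, z, visibility
HAND_SHAPE = (21, 3)   # x, y, z
WINDOW_SIZE = 30       # frames per sequence


def _flatten_part(landmarks: Optional[np.ndarray], shape: tuple) -> np.ndarray:
    """Flatten one landmark group, or return zeros if it was not detected."""
    if landmarks is None:
        return np.zeros(shape[0] * shape[1], dtype=np.float32)
    return np.asarray(landmarks, dtype=np.float32).reshape(-1)


def flatten_frame(pose, left_hand, right_hand) -> np.ndarray:
    """Combine pose and both hands into one fixed-length feature vector."""
    return np.concatenate([
        _flatten_part(pose, POSE_SHAPE),
        _flatten_part(left_hand, HAND_SHAPE),
        _flatten_part(right_hand, HAND_SHAPE),
    ])


class KeypointWindow:
    """Keeps the most recent frames and exposes them as a model input batch."""

    def __init__(self, size: int = WINDOW_SIZE):
        self.frames = deque(maxlen=size)

    def push(self, features: np.ndarray) -> None:
        # The oldest frame is dropped automatically once the window is full.
        self.frames.append(features)

    def ready(self) -> bool:
        return len(self.frames) == self.frames.maxlen

    def as_batch(self) -> np.ndarray:
        # Shape (1, window, features), the usual input layout for a Keras LSTM.
        return np.stack(list(self.frames))[np.newaxis, ...]
```

A full window can then be passed to the sequence model once ready() returns True. Zero-filling missing hands keeps the vector length fixed, which is one simple way to handle frames where a hand is out of view.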

The frontend was a Next.js app. It embedded the Flask video stream, polled the backend for predictions, and displayed the recognized output. The repo also included a text-to-speech path, which made the demo feel more complete even though the recognition quality was still rough.

  • Flask server for webcam capture and model inference
  • MediaPipe keypoints instead of raw image classification
  • LSTM sequence model over 30-frame windows
  • Prediction smoothing before appending text to the output
  • Next.js frontend that displays the video stream and recognized phrase

Results

The demo worked end to end, which was the important part for the hackathon. It could capture video, run inference, update the browser, and show the predicted phrase. It was not a robust sign-language system, and I would not describe it that way.

The main lesson was that the integration was as important as the model. The backend, frontend, model window, confidence threshold, and polling loop all affected whether the app felt usable. It was a good practical introduction to turning an ML model into an interactive interface.

References

  • MediaPipe Holistic for pose and hand landmark extraction