At GoSign.AI, we’re taking steady steps toward real-time, seamless sign language recognition, one handshape at a time. Our dream is interoperability across all modalities and all languages. That means going beyond recognizing isolated signs to understanding the full linguistic stack: vocabulary, morphology, syntax, and pragmatics. Sign Language Recognition (SLR) makes up one half of the broader technology puzzle; the other half is Sign Language Production (SLP). In this post, we’ll dig into some aspects of SLR.
Many organizations – academic, commercial, nonprofit – are tackling the problem from different angles. Some are further along than others, and honestly, it’s thrilling to watch. We’re contributing to the space ourselves, with a specific focus on gaming and sign language user interfaces (SLUIs). One milestone along that path is Isolated Sign Recognition (ISR).
From Handshapes to Games: Our Approach to ISR
ISR is often treated as a warm-up act for full SLR. It’s the idea of recognizing individual signs, typically static ones, from a predefined vocabulary. You can think of it as a natural evolution of the now-familiar touch gestures – pinch to zoom, swipe to go back – but taken into 3D space, and injected with a bit of linguistic potential. On its own, ISR isn’t quite language.
But it can start to resemble one. If we borrow higher-level concepts from linguistics and bake them into our recognition logic, suddenly ISR becomes more than a technical curiosity. It becomes a building block for SLUIs – interfaces designed natively for signers, not just adapted for them.
At GoSign.AI, we’ve been collecting data globally using simple gamified tasks. We prompt users to produce samples of signs across a variety of conditions. With that dataset, we trained a basic ISR model.
To build our initial ISR model, we used MediaPipe Model Maker as a framework for on-device gesture recognition. The true innovation, however, came from the dataset we created. We collected handshape samples through gamified tasks, capturing a wide range of signing styles, backgrounds, lighting conditions, and skin tones. This globally sourced dataset gave our model a level of variation and realism that typical benchmark datasets often lack.
Using MediaPipe’s landmark extraction to preprocess the data, we trained a custom model tailored to the specific signs we care about. We fine-tuned the training process, adjusting the architecture, training parameters, and export settings to balance performance and responsiveness. The result is a lightweight, efficient model that performs well in the wild and forms the core of our four interactive games.
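For the curious, here’s roughly what that pipeline looks like with MediaPipe Model Maker’s gesture recognizer. The dataset path, split ratios, and hyperparameters below are illustrative placeholders rather than our production configuration; the main structural requirement is one folder per label, plus a “none” class of non-sign frames.

```python
from mediapipe_model_maker import gesture_recognizer

# Each subfolder of the dataset directory is a label (e.g. "A", "B", "Y"),
# plus a "none" folder of background/non-sign frames.
data = gesture_recognizer.Dataset.from_folder(
    dirname="handshape_dataset",  # illustrative path
    hparams=gesture_recognizer.HandDataPreprocessingParams(),
)
train_data, rest_data = data.split(0.8)
validation_data, test_data = rest_data.split(0.5)

# Hyperparameters here are placeholders, not our tuned values.
hparams = gesture_recognizer.HParams(epochs=30, batch_size=4, export_dir="exported_model")
options = gesture_recognizer.GestureRecognizerOptions(hparams=hparams)

model = gesture_recognizer.GestureRecognizer.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options,
)

loss, accuracy = model.evaluate(test_data, batch_size=1)
model.export_model()  # writes a .task bundle for on-device inference
```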
Now, there’s a subtle but important distinction between handshape recognition and ISR. Take the manual alphabet of American Sign Language (ASL): many letters are differentiated only by handshape and palm orientation. Other sign languages, like British Sign Language (BSL), use movement and location more prominently in their alphabets. When more parameters are introduced, we move past handshape recognition and into ISR.
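To make that distinction concrete, here’s a tiny illustrative sketch of the classic manual parameters; the class and field names are invented for this post and aren’t our actual schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SignObservation:
    """One observed sign, described by its manual parameters (illustrative only)."""
    handshape: str                  # e.g. "Y", "B", "5"
    palm_orientation: str           # e.g. "palm-out", "palm-in"
    location: Optional[str] = None  # e.g. "neutral-space", "chin"
    movement: Optional[str] = None  # e.g. "tap", "circle"; None for static letters

def needs_isr(obs: SignObservation) -> bool:
    # Handshape + orientation alone can be handled by a handshape classifier;
    # once location or movement carries meaning, we're into ISR territory.
    return obs.location is not None or obs.movement is not None
```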
Making models is one thing, but what about actually putting them to meaningful use? To this end, we made four simple games using our partial model of ASL’s manual alphabet. It was a blast making them, and they’re live at games.gosign.ai; read our blog post about them. We made the games not just to find out what sparks interest, but to probe different areas of sign languages that might yield some amazing multimodal experiences down the road.
Let’s take a closer look at our games.
Four Games, Four Experiments in Sign Interaction
Signing Hero flips the idea of fingerspelling into a rhythm game. A sentence scrolls by, and you have to “type” each letter with your hands as it hits the center mark. But this isn’t your typical fingerspelling flow – we’re imposing a beat. That constraint introduces a new cognitive load: not only do you have to recall the letter, you have to time it. That temporal precision turns a familiar skill into something more demanding and musical. It opens the door to exploring prosody in sign, the rhythm and flow of language, which is often underappreciated in tech but central to communication.
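As a toy illustration of what the beat adds on top of recognition, here’s a hedged sketch of the kind of timing check involved; the function name, window sizes, and score labels are invented for this post, not lifted from our game code.

```python
def score_beat(recognized: str, target: str,
               t_recognized: float, t_beat: float,
               window: float = 0.15) -> str:
    """Score one scrolling letter: right handshape AND on the beat.

    Times are in seconds; the 150 ms window is an illustrative default.
    """
    if recognized != target:
        return "miss"
    offset = abs(t_recognized - t_beat)
    if offset <= window / 2:
        return "perfect"
    if offset <= window:
        return "good"
    return "off-beat"
```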
In Sign Blaster, falling balls display letters, and your job is to “blast” them by quickly forming the correct handshape. It’s ISR at its most immediate – see a letter, recall the sign, make the shape. This game puts a spotlight on rapid recall, reaction time, and muscle memory. It’s less about deep language understanding and more about building the physical fluency and confidence needed for fluid signing. The pressure of speed makes it a fun challenge, but it also simulates the kind of quick thinking that fingerspelling often demands in real-life conversations.
Bridge of Hands takes inspiration from Hangman. You fingerspell words letter by letter, laying each one down across the bridge. The game encourages longer-term recall and planning. It’s less reactive than Sign Blaster and more about spelling whole words correctly under gentle pressure. It also taps into a different layer of language: the structure of written words, and the internal mapping between them and fingerspelling. It’s our quietest game, but also the one most focused on linguistic assembly.
In Flappy Fingers, which riffs off the mechanics of Flappy Bird, you flap your “Y” hand to keep your avatar aloft. Why “Y”? Because it kinda looks like a bird flapping its wings. At first glance, it might seem like the simplest game, but it actually touches on some surprisingly rich linguistic ideas. The ‘Y’ hand evokes iconicity, a visual resemblance between sign and referent, and hints at classifier-like usage, where handshape represents a class of things (like flapping or flying creatures). ASL and many other sign languages really shine at encoding meaning visually and spatially in nuanced ways. This kind of iconicity is a visual feast for our Deaf eyes.
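For a flavour of how a static handshape classifier picks up a dash of movement, here’s a rough sketch of a flap detector layered on top of per-frame handshape labels and a wrist landmark; the buffer size and threshold are illustrative guesses rather than values tuned for our game.

```python
from collections import deque

class FlapDetector:
    """Detects a quick vertical swing of the wrist while the handshape is "Y".

    Works in normalized image coordinates (y grows downward). Illustrative only.
    """
    def __init__(self, window: int = 8, min_swing: float = 0.04):
        self.wrist_y = deque(maxlen=window)
        self.min_swing = min_swing

    def update(self, handshape: str, wrist_y: float) -> bool:
        if handshape != "Y":
            self.wrist_y.clear()  # only count movement while the "Y" hand is held
            return False
        self.wrist_y.append(wrist_y)
        if len(self.wrist_y) < self.wrist_y.maxlen:
            return False
        swing = max(self.wrist_y) - min(self.wrist_y)
        # A "flap" = enough vertical travel, ending higher than it started.
        return swing >= self.min_swing and self.wrist_y[-1] < self.wrist_y[0]
```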
We hope to expand Flappy Fingers’s simple mechanic into something that truly showcases sign languages’ unique expressive capabilities. Sure, we’re not exactly using it gracefully as a mechanic (yet), but hey…our flexors and extensors are getting a workout! We want to double down on this visual storytelling in future games, guided by community feedback.
The Road Ahead: From ISR to Real-Time SLR
It’s worth noting that fingerspelling English words isn’t the same as using full ASL. It’s tempting to think of fingerspelling as a standalone system, but in reality, it exists on a rich spectrum. On one end, you have direct transliteration (spelling out English letter by letter), which doesn’t capture ASL’s unique grammar, classifiers, or spatial structure. On the other end, fingerspelling plays an important role within ASL itself, especially for names, borrowed words, and emphasis. Our games mostly sit on the transliteration side of that spectrum for now, because it’s a tractable starting point for ISR. But the long-term goal is to move beyond that – to build systems that understand and respond to native sign languages in all their rich, layered expressiveness. That’s where our current experiments come in.
Each of our four games explores a different “primitive” of language, building blocks we think might one day combine into something more expressive. They’re also experiments in UI design. Yes, there’s still a point-and-click wrapper around them (the “traditional” UI, which almost feels quaint now), but the games themselves are interfaces. SLUIs. One day, we’ll see many such interfaces emerge – games, tools, dashboards – each designed with native sign interaction in mind. One fun example from outside our space: the Li Auto L7, an electric car in China with gesture-based controls.
Toward a Multilingual, Multimodal Future
We’re genuinely excited about where ISR and SLR are headed, not just because they can improve access (though that’s hugely important), but because they open up all kinds of expressive, playful, and creative possibilities. That said, we’re not fully there yet. Real-time, natural sign language recognition is still an unsolved challenge for many reasons we can’t go into here. If you’re curious about what’s currently state-of-the-art in SLR and SLP, check out Sign-Speak. They’ve been generous in sharing API access, and we’re genuinely impressed with their work. We’re exploring the entertainment angle partly because it places a lighter burden on rigorous user testing while still yielding valuable insights.
And while we’ve focused on ASL for now, we’re very aware that building games for other sign languages, each with its own manual alphabet, spatial grammar, and expressive texture, poses both exciting and difficult challenges. BSL, for instance, uses a two-handed fingerspelling system, where certain letters require coordinated movement between both hands. Unlike ASL, which places more emphasis on palm orientation and handshape in a single dominant hand, BSL’s alphabet relies heavily on symmetry, contact points, and relative positioning between the hands. These differences introduce a fundamentally distinct recognition problem, both in terms of vision models and interaction design.
From a technical standpoint, two-handed alphabets present challenges in hand-tracking, especially when hands occlude one another or overlap. Training models to recognize these shapes requires a different approach to landmark extraction and data collection. In ASL ISR, one hand may be sufficient for classification, whereas in BSL, accurate recognition often hinges on spatial relationships between both hands. For example, the letters “H” and “R” in BSL involve subtle differences in movement or position that don’t exist in ASL’s one-handed system. That means our UI must accommodate not only static shapes but also short, fluid motions.
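As a rough sketch of what those inter-hand features might look like, here’s an example using MediaPipe’s (legacy) Hands solution to track two hands and compute a couple of simple relative-position measurements; the file path and the specific features are illustrative, not a description of our actual pipeline.

```python
import cv2
import mediapipe as mp

# Legacy MediaPipe Hands solution; the newer Tasks API exposes similar landmarks.
hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2)

frame = cv2.imread("frame.png")  # illustrative path
results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

if results.multi_hand_landmarks and len(results.multi_hand_landmarks) == 2:
    # Order the hands by their handedness label ("Left" sorts before "Right")
    # so features stay consistent from frame to frame.
    pairs = sorted(
        zip(results.multi_handedness, results.multi_hand_landmarks),
        key=lambda p: p[0].classification[0].label,
    )
    (_, left), (_, right) = pairs
    # Inter-hand geometry: wrist-to-wrist offset (landmark 0) and the gap between
    # index fingertips (landmark 8), in normalized image coordinates. Two-handed
    # BSL letters hinge on exactly this kind of relationship, not on one hand alone.
    wrist_dx = right.landmark[0].x - left.landmark[0].x
    wrist_dy = right.landmark[0].y - left.landmark[0].y
    tip_gap = ((right.landmark[8].x - left.landmark[8].x) ** 2 +
               (right.landmark[8].y - left.landmark[8].y) ** 2) ** 0.5
```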
Beyond the alphabet, grammatical structures also differ. BSL often uses different word order and spatial mapping conventions compared to ASL. Designing interactive experiences that align naturally with those grammars isn’t just a matter of translation. It requires rethinking game flow, feedback mechanisms, and visual layout to reflect native linguistic logic. Creating experiences that feel right in each language means close collaboration with native signers from each community and a flexible technical foundation that can adapt to their needs. It’s important work that demands more perspectives than just our own.
Language is powerful, and so is the full range of human expression that surrounds it. From handshape and movement to facial grammar, from voice to visuals, we’re finally entering an era where technology can keep up. Multimodal, multimedia, and multilingual systems are no longer science fiction. For the first time, we can start designing with signed languages at the center, not as an afterthought, but as a foundation.
Read this article and want to connect over email or video? Reach me at jeff@gosign.ai. I’d love to continue the conversation.