How can we efficiently automate note-taking for in-person meetings?
Spoiler: it’s much harder than for online meetings!
Today, dozens of cloud-based tools automate transcription for virtual meetings (Teams, Meet, Zoom), and many innovative startups are emerging in this space. These solutions have made major progress thanks to recent technological advances, most notably OpenAI’s release of Whisper in 2022, an open-source speech recognition model that dramatically raised the quality of automatic transcription, including for online meetings.
But what about in-person meetings?
If you’ve ever tried to capture an in-person meeting by running a Teams session in the room, for example, you’ve likely been disappointed. The result is often a flat, unstructured block of text with no clear distinction between speakers.
Why doesn’t it work well?
Simply put: these tools lack the critical context needed to distinguish and label voices in a physical room. In a virtual meeting, each participant joins from their own device, so every voice arrives on a dedicated, identifiable audio channel. In a physical meeting room, all voices are typically captured by a single microphone, which makes it very difficult to identify who is speaking.
What technologies are required to enable reliable in-person transcription?
To succeed in this context, several advanced building blocks are necessary:
- Smart multi-point microphones: placed at several points around the meeting table (which can be very long), able to estimate the Direction of Arrival (DOA) of each sound and to focus automatically on the active speaker (beamforming).
- Advanced voice separation algorithms: to isolate individual voices even in fast-paced discussions or when people speak over each other.
- Precise speaker identification: fine-tuned voice recognition automatically tags each participant.
- A high-performance transcription engine: to produce an accurate diarized transcript, i.e., one that clearly states who said what and when.
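To give a feel for the first building block, here is a deliberately simplified sketch, assuming two microphones a known distance apart, a single talker far enough away that the sound waves are roughly planar, and no reverberation. It estimates the time difference of arrival (TDOA) with plain time-domain cross-correlation, converts it to an angle with θ = arcsin(c·τ/d), and reuses the same delay for two-microphone delay-and-sum beamforming. The function names, the 20 cm spacing and the 16 kHz sample rate are hypothetical choices for illustration; this is not ClearMind’s implementation, and real rooms call for more microphones and more robust estimators such as GCC-PHAT.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at about 20 °C


def estimate_lag(sig_a, sig_b, max_lag):
    """Return the lag k (in samples) that maximizes sum(sig_a[i] * sig_b[i + k]).

    A positive lag means the sound reached microphone A before microphone B.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_of_arrival(lag, mic_distance_m, sample_rate_hz):
    """Far-field DOA in degrees from broadside (0 = facing the pair).

    Positive angles point towards microphone A.
    """
    tdoa_s = lag / sample_rate_hz
    ratio = SPEED_OF_SOUND_M_S * tdoa_s / mic_distance_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp noise-induced overshoot into asin's domain
    return math.degrees(math.asin(ratio))


def delay_and_sum(sig_a, sig_b, lag):
    """Steer the pair towards the talker: shift B by the lag to align it with A, then average."""
    out = []
    for i, a in enumerate(sig_a):
        j = i + lag
        b = sig_b[j] if 0 <= j < len(sig_b) else 0.0
        out.append(0.5 * (a + b))
    return out


if __name__ == "__main__":
    fs, d = 16_000, 0.20  # hypothetical: 16 kHz sampling, microphones 20 cm apart
    max_lag = math.ceil(d / SPEED_OF_SOUND_M_S * fs)  # largest physically possible delay
    mic_a = [0.0] * 200
    mic_b = [0.0] * 200
    mic_a[100] = 1.0  # a click reaches mic A first...
    mic_b[105] = 1.0  # ...and mic B five samples later
    lag = estimate_lag(mic_a, mic_b, max_lag)
    print("lag (samples):", lag)
    print("DOA (degrees):", round(direction_of_arrival(lag, d, fs), 1))
    print("beamformed peak:", delay_and_sum(mic_a, mic_b, lag)[100])
```

Because the synthetic click reaches mic B after mic A, the estimated angle points towards mic A, and once the channels are aligned the two copies of the click add up coherently. The brute-force correlation costs O(N·L) for N samples and L candidate lags, which is fine for a sketch; FFT-based correlation is the usual choice at real sample rates.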
As you can imagine, combining these technologies is extremely complex, especially as the number of participants grows and the conversation becomes more dynamic, with overlapping speech and parallel discussions.
And when it comes to confidential meetings, the challenge grows even further:
For high-stakes discussions (Executive Committees, Boards of Directors, etc.) where data privacy is critical, another constraint comes into play: the entire processing workflow must remain local, which rules out cloud-based transcription.
This is where Edge Computing becomes essential: everything must be processed on-site, with no data ever leaving the meeting room.
At csky.ai, we’ve taken on this technological and operational challenge with the development of ClearMind:
The first fully autonomous, offline meeting assistant built specifically for in-person and hybrid strategic meetings.
ClearMind offers:
- Advanced multi-microphone voice separation for clean, structured audio input
- Speaker identification via voice fingerprint or quick intro at the start of the session
- High-quality automated minutes, generated instantly
- Maximum confidentiality – with zero data ever transferred outside your meeting room
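The "voice fingerprint or quick intro" approach can be illustrated with a minimal enrollment sketch. It assumes an upstream speaker-embedding model and segmenter already turn audio into fixed-length vectors (not shown). Each participant’s intro embeddings are averaged into a voiceprint, and every later speech segment is assigned to the closest voiceprint by cosine similarity, or to "Unknown" if no match clears a threshold. The helper names, the 0.7 threshold and the toy 3-dimensional vectors are hypothetical; this is not ClearMind’s actual pipeline.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def enroll(intro_embeddings: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Average each participant's intro embeddings into a single voiceprint."""
    voiceprints = {}
    for name, vectors in intro_embeddings.items():
        dim = len(vectors[0])
        voiceprints[name] = [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
    return voiceprints


def identify(embedding: List[float], voiceprints: Dict[str, List[float]], threshold: float = 0.7) -> str:
    """Return the best-matching participant, or 'Unknown' if every score is below the threshold."""
    best_name, best_score = "Unknown", threshold
    for name, voiceprint in voiceprints.items():
        score = cosine_similarity(embedding, voiceprint)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


@dataclass
class Segment:
    start_s: float
    end_s: float
    embedding: List[float]
    text: str


def label_transcript(segments: List[Segment], voiceprints: Dict[str, List[float]], threshold: float = 0.7) -> List[str]:
    """Render 'who said what and when' lines from embedded, transcribed segments."""
    return [
        f"[{s.start_s:.1f}-{s.end_s:.1f}s] {identify(s.embedding, voiceprints, threshold)}: {s.text}"
        for s in segments
    ]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real speaker embeddings have far more dimensions.
    prints = enroll({
        "Alice": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
        "Bob": [[0.0, 1.0, 0.0]],
    })
    meeting = [
        Segment(0.0, 2.5, [1.0, 0.0, 0.0], "Let's start with the budget."),
        Segment(2.5, 4.0, [0.0, 0.9, 0.1], "Agreed."),
        Segment(4.0, 5.0, [0.0, 0.0, 1.0], "Sorry, can you repeat?"),
    ]
    for line in label_transcript(meeting, prints):
        print(line)
```

The threshold with an "Unknown" fallback is a deliberate design choice: an unenrolled guest is flagged rather than silently attributed to whichever participant happens to sound closest.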
If you’re looking to radically transform how you manage strategic meetings while ensuring maximum security, ClearMind can automate your meeting notes so you can stay fully focused on what truly matters: your decisions.