In 2022, a major breakthrough shook the tech world: OpenAI unveiled ChatGPT, an advanced language model capable of tackling complex questions and engaging in human-like conversation.
The innovation came with a catch: ChatGPT supports Hebrew, but its Hebrew falls well short of its English.
Hebrew First
Artificial intelligence is transforming our world at breakneck speed, but most developments cater primarily to English. To bring these advancements to our doorstep and allow doctors, engineers, and Hebrew speakers from all walks of life to benefit from these capabilities, we’re committed to enhancing AI’s grasp of Hebrew – both spoken and written.
The main hurdle is the scarcity of large-scale Hebrew datasets for AI tools to “train” on. We aim to provide high-quality data that’s commercially viable, encouraging companies to support Hebrew just as robustly as they do English.
We’re releasing a dataset of more than 13,000 hours of raw recorded content from over 1,000 speakers.
Article: https://arxiv.org/abs/2307.08720
Dataset: ivrit-ai on Hugging Face
Want to lend a hand? Get in touch.