Note: If you’re looking for the Hebrew version or our crowdsourced transcription efforts, go here.
ivrit.ai is a non-profit effort aiming to make Hebrew a first-class citizen for AI technologies by providing favorably licensed datasets.
As of December 2023, we offer the world’s largest Hebrew audio dataset, over 10,000 hours, for commercial model-training use cases. It is provided free of charge.
All ivrit.ai datasets are provided under a specially crafted license that explicitly allows using the content for AI model training – including for commercial models – while still maintaining key rights of content owners.
Please read our Credits page for a list of content creators and volunteers.
You can read more about our first dataset here, and it is of course available on Hugging Face.
For more info, please contact us here.