Datagen raised a $50 million Series B round to boost the growth of its synthetic data solution for computer vision (CV) teams, bringing its total funding to over $70 million, the Israel-born company announced today. The round was led by new investor Scale Venture Partners, with partner Andy Vitus joining Datagen’s board of directors.
With offices in Tel Aviv and New York, Datagen “is creating a complete CV stack that will propel advancements in AI by simulating real world environments to rapidly train machine learning models at a fraction of the cost,” Vitus said. The Palo Alto-based VC predicts that “this will fundamentally transform the way computer vision applications are developed and tested.”
Investors that had backed Datagen’s $18.5 million Series A round 11 months ago participated in this new round. These include VC firms TLV Partners and Spider Capital, as well as Series A leader Viola Ventures, this time also through its growth arm Viola Growth. High-profile individuals from the AI and data field doubled down too, such as computer scientists Michael J. Black and Trevor Darrell, NVIDIA’s director of AI Gal Chechik, and Kaggle’s CEO Anthony Goldbloom.
The list of investors could get longer, Datagen’s CEO Ofir Zuk (Chakon) told TC. Although the round closed a few weeks ago, the startup left “a small part in deferred closing” with a few names that remain to be confirmed.
Asked about Datagen’s main milestone since its Series A, Zuk explained that it was building the self-serve platform that its target users had asked for in their early feedback. With this approach, Datagen now has a much more scalable way to let clients generate for themselves the visual data that they need to train their computer vision applications.
Datagen’s solution is used by computer vision teams and machine learning engineers inside a variety of organizations, including some Fortune 100 and ‘big tech’ companies. It has a wide range of applications, but there are four that are accelerating faster than others, Zuk said: AR/VR/metaverse, in-cabin automotive and automotive in general, smart conferencing, and home security.
In-cabin automotive is a good example to better understand what Datagen does. The term refers to what happens inside a car, such as whether or not the passenger is wearing a seatbelt. Passengers and cars come in many forms, which is where AI comes in handy. Based on some initial real-life 3D motion capture, Datagen lets its customers generate the much larger quantity of data that they need to, for instance, decide where exactly an airbag should be deployed.
We just touched on the common thread of synthetic data: It leverages real-world data and extrapolates it into the kind of data that teams need more and more of, data that is plentiful and enriched to remove bias, cover edge cases, and more.
Datagen’s focus is visual data, but it isn’t tied to a sector in particular. If use cases in retail and robotics take off, for example, it will only need to collect specific real-life data, such as motion capture from warehouses. The algorithms and technology on top of this are domain-agnostic, Zuk said.
“The potential impact of what Datagen has to offer, across a broad range of applications, is staggering,” Vitus commented. A twenty-plus-year-old enterprise-focused VC firm, Scale has already invested in automotive simulation platform Cognata, and is bullish about simulated data. So is Zuk: “Synthetic data is taking over real data,” he summed up.