Startup of the Month: Unstructured

Company helps businesses turn raw data into AI-friendly formats

Jun 7, 2024 By Russell Nichols

This story is part of our June 2024 issue.

Think about a global company where each department has its own unique jargon, and employees speak different languages. To thrive, this company would need a way to keep valuable information from getting lost in translation.

Similarly, businesses looking to take advantage of AI’s powerful computing resources need support in dealing with various file types: PDFs, Google Docs, Slack messages, scanned files. AI systems struggle to process these different formats, so Brian S. Raymond launched Unstructured in 2022 to serve as an efficient AI translator.

After working for another AI company from 2018 to 2022, Raymond saw the challenges firsthand. He recalls spending months to years working with the world’s largest companies, manually converting raw data from formats like PowerPoints and PDFs into a format suitable for algorithms.

“We would hard code these pipelines, and if a document layout changes a little bit … everything would break,” says Raymond, a former CIA intelligence officer, adding that “algorithms were beginning to be really powerful, but there was nothing to help on the data side of the equation.”

Two months after Raymond launched the company, Unstructured released an open-source platform that transforms complex unstructured data formats into AI-friendly JSON files. This prototype “caught on like wildfire,” Raymond says, and has been downloaded more than 8 million times in the past 12 months. But a JSON file is limited.

“If you’re just doing a proof of concept, that’s fine, you can do it one time and that works really well using our open source,” Raymond says. “But if you’re an organization like a large investment bank that produces maybe a quarter million new files every single day … then feeding them down to the language model, you can’t be doing that manually.”

Transforming massive amounts of data from a major company requires a more customized solution. Last year, Unstructured shifted its focus from offering only an open-source solution to providing a commercial one as well. This new solution can be deployed across an entire enterprise, allowing businesses to use their daily data with advanced AI systems.

Raymond is based in Loomis, but the company of 45 people is fully remote. For funding, the team pitched about 50 investors and connected with Bain Capital Ventures, which led its $5 million seed round. In the past 22 months, the startup has raised a total of $65 million.

Ryan Lewis, a partner at SRI Ventures, joined Unstructured’s advisory board in 2022 for two reasons: the people and the core concept. He has known Raymond for almost a decade and believes the idea behind Unstructured offers a vital solution to a pressing problem faced by organizations, both big and small, in making sense of corporate data.

“It is, arguably, one of the most critical bottlenecks for an organization looking to adopt AI technology into its workflow,” Lewis says. “I’ve seen this both as a current investor and formerly as an operator.”

Previously, at one of Amazon Web Services’ AI businesses, Lewis saw firsthand that data preparation and curation for use in an AI model were among the most time-consuming aspects of almost every project. This is why, when he heard Raymond’s pitch, “it clicked within five seconds,” he says. The staggering number of downloads of the open-source solution speaks to its real value, Lewis adds.

“I’ve worked in software my entire career,” he says. “To see this happen so fast, it’s a testament that they’re attacking a problem and resolving it.”

According to Raymond, organizations increasingly have an imperative to adopt AI models to help drive productivity. Much of that value comes when companies connect those models with their own data. Most are in experimentation mode right now, he adds, working to move proofs of concept into production.

“We’re a critical piece of that puzzle,” Raymond says. “If you want to connect the data your humans are producing with those (foundation or large language) models, the first step is to get all of your data into a format that models can understand, and we’re the industry standard for that.”
