LightTag is a text annotation platform for data scientists creating AI training data


NelsonG

LightTag, a newly launched startup from a former NLP researcher at Citi, has built a “text annotation platform” designed to assist data scientists who need to quickly create training data for their AI systems. It’s a classic picks ‘n’ shovels move, in that the bootstrapped Berlin-based company is hoping to take advantage of the current boom in AI development.

Specifically, LightTag aims to solve one of the main bottlenecks of ‘deep learning’-based AI development: what you get out is only as good as the labeled data you put in. The problem, however, is that labelling data is laborious, and since it’s a job carried out by teams of humans it is prone to inaccuracy and inconsistency. LightTag’s team-based workflow, clever UI, and in-built quality controls are an attempt to mitigate this.

“What I’ve taken from [my previous positions] to LightTag is an understanding that labeled data is more important to success in machine learning than clever algorithms,” says founder Tal Perry. “The difference in a successful machine learning project often boiled down to how well the gathering and use of labeled data was executed and managed. There is a huge gap in the tooling to consistently do that well, that’s why I built LightTag”.

Perry says LightTag’s annotation interface is designed to keep labellers “effective and engaged”. It also employs its own “AI” to learn from previous labelling and make annotation suggestions. The platform also automates the work of managing a project, in terms of assigning tasks to labellers and making sure there is enough overlap and duplication to keep accuracy and consistency high.

“We’ve made it dead-simple to annotate with a team (sounds obvious, but nothing else makes it easy),” he says. “To make sure the data is good, LightTag automatically assigns work to team members so that there is overlap between them. This allows project managers to measure agreement and recognise problems in their project early on. For example, if a specific annotator is performing worse than others”.
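
To make the idea of overlapping assignments and agreement concrete, here is a minimal Python sketch. It is not LightTag's implementation, which the article does not describe. The function names, the round-robin scheme, and the choice of Cohen's kappa as the agreement measure are all assumptions made for illustration.

```python
from collections import Counter


def assign_with_overlap(doc_ids, annotators, overlap=2):
    """Hypothetical round-robin scheme: each document goes to `overlap` distinct annotators."""
    if overlap > len(annotators):
        raise ValueError("overlap cannot exceed the number of annotators")
    assignments = {name: [] for name in annotators}
    for i, doc in enumerate(doc_ids):
        for k in range(overlap):
            # Consecutive annotators in the list share each document.
            name = annotators[(i + k) % len(annotators)]
            assignments[name].append(doc)
    return assignments


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Fraction of items on which the two annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    plan = assign_with_overlap(["d1", "d2", "d3"], ["ann", "bob", "cy"], overlap=2)
    print(plan)
    print(cohen_kappa(["ORG", "PER", "ORG", "PER"], ["ORG", "PER", "PER", "PER"]))
```

An annotator whose kappa against every colleague is consistently lower than the rest of the team would be the kind of early warning Perry describes.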

Meanwhile, Perry says acquiring labeled data is one of the silent growth sectors in the recent AI boom, but for many sector-specific industries, such as medical, legal or financial, outsourcing the job is not an option. That’s because the data is often too sensitive, or too specialist for non-subject experts to process. To address this, LightTag offers an on-premise version in addition to SaaS.

“Every company has huge text datasets that are unstructured (CRM records, call transcripts, emails etc). ‘Deep Learning’ has made it algorithmically feasible to tap that data, but to use Deep Learning we need to train the model with labeled datasets. Most companies can’t outsource labelling on text because the data is too complicated (biology, finance), regulated (CRM records) or both (medical records),” explains the LightTag founder.

Operating in various pilots and in private beta since December 2018, and publicly launched this month, LightTag has already been used by the data science team at a large Silicon Valley tech company that wants its AI to understand free-form text in profiles, as well as by an energy company to analyse logs from oil rigs to predict problems drilling at certain depths. The startup has also done a pilot with a medical imaging company labelling reports associated with MRI scans.
