Categories
AI

Instabase adds deep learning to make sense of unstructured data

Whether they realize it or not, most enterprises are sitting on a mountain of priceless yet untapped data. Buried deep within PDFs, customer emails, and scanned documents is a trove of business intelligence and insights with the potential to inform critical business decisions – if only the data can be extracted and harnessed, that is.

Instabase hopes to help businesses benefit from unstructured data with the help of some good, old-fashioned AI. Today, the business automation platform provider is announcing a set of new deep learning-based tools designed to help enterprises more easily extract and make sense of this unstructured data and build applications that will help them put it all to use.

“Unlocking unstructured data, which is 80% of all enterprise data, is an extremely difficult problem due to the variability of the data,” says Instabase founder and CEO Anant Bhardwaj. “Deep learning algorithms provide greater accuracy as the algorithm learns from the entirety of each training document and identifies many different attributes to make its decision, much like a human does.”

Instabase’s new deep learning features offer low-code and no-code functionality designed to let customers train, run, and apply sophisticated deep learning models to their business needs. Using drag-and-drop visual development interfaces, Instabase customers can build customized workflows and business applications powered by best-in-class deep learning models.

“These deep learning models have already been trained on very large sets of data and as a result, fewer samples are needed to fine-tune the model for a specific use case,” Bhardwaj explains. “That means enterprises can tackle use cases never before possible, build solutions faster and at unprecedented accuracies for their unstructured data use cases.”

Founded in 2015, Instabase uses technology like optical character recognition and natural language processing to extract and decipher data that is far too often buried in formats that can be difficult for machines to understand. The platform provider, whose customers include companies in the financial services, medical, and insurance industries, hopes that by tapping into this unstructured data, it can help companies automate more of their business processes, inform key decisions, and further their own digital transformation.
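The extraction step described above can be illustrated, in heavily simplified form, as pulling structured fields out of raw OCR text. The sketch below is purely illustrative: the field names and regex patterns are hypothetical, and Instabase's actual platform relies on trained OCR and NLP models, not hand-written rules.

```python
import re

def extract_fields(ocr_text: str) -> dict:
    """Pull a few structured fields out of raw OCR text with regexes.

    A toy stand-in for the extraction step; production systems use
    trained models rather than hand-written patterns like these.
    """
    patterns = {
        "invoice_number": r"Invoice\s*#?\s*(\w+)",
        "total": r"Total[:\s]*\$?([\d,]+\.\d{2})",
        "date": r"Date[:\s]*(\d{2}/\d{2}/\d{4})",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, ocr_text, flags=re.IGNORECASE)
        fields[name] = match.group(1) if match else None
    return fields

sample = "INVOICE #A1042\nDate: 03/15/2021\nTotal: $1,250.00"
print(extract_fields(sample))
```

The gap between this sketch and a real system is exactly the variability Bhardwaj mentions: rules like these break on every new document layout, which is why deep learning models that generalize across layouts are the selling point.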

With the addition of its new deep learning infrastructure, Instabase hopes to make this unstructured data analysis even faster and more impactful. Using the platform’s Machine Learning Studio, Instabase customers can annotate data points within documents and spin up and train a custom model, which can then be used by others within the organization. The new features also include a Model Catalog, which offers plug-and-play access to a library of deep learning models built by Instabase and other providers.

The platform’s new deep learning features will be publicly available in early 2022.

VentureBeat

VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative technology and transact.

Our site delivers essential information on data technologies and strategies to guide you as you lead your organizations. We invite you to become a member of our community, to access:

  • up-to-date information on the subjects of interest to you
  • our newsletters
  • gated thought-leader content and discounted access to our prized events, such as Transform 2021: Learn More
  • networking features, and more

Become a member

Repost: Original Source and Author Link

Why unstructured data is the future of data management

Enterprises are increasingly relying on unstructured data for regulatory, analytic, and decision-making purposes. Unstructured data will power analytics, machine learning, and business intelligence.

According to the latest figures from research firm IDC, the volume of unstructured data is set to grow from 33 zettabytes in 2018 to 175 zettabytes, or 175 billion terabytes, by 2025. Managing that volume so organizations have the right data available at the right time is a mounting challenge. Krishna Subramanian, president and COO of Komprise, a data management software provider, sat down with VentureBeat to discuss the business benefits and challenges associated with unstructured data.
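As a back-of-the-envelope check, that forecast implies unstructured data compounding at roughly 27% a year:

```python
# Implied compound annual growth rate of the forecast:
# 33 zettabytes in 2018 growing to 175 zettabytes by 2025.
start_zb, end_zb = 33, 175
years = 2025 - 2018
cagr = (end_zb / start_zb) ** (1 / years) - 1
print(f"~{cagr:.0%} growth per year")  # roughly 27% annually
```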

VentureBeat: Does the average enterprise IT organization know how much unstructured data they have and how fast it is growing?

Krishna Subramanian: Intuitively they know a lot is unstructured and it is growing in double digits, but they don’t know exactly how much they have and how fast it’s growing. We know that 80-90% of the world’s data is unstructured.

VentureBeat: What’s the problem with this data growth — there is now endless cloud storage after all, right?

Subramanian: The big issue is the cost – over two-thirds of the cost of data is not in the storage, but in its active management. For every piece of data, companies typically keep a few backup copies and a replication copy for disaster recovery. If you think your data is growing at 30%, it’s more like 90-100% when you factor in all the copies of the data.

It’s also wise to consider that cloud storage is not necessarily cheaper. For instance, AWS itself today offers over 16 tiers of unstructured file and object storage. If you don’t put your data in the right place and control egress costs, you may end up paying more than if you were storing it on premises, because every time you even read the data you’ll be charged.

The key here is that over 80% of data is not actually actively accessed and is cold. This cold data can be stored on cheaper storage and does not require the same level of backup and replication. Therefore, you need to manage hot data that is actively used and cold data that is rarely used differently.

As just one example, Pfizer researchers generate between 8TB and 10TB a day, and they were running out of datacenter space. They were able to use a data management product to identify the cold data and eliminate it from their expensive storage, backups, and replication by moving it to lower-cost, resilient storage in the cloud and taking it out of active management. The company wound up cutting 75% of its data storage and backup costs, all without users noticing any change.

What’s hard about data growth is that a lot of organizations don’t like to delete data. You never know when you might need it, and when you do, you want to be able to find it easily. And users and applications should not have to change their behavior when you move data around. In the past, with archiving to tape, that wasn’t possible, but now it is with cloud storage and data management software.
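Subramanian's cost arithmetic is easy to sanity-check. The sketch below is illustrative only; the copy counts and per-TB prices are assumptions for the example, not Komprise figures or AWS list prices. It shows raw growth multiplied by the number of stored copies, and the savings from moving cold data to a cheaper tier.

```python
def effective_growth(raw_growth: float, backup_copies: int, dr_copies: int) -> float:
    """Growth in total stored bytes once every new byte is also backed up
    and replicated: with one backup and one DR copy, each new byte is
    stored three times."""
    total_copies = 1 + backup_copies + dr_copies
    return raw_growth * total_copies

def tiered_cost(total_tb: float, cold_fraction: float,
                hot_cost_tb: float, cold_cost_tb: float) -> float:
    """Monthly storage cost when a fraction of the data sits on a cheaper
    cold tier instead of primary storage."""
    hot = total_tb * (1 - cold_fraction) * hot_cost_tb
    cold = total_tb * cold_fraction * cold_cost_tb
    return hot + cold

# 30% raw growth becomes ~90% once copies are counted.
print(f"{effective_growth(0.30, backup_copies=1, dr_copies=1):.0%}")

# Moving the cold 80% of 1,000 TB to a $4/TB tier from a $23/TB tier.
all_hot = tiered_cost(1000, 0.0, hot_cost_tb=23, cold_cost_tb=4)
tiered = tiered_cost(1000, 0.8, hot_cost_tb=23, cold_cost_tb=4)
print(f"savings: {1 - tiered / all_hot:.0%}")
```

With these assumed numbers the cold-tier move cuts the storage bill by about two-thirds, before even counting the backup and replication costs that tiering also removes, which is how savings in the range Subramanian cites for Pfizer become plausible.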

VentureBeat: Why is it important to be strategic about how you manage it, store it — isn’t it just about making sure you can find it for the BI team?

Subramanian: Today, data is a valuable corporate asset. You’ve got to be strategic with it because it’s not just for your BI teams, but for the R&D and customer success teams. They need historical data to build new products or to improve the ones they already have.

This is super relevant in manufacturing, such as in the semiconductor chip industry, but also in other industries that are so important to our economy, such as pharmaceuticals. COVID researchers depended upon access to SARS data when developing vaccines and treatments. Data often becomes valuable again later, and what if you don’t know what you have or you can’t find it?

We’ve had customers in the media and entertainment business, and in the past when they wanted to find an old show, they’d need access to a tape archive. Then, they needed an asset tag to locate the tape. That can be very difficult, and it’s why archiving is not popular. Live archive solutions that are available today make archived data instantly accessible and transparently tier data so users can easily locate files and access them anytime.

VentureBeat: How will tools and practices evolve to help IT departments better leverage this unstructured data for the organization and business users? What’s needed, and where are the gaps?

Subramanian: You need a storage-independent way to look at data across all of your storage technologies, whether in your datacenter or in the cloud, to not only move data to the right place, but also to help businesses extract value from the data. Gartner calls this category “data management software,” and it includes companies like Cirrus Data for block data and Komprise for file and object data.

The ultimate goal is to help business users leverage historical data, and this requires data search, data analytics, and data intelligence. These are hot areas where a lot of innovation is happening. The cloud providers offer several data warehousing and analytics solutions that can be used in conjunction with data management software, such as Amazon Redshift and QuickSight. For instance, we use distributed Elasticsearch in our software to rapidly search billions of files, find just the data relevant to a user, such as all the data for a particular project, and export that data to Redshift for further analysis.

Why have all this data if you can’t detect significant trends, such as anomalies or ransomware? I believe we need more predictive analytics around data.
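The workflow Subramanian describes, querying an index of file metadata rather than walking the filesystem, can be illustrated with a toy in-memory version. The real system uses a distributed Elasticsearch cluster; the record fields and example data below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FileRecord:
    path: str
    project: str
    size_bytes: int
    last_access_days: int

# Toy "index" standing in for a distributed search cluster over billions of files.
index = [
    FileRecord("/lab/run1/results.csv", "covid-vax", 120_000, 400),
    FileRecord("/lab/run2/results.csv", "covid-vax", 95_000, 12),
    FileRecord("/media/show-1998.mov", "archive", 40_000_000_000, 3000),
]

def search(index, project=None, min_idle_days=None):
    """Return records matching the metadata filters, the way a
    data-management index narrows a corpus down to one project's files
    or to cold data untouched for a year."""
    hits = index
    if project is not None:
        hits = [r for r in hits if r.project == project]
    if min_idle_days is not None:
        hits = [r for r in hits if r.last_access_days >= min_idle_days]
    return hits

cold = search(index, min_idle_days=365)
print([r.path for r in cold])
```

The same filtered result set is what would then be exported to a warehouse such as Redshift for deeper analysis, or handed to a tiering policy that moves the cold files to cheaper storage.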

VentureBeat: Will the data management challenge spur a whole new sector of startups in the coming year or two?

Subramanian: Definitely. Analysts are beginning to recognize data management software as a new category. Beyond the use cases above, consider all the new types of data analytics companies getting funded, such as Snowflake, Databricks, and Apache Spark. So many companies are coming to light right now to solve data management and data analytics issues at scale.

VentureBeat: How are the big cloud providers responding to problems and opportunities with unstructured data growth?

Subramanian: They are all offering more services to store data at different performance and price points. Amazon Elastic File System (Amazon EFS) and Azure Files were born to address the need for file storage in the cloud. The major CSPs are investing in partners across many areas of unstructured data management, including migration and analytics.


Katana Graph raises $28.5 million to handle unstructured data at scale

Katana Graph, a startup that helps businesses analyze and manage unstructured data at scale, today announced a $28.5 million series A round led by Intel Capital.

Katana Graph was founded by University of Texas at Austin computer science professor Keshav Pingali and assistant professor Chris Rossbach. The company helps businesses ingest large amounts of data into memory, CEO Pingali told VentureBeat in a phone interview. The UT-Austin research group started working with graph processing and unstructured data two years ago and began by advising DARPA on projects that deal with data at scale. Katana Graph is used from Python and processes data using C++.

As with algorithm auditing, AIOps, and model monitoring and management services, a wave of startups has emerged to help businesses analyze and label data; that demand helps explain why Labelbox raised $40 million and Databricks raised $1 billion.

Katana Graph is currently working with customers in health, pharmaceuticals, and security.

“One of the customers we’re engaged with has a graph with 4.3 trillion pages, and that is an enormous amount of data. So ingesting that kind of data into the memory of a cluster is a big problem, and what we were able to do with the ingest time is reduce the ingest time from a couple of days to about 20 minutes,” Pingali said.

Today’s round included participation from WRVI Capital, Nepenthe Capital, Dell Technologies Capital, and Redline Capital.

Katana Graph was founded in March 2020 and is based in Austin, Texas. The company has 25 employees and is using the funding to expand its marketing, sales, and engineering teams.
