Dataiku releases new version of unified AI platform for machine learning


Dataiku recently released version 10 of its unified AI platform. VentureBeat spoke with Dan Darnell, head of product marketing at Dataiku, about how the new release provides greater governance and oversight of the enterprise’s machine learning efforts, enhances ML ops, and enables enterprises to scale their ML and AI efforts.

Governance and oversight

For Darnell, the name of the game is governance. “Until recently,” he told VentureBeat, “data science tooling at many enterprises has been the wild west, with different groups adopting their favorite tools.” However, he sees a noticeable change as tooling becomes consolidated “as enterprises are realizing they lack visibility into these siloed environments, which poses a huge operational and compliance risk. They are searching for a single ML repository to provide better governance and oversight.” Dataiku is not alone in spotting this trend, with competing offerings such as AWS SageMaker’s MLOps tooling targeting the same space.

Having a single point of governance is helpful for enterprise users. Darnell likens it to a single “watchtower, from which to view all of an organization’s data projects.” For Dataiku, this enables project workflows that provide blueprints for projects, approval workflows that require managerial sign-off before deploying new models, risk and value assessment to score their AI projects, and a centralized model registry to version models and track model performance.

For its new release, governance is centered around the “project,” which also contains the data sources, code, notebooks, models, approval rules, and markdown wikis associated with that effort. Just as GitHub went beyond mere code hosting to hosting the context around coding that facilitates collaboration, such as pull requests, CI/CD, markdown wikis, and project workflow, Dataiku‘s eponymous “projects” aspire to do the same for data projects. “Whether you write your model inside Dataiku or elsewhere, we want you to put that model into our product,” said Darnell.

ML ops

Governance and oversight also extend into ML ops, a rapidly growing discipline that applies DevOps best practices to machine learning models. In its press release, Dataiku defines ML ops as helping “IT operators and data scientists evaluate, monitor and compare machine learning models, whether under development or in production.” In this area, Dataiku competes against products like SageMaker’s Model Monitor, GCP’s Vertex AI Model Monitoring, and Azure ML’s MLOps capabilities.

Automatic drift analysis is an important newly released feature. Over time, data can fluctuate due to subtle underlying changes outside the modeler’s control. For example, as the pandemic progressed and consumers began to see delays in gym re-openings, sales of home exercise equipment began creeping up. This data drift can lead to poor performance for models that were trained on out-of-date data.
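Drift like this is typically detected by comparing the live distribution of each feature against its training-time baseline. As a generic sketch (not Dataiku's implementation; the data and threshold are illustrative), a population stability index (PSI) check might look like:

```python
import math
import random

def psi(expected, actual, bins=10):
    """Population stability index between a baseline (training) sample and
    a live sample of one feature. Rule of thumb: PSI > 0.2 signals drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def bucket_fracs(sample):
        counts = [0] * bins
        for x in sample:
            i = int((x - lo) / width)
            counts[min(max(i, 0), bins - 1)] += 1
        # floor at a tiny fraction so the log stays defined for empty buckets
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = bucket_fracs(expected), bucket_fracs(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

random.seed(0)
train = [random.gauss(0, 1) for _ in range(5000)]      # training-time data
stable = [random.gauss(0, 1) for _ in range(5000)]     # live data, no drift
shifted = [random.gauss(1.5, 1) for _ in range(5000)]  # live data after a shift

print(round(psi(train, stable), 3))   # small: distribution unchanged
print(round(psi(train, shifted), 3))  # large: drift alert, consider retraining
```

A monitoring job would run a check like this on a schedule and flag features whose PSI crosses the threshold, prompting retraining on fresher data.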

What-If scenarios are one of the more interesting features of the new AI platform. Machine learning models usually live in code, accessible only to trained data scientists, data engineers, and the computer systems that process them. But nontechnical business stakeholders want to see how the model works for themselves. These domain experts often have significant knowledge, and they often want to get comfortable with a model before approving it. Dataiku what-if “simulations” wrap a model so that non-technical stakeholders can interrogate the model by setting different inputs in an interactive GUI, without diving into the code. “Empowering non-technical users as part of the data science workflow is a critical component of MLOps,” Darnell said.
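A minimal sketch of the what-if idea, independent of Dataiku's actual product (the model, feature names, and wrapper below are invented for illustration): wrap a trained model so a reviewer can override individual inputs and see how the prediction responds.

```python
def make_what_if(model, baseline):
    """Wrap a model (a callable on a feature dict) so a reviewer can ask
    'what changes if I tweak one input?' without touching any code."""
    def what_if(**overrides):
        scenario = {**baseline, **overrides}
        return {"inputs": scenario, "prediction": model(scenario)}
    return what_if

# Toy scoring function standing in for a real trained model.
def approval_score(f):
    return round(min(1.0, 0.3 + 0.5 * (f["income"] / 100_000)
                     - 0.4 * f["debt_ratio"]), 3)

simulate = make_what_if(approval_score, {"income": 60_000, "debt_ratio": 0.35})
print(simulate())               # the baseline scenario
print(simulate(income=90_000))  # "what if income were higher?"
```

A GUI like Dataiku's essentially puts sliders and dropdowns in front of a wrapper like this, so domain experts can probe the model's behavior before approving it.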

Scaling ML and AI

“We think that ML and AI will be everywhere in the organization, and we have to unlock the bottleneck of the data scientist being the only person who can do ML work,” Darnell said.

One way Dataiku is tackling it is to reduce the duplicative work of data scientists and analysts. Duplicative work is the bane of any large enterprise where code silos are rampant. Data scientists redo the work because they simply don’t know if it was done elsewhere. A catalog of code snippets can provide data scientists and analysts greater visibility on prior work so that they can stand on the shoulders of colleagues rather than reinvent the wheel. Whether or not the catalog can work will hinge on search performance — a notoriously tricky problem — as well as whether search can easily identify the relevant prior work, therefore freeing up data scientists to accomplish more valuable tasks.

In addition to trying to make data scientists more effective, Dataiku’s AI platform also provides no-code GUIs for data prep and AutoML capabilities to perform ETL, train models, and assess their quality. This feature is geared toward analytically proficient users who don’t write code, empowering them to take on many data science tasks. Through a no-code GUI, users can control which ML models are available to the AutoML algorithm and perform basic feature manipulations on the input data. After training, the page provides visuals to aid in model interpretability: not just regression coefficients, hyperparameter selections, and performance metrics, but more sophisticated diagnostics like subpopulation analysis. The latter is especially helpful for detecting AI bias, where model performance may be strong overall but weak for a vulnerable subpopulation. No-code solutions are hot, with AWS also releasing SageMaker Canvas, a competing product.

More on Dataiku

Dataiku’s initial product, the “Data Science Studio,” focused on providing tooling for the individual data scientist to become more productive. With Dataiku 10, its focus has shifted to the enterprise, with features that target the CTO as well as the rank-and-file data scientist. This shift is not uncommon among data science vendors chasing stickier seven-figure enterprise deals with higher investor multiples. The direction mirrors similar moves by well-established competitors in the cloud enterprise data science space, including Databricks, Oracle’s Autonomous Data Warehouse, GCP Vertex AI, Microsoft’s Azure ML, and AWS SageMaker, which VentureBeat has written about previously.


VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative technology and transact.

Our site delivers essential information on data technologies and strategies to guide you as you lead your organizations. We invite you to become a member of our community, to access:

  • up-to-date information on the subjects of interest to you
  • our newsletters
  • gated thought-leader content and discounted access to our prized events, such as Transform 2021: Learn More
  • networking features, and more

Become a member

Repost: Original Source and Author Link


This unified smart home standard will matter

A smart home setup can serve you well, but the polish falls flat when a doorbell isn’t on the same ecosystem as the lights and you need a different method to control each of them. This scenario is set to change with the introduction of Matter, a new smart home protocol that will let smart devices from different manufacturers work together seamlessly, as if they were built for the same platform.

Conceived to facilitate the idea of a ‘perfect smart home,’ where every device works with every other, the new interoperability standard was introduced in May this year. This is not a new idea; in fact, the dream of all smart home devices speaking a protocol that integrates Google, Apple, and Amazon has been around since 2019.

Matter is the new brand name for a home automation connectivity standard formerly called Project CHIP (Connected Home over IP). It will work across devices irrespective of the ecosystem they’re built on. If a peripheral earns Matter certification, it will work seamlessly with other certified devices, whether your voice assistant comes from Google, Amazon, or another vendor.

The background

As of now, manufacturers develop hardware to function with a single ecosystem. For instance, Google hardware works with Google’s own platform, and someone with an Amazon or Apple product in the same setup has to rely on a different control method. This undermines uniformity, and developers and users both lose out unless some sort of hardware integration bridges the gap.

Zigbee has been the closest thing to a universal language that smart devices rely on to work with each other, to an extent. In May, alongside the rebranding of CHIP, the Zigbee Alliance was rechristened the Connectivity Standards Alliance, which now forms the backbone of Matter, a collaborative initiative of Apple, Google, Amazon, and a range of others in the industry.

Zigbee itself remains an independent wireless standard and will keep connecting IoT accessories alongside the likes of Z-Wave, something Matter will scale up to a new dimension without requiring additional hardware. Matter is a software-based standard that works at the application layer, as opposed to Zigbee, which functions at the network layer.

Secure, reliable, and seamless

Smart home devices are designed for privacy from the word go, but a home setup still requires the whole environment of devices to sync smoothly while remaining secure and reliable, without trust issues. In the seamless environment Matter would make possible, security and reliability are meant to be the main pillars.

Today’s smart home setup is, for the average consumer, disorganized despite a degree of uniform connectivity. Now the bigger companies, such as Apple, Google, and Amazon, have spearheaded an alliance to make the system smoother and easier for consumers, manufacturers, and developers.

With a unified connection between more devices from more manufacturers, and apps that work cross-platform, the end consumer is definitely going to benefit. Working over Internet Protocol (IP), Matter will enable communication across smart home devices, creating an environment where users can command a Ring doorbell directly through Google Assistant, just as they would over Alexa, without a hub or bridge.

Matter intends to lay the foundation for connected things, where devices stamped with its logo (which should someday become as widely recognizable as the Wi-Fi logo) are officially certified and easy to purchase. People can effortlessly distinguish it from other protocols, removing the guesswork from the purchasing process.

Consumers can choose from a wider range of brands with the comfort and security that, irrespective of the smart assistant and devices already in their ecosystem, a Matter-certified device will seamlessly connect and work with the rest. This creates interoperability in which devices from multiple brands work natively together with consistency and responsiveness.

The new, unified connectivity protocol will steadily arrive in smart home peripherals from Google, Amazon, Apple, and other members of the Connectivity Standards Alliance. Reportedly, the first release of the Matter protocol, scheduled toward the end of this year, will run over the Wi-Fi and Thread network layers and use Bluetooth Low Energy for commissioning.

Matter-enabled smart devices

Setting up a smart home with Matter-certified hardware would have become a reality in 2020 had it not been for the pandemic. Matter-enabled hardware from manufacturers, at least the big three (Google, Apple, and Amazon), was pushed back, and rumors now suggest the first batch should drop by the end of 2021.

Google used I/O 2021 to outline its plans for the future of a universal smart home, where the walls between individual protocols come down and a universally accepted space is created. With the new smart home standard in place, Google will start by making its smart assistant, across all platforms, capable of controlling Matter-enabled devices.

The Mountain View company will also update its Nest speakers and displays to work seamlessly with Matter-certified hardware. Devices from Amazon to Philips Hue, along with Apple HomeKit-enabled hardware, are set to join the unified protocol as well. Apple has also hinted that Matter integration will make its way into iOS 15.

Final thoughts

Eliminating the barriers of single-platform compatibility in favor of unification will open up real possibilities for the end user. New devices that roll out after Matter officially launches will be compatible with the new protocol and make the smart home experience far more seamless.

If you already have devices in your setup, you might wonder whether they are suddenly obsolete. Thankfully, developers will be working on backward compatibility: Philips Hue, for instance, has confirmed it will update existing lighting to the new protocol, and manufacturers are likely to make previously released devices Matter-compatible using over-the-air firmware updates. A future smart home in which Google accessories, unable to leverage the HomeKit advantage today, work seamlessly with Siri seems within reach.



What’s next: Machine learning at scale through unified modeling 


Machine learning has become pervasive in businesses across industries as the technology has matured in recent years. A 2020 Deloitte study found that 67% of organizations surveyed have put machine learning to work, and 97% expected to deploy some form of it in the year ahead. With this expanding use, new considerations are emerging, namely the significant investment of resources needed for maintenance of models.

Individual models may number in the hundreds for even a mid-sized company, such as a bank. Each model requires staff attention and computing power every time it needs to be run or updated. Plus, as output volume from separate machine learning models increases, interpretation and decision making become even more complex. Our team at Credit Sesame was experiencing all of this as we added products and continued to grow our business. To make sure that our machine learning work could continue to power the organization forward, we took a step back from our routines and decided to look for scalable options.

The unified modeling we developed is an approach in which a single model, rather than a set of related but separate models, is created to power a process or product. A unified model, not to be confused with unified modeling language, is facilitated by pooling the needed data together into a single array that is passed into the model, allowing all results to be delivered in one run rather than by calling a series of models in sequence.
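As a rough sketch of that pooling step, assuming a setup like the lender models described later (the partition names and feature layout here are invented, not Credit Sesame's actual pipeline), per-partition training sets can be concatenated with a one-hot partition indicator so a single model sees them all:

```python
def pool_partitions(partitioned):
    """Concatenate per-partition training rows into one design matrix,
    appending a one-hot partition indicator to each row so a single model
    can learn partition-specific behavior.

    partitioned: {partition_name: list of (features, target) pairs}
    """
    names = sorted(partitioned)
    X, y = [], []
    for name, rows in partitioned.items():
        onehot = [1.0 if n == name else 0.0 for n in names]
        for features, target in rows:
            X.append(list(features) + onehot)
            y.append(target)
    return X, y, names

# Two hypothetical lender partitions sharing the same two base features.
lenders = {
    "lender_a": [((0.2, 0.9), 1), ((0.8, 0.1), 0)],
    "lender_b": [((0.2, 0.9), 0), ((0.8, 0.1), 1)],
}
X, y, names = pool_partitions(lenders)
print(len(X), len(X[0]))  # 4 pooled rows, 2 base features + 2 partition flags
```

Any standard learner can then be fit once on `X, y` instead of once per partition.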

To develop our approach, we first selected a set of models that were likely candidates for unification. We realized there was a very high level of overlap between the top features. Next, we developed a plan for unifying the models by working with the features being passed into the combined model. Then we ran a proof of concept to test the accuracy of the unified model compared to the individual versions. We were pleased to see equal or improved accuracy from the unified model.

Our experience with this approach has shown that it’s possible to glean a number of benefits by shifting to unified machine learning models. We have seen quantifiable improvements, such as a 60% reduction in people hours needed to maintain and run models that power one of the company’s key offerings. Our gains have opened up team time, improved process efficiency, significantly reduced maintenance costs, and more.

However, applying a unified modeling approach does present a number of challenges. Unified modeling is not a one-size-fits-all solution, so it is essential to understand the appropriate use cases.

When a unified model makes sense

Model unification can be useful for many types of machine learning problems. Our experience with predictive models, which are widely used by organizations across industries, has shown three important conditions that should be met for taking a unified modeling approach:

  • A prediction is needed for the same target variable across a large number of related entities, or partitions
  • Each partition uses the same set of features
  • The models need to be refreshed on a frequent basis

These conditions typically are present when you’re looking to predict the values of target variables for closely related entities for purposes of comparison, or ranking and selection — for example, if you need to predict, among many lenders, the one with the highest probability of approving a particular loan. Unified models produce custom predictions for each partition (e.g., in our example, it would simultaneously predict the approval probability for each loan product).
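Continuing the hypothetical lender example (the toy scoring function below stands in for a real trained unified model), one run can then yield a prediction per partition, making ranking and selection a single pass:

```python
def predict_all_partitions(model, base_features, partitions):
    """One run of the unified model returns a prediction per partition by
    tiling the shared features with each partition's one-hot flag."""
    return {
        p: model(list(base_features)
                 + [1.0 if q == p else 0.0 for q in partitions])
        for p in partitions
    }

# Hypothetical scoring function standing in for the trained unified model.
def toy_model(row):
    base, flags = row[:2], row[2:]
    return round(0.5 * base[0] + (0.3 if flags[1] else -0.1), 2)

scores = predict_all_partitions(toy_model, [0.6, 0.2],
                                ["lender_a", "lender_b"])
best = max(scores, key=scores.get)
print(scores, "->", best)  # the lender most likely to approve this loan
```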

Potential benefits of a unified model approach

When the situation is right for unified modeling, there are a number of benefits you can achieve. We tested how a shift to this method could improve six key metrics. As shown in the table below, there would likely be a 23% overall improvement across the metrics combined, with four showing gains from unified modeling and one staying the same.

A more in-depth look at the benefits in our real world experience showed that for the important area of process efficiency, the impact was significant. As our team shifted from deploying dozens of models that each powered a particular offering to using just one unified model, we experienced a 75% reduction in total steps performed. The change allowed a 60% reduction in people hours, which created substantial cost savings and opened bandwidth for the team to pursue other projects.

Additionally, a unified model helps reduce maintenance cost dramatically. Rather than working with dozens of separate models each time a business need arises, a data science team can much more easily maintain one integrated model by updating it more frequently on a proactive, regular cadence.

The speed at which results are delivered is also critical to any business, especially when outcomes are needed in real time. By unifying into one model, you improve latency, since all predictions are delivered at once. We observed latency improvements of approximately 66% in our work, and these improvements became more pronounced as the number of partitions in the data set grew.

Accuracy is always an important consideration. In our shift to a unified model, we saw accuracy increase by as much as 4% across the partitions being used. In our experience, pooling together data across partitions and fitting a predictive model on this combined data does not deteriorate the quality of outcomes.

Proceed with caution

The benefits of unified modeling can be significant for an organization. Yet there are a number of considerations to keep top of mind when implementing this approach, including data imbalances, rollbacks, and cold start needs.

Data imbalances
When developing a classification model, it is common to encounter class imbalance in the target variable. For unified models, the data is pooled from several partitions, and there can be a second layer of imbalance because certain partitions may be overrepresented. A team can correct this by upsampling the data for underrepresented partitions to promote fairness.
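One way to sketch that correction (a generic upsampling routine with invented data, not the author's code): resample rows from smaller partitions, with replacement, until each partition matches the largest one.

```python
import random

def balance_partitions(pooled, partition_key, seed=0):
    """Upsample rows from underrepresented partitions (with replacement)
    so every partition contributes equally to the pooled training set."""
    rng = random.Random(seed)
    by_part = {}
    for row in pooled:
        by_part.setdefault(row[partition_key], []).append(row)
    target = max(len(rows) for rows in by_part.values())
    balanced = []
    for rows in by_part.values():
        balanced.extend(rows)
        balanced.extend(rng.choices(rows, k=target - len(rows)))
    return balanced

pooled = ([{"partition": "a", "x": i} for i in range(90)]
          + [{"partition": "b", "x": i} for i in range(10)])
balanced = balance_partitions(pooled, "partition")
counts = {p: sum(r["partition"] == p for r in balanced) for p in ("a", "b")}
print(counts)  # both partitions now contribute equally
```

Class imbalance within the target variable would still need its own handling (for example, class weights) on top of this partition-level rebalancing.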

Rollbacks
With unified models, teams lose some flexibility for addressing problems, since it is not possible to pick and choose individual partitions to roll back (or roll forward). A team can address this by retraining the unified model outside of the regular refresh cycle. Alternatively, if necessary, the model can be reverted for all partitions at once, across the board. For example, if you’ve created a unified model to predict demand for the full range, or a subset, of your company’s products, you may find issues with the results for one product after deploying the model. You will then need to either roll back or retrain the full model.

Cold start needs
Sometimes there may be a gap in historical data when a new partition is introduced or an old one is reactivated. While there is no straightforward solution for handling this situation, one option is to create proxies from existing partitions that can be used until enough data is collected for the new one. Organizations are likely to encounter this situation when introducing new products to their inventories.
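A proxy of that kind can be as simple as relabeling rows borrowed from similar partitions; this is an illustrative sketch with hypothetical partition names, not a prescribed method:

```python
def proxy_rows(history, new_partition, donor_partitions,
               partition_key="partition"):
    """Seed a brand-new partition with relabeled copies of rows from
    similar existing partitions, used only until real data accumulates."""
    return [dict(row, **{partition_key: new_partition})
            for row in history if row[partition_key] in donor_partitions]

history = [
    {"partition": "card_a", "x": 1},
    {"partition": "card_b", "x": 2},
    {"partition": "loan_a", "x": 3},
]
seeded = proxy_rows(history, "card_c", {"card_a", "card_b"})
print(seeded)  # two borrowed rows, now labeled card_c
```

As real observations for the new partition arrive, the proxy rows would be phased out of the training set.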

Ongoing evolution

Unified modeling can bring significant benefits to an organization when the right criteria are met and implementation teams have strategies ready to address challenges that may emerge. As uses for machine learning continue to spread even further throughout organizations and grow in complexity, the discipline must continue to mature. Techniques like the unified modeling approach I’ve described here are a critical part of the ongoing evolution that will help meet increasing demand from organizations for machine learning solutions to solve critical business challenges and help create competitive advantages.

Pejman Makhfi is Chief Technical Officer of Credit Sesame.




Securiti releases AI-powered data privacy and security platform to provide unified controls


The proliferation of network configurations has added new layers of complexity to digital security. As enterprises embrace approaches like multi-cloud deployments and APIs, it can be more difficult than ever to maintain a broad view of the most sensitive data and detect attacks.

But for Securiti CEO Rehan Jalil, the evolution in network architecture represents a big opportunity for companies to embrace a more unified approach to data management and security. To address this emerging challenge, Securiti today announced the release of its new privacy and control platform.

The goal is to unify data security, privacy, governance, and compliance across all types of networks. This is particularly critical as more sensitive data moves online, creating even more enticing targets for hackers.

“Data brings two big O’s,” Jalil said. “One big O is the opportunity, and we all know about it. But the other big O is the obligations. And those obligations are tied to making sure that you can keep it secure.”

The company also announced that Cisco had invested an undisclosed amount as part of a security partnership. In a blog post, Cisco Investments director Prasad Parthasarathi wrote that Securiti had made big strides toward overcoming the fragmented approach to security. The deal is also part of Cisco’s growing investment in security.

“The issues of security, privacy, governance, and compliance for sensitive assets and data have been addressed in separate silos,” he wrote. “Legacy architectures have proven to be ill-equipped to handle all these needs, particularly at today’s hyperscale environments with data across hundreds of different types of data systems.”

Rather than taking a piecemeal approach to different aspects of data security, the company set out two years ago to build a broad platform, Jalil said. He described the company’s approach as creating a “cyber mesh of a perimeter of security around the data wherever it exists.”

With the expanding number of environments, data is becoming increasingly distributed, making it a bigger challenge to manage security, privacy, and compliance. This has given rise to a market of solutions known as Data SPaC.

“Often these things are done in silos,” Jalil said. “But our company provides all three capabilities.”

The key to enabling that is the company’s sensitive data intelligence (SDI) technology, which helps automate the creation of the security perimeter while building a framework around how data is being used internally. Using artificial intelligence, the SDI discovers, classifies, tags, and catalogs sensitive data that can be found scattered across all cloud environments and on-premises.

Once the company has a unified view of that data, Securiti’s tools can help do everything from monitoring to protection and remediation.

In addition to catching Cisco’s eye, Securiti has landed on CB Insights’ list of the 100 most innovative AI companies. The company has also raised $81 million in venture capital to date.

