Bright Data provides cutting-edge data collection solutions tailored for businesses looking to make informed decisions and gain a competitive edge in their industries. Using their solution, businesses can tap into the performance of retailers and brands across the eCommerce world to measure market shares and conversions, understand shoppers’ journeys, and discover the most influential product attributes.

To achieve this, Bright Data collects, enriches, and normalizes data from thousands of different sources, allowing their users to derive insights from the information. This is especially true for their “Bright Insights” product, which scrapes retail data from thousands of sites and transforms it into actionable market intelligence.

To maintain high-quality data across diverse and evolving data sources, Bright Data built a homegrown data quality monitoring system on top of Athena and QuickSight. While this system effectively detected known issues, such as missing data, the team recognized the need for a more robust approach.

To ensure data reliability as they deliver value to their customers, they required a solution that could intelligently identify potential problems, promptly isolate their root causes, and ensure that only meaningful alerts were raised.

Challenge: Ensuring data reliability in rapidly changing data sources

With thousands of unique data sources, Bright Data faces the challenge of ensuring that its data remains accurate and reliable in a rapidly changing digital environment. Unlike static data sources, the inputs Bright Data uses, ranging from retail websites to social media platforms, change frequently, potentially introducing new data quality issues that traditional rule-based checks fail to detect.

Bright Data’s primary concern is maintaining the highest quality for its customer-facing data products, such as the Bright Insights platform. Any discrepancies in the data could directly affect the decision-making process for Bright Data’s clients, who rely on precise, timely information to guide their strategies.

Bright Data needs to ensure data reliability both in terms of metadata (e.g., the volume of rows and items collected) and the content of the data itself, ensuring that unknown issues do not erode customer trust.
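To make the metadata side concrete, here is a minimal sketch of a row-volume check, assuming a simple z-score against recent history. The function name, the threshold, and the sample counts are all hypothetical and chosen for illustration; this is not Bright Data's or Upriver's actual implementation.

```python
import statistics

def is_volume_anomalous(history, today, z_threshold=3.0):
    """Return True if today's row count deviates strongly from recent history.

    history: recent daily row counts (hypothetical values), at least two entries.
    z_threshold: assumed cutoff; a real system would tune or learn this.
    """
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation
    if stdev == 0:
        # With a perfectly flat history, any change at all is unusual.
        return today != mean
    z_score = (today - mean) / stdev
    return abs(z_score) > z_threshold

# Hypothetical daily row counts for one data source.
recent_counts = [1000, 1020, 980, 1010, 990]

drop_detected = is_volume_anomalous(recent_counts, 400)     # sudden drop
normal_day_flagged = is_volume_anomalous(recent_counts, 1005)  # ordinary day
```

A check like this only covers volume. Issues in the content of the data, such as a field that silently stops being populated, need profiling of the values themselves.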

“We’ve added monitors to detect problems we saw in the past, but we are constantly facing new problems that we didn’t know to anticipate. I personally worry about the unknown unknowns which can creep into the data at any moment,” said Eran Dror, CTO of Bright Insights.

Solution: Data observability with Upriver

Because of the nature of their data, Eran and the team knew they needed an observability tool that could detect any possible issues quickly, at a very high granularity, and pinpoint the exact cause using intelligent root cause analysis capabilities.

“Upriver’s ability to create different profiles for the same data using ‘pivot fields’ was a real game changer,” said Eran. “It allowed us to detect issues at a very granular level for each of the sources we collect, even though they are all collected into the same place.”
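The idea of profiling the same dataset separately per pivot field can be illustrated with a small sketch. It assumes records from many sources land in one collection and computes a missing-value rate per source. The field names, the threshold, and the sample data are hypothetical, and the code does not represent Upriver's API or internals.

```python
from collections import defaultdict

def null_rate_by_pivot(records, pivot_field, value_field):
    """Return {pivot value: fraction of records where value_field is missing}."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for record in records:
        key = record[pivot_field]
        totals[key] += 1
        if record.get(value_field) is None:
            missing[key] += 1
    return {key: missing[key] / totals[key] for key in totals}

def flag_pivots(rates, threshold):
    """Return the pivot values whose missing-value rate exceeds the threshold."""
    return sorted(key for key, rate in rates.items() if rate > threshold)

# Hypothetical records from two sources collected into the same place.
records = (
    [{"source": "shop_a", "price": 10.0} for _ in range(90)]
    + [{"source": "shop_b", "price": 12.5} for _ in range(5)]
    + [{"source": "shop_b", "price": None} for _ in range(5)]
)

# Aggregate view: the problem is diluted across all sources.
overall_rate = sum(1 for r in records if r.get("price") is None) / len(records)

# Per-source view: the same data, profiled separately for each source.
per_source_rates = null_rate_by_pivot(records, "source", "price")
flagged = flag_pivots(per_source_rates, threshold=0.1)
```

In this made-up example the overall missing-price rate is 5%, below the assumed 10% threshold, while the per-source profile shows that half of the records from one source lack a price.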

Beyond just detecting issues, Upriver’s AI-driven root cause analysis functionality empowered Bright Data’s team to rapidly identify the source and nature of any incident. This reduced the time spent on triage and resolution, letting the team address the core problem immediately.

Just as importantly, Upriver’s very low false positive rate ensured that only relevant issues were surfaced to the team. This helped Bright Data avoid alert fatigue, allowing the engineers to stay focused on resolving genuine issues and to keep delivering value.

Outcome: Sharp decline in data issues found by customers

Bright Data was able to immediately derive value from Upriver’s platform, benefiting from deep-level profiling and accurately flagged incidents that genuinely required attention.

“Before Upriver, we only had monitors for very basic metrics, but we knew they could be hiding a lot of issues,” said Eran. “With Upriver’s root cause analysis, we can quickly find exactly where the problem lies and resolve it faster than ever before. The low false positive rate means we’re only working on real problems, which has saved us time and improved our overall data quality.”

By automatically detecting and diagnosing new issues from the moment a new data source was added, Upriver enabled Bright Data to confidently deliver more value to its customers. The result has been a sharp decline in customer-discovered data issues, improved data integrity, and streamlined incident resolution.

“In just one hour, Upriver was deployed in our production environment, and since then I’ve trusted my data completely,” said Eran.

Today, Upriver monitors hundreds of data sources collected by Bright Data and has become an integral part of their system. With accurate detection and swift root cause identification, Upriver ensures that Bright Data continues to provide exceptional value to its customers.