Providing insights into Emerging Market consumers’ digital usage patterns with Caribou Data

2.5m+

usage pattern log files processed

100GB

of usage pattern log data processed

20k+

panellists with 20k+ mobile devices registered

5+

emerging market countries

Overview

Caribou Data, a subsidiary of Caribou Digital, is dedicated to providing ethical data solutions and insights into consumer digital behavior in emerging markets.

Faced with the challenge of scaling their analytics platform while maintaining strict user privacy, Caribou Data partnered with Lambert Labs to leverage AWS. By migrating from a single Amazon EC2 instance to a robust AWS architecture, Caribou Data implemented a scalable and reliable platform with a secure, event-driven ETL pipeline. This solution empowers Caribou Data to deliver valuable insights to telecommunications providers and other organizations, while upholding their commitment to user privacy and ethical data handling.


Lambert Labs is essential to our business…Highly recommended
Will Croft, Co-Founder, Caribou Data

Opportunity / Customer Challenge

Caribou Data is dedicated to delivering insightful digital product analytics while upholding the highest standards of user privacy and anonymity. Their initial infrastructure, based on a single EC2 instance, was becoming a bottleneck as their data volumes and user traffic increased.

The primary challenge was to migrate to a scalable and reliable AWS environment. This involved building a robust ETL pipeline capable of processing consumer mobile app data, with stringent anonymization and privacy measures embedded throughout. Additionally, Caribou Data needed to provide data scientists with effective data visualization tools to extract actionable insights into user behavior, all while ensuring data privacy.

Ultimately, their opportunity was to demonstrate that powerful analytics and uncompromising user privacy could coexist, setting a new benchmark for ethical data handling within the industry.

Solution

To address Caribou Data’s challenges and enable their ethical data analytics platform, a comprehensive solution was implemented leveraging a diverse range of AWS services. The core of the solution involved migrating from a single EC2 instance to a robust, service-oriented architecture, ensuring scalability, reliability, and security.

The foundation of the solution was a scalable and secure web platform built on AWS Elastic Beanstalk, which handles deployment and management of the Django applications and provides consistent performance with automatic scaling. Docker container images are stored and managed in Amazon Elastic Container Registry, which streamlines the deployment process. Amazon RDS for PostgreSQL provides a managed, high-performance database that supports data integrity and availability. Amazon ElastiCache caches frequently requested data to improve performance further.
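As an illustration of how a Django application on this kind of platform might be wired to its database and cache, here is a minimal sketch. It is not Caribou Data's actual configuration. It assumes the connection details arrive as environment variables: the `RDS_*` names follow the convention Elastic Beanstalk uses for a database attached to an environment, while `CACHE_HOST`, `CACHE_PORT`, the choice of a Redis engine for ElastiCache, and the helper `build_django_settings` are all hypothetical.

```python
import os


def build_django_settings(env=os.environ):
    """Return (DATABASES, CACHES) dicts for a Django settings module.

    Assumption: connection details are supplied through environment
    variables. CACHE_HOST and CACHE_PORT are hypothetical names.
    """
    databases = {
        "default": {
            # Django's built-in PostgreSQL backend.
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env["RDS_DB_NAME"],
            "USER": env["RDS_USERNAME"],
            "PASSWORD": env["RDS_PASSWORD"],
            "HOST": env["RDS_HOSTNAME"],
            "PORT": env.get("RDS_PORT", "5432"),
        }
    }
    cache_host = env["CACHE_HOST"]
    cache_port = env.get("CACHE_PORT", "6379")
    caches = {
        "default": {
            # Redis cache backend shipped with Django 4.0 and later.
            # Assumption: the ElastiCache cluster runs Redis.
            "BACKEND": "django.core.cache.backends.redis.RedisCache",
            "LOCATION": f"redis://{cache_host}:{cache_port}",
        }
    }
    return databases, caches
```

In a settings module this could be used as `DATABASES, CACHES = build_django_settings()`, which keeps credentials out of the container image stored in the registry.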

A critical component of the solution was an event-driven ETL pipeline, designed to process consumer data while maintaining strict privacy standards. As shown in the diagram, data originates from an Android app and is stored in Amazon S3. Each upload triggers an AWS Lambda function for asset verification. Subsequent Lambda functions process and transform the data using Python and the Pandas library, and the results are stored in further S3 buckets. For data analysis, developers securely access the data through an OpenVPN connection, run scripts, and load results into Amazon Redshift for data warehousing. The architecture also allows developers to run specific calculations via dedicated Lambda functions.
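The first step of the pipeline, an S3 upload triggering a verification Lambda, can be sketched as follows. This is an illustrative handler, not the production code: the event layout is the standard S3 event notification format, but the verification rules (non-empty object, allowed file suffix), the names `ALLOWED_SUFFIXES`, `extract_objects`, and `is_valid_upload`, and the shape of the return value are all assumptions.

```python
from urllib.parse import unquote_plus

# Hypothetical rule: only these file types count as valid uploads.
ALLOWED_SUFFIXES = (".json", ".csv")


def extract_objects(event):
    """Yield (bucket, key, size) for each object in an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3["object"]["key"])
        yield s3["bucket"]["name"], key, s3["object"].get("size", 0)


def is_valid_upload(key, size):
    """Assumed verification: the object is non-empty and has an allowed suffix."""
    return size > 0 and key.lower().endswith(ALLOWED_SUFFIXES)


def handler(event, context):
    """Lambda entry point: sort uploaded objects into accepted and rejected."""
    accepted, rejected = [], []
    for bucket, key, size in extract_objects(event):
        target = accepted if is_valid_upload(key, size) else rejected
        target.append(f"s3://{bucket}/{key}")
    return {"accepted": accepted, "rejected": rejected}
```

A downstream transformation function would typically read each accepted object, process it (for example with Pandas), and write the output to a separate bucket, which keeps each stage small and independently scalable.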

This event-driven, serverless approach provides Caribou Data with a scalable, secure, and efficient ETL pipeline, enabling them to deliver powerful analytics while upholding their commitment to user privacy.
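The case study does not describe how the privacy measures are implemented. Purely as an illustration of the kind of step such a pipeline might include, the sketch below replaces a device identifier with a keyed hash (HMAC-SHA256) and drops directly identifying fields. The field names `device_id` and `phone_number`, the function names, and the technique itself are assumptions rather than Caribou Data's method.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest of the identifier as a hex string."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record, secret_key, id_field="device_id",
                     drop_fields=("phone_number",)):
    """Return a copy of the record without drop_fields and with id_field hashed.

    The field names are hypothetical defaults for illustration only.
    """
    cleaned = {k: v for k, v in record.items() if k not in drop_fields}
    if id_field in cleaned:
        cleaned[id_field] = pseudonymize(str(cleaned[id_field]), secret_key)
    return cleaned
```

Keyed hashing is pseudonymization rather than full anonymization: the same device always maps to the same token, so usage patterns can still be linked over time, while the original identifier cannot be recovered without the secret key.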

Lambert Labs is essential to our business, having taken on responsibilities across our entire tech stack. In addition to the team being a pleasure to work with, they are incredibly diligent and knowledgeable, always offering insightful, astute solutions. Highly recommended. (Will Croft, Co-Founder, Caribou Data)

Outcome

The successful implementation of the AWS-based solution delivered significant positive outcomes for Caribou Data, enabling them to realize their vision of ethical and impactful data analytics.

Caribou Data’s platform has garnered significant interest from major telecommunications providers in Kenya, who are leveraging the generated analytics and insights to enhance their services and understand customer behavior. This real-world application of the platform demonstrates its effectiveness in providing valuable data-driven intelligence.

Furthermore, the Gates Foundation recognized the potential of Caribou Data’s work and provided funding, highlighting the importance and impact of their ethical approach to data collection and analysis in emerging markets. In essence, the AWS infrastructure has empowered Caribou Data to deliver on its core mission: providing valuable analytics while upholding the highest standards of user privacy, leading to tangible results and recognition from key industry players and philanthropic organizations.

As well as the usual challenges associated with web platforms and ETL pipelines, there was the additional goal of strictly preserving users’ privacy and anonymity. AWS’s flexibility proved essential for this. (George Lambert, Founder & CEO, Lambert Labs)

About Caribou Data

Caribou Data is a technology company focused on providing ethical data solutions and insights within the realm of digital product analytics. They are committed to empowering organizations with valuable data-driven intelligence while prioritizing user privacy and anonymity. Caribou Data works to provide insights into how consumers use digital products via their mobile apps. With a focus on responsible data handling, Caribou Data collaborates with various organizations to unlock the potential of data for positive impact.