ADAS/AD Data Processing Pipeline — Constructing internally or acquiring a pre-assembled solution

By IVEX.ai

4 min read · Mar 8, 2024

The shift towards software-defined vehicles has led to a massive increase in the amount of data generated by vehicles. This data encompasses a wide range of information, including sensor data, telemetry, diagnostics, and more. Effectively processing and gaining insights from this data during the development process is crucial.

Deciding whether to build and maintain a data processing pipeline in-house is a crucial consideration for OEMs (Original Equipment Manufacturers) and automotive Tier 1 suppliers. The data processing pipeline is a fundamental component in handling the massive amounts of data generated by modern software-defined vehicles.

Building a robust data processing pipeline for Advanced Driver Assistance Systems (ADAS) and Autonomous Driving (AD) data on a cloud platform like AWS requires a comprehensive understanding of various services and their integration. The specific services you might use can depend on the requirements, data sources, and processing needs of your application.

At IVEX, we developed a data processing pipeline specifically for handling ADAS/AD data. The primary objective of this pipeline is to autonomously identify noteworthy events and scenarios within the input data, which consists solely of raw sensor recordings. Constructing such a pipeline entails careful consideration of several technical questions:

- How is raw sensor data stored in the cloud?
- How are algorithms executed on that raw data, including machine learning algorithms with specific requirements such as GPU access?
- How is post-processed data, such as events and scenarios, stored?
- How are the versions of the algorithms employed tracked?
- How are results visualized?
- How is data visibility restricted to authorized users?

Our data processing pipeline relies on several AWS services for seamless functionality. The following AWS services have been strategically employed:

- Data from raw sensors, including lidar point clouds, camera images, and GNSS information, is stored in the S3 storage service. S3 serves as a staging ground for recorded data, providing extended storage for post-processed data and a short-term, cost-effective storage solution during processing. Additionally, S3 functions as our primary “processing volume” via Mountpoint for Amazon S3, which lets us use a bucket as a file system. Mountpoint is not fully POSIX-compliant, which limits certain workloads; we address this by incorporating EFS, with the option of adding FSx where full compatibility is needed.
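To make this concrete, here is a minimal sketch of one way raw recordings could be organized in S3. The key layout, bucket name, and helper function are hypothetical illustrations, not IVEX's actual implementation; a consistent prefix layout is what makes a bucket practical to mount as a file system and to list per vehicle, drive, or sensor.

```python
from datetime import datetime, timezone

# Hypothetical prefix layout for raw sensor recordings in S3:
#   raw/<vehicle_id>/<drive_id>/<sensor>/<utc timestamp>.<ext>
def raw_sensor_key(vehicle_id: str, drive_id: str,
                   sensor: str, captured_at: datetime, ext: str) -> str:
    ts = captured_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    return f"raw/{vehicle_id}/{drive_id}/{sensor}/{ts}.{ext}"

key = raw_sensor_key("veh-042", "drive-2024-03-01-a", "lidar_front",
                     datetime(2024, 3, 1, 9, 30, 15, tzinfo=timezone.utc), "pcd")
# The upload itself would then be a single boto3 call, e.g.
# boto3.client("s3").upload_file(local_path, "my-raw-data-bucket", key)
print(key)
# → raw/veh-042/drive-2024-03-01-a/lidar_front/20240301T093015000000Z.pcd
```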

- The processed data, which includes significant events and scenarios, is stored in both RDS (Relational Database Service) and Amazon DocumentDB. RDS serves as an efficient repository for the tagged data needed for analytics. DocumentDB operates as a document store for rapidly changing data and the binary data needed for display purposes.
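As an illustration of this split, here is a hedged sketch of what a tagged event could look like as a document. All field names and values are hypothetical: the relational side would hold only the queryable columns (drive, event type, time window) for analytics, while the document keeps the rapidly changing detail needed for display.

```python
import json
from datetime import datetime, timezone

# Hypothetical shape of a tagged "event" document as it might be stored
# in a document store such as Amazon DocumentDB.
def make_event_doc(drive_id: str, event_type: str,
                   start: datetime, end: datetime,
                   parameters: dict) -> dict:
    return {
        "drive_id": drive_id,
        "event_type": event_type,
        "start_utc": start.astimezone(timezone.utc).isoformat(),
        "end_utc": end.astimezone(timezone.utc).isoformat(),
        "parameters": parameters,   # e.g. ego speed, time-to-collision
        "schema_version": 1,        # track format changes explicitly
    }

doc = make_event_doc(
    "drive-2024-03-01-a", "cut_in",
    datetime(2024, 3, 1, 9, 31, 0, tzinfo=timezone.utc),
    datetime(2024, 3, 1, 9, 31, 6, tzinfo=timezone.utc),
    {"ego_speed_mps": 22.4, "min_ttc_s": 1.8},
)
print(json.dumps(doc, indent=2))
```

The explicit `schema_version` field is one simple way to manage the data-format changes discussed later in the article.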

- EKS and EC2 handle both algorithm execution and visualization tasks. EKS hosts a range of services, including back-end, data, front-end, and processing services. EC2 is primarily used to provision nodes for EKS according to our scaling rules.

- Versioning of algorithms is managed through ECR, which is utilized for storing Docker container images.
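A small sketch of how image-based versioning ties into reproducibility: pinning each processing algorithm to an immutable image tag lets the pipeline record exactly which image produced which results. The account ID, region, and repository names below are placeholders, not IVEX's real values.

```python
# Hypothetical helper for composing an ECR image reference. A job
# running on EKS would reference this URI as its container image, and
# the tag (here a semantic version) would be stored alongside the
# processed results for traceability.
def ecr_image_uri(account_id: str, region: str, repo: str, version: str) -> str:
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{version}"

uri = ecr_image_uri("123456789012", "eu-west-1", "scenario-tagger", "1.4.2")
print(uri)
# → 123456789012.dkr.ecr.eu-west-1.amazonaws.com/scenario-tagger:1.4.2
```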

- Authentication is conducted using Cognito, with the flexibility to be replaced by any OpenID Connect (OIDC) solution if needed.

- Data transfer and temporary data storage are managed through EFS. EFS operates as our transient processing area: a space where any stage of the pipeline can deposit intermediate data and share it with other processes. It is chosen over S3-as-a-file-system for these workloads because it is fully POSIX-compliant.
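The intermediate-data handoff between stages can be sketched as follows. This is a minimal, hypothetical pattern (a local directory stands in for the EFS mount): writing to a temporary name and then renaming is atomic on POSIX file systems, so a consuming stage never sees a half-written file — one concrete benefit of EFS's POSIX compliance over an S3 mount.

```python
import json
import os
from pathlib import Path

# Publish an intermediate result to a shared POSIX file system (EFS).
# The mount path is a placeholder for the real EFS mount point.
def publish_intermediate(mount: Path, stage: str, name: str, payload: dict) -> Path:
    stage_dir = mount / stage
    stage_dir.mkdir(parents=True, exist_ok=True)
    tmp = stage_dir / f".{name}.tmp"
    final = stage_dir / name
    tmp.write_text(json.dumps(payload))
    os.replace(tmp, final)  # atomic rename on POSIX
    return final

out = publish_intermediate(Path("/tmp/efs-demo"), "detection",
                           "drive-2024-03-01-a.json", {"objects": 412})
print(out)
```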

This example highlights the involvement of numerous cloud services and underscores the need to address various technical complexities in building a robust ADAS/AD data processing pipeline. Additionally, challenges such as organizing input data, ensuring compatibility in data formats, and managing and monitoring changes in data formats must be addressed. For instance, as the ADAS/AD system evolves, the addition of more sensors and the necessity to manage different vehicle configurations become crucial considerations in the data processing pipeline. Without proper attention, these factors may lead to incorrect data processing results and misinterpretation of the outcomes.

As an indication, here is the breakdown of the estimated effort and cost to build such a data processing pipeline, which tags 12 types of driving scenarios, extracts driving parameters, and allows the visualization of large files (≥ 10 TB).

Resolving all these issues demands substantial effort: it can easily occupy a team of 18 engineers for more than two years.

Ultimately, it becomes evident that opting for a pre-built data processing pipeline incurs lower costs, both financially and in time. The time and cost saved can then be allocated to developing the crucial aspects of OEM and Tier 1 products.
