Portfolio - Breast Cancer QA Project

Research poster presented at BC Cancer Summit 2024: Quality Assurance of AI-based IMN Contours

Background

Breast cancer can spread to the internal mammary lymph nodes (IMNs), located near the center of the chest. When involvement is confirmed, IMN irradiation (IMNI) can improve outcomes; however, its use in low-risk patients remains controversial due to potential radiation exposure to the heart, lungs, and contralateral breast. This project contributes to a retrospective population study at BC Cancer investigating how primary tumor location and IMN radiation dose relate to survival outcomes. To support this research, a data pipeline is being developed to automate the import and segmentation of approximately 19,000 patient CT datasets (2005–2014) using Limbus Contour, a deep learning-based auto-segmentation software. My contributions include developing C# scripts for contour transfer and a QA system to evaluate the accuracy of AI-generated contours for future pipeline integration.

1.0 Project Objectives

Develop a QA pipeline to assess the quality of AI-generated IMN contours.
Establish a reliable ground-truth set for evaluation.
Support retrospective research into the effects of IMN irradiation on survival outcomes.

Driving Challenges:

Clinical IMN contours were inconsistent due to varying guidelines (RTOG vs ESTRO) and incomplete slices.

Inconsistencies in clinical IMN contouring: mislabeled or incomplete structures are circled.

AI-generated contours required systematic QA to identify errors before use in dosimetric studies.

Examples of a correctly contoured IMN structure (top) and a poor one (bottom) identified by the QA tool. In the sagittal view, the bottom contour is too long.

2.0 My Contributions

Developed contour transfer and QA scripts in C# using the Eclipse Scripting API.
Designed statistical filtering logic (Median ± 2SD, IQR, Min/Max) for quality flagging.
- Compared performance of different thresholds using a reviewed subset of 100 patient cases.
- Refined the algorithm to balance false positive and false negative rates.
Introduced normalized ratio metrics (IMN:Lung, IMN:Chest Wall) to reduce anatomical variability bias.
- Expressed IMN dimensions relative to reference organs so that the filters account for differences in patient size and are more reliable.
Contributed to the research poster and presentation at the BC Cancer Summit 2024.

3.0 Methodology

3.1 Implementation

In-house scripts were written in C# using the Eclipse Scripting API (ESAPI) to analyze contour quality. The QA tool flagged patients with contours outside of statistically defined ranges (median ± 2SD, Min/Max, IQR) or with structural faults such as missing slices.
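The flagging logic described above can be illustrated with a minimal sketch. It is written in Python for brevity, not in the C#/ESAPI code used in the project, and every function name and number here is hypothetical. It also assumes that the "IQR" filter accepts values between the first and third quartiles; the original tool may define that range differently (for example, with 1.5×IQR fences).

```python
import statistics


def reference_ranges(values):
    """Build candidate acceptance ranges from a reviewed reference set."""
    med = statistics.median(values)
    sd = statistics.stdev(values)  # sample standard deviation
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "median_2sd": (med - 2 * sd, med + 2 * sd),
        "iqr": (q1, q3),  # assumption: accept only values inside [Q1, Q3]
        "min_max": (min(values), max(values)),
    }


def flag_contour(value, ranges):
    """Return the names of every range that the value falls outside of."""
    return [name for name, (lo, hi) in ranges.items() if not lo <= value <= hi]


def missing_slices(slice_z_mm, spacing_mm, tol=1e-3):
    """Return (z_before, z_after) pairs where the gap exceeds one slice spacing."""
    zs = sorted(slice_z_mm)
    return [(a, b) for a, b in zip(zs, zs[1:]) if (b - a) > spacing_mm + tol]


# Hypothetical IMN z-lengths in mm (illustrative only, not study data).
reference = [100, 110, 120, 130, 140]
ranges = reference_ranges(reference)
print(flag_contour(200, ranges))                   # outside all three ranges
print(missing_slices([0.0, 2.5, 5.0, 10.0], 2.5))  # one gap, between 5.0 and 10.0
```

A stricter range such as the quartile-based one flags more contours than median ± 2SD or Min/Max, which is the trade-off between false positives and false negatives that the threshold comparison has to settle.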

QA tool workflow: pipeline for contour analysis and flagging.

3.2 Evaluation Approaches

Method A: Compared AI contours against clinical contours. This was unreliable due to inconsistent guidelines (RTOG vs ESTRO) and incomplete slices.
Method B: Compared AI contours to a reviewed subset of 100 Limbus-generated contours (RTOG guideline), providing a consistent baseline.

3.3 Technical Challenges

One of the main challenges was that filtering contours based purely on their absolute length along the z-axis could inadvertently exclude patients with atypical anatomy, such as very tall or very short individuals. To address this, we instead used ratios (such as IMN:Lung and IMN:Chest Wall) to normalize contour dimensions relative to each patient’s anatomy.

Comparison of IMN-to-reference-organ ratios: IMN:Chest Wall (top), IMN:Lung (bottom).
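As a rough illustration of the normalization idea, the sketch below divides the IMN's z-extent by the z-extent of a reference organ. It is a Python sketch under stated assumptions, not the project's ESAPI code: the helper names are hypothetical, and structure length is assumed to be the distance between the most superior and most inferior contoured slices.

```python
def z_extent(slice_z_mm):
    """Superior-inferior length of a structure from its slice positions (mm)."""
    return max(slice_z_mm) - min(slice_z_mm)


def imn_ratio(imn_z_mm, reference_z_mm):
    """Ratio of the IMN z-length to a reference organ's z-length."""
    ref_len = z_extent(reference_z_mm)
    if ref_len <= 0:
        raise ValueError("reference organ has no z-extent")
    return z_extent(imn_z_mm) / ref_len


# Hypothetical slice positions in mm (illustrative only, not study data).
tall_imn, tall_lung = [0, 60, 120], [0, 150, 300]
short_imn, short_lung = [0, 40, 80], [0, 100, 200]

print(z_extent(tall_imn), z_extent(short_imn))  # absolute lengths differ
print(imn_ratio(tall_imn, tall_lung), imn_ratio(short_imn, short_lung))  # ratios agree
```

In this toy case the two patients have IMN lengths of 120 mm and 80 mm, which an absolute-length filter could treat very differently, while both have the same IMN:Lung ratio.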

4.0 Key Results

Method	Contours Flagged (n=100)	Most Common Error
Method A (Clinical)	3	Missing slices
Method B (AI-reviewed subset)	32	Z-length errors
Clinician Review	47	Z-length errors

Comparison of baseline contour sources used for QA evaluation.

Comparison of filter sensitivity and specificity using absolute organ lengths vs. length ratios.
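For readers unfamiliar with the two measures, the sketch below shows one way sensitivity and specificity could be computed from QA flags and reviewer labels. It is a Python sketch with hypothetical names and toy labels, and it assumes the reviewer's judgment is treated as the reference standard; it does not reproduce the project's actual evaluation code or results.

```python
def sensitivity_specificity(flagged, truly_bad):
    """Compare QA flags against reviewer labels, one boolean pair per case.

    Sensitivity: fraction of reviewer-rejected contours that the filter flagged.
    Specificity: fraction of reviewer-accepted contours that the filter passed.
    """
    if len(flagged) != len(truly_bad):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(flagged, truly_bad))
    tp = sum(1 for f, b in pairs if f and b)          # flagged, truly bad
    fn = sum(1 for f, b in pairs if not f and b)      # missed bad contour
    fp = sum(1 for f, b in pairs if f and not b)      # false alarm
    tn = sum(1 for f, b in pairs if not f and not b)  # correctly passed
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy labels for five cases (illustrative only, not study data).
flags = [True, True, False, False, True]
review = [True, False, False, True, True]
print(sensitivity_specificity(flags, review))
```

Running the same function once with flags from an absolute-length filter and once with flags from a ratio-based filter gives the kind of side-by-side comparison the caption above describes.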

5.0 Findings & Discussion

Clinical contours were too inconsistent to serve as a reliable ground truth; the reviewed subset of Limbus contours was a more effective baseline for QA. The most common failure mode was incorrect contour length along the z-axis. Absolute length metrics introduced patient-size bias, making normalized ratios a better approach.

6.0 Tools & Technologies

C#, Eclipse Scripting API (ESAPI), Medical Imaging, Deep Learning, QA, Data Normalization

7.0 Next Steps

Expand the dataset used for defining clinically acceptable ranges.
Incorporate anatomical context (e.g., relative placement to chest wall or lungs).
Integrate QA directly into the pipeline for real-time flagging of errors.
Validate the tool across larger and independent patient cohorts.

8.0 Acknowledgments

This project was supported by the BC Cancer Foundation’s Sprakkar Award and a research agreement with Limbus AI/Radformation. Collaborators include Amy Frederick (PhD), Alanah Bergman (PhD), Tania Karan (MSc), and Alan Nichol (MD).

9.0 Reflection

This project taught me how to bridge machine learning with clinical workflows, emphasizing the importance of quality assurance in large-scale AI studies. It also deepened my understanding of radiotherapy planning and the practical challenges of integrating AI into healthcare.