Project 1: Automatic GTV Segmentation from IGTV in planning CT
Aim
Automatically generate clinically usable GTVs on the planning CT or a selected respiratory-phase CT, given an existing IGTV. Combine conventional 3D segmentation methods with segmentation foundation models (e.g., SAM2 / MedSAM2) and explore weakly/semi-supervised strategies to reduce annotation needs. Produce end-to-end results and provide basic uncertainty indicators for human review.
Meaning
IGTV is commonly available in clinical practice, while per-phase GTV annotations are scarce. Using the IGTV as a constraint to infer single-phase GTVs is therefore reasonable and can improve workflow efficiency and consistency. The prompting and fine-tuning capabilities of foundation models can substantially lower labeling cost; combined with weakly/semi-supervised methods, practical accuracy can be achieved with few annotations. The outcome can support routine planning and adaptive workflows and reduce repetitive manual contouring.
Resources
- Data: An existing Johns Hopkins set (60 cases: planning CT / 4D CT + IGTV), supplemented with public datasets to improve generalization.
- Concept: Model the task as "predict the GTV on a CT within the IGTV region." Use a hybrid strategy: a 3D segmentation baseline, prompt-based or lightly fine-tuned foundation models, and weakly/semi-supervised training to expand supervision.
- Baseline model: 3D U-Net / nnU-Net variant taking CT + IGTV mask as input; incorporate IGTV-based constraints during training/postprocessing.
- Foundation models: Apply SAM2 / MedSAM2 (and similar models) in prompt-as-mask mode and/or with light fine-tuning/adapters; include a simple 3D fusion step to preserve inter-slice consistency.
- Weakly/semi-supervised methods: Pseudo-label self-training, mean-teacher / consistency training, and hybrid supervision that treats the IGTV as an upper-bound spatial constraint (the GTV at any phase lies within the IGTV).
- Postprocessing & QC: Connected-component filtering, volume thresholds, uncertainty flags for clinician review; support export to DICOM-RTSTRUCT.
- Evaluation: Quantify Dice, 95th-percentile Hausdorff distance (HD95), relative volume error, centroid shift, and relevant dosimetric metrics; compare the baseline, foundation-model variants, and semi-supervised approaches.
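The IGTV upper-bound constraint mentioned above can enter the pipeline in two places: as a hard spatial mask at inference and as a soft penalty during training. A minimal NumPy sketch of both (function names are illustrative, not from an existing codebase):

```python
import numpy as np

def igtv_constrained_prediction(prob, igtv_mask, threshold=0.5):
    """Hard constraint at inference: a voxel outside the IGTV
    cannot belong to the predicted GTV."""
    return (prob >= threshold) & igtv_mask.astype(bool)

def outside_igtv_penalty(prob, igtv_mask):
    """Soft training penalty: mean predicted GTV probability
    assigned to voxels outside the IGTV (should be driven to 0)."""
    outside = ~igtv_mask.astype(bool)
    if outside.sum() == 0:
        return 0.0
    return float(prob[outside].mean())
```

The penalty can be added to the segmentation loss with a small weight so the network learns the constraint rather than relying only on postprocessing.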
Interested? Contact [email protected]
Project 2: A vision-language model approach for lung nodule detection in planning CT
Aim
Build an assistance system that takes planning CT + EHR (radiation therapy records, radiology reports, pathology, clinical summaries, etc.) as input, outputs candidate lesions (heatmap / bounding box / center points), and uses the EHR to provide confidence/prioritization cues for rapid review and downstream contouring by radiation therapists and physicists.
Meaning
- Clinical value: Use EHR priors (side, lobe, history) to reduce misses (e.g., a “previous right upper lobe lesion” raises the priority of RUL candidates), shorten review and contouring time, and lower the risk of target misses in radiation planning.
- Scientific value: Evaluate image–text fusion benefits in low-sample / weak-label medical settings and investigate synthetic report generation for cross-modal pretraining. Provide a foundation for downstream lesion characterization (benign vs malignant, surveillance recommendation).
- Engineering & deployment value: Modular design (candidate generator + textual re-ranker) eases PACS/TPS integration and deployment. Produce auditable outputs and evidence snippets for clinical traceability and regulatory review.
Resources
Targets:
- Recall-first behavior: on the FROC curve, noticeably outperforms an image-only baseline at common FP/scan operating points (e.g., recall improvement ≥ 5–10% at ~1–2 FP/scan).
- Localization accuracy: center distance error within clinically acceptable range (e.g., mean center error < 5 mm, or IoU ≥ 0.3 for small nodules).
- Usability: outputs include explainable evidence (candidate heatmap + corresponding EHR snippets) and allow threshold tuning to control the number of candidates.
- Design priorities: maximize recall → interpretability → easy integration (modular interfaces, minimal disruption of existing workflow).
Data sources:
- Local: ~200 planning CTs from Hopkins (or similar), with associated EHR: RT-planning notes, radiology reports, discharge summaries, prior imaging descriptions, pathology/surgery records.
- Public: LUNA16 / LUNA25 / other public CT nodule sets for visual pre-training and candidate generator tuning.
Pre/Annotation:
- Image preprocessing: Resample to a consistent spacing (e.g., 1 mm isotropic or the clinical standard), clip to a HU window (e.g., [-1000, 400] HU), and normalize intensities. (Optional: lung segmentation to reduce search space and noise in thoracic cases.)
- Annotation normalization: Standardize annotations as center+radius or bbox (x,y,z, size) and include confidence/malignancy scores when available.
- De-identification & NLP extraction: Remove names, IDs, and absolute dates; extract key fields with rule-based and model-based NLP: laterality, anatomical location (lobe/segment), timing, prior surgeries/radiation, and pathology results; retain the original sentences as short evidence snippets.
- For public image sets, generate synthetic short reports (templated by location, size, and suspicion) to enable cross-modal pretraining.
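The synthetic-report step above can be as simple as filling a template from the public set's structured annotations (e.g., LUNA-style center + diameter). A hedged sketch, with hypothetical field names:

```python
def synthesize_report(laterality, lobe, diameter_mm, suspicion):
    """Generate a short templated pseudo-report from structured
    nodule annotations, for use as weak text supervision in
    cross-modal pretraining."""
    size_desc = "sub-centimeter" if diameter_mm < 10 else f"{diameter_mm:.0f} mm"
    return (f"A {size_desc} nodule is noted in the {laterality} {lobe} lobe. "
            f"Suspicion level: {suspicion}.")
```

Varying the templates (and randomizing phrasing) helps keep the text encoder from overfitting to a single sentence pattern.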
Model architecture (modular):
- Visual candidate generator (recall-oriented): 3D heatmap/detection network (e.g., nnU-Net, nnDetection, 3D RetinaNet, 3D U-Net heatmap) producing N candidate RoIs per scan with high sensitivity.
- Text encoder: Pretrained clinical transformer (ClinicalBERT, PubMedBERT, Bio+SBERT) to convert EHR snippets into vectors.
- Fusion / re-ranking module (lightweight preferred initially): Extract visual features per RoI (RoI pooling or small CNN/MLP), concatenate with text embedding, and pass through an MLP or shallow cross-attention layer to produce a re-score for each candidate.
- Optional stronger approach if data allows: cross-modal transformer or CLIP-style contrastive alignment for tighter semantic alignment.
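The lightweight fusion option above (per-RoI visual feature concatenated with the text embedding, then an MLP re-score) fits in a few lines. A NumPy sketch with random placeholder weights standing in for trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_rescore(roi_feats, text_emb, w1, b1, w2, b2):
    """Concatenate each RoI's visual feature with the (shared) EHR
    text embedding and pass through a 2-layer MLP to produce one
    re-score per candidate."""
    n = roi_feats.shape[0]
    text_tiled = np.tile(text_emb, (n, 1))   # same EHR vector for every RoI
    x = np.concatenate([roi_feats, text_tiled], axis=1)
    h = np.maximum(0.0, x @ w1 + b1)         # ReLU hidden layer
    return (h @ w2 + b2).ravel()             # one scalar score per RoI

# Illustrative shapes: 5 candidate RoIs, 32-d visual features, 16-d text embedding.
roi_feats = rng.normal(size=(5, 32))
text_emb = rng.normal(size=(16,))
w1 = rng.normal(size=(48, 8)); b1 = np.zeros(8)
w2 = rng.normal(size=(8, 1));  b2 = np.zeros(1)
scores = mlp_rescore(roi_feats, text_emb, w1, b1, w2, b2)
```

In practice the same structure would be implemented in the training framework of the detector; only the fusion head is new.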
Outputs:
- Ranked candidate list (coordinates, size, re-score), corresponding EHR evidence snippet(s), and visualization heatmap for review.
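One possible shape for the ranked output above, written as plain Python records (field names are illustrative, not a fixed schema):

```python
def make_candidate(center_mm, size_mm, rescore, evidence):
    """One reviewable output record: location, size, fused score,
    and the EHR snippet(s) that influenced the score."""
    return {
        "center_mm": center_mm,   # (x, y, z) in patient coordinates
        "size_mm": size_mm,
        "rescore": rescore,
        "evidence": evidence,     # list of EHR evidence snippet strings
    }

candidates = sorted(
    [make_candidate((12.0, -40.5, 88.0), 9.5, 0.91,
                    ["... previously noted RUL nodule ..."]),
     make_candidate((-30.2, 10.0, 120.0), 6.0, 0.47, [])],
    key=lambda c: c["rescore"], reverse=True)
```

Keeping the evidence snippets attached to each record is what makes the ranking auditable for clinical review.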
Training:
- Phase A — Visual pretraining: Train the high-recall detector on public datasets (LUNA) with losses such as focal/BCE for heatmaps, L1 for center regression, IoU loss for box refinement. Use augmentations (translation, rotation, intensity perturbation) and hard negative mining for robustness.
- Phase B — Cross-modal pre-warming (optional): Use public CTs + synthetic reports for CLIP-style contrastive pretraining to align RoI visual embeddings with text embeddings.
- Phase C — Fine-tuning with local EHR (critical): Freeze most of the pretrained visual backbone; train the RoI feature head and the fusion/re-ranking layers to avoid overfitting. Use ranking losses (e.g., pairwise ranking loss) or a cross-entropy re-scoring target so that RoIs truly matching the EHR receive higher scores. Include hard negatives (other RoIs in the same scan that do not match the text) to improve discrimination.
- Phase D — End-to-end fine-tuning (if sufficient data): If local aligned image-text pairs become plentiful, consider deeper fine-tuning of cross-modal transformers or end-to-end training.
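The Phase C ranking objective can be written as a standard margin-based pairwise loss over (EHR-matching, hard-negative) RoI pairs from the same scan. A NumPy sketch of one common form:

```python
import numpy as np

def pairwise_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Hinge-style ranking loss: every EHR-matching RoI score should
    exceed every non-matching RoI score from the same scan by at
    least `margin`; violations are penalized linearly."""
    diffs = pos_scores[:, None] - neg_scores[None, :]   # all pos/neg pairs
    return float(np.maximum(0.0, margin - diffs).mean())
```

This is equivalent in spirit to a margin ranking loss in common deep-learning frameworks; the scan-level pairing is what injects the hard negatives.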
Evaluation & Studies:
- Technical metrics: FROC (recall vs FP/scan), recall@1/2 FP, AP, mean center distance (mm), IoU distribution.
- Clinical/user studies: Measure time saved in contouring, rate of treatment-plan changes prompted by candidates, small reader studies comparing assisted vs unassisted reading for accuracy and efficiency.
- Ablations: Compare image-only, image+synthetic text, image+real EHR; test different fusion methods and sensitivity to candidate-threshold settings.
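The FROC operating points above (recall at ~1–2 FP/scan) can be read off a threshold sweep over per-candidate scores, once candidates have been matched to ground truth as hits or misses. A simplified single-budget sketch (it does not deduplicate multiple hits on the same nodule, which a full FROC implementation would):

```python
import numpy as np

def recall_at_fp_per_scan(scores, is_hit, n_truths, n_scans, fp_budget):
    """Sweep score thresholds from high to low; return the best recall
    whose false-positive count stays within fp_budget * n_scans.
    `is_hit[i]` marks whether candidate i matches a ground-truth nodule."""
    order = np.argsort(-np.asarray(scores))
    hits = np.asarray(is_hit, dtype=bool)[order]
    tp = np.cumsum(hits)
    fp = np.cumsum(~hits)
    ok = fp <= fp_budget * n_scans
    if not ok.any():
        return 0.0
    return float(tp[ok].max() / n_truths)
```

Comparing this value between the image-only baseline and the EHR re-ranked list is exactly the recall-first target listed earlier.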
Deployment:
- Deployment design: Modular: the candidate generator runs on the image server, and the fusion/re-ranker can be an optional service. Expose configurable thresholds and maximum candidate counts; integrate via PACS/TPS plugins or a web UI.
Interested? Contact [email protected]
Project 3: Exploring Frequency-Guided Diffusion Model for Medical Imaging Translation
Aim
Reproduce and adapt Frequency-Guided Diffusion Models (FGDM) for CBCT→CT / MRI→CT translation; prototype extensions (band decomposition, adaptive weighting, learnable filters, frequency-domain losses) and evaluate on medical datasets.
Meaning
- Frequency priors (edges / FFT bands) help preserve structure and reduce artifacts—crucial for medical image translation where anatomical fidelity matters.
- Dynamic or learnable frequency guidance may better balance global structure vs fine detail across diffusion steps.
Resources
Codebases:
Datasets:
- SynthRAD2023: https://synthrad2023.grand-challenge.org/
- SynthRAD2025: https://synthrad2025.grand-challenge.org/
Methods:
- Frequency Band Decomposition: Split images into low/mid/high bands using filters such as wavelets and contourlets and guide the diffusion model with selected bands.
- Dynamic Frequency Guidance: Adjust frequency emphasis across diffusion steps (early, high-noise steps → low frequencies for global structure; late steps → high frequencies for detail refinement).
- Learnable Frequency Filters: Replace fixed Sobel/FFT with trainable convolutional filters to capture task-specific frequencies.
- Frequency Loss Functions: Add explicit losses in the frequency domain (FFT-MSE, band-wise SSIM) to enforce spectral fidelity.
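Both the band decomposition and the frequency-domain loss above can be prototyped with a plain FFT before moving to wavelets/contourlets. A NumPy sketch (the radial cutoff is a free parameter, and the hard mask is a stand-in for proper wavelet bands):

```python
import numpy as np

def low_high_split(img, cutoff):
    """Split a 2D image into low/high-frequency components using a
    hard radial mask in shifted FFT space."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)
    low_mask = r <= cutoff
    low = np.fft.ifft2(np.fft.ifftshift(f * low_mask)).real
    high = img - low          # residual: the complementary high band
    return low, high

def fft_mse(a, b):
    """Frequency-domain MSE between two images (complex spectra)."""
    return float(np.mean(np.abs(np.fft.fft2(a) - np.fft.fft2(b)) ** 2))
```

The low band would guide early diffusion steps and the high band late refinement; `fft_mse` (or a band-wise variant) can be added to the training loss.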
Study Guide:
- Reproduce FGDM → adapt to CBCT→CT / MRI→CT → test extensions on medical datasets; measure structural fidelity and artifact suppression.
References:
- [1] Li, Y., Shao, H.-C., Liang, X., Chen, L., Li, R., Jiang, S., Wang, J., & Zhang, Y. (2024). Zero-Shot Medical Image Translation via Frequency-Guided Diffusion Models. *IEEE Transactions on Medical Imaging*, *43*(3), 980–993. https://doi.org/10.1109/TMI.2023.3325703
- [2] Zhang, Y., Li, L., Wang, J., Yang, X., Zhou, H., He, J., Xie, Y., Jiang, Y., Sun, W., Zhang, X., Zhou, G., & Zhang, Z. (2025). Texture-preserving diffusion model for CBCT-to-CT synthesis. *Medical Image Analysis*, *99*, 103362. https://doi.org/10.1016/j.media.2024.103362
- [3] Chen, J., Ye, Z., Zhang, R., Li, H., Fang, B., Zhang, L., & Wang, W. (2025). Medical image translation with deep learning: Advances, datasets and perspectives. *Medical Image Analysis*, *103*, 103605. https://doi.org/10.1016/j.media.2025.103605
Interested? Contact [email protected]
