Background:
The diagnosis of a malignancy is typically informed by clinical presentation and tumor tissue features, including cell morphology, immunohistochemistry, cytogenetics, and molecular markers. However, in approximately 5-10% of cancers [1,2], ambiguity is high enough that no tissue of origin can be determined and the specimen is labeled as a Cancer of Occult/Unknown Primary (CUP). The lack of a reliable tumor classification poses a significant treatment dilemma for the oncologist and can lead to inappropriate and/or delayed treatment. Gene expression profiling has been used to try to identify the tumor type for CUP patients, but it suffers from a number of inherent limitations. Specifically, tumor percentage, variation in expression, and the dynamic nature of RNA all contribute to suboptimal performance. For example, one commercial RNA-based assay had a sensitivity of 83% in a test set of 187 tumors and confirmed results on only 78% of a separate 300-sample validation set [3].
Methods:
Next-generation sequencing (NGS) data from 55,780 tumor patients were used to construct a multi-parameter, tumor-type-specific classification system using an advanced machine learning approach.
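As an illustration only, the following is a minimal sketch of how a tumor type classifier's predictions on a held-out test set could be scored by per-tumor-type sensitivity and overall accuracy. It is not the classification system described here. The function names, tumor type labels, and toy data are hypothetical and were chosen purely for the example.

```python
from collections import Counter


def per_class_sensitivity(true_labels, predicted_labels):
    """Return {tumor_type: fraction of its cases predicted correctly}."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    # Number of cases of each true tumor type.
    totals = Counter(true_labels)
    # Number of cases of each tumor type that the classifier got right.
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


def overall_accuracy(true_labels, predicted_labels):
    """Return the fraction of all cases predicted correctly."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return hits / len(true_labels)


# Hypothetical toy data: known tumor types vs. classifier calls.
true_types = ["colon", "colon", "breast", "breast", "breast", "lung"]
predicted_types = ["colon", "lung", "breast", "breast", "colon", "lung"]

sensitivity = per_class_sensitivity(true_types, predicted_types)
accuracy = overall_accuracy(true_types, predicted_types)
```

Reporting sensitivity per tumor type alongside overall accuracy helps show whether performance is driven by a few common tumor types or holds across rarer ones.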
Conclusions:
- Final performance of DNA-based tumor type identification on an independent test set of more than 15,000 patient samples is superior to current standards based on gene expression methods.
- Unbiased training of machine learning techniques on more than 45,000 patient samples enabled detection of tumor types independent of sampling location or tumor percentage.
- Tumor type predictors can render a histologic diagnosis for CUP cases, which can inform treatment and potentially improve outcomes.
- Cancer of unknown primary remains a substantial problem for both clinicians and patients; diagnosis can be aided by the algorithms presented here.
- Returning both diagnostic and therapeutic information that optimizes a patient's treatment strategy from a single test is a substantial improvement over the current standard of multiple tests that require more tissue.