How to calculate Copy Number Variation

Copy Number Variation (CNV) analysis estimates the variation in the copy number of genomic segments, and is especially important in cancer and other proliferative disorders. In cancer patients, oncogenes may be amplified to high copy numbers. For example, in breast cancer patients whose tumor is classified as human epidermal growth factor receptor 2 (HER2) positive, the amplification status of HER2 is critical for determining which therapy is indicated.

Given a digital PCR experiment, a well containing the partitioned sample of interest, and a target gene to quantify with respect to a reference gene, the calculation of CNV is based on the following variables:

  • \(N\): total number of analyzable partitions in the well
  • \(p_1\) : number of positive partitions for the target gene
  • \(p_2\) : number of positive partitions for the reference gene

The CNV is simply given by the ratio of the estimated concentration of the target gene to the estimated concentration of the reference gene. Both of these concentrations are estimated by applying the Poisson law as explained in the item Poisson Law Computation.

The simplified formula for the estimated concentration ratio is:

\(R = \dfrac{\ln \left( 1 - \dfrac{p_1}{N} \right)}{\ln \left( 1 - \dfrac{p_2}{N} \right)} \)
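As a minimal sketch, the ratio can be computed directly from the three counts. The function names and the example counts below are hypothetical and chosen only for illustration. The partition volume does not appear because both genes are measured in the same well, so it cancels in the ratio.

```python
import math


def poisson_lambda(positives: int, total: int) -> float:
    """Mean number of copies per partition from the Poisson law: -ln(1 - p/N)."""
    if not 0 < positives < total:
        # p = 0 gives a zero estimate and p = N gives an infinite one,
        # so the ratio is undefined in both cases.
        raise ValueError("need 0 < positives < total")
    return -math.log(1.0 - positives / total)


def cnv_ratio(n: int, p1: int, p2: int) -> float:
    """Estimated concentration ratio R of target (p1) to reference (p2)."""
    return poisson_lambda(p1, n) / poisson_lambda(p2, n)


if __name__ == "__main__":
    # Hypothetical well: 20000 analyzable partitions,
    # 8000 positive for the target and 5000 positive for the reference.
    print(cnv_ratio(20000, 8000, 5000))
```

Note that the ratio of the raw positive counts (8000 / 5000 = 1.6 in this hypothetical well) differs from \(R\), because the Poisson law corrects for partitions that contain more than one copy.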

The confidence interval of the estimated concentration ratio is more difficult to determine. One approximation method is provided by Whale et al. [PMID:22373922]. It assumes that, if \(N\) is large enough, the concentration ratio follows a Normal distribution with standard score \(z_c\), and it uses first-order error propagation:

\(R_{min} = R \ e^{- \dfrac{z_c}{N} \sqrt{ \dfrac{p_1}{\left(1 - \dfrac{p_1}{N} \right) \left( \ln \left(1 - \dfrac{p_1}{N} \right) \right)^2} + \dfrac{p_2}{\left(1 - \dfrac{p_2}{N} \right) \left( \ln \left(1 - \dfrac{p_2}{N} \right) \right)^2} } } \)

\(R_{max} = R \ e^{+ \dfrac{z_c}{N} \sqrt{ \dfrac{p_1}{\left(1 - \dfrac{p_1}{N} \right) \left( \ln \left(1 - \dfrac{p_1}{N} \right) \right)^2} + \dfrac{p_2}{\left(1 - \dfrac{p_2}{N} \right) \left( \ln \left(1 - \dfrac{p_2}{N} \right) \right)^2} } } \)
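One way to see where the exponent comes from is sketched below. It assumes that each positive count is binomially distributed and that the two counts are independent; these assumptions, and the notation \(\lambda_i\), are introduced here for illustration. Writing \(\lambda_i = -\ln \left(1 - \dfrac{p_i}{N} \right)\) for the estimated mean number of copies per partition, first-order error propagation gives:

\(\mathrm{Var}(\lambda_i) \approx \dfrac{p_i}{N^2 \left(1 - \dfrac{p_i}{N} \right)} \)

\(\dfrac{\mathrm{Var}(\lambda_i)}{\lambda_i^2} \approx \dfrac{1}{N^2} \ \dfrac{p_i}{\left(1 - \dfrac{p_i}{N} \right) \left( \ln \left(1 - \dfrac{p_i}{N} \right) \right)^2} \)

Since \(\ln R = \ln \lambda_1 - \ln \lambda_2\), the variance of \(\ln R\) is approximately the sum of these two relative variances. Its square root, multiplied by \(z_c\), is the exponent that appears in \(R_{min}\) and \(R_{max}\).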

Then, the relative uncertainty (also called uncertainty coefficient) is:

\(U = \dfrac{R_{max} - R_{min}}{2 \ R} \)

Note that, for a 95% confidence level: \(z_c = 1.96\)
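The interval and relative uncertainty formulas above can be sketched in a few lines. The function name `cnv_with_interval` and the example counts are hypothetical; this is an illustration of the approximation described above, not the implementation used by any particular calculator.

```python
import math


def cnv_with_interval(n: int, p1: int, p2: int, z_c: float = 1.96):
    """Return (R, R_min, R_max, U) from the first-order approximation.

    n   : total number of analyzable partitions
    p1  : positive partitions for the target gene
    p2  : positive partitions for the reference gene
    z_c : standard score (1.96 for a 95% confidence level)
    """
    for p in (p1, p2):
        if not 0 < p < n:
            raise ValueError("need 0 < positives < total")

    log1 = math.log(1.0 - p1 / n)
    log2 = math.log(1.0 - p2 / n)
    r = log1 / log2

    # The two terms under the square root, one per gene.
    term1 = p1 / ((1.0 - p1 / n) * log1 ** 2)
    term2 = p2 / ((1.0 - p2 / n) * log2 ** 2)
    exponent = (z_c / n) * math.sqrt(term1 + term2)

    r_min = r * math.exp(-exponent)
    r_max = r * math.exp(exponent)
    u = (r_max - r_min) / (2.0 * r)
    return r, r_min, r_max, u


if __name__ == "__main__":
    # Hypothetical well: 20000 partitions, 8000 target and 5000 reference positives.
    r, r_min, r_max, u = cnv_with_interval(20000, 8000, 5000)
    print(f"R = {r:.4f}, interval = [{r_min:.4f}, {r_max:.4f}], U = {u:.2%}")
```

Because the bounds are obtained by multiplying and dividing \(R\) by the same factor, the interval is symmetric around \(R\) on a logarithmic scale rather than on a linear one.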

If you need to automatically compute the CNV of target genes with respect to reference genes, together with their confidence interval and relative uncertainty, an online calculator is provided for Copy Number Variation.

Other methods for estimating the confidence interval of a CNV exist in the literature. One of them, proposed by Dube et al. [PMID:18682853], is based on Fieller’s theorem and assumes independence between the target and reference genes.

References:

  • Whale et al., “Comparison of microfluidic digital PCR and conventional quantitative PCR for measuring copy number variation”, Nucleic Acids Res. 40(11):e82, June 2012, PMID:22373922
  • Dube et al., “Mathematical analysis of copy number variation in a DNA sample using digital PCR on a nanofluidic device”, PLoS One 3(8):e2876, August 2008, PMID:18682853