High-Throughput Cancer Research_0620

Cancer Research Based on High-Throughput Sequencing
Lin Zhao, linzhao@genomics.cn

Cancer background: cancer genomics
- Cancers are caused by changes in the DNA sequence of the genomes of cancer cells.
- Characteristic: high heterogeneity across different cancer tissues and different stages of development.
- Targets:
  - a comprehensive catalogue of somatic mutations in cancer samples
  - identification of further potentially druggable cancer genes
  - use of somatic mutations as prognostic biomarkers
- Approach: moving from hypothesis-driven studies to data-driven, large-scale analysis.

Problems and difficulties of classical methods
- They cannot detect rare variants; only common variants (MAF > 5%) are captured.
- Rare SNPs may be true disease-risk variants.
- Classical methods looked only at cancer cells and sequenced genes already known or suspected to be linked to cancer, so they may overlook key mutations, especially new ones.
- Candidate genes are chosen by hypothesis, with long cycle times and low success rates.
(M. R. Stratton et al., Nature 458, 719-724 (2009))

All of these problems can be solved by sequencing. It's time to sequence!

Overview of cancer solutions (research design; deliverables)
- Exome sequencing: 100 tumor and 100 control samples, 50X per sample; finds SNVs and indels.
- Whole-genome sequencing: 10 groups (blood + tumor tissue), 30X per sample; finds SNVs, indels, CNVs, SVs, and virus integrations or rearrangements.
- Cell line: 50X with 170-800 bp paired-end libraries and 20X with 2-40 kb paired-end libraries; finds SNVs and indels.
- Single-cell sequencing: 50X exome of 20 normal and 100 tumor single cells; finds SNVs, SVs, and novel sequences by assembly.

Cancer solution 1: exome sequencing (100 tumor and 100 control samples, >50X per sample)
Background:
- High heterogeneity within the same cancer tissue.
- Hundreds of cases must be sequenced to identify a cancer gene that is mutated in ...
Scientific goals:
- Detect most of the somatic mutations.
- Try to distinguish driver from passenger mutations.

Analysis pipeline
SNVs:
1. Exome sequencing at >50X depth.
2. Alignment with SOAPaligner.
3. SNV detection with SNVdetector or other software.
4. Quality control, giving potential somatic SNVs.
5. Exclusion of SNVs in dbSNP / YH / 1000 Genomes, giving somatic mutations.
Indels (short reads):
1. Alignment to the reference genome.
2. Indel detection with SoapSV or other software.
3. Exclusion of indels in dbSNP / YH / 1000 Genomes.
4. Removal of indels found in normal tissue, giving somatic indels.

Sequencing data production
Columns: Reads = total effective reads (M); Yield = total effective yield (Mb); On target = effective sequence on target (Mb); Depth = average sequencing depth on target (X); Coverage = coverage of target region.

Normal sequencing analysis
Sample  Reads (M)  Yield (Mb)  On target (Mb)  Depth (X)  Coverage
GC-201       11.7      856.08          334.59       9.81     92.7%
GC-202      11.76      861.88          302.87       8.88     90.8%
GC-203      11.75      808.88          290.43       8.51     91.8%
GC-204      11.83      823.36          281.15       8.24     92.5%
GC-205      21.44     1558.41          550.05      16.13     94.3%
GC-206      12.19      899.66          318.27       9.33     93.2%
GC-207      12.46      915.95          321.69       9.43     92.0%
GC-208      21.02     1509.57          529.31      15.52     94.3%
GC-209      21.52     1558.94          549.31       16.1     94.6%
GC-210       9.33      746.08           293.7       8.61     92.2%

Tumor sequencing analysis
Sample  Reads (M)  Yield (Mb)  On target (Mb)  Depth (X)  Coverage
GC-201      40.08     2930.61          1075.9      31.54     95.5%
GC-202      37.04     2831.84          971.22      28.47     94.8%
GC-203      32.16     2395.21          824.17      24.16     94.8%
GC-204       32.7     2433.29          851.02      24.95     95.1%
GC-205      37.62     2864.62         1040.93      30.52     95.0%
GC-206      35.96     2728.28          986.48      28.92     95.2%
GC-207       32.1     2381.05          865.74      25.38     94.6%
GC-208      37.15     2823.45         1024.37      30.03     95.0%
GC-209      34.95     2644.37          995.13      29.18     95.3%
GC-210      44.38      3550.2          1397.8      40.98     95.5%

Schematic diagram of the SNV filtering process and gene annotation
- 8,277 somatic SNVs:
  - 7,517 present in dbSNP and the 1000 Genomes Project
  - 760 (9.2%) new SNVs:
    - 414 (54.5%) non-synonymous and splice-site SNVs
    - 346 synonymous and UTR SNVs
- 249 randomly selected SNVs for technical validation, of which 216 (86.7%) were validated
- 357 predicted cancer genes: 244 novel and 113 recorded in COSMIC

[Figures: SNV profile; SNV spectrum; SNV locations; transcription factor network in 3 pathways; expression alteration of MUC17]
Patients with altered MUC17 had a better prognosis than patients with wild-type MUC17.

Cancer solution 2: whole-genome sequencing (10 groups of blood/normal tissue + tumor tissue, 30X per sample)
Background: the whole genome, including introns and promoter regions, must be examined to find mutations.
Research: large-scale analyses of genes in tumors have shown that the mutation load in cancer is abundant, heterogeneous, and widespread.
Workflow: DNA sample preparation -> library construction -> HiSeq 2000 sequencing -> basic, advanced, and personalized bioinformatics analysis. The analyses include alignment; SNV, short-indel, SV, and CNV calling; SNV, indel, SV, and CNV annotation; demographic analysis; selection analysis; and others.
[Figure: mutations summary]

Cancer solution 3: cell line
Introduction: human immortal cancer cell lines are an accessible, easily usable set of biological models.
Advantages:
1. They give a very clear picture of what happened in that cell line.
2. They allow a systematic characterization of its genetics and genomics.
3. They provide high-accuracy SV and CNV information with a clear pattern.
Workflow: de novo sequencing and re-sequencing.

Cancer solution 4: single-cell sequencing
Background: cancers are mixtures of different cells, and it is hard to separate tumor from adjacent tissue, so research at the single-cell level is necessary.
Advantages:
- Single-cell sequencing gives the real frequency of mutations.
- A phylogenetic tree of the sequenced single cells shows how mutations progress during cancer development.
Design: 50X exome of 20 normal and 100 tumor single cells.
Quality targets:
1. Coverage: 90% of the target at >=1X; 80% at >=10X.
2. SNV technical validation rate: 90%.
3. Indel technical validation rate: 80%.
Deliverables:
1. Point mutations in each cell.
2. Mutation frequency spectrum of normal and cancer cells.
3. Relationships among the different cells.

Demo case: renal cancer tumor (ongoing BGI collaborative project)
- No significant differences in detecting SNPs and indels between single-cell sequencing and multiple-cell sequencing.
- Genetic comparisons among cancer cells, normal cells, and leukocytes of each of two renal cancer patients.

Method evaluation
- Sample set: single cells from the first Asian genome donor (YH), and a control from the same tissue.
- Data set: 13X and 18X for two replicates.

Metric               Single cell 1  Single cell 2  Control
Raw data (Gb)                35.47          47.99    48.72
Average depth (X)            13.32          17.82    18.03
Genome coverage (%)          95.77          94.46    99.91

- No obvious genome-wide coverage limitation from single-cell sequencing.
- Depth bias is strongly affected by GC content.
- Depth is not affected by repeats or chromosome location.

1000 single-cell sequencing
Analysis pipeline: SNP calling, population analysis, and progression inference.

Mutation types
- CCRCC (clear cell renal cell carcinoma) and two different leukemia samples (ET and AML) show no significantly higher (P > 0.05) proportion of C:G>T:A than T:A>C:G mutations.
- Among BTCC (bladder transitional cell carcinoma), gastric, and colorectal cancers, only BTCC has a significantly higher (P < 0.01) proportion of C:G>T:A than T:A>C:G mutations.

Differentiating cancer and normal cells by PCA (+: cancer; *: normal; panels: gastric, ET, AML)
- Most cancer types are clearly separated, but ET, AML, and BTCC are not, reflecting the heterogeneous nature of these cancers.

Phylogenetic analysis
- Phylogenetic trees clearly show subpopulations in AML.
[Figure: AML consensus tree]

Inferring key genes in AML (a typically heterogeneous cancer)
Key genes? Key genes for subpopulations?
- G1-G6: different subpopulations in AML.
- Key genes are cancer-specific or subpopulation-specific genes mutated at high prevalence during tumor progression.
- Branches with values below 50 were merged into one group.

Mapping comprehensive progression pathways in five tumors (1000 single-cell sequencing)
- MP: metabolic pathway; SP: signaling pathway; CAM: cell adhesion molecules.
- Enrichment cutoff: P < 0.01.
- [Figure legend: pathways related to 5, 4, 3, 2, or 1 of the cancers]

BGI, Your Premier Scientific Partner. Welcome to join us!
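The exome analysis pipeline in Cancer Solution 1 keeps a tumor call only if it is absent from the matched normal and from known-variant databases (dbSNP, YH, 1000 Genomes). A minimal sketch of that set logic follows, assuming each call has already been reduced to a hypothetical `(chrom, pos, ref, alt)` key. It does not reproduce SOAPaligner, SNVdetector, or SoapSV, and it merges the SNV and indel branches into one function for brevity; a real pipeline would also apply quality and depth filters before this step.

```python
from typing import Iterable, NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal variant key: chromosome, 1-based position, reference and alternate allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


def somatic_candidates(
    tumor_calls: Iterable[Variant],
    normal_calls: Iterable[Variant],
    known_variants: Iterable[Variant],
) -> Set[Variant]:
    """Return tumor calls that appear neither in the matched normal nor in the known-variant set."""
    tumor = set(tumor_calls)
    # Calls also present in the matched normal sample are treated as germline and dropped.
    germline = set(normal_calls)
    # dbSNP / YH / 1000 Genomes entries, merged here into one set, are dropped as known polymorphisms.
    known = set(known_variants)
    return tumor - germline - known


tumor = [
    Variant("chr1", 100, "C", "T"),
    Variant("chr1", 200, "G", "A"),
    Variant("chr2", 50, "A", "G"),
]
normal = [Variant("chr1", 200, "G", "A")]
known = [Variant("chr2", 50, "A", "G")]
print(sorted(somatic_candidates(tumor, normal, known)))
# [Variant(chrom='chr1', pos=100, ref='C', alt='T')]
```

Using exact-match sets keeps the idea visible; real callers compare normalized alleles and usually require a minimum normal depth before calling a site "absent from the normal".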
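The exome QC tables report "average sequencing depth on target", which is conventionally the effective on-target sequence divided by the target size. The slides do not state the capture target size; dividing the two reported columns gives roughly 34.1 Mb for every sample checked, so the sketch below treats that as an inference from the table rather than a figure from the source. It also computes the on-target fraction (on-target sequence / effective yield).

```python
# (effective sequence on target in Mb, average depth on target in X, total effective yield in Mb),
# copied from the exome QC tables for a few samples.
QC = {
    "normal GC-201": (334.59, 9.81, 856.08),
    "normal GC-205": (550.05, 16.13, 1558.41),
    "normal GC-210": (293.7, 8.61, 746.08),
    "tumor GC-201": (1075.9, 31.54, 2930.61),
    "tumor GC-210": (1397.8, 40.98, 3550.2),
}


def implied_target_mb(on_target_mb: float, mean_depth: float) -> float:
    """Target size implied by: mean depth on target = on-target bases / target size."""
    return on_target_mb / mean_depth


def on_target_fraction(on_target_mb: float, effective_yield_mb: float) -> float:
    """Share of the effective sequence that falls inside the capture target."""
    return on_target_mb / effective_yield_mb


for name, (on_target, depth, yield_mb) in QC.items():
    print(f"{name}: target ~{implied_target_mb(on_target, depth):.1f} Mb, "
          f"on-target {on_target_fraction(on_target, yield_mb):.1%}")
```

Every listed sample yields about 34.1 Mb, which suggests the depth column was derived exactly this way; the on-target fractions fall between roughly 35% and 40%.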
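The counts in the SNV filtering schematic can be checked for internal consistency. The sketch below copies the numbers from the slide and recomputes the reported percentages; it does not re-derive the counts themselves.

```python
# Counts copied from the SNV filtering schematic.
SOMATIC = 8277
IN_DBSNP_1000G = 7517
NOVEL = 760
NONSYN_SPLICE = 414
SYN_UTR = 346
VALIDATION_TESTED = 249
VALIDATED = 216
CANCER_GENES = 357
NOVEL_GENES = 244
COSMIC_GENES = 113


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as on the slide."""
    return round(100 * part / whole, 1)


# Each split should add back up to its parent count.
assert IN_DBSNP_1000G + NOVEL == SOMATIC
assert NONSYN_SPLICE + SYN_UTR == NOVEL
assert NOVEL_GENES + COSMIC_GENES == CANCER_GENES

print(pct(NOVEL, SOMATIC), pct(NONSYN_SPLICE, NOVEL), pct(VALIDATED, VALIDATION_TESTED))
# 9.2 54.5 86.7
```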
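The single-cell design states coverage targets of at least 90% of the target at >=1X and at least 80% at >=10X. A minimal sketch of that check, assuming per-position depths over the target are already available (for example the depth column of `samtools depth -a` restricted to the target regions); the function names and toy depths are hypothetical.

```python
from typing import Dict, Sequence


def coverage_breadth(depths: Sequence[int], thresholds: Sequence[int] = (1, 10)) -> Dict[int, float]:
    """Fraction of target positions whose read depth is at least each threshold."""
    if not depths:
        raise ValueError("depths must not be empty")
    n = len(depths)
    return {t: sum(1 for d in depths if d >= t) / n for t in thresholds}


def meets_single_cell_spec(depths: Sequence[int], min_1x: float = 0.90, min_10x: float = 0.80) -> bool:
    """Check the stated targets: >=90% of positions at >=1X and >=80% at >=10X."""
    breadth = coverage_breadth(depths, (1, 10))
    return breadth[1] >= min_1x and breadth[10] >= min_10x


# Toy per-position depths for a 10-bp target (illustrative values only).
depths = [0, 3, 12, 15, 20, 25, 30, 11, 40, 9]
print(coverage_breadth(depths), meets_single_cell_spec(depths))
# {1: 0.9, 10: 0.7} False
```

Positions with zero depth must be included in the input, otherwise the breadth is overestimated.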
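The method evaluation reports that single-cell depth bias is strongly affected by GC content. One common way to show this is to split the genome into windows, bin the windows by GC fraction, and average depth per bin. The sketch below assumes windows are already extracted as hypothetical `(sequence, mean_depth)` pairs; the 10-bin layout is an illustrative choice, not the source's setting.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple


def gc_bin(seq: str, n_bins: int = 10) -> Optional[int]:
    """Index of the GC-content bin (0..n_bins-1) for a window; None if it has no A/C/G/T."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    acgt = gc + s.count("A") + s.count("T")
    if acgt == 0:
        return None
    # Integer arithmetic avoids float rounding at bin edges (e.g. GC = 0.3).
    return min(gc * n_bins // acgt, n_bins - 1)


def mean_depth_by_gc(windows: Iterable[Tuple[str, float]], n_bins: int = 10) -> Dict[int, float]:
    """Group (window_sequence, mean_depth) pairs by GC bin and average the depth per bin."""
    groups = defaultdict(list)
    for seq, depth in windows:
        b = gc_bin(seq, n_bins)
        if b is not None:  # skip windows made entirely of N
            groups[b].append(depth)
    return {b: mean(ds) for b, ds in sorted(groups.items())}


windows = [("ATATATATGC", 20.0), ("ATATATATCG", 22.0), ("GCGCGCGCAT", 8.0), ("NNNNNNNNNN", 0.0)]
print(mean_depth_by_gc(windows))
# {2: 21.0, 8: 8.0}
```

A flat profile across bins would indicate no GC bias; bin 2 here covers 20-30% GC and bin 8 covers 80-90% GC.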
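The mutation-type comparisons (C:G>T:A versus T:A>C:G) rely on collapsing each substitution and its reverse complement into one of six classes, so that G>A and C>T both count as C:G>T:A. A minimal sketch of that classification and of per-class proportions follows; the slides do not name the statistical test behind their P values, so none is shown here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref: str, alt: str) -> str:
    """Collapse a single-base substitution onto its strand-symmetric class, e.g. G>A -> 'C:G>T:A'."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Name the class from the pyrimidine (C or T) of the reference base pair.
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}"


def class_proportions(snvs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Proportion of SNVs falling into each substitution class."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


snvs = [("C", "T"), ("G", "A"), ("T", "C"), ("A", "C")]
props = class_proportions(snvs)
print(props["C:G>T:A"], props["T:A>C:G"])
# 0.5 0.25
```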
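The slides separate cancer from normal cells with PCA but do not say which features were used. The sketch below assumes a hypothetical cell-by-site matrix of 0/1 mutation calls and computes each cell's score on the first principal component by power iteration, in pure Python so it stays self-contained; for real data a library routine such as `numpy.linalg.svd` would be the usual choice.

```python
import math
import random
from typing import List, Sequence


def first_pc_scores(matrix: Sequence[Sequence[float]], iterations: int = 200, seed: int = 0) -> List[float]:
    """Scores of each row (cell) on the first principal component of the column-centred matrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    x = [[row[j] - means[j] for j in range(n_cols)] for row in matrix]
    # X^T X is proportional to the covariance matrix, so it has the same eigenvectors.
    xtx = [[sum(x[i][a] * x[i][b] for i in range(n_rows)) for b in range(n_cols)] for a in range(n_cols)]
    # Power iteration converges to the leading eigenvector provided the random start is not
    # orthogonal to it and the top eigenvalue is well separated from the next one.
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(n_cols)]
    for _ in range(iterations):
        w = [sum(xtx[a][b] * v[b] for b in range(n_cols)) for a in range(n_cols)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # all rows identical: no variance to explain
            break
        v = [c / norm for c in w]
    return [sum(row[j] * v[j] for j in range(n_cols)) for row in x]


# Hypothetical 0/1 calls: rows are cells, columns are mutated sites.
cells = [
    [0, 0, 0, 0],  # normal
    [0, 0, 0, 1],  # normal
    [0, 0, 0, 0],  # normal
    [1, 1, 1, 0],  # cancer
    [1, 1, 0, 0],  # cancer
    [1, 1, 1, 1],  # cancer
]
scores = first_pc_scores(cells)
print([round(s, 2) for s in scores])
```

The sign of a principal component is arbitrary, so what matters is that normal and cancer cells fall on opposite sides; heterogeneous cancers such as those noted on the PCA slide would instead overlap.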
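The pathway slide applies an enrichment cutoff of P < 0.01 without naming the test. A one-sided hypergeometric test is a common choice for gene-set enrichment, so the sketch below assumes that test; it is not necessarily what the authors used, and the gene counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: draw n genes from N, of which K belong to the pathway."""
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("require 0 <= k <= n <= N and 0 <= K <= N")
    # math.comb(a, b) returns 0 when b > a, so impossible terms contribute nothing.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


# Hypothetical numbers: 40 mutated genes out of 20,000, a 150-gene pathway containing 5 of them.
p = hypergeom_upper_tail(k=5, n=40, K=150, N=20000)
print(p, p < 0.01)
```

Python's arbitrary-precision integers keep the binomial coefficients exact; when many pathways are tested, a multiple-testing correction would normally be applied before using the 0.01 cutoff.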
