Earth Microbiome VueGen Demo Notebook
The Earth Microbiome Project (EMP) is a systematic attempt to characterize global microbial taxonomic and functional diversity for the benefit of the planet and humankind. It aimed to sample the Earth’s microbial communities at an unprecedented scale in order to advance our understanding of the organizing biogeographic principles that govern microbial community structure. The EMP dataset is generated from samples that individual researchers have compiled and contributed to the EMP. The result is both a reference database giving global context to DNA sequence data and a framework for incorporating data from future studies, fostering increasingly complete characterization of Earth’s microbial diversity.
You can find more information about the Earth Microbiome Project at https://earthmicrobiome.org/ and in the original article.
Exploratory Data Analysis
This section contains the exploratory data analysis of the Earth Microbiome Project (EMP) dataset.
Sample Exploration
Metadata Random Subset
#SampleID | BarcodeSequence | LinkerPrimerSequence | Description | host_subject_id | study_id | title | principal_investigator | doi | ebi_accession | target_gene | target_subfragment | pcr_primers | illumina_technology | extraction_center | run_center | run_date | read_length_bp | sequences_split_libraries | observations_closed_ref_greengenes | observations_closed_ref_silva | observations_open_ref_greengenes | observations_deblur_90bp | observations_deblur_100bp | observations_deblur_150bp | emp_release1 | qc_filtered | subset_10k | subset_5k | subset_2k | sample_taxid | sample_scientific_name | host_taxid | host_common_name_provided | host_common_name | host_scientific_name | host_superkingdom | host_kingdom | host_phylum | host_class | host_order | host_family | host_genus | host_species | collection_timestamp | country | latitude_deg | longitude_deg | depth_m | altitude_m | elevation_m | env_biome | env_feature | env_material | envo_biome_0 | envo_biome_1 | envo_biome_2 | envo_biome_3 | envo_biome_4 | envo_biome_5 | empo_0 | empo_1 | empo_2 | empo_3 | adiv_observed_otus | adiv_chao1 | adiv_shannon | adiv_faith_pd | temperature_deg_c | ph | salinity_psu | oxygen_mg_per_l | phosphate_umol_per_l | ammonium_umol_per_l | nitrate_umol_per_l | sulfate_umol_per_l |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Animal Samples Map
Plant Samples Map
Saline Samples Map
Physicochemical properties of the EMP samples
Pairwise scatter plots of the available physicochemical metadata are shown for temperature, salinity, oxygen, and pH, and for phosphate, nitrate, and ammonium.
Metagenomics
Alpha Diversity
This subsection contains the alpha diversity analysis of the EMP dataset.
Alpha Diversity Host Associated Samples
Alpha Diversity Free Living Samples
Average Copy Number
Average Copy Number EMP Ontology Level 2
Average Copy Number EMP Ontology Level 3
Nestedness
Nestedness Random Subset
Unnamed: 0 | SAMPLE_RANK | OBSERVATION_RANK | SAMPLE_ID | OBSERVATION_ID | empo_3 | METADATA_NUMERIC_CODE |
---|---|---|---|---|---|---|
All Samples
Plant Samples
Animal Samples
Non Saline Samples
Shannon entropy analysis
This subsection contains the Shannon entropy analysis of the EMP dataset.
Specificity of sequences and higher taxonomic groups for environment
a) Environment distribution in all genera and 400 randomly chosen tag sequences. b) and c) Shannon entropy within each taxonomic group.
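As a rough illustration of the entropy measure referred to above, the sketch below computes the Shannon entropy of a taxon's distribution across environments. The function name `shannon_entropy`, the choice of log base 2, and the toy environment labels are assumptions made for illustration; they are not taken from the notebook's code or from the EMP data.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must contain at least one observation")
    # Zero counts contribute nothing, so skip them to avoid dividing by zero.
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)


# Hypothetical data: the environment label of each sample in which a taxon was seen.
specialist = Counter(["soil", "soil", "soil", "soil"])
generalist = Counter(["soil", "water", "sediment", "surface"])

specialist_entropy = shannon_entropy(specialist.values())
generalist_entropy = shannon_entropy(generalist.values())
```

Under these assumptions, a taxon found in a single environment has an entropy of 0 bits (high specificity), while one spread evenly over four environments has an entropy of 2 bits.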
Network Analysis
Phyla Association Networks
Phyla Counts Subset
Unnamed: 0 | 722.NP2.2.s.2.1.sequence | 846.Fagna24102011Soil12C2 | 722.M11Tong.7.s.8.1.sequence | 1883.2009.147.Crump.Artic.LTREB.main.lane3.NoIndex | 1580.A23.1.sed.D1 | 1883.2008.146.Crump.Artic.LTREB.main.lane2.NoIndex | 1747.DZF.6252012.A.metal.wall | 1039.L.Jacarepia.HB | 1580.2CB.sed.D1 | 722.M11Plmr.3.s.3.1.sequence | 1883.2008.086.Crump.Artic.LTREB.main.lane2.NoIndex | 894.OS604.lane3.NoIndex.L003 | 1717.32.high.fertilizer | 2182.CPZFOB | 1773.Salt.max3.ugi | 659.NZFACE.R5.Browntop | 2192.H04a.Nose.1857.lane6.NoIndex.L006 | 722.NP5.6.s.7.1.sequence | 1883.2008.127.Crump.Artic.LTREB.main.lane2.NoIndex | 2382.HE003.C181.HA.5.774.leav.9.12.lane8.NoIndex.L008.sequences | 1453.54374SDZ1.F2.Tcris.feces | 1642.MS00554 | 2382.GM.181.R4.gp.10.12.lane8.NoIndex.L008.sequences | 895.Puhimau.soil.1 | 1064.G.CV05 | 1580.WPC.filt. | 1773.Columb.talpa5.crop | 1043.Hopland.20C.Wood.TP1.02 | 864.5.CON.addV.noG.noW.B.lane2.NoIndex | 2192.H01a.Bathroom.Door.Knob.102.lane1.NoIndex.L001 | 1041.S008.2m.off.bottom | 1064.G.CJV172 | 2382.SH007.C6.RH.4.716.bulk.9.12.lane7.NoIndex.L007.sequences | 807.B.S.11.a | 933.N.2.3.S.E.4 | 1481.PO5.5.T0 | 963.Iguana.221.061011.BOH.vial.914 | 804.H03.072705.R0229 | 1774.257.Skin.Puer | 1747.v.indicus.58.oral | 945.P3.A9.lane2.NoIndex.L002 | 1748.5.15.12.FI.25.V | 1036.P.Ac.14.1.s.4.1.sequences | 1453.45300SDZ4.D5.Pnem.stom | 2229.S2.T3.6.HP1.Thomas.CMB.Seaweed.lane6.NoIndex.L006 | 2382.SH005.C3.RH.1.655.gp.9.12.lane8.NoIndex.L008.sequences | 1453.54323SDZ1.B8.Cguer.mesy | 1453.54379SDZ1.G5.Tcris.feces | 1747.DZF.6132012.TJ.side.basking.rock | 1627.KZC2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Phyla Correlation Network With 0.5 Threshold Edgelist
Number of nodes: 33
Number of edges: 42
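An edge list like the one summarized above can be derived by correlating phylum abundance profiles across samples and keeping the pairs whose correlation passes a threshold. The sketch below shows one way to do this, assuming Pearson correlation and an absolute-value cutoff of 0.5. The report does not state which correlation measure or sign convention was used, and the function names and toy counts here are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: treated as uncorrelated (an assumption)
    return cov / math.sqrt(var_x * var_y)


def correlation_edgelist(profiles, threshold=0.5):
    """Return (phylum_a, phylum_b, r) for every pair with |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


# Hypothetical counts: one list per phylum, one value per sample.
phyla_counts = {
    "Proteobacteria": [10, 20, 30, 40],
    "Firmicutes": [40, 30, 20, 10],
    "Acidobacteria": [5, 9, 4, 8],
}

edges = correlation_edgelist(phyla_counts, threshold=0.5)
nodes = {name for a, b, _ in edges for name in (a, b)}
print(f"Number of nodes: {len(nodes)}")
print(f"Number of edges: {len(edges)}")
```

In this sketch, phyla with no edge above the threshold do not appear as nodes, so the node count can be smaller than the number of phyla in the input table.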