Earth Microbiome Vuegen Demo Notebook

The Earth Microbiome Project (EMP) is a systematic attempt to characterize global microbial taxonomic and functional diversity for the benefit of the planet and humankind. It aimed to sample the Earth’s microbial communities at an unprecedented scale in order to advance our understanding of the organizing biogeographic principles that govern microbial community structure. The EMP dataset is generated from samples that individual researchers have compiled and contributed to the EMP. The result is both a reference database giving global context to DNA sequence data and a framework for incorporating data from future studies, fostering increasingly complete characterization of Earth’s microbial diversity.

You can find more information about the Earth Microbiome Project at https://earthmicrobiome.org/ and in the original article.

Exploratory Data Analysis

This section contains the exploratory data analysis of the Earth Microbiome Project (EMP) dataset.

Sample Exploration

Metadata Random Subset

#SampleID BarcodeSequence LinkerPrimerSequence Description host_subject_id study_id title principal_investigator doi ebi_accession target_gene target_subfragment pcr_primers illumina_technology extraction_center run_center run_date read_length_bp sequences_split_libraries observations_closed_ref_greengenes observations_closed_ref_silva observations_open_ref_greengenes observations_deblur_90bp observations_deblur_100bp observations_deblur_150bp emp_release1 qc_filtered subset_10k subset_5k subset_2k sample_taxid sample_scientific_name host_taxid host_common_name_provided host_common_name host_scientific_name host_superkingdom host_kingdom host_phylum host_class host_order host_family host_genus host_species collection_timestamp country latitude_deg longitude_deg depth_m altitude_m elevation_m env_biome env_feature env_material envo_biome_0 envo_biome_1 envo_biome_2 envo_biome_3 envo_biome_4 envo_biome_5 empo_0 empo_1 empo_2 empo_3 adiv_observed_otus adiv_chao1 adiv_shannon adiv_faith_pd temperature_deg_c ph salinity_psu oxygen_mg_per_l phosphate_umol_per_l ammonium_umol_per_l nitrate_umol_per_l sulfate_umol_per_l

Animal Samples Map

Plant Samples Map

Saline Samples Map

Physicochemical properties of the EMP samples

Pairwise scatter plots of the available physicochemical metadata are shown for temperature, salinity, oxygen, and pH, and for phosphate, nitrate, and ammonium.

Metagenomics

Alpha Diversity

This subsection contains the alpha diversity analysis of the EMP dataset.
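As a rough illustration of what alpha diversity measures, the sketch below computes two of the metrics named in the metadata columns (`adiv_observed_otus` and `adiv_chao1`) from a single sample's vector of per-taxon counts. The function names and the toy counts are assumptions made for this example, and the bias-corrected Chao1 form is used here; the report does not state which variant the EMP pipeline applied.

```python
from collections.abc import Sequence


def observed_richness(counts: Sequence[int]) -> int:
    """Number of taxa with at least one observation in the sample."""
    return sum(1 for c in counts if c > 0)


def chao1(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1 richness estimate.

    S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)), where F1 and F2 are the
    numbers of taxa seen exactly once and exactly twice.
    """
    s_obs = observed_richness(counts)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return s_obs + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical counts for one sample (not taken from the EMP data).
toy_counts = [10, 5, 1, 1, 2, 0]
print(observed_richness(toy_counts), chao1(toy_counts))
```

Chao1 is never below the observed richness, because singletons hint at taxa that the sequencing depth missed.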

Alpha Diversity Host Associated Samples

Alpha Diversity Free Living Samples

Average Copy Number

Average Copy Number Emp Ontology Level2

Average Copy Number Emp Ontology Level3

Nestedness

Nestedness Random Subset

Unnamed: 0 SAMPLE_RANK OBSERVATION_RANK SAMPLE_ID OBSERVATION_ID empo_3 METADATA_NUMERIC_CODE

All Samples

Plant Samples

Animal Samples

Non Saline Samples

Shannon entropy analysis

This subsection contains the Shannon entropy analysis of the EMP dataset.

Specificity of sequences and higher taxonomic groups for environment

a) Environment distribution in all genera and 400 randomly chosen tag sequences. b) and c) Shannon entropy within each taxonomic group.
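To make the entropy measure concrete, here is a minimal sketch of how the Shannon entropy of a taxon's environment distribution could be computed. It assumes each observation of the taxon is tagged with one environment label (for example an `empo_3` category); the function name and the toy labels are hypothetical, and the logarithm base (2, giving bits) is a choice made for this example rather than something the report specifies.

```python
import math
from collections import Counter
from collections.abc import Iterable


def environment_entropy(environments: Iterable[str]) -> float:
    """Shannon entropy (in bits) of the environment labels of one taxon."""
    counts = Counter(environments)
    total = sum(counts.values())
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log2(p)
    return entropy


# Hypothetical labels: a specialist found in one environment only,
# and a generalist spread evenly over two environments.
specialist = ["Soil", "Soil", "Soil", "Soil"]
generalist = ["Soil", "Soil", "Water", "Water"]
print(environment_entropy(specialist), environment_entropy(generalist))
```

Low entropy indicates a taxon concentrated in few environments (high specificity), while high entropy indicates a taxon distributed across many.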

Network Analysis

Phyla Association Networks

Phyla Counts Subset

Unnamed: 0 722.NP2.2.s.2.1.sequence 846.Fagna24102011Soil12C2 722.M11Tong.7.s.8.1.sequence 1883.2009.147.Crump.Artic.LTREB.main.lane3.NoIndex 1580.A23.1.sed.D1 1883.2008.146.Crump.Artic.LTREB.main.lane2.NoIndex 1747.DZF.6252012.A.metal.wall 1039.L.Jacarepia.HB 1580.2CB.sed.D1 722.M11Plmr.3.s.3.1.sequence 1883.2008.086.Crump.Artic.LTREB.main.lane2.NoIndex 894.OS604.lane3.NoIndex.L003 1717.32.high.fertilizer 2182.CPZFOB 1773.Salt.max3.ugi 659.NZFACE.R5.Browntop 2192.H04a.Nose.1857.lane6.NoIndex.L006 722.NP5.6.s.7.1.sequence 1883.2008.127.Crump.Artic.LTREB.main.lane2.NoIndex 2382.HE003.C181.HA.5.774.leav.9.12.lane8.NoIndex.L008.sequences 1453.54374SDZ1.F2.Tcris.feces 1642.MS00554 2382.GM.181.R4.gp.10.12.lane8.NoIndex.L008.sequences 895.Puhimau.soil.1 1064.G.CV05 1580.WPC.filt. 1773.Columb.talpa5.crop 1043.Hopland.20C.Wood.TP1.02 864.5.CON.addV.noG.noW.B.lane2.NoIndex 2192.H01a.Bathroom.Door.Knob.102.lane1.NoIndex.L001 1041.S008.2m.off.bottom 1064.G.CJV172 2382.SH007.C6.RH.4.716.bulk.9.12.lane7.NoIndex.L007.sequences 807.B.S.11.a 933.N.2.3.S.E.4 1481.PO5.5.T0 963.Iguana.221.061011.BOH.vial.914 804.H03.072705.R0229 1774.257.Skin.Puer 1747.v.indicus.58.oral 945.P3.A9.lane2.NoIndex.L002 1748.5.15.12.FI.25.V 1036.P.Ac.14.1.s.4.1.sequences 1453.45300SDZ4.D5.Pnem.stom 2229.S2.T3.6.HP1.Thomas.CMB.Seaweed.lane6.NoIndex.L006 2382.SH005.C3.RH.1.655.gp.9.12.lane8.NoIndex.L008.sequences 1453.54323SDZ1.B8.Cguer.mesy 1453.54379SDZ1.G5.Tcris.feces 1747.DZF.6132012.TJ.side.basking.rock 1627.KZC2

Phyla Correlation Network With 0.5 Threshold Edgelist

Number of nodes: 33

Number of edges: 42
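A thresholded correlation edgelist of this kind can be sketched as follows. This is an illustration under stated assumptions, not the method used to build the network above: it assumes Pearson correlation between per-sample phylum counts and keeps a pair when the absolute correlation reaches the threshold. The function names and the toy counts are hypothetical.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_edgelist(
    counts: dict[str, list[float]], threshold: float = 0.5
) -> list[tuple[str, str, float]]:
    """Return (phylum_a, phylum_b, r) for every pair with |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(counts), 2):
        r = pearson(counts[a], counts[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


# Hypothetical counts of three phyla across four samples.
toy = {
    "Acidobacteria": [1, 2, 3, 4],
    "Bacteroidetes": [2, 4, 6, 8],
    "Cyanobacteria": [2, 1, 1, 2],
}
edges = correlation_edgelist(toy, threshold=0.5)
nodes = {name for a, b, _ in edges for name in (a, b)}
print(len(nodes), len(edges))
```

Counting nodes from the edge endpoints, as done here, excludes phyla that have no correlation above the threshold.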

Phyla Correlation Network With 0.5 Threshold