This repository holds the scripts, code, and data referred to in the manuscript "Filtering nucleotide sites by phylogenetic signal to noise ratio increases confidence in the Neoaves phylogeny generated from ultraconserved elements" (In Review).
If you find this repository helpful, please cite it using the following DOI: https://zenodo.org/badge/latestdoi/124963673
Thanks & Cheers,
PS Gilbert
-
Run PI on the nexus alignment files of your choice.
To do this, navigate to the directory where the nexus.rates files live and follow the steps below:
-
Choose UCEs of Interest
#How I chose my UCEs of Interest
Exported list of UCEs of interest from R.
Open the UCE interest text file and use Find & Replace in TextWrangler so that each line in the text file matches the following:
Find: .nex
Replace with: .nex.rates
Remove carriage returns and quotation marks so that every UCE is on the first line, separated by spaces.
Add cp to the beginning of line 1.
Add the destination folder to the end of line 1 (e.g. ~/DestinationFolder).
Save as YourChosenFilename.txt.
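The manual editing steps above can also be sketched in Python. This is only a minimal sketch, not the method used for the manuscript: the function name build_cp_command and the UCE names uce-10 and uce-25 are hypothetical, and it assumes the exported list holds one quoted file name per entry.

```python
import shlex


def build_cp_command(uce_files, destination):
    """Return one 'cp' command line covering the .nex.rates file of each UCE."""
    cleaned = []
    for name in uce_files:
        # Drop surrounding whitespace and the quotation marks added on export.
        name = name.strip().strip("\"'")
        if not name:
            continue
        # Mirror the find & replace step: .nex -> .nex.rates
        if name.endswith(".nex"):
            name += ".rates"
        cleaned.append(shlex.quote(name))
    # The destination is left unquoted so that the shell still expands "~".
    return " ".join(["cp", *cleaned, destination])


# Hypothetical UCE names, for illustration only.
example = ['"uce-10.nex"', '"uce-25.nex"']
print(build_cp_command(example, "~/DestinationFolder"))
# prints: cp uce-10.nex.rates uce-25.nex.rates ~/DestinationFolder
```

The printed line can be saved as YourChosenFilename.txt in place of the hand-edited version.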
#You might want to run each folder as one giant locus. If this is the case, you'll need to isolate the .nexus files that match the UCEs you're interested in, put them into a single directory and concatenate them using phyluce.
FROM THE BIN DIRECTORY OF THE ANACONDA FOLDER:
phyluce_align_format_nexus_files_for_raxml \
--alignments ~/LocationOfAlignments \
--output ~/LocationOfOutput \
--charsets \
--log-path ~/LocationOfLogFile
#Alter each file name so that it can be used in the Mathematica signal/noise notebook.
#Refer to BigBirdFileFormating_nexus.rates_files.txt
#Now parse each file into 3 files: switches, rates & percentages.
#This can also be done on the UCLA Analysis computer using the online version of RStudio (be sure that you are connected to the internet by eduroam or a VPN).
#You'll just need to transfer all the files over using FileZilla and set the working directory to that folder in R.
#Now transfer the dataset to the Alfaro analysis computer. SKIP IF YOU USED ONLINE RSTUDIO.
#Create a text file with the name of each marker in the directory.
#Use this command in the terminal to list the files:
ls > FileNames.txt
#Clean up the FileNames.txt file so that it only has the UCE names followed by a comma.
#Cut & Paste the list of UCE names into the signal/noise mathematica notebook.
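#The listing and clean-up steps above could be sketched in Python as follows. This is a sketch under assumptions: the function names are hypothetical, and it assumes that a "UCE name" is the file name with a ".nex.rates" suffix removed, which may not match your renamed files.

```python
from pathlib import Path


def list_directory(directory="."):
    """Return the names of the regular files in a directory (like ls)."""
    return [p.name for p in Path(directory).iterdir() if p.is_file()]


def uce_name_lines(filenames, suffix=".nex.rates"):
    """Keep only files ending in `suffix` and return 'name,' lines.

    The suffix is an assumption; change it to match your own file names.
    """
    lines = []
    for filename in sorted(filenames):
        if filename.endswith(suffix):
            lines.append(filename[: -len(suffix)] + ",")
    return lines


# Hypothetical file names, for illustration only.
print("\n".join(uce_name_lines(["uce-2.nex.rates", "FileNames.txt", "uce-1.nex.rates"])))
```

#Calling uce_name_lines(list_directory()) from inside the data directory gives the lines to paste into the notebook.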
#Correct the Tnaught and T value inputs.
#Example: 57 MYA - 55 MYA = 2 MY, so Tnaught is 2 and T is 55.
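#The same arithmetic as a small Python sketch. The function name is hypothetical, and it assumes, following the example above, that Tnaught is the difference between the two dates and T is the younger date.

```python
def signal_noise_times(older_mya, younger_mya):
    """Return (Tnaught, T) for a pair of dates given in MYA.

    Assumption taken from the worked example: Tnaught = older - younger,
    and T = younger.
    """
    if older_mya <= younger_mya:
        raise ValueError("older_mya must be larger than younger_mya")
    tnaught = older_mya - younger_mya
    t = younger_mya
    return tnaught, t


print(signal_noise_times(57, 55))  # prints: (2, 55)
```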
#Change the input file paths so that they match the input dataset locations.
#Change the output file path so that it matches the output directory location.
#Start Mathematica.
#On Linux, enter ./math
#Concatenate all the Mathematica output files into one text file.
#Use:
cat *.csv >> FileName.txt
#If the file list is exceedingly long, use:
ls | xargs -n 32 -P 8 cat >> ~/SNImportFiles/filename.txt
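#As an alternative to the shell commands, the concatenation could be sketched in Python. This is a sketch, not the script used for the manuscript: the function name is hypothetical, files are read in sorted name order, the output file is overwritten rather than appended to, and a line break is added after any file that lacks a final one.

```python
from pathlib import Path


def concatenate_csv(input_dir, output_file):
    """Write every *.csv file in input_dir into output_file, in name order."""
    input_dir = Path(input_dir)
    output_file = Path(output_file)
    # Build the file list first and skip the output file itself.
    csv_files = sorted(
        p for p in input_dir.glob("*.csv") if p.resolve() != output_file.resolve()
    )
    with output_file.open("w") as out:
        for csv_file in csv_files:
            text = csv_file.read_text()
            out.write(text)
            # Keep records from different files on separate lines.
            if text and not text.endswith("\n"):
                out.write("\n")
    return len(csv_files)
```

#Because the files are read one at a time, the length of the file list does not matter here.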
vim filename.txt # Enter vim or vi
:%s/,\n/\r/g #Remove the comma at the end of each line and replace it with a carriage return
:wq #Save file
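#The same clean-up can be sketched in Python for anyone who prefers not to use vim. The function name is hypothetical; like the substitution above, it only removes a comma that sits at the very end of a line.

```python
def strip_trailing_commas(text):
    """Remove a single trailing comma from every line of text."""
    return "\n".join(
        line[:-1] if line.endswith(",") else line for line in text.split("\n")
    )


# Small in-memory example with made-up values.
print(strip_trailing_commas("a,1,\nb,2,\n"), end="")
```

#To apply it to a file, read the file into a string, pass it through strip_trailing_commas, and write the result back out.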
Run SNImport.R. Be sure that the output is sorted by UCE, because Jing's script won't work if the UCEs are NOT read in consecutively.
python test.py Corrected_CuratedUCE_List
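OR
#NOTE: when run without the leading colon, the Perl one-liner below deletes every file in the current directory, so only run it in a directory you intend to empty.
: perl -e 'for(<*>){((stat)[9]<(unlink))}'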
#Concatenate all Nexus files into one alignment. This script will also export the concatenated dataset in phylip format for RAxML.
#WARNING: DO NOT CREATE THE OUTPUT DIRECTORY BEFORE RUNNING THE SCRIPT!!!
#RUN THE FOLLOWING FROM THE anaconda/bin DIRECTORY ON ANALYSIS:
phyluce_align_format_nexus_files_for_raxml \
--alignments ~/CuratedNexusFiles \
--output ~/CuratedTest/ \
--log-path ~/Logs/
######Example Problem Run:
phyluce_align_format_nexus_files_for_raxml \
--alignments ~/PSG/FolderLocationOfNexusFiles \
--verbosity INFO \
--output ~/ProblemName_Date.RAxML_Input/ \
--log-path ~/PSG/ProblemName_RAxML_Logs/
CuratedNexusFiles.phylip -> CuratedPhylipFile.phylip
#RUN FROM THE BIN DIRECTORY OF THE ANACONDA FOLDER ON YOUR COMPUTER:
phyluce_align_convert_one_align_to_another \
--alignments ~/CuratedNexusFiles \
--output ~/CuratedPhylipFiles \
--input-format nexus \
--output-format phylip \
--cores 12 \
--log-path ~/Logs/ \
--shorten-names
#First generate your two random numbers:
for i in 1 2; do echo $RANDOM; done
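#If you would rather draw the two numbers in Python, a minimal sketch follows. The function name is hypothetical. It draws from 1 to 32767, which matches the upper bound of bash's $RANDOM but leaves out 0 on purpose.

```python
import random


def two_random_numbers(count=2, rng=None):
    """Return `count` random integers between 1 and 32767 inclusive."""
    if rng is None:
        rng = random.Random()
    return [rng.randint(1, 32767) for _ in range(count)]


for number in two_random_numbers():
    print(number)
```

#Write the two numbers down so that the run can be repeated later with the same values.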
#Next, find the "best" ML tree.
#!!!RUN FROM THE DIRECTORY WHERE YOU WANT YOUR RAxML OUTPUT FILES TO LIVE. There will be at least as many as you specify with -N!!!
#9.28.16
#Now calculate the bootstrap support values for each branch of your phylogeny.
#From the Phyluce Tutorial, you can also "reconcile the bootstraps with the best tree" by entering
-m GTRGAMMA \
-f b \
-t RAxML_bestTreeFileName \
-z RAxML_BestTreeFileName_bootstrap.bootstraps \
-n Run.CurrentDate
#Now you can open the tree for viewing in FigTree. Use the file without "BranchLabels" in the filename.
#You'll want to edit the .tree file so that the tip labels are common names instead of scientific names if you're making trees like the ones in Gilbert et al. 2018.
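#The relabeling can be done by hand, or sketched in Python as below. This is a sketch under assumptions: the function name, the example tree, and the name mapping are all hypothetical, and it assumes a plain Newick string in which each tip label follows "(" or "," and is followed by ":", "," or ")".

```python
import re


def relabel_tips(newick, name_map):
    """Replace tip labels in a Newick string using name_map (old -> new)."""
    if not name_map:
        return newick
    # Try longer names first so that a short name never wins over a longer one.
    names = sorted(name_map, key=len, reverse=True)
    pattern = re.compile(
        r"(?<=[(,])(?:" + "|".join(re.escape(n) for n in names) + r")(?=[:,)])"
    )
    return pattern.sub(lambda match: name_map[match.group(0)], newick)


# Hypothetical tree and mapping, for illustration only.
tree = "((Gallus_gallus:0.1,Struthio_camelus:0.2):0.05,Tinamus_major:0.3);"
common = {"Gallus_gallus": "Chicken", "Struthio_camelus": "Ostrich"}
print(relabel_tips(tree, common))
# prints: ((Chicken:0.1,Ostrich:0.2):0.05,Tinamus_major:0.3);
```

#Labels that are missing from the mapping are left unchanged, so a partial list of common names is safe to use.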