GregorySchwartz/too-many-cells

Not able to find the output files

zhoujoeyyi opened this issue · 4 comments

Hi Dr. Schwartz,

Great talk last week in the Stem Cell Club! TooManyCells seems to be a powerful and fast algorithm for cell clustering. I have relatively limited knowledge of informatics, and it's my first time using a language other than R, so please excuse me if my question is naive.

We have a Windows 7 machine in our lab, so I had to install Docker Toolbox and run everything inside a VirtualBox VM, which I enter with the command "docker-machine ssh default".
Then I followed the instructions on your website, trying to replicate the figures you have there. Please see below.

#download brain data
mkdir -p data/brain #Make directory
cd ./data/brain #Enter the directory
wget http://cf.10xgenomics.com/samples/cell-exp/3.0.0/neuron_1k_v3/neuron_1k_v3_filtered_feature_bc_matrix.tar.gz #Download the data
tar xvf neuron_1k_v3_filtered_feature_bc_matrix.tar.gz #Uncompress data
cd .. #go to upper folder directory

#download heart data
mkdir -p heart #Make the data directory
cd ./heart #Enter the directory
wget http://cf.10xgenomics.com/samples/cell-exp/3.0.0/heart_1k_v3/heart_1k_v3_filtered_feature_bc_matrix.tar.gz #Download the data
tar xvf heart_1k_v3_filtered_feature_bc_matrix.tar.gz #Uncompress data
cd ..
cd ..

#Prevent overlapping
#Backup barcodes
cp ./data/brain/filtered_feature_bc_matrix/barcodes.tsv{.gz,.gz.bk}
cp ./data/heart/filtered_feature_bc_matrix/barcodes.tsv{.gz,.gz.bk}
#Edit barcodes
cat ./data/heart/filtered_feature_bc_matrix/barcodes.tsv.gz.bk | gzip -d | sed "s/-1/-2/g" | gzip > ./data/heart/filtered_feature_bc_matrix/barcodes.tsv.gz #Now let's edit the heart barcodes to have -2 instead of -1.
cat ./data/heart/filtered_feature_bc_matrix/barcodes.tsv.gz | gzip -d | head
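
The point of the barcode edit is that no barcode should appear in both samples. A minimal sketch of a check for that, assuming both barcode files are gzipped text with one barcode per line; shared_barcodes is a hypothetical helper, not part of too-many-cells:

```shell
#!/bin/sh
# Sketch: count the barcodes that two gzipped barcode files have in common.
# shared_barcodes is a hypothetical helper introduced for illustration.
shared_barcodes() {
    a=$(mktemp) && b=$(mktemp) || return 1
    # comm needs sorted input; sort -u also drops duplicates within a file
    gzip -dc "$1" | sort -u > "$a"
    gzip -dc "$2" | sort -u > "$b"
    # comm -12 keeps only the lines present in both files
    comm -12 "$a" "$b" | wc -l | tr -d ' '
    rm -f "$a" "$b"
}
```

Called as `shared_barcodes ./data/brain/filtered_feature_bc_matrix/barcodes.tsv.gz ./data/heart/filtered_feature_bc_matrix/barcodes.tsv.gz`, it should print 0 once the heart barcodes carry the -2 suffix.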

So up until these steps, everything is fine. Then I ran:
sudo docker run -it --rm -v "/home/docker:/home/docker" \
gregoryschwartz/too-many-cells:0.2.2.0 make-tree \
--matrix-path ./data/brain/filtered_feature_bc_matrix/ \
--matrix-path ./data/heart/filtered_feature_bc_matrix/ \
--output out > clusters1.csv
printf "./out/dendrogram.svg"

Here's where I have questions. There are still no error messages, but I can't find or open "clusters.csv" or "dendrogram.svg" on my Windows computer.
- For "clusters.csv", I can see it's in the "/home/docker/" folder, but I couldn't export the file to my local Windows computer.
- For "dendrogram.svg", I was not able to find the file anywhere at all.

Therefore I have two questions,
(1) Is there a way I can export those files back to the Windows computer?
(2) For the annotated "clusters.csv" and "labels.csv", do you have a way or code to export those annotations back to the Seurat file so that I can compare the two clustering methods? (Just like what you did in Figure 6i of your Nature Methods paper.)

I greatly appreciate your time and look forward to your feedback and insights.

Regards,
Joey

Hi Joey, thanks for your interest! The dendrogram can be written in many file formats other than svg, such as pdf; just specify the dendrogram-file with a supported file ending. The dendrogram file will be in the output folder. As for where those files end up, the original workshop was written without docker in mind.

  1. For docker, you need to set the output directory explicitly. Right now it is --output out, which points to the root of the docker container, and that is not mounted. To solve this, point it to somewhere below your mount, such as --output /home/docker/out or the like.

  2. too-many-cells supports a projection argument that takes a projection such as tSNE or UMAP (for example, one exported from Seurat to a csv) and plots the labels and too-many-cells leaves on top of that projection. You can also export Seurat's clustering for each barcode in a similar way and provide it as a label file to see how Seurat clusters relate to each other. The other option is to go the other way and import the csvs from too-many-cells into R to use them there.
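
A minimal sketch of point 1, assuming the data directory sits under /home/docker on the Docker Machine VM and that /home/docker is bind-mounted at the same path inside the container. The docker options and too-many-cells arguments are the ones already used in this thread; run_make_tree, MOUNT, OUT, and DRY_RUN are hypothetical names introduced for illustration. The -t option is left out here because stdout is redirected to a file, so no terminal is needed.

```shell
#!/bin/sh
# Sketch: keep every path, including --output, below the bind mount.
MOUNT=/home/docker     # assumed: mounted at the same path in the container
OUT="$MOUNT/out"       # output folder below the mount, so it outlives --rm

run_make_tree() {
    # With DRY_RUN set to a non-empty value, the command is printed instead of run.
    ${DRY_RUN:+echo} docker run -i --rm -v "$MOUNT:$MOUNT" \
        gregoryschwartz/too-many-cells:0.2.2.0 make-tree \
        --matrix-path "$MOUNT/data/brain/filtered_feature_bc_matrix/" \
        --matrix-path "$MOUNT/data/heart/filtered_feature_bc_matrix/" \
        --output "$OUT"
}
```

Running `DRY_RUN=1 run_make_tree` prints the command so the paths can be checked first; `run_make_tree > "$MOUNT/clusters.csv"` then runs it and keeps the cluster table under the mount as well. Prefix docker with sudo if your setup needs it.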

Thanks for the quick response!

Following point 1 of your response, I was able to get the svg file using "--output /home/docker/out > dendrogram.svg". Thanks!

However, all the output files, including "clusters.csv", "dendrogram.svg", and the downstream ones, seem to be inside some kind of VirtualBox sandbox. I can see that those files exist with the "ls" command, but I do not know how to open them on the Windows 7 computer. I was hoping I could export the csv and svg files to my local "D:\Users\joey" folder so that I can visualize and further process them. Could you comment on that, please?

Regards,
Joey

I'm not familiar with Windows docker, but I assume that you could specifically mount the output folder of the docker container to whatever directory you want on your host machine.

Thanks for the feedback! It's now solved. Indeed, I needed to mount the folders in Windows.
I used the following instructions: http://support.divio.com/en/articles/646695-how-to-use-a-directory-outside-c-users-with-docker-toolbox-docker-for-windows

#Mount an arbitrary host directory in a Docker container
#Stop Docker Machine if it's running, with:
docker-machine stop
#In VirtualBox, add a Shared Folder: Settings > Shared Folders > Add share - this will be the directory where you want to locate your project, such as D:\Projects\Divio. Give it an appropriate Folder Name, such as Divio.
#Restart Docker Machine, with:
docker-machine start
#SSH into the Docker Machine, with:
docker-machine ssh default
#Create a directory in the machine as a mount point for the project directories, for example:
mkdir projects # This will be /home/docker/projects; you can verify it by running pwd.
#Mount the Shared Folder you named above (Divio) at the mount point you have created:
sudo mount -t vboxsf -o uid=1000,gid=50 Divio /home/docker/projects
#Your Docker Machine will now be able to access the files in D:\Projects\Divio (shared in VirtualBox under the name Divio) as /home/docker/projects.
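
With the share mounted, files placed under /home/docker/projects should show up in the Windows folder. A minimal sketch for copying an existing output folder across, assuming the too-many-cells output was written to /home/docker/out; export_outputs is a hypothetical helper, not something from the linked instructions:

```shell
#!/bin/sh
# Sketch: copy a too-many-cells output folder into the mounted share.
# export_outputs is a hypothetical helper introduced for illustration.
export_outputs() {
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    # "$src"/. copies the contents of src, not the directory itself
    cp -R "$src"/. "$dest"/
}
```

For example, `export_outputs /home/docker/out /home/docker/projects/out` would place the csv and svg files in an out folder inside the shared Windows directory. Alternatively, pointing --output at a folder below /home/docker/projects avoids the copy.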

Thanks!