/csv-tools

Primary LanguagePython

This is CSV data manipulation scripts

First you need to build index DB with position, size and filename of each molecule.

`indexer_csv.py --readfromdir mydata/ --db mydata_index.sqlite`

To extract molecules to csv file after building index:

`fetcher_csv.py  --readfromdir mydata/ --db mydata_index.sqlite --input-file in.txt --output-file results.csv`

Also you can put data to Wasabi.com  bucket.

First install some cli tools and python modules:

`pip3 install awscli boto3`

Next put key and token to `~/.aws/crednetials`
```
[default]
aws_access_key_id = YOUR_KEY_ID
aws_secret_access_key = YOUR_KEY
```

And upload data:
```
aws s3 sync mydata/ s3://mybucket/
```
Next you can fetch molecules from Wasabi with `fetcher_csv_s3.py`:
```
fetcher_csv_s3.py -f 100k.txt --readfrombucket mybucket -o out.csv --db mydata_index.sqlite
```

Index data can be extracted to csv file for instance:
sqlite3 -header -csv index.sqlite 'select sid from data' > output.csv

This csv can be imported to mysql  (use substance1.sql as example. `mysql yourdbname < substance1.sql`)