This is a python implementation for trace reconstruction with marker code [1] in the context of DNA-based data storage. You can run it directly with either the CNR dataset [2] or a simple simulated IDS channel. A derivation for relevant formulas is also provided.
pip3 install -r requirements.txt
# Run simulation
python3 main-sim.py
# Or test on the CNR dataset
python3 main-cnr.py
It will take some time if you choose to test on the CNR dataset using the provided configuration. To have a quick start, you may be more willing to run the simulation. Also, please refer to the configuration files to customize your experiments. The results and statistics will be available in ./data
after the program terminates.
- python3
- numpy
- matplotlib
- tqdm
- yaml
For available configurations for experiments, please check config-cnr.yaml and config-sim.yaml.
A derivation for relevent formulas is available, with LaTeX source.
- Learn the properties of IDS channels with EM algorithm.
I am grateful to Prof. Djordje Jevdjic for his module CS6219 (22/23-SEM1). It is both inspiring and interesting. I must also thank Prof. Gim Hee Lee for the great module CS5340 (22/23-SEM1), which is really rewarding.
[1] E. A. Ratzer, “Marker codes for channels with insertions and deletions,” Ann. Télécommun., vol. 60, no. 1–2, pp. 29–44, Feb. 2005, doi: 10.1007/BF03219806.
[2] S. R. Srinivasavaradhan, M. Du, S. Diggavi, and C. Fragouli, “Symbolwise MAP for Multiple Deletion Channels,” in 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, Jul. 2019, pp. 181–185. doi: 10.1109/ISIT.2019.8849567.
[3] C. M. Bishop and N. M. Nasrabadi, Pattern recognition and machine learning, vol. 4. Springer, 2006.
[4] R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison, Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge university press, 1998.
[5] D. Lin, Y. Tabatabaee, Y. Pote, and D. Jevdjic, “Managing reliability skew in DNA storage,” in Proceedings of the 49th Annual International Symposium on Computer Architecture, New York New York, Jun. 2022, pp. 482–494. doi: 10.1145/3470496.3527441.
[6] M. C. Davey and D. J. C. Mackay, “Reliable communication over channels with insertions, deletions, and substitutions,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 687–698, Feb. 2001, doi: 10.1109/18.910582.
[7] S. R. Srinivasavaradhan, S. Gopi, H. D. Pfister, and S. Yekhanin, “Trellis BMA: Coded trace reconstruction on IDS channels for DNA storage,” in 2021 IEEE International Symposium on Information Theory (ISIT), 2021, pp. 2453–2458.