Create top-level `types.go` to define types shared across packages
carreter opened this issue · 6 comments
Is your feature request related to a problem? Please describe.
As poly
grows, more and more types will be shared between nominally unrelated packages.
For example, many packages will deal with sequences. Currently, we use the string
type to represent sequences in most of the packages. However, sequences often also need to be annotated with whether they are circular or linear, as is already done in the clone
package and as we are considering doing to resolve #225 :
Describe the solution you'd like
A new file (types.go
) in the root of the repository that defines types commonly used across packages. Whenever it feels like you're reusing a similar struct or interface across packages, it should be placed in said file.
Describe alternatives you've considered
- Place the shared types in a package (e.g.
types
orcommon
)- Pros: Will keep the top level of the package cleaner
- Cons: feels unnecessary, adds another level of indirection to package imports
- Divide up
poly
into more specific packages that share common types- Anything that deals with a sequence will be put in the package
sequence
. Anything that deals with structural info will be put in the packagestructure
. These packages will have a top-leveltypes.go
file that contains shared types. - Pros: Might be necessary in the future as we add more functionality anyway. Even now, the top level of the package is getting a little crowded.
- Cons: Adds another level of indirection to package imports, currently feels unnecessary as we only have sequence-based tools. Might cause import cycles for some types.
- Anything that deals with a sequence will be put in the package
- Do nothing.
- If it ain't broke, don't fix it. And it definitely ain't broke here in that it doesn't block development/use of
poly
, but I still think this needs addressing.
- If it ain't broke, don't fix it. And it definitely ain't broke here in that it doesn't block development/use of
This is gonna be entirely unhelpful, but lemme just put some comments below -
- When I physically get oligos from IDT, I often add in modifications to the sequence. For example
GTAAAACGACGGCCAGTACTTGGTTCAGGTGGAGTAGGGATAA/iMe-dC/AGGGTAATGGTGTCAATCGTCGGAGCATGGTCATAGCTGTTTCCTG
, and this is a physically different molecule thanGTAAAACGACGGCCAGTACTTGGTTCAGGTGGAGTAGGGATAACAGGGTAATGGTGTCAATCGTCGGAGCATGGTCATAGCTGTTTCCTG
, as represented by the lil methylation - although the sequence representation WILL be the same, and any subsequent PCRs or anything are gonna remove that methylation. This actually have a meaningful impact on my cloning reactions that I just have to kinda work around right now. - The representation of overhanged fragments isn't great. For example, lets say I have two oligos
AAAAGGGG
andCCCCAAAA
. Well, these anneal to each other at the GC sites, giving us a new fragment with 2AAAA
overhangs, which then we could use for cloning or the like (this is another real thing I do). However, we have no way of representing these kinds of fragments: are the forward and reverse on 5' or 3' end? Are they phosphorylated (this MATTERS and often it is assumed if they come from fragments, but cannot be assumed if they come from synthesized products). - Seqhash doesn't work with either of the above
- AFAIK, there literally aren't any standard open-source formats for things like internal DNA modifications or representation of annealed oligos. So I don't have a way of storing this kind of information. Right now I do some REAL hacky duct-tape shit to make this work with my design software.
Now, opinions on types in general:
- Parser types should live with their parsers, and not be shared. Import the package if you needa use something with a parser type.
- I cannot think of a type other than "Part" that fits the general category.
- IMO part information, when it comes to PCR and cloning, have to be WAY more detailed than anything else. Perhaps this struct should specifically be for synthetic biology-related functions. Maybe we have a
synbio
package that thepcr
,synthesis
andclone
subpackages get thrown into
Just addding my 2 cents on this, I think it'd be nice to have a package with a good name that has a clear scope (so not named types.go
but the idea of pooling very common data structures e.g DNA is good)
The reason why is articulated well here
A common cause of poor package names is what call utility packages. These are packages where common helpers and utility code congeals over time. As these packages contain an assortment of unrelated functions, their utility is hard to describe in terms of what the package provides. This often leads to the package’s name being derived from what the package contains--utilities.
Package names like utils or helpers are commonly found in larger projects which have developed deep package hierarchies and want to share helper functions without encountering import loops. By extracting utility functions to new package the import loop is broken, but because the package stems from a design problem in the project, its name doesn’t reflect its purpose, only its function of breaking the import cycle.
It would be good to at least Alias the different sequences to make function signatures clearer and make it impossible to use the wrong sequence type, aka instead of
Translate(string)
Translate(dna.Sequence)
You can put the parsers in those packages to get nice semantics too (e.g dna.FromGenbank()
dna.FromFasta()
ect...)
A type alias allows you to keep both GTAAAACGACGGCCAGTACTTGGTTCAGGTGGAGTAGGGATAACAGGGTAATGGTGTCAATCGTCGGAGCATGGTCATAGCTGTTTCCTG
and GTAAAACGACGGCCAGTACTTGGTTCAGGTGGAGTAGGGATAA/iMe-dC/AGGGTAATGGTGTCAATCGTCGGAGCATGGTCATAGCTGTTTCCTG
in your code, just making it clearer that it's a DNA seq and typesafe.
Ultimately though I suspect it would be nicer (but a lot harder) to have a DNA
struct that all poly package use, and that way the parsers can all do their best to fill that Poly struct. e.g for the methylation example we'd add a methylationSites map[index]struct{}{}
to the DNA struct.
I also think I agree that methylationSites should be separate - perhaps like map[int]MethylationSite
where MethylationSite is just a byte that representing that specific internal mutation.
For the sequence itself - I'm not sure if a dna
package is really needed. For the majority of stuff, only a sequence is necessary. For a smaller amount, sequence + circular. For an even smaller amount, methylation and overhangs. Right now, we would be putting Part
into types
, even though the end game the two uses WOULD probably be different - clone
requires the complicated methylation data, pcr
does not.
I agree with @cachemoi regarding package names like type
or util
. The subpackages poly has been split into have been very carefully made to avoid kitchen sink-like packages that make it hard for devs to find what they need.
We do have a sub package called alphabet
that @ragnorc made which defines dna and protein sequences but I believe it's being exclusively for the align
package as the scoring matrices we borrowed from biogo needed them to function properly.
As far as sharing types between packages are there any pressing corner cases where we need to do this? It's an interesting thought but right now I don't think there's anything that's particularly broken.
Also agreed re: avoiding type
/util
package names. To be clear, what I was proposing was an additional file types.go
in the top-level poly
package.
For the sequence itself - I'm not sure if a dna package is really needed. For the majority of stuff, only a sequence is necessary. For a smaller amount, sequence + circular. For an even smaller amount, methylation and overhangs. Right now, we would be putting Part into types, even though the end game the two uses WOULD probably be different - clone requires the complicated methylation data, pcr does not.
This could be solved using Go's interfacing system I think. Something like this would go in the root of the package in types.go
:
type DNA struct {} // Carries sequence, circularity, methylation, and overhang info
type Sequence interface {
Sequence() string
}
type Part interface {
IsCircular() bool
}
type Methylated interface {
MethylationSites() map[int]struct{}{}
}
type Overhanged interface {
LeftOverhang() string
RightOverhang() string
}
Then, client packages can compose the interfaces to specify what functionality it is they need from that DNA
struct.
This issue has had no activity in the past 2 months. Marking as stale
.