More probes in bioc annotations than in the GSE
Hi,
I have a question which is in some way the "opposite" of what I asked in issue #3.
When I check which probes are mapped to the gene "404636" in GPL5188 using GPL5188ENTREZID I get the following list:
mapp = unlist(as.list(GPL5188ENTREZID[mappedkeys(GPL5188ENTREZID)]))
mapp[mapp=="404636"]
3266973 3266988 3266989 3266997 3267000 3267001 3267002 3267003 3267004 3267005
"404636" "404636" "404636" "404636" "404636" "404636" "404636" "404636" "404636" "404636"
But it appears that only one of these probes is present for this platform in my GSE:
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?view=data&acc=GSM1163714&id=15498&db=GeoDb_blob99
How is it possible that so many probes are annotated but, in the end, do not appear on the chip?
Thanks,
Rachelly.
This is an exon array, so, by design, there are many probes mapping to a
given gene.
The data processing description of that GSM states:
"The analysis was limited to the probe sets designated by Affymetrix as
'core' probe sets, i.e. those with the best mapping to known genes."
The matrix files that you typically download via getGEO give you the
normalised/pre-processed data, i.e. the data filtered as described above.
The raw data (CEL files) will contain values for all probes on the array.
The annotation package generally (but not always) also contains annotation
for all the probes on the array, that is, in your case, more than those
included in the pre-processed data.
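
If it helps, here is a minimal sketch of how you could check which of the annotated probes made it into the pre-processed data. The annotated IDs are copied from your output above. The `measured` vector is a made-up placeholder (I have not looked up which ID is actually present), and `compare_probes` is just a hypothetical helper name, so treat this as an illustration rather than the definitive way to do it.

```r
# Probe IDs annotated to Entrez gene 404636 (copied from the question above).
annotated <- c("3266973", "3266988", "3266989", "3266997", "3267000",
               "3267001", "3267002", "3267003", "3267004", "3267005")

# ASSUMPTION: placeholder IDs standing in for the ID_REF column of the
# processed sample table; which ID is present here is made up.
# With GEOquery loaded, the real values could be obtained with something like:
#   measured <- as.character(Table(getGEO("GSM1163714"))$ID_REF)
measured <- c("3266973", "2315101", "2315104")

# Hypothetical helper: split the annotated probes by whether they
# appear in the pre-processed data.
compare_probes <- function(annotated, measured) {
  list(
    present = intersect(annotated, measured),  # kept after filtering
    absent  = setdiff(annotated, measured)     # annotated but filtered out
  )
}

res <- compare_probes(annotated, measured)
res$present          # annotated probes found in the processed data
length(res$absent)   # number of annotated probes that were filtered out
```

With the real `ID_REF` values in `measured`, the `absent` element would list the probes that are annotated but were dropped by the "core" filtering.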