01-Sep-2023 - Modified python code in Python/bin
   # The python scripts now have the .py suffix in the name
   # The initial 'python' statement at the beginning of each script has
     been removed
   # More tests have been added in the new_tests directory (the tests can
     be run using 'pytest')
   # The scripts in Python/bin can now be called as scripts or as routines
     inside your python code

25-Apr-2022 - Incremented ADES version from v2017 to v2022

03-Feb-2022 - Added a few new fields and other minor revisions
   # Added shapeOcc, obsSubID and trkMPC elements.
   # obsID can be up to 25 alphanumeric characters
   # Minor typographical and layout corrections

15-Jan-2019 - Some changes to the schema were made to reflect historical data
   # PermIDType for permID needs to accept `1I' and any more `I' objects
   # ProvIDType should restrict to P-L, T-1, T-2 and T-3 only and not
     allow T-L or P-3
   # CatType for astCat and photCat needs to accept the '.' character
     (e.g., GSC1.2)
   # ObsIDType for obsID should allow up to 25 characters
   # TrkIDType for trkID should allow the hyphen `-`
   # TrkSubType for trkSub should allow the hyphen `-`
   # (Not for submissions) TrkSubType for trkSub should allow these
     characters: "/", "\", "(", ")", "@", "?", ".", "+"
   # (Not for submissions) ProvIDType for provID should allow pre-1925
     values of the form "A902 AA"
   # TimePrecType for precTime should allow additional values
     (prec not in submissions)
        41667 (integer hours)
        4167  (tenths of an hour)
        694   (integer minutes)
        69    (tenths of a minute)
   # Expand length of remarks to 300 characters

13-Jul-2018 - Minor fixes were applied to the documentation and schema.
              See ADES_Description.pdf for details.


CONTENTS:

xml/
   The adesmaster.xml file lives here.
   This is not the place for example xml files

   adesmaster.xml
      The adesmaster.xml file is transformed by various .xslt files
      into .xsd files and into .tex files for ps and pdf documentation

xslt/util/
   Location for xslt files used by the /bin files as helpers.
   Currently only has adestables.xslt

   adestables.xslt

xslt/xsd/
   Location for xslt files used to create xsd files. I didn't include
   the xsd files themselves since they can be made with applyxslt.py.
   They'd go in a top-level xsd/ directory anyway

   distribhumanxsd.xslt   # currently not used
   distribxsd.xslt        # currently not used
   generalhumanxsd.xslt   # currently not used
   generalxsd.xslt
   submithumanxsd.xslt    # currently not used
   submitxsd.xslt

xslt/latex/
   Location for xslt files used to translate adesmaster.xml into latex
   input.

   docades.xslt
   docelementstable.xslt
   docgrouptypestable.xslt
   docsimpletypestable.xslt

tests/
   Location of test files. It has its own README. The runtests script
   must be run from within the tests/ directory -- it creates some extra
   dirs and knows about the sub-directories.

xsd/
   Contains generated xsd files and makexsdfiles

   makexsdfiles   generates xsd files if run in this directory
   Currently only submit.xsd and general.xsd are needed

doc/
   Contains pdf and ps files documenting ADES tables

   ades.ps    # generated ades documentation file
   ades.pdf   # generated ades documentation file

docsrc
   Contains code to build these in latex. It uses xslt to generate the
   tex files from adesmaster. You'll need to edit the makedoc file to
   point to your tex installation.

   ./makedoc will generate ades.ps and ades.pdf in this directory. Copy
   those to doc/ to update the documentation if adesmaster.xml or the
   xslt files have changed.

   ./cleanum removes the evidence since the latex temp files should not
   be in github.

There are example programs demonstrating how to read and write xml files
using lxml in
   Fortran/readxmlfox.f90
   Fortran/writexmlfox.f90
   C/src/readxmlc.c
   C/src/writexmlc.
   Python/bin/readxmlpy
   Python/bin/writexmlpy

These all use the xml library.
   Python:  install lxml
   C:       make sure libxml2 is available
   Fortran: install FoX
The Python and FoX libraries use libxml2


INSTALLATION and PREREQUISITES:

Untar this tarball.

Python:
   Ensure you have a correctly installed python 2 or 3 and know its path.
   You can have both. You'll have to install the python lxml module for
   your python separately; the best way to do that is to build from
   source using a compatible C compiler. Search Google for instructions,
   which change regularly.

   Alternatively, install Python package requirements using pip:

   $ python -m pip install -r ./Python/requirements.txt

C:
   Ensure you have a correctly installed C/C++ compiler and you know its
   path. You will need libxml2.a and libxml2.so, which normally come
   installed as part of the compiler installation. If not, you'll need
   to obtain and install this library.

Fortran:
   Ensure you have a correctly installed Fortran compiler and you know
   its path. You will need to install FoX, a Fortran XML library (or
   something similar). This is available (it has a FreeBSD-like license)
   from:

      https://github.com/andreww/fox

   You'll retrieve fox-master.zip. Unzip that into the Fortran directory.


BUILD C Examples:

To build the C programs, go to the C/ directory, run configure to build
Makefile.config, and then cd into src and type 'make'. The README file
in C/ has more details. If you're on Mac OS X, you'll need to read it
since the instructions are different.


BUILD Fortran Examples:

First, build FoX. Go to the fox-master directory and run ./configure,
which may pick up the wrong Fortran compiler. If it does, edit the
"configure" file and change the two lines containing "gfortran" so that
your Fortran compiler is *first* in the list. Then make sure your
Fortran compiler is in your PATH and run ./configure again. Then run
"make" and "make check" to build FoX.

Documentation for FoX is in FoX/DoX as html.
After that, go to the Fortran directory and run "make" to build
writexmlf90 and readxmlf90 using FoX.


USAGE:

The following are the main executables available from Python. All of
these work in python 2 and 3, although they pick /usr/bin/env python if
run as commands. These require the Python lxml library, available both
for Python 2 and 3.

adestest/Python/bin/
   psvtoxml <psvfile> <xmlfile>        # converts psv file to xml file
   xmltopsv <xmlfile> <psvfile>        # converts xml file to psv file

   # the mpc80col converters are incomplete. They do not translate
   # header records or Satellite observations.
   mpc80coltoxml <mpc80colfile> <xml file>
   xmltompc80col <xmlfile> <mpc80colfile>

   valall <xml file>                   # validates against all possible formats
                                       # using both human-readable and non-
                                       # human-readable xslt-generated xsd files
   valsubmit <xml file>                # validates against submit format
   valgeneral <xml file>               # validates against general format

   applyxslt                           # <xml file> <xslt file> > <output file>
      # example to create the submit schema
      Python/bin/applyxslt xml/adesmaster.xml xslt/xsd/submitxsd.xslt > submit.xsd

   writexml                            # example script to write xml file

There is code in C for all of the above except mpc80coltoxml and
xmltompc80col, in adestest/C/src. To build it, run "./configure" and
then "cd src; make". If you are on a Mac, source the forMacOS... file
first before running configure. mpc80coltoxml and xmltompc80col are not
yet in C, but the above programs all work the same way.


TEST CASES:

The "adestest/tests" directory contains numerous correct and incorrect
test cases. To run them, "cd tests" and run

   ./runtests prog_python2   # to test python 2
   ./runtests prog_python3   # to test python 3, if python3 is in your path
   ./runtests prog_c         # to test in C, if you built the C

Also, the tests/mpc/ directory has some mpc 80-column examples. The test
cases for these are not yet finished.


DOCUMENTATION:

adestest/doc/ contains pdf and ps files documenting ADES tables
adestest/doc/src contains code to build these in latex.
It uses xslt to generate the tex files. You'll need to edit the makedoc
file to point to your tex installation.

-----------------------------------------

This is the README file for some previous distribution tests. Some of
the information may be useful but some may be obsolete.

2016 Dec GMH

--- older notes

This is a not-quite-ready-for-prime-time attempt at a distribution.

Known Issues:

   1) xmltopsv produces different header orders on different systems
      for the headers whose order is not specified. This round-trips OK
      but shows diffs in the tests. I'm not sure what the right order
      should be.

   2) The WINDOWS-1252 codec is broken on some systems in the library

   3) Different xml libraries use ' or " for attribute quoting of the
      <?xml version="1.0" or '1.0' line. This is fine and legal but
      makes testing hard. Other legal differences are possible *

   4) I've decided to make the main interface the DOM and not some C
      struct. This is mainly because most use cases fill less than half
      of the struct and memory management is tricky. I've written an
      example program (writexml) in both C and Python for writing a new
      xml file using the ElementStack interface. I don't have a design
      for reading yet but we need to know what we want.

   5) This code works on complete documents. Using SAX/iterparse for
      large files is possible with pretty much the same interface.

   6) The timings are dominated by program launch times for the
      100-item examples. I'm not sure how much performance is needed.

   7) The code needs some organization. I wanted to put out something
      working.

Specific distribution notes:

   1) This uses the python lxml module, which is not part of the
      default python. There are numerous clever ways to try to do
      binary installs but the most reliable thing to do is obtain a
      source tarball (such as lxml-3.6.4.tar.gz) and run
      "python setup.py build" and make and so forth on your machine.
      Just Google "python packages lxml" and poke around until you find
      the source tarball.
      This is important because all the web sites try to help you by
      guessing what your configuration is, and they guess wrong all the
      time. Find the source tarball and go from that. This is
      especially important if you want to make both a python 2 and a
      python 3 installation.

   2) The runtests script sources a script for picking up the
      executables it uses. This makes it easy to test your own
      executables.

Several issues remain:

   The tests are incomplete. You can help by expanding them :-)

   runtests points out that between python2, python3 and C there is a
   disagreement about the order of fields in PSV. The ones we specify
   are all fine, but the order of extra ones can be arbitrary. All the
   files round-trip just fine, so this may only be a problem for
   testing.

   xmlUTF8Strlen does not return the *width* of a unicode string but
   rather just the number of unicode characters (I *think* it handles
   the combining characters correctly). This means padding to achieve
   justification in Chinese etc. will be wrong.

   NOTE: although the maximum allowed field width is 200, that means
   200 unicode characters. This may even be longer than 200 unicode
   code points because of combining characters.

   Python handles memory management properly; in C you're on your own.

Usage:

   The executables in the various bin/ directories (should) have the
   same interface.

   To run tests, go to the tests directory and run

      ./runtests prog_python2
      ./runtests prog_python3
      ./runtests prog_c

   Run these into a file since the output can be long.

   prog_python2 assumes #!/usr/bin/env python is python 2.7
   prog_python3 needs to point to your python3, not mine
   prog_c       script uses python for the encoding check.
                xmltopsv and psvtoxml are in C.

   Note that my version of the C code seems to use single quotes
   instead of double quotes on the version line

      <?xml version="1.0" encoding="UTF-8"?>
   vs.
      <?xml version='1.0' encoding='UTF-8'?>

   This confuses diff. The attributes in the doc are coded the same way.
   Notice the EBCDIC and UTF-7 encodings are fine, but the quote
   differences make them look different.

Notes:

   For now, all the executables start by transforming the
   xml/adesmaster.xml file into the internal tables using
   xslt/util/tableades.xslt. This is hard-coded into the executables.
   Eventually we may want to have the tables hard-coded into the
   executables instead once things stabilize.

   For now, all the xsd files are generated from adesmaster.xml using
   xslt/xsd/<name>xsd.xslt files. We could create external xsd files
   once we know what the final format will be.

   The two items above add surprisingly little to program start
   overhead.

   Everything works by converting input files, including psv input
   files, into an internal xml etree and doing operations on that. We
   may want to use iterparse to handle large files but so far this is
   not an issue. I'm not sure what large means.

   It's really important for performance to not have memory leaks.
   Memory management is tested with the C executables through some
   commented-out code using the "nMemoryTest" #define in ades.h.

-----------------------

This directory has several sub-directories:

C/
   ./configure creates Makefile.config.
   cd src; make clean; make   # builds and puts executables in bin
   cd src; make realclean;    # removes executables from bin

   README
   configure.ac
   configure
   install.sh                    # what a mess
   aclocal.m4                    # yup, a mess
   forMacOSXwithout_pkg_config   # did I say a mess
   Makefile.config.in
   src/                          # make puts executables in bin
   include/
   bin/                          # same interface as Python.
                                 # At least they're supposed to :=)

   Executables:
      psvtoxml   # psvtoxml <psv file> <xml file>
      xmltopsv   # xmltopsv <xml file> <psv file>
      valall     # valall <xml file>
      valades    # see tests/runtests
      unittest   # this is woefully incomplete
      writexml   # writexml myfile

   The encoding flags for PSV files do not work. They always assume the
   PSV encoding is UTF-8

Python
   bin/   python executable files and modules.
          The modules are not executable and are in bin because I
          didn't want to bother with setting pythonpath yet.

   All the python scripts are good with python2 and python3

      <script>             # runs a script with #!/usr/bin/env python
      <python2> <script>   # runs a script with python2
      <python3> <script>   # runs a script with python3

      Python/bin/xmltopsv <args>
      python xmltopsv <args>
      python3 xmltopsv <args>

   Executables:
      applyxslt
      validate
      encoding
      psvtoxml   # psvtoxml <psv file> <xml file>
      xmltopsv   # xmltopsv <xml file> <psv file>
      valall     # valall <xml file>
      valades    # see tests/runtests
      unittest   # this is woefully incomplete
      writexml   # writexml myfile

writexml myfile <encoding> works in both Python and C++. The C and
Python conversions don't match, at least on my machine, because one of
them says

   <?xml version='1.0' encoding='UTF-8'?>

and the other

   <?xml version="1.0" encoding="UTF-8"?>

Both of these are legal.

"writexml myfile UTF-7" is interesting.

---------------------------------

Some other thoughts:

A) Use iterparse to process documents as a stream

   Both the Python and C code work on xml documents, which means the
   entire input is in memory as an xml tree (even psv input is
   converted to an xml tree). Larger documents may require an iterparse
   structure.

B) User interface

   Right now I don't have much for this. The basic idea is to use xml
   documents for everything and supply routines to walk through them.

   To make a new document, build an xml document, validate it, and then
   write it either as xml or psv.

   To read a document, read it into an xml tree and use methods on the
   tree.

   Obviously we can build a layer on top of this but I haven't put much
   work into that yet. I think it is not a good idea to make a big
   struct of xmlChar* pointers, since that's going to
      1) be a recipe for memory leaks
      2) be slow because it's mostly going to be empty

   I think going through the node interface by strings is better. In
   C++ and Python that's easy.
   In C and Fortran this is harder but I think we should be dealing
   with the xml directly or indirectly (but conceptually) in all cases.

C) Unicode handling -> Use native UTF-8 whenever possible

   Note python3 will not write UTF-8 to stdout unless the right
   environment variables are set. This is going to be a bigger problem
   in the future. While C/C++ will write bytes, having improper
   terminal settings can create surprises.

   Recommendation: Transform from file to file. View files with an
   editor that supports utf-8 or use file:// in your web browser, which
   is happy with utf-8.

-----------------------------------
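
The file-to-file recommendation in C) can be illustrated with a small
sketch. This is not part of the distribution: the function name
copy_as_utf8, the file names and the sample text are all made up for
illustration. It only shows the general idea of giving an explicit
encoding when opening files, so the terminal and environment settings
never come into play.

```python
# Sketch only: copy_as_utf8, the file names and the sample text are
# hypothetical and are not part of the ADES scripts.
import io
import os
import tempfile


def copy_as_utf8(src_path, dst_path, src_encoding="utf-8"):
    """Read src_path in src_encoding and write it to dst_path as UTF-8.

    Returns the number of characters written.
    """
    # io.open takes an explicit encoding in both python 2 and python 3,
    # so nothing depends on the locale or on stdout.
    with io.open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()
    with io.open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)
    return len(text)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "in_latin1.psv")
    dst = os.path.join(workdir, "out_utf8.psv")
    # Create a small non-UTF-8 input file to convert.
    with io.open(src, "w", encoding="latin-1") as f:
        f.write(u"observer|Caf\u00e9 Observatory\n")
    count = copy_as_utf8(src, dst, src_encoding="latin-1")
    print("wrote %d characters to %s" % (count, dst))
```

The result can then be checked with a utf-8 aware editor or a web
browser, as suggested above, rather than by printing it to a terminal.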