/gitpan

Git repositories for all of CPAN

Primary LanguagePerl

A placeholder for the gitPAN project so it can have an issue tracker,
wiki and all that.

There is very little code written for gitpan.  It is mostly just gluing
together other people's work:  Leon's Parse::BACKPAN::Packages and Yanick's
git-cpan-patch.  The patched versions of these necessary to make gitpan
work are linked in as git submodules.

FAQ
***

What is gitPAN?
---------------

gitPAN is a project to import the entire history of CPAN (known as
BackPAN) into a set of git repositories, one per distribution.


What is CPAN?
-------------

CPAN is the Comprehensive Perl Archive Network at cpan.org.  It is an
archive of tens of thousands of Perl modules written by thousands of
authors.  A good interface to it is http://search.cpan.org


What is BackPAN?
----------------

In order to limit CPAN's size, authors are requested to delete old
releases.  BackPAN maintains all CPAN releases, even deleted ones, and
is a complete history of CPAN.  There are only a few BackPAN mirrors
such as http://backpan.perl.org


Why is gitPAN?
--------------

CPAN (and thus BackPAN) is a pile of tarballs organized by author.  It
is difficult to get the complete history of a distribution, especially
one that has changed authors or is released by multiple authors (for
example, Moose).  Because releases are regularly deleted from CPAN
even sites like search.cpan.org provide an incomplete history.  Having
the complete history of each distrubtion in its own repository makes
the full distribution history easy to access.

gitPAN also hopes to make patching CPAN modules easier.  Ideally you
simply clone the gitPAN repository and work.  New releases can be
pulled and merged from gitPAN.

gitPAN hopes to showcase using a repository as an archive format,
rather than a pile of tarballs.  A repository is far more useful than
a pile of tarballs, and contrary to many people's expectations, the
repository is turning out smaller.

Finally, gitPAN is being created in the hope that "if you build it
they will come".  Getting data out of CPAN in an automated fashion has
traditionally been difficult.


Where is gitPAN?
----------------

The repositories are on github.com at http://github.com/gitpan
(watch out, it may overload your browser).

Code, discussion, and issues can be had at http://github.com/schwern/gitpan.


How do I access a distribution on gitPAN?
-----------------------------------------

Simplest way is to go to http://github.com/gitpan/<distribution>.
For example, Acme-Pony can be found at http://github.com/gitpan/Acme-Pony.
Instructions for futher access can be found there.

The clone URL for a given distribution is git://github.com/gitpan/<distribution>.git.
You can clone without a github account.


How big is BackPAN?
-------------------

BackPAN (just the modules, we're not doing perl releases) contains
about 120k archive files (mostly gzipped tarballs) representing about
21,000 distributions from 5000 authors taking up 14 gigs of space.
Tarballs consume about 12 gigs, not sure where the other 2 is going
(readme files, meta files, random non-distro junk, block size
rounding?).


How big is gitPAN?
------------------

gitPAN consists of over 21,000 repositories representing each CPAN
distribution.  Disk usage (garbage collected repositories with no
checkout) is 4.3 gigs.  It imported 120,000 files weighing in at 9.7
gigs giving a compression ratio of over 2x.

gitPAN consumes about 150 gigs on github, presumably due to indexing.


Did gitPAN skip anything?
-------------------------

Yes.  It skipped perl, parrot and parrot-cfg.  They're not really CPAN
modules and they have far more complete repositories.  It may
skip more in the future, these are the ones I noticed.


Will you be adding X to gitPAN?
-------------------------------

The primary focus is to get accurate repositories for each CPAN
distribution and to make this data available for others to use.  When
you think "will gitPAN do X" instead think "how can I use gitPAN to
build X?"

That said, suggestions on how to improve the data available from
gitpan heartily accepted.


How can I merge gitPAN's history with my module?
------------------------------------------------

If you are the owner of a CPAN module and have an existing, but
incomplete, repository you can fill in the history using gitPAN.  The
technique is outlined in this article.
http://use.perl.org/~schwern/journal/39974


How do I update my module on gitpan?
------------------------------------

gitpan will automatically pull new releases from CPAN.

Updates are currently happening sporatically pending setting up the
hosting for the importer and improving logging and error recovery.


Where can I get a list of all the repositories?
-----------------------------------------------

You can get it from github's API <http://develop.github.com> with the
URL <http://github.com/api/v2/json/repos/show/gitpan>.  At the time of
this writing the call is timing out.  Your alternative is to get a
list of distributions from BackPAN::Index.


How can I help?
---------------

See http://github.com/schwern/gitpan/issues for a list of open problems.

You can also contribute by looking through imported CPAN distributions,
checking for mistakes and reporting them as issues.


I'm the author of X distribution and my real repository is over here!
---------------------------------------------------------------------

gitPAN is intended to co-exist with, not compete with, the development
repository for a distribution.  It provides a consistent, easy to find
interface to your releases so you don't have to.

You can make your own development repository more visible by adding
a repository resource to your release meta-data.  See
http://module-build.sourceforge.net/META-spec-v1.4.html#resources


I'm the author of X distribution, can I get commit access to gitPAN?
--------------------------------------------------------------------

Sorry, no.  gitPAN is intentionally read-only to provide a consistent
interface over all of CPAN.  Allowing developers to commit directly to
gitPAN would endanger this consistency.  In this sense, gitPAN is
simply a read-only view on your releases.

As the developer of the project, you should continue to develop
against your regular repo.  However, it is helpful to fill in back
history should you be missing it.  You can use the release tags and
dates on the gitpan repo to place tags into your development repo.  If
history is completely missing, you can splice your development
repository on top of the gitpan repo.  See "How can I merge gitPAN's
history with my module?" above.

You could develop off a gitpan fork, but the actual development
history of your project up to this point would be lost.  Merging your
dev repo with gitpan is left as an exercise for the reader to do
usefully.  If you tag your releases in a consistent manner and publish
the location of your repository, gitpan doesn't offer anything new to
the developer.


I noticed a problem with a repository
-------------------------------------

Please report it at <http://github.com/schwern/gitpan/issues> or to
schwern+github@gmail.com.


Who do we have to thank for gitPAN?
-----------------------------------

gitPAN exists on top of a pile of pre-existing technology and services.
Very little new code was written and the yaks were already well shorn.

Elaine Ashton for instituting BackPAN.
Jarkko, Graham Barr and the rest of the CPAN cabal.
Andreas König for tirelessly maintaining PAUSE.
brian d foy for spearheading BackPAN archeology.
Léon Brocard for Parse::BACKPAN::Packages to access the backpan index
  and maintaining the BackPAN index.
Linus and the git devs for git (this was tried before on SVN and guh...)
github.com for a generous donation of space and support and angry unicorns
Yanick Champoux for git-cpan-patch which does most of the work.
ftp.funet.fi for an rsync-able BackPAN mirror.
Michael Schwern glued it all together.


How can I contact gitPAN?
-------------------------

Email:   schwern+gitpan@gmail.com
Web:     http://github.com/gitpan/
Dev:     http://github.com/schwern/gitpan
Issues:  http://github.com/schwern/gitpan/issues
Twitter: #gitpan