Create new view-report.cgi
preaction opened this issue · 7 comments
The existing view-report.cgi is very slow and is often the single cause of server overload. To help reduce this overload, we can migrate this first thing to a new Mojolicious web app. The larger purpose of this web app will be to maintain legacy APIs for backwards compatibility.
To replace the existing view-report.cgi, we need a web app with the following routes:
/report/:guid
- View the given report as HTML
- Fetch the report using the schema API (cpan-testers/cpantesters-schema#4)
- Use a template that looks as much like the current template as possible
- Example: http://www.cpantesters.org/cpan/report/7185287a-1bf3-11e7-8a18-c6c6a528974d
/report/:guid?json=1
- Get the report as JSON
- Fetch the report using the schema API (cpan-testers/cpantesters-schema#4)
- Dump the result as JSON (example: http://www.cpantesters.org/cpan/report/7185287a-1bf3-11e7-8a18-c6c6a528974d?json=1)
/report/:guid?raw=1
- This is just the text of the report without the HTML around it
- Example: http://www.cpantesters.org/cpan/report/7185287a-1bf3-11e7-8a18-c6c6a528974d?raw=1
The application should be started by a script in bin/cpantesters-web-legacy
. The application should be deployed by the Rexfile, and started by a runit service file written in etc/runit/web-legacy
(remember to add the corresponding log
service too).
The current report also expects requests using nttp instead of guid as parameter:
sub retrieve_report {
$tvars{body}{result} = '""';
if ( $cgiparams{id} =~ /^\d+$/ ) {
#$dbi is from Labyrinth::Variables, configured by Labyrinth::DBUtils DBConnect
#config/phrasebook.ini:GetStatReport=SELECT * FROM cpanstats.cpanstats WHERE id=?
my @rows = $dbi->GetQuery( 'hash', 'GetStatReport', $cgiparams{id} );
if (@rows) {
if ( $rows[0]->{guid} =~ /^[0-9]+\-[-\w]+$/ ) {
my $id = guid_to_nntp( $rows[0]->{guid} );
_parse_nntp_report($id);
}
else {
$cgiparams{id} = $rows[0]->{guid};
_parse_guid_report();
}
}
else {
#$tvars{errcode} = 'NEXT';
#$tvars{command} = 'cpan-distunk';
}
}
elsif ( $cgiparams{id} =~ /^[\w-]+$/ ) {
my $id = guid_to_nntp( $cgiparams{id} );
if ($id) {
_parse_nntp_report($id);
}
else {
_parse_guid_report();
}
}
See new branch for this issue at https://github.com/cpan-testers/cpantesters-web/tree/new_view-report.cgi
I would encourage dropping the nntp number support, as these were only for reports submitting upto September 2010, for < 9 million reports. It's rare that anyone (excluding bots) would even use that number for retrieval any more. All those reports now have GUIDs anyway, so they aren't lost.
Thanks, @barbie, I was wondering about that: If all the reports got migrated to Metabase, then, yeah, that whole section of code can be ignored.
If we want to be super extra safe, we can view the logs to see if anyone's accessed those IDs in the last week or so. I can't think of any way they've been linked, so likely even bots won't show up in the logs. I have no strong feeling on that. And if someone comes forward saying it was important to them, we can always add it back (they won't, so I'm not worried).
I have some more doubts here.
This issue refers to the rewrite of view-report.cgi, but:
The local metabase cache holds the raw text of all the reports that we have. This will soon be the only metabase (see cpan-testers/cpantesters-api#3), so we should have an API for this in the schema. Using this new API as an abstraction will allow us to change the details of the local metabase cache later.
I'm not seeing that view-report.cgi uses a "local" cache (unless the local cache is the Mysql server at IP ending in 45), this is from a debug session of view-report.cgi:
Labyrinth::Globals::DBConnect(lib/Labyrinth/Globals.pm:412):
412: my $logfile = $settings{logfile};
DB<4>
Labyrinth::Globals::DBConnect(lib/Labyrinth/Globals.pm:413):
413: my $phrasebook = $settings{phrasebook};
DB<4>
Labyrinth::Globals::DBConnect(lib/Labyrinth/Globals.pm:414):
414: my $dictionary = $settings{dictionary};
DB<4>
Labyrinth::Globals::DBConnect(lib/Labyrinth/Globals.pm:416):
416: $dbi = Labyrinth::DBUtils->new({
417: driver => $settings{driver},
418: database => $settings{database},
419: dbfile => $settings{dbfile},
420: dbhost => $settings{dbhost},
421: dbport => $settings{dbport},
422: dbuser => $settings{dbuser},
423: dbpass => $settings{dbpass},
424: autocommit => $settings{autocommit},
425: logfile => $logfile,
426: phrasebook => $phrasebook,
427: dictionary => $dictionary,
DB<4> p $settings{driver}
mysql
DB<5> p $settings{database}
reports
DB<6> p $settings{dbhost}
###########.45
DB<7> q
These are the queries executed by the CGI that I checked so far (in this specific order) for getting reports from GUID:
- config/phrasebook.ini:AllDistIndices=
SELECT uploadid,dist,version,type FROM cpanstats.uploads
- config/phrasebook.ini:AllOSNames=
SELECT osname,ostitle FROM cpanstats.osname
- config/phrasebook.ini:GetMetabaseByGUID=
SELECT * FROM metabase.metabase WHERE guid=?
- config/phrasebook.ini:GetTesterFact=
SELECT mte.*,tp.* FROM metabase.testers_email mte LEFT JOIN testers.address ta ON ta.email=mte.email LEFT JOIN testers.profile tp ON tp.testerid=ta.testerid WHERE mte.resource=? ORDER BY tp.testerid DESC
This issue refers to a REST API that retrieves data based on DBIx::Class based https://github.com/cpan-testers/cpantesters-schema, but I don't see any the databases/tables that the view-report.cgi uses nowadays. I listed the EER diagrams of the tables that the CGI uses:
So, if I understood correctly, it would be necessary to implement the corresponding DBIx::Class
es of metabase
and testers
and then implement the corresponding REST methods? Or these are already implemented (DBIx::Class
and/or REST)?
#9 is ready for revision.
I missed this thread, but from the commits in #9 you've done the right thing and connected to the database directly instead of through DBIx::Class. I tried writing DBIx::Class results for the tables in the metabase
database, like you saw, but it was annoying at best (it made things really hard to test). The new plan is to only write DBIx::Class result classes for tables found in the cpanstats
database (and, if necessary, move tables into that database, like the new metabase_user
table).
The metabase cache is in the MySQL database you found, yes. I had called it the "local" metabase as that database used to live on the CPAN Testers server with the website and everything else, and it's "local" as distinct from the Amazon SimpleDB (which is a different database system entirely).
The REST API I referred to was for the cpantesters-api project, but that all is done now in a different way (using the new test report format). The cpantesters-web project is meant to access the database directly either through DBI (DBIx::Connector) or CPAN::Testers::Schema (DBIx::Class), not through the API server. Right now the idea is that it will make the website faster to not have the layer of abstraction there, and it's easier to write the website in Perl using DBIx::Class than it would be to write it in Perl or JavaScript using the JSON API (also developing that JSON API would mean everything would have to be done twice).