Check-url enhancements
check-url.pl
This tool is available in /misc/cronjobs/check-url.pl.
Documentation
The script contains its own documentation: perldoc check-url.pl.
NAME
check-url.pl - Check URLs from 856$u field.
USAGE
check-url.pl [--verbose|--help] [--host=http://default.tld]
Scan all URLs found in 856$u of bib records and report whether the resources are available or not.
PARAMETERS
--host=http://default.tld
Server host used when the URL doesn't have one, i.e. doesn't begin with 'http:'. For example, if --host=http://www.mylib.com, then when 856$u contains 'img/image.jpg', the URL checked is http://www.mylib.com/img/image.jpg (see the sketch after the parameter list).
--tags
Tags containing URLs in $u subfields. If not provided, the 856 tag is checked. Multiple tags can be specified, for example:
check-url-quick.pl --tags 310 410 856
--verbose|-v
Outputs both successful and failed URLs.
--html
Formats output in HTML. The result can be redirected to a file accessible over HTTP. This way, it's possible to link directly to the biblio record in edit mode. With this parameter, --host-intranet is required.
--host-intranet=http://koha-pro.tld
Server host used to link to biblio record editing page.
--timeout=10
Timeout for fetching URLs. By default 10 seconds.
--maxconn=1000
Number of simultaneous HTTP requests. By default, 200 connections.
--help|-h
Print this help page.
Example command line that only outputs the "bad" URLs (assuming the standard Perl dependencies, directories, etc.):
perl check-url-v5.pl --html --htmldir=/path/to/docs/koha-tmpl --host=http://koha.xxx.xxx:8080
(where koha.xxx.xxx is your server, on port 8080, or 80 if required for staff access)
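To illustrate the --host prefixing and --timeout handling described above, here is a minimal, self-contained sketch of the checking logic in Perl. The hard-coded URL list, the host value and the use of HEAD requests are assumptions for the example, not the actual implementation of check-url.pl:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Values that would normally come from --host and --timeout
my $host    = 'http://www.mylib.com';
my $timeout = 10;

# URLs that would normally be extracted from 856$u fields
my @urls = ( 'img/image.jpg', 'http://example.org/missing.pdf' );

my $ua = LWP::UserAgent->new( timeout => $timeout );

for my $url (@urls) {
    # Prefix relative URLs (no 'http' scheme) with the --host value
    $url = "$host/$url" unless $url =~ /^https?:/;

    my $response = $ua->head($url);
    print $response->is_success
        ? "$url\tOK\n"
        : "$url\t" . $response->status_line . "\n";
}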
Example output:
Comments
Questions from David Schuster, Plano ISD, answers from Frédéric Demians.
We've added the ability to run it through a proxy if needed, but you will have to edit the .pl with your proxy information.
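As a hedged illustration, assuming the script builds its requests with LWP::UserAgent, the proxy could be wired in with something like the following (the proxy host and port are placeholders):

use LWP::UserAgent;

my $ua = LWP::UserAgent->new( timeout => 10 );

# Point LWP at an explicit HTTP proxy...
$ua->proxy( 'http', 'http://proxy.example.org:3128' );

# ...or let it pick up the http_proxy environment variable instead
# $ua->env_proxy;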
Q -- Is it designed as a cron job that would email the results to the cron email address? I have not tested it in production yet.
A -- It's your choice, depending on your needs. See below.
Q -- Depending on the number of URLs you have in your database, this may take 1-3 minutes per URL to run.
A -- It depends on the time required to fetch a URL. For local URLs, the response is obviously very quick. For remote resources, it varies. There is a solution: parallelization! See a module like [[1]].
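As an illustration of that parallelization idea, here is a hedged sketch using the AnyEvent::HTTP CPAN module; the URL list is hard-coded for the example, and this is not the script's actual code:

use strict;
use warnings;
use AnyEvent;
use AnyEvent::HTTP;

my @urls = (
    'http://example.org/one.pdf',
    'http://example.org/two.jpg',
);

my $cv = AnyEvent->condvar;
$cv->begin;

for my $url (@urls) {
    $cv->begin;
    # Requests are issued concurrently; each callback fires when its
    # response (or timeout) comes back.
    http_head $url, timeout => 10, sub {
        my ( $body, $headers ) = @_;
        print "$url\t$headers->{Status} $headers->{Reason}\n";
        $cv->end;
    };
}

$cv->end;
$cv->recv;    # Wait until every request has completed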
Q -- This tool runs an SQL query against the existing database for 856 URL links and checks each link, reporting back the biblio number, the URL from the biblio record, and the status: "OK" or the response from the server ("404...", "500...", etc.).
A -- It doesn't write its results to a file itself. You have to redirect the output to a file if you want, or to an MTA. If you use the --html option and --host-intranet, the result can be redirected into an HTML file, for example in the koha-tmpl directory: this way, librarians can open this file and get instant access, in modification mode, to the biblio records with invalid URLs.
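For example, a crontab entry along these lines could run the check weekly and publish the report where librarians can open it; the path to the script, the intranet host and the output location are placeholders:

# Run every Sunday at 3:00 and publish the HTML report
0 3 * * 0 perl /path/to/koha/misc/cronjobs/check-url.pl --html --host-intranet=http://koha-intranet.example.org:8080 > /path/to/koha-tmpl/check-url.html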
The output file provides the biblio number (hotlinked), the URL from the 856, and the result: OK or an error message.
On my test system with the proxy set correctly (behind a firewall with a proxy server), I have 16,000 URLs, and it takes about 2 hours to check them all and complete the output.