get_catalogue
- pyrcs.parser.get_catalogue(url, update=False, confirmation_required=True, json_it=True, verbose=False)
Gets the catalogue of items from the main page of a data cluster.
This function scrapes a catalogue of entries (typically hyperlinks) from a specified URL. It offers the option to save the catalogue as a JSON file.
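As a rough illustration of the idea described above (collecting hyperlink titles and URLs from a page and optionally serialising them as JSON), here is a minimal sketch that uses only the Python standard library. It is not the pyrcs implementation: the names `LinkCollector` and `parse_catalogue` and the relative links in `sample_html` are hypothetical, and the sketch parses an in-memory HTML string instead of fetching a live page.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect {link text: absolute URL} pairs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.catalogue = {}
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:
                # Resolve relative links against the page URL.
                self.catalogue[title] = urljoin(self.base_url, self._href)
            self._href = None


def parse_catalogue(html, base_url):
    """Hypothetical helper: return a {title: URL} dict for the links in html."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.catalogue


# Made-up page fragment; the relative links are illustrative only.
sample_html = """
<ul>
  <li><a href="elr0.shtm">Introduction</a></li>
  <li><a href="a.shtm">A</a></li>
  <li><a href="b.shtm">B</a></li>
</ul>
"""
sample_catalogue = parse_catalogue(
    sample_html, "http://www.railwaycodes.org.uk/elrs/elr0.shtm"
)

# Serialise the catalogue, similar in spirit to what json_it=True suggests.
catalogue_json = json.dumps(sample_catalogue, indent=2)
```

The dictionary keeps insertion order, so the titles come back in the order the links appear on the page, which matches the shape of the return value documented below.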
- Parameters:
url (str) – The URL of the main page of a data cluster.
update (bool) – Whether to check for updates to the package data; defaults to False.
confirmation_required (bool) – Whether user confirmation is required before proceeding; defaults to True.
json_it (bool) – Whether to save the catalogue as a JSON file; defaults to True.
verbose (bool | int) – Whether to print relevant information to the console; defaults to False.
- Returns:
The catalogue in the form of a dictionary, where keys are entry titles and values are URLs, or None if the operation is unsuccessful.
- Return type:
dict | None
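Because the return value may be None, calling code will usually want to check for that before indexing into the result. The following sketch shows one way to do so; `preview_titles` is a hypothetical helper, not part of pyrcs, and `example_catalogue` is a made-up stand-in for a real return value.

```python
def preview_titles(catalogue, n=5):
    """Return the first n entry titles, or an empty list if catalogue is None."""
    if catalogue is None:
        # The scrape was unsuccessful, so there is nothing to preview.
        return []
    return list(catalogue.keys())[:n]


# Made-up stand-in for a value returned by get_catalogue().
example_catalogue = {
    "Introduction": "https://example.com/intro",
    "A": "https://example.com/a",
}

titles = preview_titles(example_catalogue)
no_titles = preview_titles(None)
```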
Examples:
>>> from pyrcs.parser import get_catalogue
>>> elr_cat = get_catalogue(url='http://www.railwaycodes.org.uk/elrs/elr0.shtm')
>>> type(elr_cat)
dict
>>> list(elr_cat.keys())[:5]
['Introduction', 'A', 'B', 'C', 'D']
>>> list(elr_cat.keys())[-5:]
['Lines without codes', 'ELR/LOR converter', 'LUL system', 'DLR system', 'Canals']
>>> line_data_cat = get_catalogue(url='http://www.railwaycodes.org.uk/linedatamenu.shtm')
>>> type(line_data_cat)
dict
>>> list(line_data_cat.keys())
['ELRs and mileages',
 'Electrification masts and related features',
 'CRS, NLC, TIPLOC and STANOX Codes',
 'Line of Route (LOR/PRIDE) codes',
 'Line names',
 'Track diagrams']