get_catalogue

pyrcs.parser.get_catalogue(url, update=False, confirmation_required=True, json_it=True, verbose=False)

Gets the catalogue of items from the main page of a data cluster.

This function scrapes a catalogue of entries (typically hyperlinks) from the specified URL. If json_it is True, the catalogue is also saved as a JSON file.

Parameters:
  • url (str) – The URL of the main page of a data cluster.

  • update (bool) – Whether to check for updates to the package data; defaults to False.

  • confirmation_required (bool) – Whether user confirmation is required before proceeding; defaults to True.

  • json_it (bool) – Whether to save the catalogue as a JSON file; defaults to True.

  • verbose (bool | int) – Whether to print relevant information to the console; defaults to False.

Returns:

The catalogue in the form of a dictionary, where keys are entry titles and values are URLs, or None if the operation is unsuccessful.

Return type:

dict | None

Examples:

>>> from pyrcs.parser import get_catalogue
>>> elr_cat = get_catalogue(url='http://www.railwaycodes.org.uk/elrs/elr0.shtm')
>>> type(elr_cat)
dict
>>> list(elr_cat.keys())[:5]
['Introduction', 'A', 'B', 'C', 'D']
>>> list(elr_cat.keys())[-5:]
['Lines without codes',
 'ELR/LOR converter',
 'LUL system',
 'DLR system',
 'Canals']
>>> line_data_cat = get_catalogue(url='http://www.railwaycodes.org.uk/linedatamenu.shtm')
>>> type(line_data_cat)
dict
>>> list(line_data_cat.keys())
['ELRs and mileages',
 'Electrification masts and related features',
 'CRS, NLC, TIPLOC and STANOX Codes',
 'Line of Route (LOR/PRIDE) codes',
 'Line names',
 'Track diagrams']
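
Because the return type is dict | None, calling code may want to guard against None before indexing into the result. The following is a minimal sketch of one way to do that. The helper first_entries and the sample dictionary are hypothetical and are not part of pyrcs; the URLs in sample_cat are placeholders, not real scraped values.

```python
from typing import Optional


def first_entries(catalogue: Optional[dict], n: int = 3) -> list:
    """Return up to n (title, url) pairs, or an empty list if catalogue is None.

    Hypothetical helper for illustration only; not part of pyrcs.
    """
    if catalogue is None:
        # get_catalogue is documented to return None when unsuccessful
        return []
    # dicts preserve insertion order, so this keeps the catalogue's order
    return list(catalogue.items())[:n]


# Placeholder data standing in for a real result; these URLs are made up
sample_cat = {
    'Introduction': 'https://example.com/intro',
    'A': 'https://example.com/a',
    'B': 'https://example.com/b',
}

print(first_entries(sample_cat, n=2))
print(first_entries(None))
```

In real use, the result of get_catalogue(url=...) would be passed in place of sample_cat.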