get_page_catalogue

pyrcs.parser.get_page_catalogue(url, head_tag_name='nav', head_tag_txt='Jump to: ', feature_tag_name='h3', verbose=False)

Get the catalogue of the main page of a data cluster.

Parameters:
  • url (str) – URL of the main page of a data cluster

  • head_tag_name (str) – tag name of the feature list at the top of the page, defaults to 'nav'

  • head_tag_txt (str) – text contained in the element named by head_tag_name, defaults to 'Jump to: '

  • feature_tag_name (str) – tag name of the heading of each feature, defaults to 'h3'

  • verbose (bool | int) – whether to print relevant information to the console, defaults to False

Returns:

catalogue of the main page of a data cluster

Return type:

pandas.DataFrame

Example:

>>> from pyrcs.parser import get_page_catalogue
>>> from pyhelpers.settings import pd_preferences

>>> pd_preferences(max_columns=1)

>>> elec_url = 'http://www.railwaycodes.org.uk/electrification/mast_prefix2.shtm'

>>> elec_catalogue = get_page_catalogue(elec_url)
>>> elec_catalogue
                                              Feature  ...
0                                     Beamish Tramway  ...
1                                  Birkenhead Tramway  ...
2                         Black Country Living Museum  ...
3                                   Blackpool Tramway  ...
4   Brighton and Rottingdean Seashore Electric Rai...  ...
..                                                ...  ...
17                                     Seaton Tramway  ...
18                                Sheffield Supertram  ...
19                          Snaefell Mountain Railway  ...
20  Summerlee, Museum of Scottish Industrial Life ...  ...
21                                  Tyne & Wear Metro  ...

[22 rows x 3 columns]

>>> elec_catalogue.columns.to_list()
['Feature', 'URL', 'Heading']
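
To make the roles of head_tag_name, head_tag_txt and feature_tag_name more concrete, the following is a minimal sketch of how a catalogue of this shape could be assembled. It is not the pyrcs implementation: the class CatalogueSketch, the function sketch_catalogue and the SAMPLE_HTML fragment are all hypothetical, the page layout is assumed, and only the standard library is used, so the result is a list of dicts rather than a pandas.DataFrame. The column names are borrowed from the example above; what the real 'Heading' column holds is not stated here, so the sketch simply stores the matching heading text.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical page fragment; the real pages may be structured differently.
SAMPLE_HTML = """
<nav>Jump to: <a href="#beamish">Beamish Tramway</a> | <a href="#seaton">Seaton Tramway</a></nav>
<h3 id="beamish">Beamish Tramway</h3>
<p>...</p>
<h3 id="seaton">Seaton Tramway</h3>
"""


class CatalogueSketch(HTMLParser):
    """Collect links from the head element and the text of feature headings."""

    def __init__(self, head_tag_name='nav', head_tag_txt='Jump to: ', feature_tag_name='h3'):
        super().__init__()
        self.head_tag_name = head_tag_name
        self.head_tag_txt = head_tag_txt
        self.feature_tag_name = feature_tag_name
        self._in_head = False       # currently inside a head_tag_name element
        self._head_matched = False  # that element contained head_tag_txt
        self._href = None           # href of the link currently being read
        self._in_heading = False    # currently inside a feature heading
        self.links = []             # (feature name, href) pairs
        self.headings = []          # heading texts, in page order

    def handle_starttag(self, tag, attrs):
        if tag == self.head_tag_name:
            self._in_head, self._head_matched = True, False
        elif tag == 'a' and self._in_head and self._head_matched:
            self._href = dict(attrs).get('href')
        elif tag == self.feature_tag_name:
            self._in_heading = True
            self.headings.append('')

    def handle_endtag(self, tag):
        if tag == self.head_tag_name:
            self._in_head = False
        elif tag == 'a':
            self._href = None
        elif tag == self.feature_tag_name:
            self._in_heading = False

    def handle_data(self, data):
        if self._in_head and self.head_tag_txt.strip() in data:
            self._head_matched = True
        if self._href is not None:
            self.links.append((data.strip(), self._href))
        elif self._in_heading:
            self.headings[-1] += data


def sketch_catalogue(html, base_url, **kwargs):
    """Return one row per head link whose text also appears as a feature heading."""
    parser = CatalogueSketch(**kwargs)
    parser.feed(html)
    parser.close()
    headings = {h.strip() for h in parser.headings}
    return [
        {'Feature': name, 'URL': urljoin(base_url, href), 'Heading': name}
        for name, href in parser.links if name in headings
    ]


rows = sketch_catalogue(
    SAMPLE_HTML, base_url='http://www.railwaycodes.org.uk/electrification/mast_prefix2.shtm')
for row in rows:
    print(row['Feature'], row['URL'])
```

In this sketch, changing head_tag_txt to a string that does not occur in the head element yields an empty result, which illustrates why the defaults need to match the layout of the page being parsed.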