get_hypertext

pyrcs.parser.get_hypertext(hypertext_tag, hyperlink_tag_name='a', md_style=True)

Gets hyperlinked text from a specified HTML tag.

This function extracts the text of the given tag, including its hyperlinks, and can return the result in Markdown format.

Parameters:
  • hypertext_tag (bs4.element.Tag | bs4.element.PageElement) – The tag containing hyperlinked text.

  • hyperlink_tag_name (str) – The tag name of the hyperlink within the hypertext; defaults to 'a'.

  • md_style (bool) – Whether to return the hypertext in Markdown style; defaults to True.

Returns:

The text of the tag, with hyperlinks written as Markdown links when md_style is True.

Return type:

str

Examples:

>>> from pyrcs.parser import get_hypertext
>>> from pyrcs.line_data import Electrification
>>> import bs4
>>> import requests
>>> elec = Electrification()
>>> url = elec.catalogue[elec.KEY_TO_INDEPENDENT_LINES]
>>> source = requests.get(url)
>>> soup = bs4.BeautifulSoup(source.content, 'html.parser')
>>> h3 = soup.find('h3')
>>> p = h3.find_all_next('p')[8]
>>> p
<p>Croydon Tramlink mast references can be found on the <a href="http://www.croydon-traml...
>>> hyper_txt = get_hypertext(hypertext_tag=p, md_style=True)
>>> hyper_txt
'Croydon Tramlink mast references can be found on the [Croydon Tramlink Unofficial Site](...
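
The example above needs network access and a live page. As an offline illustration of the kind of output shown (link text followed by its target in Markdown form), here is a minimal sketch that uses only the standard library. It is not the pyrcs implementation: the class MarkdownLinkSketch, the function hypertext_sketch and the sample HTML are all hypothetical, and the sketch takes an HTML string rather than a bs4 tag.

```python
from html.parser import HTMLParser


class MarkdownLinkSketch(HTMLParser):
    """Hypothetical helper: collects text and rewrites <a> elements as [text](href)."""

    def __init__(self):
        super().__init__()
        self.parts = []        # finished pieces of the output text
        self.link_text = None  # text collected inside the currently open <a>, if any
        self.href = ''

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.link_text = []
            self.href = dict(attrs).get('href') or ''

    def handle_data(self, data):
        # Route text to the open link if there is one, otherwise to the output.
        target = self.parts if self.link_text is None else self.link_text
        target.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self.link_text is not None:
            text = ''.join(self.link_text)
            self.parts.append(f'[{text}]({self.href})')
            self.link_text = None


def hypertext_sketch(html):
    """Return the text of an HTML fragment with links in Markdown style (illustrative only)."""
    parser = MarkdownLinkSketch()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts)


html = '<p>See the <a href="https://example.com/">example site</a> for details.</p>'
print(hypertext_sketch(html))
# See the [example site](https://example.com/) for details.
```

The sketch handles only flat `<a>` elements; nested markup inside a link is reduced to its text.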