2.1 BeautifulSoup 解析网页: 基础
soup = BeautifulSoup(html, features='lxml')
print(soup.h1)
"""
<h1>爬虫测试1</h1>
"""
print('\n', soup.p)
"""
<p>
这是一个在 <a href="https://morvanzhou.github.io/">莫烦Python</a>
<a href="https://morvanzhou.github.io/tutorials/scraping">爬虫教程</a> 中的简单测试.
</p>
""""""
<a href="https://morvanzhou.github.io/tutorials/scraping">爬虫教程</a>
"""
all_href = soup.find_all('a')
all_href = [l['href'] for l in all_href]
print('\n', all_href)
# ['https://morvanzhou.github.io/', 'https://morvanzhou.github.io/tutorials/scraping']Last updated