A Simple Introduction to the BeautifulSoup Module

2018-07-20 Source: open-open

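You can call dir(BeautifulSoup.BeautifulSoup) to list the functions the class provides. To find out what a particular function does, call help on it, for example help(BeautifulSoup.BeautifulSoup.find), to read its built-in documentation.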

You can use pprint to tidy up the output. Remember to import BeautifulSoup before calling dir or help on it.
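
As a rough sketch of this kind of inspection, the snippet below applies dir, pprint and a docstring lookup to a class. It assumes Python 3 and uses the standard library's html.parser.HTMLParser as a stand-in so that it runs without BeautifulSoup installed; the helper public_names is a hypothetical name introduced only for this illustration. The same calls should work on BeautifulSoup.BeautifulSoup once that module is imported.

```python
from html.parser import HTMLParser
from pprint import pprint


def public_names(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


# dir() returns one long list; pprint prints it in a more readable layout.
names = public_names(HTMLParser)
pprint(names)

# help(HTMLParser.feed) would show this docstring in an interactive pager;
# reading __doc__ directly prints the same text without the pager.
print(HTMLParser.feed.__doc__)
```

Filtering out the underscore-prefixed names is only a convenience here; plain dir(obj) shows everything.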

    # -*- coding:utf8 -*-
    # Python 2 script using the legacy BeautifulSoup 3 module
    import re
    import urllib

    import BeautifulSoup

    # Download at most 200000 bytes of the page and parse it
    htmlSource = urllib.urlopen("http://www.taobao.com/").read(200000)
    soup = BeautifulSoup.BeautifulSoup(htmlSource)

    # Print <head>...</head>
    print soup.head

    # Print <title>...</title>
    print soup.head.title

    # findAll returns a list; each element is an <a>...</a> tag
    tags = soup.findAll('a')
    print tags

    # Separator line between the two parts of the output
    print 'A crawler set loose by JD'

    # Print the href of every <a> tag that has an href attribute
    for item in soup.findAll('a', href=True):
        print item['href']

    # Among the <a> tags with an href, print those whose href contains "taobao"
    for a in soup.findAll('a', href=True):
        if re.search('taobao', a['href']):
            print "Found the URL:", a['href']

    # Print only the first <div> whose class attribute equals "J_Tanx mod"
    print str(soup.find("div", {"class": "J_Tanx mod"}))
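
The script above needs network access and the BeautifulSoup module under Python 2. As an offline sketch of the same href filtering idea, the following uses Python 3's standard-library html.parser in place of BeautifulSoup, so it is a different technique and not the BeautifulSoup API. The LinkCollector class, the SAMPLE_HTML string and the URLs inside it are all made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><head><title>Demo</title></head>
<body>
  <a href="http://www.taobao.com/market">Market</a>
  <a name="anchor-without-href">No link</a>
  <a href="http://example.com/about">About</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href is not None:
            self.hrefs.append(href)


collector = LinkCollector()
collector.feed(SAMPLE_HTML)

# Keep only the links whose href contains "taobao".
taobao_links = [href for href in collector.hrefs if "taobao" in href]
for href in taobao_links:
    print("Found the URL:", href)
```

A plain substring test replaces the regular expression here because the pattern contains no special characters; re.search would behave the same way for this input.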
