Searching a Page for Tags with Python - 白红宇

Published: 2019-06-21

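    # Python 2 only: urllib2 and sgmllib do not exist in Python 3.
    import urllib2
    from sgmllib import SGMLParser

    class ListName(SGMLParser):
        """Collect the text found inside every <h4> tag."""

        def __init__(self):
            SGMLParser.__init__(self)
            self.is_h4 = False   # True while the parser is inside an <h4>
            self.name = []       # collected heading text

        def start_h4(self, attrs):
            self.is_h4 = True

        def end_h4(self):
            self.is_h4 = False

        def handle_data(self, text):
            if self.is_h4:
                self.name.append(text)

    # Download the page, feed it to the parser, then print each heading.
    content = urllib2.urlopen('http://list.taobao.com/browse/cat-0.htm').read()
    listname = ListName()
    listname.feed(content)
    for item in listname.name:
        print item.decode('gbk')   # the page is GBK-encoded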

It prints the following (the category headings from the Taobao page, shown exactly as output):

虚拟票务

数码市场

家电市场

女装市场

男装市场

童装童鞋

女鞋市场

男鞋市场

内衣市场

箱包市场

服饰配件

珠宝饰品

美容市场

母婴市场

家居市场

日用市场

食品/保健

运动鞋服

运动户外

汽车摩托

玩具市场

文化用品市场

爱好市场

生活服务
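
The sgmllib module used above exists only in Python 2. As a rough sketch, the same idea can probably be expressed in Python 3 with the standard library's html.parser. The names H4Collector, h4_texts and SAMPLE_HTML are made up for illustration, and the sample markup is a hypothetical stand-in for the downloaded page, not the real Taobao HTML. Unlike the original, this version joins all text inside one h4 into a single string.

```python
from html.parser import HTMLParser


class H4Collector(HTMLParser):
    """Collect the text found inside every <h4> element."""

    def __init__(self):
        super().__init__()
        self.in_h4 = False   # True while the parser is inside an <h4>
        self.names = []      # one string per <h4> seen so far

    def handle_starttag(self, tag, attrs):
        if tag == "h4":
            self.in_h4 = True
            self.names.append("")   # start a new, empty heading

    def handle_endtag(self, tag):
        if tag == "h4":
            self.in_h4 = False

    def handle_data(self, data):
        if self.in_h4:
            self.names[-1] += data  # append text to the current heading


def h4_texts(html):
    """Return the stripped text of each <h4> in the given HTML string."""
    parser = H4Collector()
    parser.feed(html)
    parser.close()
    return [name.strip() for name in parser.names]


# Hypothetical sample markup (an assumption, not the real page).
SAMPLE_HTML = """
<div class="market-cat"><h4>数码市场</h4><p class="sub">手机</p></div>
<div class="market-cat"><h4>家电<b>市场</b></h4><p class="sub">电视</p></div>
"""

if __name__ == "__main__":
    for name in h4_texts(SAMPLE_HTML):
        print(name)
```

To run it against a live page, the HTML would first have to be downloaded and decoded to a str (the original page was GBK-encoded) before being passed to h4_texts.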

－－－－－－－－－－－－－－－－－－－－－

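Another approach:

pyQuery is an implementation of jQuery for Python. It lets you parse and manipulate HTML documents using jQuery syntax, which is very convenient. It must be installed before use: easy_install pyquery is enough, or on Ubuntu: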

sudo apt-get install python-pyquery

Example:

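    from pyquery import PyQuery as pyq

    # Load the page and select every category block.
    doc = pyq(url=r'http://list.taobao.com/browse/cat-0.htm')
    cts = doc('.market-cat')
    for i in cts:
        # Category heading
        print '====', pyq(i).find('h4').text(), '===='
        # Sub-items, printed on one line (Python 2 trailing comma)
        for j in pyq(i).find('.sub'):
            print pyq(j).text(),
        print '\n'

－－－－－－－－－－－－－－－

(Reposted)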

Reposted from: https://www.cnblogs.com/xmyy/articles/2839363.html
