Scraping stock information with Python: how to crawl stock data

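There are many interfaces for obtaining stock data. Free ones include the Sina, NetEase and Yahoo APIs; paid ones are provided by securities firms and specialized data vendors.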

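Paid interfaces used on a trial basis typically only provide the most recent one or three years of data and come with many restrictions, unless you are willing to spend a lot of money.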

This article therefore focuses on obtaining and processing free data.

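sinajs, money.163.com, Yahoo and others provide interfaces for Chinese domestic stock data. Their APIs differ, but the data they return is broadly similar, so you can pick any one of them to work with.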

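There are open-source Chinese packages for fetching financial data that wrap the interfaces above. Whichever source the data comes from, they fetch it from the fastest source first, and they are easy to use. I use TuShare. For installation instructions, refer to the TuShare documentation.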
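The "fastest source first" idea can be illustrated with a small sketch. This is not TuShare's actual implementation; it assumes each data source is wrapped in a hypothetical fetcher function (slow_source, fast_source and broken_source below are stand-ins) that takes a stock code and returns data. All sources are queried concurrently and the first successful answer wins.

```python
import concurrent.futures
import time


def fetch_fastest(fetchers, *args):
    """Run all fetchers concurrently; return the first result that succeeds.

    fetchers: callables that each pull the same data from a different source.
    Raises RuntimeError if every source fails.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=len(fetchers))
    futures = [executor.submit(f, *args) for f in fetchers]
    errors = []
    try:
        for fut in concurrent.futures.as_completed(futures):
            try:
                return fut.result()
            except Exception as exc:  # this source failed; wait for the next one
                errors.append(exc)
        raise RuntimeError('all sources failed: %r' % errors)
    finally:
        # Do not wait for slower sources; their threads finish in the background.
        executor.shutdown(wait=False)


# Hypothetical stand-ins for data sources with different latencies.
def slow_source(code):
    time.sleep(0.5)
    return ('slow', code)


def fast_source(code):
    time.sleep(0.05)
    return ('fast', code)


def broken_source(code):
    raise IOError('source unavailable')


print(fetch_fastest([slow_source, fast_source], '600000'))
```

A failing source does not abort the call; its exception is recorded and the next source to finish is used instead.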

This article describes a TuShare-based method for fetching the historical K-line data of all A-share stocks.

1. Get the list of A-share listed companies

import tushare as ts
import pandas as pd

def download_stock_basic_info():
    try:
        df = ts.get_stock_basics()
        # Save straight to a CSV file
        print('choose csv')
        df.to_csv('stock_basic_list.csv')
        print('download csv finish')
    except Exception as e:
        print(str(e))

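The stock list contains basic information on the 2756 stocks listed on the A-share market at the time of writing. It includes the following fields: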

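code, stock code

name, name

industry, industry

area, region

pe, price-to-earnings ratio

outstanding, tradable shares outstanding

totals, total share capital (in units of 10,000)

totalAssets, total assets (in units of 10,000)

liquidAssets, current assets

fixedAssets, fixed assets

reserved, capital reserve

reservedPerShare, capital reserve per share

eps, earnings per share

bvps, book value (net assets) per share

pb, price-to-book ratio

timeToMarket, listing date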
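Stock codes such as 000001 have leading zeros, so reading stock_basic_list.csv back with default settings turns them into integers. A minimal sketch of reading the list with string codes, assuming the first CSV column is headed code (the index name that to_csv writes out); load_stock_list is a hypothetical helper and the sample rows are made up.

```python
import io

import pandas as pd


def load_stock_list(path_or_buffer):
    """Read the saved stock list with six-character string codes as the index."""
    # Read 'code' as text; otherwise pandas turns '000001' into the integer 1.
    df = pd.read_csv(path_or_buffer, dtype={'code': str})
    # Pad anyway, in case the file was written with integer codes.
    df['code'] = df['code'].str.zfill(6)
    return df.set_index('code')


# Made-up sample rows; a real file has all the columns listed above.
sample = io.StringIO(
    "code,name,industry\n"
    "000001,Bank A,Banking\n"
    "600000,Bank B,Banking\n"
    "2,Property C,Real estate\n"
)
stocks = load_stock_list(sample)
print(stocks.loc['600000', 'name'])
```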

2. Get the historical K-line data of an individual stock

The K-line data retrieved contains the following fields:

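date: trading day (index)

open: opening price (forward-adjusted, default)

high: highest price (forward-adjusted, default)

close: closing price (forward-adjusted, default)

low: lowest price (forward-adjusted, default)

open_nfq: opening price (unadjusted)

high_nfq: highest price (unadjusted)

close_nfq: closing price (unadjusted)

low_nfq: lowest price (unadjusted)

open_hfq: opening price (backward-adjusted)

high_hfq: highest price (backward-adjusted)

close_hfq: closing price (backward-adjusted)

low_hfq: lowest price (backward-adjusted)

volume: trading volume

amount: turnover (traded value)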
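The three price variants differ only in how corporate actions such as splits and dividends are handled. A worked sketch, assuming the common adjustment-factor convention (backward-adjusted = raw * factor, forward-adjusted = raw * factor / latest factor); it uses made-up numbers and is not a description of how get_h_data computes its columns internally.

```python
def adjust_prices(raw, factors):
    """Return (forward_adjusted, backward_adjusted) lists of prices.

    raw and factors are aligned and ordered from oldest to newest day.
    Assumed convention:
        backward-adjusted (hfq) = raw * factor
        forward-adjusted  (qfq) = raw * factor / latest factor
    """
    latest = factors[-1]
    hfq = [p * f for p, f in zip(raw, factors)]
    qfq = [p * f / latest for p, f in zip(raw, factors)]
    return qfq, hfq


# Made-up data: a 2-for-1 split after day 2 doubles the adjustment factor.
raw = [20.0, 20.4, 10.3]
factors = [1.0, 1.0, 2.0]
qfq, hfq = adjust_prices(raw, factors)
print(qfq)  # forward-adjusted: history rescaled to today's price level
print(hfq)  # backward-adjusted: earliest prices kept, later ones rescaled
```

Forward adjustment leaves the latest price unchanged, which suits charting against today's quote; backward adjustment leaves the earliest prices unchanged, so the series never has to be rewritten when new adjustments occur.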

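The function below downloads the historical K-line data of the stock with the given code. By default it covers the period from the listing date to today. Downloads are incremental: if data for stock 60000 has already been downloaded locally up to 2015-6-19, running the function again downloads from 6-20 onward and appends it to the local file.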
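The incremental rule boils down to choosing a start date. A minimal sketch of that rule on its own, using a hypothetical helper next_start_date and a made-up listing date; the author's full function follows.

```python
import datetime


def next_start_date(existing_dates, listing_date):
    """Return the first date (as 'YYYY-MM-DD') that still needs downloading.

    existing_dates: datetime.date objects already stored locally (may be empty).
    listing_date: listing date in timeToMarket format, e.g. 19991110 or '19991110'.
    """
    if existing_dates:
        # Resume the day after the newest stored date.
        start = max(existing_dates) + datetime.timedelta(days=1)
    else:
        # Nothing stored yet: start from the listing date.
        start = datetime.datetime.strptime(str(listing_date), '%Y%m%d').date()
    return start.strftime('%Y-%m-%d')


stored = [datetime.date(2015, 6, 18), datetime.date(2015, 6, 19)]
print(next_start_date(stored, 19991110))  # resumes after 2015-06-19
print(next_start_date([], 19991110))      # fresh download from the listing date
```

Returning ISO-formatted strings keeps the later "already up to date" check a plain string comparison, since 'YYYY-MM-DD' strings sort in date order.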

import datetime
import os

import pandas as pd
import tushare as ts

import cm    # the author's settings module: DownloadDir etc. (see the appendix)
import util  # the author's helper module: getSixDigitalStockCode (see the appendix)

# By default, downloads K-line data from the listing date to today.
# Start and end dates can be given in the format '2015-06-28'.
def download_stock_kline(code, date_start='', date_end=None):
    if date_end is None:
        date_end = datetime.date.today()  # evaluated on each call, not once at import
    code = util.getSixDigitalStockCode(code)  # pad the stock code to six digits
    try:
        fileName = 'h_kline_' + str(code) + '.csv'
        writeMode = 'w'
        if os.path.exists(cm.DownloadDir + fileName):
            # print('exists: ' + code)
            df = pd.read_csv(cm.DownloadDir + fileName, index_col=0, parse_dates=True)
            se = df.head(1).index  # newest date in the existing file (stored newest first)
            dateNew = se[0] + datetime.timedelta(1)
            date_start = dateNew.strftime('%Y-%m-%d')
            writeMode = 'a'
        if date_start == '':
            # get_stock_info() is not shown in the article; it presumably
            # returns the stock's row from the basic-info list
            se = get_stock_info(code)
            date_start = se['timeToMarket']
            date = datetime.datetime.strptime(str(date_start), '%Y%m%d')
            date_start = date.strftime('%Y-%m-%d')
        date_end = date_end.strftime('%Y-%m-%d')

        # The local data is already up to date
        if date_start >= date_end:
            df = pd.read_csv(cm.DownloadDir + fileName)
            return df

        print('download ' + str(code) + ' k-line >>>begin (' + date_start + ' to ' + date_end + ')')
        df_qfq = ts.get_h_data(str(code), start=date_start, end=date_end)                # forward-adjusted (default)
        df_nfq = ts.get_h_data(str(code), start=date_start, end=date_end, autype=None)   # unadjusted
        df_hfq = ts.get_h_data(str(code), start=date_start, end=date_end, autype='hfq')  # backward-adjusted
        if df_qfq is None or df_nfq is None or df_hfq is None:
            return None

        for col in ('open', 'high', 'close', 'low'):
            df_qfq[col + '_nfq'] = df_nfq[col]  # unadjusted prices
        for col in ('open', 'high', 'close', 'low'):
            df_qfq[col + '_hfq'] = df_hfq[col]  # backward-adjusted prices

        if writeMode == 'w':
            df_qfq.to_csv(cm.DownloadDir + fileName)
        else:
            df_old = pd.read_csv(cm.DownloadDir + fileName, index_col=0, parse_dates=True)
            # Append the new rows, then order dates from newest to oldest
            df_new = pd.concat([df_old, df_qfq]).sort_index(ascending=False)
            df_new.to_csv(cm.DownloadDir + fileName)

        print('\ndownload ' + str(code) + ' k-line finish')
        return pd.read_csv(cm.DownloadDir + fileName)
    except Exception as e:
        print(str(e))
        return None
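If a download is re-run with a start date that overlaps data already on disk, simply concatenating the old and new frames produces duplicate dates. A minimal sketch of a merge that drops such duplicates (keeping the freshly downloaded row) and stores dates newest first, as the function above does; merge_history is a hypothetical helper and the prices are made up.

```python
import pandas as pd


def merge_history(old, new):
    """Combine stored and freshly downloaded K-lines, newest date first.

    If a date appears in both frames, the freshly downloaded row is kept.
    """
    combined = pd.concat([old, new])
    # keep='last' keeps the row from `new`, since it comes second in concat
    combined = combined[~combined.index.duplicated(keep='last')]
    return combined.sort_index(ascending=False)


# Made-up data: both frames are indexed by date, newest first.
old = pd.DataFrame({'close': [10.0, 9.5]},
                   index=pd.to_datetime(['2015-06-19', '2015-06-18']))
new = pd.DataFrame({'close': [10.4, 10.1]},
                   index=pd.to_datetime(['2015-06-23', '2015-06-19']))
merged = merge_history(old, new)
print(merged)
```

Sorting the index, rather than reversing it, also keeps the file ordered correctly if rows arrive out of order.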

3. Get the historical K-line data of all stocks

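from multiprocessing.dummy import Pool as ThreadPool

import pandas as pd

import cm  # the author's settings module (see the appendix)

# Download the historical K-lines of every stock
# (download_stock_kline is the function from section 2)
def download_all_stock_history_k_line():
    print('download all stock k-line')
    try:
        # The stock codes become the index (read back as ints;
        # download_stock_kline pads them to six digits again)
        df = pd.read_csv(cm.DownloadDir + cm.TABLE_STOCKS_BASIC + '.csv', index_col=0)
        pool = ThreadPool(processes=10)  # 10 worker threads
        pool.map(download_stock_kline, df.index)
        pool.close()
        pool.join()
    except Exception as e:
        print(str(e))
    print('download all stock k-line finish')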

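map comes from the functional language Lisp: it applies a function to each element of a sequence, in order, and collects the results.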

import urllib.request

urls = ['http://www.yahoo.com', 'http://www.reddit.com']
results = list(map(urllib.request.urlopen, urls))  # map is lazy in Python 3; list() runs it

Two libraries support parallelism through a map function: multiprocessing, and its little-known but powerful submodule multiprocessing.dummy.

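multiprocessing.dummy replicates the API of the multiprocessing module. The only difference is that multiprocessing uses processes, whereas dummy uses threads (with all of Python's usual threading limitations, such as the GIL).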

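Multithreading is enabled by specifying the processes argument; with multiprocessing.dummy, it sets the number of worker threads.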
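Downloading K-lines is I/O-bound: threads spend most of their time waiting on the network, and Python releases the GIL while they wait, so a thread pool still speeds things up. A minimal sketch using multiprocessing.dummy.Pool, with a hypothetical fake_download that sleeps instead of calling TuShare:

```python
import time
from multiprocessing.dummy import Pool as ThreadPool


def fake_download(code):
    """Hypothetical stand-in for download_stock_kline: sleeps to mimic network I/O."""
    time.sleep(0.2)
    return code, 'ok'


codes = ['000001', '000002', '600000', '600519']

start = time.perf_counter()
with ThreadPool(processes=4) as pool:         # 4 worker threads
    results = pool.map(fake_download, codes)  # blocks until done; keeps input order
elapsed = time.perf_counter() - start
print(results)
print('elapsed: %.2fs (about 0.8s if run one after another)' % elapsed)
```

Swapping the import for multiprocessing.Pool would use processes instead, which only pays off for CPU-bound work and requires the mapped function to be picklable.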

Appendix: the other functions and variables used in this article are defined as follows:

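import os

# Used as cm.TABLE_STOCKS_BASIC and cm.DownloadDir in the code above
TABLE_STOCKS_BASIC = 'stock_basic_list'
DownloadDir = os.path.pardir + '/stockdata/'  # os.path.pardir: the parent directory

# Used as util.getSixDigitalStockCode in the code above
# Pad a stock code to six digits
# input: int or string
# output: string
def getSixDigitalStockCode(code):
    return str(code).zfill(6)  # left-pad with zeros up to six characters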
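DataFrame.to_csv fails if the target directory does not exist, and DownloadDir above assumes ../stockdata/ is already there. A minimal sketch of building the output path portably with os.path.join and creating the directory on demand; csv_path is a hypothetical helper, and a temporary directory stands in for os.path.pardir.

```python
import os
import tempfile


def csv_path(download_dir, file_name):
    """Join the download directory and file name, creating the directory if needed."""
    os.makedirs(download_dir, exist_ok=True)  # no error if it already exists
    return os.path.join(download_dir, file_name)


# Usage with a throwaway base directory instead of os.path.pardir
base = tempfile.mkdtemp()
path = csv_path(os.path.join(base, 'stockdata'), 'h_kline_600000.csv')
print(path)
```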
