Python data analysis example

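I have a large dataset stored as a 17 GB CSV file (fileData). It contains a variable number of records (up to about 30,000) per customer (customer_id). Out of 90,000 customers in total, I need to extract the records of 1,500 specific customers, listed in a selection file (fileSelection).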

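I am new to Python and am using it because VBA and MATLAB cannot handle a file this size. (I write the code in Aptana Studio, but to speed things up I run Python directly from the command line. I'm on 64-bit Windows 7.)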

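The code I wrote extracts some customers, but it has two problems:

1) Most customers are not found in the large dataset. (I believe they are all in the dataset, but I can't be completely sure.)

2) It is very slow. Any way to speed it up, including making better use of the cores, would be welcome.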

My code is as follows:

```python
# fileSelection, path2, file_data, customer_key, customer_id, path3,
# file_out_root and the log file fL are defined elsewhere (module level).
def main():
    # Initialisation:
    # - identify columns in selection file
    fS = open(fileSelection, 'r')
    if fS.mode == 'r':
        header = fS.readline()
        selheaderlist = header.split(',')
        custkey = selheaderlist.index(customer_key)

    # - identify columns in dataset file
    fileData = path2 + file_data
    fD = open(fileData, 'r')
    if fD.mode == 'r':
        header = fD.readline()
        dataheaderlist = header.split(',')
        custid = dataheaderlist.index(customer_id)
    fD.close()

    # For each customer in the selection file
    customercount = 1
    for sr in fS:
        # Find customer key and locate it in the customer id field in the dataset
        selrecord = sr.split(',')
        requiredcustomer = selrecord[custkey]

        # Look for the required customer in the dataset (rescans from the start)
        found = 0
        fD = open(fileData, 'r')
        if fD.mode == 'r':
            while found == 0:
                dr = fD.readline()
                if not dr: break
                datrecord = dr.split(',')
                if datrecord[custid] == requiredcustomer:
                    found = 1

                    # Open output file
                    fileoutput = path3 + file_out_root + str(requiredcustomer) + '.csv'
                    fO = open(fileoutput, 'w')
                    fO.write(str(header))

                    # Copy all (contiguous) records for the required customer number
                    while datrecord[custid] == requiredcustomer:
                        fO.write(str(dr))
                        dr = fD.readline()
                        datrecord = dr.split(',')

                    # Close output file
                    fO.close()

        if found == 1:
            print('Customer count ' + str(customercount) + ' customer id ' + str(requiredcustomer) + ' copied.')
            customercount = customercount + 1
        else:
            print('Customer id ' + str(requiredcustomer) + ' not found in dataset')
            fL.write(str(requiredcustomer) + ',' + 'NOT FOUND')
        fD.close()

    fS.close()
```
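For problem 1, one common cause of keys that never match is worth ruling out. `str.split(',')` leaves the line terminator attached to the last field. So if the key is the last column in one file but not in the other, the comparison is `'C00123\n' == 'C00123'`, which is always `False`. A minimal check, using made-up sample values and a hypothetical helper `extract_key`:

```python
def extract_key(line, index, sep=","):
    """Return field `index` of a delimited line with surrounding whitespace removed.

    Without .strip(), a key in the last column keeps its '\n' (or '\r\n'),
    so it never compares equal to the same key taken from a middle column.
    """
    return line.split(sep)[index].strip()


line = "2015-01-01,42.50,C00123\n"       # hypothetical data row, key in last column
print(repr(line.split(",")[2]))          # 'C00123\n'  (newline still attached)
print(repr(extract_key(line, 2)))        # 'C00123'
```

Stray spaces or quotes around the key in one of the files would cause the same kind of silent mismatch.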

It took several days to find a few hundred customers, and it didn't find any more than that.
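Much of the slowness in problem 2 comes from reopening the 17 GB file and scanning it from the start for every selected customer. For every customer that isn't found, that means a full pass. A single pass that checks each row's customer id against a `set` of wanted keys avoids this.

The following is a minimal sketch, not the approach from the answers. It assumes that each customer's rows are contiguous in the file, which the original inner `while` loop also assumes. The function name `extract_customers`, the `out_dir` parameter and the `<id>.csv` output naming are hypothetical.

```python
import itertools
import os


def extract_customers(data_file, wanted, id_column, out_dir, sep=","):
    """Copy the rows of every customer in `wanted` to <out_dir>/<id>.csv.

    Makes ONE pass over `data_file` (an open text file) instead of one pass
    per selected customer. ASSUMES all rows for a customer are contiguous.
    Returns the set of customer ids that were found.
    """
    header = data_file.readline()
    id_index = header.rstrip("\r\n").split(sep).index(id_column)

    def customer_of(line):
        # strip() so a key in the last column doesn't keep its '\n'
        return line.split(sep)[id_index].strip()

    rows = (line for line in data_file if line.strip())   # skip blank lines
    found = set()
    # groupby yields one (id, rows) group per run of identical ids
    for cust_id, group in itertools.groupby(rows, key=customer_of):
        if cust_id in wanted:                               # O(1) set lookup
            path = os.path.join(out_dir, cust_id + ".csv")
            with open(path, "w") as out:
                out.write(header)
                out.writelines(group)
            found.add(cust_id)
    return found
```

With `wanted` loaded from the selection file, the customers that were never found are simply `wanted - found`. If the file is *not* grouped by customer, a customer's rows would form several groups, and mode `"w"` would keep only the last one. In that case this sketch needs adapting.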

Thanks @Paul Cornelius, this is much more efficient. I adopted your approach and also used the csv handling suggested by @Bernardo:

^{pr2}$
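For reference, here is a minimal sketch of parsing with Python 3's standard `csv` module. It is not the asker's final code, and the helper name `load_selection` is hypothetical. `csv.DictReader` takes column names from the header row, so no manual `index()` lookup is needed. Values also come back without the trailing newline, and quoted fields containing commas stay intact.

```python
import csv
import io


def load_selection(sel_file, key_column):
    """Return the set of customer keys listed in an open selection CSV file.

    DictReader maps each row to {column name: value} using the header row;
    rows too short to contain key_column (value None) are skipped.
    """
    reader = csv.DictReader(sel_file)
    return {row[key_column].strip() for row in reader if row.get(key_column)}


# csv.reader respects quoting, which str.split(',') does not:
line = '2015-01-01,"Smith, J",C00123\n'
print(line.split(","))                        # 4 fields: the quoted comma splits the name
print(next(csv.reader(io.StringIO(line))))    # ['2015-01-01', 'Smith, J', 'C00123']
```

The Python 3 `csv` documentation recommends opening the files with `newline=''` when reading them this way.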