
Efficiency of append and extend in Python

Date: 2023-05-05 07:53:29  Views: 196853  Author: 63

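Business scenario:

    Hundreds of millions of records have to be parsed and loaded into SQL Server, so every second saved per batch can shorten the overall import by as much as a day.


    While optimizing the Python code, I came across the following code, which removes duplicates from a list: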

   

    # rh_utils is the project's own logging helper; its import is not shown here
    object_id_set = []
    remove_objects = []
    for object in objects:
        try:
            if object['object_id'] in object_id_set:
                remove_objects.append(object)
            else:
                object_id_set.append(object['object_id'])
        except Exception as e:
            rh_utils.rh_logger.exception('filter object error: {0}, object: {1}'.format(e, object))
    for remove_object in remove_objects:
        objects.remove(remove_object)

The change is to turn object_id_set from a list into a dict: {}

   

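    # rh_utils is the project's own logging helper; its import is not shown here
    object_id_set = {}
    remove_objects = []
    for object in objects:
        try:
            if object['object_id'] in object_id_set:
                remove_objects.append(object)
            else:
                object_id_set[object['object_id']] = 1
        except Exception as e:
            # Unlike the first version, objects that raise an error are also removed
            remove_objects.append(object)
            rh_utils.rh_logger.exception('filter object error: {0}, object: {1}'.format(e, object))
    for remove_object in remove_objects:
        objects.remove(remove_object)

When the number of objects to filter reaches 10,000, the first (list-based) version takes as long as 2 s, which is unacceptable. After switching to the second (dict-based) version, it drops to about 0.1 s.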
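To check the difference on your own machine, here is a minimal timing sketch. It is a simplified stand-in for the code above, not the production code: the function names, the `time_call` helper, and the sample data (10,000 objects where every id appears twice) are all assumptions made up for illustration, and the absolute timings will vary with hardware and data.

```python
import time


def dedup_with_list(objects):
    """Keep the first object per object_id, tracking seen ids in a list."""
    seen = []
    kept = []
    for obj in objects:
        if obj['object_id'] in seen:  # linear scan over the list
            continue
        seen.append(obj['object_id'])
        kept.append(obj)
    return kept


def dedup_with_dict(objects):
    """Same logic, but seen ids are stored as dict keys."""
    seen = {}
    kept = []
    for obj in objects:
        if obj['object_id'] in seen:  # hash lookup
            continue
        seen[obj['object_id']] = 1
        kept.append(obj)
    return kept


def time_call(func, data):
    """Run func(data) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(data)
    return result, time.perf_counter() - start


# Hypothetical sample data: 10,000 objects, each id appearing twice
sample = [{'object_id': i % 5000} for i in range(10000)]
list_result, list_seconds = time_call(dedup_with_list, sample)
dict_result, dict_seconds = time_call(dedup_with_dict, sample)
print('list: {0:.4f}s, dict: {1:.4f}s'.format(list_seconds, dict_seconds))
```

Both functions should return the same objects; only the time spent on the membership test differs.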

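It looks like the in operator on a list has to walk through the list, so each lookup is O(n) and the whole loop becomes O(n²), whereas a dict is backed by a hash table, which keeps each lookup at O(1) on average.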
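The same reasoning applies to the final cleanup loop: list.remove also scans the list linearly. A possible variant, sketched below under the assumption that returning a new list is acceptable instead of modifying objects in place, uses a set for the seen ids and builds the result in a single pass. The function name `dedup_by_object_id` and the sample records are hypothetical, and logging is left out to keep the sketch self-contained.

```python
def dedup_by_object_id(objects):
    """Return a new list keeping the first object seen for each object_id."""
    seen_ids = set()
    unique = []
    for obj in objects:
        try:
            object_id = obj['object_id']
            is_new = object_id not in seen_ids  # average O(1) hash lookup
        except (KeyError, TypeError):
            # Like the dict version above, objects that cannot be checked are dropped
            continue
        if is_new:
            seen_ids.add(object_id)
            unique.append(obj)
    return unique


# Hypothetical sample: one duplicate id and one object without an id
records = [{'object_id': 1}, {'object_id': 2}, {'object_id': 1}, {'name': 'no id'}]
unique_records = dedup_by_object_id(records)
```

A set fits here because only the keys matter; the dict in the version above stores a dummy value of 1 for every id.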


