As an Engineer — Optimize Python performance

2023/04/05
While developing my company's data dashboard, I often pulled tables from various databases for calculation. API response time grows noticeably when the pull covers a wide time range or involves multiple tables. This post documents what I tried in order to optimize that process, plus a few other tricks that can improve program performance.

1. Change data storage structure

In addition to basic data types such as str and int, common Python data storage structures include list, tuple, and dict.
The list should be familiar to anyone who has learned any programming language: it is essentially an array. Suppose a lot of data is stored in a list and I need to retrieve a specific record; I have to scan through the whole thing, as follows.
for i in someList:
    print(i)
Time complexity is O(n).
If your data is two-dimensional, such as user data grouped by country, you first have to find the country and then scan for the record you want, as follows:
for country, users in someList:  # someList holds (country, users) pairs
    if country == someCountry:
        for user in users:
            print(user)
This time the time complexity climbs to O(n²).
Can switching from list to tuple improve performance? As far as I know, there is no clear evidence that tuples reduce time complexity, because a tuple is essentially a list whose length cannot change. When a tuple is created, it requests a fixed amount of memory up front. A list, on the other hand, needs both its initial allocation and extra bookkeeping so it can grow later. For the same data, a tuple therefore needs less storage than a list, so its space overhead is lower.
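You can see the size difference yourself with sys.getsizeof. A minimal sketch (the exact byte counts vary by Python version and platform):
import sys

data_list = [1, 2, 3, 4, 5]
data_tuple = (1, 2, 3, 4, 5)

# A tuple's layout is fixed at creation; a list also keeps the
# bookkeeping needed to grow, so it reports more bytes for the same items.
print(sys.getsizeof(data_list))
print(sys.getsizeof(data_tuple))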
Next, a powerful Python built-in: dict, short for dictionary. As the name suggests, it maps keys to values; each value is stored under a unique key. So if you want to look up a specific record, it looks like this:
print(someDict["yourKey"])
Time complexity is O(1).
Applying this to the by-country example above:
print(someDict["someCountry"]["yourKey"])
Time complexity is still O(1).
So for data storage, my preference is dict > tuple > list.
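To make the O(1) versus O(n) gap concrete, here is a small benchmark sketch with timeit; users_list and users_dict are made-up example data:
import timeit

n = 100_000
users_list = [("user%d" % i, i) for i in range(n)]
users_dict = {"user%d" % i: i for i in range(n)}

def find_in_list(key):
    # O(n): scan every (key, value) pair until a match is found
    for k, v in users_list:
        if k == key:
            return v

def find_in_dict(key):
    # O(1): hash the key and jump straight to the value
    return users_dict[key]

print(timeit.timeit(lambda: find_in_list("user99999"), number=100))
print(timeit.timeit(lambda: find_in_dict("user99999"), number=100))
For a key near the end of the list, the dict lookup wins by orders of magnitude.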

2. Use Redis

Redis is a powerful in-memory caching database, ideal when your data does not need to be updated in real time (so not for live monitoring). It is useful in many places. When one API pulls a table from the DB, other APIs may need the same table. Re-issuing the SQL or NoSQL query each time wastes performance. Instead, pull the data once and cache it in Redis for, say, ten minutes. During those ten minutes, any other API that needs the same data can read it from Redis instead of sending another request to the DB, and requesting data from Redis is much faster.
Alternatively, your API's return value can also be cached in Redis. A user may refresh the page and resend the same request; if every such request has to be recalculated in the background and re-fetched from the DB, that is very wasteful. If the returned data is already in Redis, you can return the result directly without any computation.
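Here is a minimal sketch of this caching pattern using the redis-py client, assuming a Redis server on localhost; fetch_report and the key name are hypothetical stand-ins, and the ten-minute TTL matches the example above:
import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def fetch_report(query):
    # Hypothetical stand-in for the real SQL query / calculation
    return {"query": query, "rows": []}

def get_report_cached(query, ttl=600):
    key = "report:" + query
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)          # cache hit: skip the DB entirely
    result = fetch_report(query)           # cache miss: hit the DB once
    r.setex(key, ttl, json.dumps(result))  # keep it for ten minutes
    return result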

3. Async functions in FastAPI

The FastAPI framework supports a very handy feature: async functions, essentially the equivalent of async/await in Node.js. While an async function awaits a slow response, it is suspended with its state preserved, and control is handed back to the event loop to serve other requests in the meantime.
async def someFunc():
    await something
Remember that await can only be used inside async functions. From my own testing so far: if your function has nothing to await, it is better to drop the async keyword. FastAPI runs a plain def endpoint in a threadpool, whereas an async def endpoint with no awaits just blocks the event loop, so removing async gives better performance.
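A minimal sketch of both styles, with asyncio.sleep standing in for a real async DB or HTTP call:
import asyncio
from fastapi import FastAPI

app = FastAPI()

@app.get("/io")
async def io_endpoint():
    # Genuinely awaits something: while this request waits,
    # the event loop is free to serve other requests.
    await asyncio.sleep(1)
    return {"status": "done"}

@app.get("/compute")
def compute_endpoint():
    # Nothing to await, so plain def is the better choice:
    # FastAPI runs it in a threadpool instead of blocking the event loop.
    total = sum(i * i for i in range(10_000))
    return {"total": total}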

4. Use multiprocessing

Multiprocessing means what it sounds like: multiple processes handle different tasks separately. It is often claimed online that multiprocessing is more efficient than multithreading in Python, since the GIL keeps threads from running Python code in parallel. I have not benchmarked this myself, so I will not draw conclusions here; instead, this section introduces the two multiprocessing methods I currently use.
from multiprocessing import Pool

def someFunc(x):
    return x * x  # the worker function must live at module top level

if __name__ == "__main__":  # required on platforms that spawn processes
    with Pool(processes=2) as p:
        print(p.map(someFunc, [1, 2]))  # -> [1, 4]
Pool.map calls the function on each item in parallel. Remember that the function must be defined at the outermost (module) level so it can be pickled and sent to the worker processes.
from multiprocessing import Process

def someFunc(a, b):
    print(a + b)

if __name__ == "__main__":
    p = Process(target=someFunc, args=(1, 2))
    p.start()
    # The original process continues to run here
    p.join()  # wait for process p to finish
The difference between Process and Pool: after a Process is created and started, the parent process immediately continues executing its own code, while Pool maintains the fixed set of worker processes you requested and distributes the submitted tasks among them.
Have fun~