As an Engineer — Optimize Python performance

Updated 2023/04/05 · Published 2023/04/05 · 13 min read

While developing my company's data dashboard, I often pulled tables from several databases to run calculations on them. API response time gets worse when the requested time range is wide or when several tables are involved. This post documents how I am trying to optimize that process, along with other tricks that can improve program performance.

1. Change data storage structure

Besides basic data types such as str and int, Python's common built-in containers are list, tuple, and dict.

A list should be familiar to anyone who has learned any programming language: it is essentially an array. Suppose a lot of data is stored in a list and I need to retrieve one specific item. I have to walk through the whole list, as follows.

for i in someList:
  print(i)

Time complexity is O(n).
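To make the O(n) cost concrete, here is a minimal sketch of a linear search that also counts comparisons. The function name `linear_search` and the sample data are hypothetical, chosen only for illustration.

```python
def linear_search(items, target):
    """Return (index, comparisons) for the first match, or (-1, comparisons)."""
    comparisons = 0
    for index, item in enumerate(items):
        comparisons += 1
        if item == target:
            return index, comparisons
    return -1, comparisons


# Hypothetical sample data: the target sits at the very end (worst case).
some_list = list(range(1000))
index, comparisons = linear_search(some_list, 999)
print(index, comparisons)  # the whole list had to be scanned
```

When the target is the last element, or is missing altogether, the number of comparisons equals the length of the list.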

Now suppose your data is two-dimensional, such as user data grouped by country. You first have to find the country and then find the data you want inside it, as follows:

# assumes someList holds (country, users) pairs
for country, users in someList:
  if country == someCountry:
    for user in users:
      print(user)

This time the nested loops push the worst-case time complexity up to O(n²).

Can switching from lists to tuples improve performance? As far as I know, nothing clearly indicates that tuples reduce time complexity. A tuple works on the same principle as a list, except that its length cannot change: when it is created, it requests a fixed amount of memory. A list, besides the space for its initial elements, also reserves spare room for future growth. So for the same data, a tuple needs less storage than a list. Both are still O(n) in space, but the tuple's actual memory footprint is lower.
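The memory difference can be checked with `sys.getsizeof` from the standard library. This is a small sketch that assumes CPython; the exact byte counts depend on the Python version and platform, so none are shown here.

```python
import sys

values = range(1000)
as_list = list(values)
as_tuple = tuple(values)

# getsizeof reports the container itself, not the objects it points to.
list_size = sys.getsizeof(as_list)
tuple_size = sys.getsizeof(as_tuple)
print(f"list:  {list_size} bytes")
print(f"tuple: {tuple_size} bytes")

# Appending makes the list reserve spare capacity in chunks,
# so its size jumps occasionally rather than on every append.
grown = []
sizes = []
for value in range(20):
    grown.append(value)
    sizes.append(sys.getsizeof(grown))
print(sorted(set(sizes)))
```

The last line prints only a handful of distinct sizes for twenty appends, which is the spare room for future growth at work.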

Next, I will introduce one of Python's most powerful built-in types: dict, short for dictionary. As the name suggests, it maps keys to values, and each value is stored under a unique key. So if you want to look up a specific piece of information, you do it as follows:

print(someDict["yourKey"])

Time complexity is O(1) on average.

Even for the example above, where the data is stored by country, the lookup looks like this:

print(someDict["someCountry"]["yourKey"])

Time complexity is still O(1).
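As a rough sketch of how the country example could be moved from a list into a nested dict, the snippet below builds the index once and then looks values up by key. The records and the name `users_by_country` are hypothetical.

```python
# Hypothetical data: (country, user) records as they might come from a table.
records = [
    ("TW", "alice"),
    ("TW", "bob"),
    ("JP", "carol"),
    ("US", "dave"),
]

# Build the nested dict once: country -> user name -> record details.
users_by_country = {}
for country, name in records:
    users_by_country.setdefault(country, {})[name] = {"country": country, "name": name}

# Each lookup hashes the key, so no scan over the records is needed.
print(users_by_country["TW"]["bob"])

# .get avoids a KeyError when a key may be missing.
print(users_by_country.get("FR", {}).get("erin"))
```

Building the index still costs one pass over the data, so the gain shows up when the same data is queried many times.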

So in terms of data storage, I prefer Dict > Tuple > List

2. Use Redis

Redis is a powerful caching database. It fits whenever your data does not need to be updated in real time (so not for a live monitor), and it is used in many places. When one API pulls a table from the DB, other APIs may need the same table. If each of them has to run the SQL or NoSQL query again, performance is wasted. Instead, we can pull the table once and store it temporarily in Redis, for example for about ten minutes. During those ten minutes, any other API that needs the same data can skip the query and the round trip to the DB. Requesting data from Redis is much faster.

Alternatively, the response returned by your API can also be stored temporarily in Redis. Sometimes a user refreshes the page and resends the same request. If every such request has to be recalculated in the backend with data requested from the DB again, that is very wasteful. If the returned data is already saved in Redis, you can return the result directly without any calculation.
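The caching idea in this section can be sketched as a small helper that checks the cache first and only calls the expensive loader on a miss. This is not the code from my project: `get_cached`, `load_sales_table`, and the key name are hypothetical, and `FakeRedis` is an in-memory stand-in so the example runs without a server. It mimics only the `get` and `setex` methods that the redis-py client provides.

```python
import json
import time


class FakeRedis:
    """In-memory stand-in exposing the two redis-py methods used below."""

    def __init__(self):
        self._store = {}

    def get(self, name):
        value, expires_at = self._store.get(name, (None, 0.0))
        if value is None or time.monotonic() >= expires_at:
            return None
        return value

    def setex(self, name, time_seconds, value):
        self._store[name] = (value, time.monotonic() + time_seconds)


def get_cached(client, key, loader, ttl_seconds=600):
    """Return cached data for key, calling loader() only on a cache miss."""
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached)
    data = loader()  # e.g. the expensive SQL query or calculation
    client.setex(key, ttl_seconds, json.dumps(data))
    return data


calls = []


def load_sales_table():
    # Hypothetical stand-in for a slow database query.
    calls.append(1)
    return [{"country": "TW", "total": 42}]


client = FakeRedis()  # in a real service this would be a redis.Redis client
first = get_cached(client, "table:sales", load_sales_table)
second = get_cached(client, "table:sales", load_sales_table)
print(first == second, len(calls))
```

The second call is served from the cache, so the loader runs only once. The default of 600 seconds matches the ten-minute window described above; the same helper works for a whole API response if the response can be serialized to JSON.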

3. Async functions for FastAPI

The FastAPI framework supports a very nice feature: async. It is basically equivalent to async in Node.js. While an async function waits for a response, it saves its state and yields control, so the same thread can handle other requests in the meantime.

async def someFunc():
  await something

Remember that await can only be used inside async functions. From my own testing so far, if your function does not await anything, it is better to remove async: a function with no waiting to do performs better as a plain def. FastAPI runs plain def endpoints in a thread pool, so blocking work there does not stall the event loop.
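The effect of a missing await can be seen without FastAPI, using only asyncio from the standard library. The sketch below is an illustration, not a FastAPI benchmark; the function names and the 0.1 second delay are made up for the example.

```python
import asyncio
import time


async def waits_properly(delay):
    # await hands control back to the event loop while waiting.
    await asyncio.sleep(delay)
    return delay


async def blocks_the_loop(delay):
    # No await here: time.sleep holds the event loop for the whole delay.
    time.sleep(delay)
    return delay


async def run_three(task, delay):
    """Run three copies of task together and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(task(delay) for _ in range(3)))
    return time.perf_counter() - start


awaiting_time = asyncio.run(run_three(waits_properly, 0.1))
blocking_time = asyncio.run(run_three(blocks_the_loop, 0.1))
print(f"with await:    {awaiting_time:.2f}s")
print(f"without await: {blocking_time:.2f}s")
```

The three awaiting tasks overlap, while the three blocking ones run one after another even though they are declared async.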

4. Use multiprocessing

The literal meaning of multiprocessing is easy to understand: multiple processes handle different tasks separately. It is often said online that multiprocessing is more efficient than multithreading in Python. The usual argument is that the global interpreter lock (GIL) keeps threads from running Python code in parallel for CPU-bound work. I haven't tested this myself yet, so I can't draw conclusions. Here I will just introduce the two multiprocessing methods I currently use.

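from multiprocessing import Pool

if __name__ == "__main__":
  # someFunc must be defined at module top level so workers can import it
  with Pool(processes=2) as p:
    print(p.map(someFunc, [1, 2]))

Pool can call a function in parallel over many inputs through its map method. Remember that the function must be defined at the top level of the module, so that it can be pickled and sent to the worker processes.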

from multiprocessing import Process
p = Process(target=someFunc, args=(1,2))
p.start()
#
# The original process continues to run
#
# Wait for p process done
p.join()

The difference between the two is that after a Process is started, the original process continues to execute the program. With Pool, the original process hands the tasks to a fixed number of worker processes, and map blocks until all of them have finished.
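Putting both methods into one runnable file gives something like the sketch below. The worker functions `square` and `slow_report` are hypothetical placeholders for someFunc.

```python
import time
from multiprocessing import Pool, Process


def square(n):
    """Hypothetical CPU task; defined at module top level so it can be pickled."""
    return n * n


def slow_report(label, delay):
    """Hypothetical background task run in a separate process."""
    time.sleep(delay)
    print(f"{label} finished")


if __name__ == "__main__":
    # Pool.map blocks until every result is back, in input order.
    with Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3, 4]))

    # Process.start returns immediately; the parent keeps working.
    worker = Process(target=slow_report, args=("report", 0.2))
    worker.start()
    print("parent keeps running")
    worker.join()  # wait for the child before exiting
    print("exit code:", worker.exitcode)
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by importing the main module again; without it, each worker would try to create its own pool.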

Have fun~


Samuel's Salon
