As an Engineer — Optimize Python performance

閱讀時間約 12 分鐘
When I was doing the development of the company’s data dashboard, I often pulled tables from various databases for calculation. API response time decreases when the pull time range is wide or when there are multiple tables. This post documents what I’m trying to optimize the process and other tricks that can improve program performance.

1. Change data storage structure

In addition to data types such as str and int, common Python data storage structures include list, tuple, and dict.
List should be a data structure that anyone who has learned any programming languages will be familiar with. A list is an array. Assume that there are a lot of data stored in the list today, and I need to retrieve specific data, and I have to search all over, as follows.
for i in someList:
print(i)
Time complexity is O(n).
If your data storage today is two-dimensional, such as user data in different countries. The first thing you need to do is to find the country, and then find the data you want, as follows:
for i in someList:
if i == someCountry:
for j in someList[i]:
print(j)
This time yime complexity up to O(n²).
Can tuples vs lists improve performance? As far as I know, there is no clear indication that tuples reduce time complexity. Because the principle of tuple is list, but tuple is a list whose length cannot be changed. When he first creates it, he requests a fixed length from memory. In addition to needing space for initialization and memory, List also needs a temporary storage area for future expansion of data. Therefore, in a tuple of the same data size, the required storage space is smaller than that of a list, and the space complexity is also lower.
Next, I will introduce a powerful function of python, which is dict. dict is short for dictionary. Since it is called a dictionary, there is a relationship between key and value. That is, the value is stored in a unique key. So if you want to query a specific information today, it is as follows:
print(someDict["yourKey"])
Time complexity is O(1).
Even with the example above, the data storage by country is as follows:
print(someDict["someCountry"]["yourKey"])
Time complexity still is O(1).
So in terms of data storage, I prefer Dict > Tuple > List

2. Use Redis

Redis — Powerful caching database. If your data doesn’t need to be updated immediately. (Not the Monitor). Redis is used in many places. When an API pulls a Table from DB to use, other APIs may also need it. At this time, if you have to re-enter SQL or NoSQL commands, the performance will be wasted. If we pull out the data of this table and temporarily store it in Redis, it will take about ten minutes. During these ten minutes, if your other API needs to use the same data, there is no need to re-download the SQL and DB request data. Requesting data from Redis is much faster.
Alternatively, return from your API can also be temporarily stored in Redis. Sometimes the user may refresh the page and resend the request. If these requests need to be recalculated in the background and re-request data from DB. This is very wasteful of performance. If you have saved the returned data in Redis at this time, you can return the result directly without any calculation.

3. Asyn functions for FastAPI

In the FastAPI framework, there is a very novel function, which is Async. Basically equivalent to Node.js’ async. Async allows you to wait for response times, temporarily store state, release thread, and handle other needs.
async def someFunc():
await something
Remember that await can only be used in async functions. I’ve tested it myself so far. If your function does not need the await object, it is recommended to remove async. No need to wait for processing to remove async functions for better performance.

4. Use multiprocessing

The literal meaning of Multiprocessing is easy to understand, that is, multiple process handle different tasks separately. It has been discussed on the Internet that multiprocessing is more efficient than multithreading in python. So far, I haven’t tested it myself, so I can’t draw conclusions. This article first introduces the two multiprocessing methods I currently use.
from multiprocessing import Pool
with Pool(processes=2) as p:
print(p.map(someFunc, [1, 2])
Pool can call some functions in parallel through the map function. Remember that the function must be written in the outermost layer.
from multiprocessing import Process
p = Process(target=someFunc, args=(1,2))
p.start()
#
# The original process continues to run
#
# Wait for p process done
p.join()
The difference between the Process method and the Pool is that after the Process is created, the original Process will continue to execute the program. Pool means that after the original process enters the pool, it will be divided into several processes according to the number of tasks.
Have fun~
為什麼會看到廣告
avatar-img
86會員
108內容數
除了翻譯各國新聞以外,會將過去演講的一些主題內容放上來。閒暇之餘,分享一些PM心得,歡迎參訪。
留言0
查看全部
avatar-img
發表第一個留言支持創作者!
Samuel的沙龍 的其他內容
Often, respondents are willing to be interviewed out of product interest or a meager benefit.
Aptos is a blockchain built with the move language, which also attracted me to learn the move programming language.
Today I mainly talk about user interviews, and maybe I will talk about interview techniques that persuade people to use them in the future.
A lambda function is a small anonymous function and can only have one expression.
Often, respondents are willing to be interviewed out of product interest or a meager benefit.
Aptos is a blockchain built with the move language, which also attracted me to learn the move programming language.
Today I mainly talk about user interviews, and maybe I will talk about interview techniques that persuade people to use them in the future.
A lambda function is a small anonymous function and can only have one expression.
你可能也想看
Google News 追蹤
Thumbnail
從範例學python的目標讀者: 針對剛進入的初學者,想學習Python語言。 有基礎本數學邏輯基礎即可。 從小遊戲學python的目標讀者: 針對已經有經驗的C/C++, Python, 或其他有程式基礎的讀者。 想實作一些小專案,從實做中學習如何分析需求、元件分拆、到底層實作
上兩篇有關List的文章,此篇文上兩章的延續,整理一些常用的方法和操作。 [Python]List(列表)新增、修改、刪除元素 [Python基礎]容器 list(列表),tuple(元組) 還有一些常用的 list 方法和操作,讓你能更靈活地處理列表數據
Thumbnail
在 Python 中,tuple 與 List有一個關鍵的不同點:tuple 是不可變的,這意味著一旦創建了 tuple,就無法更改其內容。 這與 List的可變性形成了對比,list 可以新增、刪除或修改元素。 元素的意思: 元素:指的是 List 中的每一個獨立的項目或值。
Thumbnail
Array可以說是各種語言除了基本型別之外,最常用的資料型別與容器之一了。 Array 這種連續格子狀的資料結構,在Python要怎麼表達呢? 建立一個空的陣列 最簡單也最直接的寫法就是 array = [] # Python list [] 就對應到大家熟知的array 陣列型態的資料結
Thumbnail
我們在「【🔒 Python API框架篇 - FastAPI】Ep.1 啟航」有分享 FastAPI 這套API框架, 那麼當我們想要在應用程式剛執行時就註冊一些事件或者共享GPU運算模型、變數…等,當整個應用程式關閉時也進行釋放作業, 這樣的一個週期循環就是所謂的生命週期, 而在FastAPI這
Thumbnail
當我們在開發一個AI應用服務時, 常常會需要載入大模型, But… 我們總不可能每一次的請求就載入一次模型吧! 這樣太沒有效率了, 也非常的浪費資源, 因此我們通常會希望應用程式啟動時就能夠載入模型, 之後每一次的請求只要讓模型進行運算即可, 那麼在FastAPI的框架中究竟要如何使用呢? 首
Thumbnail
在Python中,queue是一個非常有用的模块。 它提供了多種佇列(queue)實現,用於在多線程環境中安全地交換信息或者數據。 佇列(queue)是一種先進先出(FIFO)的數據結構,允許在佇列的一端插入元素,另一端取出元素。(FIFO 是First In, First Out 的縮寫)
Thumbnail
當你需要在 Python 中執行多個任務,但又不希望它們相互阻塞時,可以使用 threading 模組。 threading 模組允許你在單個程序中創建多個執行緒,這些執行緒可以同時運行,從而實現並行執行多個任務的效果。
Thumbnail
列表(List)和元組(Tuple)都是 Python 中用來存儲集合元素的數據結構,兩者看起來很像,在初學時很容易搞混,所以觀念要建立好。 可以把列表(List)和元組(Tuple)想像成是一個容器,什麼元素都可以塞
Thumbnail
關於多執行緒/多行程的使用方式 在Python 3.2版本之後加入了「concurrent.futures」啟動平行任務, 它可以更好的讓我們管理多執行緒/多行程的應用場景,讓我們在面對這種併發問題時可以不必害怕, 用一個非常簡單的方式就能夠處裡, 底下我們將為您展示一段程式碼: imp
Thumbnail
從範例學python的目標讀者: 針對剛進入的初學者,想學習Python語言。 有基礎本數學邏輯基礎即可。 從小遊戲學python的目標讀者: 針對已經有經驗的C/C++, Python, 或其他有程式基礎的讀者。 想實作一些小專案,從實做中學習如何分析需求、元件分拆、到底層實作
上兩篇有關List的文章,此篇文上兩章的延續,整理一些常用的方法和操作。 [Python]List(列表)新增、修改、刪除元素 [Python基礎]容器 list(列表),tuple(元組) 還有一些常用的 list 方法和操作,讓你能更靈活地處理列表數據
Thumbnail
在 Python 中,tuple 與 List有一個關鍵的不同點:tuple 是不可變的,這意味著一旦創建了 tuple,就無法更改其內容。 這與 List的可變性形成了對比,list 可以新增、刪除或修改元素。 元素的意思: 元素:指的是 List 中的每一個獨立的項目或值。
Thumbnail
Array可以說是各種語言除了基本型別之外,最常用的資料型別與容器之一了。 Array 這種連續格子狀的資料結構,在Python要怎麼表達呢? 建立一個空的陣列 最簡單也最直接的寫法就是 array = [] # Python list [] 就對應到大家熟知的array 陣列型態的資料結
Thumbnail
我們在「【🔒 Python API框架篇 - FastAPI】Ep.1 啟航」有分享 FastAPI 這套API框架, 那麼當我們想要在應用程式剛執行時就註冊一些事件或者共享GPU運算模型、變數…等,當整個應用程式關閉時也進行釋放作業, 這樣的一個週期循環就是所謂的生命週期, 而在FastAPI這
Thumbnail
當我們在開發一個AI應用服務時, 常常會需要載入大模型, But… 我們總不可能每一次的請求就載入一次模型吧! 這樣太沒有效率了, 也非常的浪費資源, 因此我們通常會希望應用程式啟動時就能夠載入模型, 之後每一次的請求只要讓模型進行運算即可, 那麼在FastAPI的框架中究竟要如何使用呢? 首
Thumbnail
在Python中,queue是一個非常有用的模块。 它提供了多種佇列(queue)實現,用於在多線程環境中安全地交換信息或者數據。 佇列(queue)是一種先進先出(FIFO)的數據結構,允許在佇列的一端插入元素,另一端取出元素。(FIFO 是First In, First Out 的縮寫)
Thumbnail
當你需要在 Python 中執行多個任務,但又不希望它們相互阻塞時,可以使用 threading 模組。 threading 模組允許你在單個程序中創建多個執行緒,這些執行緒可以同時運行,從而實現並行執行多個任務的效果。
Thumbnail
列表(List)和元組(Tuple)都是 Python 中用來存儲集合元素的數據結構,兩者看起來很像,在初學時很容易搞混,所以觀念要建立好。 可以把列表(List)和元組(Tuple)想像成是一個容器,什麼元素都可以塞
Thumbnail
關於多執行緒/多行程的使用方式 在Python 3.2版本之後加入了「concurrent.futures」啟動平行任務, 它可以更好的讓我們管理多執行緒/多行程的應用場景,讓我們在面對這種併發問題時可以不必害怕, 用一個非常簡單的方式就能夠處裡, 底下我們將為您展示一段程式碼: imp