題目給定一個輸入陣列,陣列裡面的數字自分別代表每篇論文的被引用數。
要求我們計算h-index。
h-index的定義:
找一個最大的h值,使得有h篇論文,個別論文的被引用數都 大於等於 h
Example 1:
Input: citations = [3,0,6,1,5]
Output: 3
Explanation: [3,0,6,1,5] means the researcher has 5 papers in total and each of them had received 3, 0, 6, 1, 5 citations respectively.
Since the researcher has 3 papers with at least 3 citations each and the remaining two with no more than 3 citations each, their h-index is 3.
有三篇論文,引用數都大於等於3
分別是[3, 6, 5] 這三篇
Example 2:
Input: citations = [1,3,1]
Output: 1
有一篇論文,引用數都大於等於1
任選一篇都 >= 1
n == citations.length
1 <= n <= 5000
0 <= citations[i] <= 1000
除了傳統的O( n log n) 的排序演算法的解法以外。
還有O(n) Counting-based的計數排序法的解法。
把被引用數視為一個桶子,每當掃描過一篇論文,就把那篇論文對應到的被引用數累加一。
最後從高引用數掃描到低引用數,累積論文數量,如果當下累積論文數量已經大於等於被引用數,符合定義,則該引用數就是我們的h指標,也就是h index。
註:
實作上有一個空間優化的技巧,因為h index最大值 = 總論文篇數 = n,所以桶子的上邊界設在 總論文篇數 n 即可。
class Solution:
def hIndex(self, citations: List[int]) -> int:
n = len(citations)
# key: citation
# value: counting of papaers with specific citations
counting_of = [ 0 for _ in range(n+1) ]
# Update counting_of table
for citation in citations:
if citation >= n:
# H index is up to n at most
counting_of[n] += 1
else:
counting_of[citation] += 1
# Compute h index
# The given researcher has published at least h papers that have each been cited at least h times.
couting_of_at_least_h = 0
for citation in range(n,-1,-1):
couting_of_at_least_h += counting_of[citation]
if couting_of_at_least_h >= citation:
return citation
return 0
時間複雜度:
O( n ) 每個論文掃描一次,引用數也掃描一次,皆為O(n)
空間複雜度:
O(n) h index 最大值頂多和論文總篇數相等,所以counting_of 佔用空間為O(n)
由定義出發,想到被引用數可以和Counting sort演算法模板結合。從高引用數掃描到低引用數,累積論文數量,如果當下累積論文數量已經大於等於被引用數,符合定義,則該引用數就是我們的h指標,也就是h index。
Reference:
[1] Python sol by sorting. [w/ Comment] - H-Index - LeetCode