2023-10-10 | Reading time: about 2 minutes

Python web scraper: top 10 holdings of a Taiwan stock ETF (with stock codes and weights)


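import re
from json import loads

import pandas as pd
import requests
from bs4 import BeautifulSoup

# ETF code to look up, e.g. "0056"
# stock_code = "0056"
stock_code = input("stock code:")

# Download the ETF's holdings page from the Yahoo Taiwan stock site
url = "https://tw.stock.yahoo.com/quote/{}.TW/holding".format(stock_code)
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# The page data is embedded in an inline script as `root.App.main = {...}`
script = soup.find("script", string=re.compile(r"root\.App\.main")).text
data = re.search(r"root\.App\.main\s+=\s+(\{.*\})", script).group(1)
print(data)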

# Collect every [...] span and keep the one that holds the holdings records
result = re.findall(r"\[(.*?)\]", data, re.I | re.M)
dict_data = ""
for item in result:
    if "ticker" in item and "weighting" in item:
        dict_data = item
print(dict_data)

# Wrap the extracted records in a JSON object so they can be parsed
dict_data_mod = '{"holdingDetail":[' + dict_data + "]}"
print(dict_data_mod)
text = loads(dict_data_mod)
print(text["holdingDetail"][0]["name"])

# Load the records into a DataFrame for display
new = pd.DataFrame.from_dict(text["holdingDetail"])
print("{} top 10 constituent stocks".format(stock_code))
print(new)
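
The script above picks out the holdings with a non-greedy `\[(.*?)\]` regex, which can cut a record short if any value inside it contains its own square brackets. A possible alternative, sketched here under assumptions, is to parse the whole `root.App.main` object as JSON and search it for the first list whose items all carry the `ticker` and `weighting` keys. The helper name `find_holdings` and the sample payload are hypothetical and made up for illustration; the real page's nesting is not shown in this post, so the search deliberately does not depend on any key path.

```python
import json
from typing import Any, Optional


def find_holdings(node: Any) -> Optional[list]:
    """Depth-first search for the first non-empty list whose items are all
    dicts containing both 'ticker' and 'weighting' (hypothetical helper)."""
    if isinstance(node, list):
        if node and all(
            isinstance(x, dict) and "ticker" in x and "weighting" in x
            for x in node
        ):
            return node
        children = node
    elif isinstance(node, dict):
        children = list(node.values())
    else:
        # Strings, numbers, booleans and None cannot contain a holdings list
        return None
    for child in children:
        found = find_holdings(child)
        if found is not None:
            return found
    return None


# Made-up payload standing in for the parsed `root.App.main` object;
# the tickers, names, weights and nesting are placeholders, not real data.
sample = (
    '{"a": {"b": [1, 2], "holdingDetail": ['
    '{"ticker": "AAAA", "name": "Example A", "weighting": 1.5}, '
    '{"ticker": "BBBB", "name": "Example B", "weighting": 0.5}]}}'
)
holdings = find_holdings(json.loads(sample))
print([(h["ticker"], h["weighting"]) for h in holdings])
```

With the real page, `json.loads(data)` would take the place of `json.loads(sample)`, and the returned list could be passed to `pd.DataFrame` in the same way as `text["holdingDetail"]`.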
