A Practical Kubernetes Guide for Beginners, Part 3: Basic Post-Install Tasks and Troubleshooting

2024/01/16 · Reading time: about 26 minutes

This is the last part of the basic setup series. In this article I will walk through the basic components I install right after a Kubernetes cluster is up. Beyond the installation steps themselves, I will also share the problems I hit along the way and how I solved them.


Some of these tools have alternatives, but from a beginner's point of view they are still a useful reference. This article covers:

  1. Calico/Calicoctl
  2. Metrics Server
  3. Dashboard UI
  4. Conclusion

1. Calico/Calicoctl

First, a CNI plugin is a required component, and there are many to choose from; here we use Calico. Calico itself can be installed in two ways, via the operator or via manifests; this article uses the manifest approach.

Before installing, check Calico's firewall requirements on every node (the original post showed them as a screenshot of Calico's network-requirements table).
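As a sketch of what those firewall openings look like: the port numbers below are the ones Calico's documentation lists (TCP 179 for BGP, TCP 5473 for Typha, UDP 4789 only when VXLAN mode is used), and the commands assume firewalld; verify both against your Calico version and mode.

```shell
# Open the ports Calico typically needs between nodes (run on every node).
# Assumption: firewalld; falls back to a dry-run message when it is absent.
for p in 179/tcp 5473/tcp 4789/udp; do
  if command -v firewall-cmd >/dev/null 2>&1; then
    firewall-cmd --permanent --add-port="$p"
  else
    echo "would open $p"   # dry run when firewalld is not installed
  fi
done
command -v firewall-cmd >/dev/null 2>&1 && firewall-cmd --reload || true
```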
#------------------------------------------------------
# S1-1. Install via manifest (master01) (LAB)
#------------------------------------------------------
[master]# wget https://docs.projectcalico.org/manifests/calico.yaml
[master]# vim calico.yaml (check the pod network CIDR)
# no effect. This should fall within `--cluster-cidr`.
- name: CALICO_IPV4POOL_CIDR
  value: "192.168.0.0/16"

[master]# kubectl create -f calico.yaml
[master]# watch kubectl get pods -n kube-system
※ Question: calico-node not ready

# kubectl describe pod calico-node-ngznh -n kube-system
kubelet (combined from similar events): Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused

Locate the problem:
[master]# kubectl get ds -A
[master]# kubectl get ds -n kube-system calico-node -oyaml | grep -i ip_auto -C 3
[master]# kubectl exec -ti <pod_name> -n kube-system -- bash
#/ cat /etc/calico/confd/config/bird.cfg
=> Check which address BIRD uses as the router id
=> Map that address back to a physical interface (Calico's BGP uses a physical NIC as the virtual router)

※ Solution:
[Method 1] Verify the firewall ports
[Method 2] Force a specific interface
[master]# kubectl delete -f calico.yaml
[master]# vim calico.yaml
>> search for:
- name: IP
  value: "autodetect"
>> add below it:
- name: IP_AUTODETECTION_METHOD
  value: "interface=ens192"   (separate multiple NICs with ",")

[master]# kubectl create -f calico.yaml
[master]# kubectl get pods -n kube-system

[Why] The default "first-found" method picks the host's first NIC, but it expects names matching "eth*", so on a VM whose NICs are named ens* the detection fails.
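Before hard-coding an interface into IP_AUTODETECTION_METHOD, it helps to list the host's NIC names first. A quick local check (nothing Calico-specific, just the kernel's interface list):

```shell
# List candidate NIC names (e.g. ens192) to use in IP_AUTODETECTION_METHOD.
# Assumption: a Linux host, where /sys/class/net lists all interfaces.
for dev in /sys/class/net/*; do
  name=$(basename "$dev")
  [ "$name" = "lo" ] || echo "$name"   # skip the loopback interface
done
```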

(REF: https://www.unixcloudfusion.in/2022/02/solved-caliconode-is-not-ready-bird-is.html)
#------------------------------------------------------
# S1-2. Install calicoctl (binary) (master01)
#------------------------------------------------------
[master]# wget https://github.com/projectcalico/calico/releases/download/v3.26.3/calicoctl-linux-amd64 -O calicoctl
[master]# chmod +x calicoctl ; cp calicoctl /usr/local/bin/
[master]# calicoctl version

Client Version: v3.26.3
Git commit: bdb7878af
Cluster Version: v3.26.1
Cluster Type: k8s,bgp,kubeadm,kdd
#------------------------------------------------------
# S1-3. Verify with calicoctl (master01)
#------------------------------------------------------
[master]# mkdir -p /data/calico
[master]# cd /data/calico ; vim calicoctl.cfg
apiVersion: projectcalico.org/v3
kind: CalicoAPIConfig
metadata:
spec:
  datastoreType: 'kubernetes'
  kubeconfig: '/etc/kubernetes/admin.conf'

[master]# calicoctl node status
[master]# calicoctl get ipPool --allow-version-mismatch
NAME                  CIDR             SELECTOR
default-ipv4-ippool   192.168.0.0/16   all()

※ node-to-node mesh: since this cluster is small, we simply keep BGP full mesh between all nodes; as of v3.4.0 this already supports more than 100 nodes.
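For reference, if the cluster later outgrows full mesh, Calico can disable it through a BGPConfiguration resource (a sketch only, not applied in this lab; the asNumber shown is Calico's documented default, and route reflectors would then have to be set up separately):

```yaml
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  nodeToNodeMeshEnabled: false   # turn off full mesh; peer via route reflectors instead
  asNumber: 64512                # Calico's default AS number
```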

#------------------------------------------------------
# S1-4. Test
#------------------------------------------------------
[master]# vim nginx-quic.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: nginx-quic
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-quic-deployment
  namespace: nginx-quic
spec:
  selector:
    matchLabels:
      app: nginx-quic
  replicas: 4
  template:
    metadata:
      labels:
        app: nginx-quic
    spec:
      containers:
      - name: nginx-quic
        image: tinychen777/nginx-quic:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-quic-service
  namespace: nginx-quic
spec:
  selector:
    app: nginx-quic
  ports:
  - protocol: TCP
    port: 8080        # service access port
    targetPort: 80    # pod access port
    nodePort: 30088   # external access port
  type: NodePort

[master]# kubectl create -f nginx-quic.yaml
[master]# kubectl get deployment -o wide -n nginx-quic
[master]# kubectl get service -o wide -n nginx-quic
[master]# kubectl get pods -o wide -n nginx-quic

-----
※ Test:
[master]# kubectl exec -it <pod> -- bash
# ping 192.168.50.66
64 bytes from 192.168.165.2: icmp_seq=1 ttl=62 time=0.652 ms
64 bytes from 192.168.165.2: icmp_seq=2 ttl=62 time=0.475 ms
64 bytes from 192.168.165.2: icmp_seq=3 ttl=62 time=0.465 ms
# curl 192.168.50.66:80
192.168.35.4:51610

[master]# curl worker02.test.example.poc:30088
[master]# curl worker03.test.example.poc:30088
[master]# curl worker01.test.example.poc:30088

2. Metrics Server

The metrics server is the cluster's aggregator of core resource-usage data, but kubeadm does not install it by default.

It mainly serves other components such as the Dashboard, and it relies on the API Aggregation layer, so aggregation must be enabled on kube-apiserver before installing it.

Its limitations:

  • The Metrics API only returns current values; historical data cannot be queried
  • metrics-server must be deployed before the API is available
  • The API server must enable aggregator routing, otherwise it cannot recognize the requests
  • The API server must be able to reach the metrics-server Pod IP, otherwise its calls to metrics-server fail
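The Metrics API can also be queried directly, which is handy for checking the aggregation path end to end. The first command needs a live cluster, so the extraction is demonstrated here against a hand-made sample response of the same shape (all values invented):

```shell
# With a live cluster you would call:
#   kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
# The response is JSON; below, a sample of the same shape (values invented):
sample='{"kind":"NodeMetricsList","items":[{"metadata":{"name":"worker01"},"usage":{"cpu":"137m","memory":"1133Mi"}}]}'
# Pull out the node names, the field "kubectl top nodes" formats for you:
echo "$sample" | grep -o '"name":"[^"]*"'   # → "name":"worker01"
```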
#---------------------------------------------------
# S2-1. Check whether the API Aggregator is enabled
#---------------------------------------------------
[master]# ps -ef | grep apiserver
=> check for "--enable-aggregator-routing=true"
#---------------------------------------------------
# S2-2. Edit kube-apiserver.yaml to enable the API Aggregator;
# the apiserver pod is recreated automatically after the change (all masters)
#---------------------------------------------------
[master]# vim /etc/kubernetes/manifests/kube-apiserver.yaml
...
spec:
containers:
- command:
...
- --enable-aggregator-routing=true

[master]# ps -ef | grep apiserver
#---------------------------------------------------
# S2-3. Confirm whether metrics-server is installed (master)
#---------------------------------------------------
[master]# kubectl top node
error: Metrics API not available
#---------------------------------------------------
# S2-4. Deploy metrics-server (HA) (master)
#---------------------------------------------------
[master]# wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/high-availability-1.21+.yaml
[master]# kubectl create -f high-availability-1.21+.yaml
#---------------------------------------------------
# S2-5. Check the Service
#---------------------------------------------------
[master]# kubectl get svc --all-namespaces | grep metrics-server
kube-system   metrics-server   ClusterIP   10.104.138.184   <none>   443/TCP   53s
#---------------------------------------------------
# S2-6. Verify the API server can reach metrics-server
#---------------------------------------------------
[master]# kubectl describe svc metrics-server -n kube-system
[master]# ping <Endpoint>
[Question]

The pod is not running; `kubectl describe pod` shows "Readiness probe failed: HTTP probe failed with statuscode: 500".
The pod log shows: "x509: cannot validate certificate for 10.107.88.16 because it doesn't contain any IP SANs" node="worker02.test.example.poc"

[Debug]
(1) [master]# vim high-availability-1.21+.yaml
=> In the readiness section, an "httpGet" probe makes kubelet periodically send an HTTP request to /readyz to check the pod's state:
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /readyz
    port: https
    scheme: HTTPS
  initialDelaySeconds: 20   << wait 20s before the first check after the container starts
  periodSeconds: 10         << check every 10s

(2) For the "x509" error, the official install notes say:
Kubelet certificate needs to be signed by cluster Certificate Authority (or disable certificate validation by passing --kubelet-insecure-tls to Metrics Server)


[Workaround]
In a test environment, the "--kubelet-insecure-tls" flag is acceptable:
[master]# vim high-availability-1.21+.yaml
[master]# kubectl create -f high-availability-1.21+.yaml
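What that edit looks like inside the manifest: the flag goes into the metrics-server container's args. The other args shown are the usual metrics-server defaults and may differ between releases, so treat this as a sketch and keep whatever your downloaded file already has:

```yaml
    spec:
      containers:
      - name: metrics-server
        args:
        - --cert-dir=/tmp
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls   # lab only: skips kubelet certificate validation
```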
[Question]

"Failed to scrape node" err="Get \"https://10.107.88.17:10250/metrics/resource\": dial tcp 10.107.88.17:10250: connect: no route to host" node="worker03.test.example.poc"
=> Check the firewall (the kubelet port 10250 must be reachable from the metrics-server pods)

[Question] `kubectl get apiservice` shows FailedDiscoveryCheck
=> The link between Calico and metrics-server is sometimes unstable right after install; restarting the Kubernetes services on the node (or rebooting it) clears the error.
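A guarded sketch of that restart (service names assume a kubeadm node running containerd as the runtime; run as root on the affected node):

```shell
# Restart the node-level Kubernetes services; prints a message instead of
# failing when run off-node or without root.
for svc in kubelet containerd; do
  systemctl restart "$svc" 2>/dev/null \
    || echo "could not restart $svc here (run as root on the node)"
done
```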
#---------------------------------------------------
# S2-7. Confirm that metrics-server is deployed correctly
#---------------------------------------------------
[master]# kubectl get apiservice
[master]# kubectl top nodes

3. Dashboard UI

#-----------------------------------------------------------
# S3-1. Download the manifest and deploy it
#-----------------------------------------------------------
[master]# wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml
=> change the Service type to NodePort, then apply:
[master]# kubectl create -f recommended.yaml
[master]# kubectl get pod -n kubernetes-dashboard -o wide
[master]# kubectl get svc -n kubernetes-dashboard
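The NodePort edit in recommended.yaml looks roughly like this (the service/target ports are the ones the v2.7.0 manifest ships with; the nodePort value 31000 is our own choice and matches the URL used in the next step):

```yaml
kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
spec:
  type: NodePort          # changed from the default ClusterIP
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 31000     # external access port
  selector:
    k8s-app: kubernetes-dashboard
```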
#-----------------------------------------------------------
# S3-2. Log in to the UI
#-----------------------------------------------------------
https://<node_ip>:31000

Login methods:

  • Token => a token stored in a Secret bound to a ServiceAccount (since v1.24 this Secret is no longer created automatically, which is why we create it by hand below)
[master]# kubectl get secrets -n kubernetes-dashboard
[master]# kubectl get sa -n kubernetes-dashboard
#------------------------------------------------------------------
# Create the ServiceAccount
#------------------------------------------------------------------
[master]# vim dashboard-admin.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard

[master]# kubectl create -f dashboard-admin.yaml
#-----------------------------------------------------------------
# Create the Secret and retrieve the token
#----------------------------------------------------------------
[master]# vim dashboard-admin-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
  annotations:
    kubernetes.io/service-account.name: "admin-user"
type: kubernetes.io/service-account-token

[master]# kubectl create -f dashboard-admin-secret.yaml
[master]# kubectl get secrets -n kubernetes-dashboard
NAME                              TYPE                                  DATA   AGE
admin-user                        kubernetes.io/service-account-token   3      4s
kubernetes-dashboard-certs        Opaque                                0      23m
kubernetes-dashboard-csrf         Opaque                                1      23m
kubernetes-dashboard-key-holder   Opaque                                2      23m

[master]# kubectl get secret admin-user -n kubernetes-dashboard -o jsonpath='{.data.token}' | base64 -d
=> Paste the token into the UI to log in
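On kubectl v1.24 or newer there is also a shortcut: a short-lived token can be requested directly through the TokenRequest API, with no long-lived Secret involved at all. The command needs cluster access, so the sketch below falls back to a message when run offline:

```shell
# Mint a short-lived Dashboard login token for the admin-user ServiceAccount.
token=$(kubectl -n kubernetes-dashboard create token admin-user 2>/dev/null \
  || echo "unavailable: needs kubectl >= 1.24 and cluster access")
echo "$token"
```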

4. Conclusion

These are the three things I routinely do right after the initial deployment of a cluster. Depending on your needs, adding a few more tools will make it even more usable.

Tools I would add next myself:

  • MetalLB
  • Nginx ingress controller
  • kubens
  • NFS StorageClass
  • LDAP integration
  • Grafana Loki
  • Prometheus

For Kubernetes beginners: once you have worked through these three articles, you have a working Kubernetes cluster ready to use.

Thanks for reading, and see you in the next article.
