R 套件筆記 | 在有條件的情況下使用 lpSolve 對樣品進行分組

更新於 2024/09/02發佈於 2023/09/03閱讀時間約 11 分鐘

最近遇到了一個思考有點久的問題，如何將數值不同的樣品進行分組，同時每組的總和不能超越固定的上限值，且某些樣品不能被分在同一組。雖然規則並不困難，但是人工進行分組卻頗花時間，是不是有可能自動化處理這個步驟？如果有可能的話想要產出多個不同的分組方法，最後由人工選擇最佳的分組方式。

沒有處理過有條件的分組的我不知如何開始進行，一開始有思考過用硬幹的方式，將樣品依照數值進行排序，從最大的樣品開始丟入組別，如果放不下就創一個新組別丟入，這樣的方式只會得到一組解，且可能發生最後一個組別總和特別少的情形，形成一種浪費。更糟糕的是加上各種條件之後容易有邏輯混亂的的問題，寫一堆 if else 讓人昏頭，一想到就覺得很可怕。

lpSolve 套件

用線性規劃 (Linear Programming, LP) 的方式思考這個問題會簡單許多，線性規劃在目標函數 (objective) 和限制條件 (constraint) 皆為線性下進行最佳化求解。 R 套件 lpSolve 便是用於解決線性規劃和整數線性規劃的問題，以下是最簡單線性組合求解問題：

目標函數，求其最大解
x + 9 y+ z
限制條件
x + 2y + 3z ≤ 9
3x + 2y + 2z ≤ 15

#install.packages("lpSolve")
library(lpSolve)

#目標函數
f.obj <- c(1, 9, 1)
#限制條件矩陣
f.con <- matrix (c(1, 2, 3, 3, 2, 2), nrow=2, byrow=TRUE)
f.dir <- c("<=", "<=")
f.rhs <- c(9, 15)

#限制條件下求目標函數最大解
lp ("max", f.obj, f.con, f.dir, f.rhs)
## Not run: Success: the objective function is 40.5
lp ("max", f.obj, f.con, f.dir, f.rhs)$solution
## Not run: [1] 0.0 4.5 0.0

限制條件矩陣可以選擇用 "const.mat" 或是 "dense.const" 參數輸入，使用 "const.mat" 較為直觀，但我認為 "dense.const" 在之後多條件式輸入上會比較方便一點，尤其是參數較多的時候比較不會亂掉。

# The same problem using the dense constraint approach:
#
f.con.d <- matrix (c(rep (1:2,each=3), rep (1:3, 2), t(f.con)), ncol=3)
lp ("max", f.obj, , f.dir, f.rhs, dense.const=f.con.d)

const.mat 和 dense.const 的差異

一般較常看到使用線性規劃解決生產資源分配已達到最佳化等問題，一開始沒有想到可以用這個套件解決我的問題。不過在看到 lp 函數範例中，處理 8 皇后問題給了我一些靈感。

八皇后問題

八皇后一個經典的問題，在 8×8 的西洋棋棋盤上放置 8 個皇后，使得任何一個皇后都無法直接吃掉其他的皇后。也就是說，任兩個皇后都不能處於同一條行、列或斜線上。此問題扣除旋轉和對稱之後，有 12 個獨立的解法。

在 lp 函式解法，會將棋盤上每個位置進行編號 1~64，再輸入已經寫好的限制條件，共 42 個條件包含：每個行或列都會有 1 個皇后，斜線會有小於 1 個皇后，再限制解為 binary，並設定輸出多個解（只有 binary 可以設定多個解）。

chess.obj <- rep (1, 64)
#載入已寫好的限制條件 dense.const 矩陣
q8 <- make.q8 ()
chess.dir <- rep (c("=", "<"), c(16, 26))
chess.rhs <- rep (1, 42)
#輸出 3 個解
chess = lp ('max', chess.obj, , chess.dir, chess.rhs, dense.const = q8, 
    all.bin=TRUE, num.bin.solns=3)
#拆開解(solution 最後會有一個多餘的 -1)
res <- split(chess$solution[1:(3*64)], rep(1:3, each=64))
# 3 個矩陣解
matrix(res$`1`,8)
matrix(res$`2`,8)
matrix(res$`3`,8)

八皇后問題限制條件

限制條件下分配組別

回到分組的問題，如果把分組的結果當成棋盤，就可以用相同的邏輯進行。

首先計算所有樣品的總和，判斷分出的組數，決定矩陣的大小，之後就可以加入所有限制條件，這時候用 dense.const 輸入會比較容易一些，不同的類型的條件式各別產出之後再合併輸入 lp 函式。

分組限制條件

library(lpSolve)

#每個樣品的量
weights <- c(100, 150, 20, 50, 80, 120, 10, 240, 85, 90)
#樣品種類，同種類不可同一組
type = c("A","B","C","D","E","A","B","F","G","H")
samples = length(weights)
#每個組總和上限
weight.max <- 300

#計算樣品總量 -> 組別數
groups=ceiling(sum(weights)/weight.max)

##矩陣大小
obj=rep (1, samples*groups)
length.obj=length(obj)

####限制條件####

#每組上限
group.const=matrix(c(rep(c(1:groups),each=samples),
                    c(1:length.obj),
                    rep(weights,groups)), ncol=3)
group.dir=rep("<=",groups)
group.rhs=rep(weight.max,groups)

#不可重複分配
#從第一行位置推算所有行位置
uni.const.i=seq(1,1+(groups-1)*samples,samples)
uni.const.2=c()
for(i in 1:samples){
  uni.const.2=c(uni.const.2,uni.const.i+(i-1))
}
uni.const=matrix(c(rep(c((groups+1):(groups+samples)),each=groups),
                   uni.const.2,
                   rep(1,length.obj)),ncol=3)
uni.dir=rep("=",samples)
uni.rhs=rep(1,samples)

#相同種類不可分在同一組
type.dup=type[duplicated(type)]
#從第一列位置推算所有列位置
dup.const.1=c()
dup.const.2=c()
for(j in 1:length(type.dup)){
  dup.const.i=which(type==type.dup[j])
  for(i in 1:groups){
    dup.const.2=c(dup.const.2,dup.const.i+(i-1)*samples)
    dup.const.1=c(dup.const.1,
                  rep((groups+samples+(j-1)*groups+i),length(dup.const.i)))
  }
}

dup.const=matrix(c(dup.const.1,
                   dup.const.2,
                   rep(1,length(dup.const.1))),ncol=3)
dup.dir=rep("<=",length(unique(dup.const.1)))
dup.rhs=rep(1,length(unique(dup.const.1)))

####合併篩選條件####
fin.const=rbind(group.const,uni.const,dup.const)
fin.dir <- c(group.dir,uni.dir,dup.dir)
fin.rhs <- c(group.rhs,uni.rhs,dup.rhs)

###設定候選組合
num.solns=3
x=lp ('max', objective.in = obj, const.dir = fin.dir, const.rhs = fin.rhs, 
      dense.const = fin.const, 
      all.bin=TRUE, num.bin.solns=num.solns)

#將結果分開
res <- split(x$solution[1:(num.solns*length.obj)], rep(1:num.solns, each=length.obj))

#每個組合和組總量
rbind(weights * matrix(res$`1`,samples),weights %*% matrix(res$`1`,samples))
rbind(weights * matrix(res$`2`,samples),weights %*% matrix(res$`2`,samples))
rbind(weights * matrix(res$`3`,samples),weights %*% matrix(res$`3`,samples))