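Partner: can synthetic data do, "together" with real data, the things real data does? (Can we do the same things with synthetic data that we do with real data?)
Replacement: can synthetic data do, "on its own", the things real data does? (Can we do the same things to synthetic data that we do to real data?)
Which one seems more reasonable to you? I find the latter the more ideal.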
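Thought #2: Partner or replacement, it really depends on the task at hand.
With synthetic data as a partner, the tasks are
Building models
Carrying out data analysis
Testing hypotheses
With synthetic data as a replacement, the tasks are
Linking separate datasets
Extending the synthetic dataset when new records are added to the original dataset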
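Both readings make sense, but they do not seem to match how I myself understand "with" and "to".
Here, "with" feels like replacement.
Here, "to" feels more like partner.
I need to keep writing to refine this.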
Thought #3: Learning from synthetic data is still largely unknown territory.
The main concern is bias when the data is private.
A particular concern for private data is bias. Ghalebikesabi et al. [34] warn against the risks of learning from synthetic data, and propose a methodology for learning unbiasedly from such data. Wilde et al. [35] demonstrate superior performance when model parameters are updated using Bayesian inference, rather than approaches that fail to account for the fact the training data is synthetic.
Updating the model parameters with Bayesian inference performs better. [35] Harrison Wilde, Jack Jewson, Sebastian Vollmer, and Chris Holmes. Foundations of Bayesian learning from synthetic data. In International Conference on Artificial Intelligence and Statistics, pages 541–549. PMLR, 2021. 🤯🤯🤯🤯🤯
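To make "accounting for the fact that the training data is synthetic" a bit more concrete for myself, here is a toy sketch. It is not the method of Wilde et al. [35]; it is only a tempered Beta-Bernoulli update that I am assuming as a stand-in, where each synthetic record counts as a fraction `w` of a real one. The function names, the weight `w`, and the toy data are all hypothetical.

```python
# Toy illustration only, NOT the method from [35]:
# a Beta-Bernoulli posterior where synthetic observations are down-weighted.

def beta_posterior(prior_a, prior_b, real, synthetic, w=0.5):
    """Return (a, b) of the Beta posterior given 0/1 real and synthetic data.

    Real observations count fully; each synthetic observation counts as w.
    w is a hypothetical trust weight in [0, 1], not a value from any paper.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be in [0, 1]")
    real_ones = sum(real)
    synth_ones = sum(synthetic)
    a = prior_a + real_ones + w * synth_ones
    b = prior_b + (len(real) - real_ones) + w * (len(synthetic) - synth_ones)
    return a, b


def posterior_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Made-up data: the synthetic sample is skewed towards 1s.
real = [1, 0, 1, 0]                # 2 ones, 2 zeros
synthetic = [1] * 8 + [0] * 2      # 8 ones, 2 zeros

naive = beta_posterior(1, 1, real, synthetic, w=1.0)     # treat synthetic as real
tempered = beta_posterior(1, 1, real, synthetic, w=0.5)  # trust synthetic half as much
real_only = beta_posterior(1, 1, real, synthetic, w=0.0) # ignore synthetic entirely

for name, (a, b) in [("naive", naive), ("tempered", tempered), ("real_only", real_only)]:
    print(name, round(posterior_mean(a, b), 3))
```

The smaller `w` is, the less the skewed synthetic sample pulls the posterior away from what the real data says. How to choose that weight in a principled way is the hard part, and this sketch does not address it.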