如果你正在學習SQL,這邊介紹一個方便簡易的方式,不須透過安裝傳統的MySQL,只要利用Google提供的BigQuery開源示範dataset就可以實作練習,另一方面,如果你想應用自己手邊上的資料,也能簡單的使用匯入功能(如何匯入看
這邊)。
接著下來,我們開始以公開專案「data-to-insights」示範如何操作,第一步,我們需先啟用BigQuery 沙盒,以免查詢使用了幾次就不小心噴錢。
借助 BigQuery 沙盒,可以免費探索BigQuery功能,以確認 BigQuery 是否滿足需求。有了沙盒,不需要信用卡、創建結算帳號或為項目啟用結算功能就能體驗BigQuery 和 Google Cloud 控制台。(詳細內容可參考官方說明與設定)
第二步,加入「data-to-insights」專案到你的環境裡。
- 點選Navigation menu > BigQuery。
- BigQuery的公開資料集是沒有預先展示在BigQuery web UI中,所以我們就必須新增公開資料集的專案,點選+ Add Data。
- 選擇Pin project > Enter project name。
- 輸入名稱「data-to-insights」。
- 最後點擊pin。
探索ecommerce的資料並且找出重複的列數據
我們選擇「data-to-insights」專案底下「ecommerce」的dataset,然後點選「all_sessions_raw」的table。
在右邊的出現的窗格中,有三個tab提供此table的基本資訊。
- Schema tab: 就是table的schema結購
- Details tab: 這個table的大小與資料的總比較
- Preview tab: 預覽這個table的資料
輸入以下的SQL Query,找出全部欄位完全重複的row。
#standardSQL
SELECT COUNT(*) as num_duplicate_rows, * FROM
`data-to-insights.ecommerce.all_sessions_raw`
GROUP BY
fullVisitorId, channelGrouping, time, country, city, totalTransactionRevenue, transactions, timeOnSite, pageviews, sessionQualityDim, date, visitId, type, productRefundAmount, productQuantity, productPrice, productRevenue, productSKU, v2ProductName, v2ProductCategory, productVariant, currencyCode, itemQuantity, itemRevenue, transactionRevenue, transactionId, pageTitle, searchKeyword, pagePathLevel1, eCommerceAction_type, eCommerceAction_step, eCommerceAction_option
HAVING num_duplicate_rows > 1;
另外,重複相似的動作,確認另一個「all_sessions」的table,有沒有重複的row。
#standardSQL
# schema: https://support.google.com/analytics/answer/3437719?hl=en
SELECT
fullVisitorId, # the unique visitor ID
visitId, # a visitor can have multiple visits
date, # session date stored as string YYYYMMDD
time, # time of the individual site hit (can be 0 to many per visitor session)
v2ProductName, # not unique since a product can have variants like Color
productSKU, # unique for each product
type, # a visitor can visit Pages and/or can trigger Events (even at the same time)
eCommerceAction_type, # maps to ‘add to cart', ‘completed checkout'
eCommerceAction_step,
eCommerceAction_option,
transactionRevenue, # revenue of the order
transactionId, # unique identifier for revenue bearing transaction
COUNT(*) as row_count
FROM
`data-to-insights.ecommerce.all_sessions`
GROUP BY 1,2,3 ,4, 5, 6, 7, 8, 9, 10,11,12
HAVING row_count > 1 # find duplicates
這次我們的GROUP BY使用數字代表,可以簡略寫code的時間,「all_sessions」顯示的是已經排除重複row的結果。
接著,我們就針對已經清理過「all_sessions」table,寫一個簡單的Query去了解每一個來源渠道有多少不重複使用者造訪。
#standardSQL
SELECT
COUNT(DISTINCT fullVisitorId) AS unique_visitors,
channelGrouping
FROM `data-to-insights.ecommerce.all_sessions`
GROUP BY channelGrouping
ORDER BY channelGrouping DESC;
我們接下來想知道所有使用者瀏覽次數最多的前五個產品類別是哪些(依降冪排序)。
#standardSQL
SELECT
COUNT(*) AS product_views,
(v2ProductName) AS ProductName
FROM `data-to-insights.ecommerce.all_sessions`
WHERE type = 'PAGE'
GROUP BY v2ProductName
ORDER BY product_views DESC
LIMIT 5;
進一步,我們可以使用 SQL 的WITH 子句協助將比較復雜的Query分解為多個步驟。 在這裡,我們首先創建一個Query,查找每個使用者瀏覽的不重複產品類別,接著,對所有使用者和產品執行聚合,計算不重複瀏覽數。
WITH unique_product_views_by_person AS (
-- find each unique product viewed by each visitor
SELECT
fullVisitorId,
(v2ProductName) AS ProductName
FROM `data-to-insights.ecommerce.all_sessions`
WHERE type = 'PAGE'
GROUP BY fullVisitorId, v2ProductName )
-- aggregate the top viewed products and sort them
SELECT
COUNT(*) AS unique_view_count,
ProductName
FROM unique_product_views_by_person
GROUP BY ProductName
ORDER BY unique_view_count DESC
LIMIT 5
挑戰1:產品瀏覽後轉換率
計算出所有的訂單數與訂單量。
#standardSQL
SELECT
COUNT(*) AS product_views,
COUNT(productQuantity) AS orders,
SUM(productQuantity) AS quantity_product_ordered,
v2ProductName
FROM `data-to-insights.ecommerce.all_sessions`
WHERE type = 'PAGE'
GROUP BY v2ProductName
ORDER BY product_views DESC
LIMIT 5;
下一步,計算每個產品的瀏覽轉換率(假設排除帶有frisbees的商品名稱,且希望總訂單量至少大於1000單位)。
#standardSQL
SELECT
COUNT(*) AS product_views,
COUNT(productQuantity) AS potential_orders,
SUM(productQuantity) AS quantity_product_added,
(COUNT(productQuantity) / COUNT(*)) AS conversion_rate,
v2ProductName
FROM `data-to-insights.ecommerce.all_sessions`
WHERE LOWER(v2ProductName) NOT LIKE '%frisbee%'
GROUP BY v2ProductName
HAVING quantity_product_added > 1000
ORDER BY conversion_rate DESC
LIMIT 10;
挑戰2:追蹤用戶的結帳程序
探索完成結帳程序的每一步驟有多少不重覆使用者。
#standardSQL
SELECT
COUNT(DISTINCT fullVisitorId) AS number_of_unique_visitors,
eCommerceAction_type
FROM `data-to-insights.ecommerce.all_sessions`
GROUP BY eCommerceAction_type
ORDER BY eCommerceAction_type;
把原本的步驟數字加上有意義的label名稱。
#standardSQL
SELECT
COUNT(DISTINCT fullVisitorId) AS number_of_unique_visitors,
eCommerceAction_type,
CASE eCommerceAction_type
WHEN '0' THEN 'Unknown'
WHEN '1' THEN 'Click through of product lists'
WHEN '2' THEN 'Product detail views'
WHEN '3' THEN 'Add product(s) to cart'
WHEN '4' THEN 'Remove product(s) from cart'
WHEN '5' THEN 'Check out'
WHEN '6' THEN 'Completed purchase'
WHEN '7' THEN 'Refund of purchase'
WHEN '8' THEN 'Checkout options'
ELSE 'ERROR'
END AS eCommerceAction_type_label
FROM `data-to-insights.ecommerce.all_sessions`
GROUP BY eCommerceAction_type
ORDER BY eCommerceAction_type;
所以從上面的結果,我們可以得知加入購物車後實際完成購買的使用者比率為19988 / 56010 = 35.68%
挑戰3:追蹤高品質工作階段的購物車放棄數
#standardSQL
# high quality abandoned carts
SELECT
#unique_session_id
CONCAT(fullVisitorId,CAST(visitId AS STRING)) AS unique_session_id,
sessionQualityDim,
SUM(productRevenue) AS transaction_revenue,
MAX(eCommerceAction_type) AS checkout_progress
FROM `data-to-insights.ecommerce.all_sessions`
WHERE sessionQualityDim > 60 # high quality session
GROUP BY unique_session_id, sessionQualityDim
HAVING
checkout_progress = '3' # 3 = added to cart
AND (transaction_revenue = 0 OR transaction_revenue IS NULL)