Google Data Engineer Certification Question Bank Compilation 20241118
A complete archive of Google Cloud Platform (GCP) exam questions: the 2024 question bank, continuously updated and the most comprehensive collection available. The GCP certification carries significant weight and is essential for self-study and for moving into the cloud industry. Recent versions are updated regularly, so you can track the latest trends at any time.
QUESTION 81
MJTelco Case Study
Company Overview
MJTelco is a startup that plans to build networks in rapidly growing, underserved markets around the world. The company has patents for innovative optical communications hardware. Based on these patents, they can create many reliable, high-speed backbone links with inexpensive hardware.
Company Background
Founded by experienced telecom executives, MJTelco uses technologies originally developed to overcome communications challenges in space. Fundamental to their operation, they need to create a distributed data infrastructure that drives real-time analysis and incorporates machine learning to continuously optimize their topologies. Because their hardware is inexpensive, they plan to overdeploy the network, allowing them to account for the impact of dynamic regional politics on location availability and cost. Their management and operations teams are situated all around the globe, creating a many-to-many relationship between data consumers and providers in their system. After careful consideration, they decided public cloud is the perfect environment to support their needs.
Solution Concept
MJTelco is running a successful proof-of-concept (PoC) project in its labs. They have two primary needs: Scale and harden their PoC to support significantly more data flows generated when they ramp to more than 50,000 installations.
Refine their machine-learning cycles to verify and improve the dynamic models they use to control topology definition.
MJTelco will also use three separate operating environments - development/test, staging, and production - to meet the needs of running experiments, deploying new features, and serving production customers.
Business Requirements
Scale up their production environment with minimal cost, instantiating resources when and where needed in an unpredictable, distributed telecom user community.
Ensure security of their proprietary data to protect their leading-edge machine learning and analysis.
Provide reliable and timely access to data for analysis from distributed research workers.
Maintain isolated environments that support rapid iteration of their machine-learning models without affecting their customers.
Technical Requirements
Ensure secure and efficient transport and storage of telemetry data.
Rapidly scale instances to support between 10,000 and 100,000 data providers with multiple flows each.
Allow analysis and presentation against data tables tracking up to 2 years of data, storing approximately 100 million records/day.
Support rapid iteration of monitoring infrastructure focused on awareness of data pipeline problems both in telemetry flows and in production learning cycles.
CEO Statement
Our business model relies on our patents, analytics, and dynamic machine learning. Our inexpensive hardware is organized to be highly reliable, which gives us cost advantages. We need to quickly stabilize our large distributed data pipelines to meet our reliability and capacity commitments.
CTO Statement
Our public cloud services must operate as advertised. We need resources that scale and keep our data secure. We also need environments in which our data scientists can carefully study and quickly adapt our models. Because we rely on automation to process our data, we also need our development and test environments to work as we iterate.
CFO Statement
The project is too large for us to maintain the hardware and software required for the data and analysis. Also, we cannot afford to staff an operations team to monitor so many data feeds, so we will rely on automation and infrastructure. Google Cloud's machine learning will allow our quantitative researchers to work on our high-value problems instead of problems with our data pipelines.
You need to compose visualization for operations teams with the following requirements:
Telemetry must include data from all 50,000 installations for the most recent 6 weeks (sampling once every minute).
The report must not be more than 3 hours delayed from live data.
The actionable report should only show suboptimal links.
Most suboptimal links should be sorted to the top.
Suboptimal links can be grouped and filtered by regional geography.
User response time to load the report must be <5 seconds.
You create a data source to store the last 6 weeks of data, and create visualizations that allow viewers to see multiple date ranges, distinct geographic regions, and unique installation types. You always show the latest data without any changes to your visualizations. You want to avoid creating and updating new visualizations each month. What should you do?
A. Look through the current data and compose a series of charts and tables, one for each possible combination of criteria.
B. Look through the current data and compose a small set of generalized charts and tables bound to criteria filters that allow value selection.
C. Export the data to a spreadsheet, compose a series of charts and tables, one for each possible combination of criteria, and spread them across multiple tabs.
D. Load the data into relational database tables, write a Google App Engine application that queries all rows, summarizes the data across each criterion, and then renders results using the Google Charts and visualization API.
Correct Answer: B
Section: (none)
QUESTION 82
MJTelco Case Study (see Question 81 for the full Company Overview, Company Background, Solution Concept, Business Requirements, Technical Requirements, and executive statements.)
Given the record streams MJTelco is interested in ingesting per day, they are concerned about the cost of Google BigQuery increasing. MJTelco asks you to provide a design solution. They require a single large data table called tracking_table. Additionally, they want to minimize the cost of daily queries while performing fine-grained analysis of each day's events. They also want to use streaming ingestion. What should you do?
A. Create a table called tracking_table and include a DATE column.
B. Create a partitioned table called tracking_table and include a TIMESTAMP column.
C. Create sharded tables for each day following the pattern tracking_table_YYYYMMDD.
D. Create a table called tracking_table with a TIMESTAMP column to represent the day.
Correct Answer: B
Section: (none)
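For the design chosen above (a single partitioned table keyed on a TIMESTAMP column, fed by streaming ingestion), here is a minimal sketch using the google-cloud-bigquery Python client. The project, dataset, and column names are illustrative assumptions, not part of the case study.

```python
# Minimal sketch: create a day-partitioned tracking_table keyed on a TIMESTAMP
# column so daily queries only scan the relevant partition.
# Assumes the google-cloud-bigquery client; project/dataset/column names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

schema = [
    bigquery.SchemaField("installation_id", "STRING"),
    bigquery.SchemaField("event_ts", "TIMESTAMP"),
    bigquery.SchemaField("metric_value", "FLOAT"),
]

table = bigquery.Table("my-project.telemetry.tracking_table", schema=schema)
# Partition by day on the TIMESTAMP column; queries that filter on event_ts
# prune partitions and therefore reduce bytes scanned (and cost).
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_ts",
)
client.create_table(table)

# Streaming ingestion into the partitioned table.
client.insert_rows_json(
    "my-project.telemetry.tracking_table",
    [{"installation_id": "site-001", "event_ts": "2024-11-18T00:00:00Z", "metric_value": 0.97}],
)
```

Fine-grained daily analysis then filters on event_ts, so each day's query touches only that day's partition rather than the whole table.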
QUESTION 83
Flowlogistic Case Study
Company Overview
Flowlogistic is a leading logistics and supply chain provider. They help businesses throughout the world manage their resources and transport them to their final destination. The company has grown rapidly, expanding their offerings to include rail, truck, aircraft, and oceanic shipping.
Company Background
The company started as a regional trucking company, and then expanded into other logistics markets. Because they have not updated their infrastructure, managing and tracking orders and shipments has become a bottleneck. To improve operations, Flowlogistic developed proprietary technology for tracking shipments in real time at the parcel level. However, they are unable to deploy it because their technology stack, based on Apache Kafka, cannot support the processing volume. In addition, Flowlogistic wants to further analyze their orders and shipments to determine how best to deploy their resources.
Solution Concept
Flowlogistic wants to implement two concepts using the cloud:
Use their proprietary technology in a real-time inventory-tracking system that indicates the location of their loads
Perform analytics on all their orders and shipment logs, which contain both structured and unstructured data, to determine how best to deploy resources and which markets to expand into. They also want to use predictive analytics to learn earlier when a shipment will be delayed.
Existing Technical Environment
Flowlogistic's architecture resides in a single data center:
Databases
- 8 physical servers in 2 clusters
- SQL Server: user data, inventory, static data
- 3 physical servers
- Cassandra: metadata, tracking messages
- 10 Kafka servers: tracking message aggregation and batch insert
Application servers: customer front end, middleware for order/customs
- 60 virtual machines across 20 physical servers
- Tomcat: Java services
- Nginx: static content
- Batch servers
Storage appliances
- iSCSI for virtual machine (VM) hosts
- Fibre Channel storage area network (FC SAN): SQL Server storage
- Network-attached storage (NAS): image storage, logs, backups
10 Apache Hadoop/Spark servers
- Core Data Lake
- Data analysis workloads
20 miscellaneous servers
- Jenkins, monitoring, bastion hosts
Business Requirements
Build a reliable and reproducible environment with scaled parity of production.
Aggregate data in a centralized Data Lake for analysis.
Use historical data to perform predictive analytics on future shipments.
Accurately track every shipment worldwide using proprietary technology.
Improve business agility and speed of innovation through rapid provisioning of new resources.
Analyze and optimize architecture for performance in the cloud.
Migrate fully to the cloud if all other requirements are met.
Technical Requirements
Handle both streaming and batch data.
Migrate existing Hadoop workloads.
Ensure architecture is scalable and elastic to meet the changing demands of the company.
Use managed services whenever possible.
Encrypt data in flight and at rest.
Connect a VPN between the production data center and cloud environment.
CEO Statement
We have grown so quickly that our inability to upgrade our infrastructure is really hampering further growth and efficiency. We are efficient at moving shipments around the world, but we are inefficient at moving data around.
We need to organize our information so we can more easily understand where our customers are and what they are shipping.
CTO Statement
IT has never been a priority for us, so as our data has grown, we have not invested enough in our technology. I have a good staff to manage IT, but they are so busy managing our infrastructure that I cannot get them to do the things that really matter, such as organizing our data, building the analytics, and figuring out how to implement the CFO's tracking technology.
CFO Statement
Part of our competitive advantage is that we penalize ourselves for late shipments and deliveries. Knowing where our shipments are at all times has a direct correlation to our bottom line and profitability. Additionally, I don't want to commit capital to building out a server environment.
Flowlogistic's management has determined that the current Apache Kafka servers cannot handle the data volume for their real-time inventory tracking system. You need to build a new system on Google Cloud Platform (GCP) that will feed the proprietary tracking software. The system must be able to ingest data from a variety of global sources, process and query in real-time, and store the data reliably. Which combination of GCP products should you choose?
A. Cloud Pub/Sub, Cloud Dataflow, and Cloud Storage
B. Cloud Pub/Sub, Cloud Dataflow, and Local SSD
C. Cloud Pub/Sub, Cloud SQL, and Cloud Storage
D. Cloud Load Balancing, Cloud Dataflow, and Cloud Storage
E. Cloud Dataflow, Cloud SQL, and Cloud Storage
Correct Answer: A
Section: (none)
QUESTION 84
After migrating ETL jobs to run on BigQuery, you need to verify that the output of the migrated jobs is the same as the output of the original. You've loaded a table containing the output of the original job and want to compare the contents with output from the migrated job to show that they are identical. The tables do not contain a primary key column that would enable you to join them together for comparison.
What should you do?
A. Select random samples from the tables using the RAND() function and compare the samples.
B. Select random samples from the tables using the HASH() function and compare the samples.
C. Use a Dataproc cluster and the BigQuery Hadoop connector to read the data from each table and calculate a hash from non-timestamp columns of the table after sorting. Compare the hashes of each table.
D. Create stratified random samples using the OVER() function and compare equivalent samples from each table.
Correct Answer: C
Section: (none)
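The accepted answer uses Dataproc plus the BigQuery connector, but the underlying idea (reduce each table to a single fingerprint computed over its non-timestamp columns, then compare fingerprints) can be sketched directly in BigQuery SQL through the Python client. This is an illustrative assumption rather than the literal Dataproc implementation; the table and column names are hypothetical.

```python
# Sketch of the fingerprint-comparison idea: hash every row's non-timestamp
# columns and combine the row hashes with an order-independent aggregate,
# so no explicit sort or join key is needed. Table names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

FINGERPRINT_SQL = """
SELECT BIT_XOR(FARM_FINGERPRINT(TO_JSON_STRING(
         (SELECT AS STRUCT t.* EXCEPT (load_timestamp))))) AS fp
FROM `my-project.etl.{table}` AS t
"""

original = list(client.query(FINGERPRINT_SQL.format(table="original_output")).result())
migrated = list(client.query(FINGERPRINT_SQL.format(table="migrated_output")).result())

# Equal fingerprints strongly suggest identical contents (as a sketch, this
# XOR aggregate can miss rows duplicated an even number of times).
print("Tables match:", original[0]["fp"] == migrated[0]["fp"])
```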
QUESTION 85
You are a head of BI at a large enterprise company with multiple business units that each have different priorities and budgets. You use on-demand pricing for BigQuery with a quota of 2K concurrent on-demand slots per project. Users at your organization sometimes don't get slots to execute their query and you need to correct this. You'd like to avoid introducing new projects to your account.
What should you do?
A. Convert your batch BQ queries into interactive BQ queries.
B. Create an additional project to overcome the 2K on-demand per-project quota.
C. Switch to flat-rate pricing and establish a hierarchical priority model for your projects.
D. Increase the amount of concurrent slots per project at the Quotas page at the Cloud Console.
Correct Answer: C
Section: (none)
QUESTION 86
You have an Apache Kafka cluster on-prem with topics containing web application logs. You need to replicate the data to Google Cloud for analysis in BigQuery and Cloud Storage. The preferred replication method is mirroring to avoid deployment of Kafka Connect plugins.
What should you do?
A. Deploy a Kafka cluster on GCE VM Instances. Configure your on-prem cluster to mirror your topics to the cluster running in GCE. Use a Dataproc cluster or Dataflow job to read from Kafka and write to GCS.
B. Deploy a Kafka cluster on GCE VM Instances with the PubSub Kafka connector configured as a Sink connector. Use a Dataproc cluster or Dataflow job to read from Kafka and write to GCS.
C. Deploy the PubSub Kafka connector to your on-prem Kafka cluster and configure PubSub as a Source connector. Use a Dataflow job to read from PubSub and write to GCS.
D. Deploy the PubSub Kafka connector to your on-prem Kafka cluster and configure PubSub as a Sink connector. Use a Dataflow job to read from PubSub and write to GCS.
Correct Answer: A
Section: (none)
QUESTION 87
You've migrated a Hadoop job from an on-prem cluster to Dataproc and GCS. Your Spark job is a complicated analytical workload that consists of many shuffling operations, and the initial data are Parquet files (on average 200-400 MB each). You see some degradation in performance after the migration to Dataproc, so you'd like to optimize for it. You need to keep in mind that your organization is very cost-sensitive, so you'd like to continue using Dataproc on preemptibles (with 2 non-preemptible workers only) for this workload.
What should you do?
A. Increase the size of your Parquet files to ensure they are at least 1 GB.
B. Switch to TFRecord format (approximately 200 MB per file) instead of Parquet files.
C. Switch from HDDs to SSDs, copy initial data from GCS to HDFS, run the Spark job and copy results back to GCS.
D. Switch from HDDs to SSDs, override the preemptible VMs configuration to increase the boot disk size.
Correct Answer: D
Section: (none)
QUESTION 88
Your team is responsible for developing and maintaining ETLs in your company. One of your Dataflow jobs is failing because of some errors in the input data, and you need to improve the reliability of the pipeline (including being able to reprocess all failing data).
What should you do?
A. Add a filtering step to skip these types of errors in the future, extract erroneous rows from logs.
B. Add a try... catch block to your DoFn that transforms the data, extract erroneous rows from logs.
C. Add a try... catch block to your DoFn that transforms the data, write erroneous rows to PubSub directly from the DoFn.
D. Add a try... catch block to your DoFn that transforms the data, use a sideOutput to create a PCollection that can be stored to PubSub later.
Correct Answer: D
Section: (none)
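A minimal Apache Beam (Python SDK) sketch of the chosen approach: the DoFn wraps its transformation in a try/except and routes failing elements to a tagged side output, which can then be published to Pub/Sub for later reprocessing. The transform logic and names are illustrative assumptions.

```python
# Sketch: DoFn with try/except that emits bad records to a tagged side output
# ("dead letter") so they can be reprocessed later. Names are hypothetical.
import json
import apache_beam as beam
from apache_beam import pvalue


class TransformRecord(beam.DoFn):
    def process(self, element):
        try:
            record = json.loads(element)          # the transformation that may fail
            yield {"id": record["id"], "value": float(record["value"])}
        except Exception:
            # Route the raw failing element to the side output instead of crashing.
            yield pvalue.TaggedOutput("failed", element)


with beam.Pipeline() as p:
    results = (
        p
        | "Read" >> beam.Create(['{"id": "a", "value": "1.5"}', "not-json"])
        | "Transform" >> beam.ParDo(TransformRecord()).with_outputs("failed", main="ok")
    )
    results.ok | "UseGoodRows" >> beam.Map(print)
    # In a real pipeline the failed collection would be written to Pub/Sub
    # (e.g. beam.io.WriteToPubSub) for inspection and reprocessing.
    results.failed | "LogBadRows" >> beam.Map(lambda e: print("failed:", e))
```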
QUESTION 89
You're training a model to predict housing prices based on an available dataset with real estate properties. Your plan is to train a fully connected neural net, and you've discovered that the dataset contains latitude and longitude of the property. Real estate professionals have told you that the location of the property is highly influential on price, so you'd like to engineer a feature that incorporates this physical dependency.
What should you do?
A. Provide latitude and longitude as input vectors to your neural net.
B. Create a numeric column from a feature cross of latitude and longitude.
C. Create a feature cross of latitude and longitude, bucketize at the minute level and use L1 regularization during optimization.
D. Create a feature cross of latitude and longitude, bucketize it at the minute level and use L2 regularization during optimization.
Correct Answer: C
Section: (none)
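A minimal sketch of the chosen approach using TensorFlow's (legacy) feature-column API: bucketize latitude and longitude, cross the buckets, and train with an L1-regularized optimizer so rarely used crossed buckets are driven to zero. The boundary ranges, bucket counts, and hash size are illustrative assumptions.

```python
# Sketch: bucketized latitude/longitude feature cross with L1 regularization.
# Bucket boundaries and hash size are hypothetical; tf.feature_column is the
# classic (now legacy) API for this pattern.
import numpy as np
import tensorflow as tf

lat = tf.feature_column.numeric_column("latitude")
lon = tf.feature_column.numeric_column("longitude")

# Bucketize each coordinate (finer boundaries approximate "minute level").
lat_buckets = tf.feature_column.bucketized_column(
    lat, boundaries=list(np.linspace(32.0, 42.0, num=100)))
lon_buckets = tf.feature_column.bucketized_column(
    lon, boundaries=list(np.linspace(-124.0, -114.0, num=100)))

# Cross the buckets so the model learns a weight per small geographic cell.
lat_lon_cross = tf.feature_column.crossed_column(
    [lat_buckets, lon_buckets], hash_bucket_size=10000)

feature_columns = [tf.feature_column.indicator_column(lat_lon_cross)]

model = tf.keras.Sequential([
    tf.keras.layers.DenseFeatures(feature_columns),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
# FTRL with L1 regularization sparsifies the crossed feature weights.
model.compile(
    optimizer=tf.keras.optimizers.Ftrl(l1_regularization_strength=0.01),
    loss="mse",
)
```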
QUESTION 90
You are deploying MariaDB SQL databases on GCE VM Instances and need to configure monitoring and alerting. You want to collect metrics including network connections, disk IO and replication status from MariaDB with minimal development effort and use StackDriver for dashboards and alerts.
What should you do?
A. Install the OpenCensus Agent and create a custom metric collection application with a StackDriver exporter.
B. Place the MariaDB instances in an Instance Group with a Health Check.
C. Install the StackDriver Logging Agent and configure fluentd in_tail plugin to read MariaDB logs.
D. Install the StackDriver Agent and configure the MySQL plugin.
Correct Answer: C
Section: (none)
QUESTION 91
You work for a bank. You have a labelled dataset that contains information on already granted loan applications and whether these applications have defaulted. You have been asked to train a model to predict default rates for credit applicants.
What should you do?
A. Increase the size of the dataset by collecting additional data.
B. Train a linear regression to predict a credit default risk score.
C. Remove the bias from the data and collect applications that have been declined loans.
D. Match loan applicants with their social profiles to enable feature engineering.
Correct Answer: B
Section: (none)
QUESTION 92
You need to migrate a 2TB relational database to Google Cloud Platform. You do not have the resources to significantly refactor the application that uses this database and cost to operate is of primary concern.
Which service do you select for storing and serving your data?
A. Cloud Spanner
B. Cloud Bigtable
C. Cloud Firestore
D. Cloud SQL
Correct Answer: D
Section: (none)
QUESTION 93
You're using Bigtable for a real-time application, and you have a heavy load that is a mix of reads and writes. You've recently identified an additional use case and need to perform an hourly analytical job to calculate certain statistics across the whole database. You need to ensure both the reliability of your production application as well as the analytical workload.
What should you do?
A. Export Bigtable dump to GCS and run your analytical job on top of the exported files.
B. Add a second cluster to an existing instance with multi-cluster routing, use a live-traffic app profile for your regular workload and a batch-analytics profile for the analytics workload.
C. Add a second cluster to an existing instance with single-cluster routing, use a live-traffic app profile for your regular workload and a batch-analytics profile for the analytics workload.
D. Increase the size of your existing cluster twice and execute your analytics workload on your new resized cluster.
Correct Answer: C
Section: (none)
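A minimal sketch of the chosen approach with the google-cloud-bigtable Python client: add a second cluster to the instance, then create two single-cluster-routing app profiles so the serving and analytics workloads are pinned to different clusters. The instance, cluster, and profile IDs (and the assumption that a "serving-cluster" already exists) are illustrative, not from the question.

```python
# Sketch: second cluster plus single-cluster-routing app profiles so the
# hourly analytics job cannot steal resources from the serving cluster.
# IDs and zones are hypothetical; "serving-cluster" is the pre-existing cluster.
from google.cloud import bigtable
from google.cloud.bigtable import enums

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("telemetry-instance")

# Add a second cluster dedicated to batch analytics.
analytics_cluster = instance.cluster(
    "analytics-cluster", location_id="us-central1-b", serve_nodes=3)
analytics_cluster.create()

# App profile pinned to the serving cluster for the live application.
instance.app_profile(
    "live-traffic",
    routing_policy_type=enums.RoutingPolicyType.SINGLE,
    cluster_id="serving-cluster",
).create(ignore_warnings=True)

# App profile pinned to the analytics cluster for the hourly job.
instance.app_profile(
    "batch-analytics",
    routing_policy_type=enums.RoutingPolicyType.SINGLE,
    cluster_id="analytics-cluster",
).create(ignore_warnings=True)
```

The analytics job then connects with the batch-analytics app profile, so its heavy scans run only against the second cluster while replication keeps both clusters in sync.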
QUESTION 94
You are designing an Apache Beam pipeline to enrich data from Cloud Pub/Sub with static reference data from BigQuery. The reference data is small enough to fit in memory on a single worker. The pipeline should write enriched results to BigQuery for analysis. Which job type and transforms should this pipeline use?
A. Batch job, PubSubIO, side-inputs
B. Streaming job, PubSubIO, JdbcIO, side-outputs
C. Streaming job, PubSubIO, BigQueryIO, side-inputs
D. Streaming job, PubSubIO, BigQueryIO, side-outputs
Correct Answer: C
Section: (none)
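A minimal Beam (Python SDK) sketch of the chosen design: a streaming job that reads from Pub/Sub, loads the small BigQuery reference table as a side input (AsDict), enriches each element, and writes to BigQuery. The topic, table, and field names are illustrative assumptions.

```python
# Sketch: streaming enrichment with a BigQuery side input.
# Topic/table/field names are hypothetical.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

def enrich(element, ref):
    event = json.loads(element.decode("utf-8"))
    event["region_name"] = ref.get(event["region_id"], "unknown")
    return event

with beam.Pipeline(options=options) as p:
    # Small, static reference data loaded once and broadcast as a side input.
    reference = (
        p
        | "ReadRef" >> beam.io.ReadFromBigQuery(
            query="SELECT region_id, region_name FROM `my-project.ref.regions`",
            use_standard_sql=True)
        | "ToKV" >> beam.Map(lambda row: (row["region_id"], row["region_name"]))
    )

    (
        p
        | "ReadEvents" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        | "Enrich" >> beam.Map(enrich, ref=beam.pvalue.AsDict(reference))
        | "WriteBQ" >> beam.io.WriteToBigQuery(
            "my-project:analytics.enriched_events",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER)
    )
```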
QUESTION 95
You have a data pipeline that writes data to Cloud Bigtable using well-designed row keys. You want to monitor your pipeline to determine when to increase the size of your Cloud Bigtable cluster. Which two actions can you take to accomplish this? (Choose two.)
A. Review Key Visualizer metrics. Increase the size of the Cloud Bigtable cluster when the Read pressure index is above 100.
B. Review Key Visualizer metrics. Increase the size of the Cloud Bigtable cluster when the Write pressure index is above 100.
C. Monitor the latency of write operations. Increase the size of the Cloud Bigtable cluster when there is a sustained increasein write latency.
D. Monitor storage utilization. Increase the size of the Cloud Bigtable cluster when utilization increases above 70% of max capacity.
E. Monitor latency of read operations. Increase the size of the Cloud Bigtable cluster if read operations take longer than 100 ms.
Correct Answer: CD
Section: (none)
QUESTION 96
You want to analyze hundreds of thousands of social media posts daily at the lowest cost and with the fewest steps.
You have the following requirements:
You will batch-load the posts once per day and run them through the Cloud Natural Language API. You will extract topics and sentiment from the posts.
You must store the raw posts for archiving and reprocessing.
You will create dashboards to be shared with people both inside and outside your organization.
You need to store both the data extracted from the API to perform analysis as well as the raw social media posts for historical archiving. What should you do?
A. Store the social media posts and the data extracted from the API in BigQuery.
B. Store the social media posts and the data extracted from the API in Cloud SQL.
C. Store the raw social media posts in Cloud Storage, and write the data extracted from the API into BigQuery.
D. Feed the social media posts into the API directly from the source, and write the extracted data from the API into BigQuery.
Correct Answer: C
Section: (none)
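A minimal sketch of the chosen flow: the raw post stays in Cloud Storage for archiving, the Cloud Natural Language API extracts sentiment (with entities as a stand-in for topics), and the extracted fields are streamed into BigQuery. The bucket, dataset, and table names are illustrative assumptions.

```python
# Sketch: archive raw posts in GCS, extract sentiment/entities with the
# Natural Language API, store structured results in BigQuery. Names are hypothetical.
from google.cloud import storage, language_v1, bigquery

storage_client = storage.Client()
nl_client = language_v1.LanguageServiceClient()
bq_client = bigquery.Client()

def process_post(post_id: str, text: str) -> None:
    # 1. Archive the raw post in Cloud Storage.
    bucket = storage_client.bucket("raw-social-posts")
    bucket.blob(f"posts/{post_id}.txt").upload_from_string(text)

    # 2. Extract sentiment and entities with the Natural Language API.
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT)
    sentiment = nl_client.analyze_sentiment(
        request={"document": document}).document_sentiment
    entities = nl_client.analyze_entities(
        request={"document": document}).entities

    # 3. Write the extracted, structured fields to BigQuery for analysis.
    bq_client.insert_rows_json(
        "my-project.social.post_analysis",
        [{
            "post_id": post_id,
            "sentiment_score": sentiment.score,
            "sentiment_magnitude": sentiment.magnitude,
            "entities": [e.name for e in entities],
        }],
    )

process_post("post-0001", "Loving the new release, works great!")
```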
QUESTION 97
You store historic data in Cloud Storage. You need to perform analytics on the historic data. You want to use a solution to detect invalid data entries and perform data transformations that will not require programming or knowledge of SQL.
What should you do?
A. Use Cloud Dataflow with Beam to detect errors and perform transformations.
B. Use Cloud Dataprep with recipes to detect errors and perform transformations.
C. Use Cloud Dataproc with a Hadoop job to detect errors and perform transformations.
D. Use federated tables in BigQuery with queries to detect errors and perform transformations.
Correct Answer: B
Section: (none)
QUESTION 98
Your company needs to upload their historic data to Cloud Storage. The security rules don't allow access from external IPs to their on-premises resources. After an initial upload, they will add new data from existing on-premises applications every day. What should they do?
A. Execute gsutil rsync from the on-premises servers.
B. Use Cloud Dataflow and write the data to Cloud Storage.
C. Write a job template in Cloud Dataproc to perform the data transfer.
D. Install an FTP server on a Compute Engine VM to receive the files and move them to Cloud Storage.
Correct Answer: A
Section: (none)
QUESTION 99
You have a query that filters a BigQuery table using a WHERE clause on timestamp and ID columns. By using bq query --dry_run you learn that the query triggers a full scan of the table, even though the filters on timestamp and ID select a tiny fraction of the overall data. You want to reduce the amount of data scanned by BigQuery with minimal changes to existing SQL queries. What should you do?
A. Create a separate table for each ID.
B. Use the LIMIT keyword to reduce the number of rows returned.
C. Recreate the table with a partitioning column and clustering column.
D. Use the bq query --maximum_bytes_billed flag to restrict the number of bytes billed.
Correct Answer: C
Section: (none)
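A minimal sketch of the chosen fix, issued through the BigQuery Python client: recreate the table partitioned on the timestamp column and clustered on the ID column, so the existing WHERE clause prunes partitions and clusters instead of scanning the whole table. The dataset, table, and column names are illustrative assumptions.

```python
# Sketch: recreate the table with partitioning and clustering so existing
# WHERE filters on timestamp and id scan far less data. Names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

ddl = """
CREATE TABLE `my-project.analytics.events_partitioned`
PARTITION BY DATE(event_ts)
CLUSTER BY id AS
SELECT * FROM `my-project.analytics.events`
"""
client.query(ddl).result()

# Existing-style query now prunes partitions (event_ts) and clusters (id).
query = """
SELECT * FROM `my-project.analytics.events_partitioned`
WHERE event_ts BETWEEN TIMESTAMP('2024-11-01') AND TIMESTAMP('2024-11-02')
  AND id = 'sensor-42'
"""
job = client.query(query)
print(f"{job.result().total_rows} rows")
```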
QUESTION 100
You have a requirement to insert minute-resolution data from 50,000 sensors into a BigQuery table. You expect significant growth in data volume and need the data to be available within 1 minute of ingestion for real-time analysis of aggregated trends. What should you do?
A. Use bq load to load a batch of sensor data every 60 seconds.
B. Use a Cloud Dataflow pipeline to stream data into the BigQuery table.
C. Use the INSERT statement to insert a batch of data every 60 seconds.
D. Use the MERGE statement to apply updates in batch every 60 seconds.
Correct Answer: B
Section: (none)
QUESTION 101
You need to copy millions of sensitive patient records from a relational database to BigQuery. The total size of the database is 10 TB. You need to design a solution that is secure and time-efficient. What should you do?
A. Export the records from the database as an Avro file. Upload the file to GCS using gsutil, and then load the Avro file into BigQuery using the BigQuery web UI in the GCP Console.
B. Export the records from the database as an Avro file. Copy the file onto a Transfer Appliance and send it to Google, and then load the Avro file into BigQuery using the BigQuery web UI in the GCP Console.
C. Export the records from the database into a CSV file. Create a public URL for the CSV file, and then use Storage Transfer Service to move the file to Cloud Storage. Load the CSV file into BigQuery using the BigQuery web UI in the GCP Console.
D. Export the records from the database as an Avro file. Create a public URL for the Avro file, and then use Storage Transfer Service to move the file to Cloud Storage. Load the Avro file into BigQuery using the BigQuery web UI in the GCP Console.
Correct Answer: B
Section: (none)
QUESTION 102
You need to create a near real-time inventory dashboard that reads the main inventory tables in your BigQuery data warehouse. Historical inventory data is stored as inventory balances by item and location. You have several thousand updates to inventory every hour. You want to maximize performance of the dashboard and ensure that the data is accurate. What should you do?
A. Leverage BigQuery UPDATE statements to update the inventory balances as they are changing.
B. Partition the inventory balance table by item to reduce the amount of data scanned with each inventory update.
C. Use BigQuery streaming to stream changes into a daily inventory movement table. Calculate balances in a view that joins it to the historical inventory balance table. Update the inventory balance table nightly.
D. Use the BigQuery bulk loader to batch load inventory changes into a daily inventory movement table. Calculate balances in a view that joins it to the historical inventory balance table. Update the inventory balance table nightly.
Correct Answer: C
Section: (none)
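A minimal sketch of the chosen pattern: stream each inventory change into a daily movement table and expose a view that adds today's movements to the nightly balance snapshot, so the dashboard reads accurate, near real-time balances. All table, dataset, and column names are illustrative assumptions.

```python
# Sketch: stream changes into a movement table and define a view that merges
# them with the nightly balance snapshot. Names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

# Stream an inventory change as it happens.
client.insert_rows_json(
    "my-project.inventory.daily_movements",
    [{"item_id": "sku-123", "location": "wh-1", "qty_change": -5,
      "change_ts": "2024-11-18T10:15:00Z"}],
)

# View the dashboard queries: nightly balances plus today's streamed movements.
view_ddl = """
CREATE OR REPLACE VIEW `my-project.inventory.current_balances` AS
SELECT
  b.item_id,
  b.location,
  b.balance + IFNULL(SUM(m.qty_change), 0) AS current_balance
FROM `my-project.inventory.balance_snapshot` AS b
LEFT JOIN `my-project.inventory.daily_movements` AS m
  ON m.item_id = b.item_id AND m.location = b.location
  AND DATE(m.change_ts) = CURRENT_DATE()
GROUP BY b.item_id, b.location, b.balance
"""
client.query(view_ddl).result()
```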
QUESTION 103
You have data stored in BigQuery. The data in the BigQuery dataset must be highly available. You need to define a storage, backup, and recovery strategy for this data that minimizes cost. How should you configure the BigQuery table?
A. Set the BigQuery dataset to be regional. In the event of an emergency, use a point-in-time snapshot to recover the data.
B. Set the BigQuery dataset to be regional. Create a scheduled query to make copies of the data to tables suffixed with the time of the backup. In the event of an emergency, use the backup copy of the table.
C. Set the BigQuery dataset to be multi-regional. In the event of an emergency, use a point-in-time snapshot to recover the data.
D. Set the BigQuery dataset to be multi-regional. Create a scheduled query to make copies of the data to tables suffixed with the time of the backup. In the event of an emergency, use the backup copy of the table.
Correct Answer: C
Section: (none)
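A minimal sketch of the point-in-time recovery step in the chosen answer, using BigQuery time travel via the Python client: restore the table's contents as of an earlier timestamp into a recovery table. The table names and the one-hour offset are illustrative assumptions.

```python
# Sketch: point-in-time recovery with BigQuery time travel (FOR SYSTEM_TIME AS OF).
# Table names and the recovery offset are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

recovery_sql = """
CREATE OR REPLACE TABLE `my-project.warehouse.orders_recovered` AS
SELECT *
FROM `my-project.warehouse.orders`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
"""
client.query(recovery_sql).result()
```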
QUESTION 104
You used Cloud Dataprep to create a recipe on a sample of data in a BigQuery table. You want to reuse this recipe on a daily upload of data with the same schema, after the load job with variable execution time completes. What should you do?
A. Create a cron schedule in Cloud Dataprep.
B. Create an App Engine cron job to schedule the execution of the Cloud Dataprep job.
C. Export the recipe as a Cloud Dataprep template, and create a job in Cloud Scheduler.
D. Export the Cloud Dataprep job as a Cloud Dataflow template, and incorporate it into a Cloud Composer job.
Correct Answer: D
Section: (none)
QUESTION 105
You want to automate execution of a multi-step data pipeline running on Google Cloud. The pipeline includes Cloud Dataproc and Cloud Dataflow jobs that have multiple dependencies on each other. You want to use managed services where possible, and the pipeline will run every day. Which tool should you use?
A. cron
B. Cloud Composer
C. Cloud Scheduler
D. Workflow Templates on Cloud Dataproc
Correct Answer: B
Section: (none)
QUESTION 106
You are managing a Cloud Dataproc cluster. You need to make a job run faster while minimizing costs, without losing work in progress on your clusters. What should you do?
A. Increase the cluster size with more non-preemptible workers.
B. Increase the cluster size with preemptible worker nodes, and configure them to forcefully decommission.
C. Increase the cluster size with preemptible worker nodes, and use Cloud Stackdriver to trigger a script to preserve work.
D. Increase the cluster size with preemptible worker nodes, and configure them to use graceful decommissioning.
Correct Answer: D
Section: (none)
QUESTION 107
You work for a shipping company that uses handheld scanners to read shipping labels. Your company has strict data privacy standards that require scanners to only transmit recipients' personally identifiable information (PII) to analytics systems, which violates user privacy rules. You want to quickly build a scalable solution using cloud-native managed services to prevent exposure of PII to the analytics systems. What should you do?
A. Create an authorized view in BigQuery to restrict access to tables with sensitive data.
B. Install a third-party data validation tool on Compute Engine virtual machines to check the incoming data for sensitive information.
C. Use Stackdriver logging to analyze the data passed through the total pipeline to identify transactions that may contain sensitive information.
D. Build a Cloud Function that reads the topics and makes a call to the Cloud Data Loss Prevention API. Use the tagging and confidence levels to either pass or quarantine the data in a bucket for review.
Correct Answer: D
Section: (none)
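A minimal sketch of the chosen approach: a Pub/Sub-triggered Cloud Function inspects each scanner message with the Cloud DLP API and, based on the findings and their likelihood, either forwards the record or quarantines it in a Cloud Storage bucket for review. The project ID, bucket name, info types, and downstream handoff are illustrative assumptions.

```python
# Sketch: Pub/Sub-triggered Cloud Function that uses the Cloud DLP API to
# decide whether a message may continue or must be quarantined for review.
# Project, bucket, and info-type choices are hypothetical.
import base64
import json
from google.cloud import dlp_v2, storage

PROJECT = "my-project"
dlp = dlp_v2.DlpServiceClient()
gcs = storage.Client()

INSPECT_CONFIG = {
    "info_types": [{"name": "PERSON_NAME"}, {"name": "EMAIL_ADDRESS"},
                   {"name": "STREET_ADDRESS"}],
    "min_likelihood": dlp_v2.Likelihood.POSSIBLE,
}

def handle_message(event, context):
    """Cloud Function entry point for a Pub/Sub trigger."""
    payload = base64.b64decode(event["data"]).decode("utf-8")

    response = dlp.inspect_content(
        request={
            "parent": f"projects/{PROJECT}",
            "inspect_config": INSPECT_CONFIG,
            "item": {"value": payload},
        }
    )

    if response.result.findings:
        # PII detected: quarantine the raw message for manual review.
        bucket = gcs.bucket("quarantined-scans")
        bucket.blob(f"review/{context.event_id}.json").upload_from_string(payload)
    else:
        # No PII found: pass the record on to the analytics pipeline.
        forward_to_analytics(json.loads(payload))

def forward_to_analytics(record):
    # Placeholder for publishing to the downstream analytics topic/table.
    print("clean record:", record)
```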
QUESTION 108
You have developed three data processing jobs. One executes a Cloud Dataflow pipeline that transforms data uploaded to Cloud Storage and writes results to BigQuery. The second ingests data from on-premises servers and uploads it to Cloud Storage. The third is a Cloud Dataflow pipeline that gets information from third-party data providers and uploads the information to Cloud Storage. You need to be able to schedule and monitor the execution of these three workflows and manually execute them when needed. What should you do?
A. Create a Directed Acyclic Graph in Cloud Composer to schedule and monitor the jobs.
B. Use Stackdriver Monitoring and set up an alert with a Webhook notification to trigger the jobs.
C. Develop an App Engine application to schedule and request the status of the jobs using GCP API calls.
D. Set up cron jobs in a Compute Engine instance to schedule and monitor the pipelines using GCP API calls.
Correct Answer: A
Section: (none)
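A minimal Cloud Composer (Airflow) DAG sketch of the chosen answer: the three workflows become tasks in one scheduled DAG, which can also be triggered manually from the Airflow UI and monitored there. Generic BashOperator tasks stand in for the actual Dataflow launches and upload job; the commands, DAG ID, schedule, and dependency shown are illustrative assumptions.

```python
# Sketch: a Cloud Composer (Airflow 2.x) DAG that schedules and monitors the
# three workflows. BashOperator commands are placeholders for the real
# Dataflow launches and the on-prem ingestion trigger.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="three_data_workflows",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",   # also runnable on demand from the Airflow UI
    catchup=False,
) as dag:

    ingest_onprem = BashOperator(
        task_id="ingest_onprem_to_gcs",
        bash_command="echo 'trigger on-prem upload to Cloud Storage'",
    )

    transform_to_bq = BashOperator(
        task_id="dataflow_gcs_to_bigquery",
        bash_command="echo 'launch Dataflow job: GCS -> BigQuery'",
    )

    third_party_to_gcs = BashOperator(
        task_id="dataflow_third_party_to_gcs",
        bash_command="echo 'launch Dataflow job: third-party -> GCS'",
    )

    # The GCS-to-BigQuery transform depends on fresh on-prem data;
    # the third-party ingestion runs independently.
    ingest_onprem >> transform_to_bq
```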
QUESTION 109
You have Cloud Functions written in Node.js that pull messages from Cloud Pub/Sub and send the data to BigQuery. You observe that the message processing rate on the Pub/Sub topic is orders of magnitude higher than anticipated, but there is no error logged in Stackdriver Log Viewer. What are the two most likely causes of this problem? (Choose two.)
A. Publisher throughput quota is too small.
B. Total outstanding messages exceed the 10-MB maximum.
C. Error handling in the subscriber code is not handling run-time errors properly.
D. The subscriber code cannot keep up with the messages.
E. The subscriber code does not acknowledge the messages that it pulls.
Correct Answer: CE
Section: (none)
QUESTION 110
You are creating a new pipeline in Google Cloud to stream IoT data from Cloud Pub/Sub through Cloud Dataflow to BigQuery. While previewing the data, you notice that roughly 2% of the data appears to be corrupt. You need to modify the Cloud Dataflow pipeline to filter out this corrupt data. What should you do?
A. Add a SideInput that returns a Boolean if the element is corrupt.
B. Add a ParDo transform in Cloud Dataflow to discard corrupt elements.
C. Add a Partition transform in Cloud Dataflow to separate valid data from corrupt data.
D. Add a GroupByKey transform in Cloud Dataflow to group all of the valid data together and discard the rest.
Correct Answer: B
Section: (none)