Homework 3: Data Warehousing for Data Engineering Zoomcamp 2026

Due date: 2026-02-09T23:59:59+00:00 (local time)

Please log in to access this homework.

Questions

Question 1. What is count of records for the 2024 Yellow Taxi Data? (1 point)

Question 2. What is the estimated amount of data that will be read when this query is executed on the External Table and the Table? (1 point)

Question 3. Why are the estimated number of Bytes different? (1 point)

Question 4. How many records have a fare_amount of 0? (1 point)

Question 5. What is the best strategy to make an optimized table in Big Query if your query will always filter based on tpep_dropoff_datetime and order the results by VendorID (Create a new table with this strategy) (1 point)

Question 6. Write a query to retrieve the distinct VendorIDs between tpep_dropoff_datetime 2024-03-01 and 2024-03-15 (inclusive). Use the materialized table you created earlier in your from clause and note the estimated bytes. Now change the table in the from clause to the partitioned table you created for question 5 and note the estimated bytes processed. What are these values? (1 point)

Question 7. Where is the data stored in the External Table you created? (1 point)

Question 8. It is best practice in Big Query to always cluster your data: (1 point)

Question 9. Write a `SELECT count(*)` query FROM the materialized table you created. How many bytes does it estimate will be read? Why? (not graded)

Please provide a valid URL.

Please log in to access this homework.

Status: Not submitted