Homework 3: Data Warehousing

Practice data warehousing in BigQuery with external tables, partitioning, clustering, and SQL analysis.
Instructions
Deadline: 9 February 2026, 23:59
Shown in your timezone.
Correct answers are available below

This homework has already been scored. Log in to view your own submission, score, and feedback.

Log in to view my results

Correct answers

1. What is count of records for the 2024 Yellow Taxi Data? (1 point)

2. What is the estimated amount of data that will be read when this query is executed on the External Table and the Table? (1 point)

3. Why are the estimated number of Bytes different? (1 point)

4. How many records have a fare_amount of 0? (1 point)

5. What is the best strategy to make an optimized table in Big Query if your query will always filter based on tpep_dropoff_datetime and order the results by VendorID (Create a new table with this strategy) (1 point)

6. Write a query to retrieve the distinct VendorIDs between tpep_dropoff_datetime 2024-03-01 and 2024-03-15 (inclusive). Use the materialized table you created earlier in your from clause and note the estimated bytes. Now change the table in the from clause to the partitioned table you created for question 5 and note the estimated bytes processed. What are these values? (1 point)

7. Where is the data stored in the External Table you created? (1 point)

8. It is best practice in Big Query to always cluster your data: (1 point)

9. Write a `SELECT count(*)` query FROM the materialized table you created. How many bytes does it estimate will be read? Why? (not graded)