SnowPro Advanced: Data Engineer — Question 44
A CSV file, around 1 TB in size, is generated daily on an on-premise server. A corresponding table, internal stage, and file format have already been created in Snowflake to facilitate the data loading process.
How can the process of bringing the CSV file into Snowflake be automated using the LEAST amount of operational overhead?
Answer options
- A. Create a task in Snowflake that executes once a day and runs a COPY INTO statement that references the internal stage. The internal stage will read the files directly from the on-premise server and copy the newest file into the table from the on-premise server to the Snowflake table.
- B. On the on-premise server, schedule a SQL file to run using SnowSQL that executes a PUT to push a specific file to the internal stage. Create a task that executes once a day in Snowflake and runs a COPY INTO statement that references the internal stage. Schedule the task to start after the file lands in the internal stage.
- C. On the on-premise server, schedule a SQL file to run using SnowSQL that executes a PUT to push a specific file to the internal stage. Create a pipe that runs a COPY INTO statement that references the internal stage. Snowpipe auto-ingest will automatically load the file from the internal stage when the new file lands in the internal stage.
- D. On the on-premise server, schedule a Python file that uses the Snowpark Python library. The Python script will read the CSV data into a DataFrame and generate an INSERT INTO statement that will directly load into the table. The script will bypass the need to move a file into an internal stage.
Correct answer: B
Explanation
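Option B is correct. SnowSQL's PUT command is the standard way to upload a local file to an internal stage. A scheduler on the on-premise server (for example, cron) can run the SnowSQL script each day. A Snowflake task then runs COPY INTO on a schedule timed to start after the upload finishes. Because COPY INTO skips files it has already loaded, each run picks up only the new file.

Option A is incorrect because an internal stage is storage managed by Snowflake and cannot read files from an on-premise server. Files must be pushed into the stage, for example with PUT.

Option C is incorrect because Snowpipe auto-ingest relies on cloud storage event notifications and is supported only for external stages. Loading from an internal stage with Snowpipe requires calling the Snowpipe REST API, which adds operational overhead.

Option D adds unnecessary complexity. Reading a 1 TB CSV file into a DataFrame and loading it with INSERT statements is far less efficient than a bulk COPY INTO from a stage. It also ignores the internal stage and file format that already exist.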
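As an illustration of option B's two moving parts, the sketch below uses plain Python to render the SnowSQL script that the on-premise scheduler would run and the task DDL that runs COPY INTO each day. It only builds SQL text and does not connect to Snowflake. All stage, table, file format, warehouse, task, and path names are hypothetical placeholders, and the CRON time is an assumed value chosen to fall after the upload normally finishes. The rendered script could be saved to a file and run with `snowsql -f`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadConfig:
    # Every value here is a hypothetical placeholder, not taken from the question.
    local_path: str   # absolute path of the daily CSV file on the on-premise server
    stage: str        # existing internal stage
    table: str        # existing target table
    file_format: str  # existing named CSV file format
    warehouse: str    # warehouse the task runs on
    cron: str         # CRON expression; assumed to fire after the upload completes


def put_script(cfg: LoadConfig) -> str:
    """SnowSQL script that an OS scheduler (e.g. cron) runs on the server each day."""
    # PUT compresses the file with gzip by default (AUTO_COMPRESS = TRUE);
    # PARALLEL sets the number of upload threads (default 4).
    return (
        f"PUT file://{cfg.local_path} @{cfg.stage} "
        "AUTO_COMPRESS = TRUE PARALLEL = 16;\n"
    )


def task_ddl(cfg: LoadConfig, task_name: str) -> str:
    """Task DDL that runs COPY INTO once a day on a CRON schedule."""
    # COPY INTO skips files recorded in the table's load metadata, so each
    # scheduled run loads only files that have not been loaded before.
    # Tasks are created suspended, so the DDL ends with ALTER TASK ... RESUME.
    return (
        f"CREATE OR REPLACE TASK {task_name}\n"
        f"  WAREHOUSE = {cfg.warehouse}\n"
        f"  SCHEDULE = 'USING CRON {cfg.cron} UTC'\n"
        "AS\n"
        f"  COPY INTO {cfg.table}\n"
        f"  FROM @{cfg.stage}\n"
        f"  FILE_FORMAT = (FORMAT_NAME = '{cfg.file_format}');\n"
        f"ALTER TASK {task_name} RESUME;\n"
    )


# Example usage with hypothetical names.
cfg = LoadConfig(
    local_path="/data/export/daily.csv",
    stage="csv_int_stage",
    table="daily_sales",
    file_format="csv_fmt",
    warehouse="load_wh",
    cron="0 6 * * *",
)
print(put_script(cfg))
print(task_ddl(cfg, "load_daily_csv"))
```

Keeping the upload (server side) and the load (Snowflake side) as separate scheduled steps mirrors option B. The only coordination between them is the timing of the task's CRON schedule relative to the upload.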