With Redshift we can select data and send to data sources available to us in AWS Cloud. Data sources like RDS, Athena, or S3. With the UNLOAD command, we can save files in CSV or JSON format directly to S3.
Access to an AWS Redshift cluster, access to the query editor, and an IAM role with permissions to write to the S3 location. For more information about creating IAM roles for Redshift see AWS docs here: Redshift and IAM Roles
To use S3 as a data source for Redshift and to export data, first write the query to export the data. In this example we’ll have a single table, shows with a list of shows with show titles and descriptions:
select * from shows;
|America’s Got Talent||AGT Live in Hollywood!|
|Magic on the Strip||Las Vegas Magic|
|Live Music on the Plaza||Modesto Ca.|
We can use that select query with an UNLOAD command. To do so we’l need to define an S3 location, an IAM role for permissions, and any options to include. For example, we’ll use CSV to export data in CSV format as an option.
UNLOAD ('select * from shows') to 's3://redshift-output/shows/' authorization 'aws_iam_role=arn:aws:iam::<account-id>:role/<role-name>' CSV;
After running the UNLOAD statement in the query editor, you can find your results saved in S3 with the path s3://redshift-output/shows/. By default UNLOAD will write files in the format 0000_part_00, 0001_part_00, etc. Depending on how many slices your cluster has, a file will be written for each slice. If your query is small, some of the output files will be empty.
The UNLOAD command takes a string for your query, so if you need quotes '' in it, select * from shows where title=‘AGT’ for example, you’ll need to escape the quotes. I use the double quote escape syntax, with '''' for each quote:
UNLOAD ('select * from shows where title=''''AGT''''') to 's3://redshift-output/shows.csv' authorization 'aws_iam_role=arn:aws:iam::<account-id>:role/<role-name>' CSV;
File Output Options
Options for outputting to S3 include file size, force one file, file name prefix, and more. Use the MAXFILESIZE option to dictate file size, with 5 MB being the smallest, and 6.2 GB the largest. To output data in 5 MB files, with a prefix of “show”, and overwritting files in the destination:
UNLOAD ('select * from shows') to 's3://redshift-output/shows/show' authorization 'aws_iam_role=arn:aws:iam::<account-id>:role/<role-name>' CSV maxfilesize 5 MB allowoverwrite;
To force only one file, unless the output is larger than 5 MB use parallel off :
UNLOAD ('select * from shows') to 's3://redshift-output/shows/show' authorization 'aws_iam_role=arn:aws:iam::<account-id>:role/<role-name>' CSV maxfilesize 5 MB allowoverwrite paralell off;