Transfer

10 functions for moving data between local storage, cloud providers, and public repositories. All transfers support resume, checksums, and progress callbacks.

FTP

ftp_download / ftp_upload

ftp_download(url, path?) -> string
ftp_upload(path, url) -> string

# Download from NCBI FTP
ftp_download(
  "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/GCA_000001405.15_GRCh38_genomic.fna.gz",
  "reference/GRCh38.fa.gz"
)

# Download from Ensembl
ftp_download(
  "ftp://ftp.ensembl.org/pub/release-111/gtf/homo_sapiens/Homo_sapiens.GRCh38.111.gtf.gz",
  "annotation/GRCh38.gtf.gz"
)
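
The introduction says transfers support checksums but does not show how a checksum is checked. As an independent illustration, the sketch below verifies a downloaded file against a published MD5 value (NCBI genome directories ship an md5checksums.txt file alongside the assembly files). It is written in Python rather than this library's own language, and the names md5_of_file and verify_md5 are hypothetical helpers, not part of the transfer API.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected_hex):
    """Return True when the file's MD5 matches the published value.

    The expected value is stripped and lowercased so that a line copied
    from a checksum file compares cleanly. Hypothetical helper.
    """
    return md5_of_file(path) == expected_hex.strip().lower()
```

Reading in fixed-size chunks matters here because reference genomes and FASTQ archives are often too large to hash from a single in-memory read.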

Cloud Storage

s3_download / s3_upload

s3_download(s3_url, local_path?) -> string
s3_upload(local_path, s3_url) -> string

Uses s3://bucket/key URL format. Delegates to aws s3 cp under the hood.

# Download from public S3 bucket
s3_download("s3://1000genomes/release/20130502/ALL.chr1.vcf.gz", "data/chr1.vcf.gz")

# Upload results to private bucket
s3_upload("results/analysis.csv", "s3://my-lab-bucket/project-x/results.csv")

# Auto-infer local filename from S3 key
s3_download("s3://lab-data/samples/S001.bam")
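
To make the s3://bucket/key format and the filename inference concrete, here is a minimal Python sketch of how such a URL could be split and turned into an aws s3 cp invocation. It is not this library's implementation: the helper names are hypothetical, and the rule that the inferred local name is the last path segment of the key is an assumption based on the example above.

```python
def split_s3_url(s3_url):
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    prefix = "s3://"
    if not s3_url.startswith(prefix):
        raise ValueError(f"expected an s3:// URL, got {s3_url!r}")
    bucket, _, key = s3_url[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {s3_url!r}")
    return bucket, key


def infer_local_path(s3_url):
    """Assumed inference rule: use the last segment of the object key."""
    _, key = split_s3_url(s3_url)
    name = key.rsplit("/", 1)[-1]
    if not name:
        raise ValueError(f"cannot infer a filename from {s3_url!r}")
    return name


def s3_cp_command(s3_url, local_path=None):
    """Build the argv list for 'aws s3 cp <source> <destination>'."""
    return ["aws", "s3", "cp", s3_url, local_path or infer_local_path(s3_url)]
```

Under these assumptions, s3_cp_command("s3://lab-data/samples/S001.bam") would copy the object to S001.bam in the current directory. Building an argv list rather than a shell string avoids quoting problems with unusual key names.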

gcs_download / gcs_upload

gcs_download(gs_url, local_path?) -> string
gcs_upload(local_path, gs_url) -> string

Uses gs://bucket/object URL format. Delegates to gsutil cp under the hood.

# Download from Google Cloud Storage
gcs_download("gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta", "ref.fa")

# Upload to project bucket
gcs_upload("results.bam", "gs://my-project-data/outputs/sample1.bam")

Specialized Transfers

rsync

Efficient incremental file synchronization.

rsync(source, destination, opts?) -> map

rsync("user@hpc.lab.edu:/data/project/", "local_data/", {
  exclude: ["*.tmp", "*.log"],
  compress: true,
  dry_run: false
})
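
The option names above correspond closely to standard rsync command-line flags. The Python sketch below shows one plausible translation from the opts map to an rsync argument list. It is an illustration only: the function name rsync_command and the use of archive mode (-a) are assumptions, and the library may construct its command differently.

```python
def rsync_command(source, destination, opts=None):
    """Translate an options map into an rsync argv list (hypothetical helper)."""
    opts = opts or {}
    argv = ["rsync", "-a"]  # archive mode is an assumption, not documented above
    if opts.get("compress"):
        argv.append("-z")  # compress file data during the transfer
    if opts.get("dry_run"):
        argv.append("--dry-run")  # report what would change without copying
    for pattern in opts.get("exclude", []):
        argv.append(f"--exclude={pattern}")  # one flag per exclude pattern
    argv += [source, destination]
    return argv
```

With the options from the example, this yields rsync -a -z --exclude=*.tmp --exclude=*.log followed by the source and destination. In rsync itself, the trailing slash on the source path means "copy the contents of this directory" rather than the directory itself.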

aspera_download

High-speed download using Aspera (ascp). Ideal for large genomic datasets from EBI/NCBI.

aspera_download(source, destination?) -> string

aspera_download(
  "era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/SRR123/SRR1234567_1.fastq.gz",
  "raw_data/SRR1234567_1.fastq.gz"
)

sra_fastq

Download FASTQ files from NCBI SRA by accession. Resolves the fastest download source automatically (AWS, GCP, NCBI FTP, or Aspera).

sra_fastq(accession, outdir, opts?) -> list

Option   Type     Description
split    bool     Split paired-end reads (default: true)
source   string   Force source: "aws", "gcp", "ncbi", "aspera"

Returns: list<string> — paths to downloaded FASTQ files

let fastqs = sra_fastq("SRR1234567", "raw_data/")
# ["raw_data/SRR1234567_1.fastq.gz", "raw_data/SRR1234567_2.fastq.gz"]

# Download multiple accessions
let accessions = ["SRR1234567", "SRR1234568", "SRR1234569"]
let all_files = accessions |> map(|acc| sra_fastq(acc, "raw_data/")) |> flatten
println("Downloaded", len(all_files), "FASTQ files")
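
Once several accessions are flattened into one list, downstream steps usually need the mates paired up again. The Python sketch below regroups paths by accession. It assumes the _1/_2 naming shown in the example output above, and group_by_accession is a hypothetical helper rather than part of this library.

```python
import re
from collections import defaultdict

# Matches names such as "SRR1234567_1.fastq.gz" (naming assumed from the example).
_MATE_PATTERN = re.compile(r"^(?P<acc>.+)_(?P<mate>[12])\.fastq\.gz$")


def group_by_accession(paths):
    """Map each accession to {mate_number: path} for a flat list of FASTQ paths."""
    groups = defaultdict(dict)
    for path in paths:
        name = path.rsplit("/", 1)[-1]  # drop any directory prefix
        match = _MATE_PATTERN.match(name)
        if match is None:
            raise ValueError(f"unexpected FASTQ name: {path!r}")
        groups[match["acc"]][int(match["mate"])] = path
    return dict(groups)
```

An accession whose entry lacks mate 2 then signals a missing or unsplit download before alignment starts.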