Transfer
10 functions for moving data between local storage, cloud providers, and public repositories. All transfers support resume, checksums, and progress callbacks.
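As an illustration of what checksum verification with a progress callback involves, here is a minimal Python sketch. It does not show how the transfer functions work internally: the helper names `sha256_with_progress` and `verify`, the choice of SHA-256, and the `(bytes_done, bytes_total)` callback signature are all assumptions made for this example.

```python
import hashlib
import os
from typing import Callable, Optional

# Hypothetical helper: hash a file in chunks and report progress as it goes.
# The (bytes_done, bytes_total) callback shape is an assumption for this sketch.
def sha256_with_progress(
    path: str,
    on_progress: Optional[Callable[[int, int], None]] = None,
    chunk_size: int = 1 << 20,
) -> str:
    total = os.path.getsize(path)
    digest = hashlib.sha256()
    done = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            done += len(chunk)
            if on_progress is not None:
                on_progress(done, total)
    return digest.hexdigest()

# Hypothetical helper: compare a file's digest with a published checksum.
def verify(path: str, expected_hex: str,
           on_progress: Optional[Callable[[int, int], None]] = None) -> bool:
    return sha256_with_progress(path, on_progress) == expected_hex.lower()
```

Reading in fixed-size chunks keeps memory use flat even for multi-gigabyte reference genomes, and it gives the callback a natural point at which to fire.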
FTP
ftp_download / ftp_upload
ftp_download(url, path?) -> string
ftp_upload(path, url) -> string
# Download from NCBI FTP
ftp_download(
"ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/GCA_000001405.15_GRCh38_genomic.fna.gz",
"reference/GRCh38.fa.gz"
)
# Download from Ensembl
ftp_download(
"ftp://ftp.ensembl.org/pub/release-111/gtf/homo_sapiens/Homo_sapiens.GRCh38.111.gtf.gz",
"annotation/GRCh38.gtf.gz"
)
Cloud Storage
s3_download / s3_upload
s3_download(s3_url, local_path?) -> string
s3_upload(local_path, s3_url) -> string
Uses s3://bucket/key URL format. Delegates to aws s3 cp under the hood.
# Download from public S3 bucket
s3_download("s3://1000genomes/release/20130502/ALL.chr1.vcf.gz", "data/chr1.vcf.gz")
# Upload results to private bucket
s3_upload("results/analysis.csv", "s3://my-lab-bucket/project-x/results.csv")
# Omit local_path to infer the local filename from the S3 key
s3_download("s3://lab-data/samples/S001.bam")
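The URL handling described above can be sketched in Python. This is an assumption-laden illustration, not the library's implementation: `split_s3_url` and `build_s3_download_command` are hypothetical names, and taking the last path component of the key as the default local filename is a guess at what "auto-infer" means. The only external fact relied on is that `aws s3 cp <source> <destination>` is the AWS CLI copy command.

```python
import posixpath
from typing import List, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical helper: split s3://bucket/key into (bucket, key).
def split_s3_url(url: str) -> Tuple[str, str]:
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3://bucket/key URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")

# Hypothetical helper: build the argv that a wrapper might hand to the AWS CLI.
def build_s3_download_command(s3_url: str,
                              local_path: Optional[str] = None) -> List[str]:
    _bucket, key = split_s3_url(s3_url)
    if local_path is None:
        # Assumption: the default local name is the last component of the key.
        local_path = posixpath.basename(key)
        if not local_path:
            raise ValueError("cannot infer a filename from a key ending in '/'")
    return ["aws", "s3", "cp", s3_url, local_path]
```

Returning an argument list rather than a shell string means the command can be passed to `subprocess.run` without quoting concerns for keys that contain spaces.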
gcs_download / gcs_upload
gcs_download(gs_url, local_path?) -> string
gcs_upload(local_path, gs_url) -> string
Uses gs://bucket/object URL format. Delegates to gsutil cp under the hood.
# Download from Google Cloud Storage
gcs_download("gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta", "ref.fa")
# Upload to project bucket
gcs_upload("results.bam", "gs://my-project-data/outputs/sample1.bam")
Specialized Transfers
rsync
Efficient incremental file synchronization.
rsync(source, destination, opts?) -> map
rsync("user@hpc.lab.edu:/data/project/", "local_data/", {
exclude: ["*.tmp", "*.log"],
compress: true,
dry_run: false
})
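To make the options map concrete, the following Python sketch shows one way such options could translate into an rsync command line. The mapping is an assumption: `build_rsync_command` is a hypothetical helper, and the use of archive mode (`-a`) is a guess rather than documented behavior. The flags themselves (`--compress`, `--dry-run`, `--exclude=PATTERN`) are standard rsync options.

```python
from typing import Any, Dict, List, Optional

# Hypothetical helper: translate an options map into rsync arguments.
def build_rsync_command(source: str, destination: str,
                        opts: Optional[Dict[str, Any]] = None) -> List[str]:
    opts = opts or {}
    # Assumption: archive mode, so directories recurse and metadata is kept.
    cmd = ["rsync", "-a"]
    if opts.get("compress"):
        cmd.append("--compress")
    if opts.get("dry_run"):
        cmd.append("--dry-run")
    for pattern in opts.get("exclude", []):
        cmd.append(f"--exclude={pattern}")
    cmd.extend([source, destination])
    return cmd

cmd = build_rsync_command(
    "user@hpc.lab.edu:/data/project/",
    "local_data/",
    {"exclude": ["*.tmp", "*.log"], "compress": True, "dry_run": False},
)
print(" ".join(cmd))
```

Note that rsync treats a trailing slash on the source as "copy the contents of this directory", so `/data/project/` and `/data/project` produce different layouts under `local_data/`.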
aspera_download
High-speed download using Aspera (ascp). Ideal for large genomic datasets from EBI/NCBI.
aspera_download(source, destination?) -> string
aspera_download(
"era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/SRR123/007/SRR1234567/SRR1234567_1.fastq.gz",
"raw_data/SRR1234567_1.fastq.gz"
)
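ENA source paths depend on the length of the run accession, which makes them easy to get wrong by hand. The Python sketch below builds one, assuming ENA's published FASTQ directory layout: the first six characters of the accession, then a zero-padded subdirectory taken from the digits beyond the sixth (none for six-digit accessions), then the accession itself. The helper names are hypothetical, and the `_1`/`_2` suffix assumes a paired-end run.

```python
import re

ENA_FASP_HOST = "era-fasp@fasp.sra.ebi.ac.uk"

# Hypothetical helper: directory holding a run's FASTQ files on ENA.
def ena_fastq_dir(accession: str) -> str:
    if not re.fullmatch(r"[SED]RR\d{6,9}", accession):
        raise ValueError(f"unexpected run accession: {accession!r}")
    prefix = accession[:6]   # e.g. "SRR123"
    digits = accession[3:]   # numeric part of the accession
    if len(digits) == 6:
        # Six-digit accessions have no intermediate subdirectory.
        return f"vol1/fastq/{prefix}/{accession}"
    # Digits beyond the sixth, left-padded with zeros to width three.
    sub = digits[6:].zfill(3)
    return f"vol1/fastq/{prefix}/{sub}/{accession}"

# Hypothetical helper: full source string for one mate of a paired-end run.
def ena_fasp_source(accession: str, mate: int) -> str:
    return f"{ENA_FASP_HOST}:/{ena_fastq_dir(accession)}/{accession}_{mate}.fastq.gz"

print(ena_fasp_source("SRR1234567", 1))
```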
sra_fastq
Download FASTQ files from NCBI SRA by accession. Resolves the fastest download source automatically (AWS, GCP, NCBI FTP, or Aspera).
sra_fastq(accession, outdir, opts?) -> list
| Option | Type | Description |
|---|---|---|
| split | bool | Split paired-end reads into separate files (default: true) |
| source | string | Force a specific download source: "aws", "gcp", "ncbi", or "aspera" |
Returns: list<string>, the paths to the downloaded FASTQ files
let fastqs = sra_fastq("SRR1234567", "raw_data/")
# ["raw_data/SRR1234567_1.fastq.gz", "raw_data/SRR1234567_2.fastq.gz"]
# Download multiple accessions
let accessions = ["SRR1234567", "SRR1234568", "SRR1234569"]
let all_files = accessions |> map(|acc| sra_fastq(acc, "raw_data/")) |> flatten
println("Downloaded", len(all_files), "FASTQ files")
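Because each call returns a list, mapping over several accessions yields a list of lists, which is why the pipeline ends with a flatten step. The Python sketch below mirrors that pattern without touching the network. `fake_sra_fastq` is a stand-in invented for this example; it only mimics the paired-end filenames shown in the sample output above.

```python
from itertools import chain
from typing import List

# Stand-in for sra_fastq (hypothetical): returns the paths a paired-end
# download would produce, without downloading anything.
def fake_sra_fastq(accession: str, outdir: str) -> List[str]:
    return [f"{outdir}{accession}_{mate}.fastq.gz" for mate in (1, 2)]

accessions = ["SRR1234567", "SRR1234568", "SRR1234569"]

# One list of paths per accession: a list of lists.
per_accession = [fake_sra_fastq(acc, "raw_data/") for acc in accessions]

# Flatten into a single list of paths.
all_files = list(chain.from_iterable(per_accession))

print("Downloaded", len(all_files), "FASTQ files")  # Downloaded 6 FASTQ files
```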