Cloud Storage Types Across AWS, Google Cloud, and Azure
Why storage choice matters
Every data platform decision starts with storage. Get it wrong and you inherit performance bottlenecks, spiralling costs, and migration headaches that compound over years. Get it right and your warehouse queries run faster, your pipelines cost less, and your architecture evolves without rewrites.
The three hyperscalers — AWS, Google Cloud, and Microsoft Azure — each offer storage services that look similar on the surface. Underneath, the naming conventions differ, the pricing models diverge, and the integration points vary in ways that matter when you are building production data infrastructure.
This is not a feature-by-feature matrix. It is a guide to making architectural storage decisions based on access patterns, cost sensitivity, and where your data platform is heading.
Object storage
Object storage is the foundation of every modern data platform. It stores unstructured data as discrete objects — each with a unique key, the data itself, and metadata — in a flat namespace. There is no directory hierarchy, though key prefixes simulate folder structures.
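The flat-namespace point is easy to see in code. The sketch below simulates the prefix-plus-delimiter listing that object stores use to make keys look like folders; the keys are illustrative, and the function mimics (rather than calls) a real list API.

```python
# Sketch: object stores keep a flat key space; "folders" are just shared
# key prefixes. Listing with a prefix and a delimiter is what makes keys
# appear hierarchical. All keys below are illustrative.
keys = [
    "raw/2024/01/events.parquet",
    "raw/2024/02/events.parquet",
    "curated/orders.parquet",
]

def list_keys(keys, prefix="", delimiter="/"):
    """Mimic a prefix + delimiter listing: return direct 'children' only."""
    results = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Collapse everything past the next delimiter into a "folder".
            results.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            results.add(key)
    return sorted(results)

print(list_keys(keys))                  # top level: ['curated/', 'raw/']
print(list_keys(keys, prefix="raw/"))   # ['raw/2024/']
```

S3, GCS, and Blob Storage all expose this same prefix/delimiter listing in their APIs, which is why a flat key space is usually all a data lake needs.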
The services
- AWS S3 — The original cloud object store. 11 nines of durability, multiple storage classes (Standard, Intelligent-Tiering, Glacier), and deep integration with every AWS service.
- Google Cloud Storage — Unified API across storage classes (Standard, Nearline, Coldline, Archive). Tight coupling with BigQuery for direct querying.
- Azure Blob Storage — Hot, Cool, and Archive tiers. Native integration with Azure Data Lake Storage Gen2 via hierarchical namespace.
When to use it
Object storage is the right default for most data engineering workloads. Raw data landing zones, data lakes, pipeline staging areas, model artefact storage, backup and archival — all belong in object storage.
If your data does not need sub-millisecond access or POSIX file semantics, it should almost certainly live in object storage.
Key trade-offs
S3 has the deepest ecosystem and the most mature tooling. Google Cloud Storage wins on simplicity — a single API covers all tiers — and its BigQuery integration means you can query data in place without moving it. Azure Blob Storage is the strongest choice if your organisation already runs on Microsoft infrastructure, particularly when paired with ADLS Gen2 for analytical workloads.
Cost varies by access pattern. All three providers charge for storage volume, API operations, and egress. For infrequently accessed data, lifecycle policies that automatically transition objects between tiers can reduce costs by 60-80%.
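A lifecycle rule is just declarative configuration. The sketch below shows an S3-style rule in the shape boto3 accepts, plus a back-of-envelope savings calculation; the per-GB prices are illustrative assumptions, not quoted rates, so check your provider's current price list.

```python
# Sketch of an S3-style lifecycle rule: move objects under logs/ to an
# infrequent-access tier after 30 days and to an archive tier after 90.
# GCS and Azure express the same idea in their own schemas.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-old-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

# Illustrative per-GB-month prices (USD) -- assumptions, not quotes.
PRICES = {"STANDARD": 0.023, "STANDARD_IA": 0.0125, "GLACIER": 0.004}

def monthly_cost_usd(tb, storage_class):
    return tb * 1024 * PRICES[storage_class]

hot = monthly_cost_usd(100, "STANDARD")
cold = monthly_cost_usd(100, "GLACIER")
print(f"100 TB standard: ${hot:,.0f}/mo, archived: ${cold:,.0f}/mo "
      f"({1 - cold / hot:.0%} saving)")
```

Under these assumed prices the archive transition saves roughly 80% per month, which is where savings figures of that order come from.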
Block storage
Block storage provides raw, low-level storage volumes that attach to compute instances. Data is stored in fixed-size blocks with no inherent structure — the operating system or application manages the filesystem on top.
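The "fixed-size blocks, no inherent structure" model can be sketched locally. Below, a temporary file stands in for a raw volume; on a real EBS, Persistent Disk, or Managed Disk device the operating system performs the same seek-and-read against the block device.

```python
import tempfile

# Sketch: a block device is addressed by fixed-size blocks at byte offsets.
# A temp file stands in for a raw volume here.
BLOCK_SIZE = 4096

with tempfile.TemporaryFile() as volume:
    # Write three "blocks", each filled with its own block number.
    for i in range(3):
        volume.write(bytes([i]) * BLOCK_SIZE)

    def read_block(dev, block_no, block_size=BLOCK_SIZE):
        """Random access: jump straight to block_no, no scan required."""
        dev.seek(block_no * block_size)
        return dev.read(block_size)

    block = read_block(volume, 2)
    print(len(block), block[0])  # 4096 2
```

That constant-time jump to an arbitrary offset is exactly the access pattern databases rely on, and why they sit on block storage rather than object storage.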
The services
- AWS EBS (Elastic Block Store) — SSD-backed (gp3, io2) and HDD-backed (st1, sc1) volumes. Provisioned IOPS for latency-sensitive workloads.
- Google Persistent Disk — Standard (HDD) and SSD options. Regional persistent disks for high availability across zones.
- Azure Managed Disks — Premium SSD, Standard SSD, Standard HDD, and Ultra Disk for extreme IOPS requirements.
When to use it
Block storage is for workloads that need low-latency random I/O: databases (PostgreSQL, MySQL, MongoDB), transactional systems, boot volumes for compute instances, and any application that requires a traditional filesystem.
In data engineering, block storage is typically used for the compute nodes running your database engines, Spark executors that need local scratch space, or Kafka brokers managing commit logs.
Key trade-offs
Performance is the primary differentiator. AWS io2 Block Express delivers up to 256,000 IOPS. Azure Ultra Disk reaches similar numbers. Google Persistent Disk trails slightly on peak IOPS but offers simpler pricing. All three charge based on provisioned capacity — you pay for what you allocate, not what you use.
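The allocate-versus-use distinction has a direct cost consequence. The sketch below computes the effective price per gigabyte actually used at different utilisation levels; the per-GB price is an illustrative assumption.

```python
# Sketch: block storage bills on allocated capacity, not bytes written,
# so low utilisation inflates the effective price. The per-GB price is
# an assumed SSD rate, not a quoted one.
ALLOCATED_GB = 1000
PRICE_PER_GB_MONTH = 0.08  # USD, illustrative

def effective_price(used_gb, allocated_gb=ALLOCATED_GB):
    """Monthly bill divided by the capacity actually used."""
    return (allocated_gb * PRICE_PER_GB_MONTH) / used_gb

for used in (250, 500, 1000):
    print(f"{used:>4} GB used -> ${effective_price(used):.2f} per used GB")
```

At 25% utilisation the effective rate quadruples, which is why right-sizing volumes matters far more for block storage than for pay-per-byte object storage.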
Block storage is not for data lakes or analytical storage. The cost per gigabyte is 5-10x higher than object storage, and the data is tightly coupled to a single compute instance or availability zone.
File storage
File storage provides shared filesystems accessible over network protocols (NFS, SMB). Multiple compute instances can read and write to the same filesystem simultaneously.
The services
- AWS EFS (Elastic File System) — Fully managed NFS. Scales automatically, pay-per-use, with throughput modes for different workload profiles.
- Google Filestore — Managed NFS with performance tiers (Basic HDD, Basic SSD, Enterprise). Fixed capacity provisioning.
- Azure Files — Supports both SMB and NFS protocols. Premium (SSD) and Standard (HDD) tiers. Integrates with Azure File Sync for hybrid scenarios.
When to use it
File storage solves a specific problem: shared state across multiple compute instances with POSIX filesystem semantics. Common use cases include legacy application migration (where applications expect a mounted filesystem), shared configuration across a fleet of servers, CMS platforms, and machine learning training jobs that read datasets from a common mount point.
Key trade-offs
AWS EFS is the most flexible — elastic scaling with no capacity planning required — but it is also the most expensive at scale. Google Filestore requires capacity provisioning upfront, which means you pay for allocated space regardless of usage. Azure Files is the strongest option for organisations that need SMB protocol support (Windows workloads) alongside NFS.
For data engineering specifically, file storage is rarely the right primary choice. It is more expensive than object storage, slower than block storage for single-instance workloads, and adds operational complexity. Use it when your tooling genuinely requires POSIX semantics — otherwise, object storage with a caching layer is typically more cost-effective.
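The "object storage with a caching layer" alternative can be as simple as a read-through cache on local disk. In the sketch below, `fetch_object` is a hypothetical stand-in; in practice it would be a boto3, google-cloud-storage, or Azure SDK download call.

```python
import hashlib
import tempfile
from pathlib import Path

# Sketch of a read-through cache in front of an object store: repeated
# reads of the same object hit local disk instead of paying egress and
# API charges. fetch_object is a hypothetical stand-in for an SDK call.
CACHE_DIR = Path(tempfile.gettempdir()) / "obj-cache"

def fetch_object(key: str) -> bytes:
    """Hypothetical remote fetch; replace with your SDK's download call."""
    return f"contents of {key}".encode()

def cached_get(key: str) -> bytes:
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    local = CACHE_DIR / hashlib.sha256(key.encode()).hexdigest()
    if local.exists():          # cache hit: no network round trip
        return local.read_bytes()
    data = fetch_object(key)    # cache miss: one download, then persist
    local.write_bytes(data)
    return data

print(cached_get("raw/2024/01/events.parquet"))  # first call fetches
print(cached_get("raw/2024/01/events.parquet"))  # second call hits cache
```

For read-heavy workloads this pattern often delivers file-storage-like convenience at object-storage prices, without the POSIX semantics a true shared filesystem provides.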
Data lakes and analytical storage
The most consequential storage decision for data teams is the analytical layer: where does your structured and semi-structured data live for querying, transformation, and machine learning?
The market has converged on the lakehouse pattern — combining the low-cost scalability of data lakes with the query performance of warehouses. Each cloud provider approaches this differently.
AWS: S3 + Lake Formation + Athena / Redshift Spectrum
AWS separates storage (S3) from compute (Athena, Redshift, EMR). Lake Formation provides governance, access control, and catalogue management on top of the S3-based data lake. You can query data in place using Athena (serverless) or Redshift Spectrum (warehouse-integrated).
Advantages: Maximum flexibility. You control the storage format (Parquet, Iceberg, Delta), the query engine, and the governance layer independently. Open table formats like Apache Iceberg are first-class citizens.
Best for: Teams that want full control over their stack and are comfortable managing the integration between components.
Google Cloud: BigQuery + Cloud Storage
Google's approach is the most opinionated — and often the simplest. BigQuery is simultaneously a warehouse and a lake. BigQuery Storage (managed columnar storage) handles structured data, while BigQuery external tables query Parquet and ORC files directly in Cloud Storage.
Advantages: Minimal operational overhead. Serverless pricing (pay per query or flat-rate slots). Built-in ML via BigQuery ML. The fastest path from raw data to analytics for teams that want to move quickly.
Best for: Teams that prioritise speed of delivery over architectural flexibility. Particularly strong for organisations with under 50 TB of active data.
Azure: ADLS Gen2 + Synapse Analytics
Azure Data Lake Storage Gen2 is Blob Storage with a hierarchical namespace — a genuine filesystem layer over object storage. Synapse Analytics provides serverless SQL pools, dedicated SQL pools, and Spark pools that all read from the same ADLS Gen2 storage layer.
Advantages: Strong hybrid story for organisations already invested in Microsoft. A unified data lake and warehouse within a single service. Deep integration with Power BI and the broader Microsoft analytics stack.
Best for: Enterprise organisations with existing Microsoft agreements and Power BI as their primary BI tool.
Choosing the right storage
Storage selection is architectural, not operational. The right choice depends on three factors:
Access pattern. How is the data read and written? Sequential scans favour object storage. Random reads favour block storage. Shared concurrent access requires file storage. Analytical queries point to a lakehouse.
Cost profile. Object storage is cheapest per gigabyte. Block storage is most expensive but necessary for latency-sensitive applications. Analytical storage costs vary dramatically by query pattern — BigQuery's per-query pricing rewards infrequent access, while Redshift's provisioned model rewards sustained throughput.
Ecosystem gravity. If your team lives in the AWS ecosystem with Terraform, Airflow, and dbt, choosing Azure Synapse creates unnecessary friction. If your organisation is a Microsoft shop with Power BI dashboards and Azure DevOps, fighting that gravity for BigQuery rarely makes sense.
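The access-pattern factor reduces to a coarse lookup, sketched below. Real decisions weigh cost and ecosystem gravity alongside it, so treat this as a starting point rather than a rule.

```python
# A deliberately coarse mapping from access pattern to storage type,
# mirroring the factors above. The pattern names are our own labels.
def suggest_storage(access_pattern: str) -> str:
    return {
        "sequential_scan": "object storage (S3 / GCS / Blob Storage)",
        "random_io": "block storage (EBS / Persistent Disk / Managed Disks)",
        "shared_posix": "file storage (EFS / Filestore / Azure Files)",
        "analytical_query": "lakehouse (S3+Athena / BigQuery / ADLS+Synapse)",
    }.get(access_pattern, "unknown pattern -- start from object storage")

print(suggest_storage("random_io"))
print(suggest_storage("analytical_query"))
```
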
The best storage decision is the one your team can operate, monitor, and evolve without heroic effort.
Most production data platforms use multiple storage types in concert: object storage for the raw landing zone, a lakehouse for analytics, and block storage for the databases that power operational systems. The architecture is rarely one or the other — it is understanding which layer serves which purpose, and keeping the boundaries clean.
Where we stand
At Neo Analytica, we build data platforms across all three clouds. We have delivered lakehouse architectures on S3 with Iceberg, BigQuery-native platforms on Google Cloud, and ADLS Gen2 + Synapse deployments for enterprise clients. The technology matters less than the architecture decisions behind it.
If you are evaluating your storage layer — or inheriting one that needs modernising — that is exactly the conversation we are built for.