7 Ways Amazon Glacier Fails at Cloud Archiving vs. HubStor on Microsoft Azure
News as reported by the company. Updated September 16, 2016.
Author: Geoff Bourgeois -- Geoff Bourgeois is a cloud storage enthusiast and veteran enterprise software pro. He is also CEO and co-founder of HubStor, the world’s first data-aware cloud archive for business.
Amazon Glacier, the archive cloud storage available in Amazon Web Services (AWS), is a great replacement for backup tape, but IT professionals often mistakenly assume it is also a true enterprise-grade cloud archive solution.
HubStor receives regular interest from IT professionals looking for alternatives to Amazon Glacier. They’re looking for something that offers the low cost and convenience of public cloud, has a more convenient cloud gateway approach, and is also fully managed, effortless to adopt, and meets their data governance and retrieval needs.
#1 -- FAST, EASY RECOVERY FROM THE CLOUD
Before storing copious amounts of your precious data in the cloud, you need a clear picture of how retrieval will work.
With Amazon Glacier, retrieval is anything but simple. Here’s what Amazon says about downloading your data from Glacier:
Amazon Glacier provides a management console, which you can use to create and delete vaults. However, you cannot download archives from Amazon Glacier by using the management console. To download data, such as photos, videos, and other documents, you must either use the AWS CLI or write code to make requests, by using either the REST API directly or by using the AWS SDKs.
You also have to store your data in sets. When retrieving, you have to request a retrieval job on a set and then -- if you’re only interested in retrieving specific data within a set -- you need to programmatically specify a retrieval range to avoid data transfer costs on the full set.
There’s more complexity:
Most Amazon Glacier [retrieval] jobs take about four hours to complete. Amazon Glacier must complete a job before you can get its output. A job will not expire for at least 24 hours after completion, which means you can download the output within the 24-hour period after the job is completed.
Glacier’s complex and slow retrieval makes it difficult to see how it could be anything more than a tape backup replacement.
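To make the retrieval-range step concrete, here is a minimal Python sketch of the arithmetic involved. It assumes, per AWS's Glacier documentation, that a range must start on a 1 MB (1024 x 1024 byte) boundary and end one byte before a boundary or on the archive's last byte. The function names, the archive ID, and the sizes are illustrative, and the sketch only builds the job parameters; it does not call AWS.

```python
MIB = 1024 * 1024  # Glacier range retrievals are aligned to 1 MB (1024 * 1024) boundaries


def aligned_retrieval_range(start: int, end: int, archive_size: int) -> str:
    """Widen the inclusive byte span [start, end] to a megabyte-aligned range."""
    if not 0 <= start <= end < archive_size:
        raise ValueError("span must lie inside the archive")
    # Round the start down to the nearest boundary.
    aligned_start = (start // MIB) * MIB
    # Round the end up to one byte before the next boundary,
    # but never past the last byte of the archive.
    aligned_end = min((end // MIB + 1) * MIB - 1, archive_size - 1)
    return f"{aligned_start}-{aligned_end}"


def retrieval_job_parameters(archive_id: str, byte_range: str) -> dict:
    """Build the parameters an archive-retrieval job request would carry."""
    return {
        "Type": "archive-retrieval",
        "ArchiveId": archive_id,
        "RetrievalByteRange": byte_range,
    }


if __name__ == "__main__":
    # Hypothetical 10 MB archive; only bytes 1,500,000 through 2,500,000 are wanted.
    byte_range = aligned_retrieval_range(1_500_000, 2_500_000, 10 * MIB)
    print(retrieval_job_parameters("example-archive-id", byte_range))
```

Even in this small case, the aligned range is wider than the bytes you actually wanted, and you still have to wait for the job to complete before downloading its output.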
HubStor’s Approach: Self-service Cloud Export
HubStor makes it easy to get your data out of the cloud.
It includes a self-service user access portal which you can provide to an unlimited number of your users (with instant access to browse, download, search, and even share cloud-archived content).
For bulk extraction and recovery scenarios, HubStor also includes a downloadable export utility for privileged users. The export utility provides a simple interface allowing an authenticated and authorized user to recover data on-demand, either downloading the folders and content to a new folder or merging things back to their original place. It’s handy for bulk downloading certain folders, entire folder structures, and discovery cases you might have created in HubStor’s Admin Portal.
It’s also the same tool you’d use if you ever decided to stop using HubStor and needed to egress your entire archive.
Unlike Glacier, HubStor doesn’t place hurdles between you and your data. There’s no arbitrary process to retrieve your data. You can optionally provide knowledge workers with self-service access to their cloud archives, and super users have an intuitive user interface for bulk downloading specific content on demand. Retrieval from HubStor always happens instantaneously.
#2 -- FULLY MANAGED SOLUTION
Archiving typically isn’t something businesses want to do -- it’s something they have to do. It’s also a complex undertaking. Amazon’s cloud services provide a platform for archiving, not a solution. We see this in Amazon’s own positioning of its archiving offering:
Amazon Web Services offers a complete set of cloud storage services for archiving. You can choose Amazon Glacier for affordable, non-time sensitive cloud storage, or Amazon Simple Storage Service (S3) for faster storage, depending on your needs. With AWS Storage Gateway and our solution provider ecosystem, you can build a comprehensive, storage solution.
To achieve enterprise-grade cloud archiving with Amazon solutions, you need to deal with Glacier’s complex retrieval paradigm. Your solution might involve deployment and configuration of the AWS Storage Gateway. You might need partner products such as StorReduce if things like deduplication are important to you. If you want secure access to the data you’ll need to go deeper in the Amazon cloud stack with AWS Identity and Access Management (IAM), with custom development if you want synchronized authorization. If you need compliance storage, you need to dig into Amazon Glacier Vault Lock. And if you need to search your data in the cloud, wrap your mind around Amazon CloudSearch.
If you’re busy with other projects and just want cloud archiving to be simple, Amazon Glacier isn’t it. Leveraging Amazon’s cloud storage services for archiving in a meaningful way for your business requires expertise, time, and careful planning. And even if you do manage to connect all the dots, perhaps involving AWS partner solutions, you’re likely still missing key features to support your enterprise archive use cases.
HubStor’s Approach: Fully Managed Cloud Archive
HubStor’s approach connects all the dots for you. The complete solution environment consisting of infrastructure and software is fully managed for you, whether you deploy into your own account in Microsoft Azure or under HubStor’s.
#3 -- THE CLOUD GATEWAY
How does your data get to the cloud?
Both the AWS Storage Gateway and HubStor’s virtual cloud gateway run on a simple virtual machine.
The AWS Storage Gateway presents a drive on your network which locally caches active data while floating everything else up to the cloud. Each instance of the AWS Storage Gateway that you run has a monthly fee along with its own storage and data transfer costs.
Here’s the rub: Not only does Amazon’s gateway add infrastructure for you to manage with its requirement to carve out on-premises storage capacity, but to use the gateway’s storage volume you also have a data migration problem to solve. It’s less than ideal for IT managers who want to simplify their lives and avoid the brouhaha of a never-ending data migration.
Unlike Amazon, HubStor’s gateway is included in the cloud archive subscription at no extra cost, and instead of being a storage mount point, HubStor’s gateway cloud-extends your existing storage investments with automated data migration policies.
HubStor’s Approach: The Virtual Cloud Gateway
HubStor’s virtual cloud gateway is an engine that runs archiving and data protection policies against your existing storage investments and -- based on your rules -- synchronizes and migrates targeted data to the cloud. We do this with permissions and folders intact so that your data maintains the same security and structure in the cloud.
In our gateway, you define the data that syncs/moves to the cloud using rules based on file type, last accessed, size, etc. Your rules run automatically based on a schedule, and you can control their impact on your network during peak and off-peak hours with bandwidth throttling.
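As a rough illustration of peak and off-peak throttling, the Python sketch below picks an upload cap based on the time of day. The window boundaries, the caps, and the function names are all hypothetical; none of them come from HubStor's product or configuration.

```python
from datetime import time

# Hypothetical settings for illustration only (not HubStor defaults).
PEAK_START = time(8, 0)
PEAK_END = time(18, 0)
PEAK_LIMIT_MBPS = 20       # cap during business hours, in megabits per second
OFF_PEAK_LIMIT_MBPS = 200  # cap overnight, in megabits per second


def bandwidth_limit_mbps(now: time) -> int:
    """Return the upload cap that applies at the given local time of day."""
    in_peak = PEAK_START <= now < PEAK_END
    return PEAK_LIMIT_MBPS if in_peak else OFF_PEAK_LIMIT_MBPS


def transfer_seconds(size_megabytes: float, limit_mbps: int) -> float:
    """Estimate transfer time for a payload at a given cap (8 bits per byte)."""
    return size_megabytes * 8 / limit_mbps


if __name__ == "__main__":
    for label, moment in (("peak", time(10, 30)), ("off-peak", time(23, 0))):
        cap = bandwidth_limit_mbps(moment)
        print(label, cap, "Mbps,", transfer_seconds(500, cap), "seconds for 500 MB")
```

The point of the design is that the same scheduled rule can run at any hour while its network impact changes with the clock.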
If you have multiple facilities with storage, you can run an instance of HubStor’s virtual gateway in each location to locally sync and migrate your distributed storage environment. HubStor scales out across Azure regions if you need to maintain data sovereignty for certain locations. But when you archive to a common target in HubStor’s cloud archive you get deduplication across all the storage serviced by multiple HubStor gateways.
HubStor’s cloud gateway approach is easier because it handles the data migration problem for you, and it doesn’t add infrastructure. HubStor essentially cloud-enables your existing storage infrastructure as opposed to showing up as more storage for you to manage.
#4 -- STORAGE REDUCTION, ON-PREMISES AND IN THE CLOUD
Reducing storage on-premises isn’t something Glacier helps you with in any automated fashion.
In HubStor’s virtual cloud gateway, optional policies let you delete or create stubs out of the originals after archiving to the cloud. This frees up space in your storage arrays, allowing you to simplify backup and defer spending on new storage.
Regarding storage reduction in the cloud, beware of cloud storage solutions that price themselves on storage volume and base their calculation on the logical size of your data. For instance, if you archive 50 TB of data they are probably compressing and eliminating duplicates from this dataset to physically store perhaps 20 TB, but you are still paying to store 50 TB. In Glacier’s case, there is no native deduplication but there is compression. If you want deduplication you have to run a third-party product from StorReduce, which performs block-level deduplication.
HubStor performs file-level deduplication and compression. Storage reduction savings are visible to you in the HubStor Admin Portal. And yes, our pricing is transparently based on your storage capacity after dedupe and compression.
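For readers who want to see the difference between logical and stored size, here is a generic Python sketch of file-level deduplication followed by compression. It uses SHA-256 content hashes and zlib purely as stand-ins; this is not HubStor's implementation, and the function name and sample data are assumptions made for illustration.

```python
import hashlib
import zlib


def stored_size(files: dict[str, bytes]) -> tuple[int, int]:
    """Return (logical_bytes, stored_bytes) after file-level dedupe and compression."""
    logical = sum(len(data) for data in files.values())
    unique: dict[str, int] = {}  # content hash -> compressed size of that content
    for data in files.values():
        digest = hashlib.sha256(data).hexdigest()
        if digest not in unique:
            # Only the first copy of identical content is compressed and kept.
            unique[digest] = len(zlib.compress(data))
    return logical, sum(unique.values())


if __name__ == "__main__":
    sample = {
        "reports/q1.txt": b"quarterly numbers " * 1000,
        "backup/q1-copy.txt": b"quarterly numbers " * 1000,  # exact duplicate
        "notes.txt": b"meeting notes " * 500,
    }
    logical, stored = stored_size(sample)
    print(f"logical: {logical} bytes, stored: {stored} bytes")
```

In the 50 TB example above, billing on the 20 TB actually stored rather than the 50 TB logical size means a billable volume that is 60% smaller.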
#5 -- DATA AWARE CLOUD STORAGE
In my opinion, if you’re archiving enterprise data to the cloud, you need to think ahead on cloud data management. Cloud storage shouldn’t be a dangerous black hole. Like black hole Gargantua in the movie ‘Interstellar’, you don’t want to find yourself floating about like spaceman Matthew McConaughey in the Tesseract trying to find a way to deal with your archive’s data gravity.
Amazon Glacier is not data aware storage in the cloud. It’s just low-cost storage. There are no built-in data governance or analytics features. In my opinion, this alone places Glacier in the category of tape-replacement backup solution, not an enterprise archiving solution. How you’ll one day manage data in Glacier is left for you to sort out.
HubStor’s Approach: Data Awareness with Integrated Data Governance Apps
HubStor’s data aware cloud storage is an advantage for everyday IT operations, security, compliance, and litigation scenarios.
(What you’re about to read is ahead-of-the-curve functionality in cloud storage…)
HubStor’s scale-out cloud storage fabric runs a storage analytics and activity intelligence engine. We call it ‘data aware storage’ for short. It works quietly behind the scenes to maintain insights and surface perspectives for admin users that make audits, investigations, data management, and monitoring hassle-free.
Data aware storage shows you what’s in your archive, how your policies are doing, what data is sensitive or on legal hold, and how users are interacting with content. And you can build and track your own queries to generate the insights you need.
Each of the following use cases benefits from data aware insights:
1) Reviewing access rights.
2) Managing retention.
3) Detecting private/sensitive data.
4) Applying holds.
5) Performing discovery.
The policy engine in HubStor that links its query-ready cloud storage fabric with its data management controls runs near real-time -- there’s nothing you need to tinker with or learn administratively to realize this value.
Cloud storage, like a traditional file server, can be like a black hole for your data. Businesses need insight into what they are storing. Managing retention -- actually deleting things in a defensible manner -- is a concern for decision makers. Your data in the cloud is not immune to needing legal holds, tagging, and auditing. If your data rests in a black hole, you’re sitting on a powder keg of risk.
#6 -- SELF-SERVICE USER ACCESS
Most archiving projects come with a desire for self-service user access. This can involve different approaches:
1) Shortcuts or stubs left behind in the original storage that give the users a quick link in Windows Explorer (or the native application) to retrieve from the cloud as needed, and/or . . .
2) A Web-based interface that empowers users to browse, download, search, and even share their content.
Amazon Glacier doesn’t handle self-service user access. You need to develop your own solution or add third-party products to the mix. This will involve becoming familiar with AWS Identity and Access Management (IAM), plus finding/developing a way to handle the authorization problem.
HubStor’s Approach: Self-service Web Access and Optional Policy-based Stubbing
Recall our earlier discussion of how HubStor’s gateway captures the permissions when writing data or syncing changes to the cloud. When users access their content in the cloud archive, HubStor ensures they see only their own content, according to the synchronized permissions on folders and items. Access rights just transparently keep in sync.
Inactive data sitting in user file shares and home directories is a great example of data you might want to stub. HubStor’s virtual cloud gateway includes an agentless stubbing approach that is controllable with policies. For example, you might create stubs out of PDF files that are larger than 25 MB and haven’t been accessed in over one year. When a user clicks the shortcut, they see that the data has been cloud archived and they have the option to instantly retrieve it or view it in the Web portal.
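The example policy above (PDF files larger than 25 MB that haven't been accessed in over one year) can be sketched as a simple rule check. This is an illustrative Python model, not HubStor's policy engine: the class names and fields are hypothetical, and 25 MB is treated as 25 x 1024 x 1024 bytes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    last_accessed: datetime


@dataclass
class StubRule:
    extension: str          # file type to target, e.g. ".pdf"
    min_size_bytes: int     # only files strictly larger than this qualify
    min_idle: timedelta     # only files idle strictly longer than this qualify

    def matches(self, f: FileInfo, now: datetime) -> bool:
        """True when the file meets every condition of the rule."""
        return (
            f.path.lower().endswith(self.extension)
            and f.size_bytes > self.min_size_bytes
            and now - f.last_accessed > self.min_idle
        )


# PDFs over 25 MB that have not been accessed in over one year.
PDF_RULE = StubRule(".pdf", 25 * 1024 * 1024, timedelta(days=365))

if __name__ == "__main__":
    now = datetime(2016, 9, 16)
    old_pdf = FileInfo("/share/home/report.pdf", 30 * 1024 * 1024, datetime(2015, 1, 1))
    print(PDF_RULE.matches(old_pdf, now))
```

A scheduled policy run would apply a check like this to each file and replace the matches with stubs.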
#7 -- SEARCH AS A SERVICE
Businesses with compliance or litigation activity need search for the discoverability of their data. Search is also a great feature inside of self-service user access.
Amazon CloudSearch is a search service in the AWS cloud. Both Amazon CloudSearch and HubStor’s search-as-a-service are fully managed for you in the cloud, including setup, maintenance, configuration, monitoring, and management.
But unlike HubStor’s search, Amazon CloudSearch requires you to roll up your sleeves to make meaningful use of it. For example:
1) You want users to have self-service searching of their data and they should only see their content in search results.
2) Discovery searches are needed and you must take actions on the search results.
Remember that the AWS Storage Gateway doesn’t synchronize folders and permissions when it sends data to the cloud, so you won’t natively be able to scope searches by folders, users, groups, or data owners if you use CloudSearch to index your data in Amazon S3 or Glacier.
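To see why permissions have to travel with the data, consider this minimal Python sketch of permission-scoped search. It is a toy in-memory model with hypothetical names, not the CloudSearch or HubStor API: each indexed item carries the users and groups allowed to read it, and results are trimmed to what the searcher may see.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedDoc:
    path: str
    text: str
    allowed: set[str] = field(default_factory=set)  # users or groups with read access


def scoped_search(docs: list[IndexedDoc], keyword: str, user: str, groups: set[str]) -> list[str]:
    """Return paths of documents that contain the keyword and that the user may read."""
    principals = {user, *groups}
    needle = keyword.lower()
    return [
        d.path
        for d in docs
        if needle in d.text.lower() and d.allowed & principals
    ]


if __name__ == "__main__":
    docs = [
        IndexedDoc("/finance/budget.txt", "Annual budget forecast", {"finance"}),
        IndexedDoc("/home/alice/notes.txt", "Budget meeting notes", {"alice"}),
        IndexedDoc("/hr/salaries.txt", "Salary budget by team", {"hr"}),
    ]
    print(scoped_search(docs, "budget", "alice", {"finance"}))
```

Without the `allowed` set on each item, there is nothing to filter on, which is exactly the gap you would have to close yourself.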
HubStor’s Approach: Customizable Scoped Indexing
HubStor provides an on-demand search service designed to mitigate search costs and accelerate discovery methods. It is fully integrated to support self-service user access and discovery scenarios. HubStor’s CoolSearch was also introduced this summer to support searchable cold storage configurations.
Because full content indexing can be a compute and storage intensive operation, HubStor uniquely enables its customers to specify rules that control the indexing scope within their archive.
For instance, an indexing policy might specify that only certain users’ data needs to be keyword searchable. Instead of keyword indexing the entire corpus of data in a large archive, HubStor lets you specify the targeted portions of your archive that need to be keyword searchable. At any time, you can modify this scope as needed.
HubStor’s search service is enterprise-friendly with the following features:
1) Access to HubStor’s search -- whether by end users or privileged discovery users -- is controlled through HubStor’s role-based access.
2) Discovery users can add search results to a case, place holds, and easily export results with auditing.
3) Each archive in HubStor can have a customized list of metadata fields that are pushed into the index.
4) By default, you can search on folders, item types, item names, last accessed, last modified, users, groups, and data owners.
5) Search can also be leveraged to inspect and tag content if it contains private/sensitive data such as credit card numbers.
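As an illustration of what detecting credit card numbers can involve, here is a short Python sketch that finds digit runs of plausible length and keeps those that pass the Luhn checksum. It shows a common general technique, not how HubStor inspects content; the pattern and function names are assumptions.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Check a string of digits against the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings in the text that look like card numbers and pass Luhn."""
    found = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test number, not a real card.
    print(find_card_numbers("Card 4111 1111 1111 1111 on file, ref 1234 5678 9012 3456"))
```

A checksum pass reduces false positives from order numbers and other long digit strings, though it cannot confirm that a number belongs to a real card.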
This is by no means a knock against Amazon’s solutions. I am just calling it like it is: Amazon Glacier is a platform cloud service that is just one piece of the enterprise archiving puzzle.
Infrastructure and Platform-as-a-Service (IaaS and PaaS) -- whether Amazon Web Services or Microsoft Azure -- are awesome cloud computing solutions, but true enterprise-ready cloud archiving requires a Software-as-a-Service (SaaS) solution higher in the stack to make everything work hassle-free the way you expect.