In a world where numbers and figures play a vital role in shaping businesses, cloud storage has become an invaluable tool. Whether you’re reviewing sales from H1 or you’re just trying to upload a video, you need somewhere for that data to reside.
A lot of people are using Cloud Storage, the unified object store from Google Cloud Platform (GCP), as a medium for this type of general storage. While storing an object in the cloud in itself is an easy task, making sure you have the soundest approach for the situation you are in requires a bit more forethought.
One side effect of having a scalable, seemingly limitless storage service is that you will accumulate some buckets and objects that you really can’t justify holding onto. These items incur a cost over time. Whether you need them for business purposes or are just holding onto them on the off chance that they might someday be useful, the first step is creating a practice for identifying how useful an object or bucket is to your business.
There are multiple factors to consider when you’re looking to optimise Cloud Storage cost. The trick here is to ensure that there are no performance impacts and that nothing is thrown away that may need to be retained for future purposes, whether that be compliance, legal or simply business value purposes.
With data emerging as a top business commodity, you’ll want to use appropriate storage classes in the near term as well as for longitudinal analysis. There is a multitude of storage classes to choose from, each with different costs, availability and minimum storage durations. There is rarely a one-size-fits-all approach to anything when it comes to cloud architecture. The natural starting point is to first understand “What costs me money?” when using Cloud Storage.
1. Retention
The first thing to consider when looking at a data type is its retention period. Asking yourself questions like “Why is this object valuable?” and “For how long will this be valuable?” is critical to determining the appropriate lifecycle policy. A lifecycle policy is set on a bucket and defines automatic rules that delete matching objects, or move them to a different storage class, once they meet conditions such as age. Think of it as your own personal butler who systematically keeps your attic organised and clean, except that instead of costing money, this butler saves you money.
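As a rough sketch of what such a rule can look like, the snippet below builds a lifecycle configuration in the JSON shape Cloud Storage uses for lifecycle rules, with a single rule that deletes objects once they reach a given age. The helper name `delete_after_days` and the 365-day figure are illustrative assumptions, not recommendations.

```python
import json


def delete_after_days(days: int) -> dict:
    """Build a lifecycle configuration with a single age-based Delete rule."""
    if days <= 0:
        raise ValueError("days must be a positive integer")
    return {
        "lifecycle": {
            "rule": [
                {
                    "action": {"type": "Delete"},
                    # "age" is counted in days since the object was created.
                    "condition": {"age": days},
                }
            ]
        }
    }


if __name__ == "__main__":
    # 365 days is a placeholder; use the retention period your data requires.
    print(json.dumps(delete_after_days(365), indent=2))
```

The printed JSON can be saved to a file and applied to a bucket with your usual tooling, for example `gsutil lifecycle set`.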
A great application is compliance and legal discovery. Depending on your industry and data type, laws may regulate which data must be retained and for how long. Using a Cloud Storage lifecycle policy, you can automatically mark an object for deletion once it has met the minimum retention threshold for legal compliance, ensuring you aren’t charged for retaining it longer than needed and you don’t have to remember which data expires when. To make this simpler, Cloud Storage has a Bucket Lock feature to minimise the risk of accidental deletion. If you are subject to FINRA, SEC or CFTC rules, this is a particularly useful feature. Bucket Lock may also help you address certain healthcare industry retention regulations.
Within Cloud Storage, you can also set policies that move objects to a different storage class. This is particularly useful for data that will be accessed relatively frequently for a short period of time but then won’t need frequent access in the long term. You might want to retain these particular objects for a longer period of time for legal or security purposes or even general long-term business value.
A great way to put this into practice is within a lab environment. Once you complete an experiment, you likely want to analyse the results quite a bit in the near term but in the long term won’t access that data very frequently. Having a policy that moves this data to the Nearline or Coldline storage class after a month is a great way to save on its long-term storage costs.
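The lab scenario could be expressed as a pair of `SetStorageClass` lifecycle rules. This is a minimal sketch: the helper names are hypothetical, it assumes objects start in the `STANDARD` class, and the second step (Coldline after 90 days) is an assumption added for illustration.

```python
import json
from typing import List


def transition_rule(target_class: str, age_days: int, from_classes: List[str]) -> dict:
    """One lifecycle rule that moves matching objects to a colder class."""
    return {
        "action": {"type": "SetStorageClass", "storageClass": target_class},
        "condition": {"age": age_days, "matchesStorageClass": from_classes},
    }


def lab_results_lifecycle() -> dict:
    """Hypothetical policy for experiment data: cool down in two steps."""
    return {
        "lifecycle": {
            "rule": [
                # After a month of active analysis, move to Nearline.
                transition_rule("NEARLINE", 30, ["STANDARD"]),
                # Assumed second step: move to Coldline after 90 days.
                transition_rule("COLDLINE", 90, ["NEARLINE"]),
            ]
        }
    }


if __name__ == "__main__":
    print(json.dumps(lab_results_lifecycle(), indent=2))
```

Restricting each rule with `matchesStorageClass` keeps a rule from acting on objects that are already in a colder class.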
2. Access pattern
The ability to transform objects into lower-cost storage classes is a powerful tool but one that must be used with caution. While long-term storage is cheaper to maintain for an object that is accessed at a lower frequency, there will be additional charges incurred if you suddenly need to frequently access the data or metadata that has been moved to a “colder” storage option. There are also cost implications when looking to remove that data from a particular storage class.
For instance, Nearline storage currently has a minimum storage duration of 30 days: an object that is deleted, replaced or moved before then is still billed as if it had been stored for the full 30 days. If you need to access that data more frequently, you can make a copy in a regional storage class instead to avoid increased access charges.
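To make the effect of a minimum storage duration concrete, here is a small sketch of how many days end up on the bill when an object leaves a class early. The function names are hypothetical. The 30-day default mirrors the Nearline figure above, and other classes would need their own value.

```python
def billable_days(days_stored: int, minimum_days: int = 30) -> int:
    """Days charged for an object, given a class's minimum storage duration."""
    if days_stored < 0 or minimum_days < 0:
        raise ValueError("durations must be non-negative")
    # Leaving early does not shorten the bill below the minimum duration.
    return max(days_stored, minimum_days)


def early_deletion_days(days_stored: int, minimum_days: int = 30) -> int:
    """Days charged beyond the time the object was actually stored."""
    return billable_days(days_stored, minimum_days) - days_stored


if __name__ == "__main__":
    for stored in (10, 30, 45):
        print(stored, billable_days(stored), early_deletion_days(stored))
```

Under these assumptions, an object removed after 10 days is billed for 30 days, 20 of which are early-deletion days. An object kept for 45 days is billed only for the time it was stored.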
When considering the opportunities for cost savings in the long term, you should also think about whether your data will need to be accessed in the long term and how frequently it will need to be accessed if it does become valuable again.
For example, if you are a CFO looking at a quarterly report on cloud expenses and only need to pull that information every three months, you might not need to worry about the increased charges accrued for the retrieval of that data because it will still be cheaper than maintaining the storage in a regional bucket year-round. Some retrieval costs on longer-term storage classes can be substantial and should be carefully reviewed when making storage class decisions.
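The quarterly-report reasoning can be checked with some simple arithmetic. The sketch below compares a year in a "hot" class against a year in a "cold" class that charges a per-GB retrieval fee. All prices are placeholder values chosen for illustration, not actual Cloud Storage prices, and the function names are hypothetical.

```python
def yearly_cost(size_gb: float, storage_price_gb_month: float,
                retrieval_price_gb: float = 0.0, full_reads_per_year: int = 0) -> float:
    """Yearly storage cost plus the cost of reading the whole dataset N times."""
    storage = size_gb * storage_price_gb_month * 12
    retrieval = size_gb * retrieval_price_gb * full_reads_per_year
    return storage + retrieval


def break_even_reads(hot_price: float, cold_price: float, retrieval_price: float) -> float:
    """Full reads per year at which the cold class stops being cheaper."""
    return (hot_price - cold_price) * 12 / retrieval_price


if __name__ == "__main__":
    # Placeholder prices in USD per GB; check the current price list before deciding.
    HOT, COLD, RETRIEVAL = 0.020, 0.010, 0.010
    size_gb = 1000
    print("hot :", yearly_cost(size_gb, HOT))
    print("cold:", yearly_cost(size_gb, COLD, RETRIEVAL, full_reads_per_year=4))
    print("break-even reads per year:", break_even_reads(HOT, COLD, RETRIEVAL))
```

With these placeholder numbers, four full reads a year still leave the colder class cheaper. The balance tips only once the data is read roughly monthly, which is why access frequency matters as much as the storage price.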
3. Performance
“Where is this data going to be accessed from?” is a major question to ask when you’re weighing performance and trying to establish the best storage class for your particular use case. Locality can directly influence how fast content is pushed to and retrieved from your selected storage location.
For instance, a “hot object” with global utilisation like your employee time-tracking application would fit well in a multi-regional location which enables an object to be stored in multiple locations. This can potentially bring the content closer to your end-users as well as enhance your overall availability.
Another example is a gaming application with a broad geo-distribution of users. A multi-regional location brings the content closer to each user for a better experience and ensures that the last saved file is distributed across several locations, so players don’t lose their hard-earned loot in the event of a regional outage.
One thing to keep in mind when considering this option is that storage in multi-regional locations allows for better performance and higher availability but comes at a premium and could increase network egress charges, depending on your application’s design. This is an important factor to consider during the application design phase.
Another option when you’re thinking about performance is buckets in regional locations, a good choice if your region is relatively close to your end-users. You can select a specific region that your data will reside in and get guaranteed redundancy within that region. This location type is typically a safe bet when you have a team working in a particular area and accessing a dataset with relatively high frequency. This is the most commonly used storage location type as it handles most workloads’ needs quite well. It’s fast to access, redundant within the region and affordable overall as an object store.
Overall, for something as simple-sounding as a bucket, there are actually vast amounts of possibility, all with varying degrees of cost and performance implications. As you can see, there are many ways to fine-tune your own company’s storage needs to help save some space and some cash in a well-thought-out, automated way.