In 2015, Australia passed a new piece of legislation entitled the Telecommunications (Interception and Access) Amendment (Data Retention) Act 2015. Following the introduction of this act, service providers have obligations to retain various data associated with services provided to customers.
Despite the act having taken effect in October 2015, I still see a lot of confusion in the service provider and broader community about exactly what customer data should be, and is, retained by providers. Having been heavily involved at my work in both preparing our implementation plan and providing guidance to the service provider industry at large, I feel I'm somewhat cognizant of some of the common misunderstandings and hope I can shed some light on how providers should be interpreting their obligations.
Disclaimer: I am not a lawyer and this is not advice. If you think your organisation may have a metadata obligation - the best thing you can do is contact a lawyer who is familiar with the service provider industry to get expert advice. Likewise, this website and these words are not those of my employer, so please don't hold them accountable for any opinions herein.
Finally, a lot of this information (and much more) can be found in the document Data Retention - Frequently Asked Questions for Industry published by the Attorney-General's Department (AGD). The AGD copped a lot of flak from members of industry for not being able to clearly articulate how to interpret the legislation, however I've found this document (even in its first revision) very capable of doing so - if only people take the time and effort to read and understand it with a sense of calm.
First things first - this law is in effect now. If you have a metadata obligation you are assumed to be compliant from 13 October 2015 unless you lodged what is known as a Data Retention Implementation Plan (DRIP) with the Communications Access Coordinator (CAC) and received approval prior to that date.
Providers with approved implementation plans may have until 12 April 2017 (18 months later) to become compliant.
Quite simply, this act requires service providers to retain data about customers buying a relevant service, along with some metadata about the service itself.
This information is often requested by police investigating matters, but historically has not reliably been retained by service providers. By introducing this legislation, the government helps support law enforcement by ensuring important information must be retained.
There has been a lot of criticism around this legislation, from my reading this falls broadly into one of two areas:
What is being collected - Misinformation about the data being collected, or misunderstanding of what is being asked of providers by overzealous industry members; and
Who can access the data - Misinformation about who can request access to the data from a provider.
The legislation provides a very specific (and in my opinion reasonable) list of government agencies that can request access to this data. The Attorney-General can also declare additions to this list, however such appointments are public and thus have a level of oversight.
I have heard reports of agencies outside this list making requests for information from service providers and whilst it is unclear whether the providers have made data available when they shouldn't have, it is clear that there is still confusion about who can ask what. The CAC is supposedly able to provide clarification on such matters to those that find themselves in this situation.
Providers are also able to apply for exemption from their obligations, both as part of an implementation plan and on an ongoing basis. An application for exemption may be made to the CAC on the basis of one of the following:
Exemptions are required to be kept confidential by providers simply because public knowledge of these loopholes may provide a vector for bad actors to exploit.
The Australian government has made funding available to industry in the form of the Data Retention Industry Grants Programme to implement their metadata obligations.
Broadly speaking, applications for funding were open to providers who incurred costs preparing their DRIP; performed work to ensure compliance between 30 October 2014 and 13 October 2015; or had an implementation plan approved to become compliant.
Grants totalling up to $128.4 million were awarded to applicants in August 2016 and information on the recipients and the allocation methodology is available on the AGD website.
There has been some level of controversy within industry regarding the grants awarded - some commentators have questioned funding requests disproportionate to the size of the provider's operations.
It's worth bearing in mind however, that as with most grant programmes - recipients must agree to a funding agreement which includes reporting requirements on how the money is being spent.
Phew, okay - now we're ready to talk about what information is being collected...
Information is only required to be collected where the customer has a service for which a metadata obligation exists (evaluating this is covered in the next section).
Generally speaking, if the provider does not handle or generate any of the data covered here - they are not required to generate it or capture it solely for the purpose of data retention.
Any information that is covered and available must be retained by the service provider for no less than two years.
Providers are required to ensure that data retained is:
A provider must retain any customer contact (name, address, etc.) or billing details in their CRM, including historical data going back at least two years.
Where the service provider facilitates a communication, the relevant metadata must be collected where available for any communication held or attempted to be held:
Where the provider is able to positively identify the other party of a communication (i.e. the other party is also a customer of the provider), metadata about that customer must also be retained. It is my understanding that this applies even if that other party does not consume any services that would themselves be subject to metadata retention, though I have not sought any clarification on this.
It's really important to note here that:
Anything that falls outside of the items discussed above.
It's worth noting some specific things that aren't covered:
The last item here is one of the things overzealous operators have jumped on - however the AGD provides specific guidance around this.
Metadata retention obligations are determined on a per-service basis and a provider must consider each of the following criteria.
If the provider:service combination does not meet all the criteria below, there is no obligation.
Note: Again I would like to acknowledge the fantastic Industry FAQ from the AGD from which this is derived.
Are you one of the following:
These are very well defined terms under existing Telecommunications legislation and generally you will know if you are one of these.
It's worth noting that if you provide certain types of listed carriage services to a third party in return for a reward, you are considered a CSP.
Does the service carry, or enable the carriage of, a communication?
This doesn't include services required to carry out a communication (eg: DNS), just those that actually carry it.
Intent is important here: if the service isn't primarily concerned with carrying communications in normal operation, you don't need to anticipate off-the-wall scenarios (eg: iodine DNS tunnelling, etc).
Services offered and consumed within the same property boundary are exempted.
Metadata obligations do not extend to services offered to officers or employees of the provider.
Here are some pretty common scenarios that come up and how I would evaluate a metadata obligation for them.
There is an obligation unless the "immediate circle" exclusion applies.
Customer data would be readily available. IP address allocation and session start/stop times are acceptable metadata records. The service address would suffice as location information for this type of connection.
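To make that concrete, here's a sketch of the kind of session record a provider might retain for a fixed service. The field names are my own invention for illustration, not anything prescribed by the legislation:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Hypothetical shape of a retained session record for a fixed internet
# service -- field names are illustrative only.
@dataclass
class SessionRecord:
    customer_id: str        # links back to CRM contact/billing details
    service_id: str         # the relevant service this record relates to
    ip_address: str         # IP address allocated for the session
    session_start: datetime
    session_stop: datetime
    service_address: str    # suffices as location for a fixed connection

record = SessionRecord(
    customer_id="CUST-1234",
    service_id="SVC-0001",
    ip_address="203.0.113.7",
    session_start=datetime(2016, 3, 1, 8, 0, tzinfo=timezone.utc),
    session_stop=datetime(2016, 3, 1, 17, 30, tzinfo=timezone.utc),
    service_address="1 Example St, Perth WA",
)
print(asdict(record)["ip_address"])  # → 203.0.113.7
```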
I've added this example mostly to cover off voice services; whilst they remain a large part of the focus of this legislation, the industry is very mature and has a solid background in generating and retaining metadata here.
Generally speaking, there's an obligation here except for free services.
CRM records, telephone number allocation, and attempted inbound/outbound call logs are required here, including the physical location of the handset (fixed address or mobile cell) at call initiation and hangup.
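A quick sketch of what such a call record might contain. Again, the field names are assumptions of mine, not the legislated schema:

```python
from datetime import datetime, timezone

# Illustrative call detail record for a voice service.
call_record = {
    "caller": "+61255550100",
    "callee": "+61255550199",
    "direction": "outbound",
    "attempted": True,             # attempted calls count too
    "connected": False,            # an unanswered call is still retained
    "start": datetime(2016, 2, 1, 10, 0, 0, tzinfo=timezone.utc),
    "end": datetime(2016, 2, 1, 10, 0, 30, tzinfo=timezone.utc),
    # location at initiation and hangup: a fixed address, or for a
    # mobile the serving cell (identifier format is hypothetical)
    "handset_location_start": "cell:505-01-12345",
    "handset_location_end": "cell:505-01-12345",
}
print(call_record["direction"])  # → outbound
```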
If the operator of the WiFi is not a Carrier/ISP there is no obligation. This is because the service is offered for free so they would not be considered a CSP.
If the operator of the WiFi is a Carrier/ISP/CSP but the service is offered within a single property boundary there is no obligation. This is because the "same area" exclusion applies.
If the operator of the WiFi is a Carrier/ISP/CSP and the service is offered across multiple locations... get a lawyer. Strictly speaking you don't meet the "same area" exclusion - but some good lawyering might just change that.
Where an obligation exists, depending on the solution you may not have customer identifying information. MAC addresses suffice if your captive portal collects them - however not all solutions do.
If you are performing NAT you may be required to collect NAT mappings.
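The point of a NAT mapping record is that a public (IP, port) pair observed by law enforcement can be traced back to the private endpoint that held it at a given time. A minimal sketch, with a structure I've invented for illustration:

```python
from datetime import datetime, timezone

# Illustrative NAT mapping log: which private endpoint held which
# public (ip, port) pair, and over what interval.
nat_mappings = []

def record_mapping(private_ip, private_port, public_ip, public_port, start, stop):
    nat_mappings.append({
        "private": (private_ip, private_port),
        "public": (public_ip, public_port),
        "start": start,
        "stop": stop,
    })

def who_had(public_ip, public_port, when):
    """Resolve a public endpoint at a point in time back to the private one."""
    for m in nat_mappings:
        if m["public"] == (public_ip, public_port) and m["start"] <= when <= m["stop"]:
            return m["private"]
    return None

record_mapping("10.0.0.5", 51000, "203.0.113.1", 40001,
               datetime(2016, 5, 1, 9, 0, tzinfo=timezone.utc),
               datetime(2016, 5, 1, 9, 5, tzinfo=timezone.utc))

print(who_had("203.0.113.1", 40001,
              datetime(2016, 5, 1, 9, 2, tzinfo=timezone.utc)))  # → ('10.0.0.5', 51000)
```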
The hotel may be considered a CSP as they are selling internet access for reward. Fortunately for them, the "same area" exclusion may apply to these providers.
Unless they're a chain of hotels, in which case clever lawyering may be required.
Or if they outsource the operation of the WiFi, in which case the operator almost certainly has an obligation.
There is no obligation - this is because of the "immediate circle" exclusion as email accounts are offered to employees only.
If you outsource your IT externally, your IT provider shouldn't have an obligation either - unless email is a prescribed service for CSPs (I don't think it is, but I haven't looked) or they're considered a CSP/Carrier/ISP for other business activities.
If they were, your supply agreement may determine whether they have an obligation - if they're contracted to perform professional services for your staff email server - there is no obligation, however if they are providing the email as a contracted service then that might be a different story.
Even then, the "same area" restriction may apply if they manage a server on your premises.
There is no metadata obligation.
Whilst you operate this service, web browsing history is specifically excluded from the legislation - so web server logs are not in scope here.
Any over-the-top services operated by your customer (forums, etc.) are not your responsibility to retain data for.
However, in theory any outbound traffic generated by a customer may be in scope - in which case the data you retain may need to include the process owner (if you give customers a system account and run their apps as them), as this is a similar situation to NAT (shared resource, retain the mappings).
If I was a hosting provider, I'd be getting a lawyer to review this with me. I'd also be preparing an application for exemption on the basis that this data would not normally be generated as part of business as usual operation, even if you offer a dedicated IP for SSL/other reasons.
There is a metadata obligation here.
In the case of a VPS, a static IP allocation exists and the obligation is quite easy to meet.
I hope this helps address some of the landscape of metadata retention in Australia.
The general constraints are fairly easy to understand for engineering staff looking after these services, however your obligation does depend on how you offer the service from both a technical and commercial point of view.
My only advice is that you find (and retain) a competent technology lawyer and keep a level head!
Over the last year or so I've been looking quite a bit at Elasticsearch for use as a general purpose time series database for operational data.
Whilst there are definitely a lot of options in this space, most notably InfluxDB, I keep coming back to Elasticsearch when we're talking about large volumes of data where you're doing a lot of analytical workload.
More than a few times, I've been asked to explain what Elasticsearch looks like to the would-be developer/operations person. This isn't too surprising - the documentation isn't great at giving a real-world architectural overview, which can really help contextualise everything else.
Having been asked again today, I've decided to write one up here - so I can save some time explaining the same thing over and over.
It's important to note for the would-be reader that this is my imperfect understanding of Elasticsearch. If you spot any glaringly obvious errors please let me know and I will update this accordingly, and you'll have my eternal thanks for helping me grok this a little better.
So, as they say: the show must go on!
Elasticsearch is a search engine built on top of Apache Lucene. It's great for full text search (duh) as well as analytical workloads with ad hoc queries and aggregations.
In NoSQL parlance, it would be classified as a document-oriented database, and that's primarily how your application will interact with it.
You insert your DOCUMENT into an INDEX.
An INDEX is backed by one or more SHARDS
A SHARD can have zero or more REPLICA SHARDS
ELASTICSEARCH runs as a CLUSTER made up of NODES
A SHARD is a Lucene index
I won't cover setting up and maintaining quorum in the cluster, because that's pretty well covered elsewhere. If you're running on AWS there's even a managed product available which helps simplify things a lot.
For the keen observer, that last bullet point raises some interesting constraints when managing your cluster.
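The main constraint is that shard count is baked into document routing. Here's a toy illustration of the idea - Elasticsearch actually routes with a murmur3 hash of the routing key; I've used crc32 purely to keep the sketch dependency-free:

```python
import zlib

NUM_PRIMARY_SHARDS = 3  # fixed at index creation time

def shard_for(doc_id: str) -> int:
    # Elasticsearch routes a document to:
    #   hash(routing_key) % number_of_primary_shards
    # so every document with the same id always lands on the same shard.
    return zlib.crc32(doc_id.encode()) % NUM_PRIMARY_SHARDS

# Changing NUM_PRIMARY_SHARDS would re-home existing documents, which
# is why the primary shard count can't change after index creation.
assert shard_for("doc-42") == shard_for("doc-42")
```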
Elasticsearch manages the Lucene indexes, but remember they don't share the same memory space. Because your indexes have to be loaded fully into virtual memory, make sure you leave enough memory free for your index data (Lucene).
Don't allocate more than 32GB heap to Elasticsearch. Due to how Java addresses memory, this will slow things down heaps (see what I did there?).
Read this. No really.
In short, deploying Elasticsearch for your search application requires some careful planning on both the ingest and query side.
Primarily you want your cluster to have enough memory so your busy shards (read or write) stay resident in physical memory. Otherwise your nodes will spend all their time paging data in and out of disk, which defeats the point!
Read the designing for scale part of the Elasticsearch guide.
Your best bet is to split the data across indexes by some meaningful criterion, and time series data is a natural fit for this.
Elasticsearch has a feature called index templates, which are super useful for dynamically creating indexes with specific settings. Writes are directed to the correct index and you can automatically have the new index added to an index alias for reads.
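A sketch of that pattern: one index per day, with a template applied to anything matching a name pattern at creation time. The index names and settings here are illustrative, not a recommendation:

```python
from datetime import date

# An index template body (illustrative settings). Elasticsearch applies
# this automatically to any newly created index matching the pattern.
template = {
    "template": "metrics-*",
    "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1,
    },
}

def index_for(day: date) -> str:
    # Writes go to today's index; reads can go through an alias that
    # spans all of them (e.g. "metrics-all").
    return "metrics-{:%Y.%m.%d}".format(day)

print(index_for(date(2016, 9, 1)))  # → metrics-2016.09.01
```

The win is operational: old data can be dropped by deleting a whole index rather than deleting documents out of a shared one.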
Elasticsearch is a great tool, but requires you to plan ahead. Hopefully I've given you a good introduction to how things hang together and where the sharp edges are.
I highly recommend reading through the entire Elasticsearch: The Definitive Guide document; with the above information in mind ahead of time I think it makes for a much more cohesive read.
Good luck and happy hacking!
Thanks to @ZacharyTong for pointing out that Lucene does in fact support paging index segments (statement removed), and @warkolm for spotting a mistake regarding number of replicas and an opportunity to clarify!
After a suggestion by someone, I got it in my mind that a certain group could really do with an NNTP (Usenet) caching proxy. NNTP proxies and caches do already exist, but none of them support cache hierarchies - that is, trying to resolve articles from peer caches before talking to an upstream server.
The use case here is WACAN, a wireless network where each participant may want to offer their article cache for use by members on the network.
So after talking about it for a little while, I wrote one.
It's pretty terrible code actually (surprise!) - I've hand-written the parser and for now it supports the bare minimum subset of the protocol to support SABnzbd. NZBGet won't work, but could with some minor command support.
This ended up being useful to me for some of my other forever projects, so I've had a chance to look back at this recently and am considering revisiting it if I have time.
The new implementation will be a lot simpler, backing usenet requests onto stateless HTTP requests - leaving the implementation as a fairly flexible and pluggable exercise.
I've done a bit of testing with this already, specifically trying to use the nntp schemes over HTTP - though library support for this (I'm looking at you, net/http and libcurl) is pretty average.
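As a sketch of what that stateless mapping might look like - fetching an article by message-id becomes a plain, cacheable HTTP GET. The URL layout here is entirely hypothetical:

```python
# Map the NNTP "ARTICLE <message-id>" command onto a stateless HTTP GET.
# Because the response for a given message-id never changes, any plain
# HTTP cache in the path (peer or upstream) can serve it.
def article_url(base: str, message_id: str) -> str:
    # message-ids arrive wrapped in angle brackets: <abc@example>
    return "{}/article/{}".format(base, message_id.strip("<>"))

print(article_url("http://cache.wacan.example", "<abc123@news.example>"))
# → http://cache.wacan.example/article/abc123@news.example
```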
Watch this space, I guess.
Having been writing apps (poorly) with AngularJS for a while now, I was pretty excited to realise that combining this with Phonegap/Cordova - I could start writing portable mobile apps!
For the uninitiated, Cordova is an open source mobile app development platform. Basically, it bundles your app for your device and runs it in a fullscreen browser control backed by an embedded (lightweight) webserver. Nifty. Phonegap is the Adobe commercial version; Adobe kindly donated the core of it to the Apache Software Foundation (thanks Adobe!), and that core is the bit known as Cordova.
Now what project to learn with? Hmm... aha!
One of the problems with puush being a hobby project is that it doesn't get a lot of love. Specifically, their mobile app for iOS no longer works unless you still have an iPhone 3 (how quaint). I use the app /a lot/, and would love it to work on mobile without carrying an extra device around.
After vague promises of maybe adopting my code for their official app, I decided to see how hard it would be to replicate this on iOS.
As it turns out, not hard at all! After a week of development I had a fully functioning prototype with most (not all) of the features implemented. This ended up being a fantastic starter project for both size and combination of native features (camera, local storage, modal dialogs, etc).
Whilst the puush guys haven't adopted the code yet, there's nothing stopping anyone with an Apple developer account from building it and publishing to the App Store. Which I've decided to do, after I polish it up a bit and make it ready for prime time.
Probably the biggest change I need to make is converting it to a Grunt project so it bundles and minifies all the code locally instead of including it from cloud CDNs... Don't look at me like that - it was development code!
Anyway, check it out: puush-phonegap on GitHub
I had to split up my posts to keep my hiatus post from being an unreadable mess.
This post is a bit of a recap of some projects I've worked on or am working on. Another two posts follow this one, covering some larger projects.
I haven't participated in another Startup Weekend in Perth since my first one. I don't think I ever wrote it up in its full glory either, needless to say it was pretty intense and despite not having the experience I was after, I learnt a bunch and would recommend it to anyone who is interested in that kind of thing.
In preparation for the SWPerth7 event the organisers did a call out for mentors. This is something I'm interested in, but I honestly didn't expect to get accepted. Heh, suckers.
Mentoring was a great experience, and I genuinely hope I was able to help the teams with their planning and validation - my feedback seems to suggest so. I was a bit worried I'd commit a cardinal sin and be prescriptive about things ("You should do this or that"), but I was able to keep it to leading questions and answering specific advice questions - so yay for that.
In the future, I'd love to do this again and aim to get some of the pitch coaching mentor timeslots. I think I might have more to help on this side of things.
I managed to get most of the parts together to completely automate my still for the purposes of, uh... extracting essential oils and distilled water.
The hardest part so far is the temperature probe, which has to fit an annoyingly narrow 5mm stainless steel shroud. I was hoping to use a 1-Wire digital probe, however the smallest package for these is itself 5mm in diameter - leaving no room for the shroud.
I'll have to order some K-type sensors, and once they arrive and the shed is cleaned - I should have some updates on that particular project.
They're looking to move from DV capture to HDMI over USB and have been waiting on some custom hardware to get made, which looks like a good path forwards.
But in the meantime my work has given me some budget to put together a solution with off the shelf parts. I've happily spent the entire budget and have a nice pile of bits ready to go.
Whilst I'm still keen on the open source solution, it's going to have to wait a while for me to play with these new toys.
Expect a post on this soon.
I'm happy to be a founding member of this organisation. A bunch of guys involved in WAFreeNet incorporated an association to further the goal of building a community-operated wireless network across the region.
The incorporation (or rather, some of the people involved on either side of the should-we-incorporate fence) has caused some division in the community, but has also achieved some really great stuff. Specifically, the organisation has a relationship with WAIA which has helped secure a tenancy for a great core node at QV1 as well as access to some CDN traffic over the network.
I've perpetually been unable to participate in these networks - since 2003(?) I've been testing line of sight from each house I've lived in, with no luck. More recently I've had perfect LOS to QV1, but at the wrong angle for the sector panel - it looks like our new house should be able to manage an OK connection though.
My involvement in this group has primarily been as a member, I'm not really interested in committees any more - but I've organised a few public meetings (none of which I've been able to attend, heh).
I think I'll keep doing that for a while.