Wednesday, December 10, 2014

PostgreSQL's pgcrypto for bigdata security and privacy

Working with data from other companies's always a burden. We have to be completely safe with their data. Leaks and privacy concerns everyone of us, and them.

So, here I will write out what I did to save records from clients in a manner that not even DBA's or data analysts could watch nor leak any sensitive data. 

I'm using PostgreSQL 9.3, with pgcrypto extension enabled, and asymmetric cryptographic keys. 

Considering that one client does not want to lent us their ultra-mega-super sensitive data. So we advise them to cryptograph it using our public GPG key, and they did. So we have cryptographic data, and our procedures ensure that only the analyst who will be working with this client's data will have the chance to peek into their super-sensitive data. 

I also have set up a network topology where no one can leak client's data, by using subnets that are not connected to the internet, and only within this particular client's subnet the analyst can see their data. The only method for the analyst to leak sensitive information would be by taking pictures of his screen and paginate throughout 20GB of text, taking pictures to have them OCR'ed after, well, we can't be 100% completely secure all the time.

When the work is done and the final report generated, we need to deliver it to the client. And to make sure that DBAs or Front-end developers also cannot even take a peek at client's data. With simple plpgsql, triggers and pgcrypto, is possible:

# CREATE TABLE pubkeys (client character varying, pubkey character varying);

The pubkey should be the pubkey from client.

# CREATE TABLE crypt_reports(client character varying, report bytea);

The report will contain automatically cryptographed report for the client.

# CREATE EXTENSION pgcrypto;

Activate the cryptographic extension.

# CREATE OR REPLACE FUNCTION my_pub_crypt() RETURNS trigger LANGUAGE plpgsql AS $$
# DECLARE
# pubkey character varying;
# pubkey = (select pubkey from pubkeys where client = NEW.client);
# NEW.report = (select pgp_pub_encrypt_bytea(NEW.report,dearmor(pubkey));
# RETURN NEW;
# END;$$;

# CREATE TRIGGER insert_report BEFORE INSERT ON crypt_reports FOR EACH ROW EXECUTE PROCEDURE my_pub_crypt();

After procedure and trigger are set, then every time anything is inserted into report column it will be cryptographed using the client's pubkey. So, the following statement:

# INSERT INTO crypt_reports('client','The sensitive report');

Would produce the following:

# SELECT * FROM crypt_reports;

"client";"\301\301L\003X\224\343)\023\353\016\003\001\017\377g3)\221\241\177\035\272\220\011\017\331\015N\036\324\244\304 ,9v*j-n>\200\206\241\217\244H\004,\316\266A\335\235Tb\217y\016\213>\377\304\221o;\177\210(M'a\251\264d\014\3768\242\344*?\036aS\354\230\375\230u (...)"

To return the cryptographed file to the client, the front-end should create the file:

# COPY (select encode(report,'hex') from crypt_reports where client = 'client') TO '/tmp/report.out';

We still have to reverse the encoding:

bash:~# xxd -p -r /tmp/report.out > report.out.gpg

And then the client can decrypt his report:

bash:~# gpg -d report.out.gpg > report.out

# echo EOF