Beaten Up by Big Data Analytics

Posted by K Krasnow Waterman on Thu, Jan 16, 2014 @ 10:01 AM

Tags: privacy technology, accountability, data mining, big data, b2b customer service technology, privacy, analytics, risk management, forensics

Big data analytics and I have had our first personal run-in.  Last year, I wrote and spoke about the risks of big data analytics errors and the impact on individuals' privacy and their lives.  Recently, I observed it happening in real time... to me!  One of Lexis's identity verification products confused me with someone else and the bank trying to verify me didn't believe I am I.  I was asked three questions, supposedly about myself; the system had confused me with someone else, so I didn't know the answers to any; and the bank concluded I was a fraud impersonating myself!

Here's how it played out.  I was on the phone with a bank, arranging to transfer the last of my mother's funds to her checking account at another bank.  The customer service representative had been very helpful in explaining what needed to be done and how.  I have power of attorney, so have the right to give instructions on my mom's behalf.  However, before executing my instructions, the representative wanted to verify that I am who I claimed to be.  

In the past, companies confirmed identities by asking me to state facts that were in their own records - facts I had provided, like address, date of birth, last four of Social Security Number.  Instead, the representative informed me, he would be verifying my identity by asking me some questions about myself from facts readily available on the internet.

The good: That's a pretty reasonable concept.  The typical personal identifiers have been used so much that it's getting progressively easier for other people - bad guys - to get access to them.  In a 2013 Pew survey, 50% of respondents said their birthdate was available online.  The newer concept is that there are so many random facts about a person online that an imposter couldn't search their way to the answers at the speed of a normal question-and-answer exchange.
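The newer concept can be sketched in a few lines of code.  This is purely my own illustration - the facts, field names, and two-of-three threshold are invented, not any vendor's actual scheme - but it shows both the idea and the failure mode:

```python
# Purely illustrative sketch of dynamic knowledge-based authentication (KBA).
# The facts, field names, and pass threshold are invented for illustration.

AGGREGATED_FACTS = {
    "prior_address": "7735 State Route 40",
    "known_associate": "Rebecca Grimes",
    "past_employer": "Acme Holdings LLC",
}

QUESTION_TEMPLATES = {
    "prior_address": "What does the address {v} mean to you?",
    "known_associate": "Who is {v} to you?",
    "past_employer": "Have you worked for {v}?",
}

def build_questions(facts):
    """Turn each aggregated fact into a challenge question."""
    return [QUESTION_TEMPLATES[k].format(v=v) for k, v in facts.items()]

def verify(facts, answers, threshold=2):
    """Pass if at least `threshold` answers match the aggregated record."""
    correct = sum(1 for k, v in facts.items() if answers.get(k) == v)
    return correct >= threshold

# If the file mixed in someone else's data, the genuine customer knows
# none of the answers and fails verification:
print(verify(AGGREGATED_FACTS, {"prior_address": "unknown"}))  # False
```

The failure mode in the story falls out immediately: the check is only as good as the aggregated record behind it.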

The bad: The implementation doesn't always match the concept.  The customer service representative asked me "What does the address 7735 State Route 40 mean to you?" Nothing. I later Googled to find out where this is; I don't even know the town. 

"Who is Rebecca Grimes to you?" To me, no one, I don't know anyone by that name.  "Which of the following three companies have you worked for?"  I had never worked for any of the three companies with very long names.  I explained that I lead a relatively public life, that he could Google me and see that I'd worked at IBM, JPMorgan, etc. That might have been my savior, because next he patched in someone from security to whom I could give my bona fides.  With my credentials in this arena, a google search, and answering the old fashioned questions, the security staffer told the customer service rep he was authorized to proceed.

The ugly: The rest of the population is not so lucky.  They can't all talk their way past customer service or play one-up with the Information Security staff.  And, big data has some pretty big problems.  In 2005, a small study (looking at 17 reports from data aggregators ChoicePoint and Acxiom and fewer than 300 data elements) found that more than two-thirds of the reports had at least one error in a biographical fact about the person.  In that same year, Adam Shostack, a well-regarded information risk professional, pointed out that ChoicePoint had defined away its error rate by only counting errors in the transmission between the collector and ChoicePoint, thus asserting an error rate of 0.0008%.

Fast forward: ChoicePoint is gone, acquired by LexisNexis in 2008.  My particular problem, the bank InfoSec guy told me, was coming from a Lexis identity service.  In 2012, LexisNexis claimed a 99.8% accuracy rate (a 0.2% error rate), but I was skeptical given the ways accuracy and error can be defined.
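To see how definitions drive the headline number, consider a toy example with invented figures: counting only transmission errors can yield a tiny 0.0008% rate even while a large share of files contain wrong facts about the person:

```python
# Illustrative arithmetic (hypothetical numbers): the same files can yield
# very different "error rates" depending on what counts as an error.

total_records = 1_000_000
transmission_errors = 8        # records garbled in transit from the collector
factual_errors = 250_000       # records with a wrong fact about the person

rate_transmission = transmission_errors / total_records
rate_factual = factual_errors / total_records

print(f"transmission error rate: {rate_transmission:.4%}")  # 0.0008%
print(f"factual error rate:      {rate_factual:.0%}")       # 25%
```

Both numbers are "true" of the same dataset; only the definition changed.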

The problem, though, is larger.  At the end of 2012, the Federal Trade Commission did a larger study (1,001 people, nearly 3,000 reports) of credit reporting, another form of data aggregation and one that typically feeds into the larger personal data aggregators.  That study found that 26% of participants - one in four people - found at least one "material" error, a mistake of fact that would affect their credit report or score.  The FTC did not count other factual errors, but this provides a sense of the scale of error still being seen today.


In the FTC study, approximately 20% of the participants sought a correction to their report and 80% of those got a report change in response.  About 10% of the overall participants saw a change in their credit score.  Appropriate to today's blog topic, a table in the FTC report shows that data vendors agreed with more than 50% of the complaints that they'd mixed in someone else's data.
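Translating those percentages into people (rough arithmetic from the figures cited above):

```python
# Rough arithmetic from the FTC study figures: 1,001 participants,
# percentages as reported in the text above.

participants = 1001
found_material_error = round(0.26 * participants)      # ~260 people
sought_correction = round(0.20 * participants)         # ~200 people
got_report_change = round(0.80 * sought_correction)    # ~160 people
score_changed = round(0.10 * participants)             # ~100 people

print(found_material_error, sought_correction, got_report_change, score_changed)
```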

The individual today has the choice between regularly chasing after big data analytics errors or suffering the consequences of mistaken beliefs about themselves.  Some very prominent folks in the privacy policy sphere have told me this isn't a privacy issue.  I think they're wrong.  The Fair Information Practices, which have been in use since the 1970s and form the basis for much of the privacy law and policy around the world, include the requirement that entities handling personal information ensure that it is accurate.  How much sense would it make if you had a privacy right to keep people from using accurate data in harmful ways, but no privacy right to keep them from using inaccurate data in the same harmful ways?




The Cross-Border eDiscovery Challenge & The Possible Accountable Systems Solution

Posted by K Krasnow Waterman on Thu, Jun 18, 2009 @ 1:06 PM

Tags: access control, technology implementing law, privacy technology, technology for lawyers, accountability, knowledge discovery for litigation, information management, data protection, digital evidence, technology for business managers, global outsourcing, information security, digital rights, privacy, eDiscovery, forensics

Cross-border eDiscovery is a hot topic this year. The decreased cost of storage has resulted in nearly everyone retaining massively greater quantities of information. Email and the Web have driven a shift in data to less formal, less structured records and files. And, globalization of business has caused the relevant information for an increasing number of lawsuits to be spread among multiple countries. Courts have instituted new rules for how parties will engage in discovery related to this digital evidence. And, these new rules are putting some lawyers in the cross-hairs of other governmental digital control activities. Lawyers, by and large, are not technologists, and the challenges arising from handling this mass of distributed data are proving daunting. Technology vendors are offering significant assistance but still more is required.

Discovery, at its simplest, is the concept that one party to a lawsuit can learn what the opposing party knows that is relevant to the resolution of the case. In the US, this had long been accomplished through gamesmanship and strategy (think, hide-and-seek meets go-fish) while, for example, the UK had moved on to affirmative disclosure, the idea that each side needs to identify the truly relevant and provide it. In either case, the parties have needed to decide what data to preserve and how to search it. For a variety of reasons, corporations are adding and deleting data all the time -- doing things like updating client or supplier addresses, changing prices, adding sales, marking deliveries. So, typically, one needs to select a moment in time that's relevant to the issues in a lawsuit and look at all data from that time or up until that time. This is no easy task, as the challenges of selecting the moment, deciding how to save the data, and which tools will provide the best search result are all subject to debate.
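The "moment in time" idea can be sketched as a query over a change log rather than a frozen copy of a live system.  This is a toy illustration with an invented schema, not any eDiscovery product's actual approach:

```python
from datetime import date

# Toy sketch: reconstructing a record set "as of" a litigation-relevant
# moment from an append-only change log. Record IDs and fields are invented.

change_log = [
    ("client-17", date(2008, 3, 1), {"address": "12 Elm St"}),
    ("client-17", date(2008, 9, 15), {"address": "40 Oak Ave"}),
    ("client-22", date(2008, 5, 2), {"address": "9 Pine Rd"}),
]

def as_of(log, cutoff):
    """Latest version of each record on or before the cutoff date."""
    state = {}
    for record_id, changed, fields in sorted(log, key=lambda r: r[1]):
        if changed <= cutoff:
            state[record_id] = fields
    return state

snapshot = as_of(change_log, date(2008, 6, 30))
print(snapshot)  # client-17 still at "12 Elm St"; client-22 at "9 Pine Rd"
```

The hard parts in practice - picking the cutoff, preserving the log itself, and searching the result - remain exactly the debates described above.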

Handling a case that involves data in multiple countries compounds the challenge. The EU has had detailed and tightly controlling rules about the handling of information about people by commercial entities for nearly thirty years. By comparison, the US historically has had a comparatively limited concern about the privacy of people whose identities appear in commercial files. For example, in many cases EU rules prohibit making the sort of "moment in time" copy of entire systems described in the last paragraph and have rules that as a practical matter prohibit sending data about people out of the country. Recently, these rules have come into head-on conflict with courts in the US requiring that certain information be turned over in discovery. The decision not to violate the EU rules has resulted in some significant financial penalties being imposed by US judges, while the decision to violate the EU rules and provide the data in the US has resulted in some equally significant financial penalties being imposed by European judges, leaving litigators between a rock and a hard place.

Much discussion is ongoing about ways to resolve this problem. For example, governmental, public policy, and commercial bodies are discussing possible changes to their rules. New forms of insurance may be offered to indemnify parties caught in the current situation. At the same time, there is a quiet march forward of new technologies which may resolve some of the issues. For example, systems that track each data transaction at a very granular level and account for their compliance with rules, called "accountable systems", are in development. Such systems would make it possible to understand the data in the system at a particular moment in time without requiring a "copy" to be made. And, they would be able to recognize competing data rules and apply the correct ones, wherever the resolution of a rules conflict is possible.  In theory, this technology might also make it possible to transfer the substantive portions of the information without the personal information, so that the parties could define very small subsets that are relevant and actually required to be disclosed, thus limiting the release of personal information to subsets so small that requirements, like notice to the individuals in the data, could reasonably be met.
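As a rough sketch of what "accountable" means here (my own toy illustration, not an actual accountable-systems implementation): every data access is evaluated against the applicable rule, and both the request and the decision are logged, so compliance can be audited transaction by transaction:

```python
from datetime import datetime, timezone

# Toy sketch of an "accountable system": each access request is checked
# against a jurisdiction's rule and the decision is recorded in an audit
# log. The rules below are crude placeholders, not real legal logic.

RULES = {
    "EU": lambda request: request["destination"] == "EU",  # no export of EU personal data
    "US": lambda request: True,                            # permissive placeholder
}

audit_log = []

def access(request):
    """Evaluate a request and record the outcome for later audit."""
    rule_id = request["data_jurisdiction"]
    allowed = RULES[rule_id](request)
    audit_log.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "request": request,
        "rule": rule_id,
        "allowed": allowed,
    })
    return allowed

# A conflicting request is refused and the refusal itself is documented:
print(access({"data_jurisdiction": "EU", "destination": "US"}))  # False
```

The point of the log is that the rule conflict is surfaced and provable after the fact, rather than silently violated.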

While this new type of technology offers promise for resolving some of the cross-border eDiscovery challenges without requiring any jurisdiction to change its rules, it has drawn relatively little attention in this context to date.  Perhaps this is because the technology needs to be refined and then implemented in the day-to-day digital business practices of organizations before it can be capitalized upon to address this issue.  How long it will be before this occurs will be driven by how quickly people recognize the problems this technology can solve.


Legal Standards in a Technologically Bifurcated World

Posted by K Krasnow Waterman on Thu, Jan 29, 2009 @ 10:01 AM

Tags: access control, identity management, technology implementing law, privacy technology, technology for business managers, law about technology, public policy, technology b2b customer service, information security

It's not news that our society is divided into technological haves and have-nots.  Much has been written about the advantages lost or gained - education, professional, and social - based upon the primacy and recency of one's technology.  Recently, I've become increasingly attuned to another place where technological caste matters -- legal standards. 

It's been clear to me for quite some time that the lawyer who resonates with technology can do more successful and faster legal research; propound vastly superior discovery requests; and produce substantially more incisive disclosures.  It's now becoming increasingly clear to me that the law itself is being skewed by those of us who live to keep up with the next big thing in technology.  Debates among lawyers rage in my email inbox about the differences in things like encryption technologies and metadata standards, with lots of cool techie references to things like ISO, NIST, Diffie, OASIS, and XACML.  

In the meantime, I was on the Social Security Administration website the other day and they wanted me to use an eight-character alphanumeric password (case insensitive, no special characters) to upload W2 and other sensitive tax information.  My bank's brokerage affiliate is using the same outdated and readily hackable password technology.  I still see commercial and bar association websites seeking personal and financial information without indicating that they're using SSL or some other baseline method of securing the information.  I still get requests from security professionals to email my Social Security Number.  If you're not particularly technical, trust me, none of these are good things.
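For the non-technical reader, the weakness is simple arithmetic.  Here is a sketch comparing the keyspace of an eight-character, case-insensitive alphanumeric policy against a longer, richer one (the "strong" policy is just an illustrative alternative, not a specific recommendation):

```python
import math

# Keyspace arithmetic: how many passwords an attacker must try at worst.

weak = 36 ** 8     # 26 letters + 10 digits, case folded, 8 characters
strong = 72 ** 12  # upper + lower + digits + ~10 specials, 12 characters

print(f"weak keyspace:   {weak:.2e} (~{math.log2(weak):.0f} bits)")
print(f"strong keyspace: {strong:.2e} (~{math.log2(strong):.0f} bits)")
```

Roughly 41 bits versus 74 bits: each added bit doubles the attacker's work, so the gap is astronomically large, not incremental.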

The distance between these two realities has got me thinking about all the places where these two technological castes will be competing to set legal standards.  For example, does a "time is of the essence" clause reflect the perception of time of a BlackBerry owner, or that of a person without a laptop?

As the new administration provides the first coordinated national focus on technology, I'd like to add this to the list.  Perhaps the new national CTO (yet to be appointed) could work with the American Bar Association and other leaders to identify a rational strategy for standards setting in such a technologically bifurcated society.





Privacy on the Web - Part I

Posted by K Krasnow Waterman on Thu, Nov 22, 2007 @ 10:11 AM

Tags: privacy technology, technology for lawyers, technology for business managers, technology, privacy

A friend just sent me a blog post that is a bit of a rant about some comments on privacy, or the lack thereof. It provides a good basis to discuss some concepts and misconceptions about privacy and technology.

What does privacy mean?

Donald Kerr, a Deputy Director of National Intelligence, said that our culture equates privacy and anonymity. Like the blog author, James Harper -- of the Cato Institute and other esteemed institutions -- I disagree that the terms are equivalent in the eyes of the general public. Webster's dictionary describes being anonymous as being unknown or not identified, while defining privacy as keeping oneself apart or free from intrusion. In our culture, volition appears to be a key differentiator. When I close the blinds, I'm choosing privacy. When no one notices me in a crowd, I'm anonymous.

Is it unrealistic to expect privacy?

Kerr asserts that privacy doesn't exist and cites the availability of personal information through MySpace, FaceBook and Google. From a volition standpoint, Kerr's statement is a mixed metaphor. MySpace and FaceBook are entirely voluntary, people deciding to post things about themselves for their friends or the world to see. Google, making great strides at "organizing the world's information", aggregates personal information that may not have been intended or expected to be shared. I recently showed a friend that in five minutes on Google I could find more than his professional profile -- I produced his home address, his parents, his religion, his political leanings, and something about his finances. This undercuts Harper's contrary assertion that people have retained the ability to provide their identifiers to some "without giving up this information to the world".

Can individuals control privacy?

Kerr and Harper are talking about when, whether, and how the federal government should have access to individual information, but the question extends further. Anyone signing up for access to a newspaper or making a purchase on the web is giving bits of himself away. Most typically, the information is gathered in "cookies", established by the websites and stored on the individual's computer. This summer, one study concluded that 85% of users were aware of cookies, but only about 28% were able to successfully delete them.

The public's misunderstanding about their control over personal information in cookies extends past their technical inabilities. The misunderstanding is exacerbated by a little legal wordplay. Nearly every "privacy statement" I've ever read on an e-commerce website says that the information may be shared with "affiliates" but then doesn't define that term. Each of these companies could call anyone, any company, or any government agency an "affiliate" and give them access to cookies or sell them the information in the cookies.


[Stay tuned for Part II, where I'll talk about what business leaders and system designers can do to offer more privacy and still meet their business goals.]





Comments to DHS Data Privacy and Integrity Advisory Committee

Posted by Dharmesh Shah on Sat, Nov 05, 2005 @ 06:11 AM

Tags: privacy technology

Public Statement I gave to the DHS Privacy and Integrity Advisory Committee on behalf of the DHS Information Sharing and Collaboration Office, June 15, 2005, Harvard Law School.  (The official transcript, with follow-on questions and answers, is posted on the DHS website.)

On behalf of myself and the DHS Information Sharing and Collaboration Office, I thank you for the invitation to speak here today. The Information Sharing and Collaboration Office, commonly known as “ISCO”, is working on a number of projects that we believe will have a direct impact on preserving privacy -- while at the same time improving information sharing.


In the late summer and fall of last year, ISCO served as the DHS lead in the drafting of a multi-agency plan for a broad-ranging terrorism Information Sharing Environment. That plan was required by Executive Order 13356 (issued last August) and is now a part of the work under Section 1016 of the Intelligence Reform and Terrorism Prevention Act (passed in December). In its work on the Information Sharing Environment, ISCO acted as the conduit between the components of DHS and the other federal agencies with anti-terrorism missions.


In that role, ISCO received a clear message from the DHS Privacy Office: “Privacy should not be addressed as an afterthought. It should be an integral part of any information sharing plan.” I am pleased to say that the draft plan submitted to the White House carries that message forward.

Within DHS, ISCO also has a broad information sharing policy and implementation role. As you know, effective screening and credentialing require the sharing of information about persons. So, while the Terrorism Information Sharing Environment moves forward on a multi-agency basis, ISCO is also working on near-term and mid-term tactical steps to ensure that privacy will be at the forefront of policy and process development as DHS develops information sharing activities.

One of ISCO’s duties is to assess the current state of information sharing within DHS. The Privacy Act requires federal agencies to publish System of Records Notices, called SORNs, to publicly describe the sources, collection, and manipulation of “person” data in each system; and to publish Routine Use Notices to describe the parties with whom the data is going to be shared, and under what circumstances. One of ISCO’s projects was to gather all of the SORN and Routine Use Notices for DHS systems and parse their published information sharing rules in spreadsheet format. To our knowledge, this is the first such compilation.

Now that this information is compiled, and in a spreadsheet, the information from the SORN and Routine Use notices can be cross-matched with the information from a department-wide electronic survey ISCO conducted to understand information flow in DHS. By comparing the two, DHS can supplement and harmonize the knowledge about systems that contain person information.

As we learn more about information sharing in DHS and with our stakeholders, and in particular while doing this project, we note that the terms for describing Routine Uses – the terms and phrases used for the “who, what, and when” of privacy sharing – are not consistent, either internally to DHS or externally around the federal government. ISCO and the DHS Privacy Office have begun discussions about establishing a project to either harmonize routine use terms or to build equivalency tables for the terms. That work will take copious amounts of time and effort; if we begin now, we may have the results that are needed when the time comes to computerize any of these processes.

ISCO’s responsibilities include making proposals for “what should be” and how to move DHS there. In part, we derive our ideas from the knowledge we glean about the current (“as-is”) state of information sharing. For example, we know that agencies or components enter into agreements for information sharing, setting forth the mechanics and rules for sharing information. ISCO conducted a brief study and confirmed that there was no standardized methodology for entering into such information sharing agreements with other agencies. Based on its assessment of what appear to be best practices, and as a part of its duties to establish policies and procedures, ISCO has now established a methodology, a facilitation team, and a prototype system for building information sharing agreements.

The methodology includes the requirements that a Privacy Office representative be contacted and that certain privacy-related questions be answered as part of the creation of each new information sharing agreement. This provides a near-term improvement to the goal of integrating privacy concerns into information sharing.

The facilitation team has been approached to help components that have received many requests for the same information. ISCO, then, can facilitate an understanding of the broader scope of sharing that may be under review; with that broader view, ISCO can ensure that the Privacy Office is given the opportunity to address not only the implications of an individual agreement, but also the implications of the aggregate of the agreements. This provides both near and mid-term improvements to privacy.

A prototype information system that has just been developed collects whole information sharing agreements and ultimately will permit authorized individuals to draft and edit the specific provisions of the agreement over which they have authority. Over time, such individuals also will have the ability to select from the language of earlier agreements. As part of that process, every agreement will have to address privacy requirements, and only a person authorized by the Privacy Office will be able to create those provisions. This will provide mid-term improvement to integrating privacy concerns into information sharing.

We are, perhaps, most proud of the work that ISCO is doing to ensure that privacy needs are integrated into the long-term information sharing efforts. We have been a regular participant in internal discussions with the DHS CIO’s Office, and external discussions with the Information Sharing Council and the Information Sharing Environment Program Manager.

An interactive Information Sharing Environment must have log-on identity management functions that will act as the key to unlock the access and security controls each information provider in the environment will place on their data. At ISCO, we have begun to focus on the source, nature, and scope of the rules that will need to be in place to protect the data and the concomitant user identity information, such as roles and credentials, that will be needed to apply the access rules. We are also co-sponsoring with the DHS CIO Office the acquisition of a Departmental identity management system that will use the roles and rules to control and audit access to DHS information resources. DHS recently issued a "request for information" to seek public input on the requirements for this system.

In that vein, as we work towards an interactive environment, ISCO is evaluating whether a live prototype can be built with the SORNs and Routine Use notices as a gatekeeper for access – a prototype that would match information about the requestor with privacy access rules associated with the requested data. We are currently evaluating whether there is sufficient detail in the parsed SORNs and Routine Use notices we have already produced, or whether we need to also parse Privacy Impact Assessments to get the fine-grained detail that will be needed to reduce these requirements to the formal logic – the 1’s and 0’s of a computer system – required to automate the thousands or millions of access decisions made daily to conduct anti-terrorism and other operations.
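In grossly simplified form, such a gatekeeper prototype amounts to matching requestor attributes against parsed routine-use rules. The dataset name, agencies, and purposes below are invented for illustration; this is not ISCO's actual design:

```python
# Hypothetical sketch of the access-rule prototype described above:
# routine-use notices parsed into machine-checkable conditions, matched
# against attributes of the requestor. All rule contents are invented.

PARSED_ROUTINE_USES = {
    "screening_db": [
        {"agency": "DHS", "purpose": "counterterrorism"},
        {"agency": "DOJ", "purpose": "law_enforcement"},
    ],
}

def access_allowed(dataset, requestor):
    """Allow access only if a published routine use covers this requestor."""
    return any(
        requestor.get("agency") == use["agency"]
        and requestor.get("purpose") == use["purpose"]
        for use in PARSED_ROUTINE_USES.get(dataset, [])
    )

print(access_allowed("screening_db",
                     {"agency": "DHS", "purpose": "counterterrorism"}))  # True
print(access_allowed("screening_db",
                     {"agency": "IRS", "purpose": "tax"}))               # False
```

The open question in the text - whether the parsed SORNs carry enough detail - is exactly the question of how fine-grained these condition fields can be made.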

ISCO has proposed this activity because the Privacy Act appears to provide some of the most complex and diverse rules inside a single rule set and, therefore, a prototype of privacy access could provide great insight into the requirements for all the other rule sets that will need to be added.

ISCO is working collaboratively with the Privacy Office to provide privacy rules that can be used as early use cases for the builders of this technology. ISCO and the Privacy Office have worked together to deconstruct the Privacy Act into a comprehensive flow diagram, detailing each decision that will need to be implemented at the systems level. There is a draft companion document that lays out which rules will be consistent for all government agencies, which rules have exceptions for some agencies, which rules are subject to legal interpretation and will have variants between agencies, and which rules – the routine use rules – are unique to each dataset.

A first draft of this material has been presented to DHS’ Metadata Center of Excellence and to the Federal Enterprise Architecture Data Reference Model working group -- and we have received an enthusiastic response. If we succeed in having this information be a use case for each of these activities, we will have succeeded in placing Privacy Act implementation into the earliest stages of future system development. That would be a significant long-term success.

ISCO works on many information sharing policies, processes, and projects. The scope of ISCO's work spans the many diverse interests and responsibilities of the Department. Our broad view of the work that is underway and the work that needs to be done allows us to integrate the needs and requirements of the components of DHS into a cohesive plan. As you have heard today, the DHS Information Sharing and Collaboration Office is proud and pleased to be able to work with the Privacy Office to ensure that privacy interests are set into the foundation of government information sharing.
