Risks in Big Data Predictions
Tags: technology for lawyers, accountability, data mining, big data, technology for business managers, public policy, privacy, knowledge discovery, analytics, risk management
Disrupt Law!! Spark-athon (InternetWeekNY) - UPDATE
Tags: technology implementing law, technology for lawyers, Internet Week, legaltech, entrepreneur, startup, IWNY
The Disrupt Law!! Spark-athon is sold out! I knew it would be exciting to put on an InternetWeekNY event. We're bringing together Matt Hall (Founder, Docracy), Tom Chernaik (CEO, CMP.LY), and Steven Cherry (Journalist, @TechWisePodcast) to inspire 25 lawyers and 25 hackers brainstorming new legaltech projects and ventures. We added in a happy hour - the space and the beer donated by WeWork Labs in Soho - and prizes - Grand Prize donated by Dreamhost.
But, I had no idea how exciting it would be. We've got a waiting list! We've received tremendous support from the New York Legal Hackers and the nyhacker meetups. Jonathan Askin, a noted tech law professor, is going to participate. Extra thanks to him for pitching in getting the word out and adding volunteers. And, Josh Kubicki, author of the TechCocktail blog I've been quoting, has come in from Cincinnati to participate!
Disrupt Law!! Spark-athon (InternetWeekNY)
Tags: technology implementing law, technology for lawyers, Internet Week, legaltech, entrepreneur, startup, IWNY
I'm a fan of InternetWeekNY - now 45,000+ New Yorkers mingling to teach, pitch, and network all manner of things web. So, this year, I'm the proud sponsor of an event. On Thursday, May 23, from 4pm to 6pm, I'll be hosting DISRUPT LAW!! SPARK-ATHON in Soho.
The event will include speed-networking and collaborative brainstorming among 25 innovation-oriented lawyers and 25 venture-seeking hackers/developers. The goal is to spark new ventures in disruptive legal technology. For those not in the startup scene, that's "disruptive" as in "ground-breaking innovation," NOT as in "breaking someone else's technology."
Motivating descriptions of successful ventures will be provided by Matt Hall, co-founder of Docracy; Tom Chernaik, CEO of CMP.LY; and one more surprise. Docracy was the winner of the TechCrunch Disrupt NY Hackathon in 2011; it offers an open collection of legal contracts and a mechanism to negotiate and sign documents online. CMP.LY provides a full and creative suite of tools for compliance and risk management for social media. And, of course, there'll be a little something to eat and drink.
I haven't been this excited since I created the LinkedData Lab, which launched new careers and companies. Can't wait to see what Disrupt Law!! brings!
Follow this event on twitter - #DisruptLawIWNY
The Cross-Border eDiscovery Challenge & The Possible Accountable Systems Solution
Tags: access control, technology implementing law, privacy technology, technology for lawyers, accountability, knowledge discovery for litigation, information management, data protection, digital evidence, technology for business managers, global outsourcing, information security, digital rights, privacy, eDiscovery, forensics
Discovery, at its simplest, is the concept that one party to a lawsuit can learn what the opposing party knows that is relevant to the resolution of the case. In the US, this had long been accomplished through gamesmanship and strategy (think, hide-and-seek meets go-fish) while, for example, the UK had moved on to affirmative disclosure, the idea that each side needs to identify the truly relevant and provide it. In either case, the parties have needed to decide what data to preserve and how to search it. For a variety of reasons, corporations are adding and deleting data all the time -- doing things like updating client or supplier addresses, changing prices, adding sales, marking deliveries. So, typically, one needs to select a moment in time that's relevant to the issues in a lawsuit and look at all data from that time or up until that time. This is no easy task, as the challenges of selecting the moment, deciding how to save the data, and which tools will provide the best search result are all subject to debate.
Handling a case that involves data in multiple countries compounds the challenge. The EU has had detailed and tightly controlling rules about the handling of information about people by commercial entities for nearly thirty years. By comparison, the US historically has had a comparatively limited concern about the privacy of people whose identities appear in commercial files. For example, in many cases EU rules prohibit making the sort of "moment in time" copy of entire systems described in the last paragraph and have rules that as a practical matter prohibit sending data about people out of the country. Recently, these rules have come into head-on conflict with courts in the US requiring that certain information be turned over in discovery. The decision not to violate the EU rules has resulted in some significant financial penalties being imposed by US judges, while the decision to violate the EU rules and provide the data in the US has resulted in some equally significant financial penalties being imposed by European judges, leaving litigators between a rock and a hard place.
Much discussion is ongoing about ways to resolve this problem. For example, governmental, public policy, and commercial bodies are discussing possible changes to their rules. New forms of insurance may be offered to indemnify parties caught in the current situation. At the same time, there is a quiet march forward of new technologies which may resolve some of the issues. For example, systems that track each data transaction at a very granular level and account for their compliance with rules, called "accountable systems", are in development. Such systems would make it possible to understand the data in the system at a particular moment in time without requiring a "copy" to be made. And, they would be able to recognize competing data rules and apply the correct ones, wherever the resolution of a rules conflict is possible. In theory, this technology might also make it possible to transfer the substantive portions of the information without the personal information, so that the parties could define very small subsets that are relevant and actually required to be disclosed, thus limiting the release of personal information to subsets so small that requirements, like notice to the individuals in the data, could reasonably be met.
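To make the "moment in time without a copy" idea concrete, here is a minimal sketch in Python. It assumes, hypothetically, that an accountable system keeps an append-only log of every change. The `Transaction` record, its field names, and the `state_as_of` function are illustrative inventions, not the design of any actual product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Transaction:
    """One logged change to one field of one record (hypothetical format)."""
    timestamp: datetime
    record_id: str
    field: str
    value: Optional[str]    # None means the field was deleted
    personal: bool = False  # True if the field identifies a person


def state_as_of(log, moment, include_personal=True):
    """Replay the log up to `moment` and return the data as it stood then."""
    state = {}
    for tx in sorted(log, key=lambda t: t.timestamp):
        if tx.timestamp > moment:
            break  # everything after this point happened too late to count
        if tx.personal and not include_personal:
            continue  # leave personal fields out of the reconstructed view
        record = state.setdefault(tx.record_id, {})
        if tx.value is None:
            record.pop(tx.field, None)
        else:
            record[tx.field] = tx.value
    return state


# Made-up sample data: a price change and a customer name on one order.
log = [
    Transaction(datetime(2008, 1, 5), "order-17", "price", "100.00"),
    Transaction(datetime(2008, 1, 5), "order-17", "customer_name", "A. Example", personal=True),
    Transaction(datetime(2008, 3, 1), "order-17", "price", "120.00"),
]

# The data as of February 1, with personal fields withheld.
snapshot = state_as_of(log, datetime(2008, 2, 1), include_personal=False)
```

Here `snapshot` holds only the January price for the order, and the customer's name never leaves the system. A real accountable system would also have to record which rules governed each transaction, which this sketch does not attempt.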
While this new type of technology offers promise for resolving some of the cross-border eDiscovery challenges without requiring any jurisdiction to change its rules, it has drawn relatively little attention in this context to date. Perhaps this is because the technology needs to be refined and then implemented in the day-to-day digital business practices of organizations before it can be capitalized upon to address this issue. How long it will be before this occurs will be driven by how quickly people recognize the problems this technology can solve.
Campaign Hacking a Reminder for Email Security
Tags: access control, technology for lawyers, data protection, technology for business managers, information security, technology management, forensics, cyber-security
Computer hacks were the topic of tech news on the day after Senator Obama's historic election. On Wednesday, Newsweek reported that the Obama and McCain campaigns were the subject of computer hacks during the campaign. The Obama campaign reported a possible email phishing attack this past summer. They were ultimately told by federal authorities that both the Obama and McCain campaign computers had been compromised. Reports are circulating that the attacks came from a "foreign entity" and lifted significant amounts of data from both campaigns.
Also on Wednesday, malware creators took advantage of the tremendous interest in the election and began sending emails with "Obama" somewhere in the subject line. The most common subject lines promised video of a speech, additional election coverage, or new interviews. One security company alone reported that it had filtered more than 10 million emails in less than 6 hours on Wednesday morning. Apparently, hundreds of thousands of people sought to open them and were instead infecting their computers with malware.
These two events highlight the importance of email security. This is the first major election heavily conducted, financed, covered, and influenced on the web. It reflects the transition to technology for ever-increasing numbers of the population. And, it reflects our ready acceptance of the transition.
Too many people assume that their spam filter, anti-virus software, etc. will protect them. Yet, any technology professional will tell you that firewalls and software alone are not enough to protect a computer from data theft or destruction. They'll also tell you that emails are the easiest means of attacking computers because people still act before they think. A huge percentage of hacks rely on "social engineering" - convincing a person to do something that works to the hacker's benefit.
Education is still a significant tool in the computer security arsenal. Users must learn to stop and ask themselves whether the email is likely to be what it seems. First the easy questions: How likely is it that some stranger will really send you millions of dollars? Is your US bank really going to send you any request from an email address that doesn't contain the company name? And, if your friend really did lose a wallet on a spur-of-the-moment vacation how likely is it that she'd email you for a credit card number instead of calling her husband, the consulate, or American Express for help?
Is it possible to go the next step and teach users a little technology? They should always check to see if the attachment they're about to open like a present on Christmas morning ends with ".exe" (a file that will execute some program). If it does, they should beware and seek tech support. Or can we teach them to look at the "properties" of the link they're about to click, see the web address ("URL"), and recognize that the source is the wrong country? A quick look at the domain registry will make it pretty obvious that something that purports to come from around the corner has a two-letter code that means it's really coming from a country around the world.
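The two checks described above can be roughed out in a few lines of Python. This is a minimal sketch using only the standard library. The list of risky extensions is illustrative and far from complete, the function names are made up for this example, and the default of treating ".us" as the only expected country code is purely an assumption.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Illustrative, incomplete list of extensions that run code when opened.
RISKY_EXTENSIONS = {".exe", ".scr", ".bat", ".com", ".pif"}


def attachment_looks_risky(filename):
    """True if the file name ends in an extension that executes a program."""
    return PurePosixPath(filename.lower()).suffix in RISKY_EXTENSIONS


def country_code(url):
    """Return the two-letter ending of the link's host name, or None."""
    host = urlparse(url).hostname or ""
    last_label = host.rsplit(".", 1)[-1]
    if len(last_label) == 2 and last_label.isalpha():
        return last_label
    return None


def link_looks_foreign(url, expected_codes=("us",)):
    """True if the link ends in a country code other than the ones expected."""
    code = country_code(url)
    return code is not None and code not in expected_codes
```

A file named "Obama_speech.avi.EXE" would be flagged because only the final extension counts. A two-letter code shows where a domain name was registered, not necessarily where the server sits, so a check like this is a prompt to stop and think rather than proof of anything.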
With so much hacking going on, the problem is no longer just a technical one. More laws are creating responsibility to take reasonable care to protect other people's information and liability for failing to do so. It is important to remember that with these changes, the standard of care is expected to improve, and what was reasonable yesterday may be unreasonable today.
Text messaging and the train wreck
Tags: technology for lawyers, knowledge discovery for litigation, technology for business managers, law about technology, public policy, eDiscovery
Train wreck caused by text messaging? Multiple news reports have raised the possibility that the conductor of a Los Angeles train was sending text messages just before the train crashed and many were killed. The questions under investigation are whether this is true and whether the conductor was distracted by it when he should have seen red light signals indicating the hazard ahead.
This is the saddest outcome of an issue I, and others, have been raising for years. The use of technology for non-work activities has pervaded the work environment to the extent that it is impacting work performance. The obvious problem is lost revenue and reduced profits to the employer, but sometimes it correlates to increased liability. If true in this case, it means lost lives.
If the shopclerk with an mp3 player or cellphone in the ear is too distracted to answer questions accurately or make correct change, what makes me think my car mechanic, stock broker, or doctor's lab technician isn't? In 2006, eDiscovery companies were estimating that one quarter to one third of all emails flowing through a corporation were personal email. At the time, I wrote about the thousands of football and fantasy football gambling emails that had passed through Enron. I also wrote about the dirty jokes, hook ups, and other sex emails there.
It's getting technically easier to discover that people aren't really working when they claim to be. This summer, before lecturing at a state bar convention, I stood in the back of the large hall and observed what people were doing. I explained the ways I could prove that they had been using their laptops, BlackBerries, and iPhones to shop on the web, play video poker, and text friends and family. I explained how, in the not-too-distant future, these activities will probably void the professional certification credit they thought they were earning by being present but not paying attention.
This week's train wreck brings more attention to the debate about just how much people's attention is diverted and what the consequences can be. At a New York panel discussion last fall, a group of senior financial industry compliance managers uniformly said they weren't concerned about personal web, email, and phone use at work. Perhaps they ought to be.
(WARNING: Adult content)
On Tuesday, Alex Kozinski, Chief Judge of the federal Ninth Circuit, was caught by the LA Times with a website full of sexually explicit material accessible to the public. Pardon the pun, but perhaps the old expression about "closing the barn door after the animals are gone" has never been more appropriate. The LA Times says the site included photos of "naked women on all fours painted to look like cows and a video of a half-dressed man cavorting with a sexually aroused farm animal." There is so much wrong with this picture that it's hard to decide where to start.
Next week, I'll be giving a talk at the Arizona State Bar Convention about legal ethics and technology. One of the most important points is that lawyers need to understand how big a data footprint they and their clients are leaving behind.
Kozinski is reported to have said that he thought the site was for his private storage and that he was not aware the images could be seen by the public. That's a problem for many lawyers, who are unaware how easy it is to find things they or their clients have posted on the web. In the Judge's case, that's doubtful if he's really the author of the letter to "Article III Groupie" posted on undertheirrobes.com. There, in a plea to be included as a contender for "judicial hottie," were multiple links to http://alex.kozinski.com. The links included the reportedly offensive subdirectory /stuff (see the properties for "bungee jump"). If he didn't think people could get to the subdirectory, why did he include a link to it?
Kozinski is reported to have said he didn't know if any of the material on the site is obscene. The site is now offline and apparently unavailable through some of the easiest means of access. But, Cryptome has posted a list of all of the files and subdirectories in the judge's /stuff subdirectory and it contains a subdirectory called "/fucking" which has been around since November 2006. The LA Times described part of the Kozinski site as containing "images of masturbation, public sex and contortionist sex." In researching this story, I accidentally came across the women-as-cows photo (be very careful which Google hits you choose if you search this story); the women's posteriors are facing the camera and their genitalia are in full view.
In the first LA Times story, the Judge said that he had uploaded sexually explicit content to the site. The next day, the Judge is reported to have suggested that some of the items were posted by his adult son and that he was unaware of them. If this becomes a question of sufficient concern, there are technical methods to determine whether this is likely true or false. The website appears to have been registered by the Judge's son, hosted on a joke server and registered using an obviously false address (including both homage to hackers with references to FOO and to lawyers with the fictitious town "Barsville"). Even so, with pc logs, server logs, emails, and web postings, it won't be that hard to figure out most of who did what.
The story broke because Judge Kozinski was hearing a trial level case, a criminal prosecution for the distribution of pornographic materials (containing bestiality). In response to the news stories about his own website, Judge Kozinski suspended trial at least until Monday. Besides the immediate question of possible conflict of interest, it is likely that someone will look more closely at how the case came to be assigned to Judge Kozinski. It is not impermissible for an appeals court judge to hear a trial case, but it is not common.
It won't be long before people are reassessing everything the Judge has said or done. And, quite a lot of that history is readily available in digital form. For example, people are already reassessing Judge Kozinski's 2001 battle with the Court's administrators over pornography filters on the government's computers. I've yet to see any discussion of his opinion (in US v Poehlman) finding that the government entrapped a man it accused of crossing state lines to have sex with minors.
The LA Times reports that the Judge "defended some of the adult content as "funny"" and "he had shared some material on the site with friends." Considering that the site contains the aforementioned photos of naked women as cows, and is reported to have included at least one photo of women exposing their pubic hair, we will now wait to see whether former female employees or colleagues come forward to say that they were the recipients of such "sharing" and found it offensive or harassing. And, it's only a matter of time before someone takes a new look at his writing on sexual harassment (Foreword in Sexual Harassment in Employment Law (Barbara Lindemann & David D. Kadue, BNA 1992), reprinted as Locking Women Workers in a Gilded Cage in Legal Times of Washington, May 25, 1992, at 26.)
Also discovered on Judge Kozinski's website were "more than a dozen" copyrighted songs and it has been asserted that they were readily copy-able by the public. While that's a pretty small number relative to the civil copyright infringement actions typically reported, it could still be a copyright violation if others did copy the files. Perhaps more interesting, someone may want to reread the Judge's participation in the July 28, 2000 decision to stay an injunction against Napster.
All in all, it looks like it's going to be a tough week for Judge Kozinski, until now considered one of America's brightest and most influential conservative judges.
eDiscovery tools and techniques - Knowledge Discovery in lawyer's clothing
Tags: technology for lawyers, knowledge discovery for litigation, eDiscovery
eDiscovery was the hands down favorite at LEGALTECH, a huge legal industry expo and conference held in New York a few weeks ago. Although some lawyers have been asking for opposing parties' electronic records during litigation for years, the Supreme Court only implemented a rule on the subject in December 2006. The rule applies to lawsuits in federal court and uniformly places the burden on parties to proactively seek out and turn over relevant digital records from their repositories early in the case. Well more than half the vendors at LegalTech had an eDiscovery spin to their offerings.
As someone who has been both a trial attorney and the manager of large data systems, I was somewhat bemused by the marketing efforts. Nearly every vendor's representative told me his or her offering was "unique." When questioned, almost none could say what made it unique or what the software really does. Some had systems engineers on hand from whom I could glean more specific information.
Based upon my conversations with the sales personnel and their systems engineers, eDiscovery is nothing more than Knowledge Discovery (a pre-existing and still rapidly growing field of information technology) in lawyer's clothing. There's nothing fundamentally wrong with that. Lawyers shouldn't have to learn a whole new technology lexicon to do their work; it's appropriate for the vendors to speak in terms that are relevant to the customer. However, since most of the sales reps had a black-box, it's-magic, sort of presentation, I think a little explanation from the perspective of someone who has been working with these issues since long before last year might be useful. Lawyers need to know that not every product is offering the same function or the same quality.
Once a company knows that it is being sued or is filing suit, all relevant records have to be preserved. This means making sure that no one changes or deletes relevant data that is potential evidence or which could lead to evidence.
One of the hard questions is deciding where relevant data might reside. From the hardware perspective, company servers are usually an obvious place to start, but it may also be necessary to reach out to desktops, laptops, phones, and other employee PDAs. Also, if the company uses outside hosted web services, it may be necessary to work quickly to preserve that data as well.
I was surprised to find that most vendors seem to focus exclusively on major corporate databases and a small number of personal filetypes: word processing documents and email-related files. Most did not mention finding instant messages, photo files, internet session logs, or a number of other popular application created files. Also, few mentioned capturing physical access (e.g., swipe card lock records), telephone call logs or voicemail.
Since businesses usually continue to operate during litigation, and that often means legitimate reasons for changing data, the eDiscovery process often involves making and preserving a copy of the data. People have differing opinions about whether it is more effective to narrow what's collected (decide what's potentially relevant first) or to collect everything and narrow later. In theory, the former is cheaper because you're storing a smaller copy. On the other hand, storage is cheap, but failing to preserve the correct data could cost the ultimate price.
Data cleansing is a broadly used term to describe anything done to data in preparation for searching activities. The lawyer might not think much about this step in the process, but one study in other businesses found this typically accounted for 60% of the effort. Some of the most significant challenges are:
Integration: If you're not technical, imagine the instructions you would have to give to file clerks to get them to re-order a million paper files from a chronological system to a topic-based one. Integration is the process of taking data with a structure created by one piece of software and making it understandable to a system with a different structure.
Deduplication: Digital files replicate faster than rabbits. People copy them like mad and, occasionally, electronic hiccups just create them. Deduplication is the process of reducing the multiples to one. In an eDiscovery context this can be a double-edged sword. It can radically improve the speed for answering "is there a document that says...?" questions. But, it may remove the ability to know how many people had copies or where they saved them.
Disambiguation/Fuzzy Matching: Whether by typo or intent, there are often similar but not identical representations of information (think "Robert", "Bob", and "Bobert"). There are a variety of techniques to attempt to figure out which refer to the same information and which are really distinct (e.g., two employees, both named "Joe Johnson"). Some try to perform this before the search process and others have tools to handle it during the search.
Entity Extraction: From a computer perspective, it's easier to find information in a database (already sorted into a neat table with descriptive column headers) than in running text (called "unstructured" and including things like emails, letters, and written reports). So, there's now an array of software that will attempt to pull everything out of unstructured text and put it in a database.
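Two of the cleansing steps above, deduplication and fuzzy matching, are simple enough to sketch in Python with the standard library. This is a minimal illustration under stated assumptions, not how any particular eDiscovery product works: the file locations and contents are made up, exact-content hashing is only one of several ways to detect duplicates, and `difflib` is a general-purpose string comparison rather than a purpose-built name matcher.

```python
import hashlib
from collections import defaultdict
from difflib import SequenceMatcher


def deduplicate(files):
    """Group (location, content) pairs by a hash of their content.

    Returns {digest: [locations]}, so each distinct document appears once
    while the record of who held a copy is kept.
    """
    groups = defaultdict(list)
    for location, content in files:
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(location)
    return dict(groups)


def similarity(a, b):
    """Score from 0.0 to 1.0 for how alike two strings are, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical collection: two identical memos and one reply.
files = [
    ("alice/inbox/memo.doc", b"Q3 pricing memo"),
    ("bob/desktop/memo-copy.doc", b"Q3 pricing memo"),
    ("carol/sent/reply.doc", b"Re: Q3 pricing memo"),
]
groups = deduplicate(files)
```

Keeping the list of locations alongside each digest is one way to soften the double-edged sword: reviewers read the memo once, but the fact that both Alice and Bob had it is not lost. On the matching side, `similarity("Johnson", "Jonhson")` scores far higher than `similarity("Johnson", "Smith")`, which catches typos. It will not connect "Robert" to "Bob" (that takes a nickname table) and it cannot tell two different people named "Joe Johnson" apart.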
It is critically important for lawyers to understand what search technologies are actually doing, yet this was the area where sales reps had the least understanding. At the simplest level, search technologies can look for what you know or what you don't know.
In the "what you know" category, the most common is keyword searching, looking for a specific word. This might be a fine method for searching for official documents on a project or deal that will always be mentioned by name in the document. This can be enhanced by Boolean search, the technique that lets you add "and", "or", and "but not" connectors between words, so that you can narrow the number of results. But, these techniques are only a partial answer for searching less formal communications, like email, instant messages, and voicemail, where the subject is often not mentioned.
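A minimal Python sketch shows both the mechanics of keyword and Boolean searching and where they fall short. The documents, the project name "Falcon", and the `boolean_search` function are all made up for illustration. Real products index text far more cleverly than this.

```python
import re


def words_in(text):
    """Lower-case set of the words appearing in a piece of text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def boolean_search(documents, all_of=(), any_of=(), none_of=()):
    """Return ids of documents matching AND / OR / BUT NOT keyword lists.

    Keywords are expected in lower case.
    """
    hits = []
    for doc_id, text in documents.items():
        words = words_in(text)
        if not all(w in words for w in all_of):
            continue  # an "and" word is missing
        if any_of and not any(w in words for w in any_of):
            continue  # none of the "or" words is present
        if any(w in words for w in none_of):
            continue  # a "but not" word is present
        hits.append(doc_id)
    return hits


# Hypothetical documents about a deal code-named "Falcon".
documents = {
    "contract.doc": "Project Falcon supply agreement between the parties",
    "newsletter.txt": "Falcon sightings reported at the company picnic",
    "email-1": "Can we talk about the bird thing before the board meets?",
}

hits = boolean_search(documents, all_of=["falcon"], none_of=["picnic"])
```

The search finds the contract and correctly drops the newsletter, but it never sees the email, which may be the most interesting document of the three, because the sender did not use the project's name.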
The next level of "what you know" searching involves data structure. For example, when you see it, you typically know which is a social security number, a phone number, a street address, a person's name. It is possible to teach a computer to do the same thing.
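One common way to teach a computer to recognize data by its shape is a pattern language called regular expressions. The sketch below, in Python, uses two deliberately simplified US-style patterns. They are assumptions chosen for illustration and would both miss valid formats and flag look-alikes in real data. The sample text and numbers are made up.

```python
import re

# Simplified US-style patterns, for illustration only.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # 3-2-4 digits
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),  # 3-3-4 digits
}


def find_structured(text):
    """Return {label: [matches]} for each pattern found in the text."""
    found = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found


sample = "Call me at 212-555-0100. The SSN on the form was 123-45-6789."
found = find_structured(sample)
```

Note that the search never needed to know the number in advance, only what such a number looks like. Names and street addresses are much harder to describe this way, which is one reason product quality varies.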
The big jump in technology is moving to inference based searching, when you want a system to find things that are like other things even if the same words are not used. This can sometimes find communications about a person or project only referenced by a nickname or not named at all. In reality, a computer still can only do what it's told and people have come up with a variety of computations to emulate what a human is doing when making inferences. The four I heard from eDiscovery systems engineers were: Bayes, Shannon, linguistic indexing, and semantic indexing. Describing what they do is the subject for another blog, but suffice it to say that they will not likely produce identical results.
Even in paper files, complex litigation discovery has often involved millions of records. Computers, though, can quickly and radically improve your ability to understand what you have. My favorite example is that a five year graph of the S&P500 represents about 126,000 data points. In my day at LegalTech, I was surprised by how little was being said about output formats. I'm not sure whether that represents a lack of availability or a perception that lawyers only want traditional text presentation.
Conclusion: Human document review was never perfect; critical documents have always been missed through concentration fatigue and occasional laziness or dishonesty. Spend a week in a dusty warehouse full of documents, and you'll understand just how easy it is for those to occur. But, lawyers must understand that every one of these electronic eDiscovery techniques can be done well or poorly; that even the best techniques will likely miss something; and some of these techniques (such as entity extraction and inferential searching) are young or imperfect. Performing multiple techniques means compounding the number of misses or errors. These may be the only realistic options for handling millions, billions, or trillions of records, so it's important for lawyers to know enough about the technologies being offered to ensure they ask the questions that matter to them, understand what they're buying and consider the risks involved.
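A little arithmetic shows how the misses compound. The sketch below assumes, hypothetically, that the techniques are applied one after another, that each keeps a fixed share of the truly relevant records, and that the misses are independent of one another. The 95% figure is invented for illustration, not taken from any study.

```python
def combined_recall(stage_recalls):
    """Fraction of relevant records surviving every stage in sequence."""
    overall = 1.0
    for recall in stage_recalls:
        overall *= recall
    return overall


# Hypothetical figures: three stages, each keeping 95% of what is relevant.
overall = combined_recall([0.95, 0.95, 0.95])

# Relevant records lost for every million relevant records collected.
missed_per_million = round((1 - overall) * 1_000_000)
```

Under those assumptions, three stages that each sound reassuring on their own together retain under 86% of the relevant material. That is the kind of question worth putting to a vendor.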
Privacy on the Web - Part I
Tags: privacy technology, technology for lawyers, technology for business managers, technology, privacy
A friend just sent me a blog post which is a bit of a rant about some comments on privacy or lack thereof. It provides a good basis to discuss some concepts and misconceptions about privacy and technology.
What does privacy mean?
Donald Kerr, a Deputy Director of National Intelligence, said that our culture equates privacy and anonymity. Like the blog author, James Harper -- of the Cato Institute and other esteemed institutions-- I disagree that the terms are equivalent in the eyes of the general public. Webster's dictionary describes being anonymous as being unknown or not identified, while defining privacy as keeping oneself apart or free from intrusion. In our culture, volition appears to be a key differentiator. When I close the blinds, I'm choosing privacy. When no one notices me in a crowd, I'm anonymous.
Is it unrealistic to expect privacy?
Kerr asserts that privacy doesn't exist and cites the availability of personal information through MySpace, FaceBook and Google. From a volition standpoint, Kerr's statement is a mixed metaphor. MySpace and FaceBook are entirely voluntary, people deciding to post things about themselves for their friends or the world to see. Google, making great strides at "organizing the world's information", aggregates personal information that may not have been intended or expected to be shared. I recently showed a friend that in five minutes on Google I could find more than his professional profile -- I produced his home address, his parents, his religion, his political leanings, and something about his finances. This undercuts Harper's contrary assertion that people have retained the ability to provide their identifiers to some "without giving up this information to the world".
Can individuals control privacy?
Kerr and Harper are talking about when, whether, and how the federal government should have access to individual information, but the question extends further. Anyone signing up for access to a newspaper or making a purchase on the web is giving bits of himself away. Most typically, the information is gathered in "cookies", established by the websites and stored on the individual's computer. This summer, one study concluded 85% of users were aware of cookies, but only about 28% were able to successfully delete them.
The public's misunderstanding about their control over personal information in cookies extends past their technical inabilities. The misunderstanding is exacerbated by a little legal wordplay. Nearly every "privacy statement" I've ever read on an e-commerce website says that the information may be shared with "affiliates" but then doesn't define that term. Each of these companies could call anyone, any company, or any government agency an "affiliate" and give them access to cookies or sell them the information in the cookies.
[Stay tuned for Part II, where I'll talk about what business leaders and system designers can do to offer more privacy and still meet their business goals.]