A Deep Dive into Big Data

Brian Flaherty

I have a colleague who now and again laughs about the time “when we were wizards”: when we reference librarians would wave our wands (or eyes) over digests or (worse yet) Shepard’s – gibberish to mere muggles – and come up with citations to relevant legal authority crowned with the title “good law.”  He points out that we are no longer wizards – that with Google Scholar & well-chosen search terms, people unfamiliar with the law are able to do adequate legal research (of course, I’m quick to point out that “adequate legal research” just doesn’t cut it these days).

I say this because just now, I think some librarians look at “big data” as mystical, and at the folks who can harvest and manipulate it as magicians.  But demystifying it is, I think, essential for a clear understanding of what’s behind it, how powerful it is, and how it can be used.  The “Deep Dive” on big data went a long way towards doing this.  I am going to attempt a short summary, but what I write here will be inadequate.  I urge everyone to get the PowerPoints and reading lists from the AALL site when they’re available, and become familiar with the material.  The overused phrase fits here: it is the future of law practice.

Briefly: Big Data are information sets that are too large or complex for traditional processing models.  For example: a data set including every federal case would be “Big Data.”  Robert Kingan from Bloomberg Law began the program with a great definition of what constitutes big data, and a discussion of just how difficult it is to collect it in a form that is useful for any kind of analysis.  He said that some 80-90% of the work for any kind of project is just collecting and cleaning the data, putting it into a format where it can be used.  Think, for example, of getting the aforementioned set of all federal cases into a spreadsheet where one column is “Judge’s name,” and you begin to get a feel for how huge an endeavor this is.  Daniel Lewis from Ravel talked a bit about how they go about manipulating the data once they’ve got it – with a short discussion of the uses of SQL and NoSQL (“Not Only SQL”) and the benefits of both.  Irina Matveeva from NexLP gave a short discussion of language processing – what would seem to be the next step of data analysis – where there are programs that can do document analysis, email forensics, and other linguistic manipulation extremely quickly.
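
To make the cleaning problem a little more concrete, here is a minimal sketch in Python of one tiny step: pulling a judge's name out of free text and normalizing it so that it can fill a “Judge’s name” column. The sample header lines, the regular expressions, and the function names are all assumptions made up for illustration. None of this comes from the speakers' actual pipelines, and real opinions are far messier than these three lines.

```python
import csv
import io
import re

# Hypothetical header lines; real opinions vary far more than this.
RAW_HEADERS = [
    "Before: SMITH, Circuit Judge.",
    "OPINION BY JUDGE  Jane  Smith",
    "per curiam",
]

# Assumed patterns: a name before ", Circuit/District Judge",
# or a name following the word "JUDGE" at the end of the line.
JUDGE_PATTERNS = [
    re.compile(r"Before:\s*(?P<name>[A-Za-z .'-]+?),\s*(?:Circuit|District)\s+Judge", re.I),
    re.compile(r"\bJUDGE\s+(?P<name>[A-Za-z .'-]+)$", re.I),
]


def extract_judge(header: str) -> str:
    """Return a normalized judge name, or an empty string if none is found."""
    for pattern in JUDGE_PATTERNS:
        match = pattern.search(header.strip())
        if match:
            # Collapse repeated whitespace and standardize capitalization.
            return " ".join(match.group("name").split()).title()
    return ""


def to_csv(headers: list[str]) -> str:
    """Write one row per header, with a "Judge's name" column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Raw header", "Judge's name"])
    for header in headers:
        writer.writerow([header, extract_judge(header)])
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(RAW_HEADERS))
```

Even in this toy version, the third line comes back with an empty name. Multiply that by every format a court has ever used and the 80-90% figure starts to look plausible.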

Following this introduction, and a brief discussion of how Bloomberg Law (Robert), Ravel (Daniel) and NexLP (Irina) harness Big Data in the resources they provide, we were given the opportunity to explore what planning a “Big Data” project would be like.  Folks got into groups at tables and devised a possible project, a list of some of the resources they would need, and a list of some of the necessary players (e.g. librarians, programmers, 3rd party vendors).  Some of the ideas were fantastic: one table talked about creating a predictive tool that could be used to determine whether a law enforcement officer would be likely to be accused of a civil rights violation – and what kind of data sources would be necessary to create such a tool (personal history? demographic information?).  At our table, one person was engaged in creating a resource that would predict the likelihood that a piece of legislation would pass – and so we talked about the data necessary to do that: the sponsor’s history, party affiliation, words in bill titles that have passed, public sentiment (retrieved from news sources & social media).
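
For the curious, the data our table listed for the legislation project could be gathered into a structure along these lines. This is only a sketch in Python under stated assumptions: the `BillRecord` fields, the `build_features` function, the idea of a "majority party" flag, and the sentiment scale are all hypothetical names and choices of mine, not anything the person at our table described. It stops at turning one bill into numbers and does not attempt the prediction itself.

```python
from dataclasses import dataclass


@dataclass
class BillRecord:
    """Hypothetical record for one bill; field names are illustrative only."""
    title: str
    sponsor_party: str
    sponsor_bills_passed: int
    sponsor_bills_introduced: int
    sentiment_scores: list[float]  # assumed scale: -1.0 (negative) to 1.0 (positive)


def build_features(
    bill: BillRecord,
    title_words_of_interest: set[str],
    majority_party: str,
) -> dict[str, float]:
    """Turn one bill into numeric features that a model could later consume."""
    introduced = bill.sponsor_bills_introduced
    # Sponsor's history: share of the sponsor's past bills that passed.
    pass_rate = bill.sponsor_bills_passed / introduced if introduced else 0.0
    # Words in the title, lowercased and stripped of trailing punctuation.
    title_words = {word.strip(".,;:").lower() for word in bill.title.split()}
    scores = bill.sentiment_scores
    return {
        "sponsor_pass_rate": pass_rate,
        "sponsor_in_majority": 1.0 if bill.sponsor_party == majority_party else 0.0,
        "title_word_hits": float(len(title_words & title_words_of_interest)),
        "mean_sentiment": sum(scores) / len(scores) if scores else 0.0,
    }


if __name__ == "__main__":
    example = BillRecord("An Act to Improve Library Funding", "A", 3, 12, [0.5, 0.0])
    print(build_features(example, {"library", "funding"}, majority_party="A"))
```

Each entry in the returned dictionary corresponds to one item on our table's list. Where the title words and sentiment scores would actually come from is, of course, the hard part.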

In all, Big Data is fascinating stuff – incredibly useful for predicting everything from the outcome of a court case to the passage of legislation.  Not only should we be paying attention, we should be “deep diving” into it, to understand what it can do for us and the legal community.

Dispatches from AALL: Chicago – On Gender

AALL Chicago: On gender

by Brian Flaherty

Welcome to Chicago & AALL – if only by remote. I want to include here a few dispatches from AALL, if only to be expanded on at home.
I wanted to start by commending the SR-SIS, and specifically the committee on Lesbian and Gay Issues, for offering “Pronoun choice” ribbons in the exhibit hall. These are the lime-green ribbons dangling from our nametags that read “My pronouns are” – and then different options: he/him, she/her, they/them, and a ribbon with a blank, with an available Sharpie to write in the pronoun of your choice. Bravo.
A quick digression into why this is important, and why I think it’s important for cisgender folks like me (cisgender: a person whose gender presentation and identity match their biological sex) to wear them. We live in a world that tries to enforce gender norms – tries to assign us a gender – at every turn: from honorifics to pronouns, to the way we’re treated at the grocery store, to the way we’re greeted at a conference. For someone who doesn’t spend a lot of time thinking about gender, this “gendering” of the world blends into the quotidian – it’s just the way things are. But for folks who think about gender a lot, for example, folks whose gender presentation doesn’t correspond to their sex-assigned-at-birth, this constant mis-gendering is a relentless reinforcement of the idea that they don’t fit in.
Using someone’s chosen pronoun may seem like a small part of changing the world, but there’s that maxim about the thousand-mile journey beginning with a single step. There’s also something somewhere about the first step being the hardest but most important (if only I had a librarian nearby to help me look it up… oh, wait…). One day perhaps, the phrase “do you have a preferred pronoun?” will be as common as “what was your name again?” (a phrase increasingly common as I age). Until then, however, explicitly allowing people to claim a pronoun is a great step.
My gender presentation matches my biological sex (I am cisgender), and the pronouns I use are he/his/him. Moreover, by my presentation in the world, I don’t anticipate being mis-gendered by pronoun or otherwise. Still, I wear this ribbon proudly (and to be very clear: I’m encouraging all of you to stroll on over to the SR-SIS folks and ask for a ribbon) to reinforce the notion that sex-assigned-at-birth and gender presentation don’t always match. And while we were all a bit young to have a say in how we were called that first day, we all should have a say in what we are called today.

Theory into Practice: Coverage vs. Comprehension

It’s been quite some time since I believed that lecturing was the best way to teach legal research.  But I still find myself having the same dialogue, internal and external, that led me to believe that it is better to sacrifice comprehensive coverage for the sake of better retention:  do I assemble everything I know about research into a carefully crafted lecture where everything fits together perfectly (what I’ve come to call the “Tetris lecture”)?  Or do I sacrifice the time it would take to explain, for example, what the appellate division of the trial court is, where its decisions are found and what authority it wields, in the name of giving students practice doing hands-on research, analyzing fact patterns and discovering effective research strategies?  The answer seems clear:  I think it’s far more important to give students practice working through complex research problems than it is to devote the time to teach them how to navigate, say, the Descriptive Word Index, when the likelihood is they’ll never use one outside of law school (if even there!).

The answer seems clear.  But the problem is, every year, it seems like there is more to cover.  Every year we want to devote more time to practical work to make sure students “get it.” And so every year it seems that there is more to sacrifice.  For example, this year we want to devote more time to having students work through transactional research problems – a type of practice many of our graduates wind up in.  And so this year, we are seriously considering eliminating any time devoted to public international law, because the reality is, few if any of our current class are going to end up working in international human rights, with the United Nations, or for anyone else “doing” public international law (NB: at this point we do still cover private international law research, because our graduates are much more likely to encounter issues between private parties in separate nations than they are to encounter issues between nations themselves).   But it pains me to do this, because I know this stuff; I love this stuff.  It seems like the right thing to do, especially given all of the hubbub about “practice ready” graduates.  But the warehouse of knowledge I will likely never share – how to deftly assemble all of the necessary Shepard’s volumes and supplements for an accurate analysis, or how to link up the dates of the various lists of sections affected so as to not miss a single Federal Register – is growing.

Does this story seem familiar?  Are you having to sacrifice things you know and love to teach, in the name of making sure students walk away more competent and confident?  If so, what have you dropped?

Dispatches from AALL Philadelphia: Legal Innovation

So here’s a good rule of thumb: whenever you get the opportunity to hear Legal Informatics fellows Pablo Arredondo (Casetext) & Daniel Lewis (Ravel Law) speak, you should take it.  Add Cisco security expert Lance Hayden and you’ve got the makings of a really excellent program.  I’m reporting here just a fraction of the program – if you get a chance to listen to it when it comes up on AALLnet, you should.

Daniel talked about processes being data driven – and how law is really becoming one of those processes.  He compared data analytics in law to baseball (see Moneyball) & politics, fields where data analytics gives players clear advantages. All of the information is available in the legal opinions: which judges are more likely to rule for or against you given a set of circumstances – the analytics built by Ravel harvest that information and use it to help lawyers make better strategic decisions.

Pablo talked about his goals in developing Casetext – a resource that harnesses the power of information produced by attorneys (client alert letters, blogs, online briefs and newsletters) and uses that to enhance a legal search engine that includes state and federal cases.  Casetext leverages this great untapped source of information to create a free legal research engine, essentially annotated by these alert letters and blogs.  Members of the legal community are invited to annotate legal opinions, or to upload appellate briefs that they have access to.  This is not crowdsourcing per se, it’s more “communitysourcing,” where your name is attached to any annotations that you add – a sort of quality control by reputation.

Lance talked a lot about security and hacking – and how “hacking” did not always have the negative connotations it does today.  “Hackers” were people – programmers – who could manipulate a system to do something more quickly, easily, accurately, and efficiently (think “life hacks”).   He spoke in metaphor a good deal: he talked about law being the “software we use to run society,” and how good lawyers are essentially hackers of the law.

This was an especially thought-provoking program & worth a listen.  Towards the end, Lance articulated something that I think gets to the heart of what is so deeply awesome about librarians: as a group, we are dedicated to doing right by information and by information consumers.

Dispatches from AALL Philadelphia – Something I learned: Electronic routing and current awareness

From Jessica Panella: Sometimes small snips of ideas can be very powerful and helpful. From LLNE’s own Boston University School of Law Fineman and Pappas Law Libraries came a poster on expanding a library’s current awareness lineup.  Jennifer Robble, Corinne Griffiths and Rebecca Y Martin reviewed an interdepartmental task force which implemented an electronic routing system for faculty. The library is still providing electronic routing services for roughly 175 unique titles using JournalTOCs, MyHein Title Alerts and publisher-specific resources. I am totally taking this back to UConn School of Law.

Dispatches from AALL Philadelphia – something I learned: presentation resources

From Diane D’Angelo: Want to create cool animated presentations? Check out Powtoon! It’s a free resource that will help you captivate & engage students. Attorneys, judges, faculty and deans will also be drawn in to what you have to say with this fun way of presenting. http://www.powtoon.com

Dispatches from AALL Philadelphia – More praise of round tables:

Because there are so many folks from LLNE here in the city of heat and humidity – er, I mean brotherly love, I’ve been asking folks to send me snippets of things they’ve learned at various programs, from posters, & from interactions with library folk from around the country.  Over the next few days, I’m hoping to post short snippets of what they say here.

One of the first things I went to here in Philly was a RIPS roundtable on distance learning.  We’re in the process of putting together an asynchronous research class, and so I took this opportunity to learn from folks who have been doing it.  The overall take-aways were: there are as many different ways of doing this as there are people doing it, on platforms from TWEN to Canvas, some using interactive discussion boards, some relying more on video presentation & written work.  What is very clear is that the folks who are doing this all benefit from a sharing of resources and best practices – and to that end, we began a collection of names and email addresses of folks interested in sharing resources and best practices.

A few of the things we talked about:

  • In terms of assessment, consensus was that students should be asked on a weekly basis to do short research assignments, & produce written trails that require students to demonstrate the ability to use the resources they’ve learned about, rather than just regurgitate what was in the reading or on the video.
  • Students may expect quicker turnaround of assignments for online classes. Their impatience may be forestalled by giving general feedback to the class as a whole (e.g. “What I’m seeing in these assignments…”) before actually returning the assignments.
  • People had various ideas for inspiring interactive discussion, i.e. getting students to use online discussion boards. Some suggested that a short video presentation – either the beginning of an instructor-student discussion of a topic, or even a short “client interview” type skit – has worked to spur active discussion.
  • Some folks are using the interactive discussion boards as a way of “taking attendance” – making sure that the class comports with ABA Standard 306, which governs distance education.
  • As always: humor works – & is a great tool to engage students, even asynchronously.

There was a great deal of enthusiasm in the room – at the table – because it seems clear that asynchronous/distance classes are a large part of the future of legal education.  Sharing resources & best practices will become more essential as we all work to figure out how to do this most efficiently.

LLNE Legal Research Instruction Program

Every year LLNE runs the Legal Research Instruction Program, offering legal research training to folks in the community, from technical services librarians to library students, for the amazingly low price of $150.00. This was my first year administering the program, and it made me realize how much it depends on the incredible support of LLNE members. So I want to take a moment to thank those who made this awesome program happen:

  • Claire DeMarco from Harvard, for giving everyone a great overview of the legal system.
  • Rick Buckingham from Suffolk, for teaching the ins-and-outs of caselaw research.
  • Kristin McCarthy from New England Law, for a nifty overview of statutes.
  • Steven Alexandre de Costa from Boston University, for a dive into administrative law (and additionally, for inspiring at least one of our attendees to comment on proposed administrative law through regulations.gov).

And last, but certainly not least, a huge thank you to Ron Wheeler and Suffolk University Law Library for hosting the Legal Research Instruction Program. In all, we had 9 students, gave out 2 scholarships, and ran 6 individual sessions. Thank You!

Theory into Practice: Looking Forward, Looking Backward

by Nicole Dyszlewski, Roger Williams University School of Law

When Brian Flaherty started his “Theory Into Practice” posts on the LLNE blog, he asked some of the members of the LLNE Executive Board if there had been any memorable experiences they had to share of turning something they learned at an LLNE meeting (theory) into something they did at their own library (practice). I immediately emailed him that I had something to share.


Back in November 2010, I attended the Fall LLNE meeting hosted by Northeastern University School of Law Library titled “Improve Your Workplace Health! Inoculate Against Bad Morale.” What made this meeting so memorable is that it focused heavily on bringing positivity to the workplace. The theme of the meeting resonated strongly with me and I have tried to bring some of the lessons learned from that meeting to work every day.


It is not always easy to be positive at work. In fact, sometimes (Mondays? Snowstorms?) it is downright impossible. What I learned from the Fall 2010 meeting is that there are small things we can do to try to increase morale. One of the things I do to boost my own mood is listen to upbeat music. In the registration packet for that meeting we were given a list of upbeat songs (one song on the list was Katrina and the Waves’ “Walking on Sunshine”!) and while I can’t say that I play “Don’t Worry Be Happy” every day before my reference shift, I can say that I have made an effort to have a few “happy” Pandora stations at my disposal when I am working on a tedious task or just need a burst of sunshine.


Another way I have worked to inoculate myself and my workplace against negativity is by adding a bit of fun to my job. For example, I regularly participate in the Green Bag’s Lunchtime Law Quiz. While the weekly question itself is released at lunchtime on Monday, you usually have a day or two to research and answer the question. While I find legal research to be fun on its own, the Green Bag quiz is a lighthearted way to take a break from serious work and flex my research muscles on something more humorous and less consequential. Not only does it give me an opportunity to discover (or re-discover) some of the resources in my library’s collection, but it gives me an opportunity to discuss possible answers with other librarians who may also be stumped on a question. If you haven’t tried it, you should!


This Spring, the LLNE meeting is being co-hosted by the University of New Hampshire School of Law Library and the Association of New Hampshire Law Librarians. The meeting’s theme is Mindfulness and Librarians. According to the description, “We will explore how practices will lead us, and those we serve, to… decrease stress and anxiety, cultivate and advance joy and satisfaction in the practice of law.” I look forward to attending this meeting and finding new ideas to bring back and put into practice!

Theory into Practice: CALI Author

by Brian Flaherty

In my inaugural “Theory into Practice” column, I talked about taking some of the great things I took away from the AALL conference in San Antonio, and putting them into practice in my Advanced Legal Research class – specifically, I had students take pictures of things that they thought were, or should be, regulated. My experience incorporating this into my class was great – it went smoothly, wasn’t as much work as I thought it would be, and inspired some clear understanding of the difference between statutes, regulations, and ordinances.

Of course, my co-teacher pointed out that while everything I wrote was indeed true, I was painting an unduly rosy picture of my success in incorporating all of those nifty conference take-aways. That not everything I’ve learned-there and tried-here has gone as… smoothly as what I wrote. Perhaps, she said, I should write about something that was significantly harder to incorporate. Perhaps I should write about my experiences with CALI Author.

Why CALI Author

We’re all familiar with CALI Lessons – and we are all familiar with the fruitless search for the perfect CALI lesson: the one that will help students learn about publication patterns, online indexing, disposition tables – the things we want them to know, but don’t want to spend the time on in class. Don’t get me wrong – there is some great CALI stuff out there (some of it written by people I hope are reading this article) – but nothing that was perfect for my purposes. So several years ago, I came upon the brilliant idea to write my own. I could write it specific to my need, and could customize it to reflect internal library resources.

I downloaded an earlier iteration of CALI Author, with manual, thinking “I’m reasonably good at this stuff – I can make this work.” I proceeded to spend the next two hours feeling like I was back in the early 90s, trapped in HyperCard for Mac (for those of you who took that class at Simmons, you know what I mean). And so thinking “I’m a reasonably good teacher, I can get by with existing CALI lessons,” I deleted this earlier iteration of CALI Author, with manual.

A visit to the most recent CALI conference at Harvard convinced me to try again. Furthermore, CALI’s ever helpful Deb Quentel promised me that if I couldn’t make Author bend to my will, she would help. And so I downloaded it at the beginning of last summer, read the manual, and started making sample pages, sample questions, sample links, and saving sample lessons.

When you get the hang of it, CALI Author is clunky, but no big deal, really. In short order you can learn to create pages that lead to other pages, create scored multiple choice exams, even add and manipulate images. But then, for example, you discover that you’ve made a mistake on page three of a twelve page lesson, so you edit or delete page three, and KA-BLAMO, everything you’ve done vanishes, only to be found in an alphabetical list of pages at the end of the list, which you didn’t know existed until AFTER that panicked email to Deb Quentel (Thank You!). There’s a whole series I could write called “Squashing the CALI Author Bugs” – but in the interest of time I will just tell you that there are many traps for the unwary, and that two things proved invaluable: first, the folks at CALI – especially the ever helpful Deb Quentel – are geniuses of patience and service. And second, there is a great series of tutorial videos on YouTube – far easier than the manual.


So now that you’ve got your perfect CALI Author lesson, what do you do with it? The process of actually publishing a CALI lesson for public consumption takes some time – far too long if, like me, you’re writing the lesson for use in your class next week. So CALI Author gives you the ability to publish something privately and make it available by sending the lesson URL – “AutoPublish.” Furthermore, if your students log in to CALI before doing the lesson, you can track them through the AutoPublish link on your dashboard. Again, be forewarned, the process is not as straightforward as perhaps you’d expect; you have to publish the lesson, and upload each media file separately – and every time you go back into the lesson it appears as though you have to re-upload all of the media (you don’t). But the ability to have CALI host a lesson that you can run privately is great.

Editing existing lessons

One of the really great features you get with CALI Author is the ability to download existing CALI Lessons and adapt them for your purposes, to target your students. For example, let’s say there is an especially great lesson that covers print statutory research, and you want to use the same structure but incorporate online research. You can download the lesson and open it up in CALI Author, then edit the images, questions, and “book pages” to include an online component. You are not editing the CALI lesson as it appears at http://www.cali.org; when you are finished, you need to AutoPublish it and send it out to your students. But it’s a great shortcut to creating targeted lessons, while avoiding some of the complex and time-consuming outlining.


You eventually manage to create what you think is the perfect CALI lesson, comparing KeyCite, Shepard’s and BCite, showing the fallibility of citators, showing the conflicting treatment notes. You test it, you show it off, and it works nicely. But don’t get complacent… things can STILL go awry. After this steep learning curve, I finally released a lesson to do just this: point out some of the flaws and contradictory information found in citators. And wouldn’t you know it… in just one semester, one of the vendors changed their citation information! Oh well – back to the drawing board (ps: if anyone is interested, that lesson is here: http://www.cali.org/node/15905/)