Calculating public burden using OIRA data -- Part Two
An experiment in using open data to make government better
Published on: Feb 13, 2017

Yesterday, I published an article about using open government data to hunt for paper-based information requests by the government. Based on the data, it looked like there are still a lot of hours spent filling out paper-based forms. As I noted, though, I ran out of time to do careful analysis. So, today, let's explore deeper.

First, we'll plot a histogram to see how burden hours are distributed across requests. To do so, we'll load the results data with pandas and call its histogram plotting method.

In [1]:
# Set up the graphing environment. Because I'm using jupyter notebooks, first I need to tell
# it to show the graphs inline. I also use the `ggplot` style, because it's less hideous. 
%matplotlib inline
import matplotlib
matplotlib.style.use('ggplot')
In [2]:
import pandas as pd
data = pd.read_json('results.json')
data.burden.plot.hist()
Out[2]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fca4799d358>

Wait. Hold on right there. That's not what you'd expect to see. That looks like there's an outlier. Let's see what that might be... To do so, we look for the top ten burdens.

In [3]:
data[["burden", "title"]].sort_values('burden', ascending=False).head(10)
Out[3]:
     burden      title
720  2997500000  U. S. Business Income Tax Return
729  48731780    IRA Contribution Information
719  34115874    Form 1099-DIV--Dividends and Distributions
718  24951529    Return of Organization Exempt From Income Tax ...
248  20036012    2017-2018 Free Application for Federal Student...
509  13500230    National Fire Incident Reporting System (NFIRS...
717  10880812    Employer's Annual Tax Return for Agricultural ...
497  9902378     Arrival and Departure Record
449  7736084     Physician Quality Reporting System (PQRS) (CMS...
713  7041290     Customer Due Diligence Requirements for Financ...

Oh dear. Looks like we've got a pretty obvious mistake here: "U.S. Business Income Tax Return" can definitely be filed electronically. Same with the other things on the list. And that one outlier accounts for 3 billion of the 3.3 billion hours. Oof. So what gives?
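
To put a number on how much a single row dominates, one could compute its share of the total. This is a minimal sketch on a made-up toy frame, not the real `results.json`; the helper name `top_share` is hypothetical and only the `burden` and `title` column names come from the notebook above.

```python
import pandas as pd


def top_share(burdens: pd.Series, n: int = 1) -> float:
    """Return the fraction of the total contributed by the n largest values."""
    total = burdens.sum()
    if total == 0:
        return 0.0
    return float(burdens.nlargest(n).sum() / total)


# Toy data for illustration only; these are NOT the OIRA figures.
toy = pd.DataFrame({
    "title": ["Big form", "Medium form", "Small form", "Tiny form"],
    "burden": [900, 60, 30, 10],
})

print("{:.1%}".format(top_share(toy.burden)))  # prints 90.0%
```

On the real data, `top_share(data.burden)` should give the share of the single largest request, and a larger `n` shows how quickly the top rows add up.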

Well, it turns out that OIRA aggregates the burden data: if any one of the forms in an information collection request is not available electronically, the burden for all of the forms in that request gets lumped together. And there doesn't seem to be an obvious way to back out the other forms. So, unfortunately, that's not very useful.

Let's see what the total burden is if you remove the top 20% of information collection requests.

In [4]:
"{:,} hours".format(data.burden.sum() - data.sort_values('burden', ascending=False).head(220).burden.sum())
Out[4]:
'5,589,316 hours'

So, that feels a lot more sane, and a lot less exciting. There are only 5,589,316 hours of public burden for everything but the top 20% of information collection requests.

In the end, this is a great lesson in how a data schema can lead to incorrect conclusions.

Still, we have some good data near the bottom of the chart.

In [5]:
data.sort_values('burden').head(890).burden.plot.hist(bins=30)
Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fca467f8080>

In other words, there are a lot of information requests that account for a couple hundred hours of public burden. Not a surprising result, but perhaps even more useful in the end. This result means that there are about 200 forms in the middle that account for much of the remaining burden hours. Now, that seems like a good place to start.
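
One way to check a claim like this is to ask how many of the largest rows are needed to reach a given share of the total. The sketch below is only an illustration: the 80% target is an arbitrary assumption, the helper name `rows_to_reach` is hypothetical, and the toy numbers are made up.

```python
import pandas as pd


def rows_to_reach(burdens: pd.Series, target: float = 0.8) -> int:
    """Count how many of the largest rows are needed to reach `target` of the total.

    Assumes 0 < target <= 1 and a non-zero total.
    """
    ordered = burdens.sort_values(ascending=False)
    cumulative_share = ordered.cumsum() / ordered.sum()
    # Rows still below the target, plus the one row that crosses it.
    return int((cumulative_share < target).sum()) + 1


# Toy data for illustration only; these are NOT the OIRA figures.
toy = pd.Series([50, 25, 10, 8, 4, 3])

print(rows_to_reach(toy))  # prints 3
```

Applied to the real data with the aggregated outliers removed first, this would show whether a few hundred requests really do carry most of the remaining hours.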


Calculating public burden using OIRA data
An experiment in using open data to make government better
Published on: Feb 12, 2017

Recently, the new Administration issued an Executive Order aimed at Reducing Regulation and Controlling Regulatory Costs. As part of this effort, the Administration is supposed to offset regulatory costs.

So, that got me thinking. The Office of Information and Regulatory Affairs (OIRA) is charged with reviewing not only regulations, but also agencies' information-collection requests under the Paperwork Reduction Act. And as part of that review, OIRA and the agencies are supposed to track the public burden associated with each information collection.

As a thought experiment, I decided to see whether we could find some low-hanging fruit, namely paper-based information requests. And the results were interesting...

Read more

Should lawyers learn to code?
Yes, but we should not strive to be coders
Published on: Aug 14, 2016

For the past several years, I’ve been asked one question many times: “should lawyers learn to code?” Over those years, my view has been mostly consistent… “yes, lawyers should learn to code.” Probably unsurprising, given that I wrote Coding for Lawyers several years ago.

But, there’s always been a lingering bit of doubt. “Should all lawyers learn to code?” I would quietly ask myself. “Why?” I’d wonder. What specifically about coding did I think lawyers should learn?

Recently, the parlor game has been played out many times over amongst the #legaltech set, and folks are taking sides. So now, despite my previous reservations, here is my full-throated argument for why lawyers should learn to code.

Read more

When a micro-purchase doesn’t work out, we try to learn from it
Lessons from the trenches
Published on: Jul 09, 2016

This week, I co-authored a blog post for the 18F blog, entitled: When a micro-purchase doesn’t work out, we try to learn from it. It discussed a thing that is rarely discussed in government: failure. Here’s the opening graf:

Two months ago, the 18F acquisitions team ran a public micro-purchase auction to find a vendor to develop a small new feature for 18F’s cloud.gov, and for the first time after several successful micro-purchases for other products, the contracted vendor didn’t deliver the code on time. This was very interesting to us — we’re early in the life of the micro-purchase platform, and we believe that failure is a great way to learn. In the spirit of experimentation and sharing our lessons, here’s how we went about analyzing this, and here’s what we learned.

I encourage you to read it!


The Code of the District of Columbia is now available online
And I couldn't be happier
Published on: Jun 26, 2016

At long last, the Code of the District of Columbia has a permanent URL, within the dccouncil.us domain. This may not seem like a big deal, but this simple event is the culmination of years of effort, and I couldn’t possibly be happier.

Read more

DC's Voter Rolls are on the Internet
Is this 'Shocking' or is it 'same old same old'?
Published on: Jun 19, 2016

Earlier this week, the Washington Post ran an article with a headline destined to scare the crap out of DC’s voters: “D.C. makes it shockingly easy to snoop on your fellow voters.” But behind this hyperbole was a simple act; the DC Board of Elections posted the voter roll on the internet for public inspection. For those who might not know any better, this must have been quite a surprise. But for close observers of DC’s elections, this was, well… a nothingburger. Here’s why.

Read more

My theory about The Americans
Prepare for your mind to be blown
Published on: Jun 11, 2016

This week marked the finale of season 4 of The Americans. Like almost everyone else, I loved it. Already, I can barely wait until the next season starts. But as I prepared to watch the finale, I had a nagging thought. I just couldn’t let it go. And now, I am absolutely convinced that … [Warning, serious spoiler alert ahead!]

Read more

Storytelling and federal procurement
A lesson in how to explain complicated things
Published on: Jun 05, 2016

Last week, after chatting about challenges in federal procurement, a colleague suggested a book entitled the “Free Enterprise Patriot.” The opening statement of the book sets the stage:

Read more

On links to court filings
Journalists should link to court filings by default
Published on: Feb 20, 2016

Dear Media,

It’s time we had a talk. Because you’re hurting democracy.

Read more

6 months into 18F
An update
Published on: Sep 10, 2015

Several months ago, I described my intention to leave a happy job in the law and join the emerging government technology office known as 18F. Today, 6 months after starting at 18F, I want to give an update about how it’s going.

tl;dr It’s better than I could have ever imagined.

Read more

Mailmerge for Word Docs... in Python?
A neat trick for document automation
Published on: Jan 25, 2015

I’m going to say something nice about Microsoft Word: there’s a simple loophole to its impossibly ornate OOXML schema that allows for document templating. If you are trying to do some document automation for Word documents from Python (or other languages, I suppose), listen up.

Read more

Joining 18F
Why I'm leaving the greatest job in the world for another one
Published on: Jan 20, 2015

Today, I informed the members of the Council and my colleagues that I will be leaving the District government at the beginning of March and joining the growing ranks of public servants at 18F. One question I have heard from friends, colleagues, and family is “Why?” It’s a fair question. Those who know me know that I love my job: my staff is amazing, the work is fascinating, and I have been given extraordinary opportunities to serve the District of Columbia. So what gives?

Read more

Dogfooding with Jekyll
Using the new `data_source` configuration to serve mankind
Published on: Nov 29, 2014

Yesterday, I learned that Jekyll, the well-known, powerful static-site generator, has a little-known feature that is kind of a big deal for open-data sites hosted on GitHub.

tl;dr: Jekyll can let you consume and publish data files with the data_source configuration setting

Read more

Client Confidentiality on Trello?
Why two-factor authentication matters for lawyers who want to use the agile tool
Published on: Nov 29, 2014

This weekend I signed up for Trello. I started playing around with it, started liking it, and then I hit a snag. There’s no two-factor authentication (“2FA”).

As a practicing lawyer obligated to protect client confidentiality, this is a major barrier to entry. Fortunately, Trello has announced that 2FA is In Progress. This is a great development. Read on for why this matters, especially for lawyers like me.

Read more


In Praise of Commoditization
Open source takes a village
Published on: Nov 28, 2014

Earlier this week, Dr. Robert Read and Eric Mill penned an article for the 18F blog, entitled How to Use More Open Source in Your Next Federal IT Acquisition. It’s an important article for a variety of reasons. Most of it is a pitch-perfect explanation of why open-source tools are more important than ever, and why federal (and ahem local and state governments) should be looking for opportunities to use open-source tools.

Read more

Court Statistics: Part I
Why we may need to open a "floodgate" of judicial data
Published on: Nov 15, 2014

This weekend, I spent approximately 16 hours sitting in a windowless meeting room in a Chicago hotel discussing specific processes for arbitrating family-law disputes. This is how Uniform Law Commissioners like me get our kicks.

During the weekend, I learned of a recent article entitled Let’s Stop Spreading Rumors About Settlement and Litigation: A Comparative Study of Settlement and Litigation in Hawaii Courts, written by one of my fellow commissioners, the multi-talented Elizabeth Kent, and her co-author John Barkai.

Based on this article, I plan on writing three blog posts about judicial data and, hopefully, making the case for lawyers and the courts to think more critically about the need for good judicial data.

Read more

"Dumb" Government Data
Doing small things well
Published on: Nov 12, 2014

I recently was named a member of the Mayor’s Open Government Advisory Group. Among the things that the Group will be tasked with is “[e]stablish[ing] specific criteria for agency identification of additional datasets.”

Read more

An RSS feed for LIMS
Syndicating data for better legislation tracking
Published on: Nov 07, 2014

Recently, I built an RSS feed for LIMS.

The URL for the RSS feed is here: https://esq.io/lims-rss.xml, and the source code is available here: https://gist.github.com/vzvenyach/757fa97fd99c3a14e798. This post explains my reasons for doing it.

Read more

Hello World
Just what the world needs: another blog...
Published on: Nov 06, 2014

I’ve decided to start a blog. I’ll explain more later.