How to define a good causal research project?

Sabin Subedi

I'm sorry I haven't been here for a long time. It's that time of the month when there are lots of admin tasks and I have to submit my first-year report. So, I was busy doing those things, but I should be more consistent now in publishing the blog posts.


So, you all know I met Scott at the SGPE summer school and learned from him in person for the first time. I found him interesting in his 8-hour-long online courses, so there was no way I wouldn’t find him interesting in person. On the first day, he covered a lot of the fundamentals of causal inference and told many stories about how the field was established. I have noted them down, and I will share them with you in the future.
One of the things from that class that I think you might be interested in is Scott's four steps that define most causal research projects. Anyone can take these steps today, be it a master's student, someone looking to pursue a PhD, or even a well-established researcher. In my opinion, these are the building blocks of how you should proceed even before you start your project. If there are terms in these steps that you don’t know or understand, learning them should be your priority. Feel free to comment if you want to understand anything in particular, but I am certain it will become clear through this series of posts.

My previous posts cover most of this process as well, so please go through them if you want to understand it better. Here are the four steps that define most causal research projects; let’s go through them individually.

  1. Define our target parameters, usually an average treatment effect of some kind

  2. Who chooses the treatment? What did they know? What assumption does that imply?

  3. Use estimators that are unbiased in your data under the assumptions from step 2

  4. Then we estimate standard errors to quantify the uncertainty

1. Define the target parameter

Everyone has heard at least once that it is important to ask good questions to get good answers. Anyone who has written a thesis has proposed research questions to a supervisor. But if you have taken my class or read my blog, you know I have said that some questions can never be answered. Why? Because they require data that you don’t have and can never access. What are these, you may ask? These are counterfactuals. Read my previous post if you want to understand them better. As I said at the end of my last blog post, there is some average treatment effect that we care about. This is the answer that we want.
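
To make "an average treatment effect of some kind" concrete, here is a short sketch in standard potential-outcomes notation. The symbols are my own choice for illustration and do not come from the class: D_i is the treatment indicator, and Y_i(1) and Y_i(0) are the outcomes unit i would have with and without treatment. Only one of the two is ever observed for a given unit; the other is the counterfactual.

```latex
\begin{align*}
Y_i &= D_i\,Y_i(1) + (1 - D_i)\,Y_i(0)
  && \text{(observed outcome)} \\
\delta_i &= Y_i(1) - Y_i(0)
  && \text{(individual effect, never observed)} \\
\text{ATE} &= E\big[\,Y_i(1) - Y_i(0)\,\big]
  && \text{(average over everyone)} \\
\text{ATT} &= E\big[\,Y_i(1) - Y_i(0) \mid D_i = 1\,\big]
  && \text{(average over the treated)}
\end{align*}
```

Choosing the target parameter means deciding which of these averages (or some other one) your question is really about.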

2. Who chooses the treatment?

I have already explained this in depth in my previous post here. But here is the Too Long; Didn’t Read (TL;DR) version of it:

  1. Context is King in all types of questions

  2. Get Real-World Understanding (Get out of your data)

  3. Decision Makers Matter (How is treatment decided? Who decides who gets the treatment? How do they decide how someone gets treated?)

  4. Keep Talking (Conversations with decision makers, stakeholders, and your friends)

Flexibility is key, and persistence to understand true mechanisms pays off. There is so much you can do if you learn how the variables of your interest interact. Knowing this mechanism will allow you to make a clear identification. Had he not kept talking, he would not have known that there was randomization within the system. And as researchers, our job is to uncover these mechanisms through persistent questioning, reasoning, and keen observations.

So the gist is to know the real-life ins and outs of the question you want to answer, alongside knowing the literature.

3. Choosing the estimator

Once you know the parameter you are estimating and the context of the treatment assignment mechanism, you will know what type of estimator to use. Can you estimate this parameter by controlling for some variables? If yes, use a simple OLS regression. Is there an exogenous variable that randomly shifts treatment assignment? Then use the instrumental variable method. I hope you get the gist. The idea is not to pick the method beforehand; the method and estimator should follow from your understanding of the treatment assignment mechanism.
The paper "LaLonde (1986) after Nearly Four Decades: Lessons Learned" by Guido Imbens and Yiqing Xu can be taken as another way to define better research, and its points are closely related to the steps here. Their first step is a grasp of “design”, which is exactly what we mean in step 2.
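
Here is a small simulated sketch of the "control for some variables, then use OLS" case. Everything in it is hypothetical and made up for illustration: a decision maker assigns treatment based on a variable x that we also observe, the true effect is fixed at 1.0, and the outcome depends linearly on x. It uses only the Python standard library, and the coefficient on treatment is obtained by partialling x out of both the outcome and the treatment, which gives the same number as a regression of y on d and x.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ols_fit(x, y):
    """Return (intercept, slope) from a simple regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def residualize(v, x):
    """Residuals from regressing v on x (with an intercept)."""
    a, b = ols_fit(x, v)
    return [vi - a - b * xi for vi, xi in zip(v, x)]


random.seed(0)
n = 5000
true_effect = 1.0  # hypothetical constant treatment effect

# x is what the decision maker knows (and what we observe).
x = [random.gauss(0, 1) for _ in range(n)]
# Units with higher x are more likely to be treated.
d = [1.0 if xi + random.gauss(0, 1) > 0 else 0.0 for xi in x]
# The outcome depends on both the treatment and x.
y = [true_effect * di + 2.0 * xi + random.gauss(0, 1) for di, xi in zip(d, x)]

# Naive comparison of treated and untreated means, ignoring x.
treated = [yi for yi, di in zip(y, d) if di == 1.0]
control = [yi for yi, di in zip(y, d) if di == 0.0]
naive = mean(treated) - mean(control)

# OLS coefficient on d controlling for x: partial x out of y and d,
# then regress the residuals on each other.
_, adjusted = ols_fit(residualize(d, x), residualize(y, x))

print(f"true effect:            {true_effect:.2f}")
print(f"naive difference:       {naive:.2f}")
print(f"OLS controlling for x:  {adjusted:.2f}")
```

The naive comparison is far from the true effect because treated units have higher x. Controlling for x only works here because the simulation was built so that x is the only thing driving both treatment and outcome. In a real project, that is exactly the claim step 2 has to justify.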

4. Standard errors to quantify uncertainty

The last of these steps is standard errors. You might think this is something trivial, but getting the right standard errors is of utmost importance. If you haven’t read this paper, I suggest you do. It clearly shows how the clustering of standard errors makes a big difference in validating the results. So, always use the right kind of standard errors. If possible, use validated packages that implement the modern methods. Here is a link to the list of available DID packages. Do not run the final regression manually for these methods, because manual methods don’t give you correct standard errors. For example, when running two-stage least squares (2SLS) for IV, you get the same point estimate manually and with a package, but the standard errors are very different from each other.
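
The 2SLS point can be seen in a small simulation. This is only a sketch under assumptions I made up for illustration: one instrument z, one endogenous regressor x, an unobserved confounder c, a true effect of 2.0, and homoskedastic errors. It uses only the Python standard library. The manual second stage computes its residuals from the fitted values of x, while the correct 2SLS formula computes them from the actual x.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def ols_fit(x, y):
    """Return (intercept, slope) from a simple regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


random.seed(1)
n = 2000
true_effect = 2.0  # hypothetical effect of x on y

z = [random.gauss(0, 1) for _ in range(n)]  # instrument
c = [random.gauss(0, 1) for _ in range(n)]  # unobserved confounder
x = [0.5 * zi + ci + random.gauss(0, 1) for zi, ci in zip(z, c)]
y = [true_effect * xi + ci + random.gauss(0, 1) for xi, ci in zip(x, c)]

# First stage: regress x on z and keep the fitted values.
a1, b1 = ols_fit(z, x)
x_hat = [a1 + b1 * zi for zi in z]

# Second stage: regress y on the fitted values.
# The slope is the 2SLS point estimate either way.
a2, beta = ols_fit(x_hat, y)

mean_x_hat = mean(x_hat)
ss_x_hat = sum((xh - mean_x_hat) ** 2 for xh in x_hat)

# Manual (wrong) standard error: residuals use the fitted x_hat,
# which is what a hand-run second-stage regression reports.
naive_resid = [yi - a2 - beta * xh for yi, xh in zip(y, x_hat)]
naive_se = math.sqrt(sum(r ** 2 for r in naive_resid) / (n - 2) / ss_x_hat)

# Correct 2SLS standard error (homoskedastic case): residuals use
# the actual x with the same coefficients.
correct_resid = [yi - a2 - beta * xi for yi, xi in zip(y, x)]
correct_se = math.sqrt(sum(r ** 2 for r in correct_resid) / (n - 2) / ss_x_hat)

print(f"2SLS estimate:      {beta:.3f}")
print(f"manual second-stage SE: {naive_se:.3f}")
print(f"correct 2SLS SE:        {correct_se:.3f}")
```

The point estimate is identical in both cases; only the residuals behind the standard error change. In this particular simulation the manual standard error comes out too large, but in general it can be wrong in either direction. A validated IV package handles this correction for you, along with robust or clustered variants.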


Conclusion

Understanding your data, context, and treatment assignment mechanism is extremely important. This helps you understand the limitations of what you can do, which parameters you can identify, and how you can estimate those parameters. So, when you start your next project, remember to go through these four steps.

In the next post, we will look at Unconfoundedness.


This post was originally published on Substack. If you enjoyed this content, consider subscribing to my newsletter for regular updates.
