Another post I’m putting up mostly so I have something to refer to in a week when I forget how I did this.
I’ve been wanting to beef up my AWS skills for a long time. The main thing that’s slowed me down is that we cannot store government data on/in Amazon’s AWS ecosystem. This isn’t really a hard roadblock, it’s just that a lot of my blog content is generated from little programming or data hurdles I encounter at work.
Been a long time….if that intro didn’t immediately make you think of Christopher Walken then I’m begging you to watch this:
So I recently came across this cool little paper/project/effort:
So I had some fun messing around with a new R package for doing A/B testing in a Bayesian framework. Here’s my report:
Yes, I’m still talking about pipelines…but in my defense I think we are starting to get to the really cool stuff.
Ok, so we’ve been talking about pipelines here on the blog formerly known as The Samuelson Condition.
To follow along with this post hop over to GitHub and clone my abalone-age repo.
It will come as no surprise to people here that I think the concept of determining the impact of some event by looking at whether a line went up or down around the time of that event is farcical. I also realize that talking about the ‘correlation isn’t causation’ cliche to a bunch of statistically literate people is totally unnecessary. Yet, I feel compelled to write about this phenomenon because it’s so pervasive in my life:
I found this post on the interwebs and thought it was pretty cool. I mean, I’m not enamored with the whole, “don’t worry about understanding what it’s doing, just run the code and get a feel for how to do it” vibe…as a practicing empiricist I’m pretty well aware of the fact that anyone can run the same R/Python/whatever code that I use to run a Neural Network, Support Vector Machine, Classification-and-Regression Tree, insert hip new ensemble method here. The thing that makes me worth anything at all - if I am indeed worth anything - isn’t that I know how to tell R to train a Neural Network, it’s that I know what the code is doing when I give it that command. I have a decent (a little better than most, a lot worse than a few) grasp of the technical detail and nuance (read: the math) of the popular machine learning and applied statistical algorithms used to do prediction.
Full disclosure: I’m not 100% confident that my assessment of Linear Discriminant Analysis relative to Logistic Regression is totally accurate. I’ve been thinking pretty hard about this for the past couple days so I’m reasonably confident that I’ve not said anything ridiculous here…but if you strongly disagree with my characterization of LDA estimated coefficients relative to Logit coefficient estimates I’d love for you to drop some knowledge on me.
This is a little bit of rehash of an earlier post on President Cheeto Jesus’ economic agenda. The core message is basically the same, but the supporting evidence has been vastly simplified.
I built my first R Shiny app. It’s laughably simple but, in this case, I think the simplicity will make for a good first blog post on R Shiny.
In my last post I tried to set the stage for a discussion (one I’m pretty much just having with myself at the moment) on regional housing policy in Santa Cruz County. For whatever it’s worth, I also blogged about this a while back on The Samuelson Condition Blog. Today’s discussion on Accessory Dwelling Units will be decidedly more nuanced than what I wrote a year ago and, as any good Bayesian would when confronted with additional data points, my opposition to subsidized ADUs has softened somewhat.
“A problem well posed is half solved.” – John Dewey.
This is going to be a pretty remedial post about using Facebook’s Graph API with Python. At this point I’ve only figured out how to do some pretty basic shit….enough to share but probably not enough to be really cool. I’m planning on posting a follow-up this week where I’ll focus on shoving the dictionary and list output you’ll see here into pandas dataframes.
I’m only the 3 billionth blogger to write about this but for some reason, even with the interwebs saturated with Python-Oracle connection examples, this still took me pretty much the whole day to figure out.
Super short post here because the day is pretty much over…but I just discovered a Python module for grabbing Census Data and I had to share.
Economists tend to think a little differently about things. We generally process things around us through the 3-tiered filter of trade-offs, opportunity costs, and credible counterfactual scenarios. I contend that this usually results in interesting insights into social-behavioral outcomes that would not otherwise be gleaned….but I’ll freely admit that sometimes (usually when applied poorly) it produces lines of thought that are just silly.
I had a pretty cool little quantitative micro-targeting of policy issues model I wanted to show you guys today. But I had to get this Census Bureau API-R relationship smoothed out for a work thing and I’m kinda thinking it will have more universal appeal. So I’m posting it first.
I have been looking at other people’s awesome R-powered geospatial analysis for what feels like years and, until now, every time I’ve sat down to try and do some spatial analytics in R I’ve been stymied by weird package load errors. I’ve been poking around this problem rather casually for several months and last night I think I finally made some tangible progress. I’m pretty stoked about this so I hope you will be too.
I spent a non-trivial amount of time this week trying to pick apart the code in R’s rgp package and Matlab’s GP tips to see if I could modify it to do coupled dynamical systems…I have no notable progress to report on this front.
I’ve been casually reading papers on Genetic Programming and Symbolic Regression for a little over a week now so I figured it was high time I stopped thinking about GP and started trying to DO a GP.
I know I said last week’s post would be my final words on Twitter Mining/Sentiment Analysis/etc. for a while. I guess I lied. I didn’t feel great about the black box-y application of text classification…so I decided to add a little ‘under the hood’ post on Naive Bayes for text classification/sentiment analysis.
After re-reading my last two posts on this topic, I felt like they were a little unfocused. I’m going to take one more shot at putting a simple, realistic example out there.
My first post on this topic was pretty rushed.
I didn’t have a ton of time today so this will be, I’m sure, mercifully short. Wanted to revisit something I tried to do a while back in R: scraping data from Twitter. R made that pretty easy with the twitteR package. This afternoon I tried to essentially repeat this post using Python.
Andrew Gelman writes a lot about p-hacking and ‘researcher degrees of freedom.’ I like his writing, his academic work, and his blog. You should check it out. He’s a great statistician, a great political scientist, and a true scholar.
The empirical section of this post is still a work-in-progress but the data and R scripts are available in my crime repository on GitHub.
I read an interesting paper recently. I wrote a half-ass review of it here. Below I provide some notes and interesting nuggets I took from my reading of “Model-free quantification of time-series predictability” by Garland, James, and Bradley, published in Physical Review E, v. 90:
In this earlier post I tried to provide some helpful nuggets regarding the use of R’s KFAS package for modeling monthly seasonal data using a state space framework. I was going to add a little simulation experiment I cooked up just to further my understanding of how KFAS and state space models behave…but that post got a little long so I decided to include the simulation as a separate, companion post.
A little while back I wrote a few posts on time-series methods for seasonal data.
While I mentioned state space models as an option for modeling seasonal data, I didn’t really provide much meat there. In this post I’m going to give a few examples applying state space models (using the ‘KFAS’ package in R) to seasonal data.
Here, I’ve added a bit more substance to an earlier post on quasi-treatment-control methods for social science research. More specifically, I’ve added some content on Propensity Score Matching to the discussion. Enjoy.
For my first post on the new blog I’m going to provide yet another online look-up for how to get started hosting a Jekyll blog on GitHub Pages.