Paragon Media Strategies’ Larry Johnson delves into the world of PPM accountability in this week’s Programming To Win column. Johnson looks at the importance of individual panelists in a PPM world of smaller sample sizes, as well as other important statistical aspects of Arbitron’s PPM measurement.


By Larry Johnson, Senior Research Consultant, Paragon Media Strategies

It’s baffling—if not downright disappointing—to see how little value the business community places on reliable ratings results.  As the Portable People Meter (PPM) was being rolled out, I expressed concerns that the samples were too small.  Now we’re seeing the effects.  The small sample makes analyzing audiences for strategic targeting and programming purposes frustrating.  Yet, even some of the group owners who complain that they can’t reliably analyze their audience from PPM sign long-term PPM contracts, and I haven’t heard rumblings from media planners.
The not-so-great compromise over the PPM sample size seems to boil down to this: real-time measurement is a necessity, even to the point of seriously compromising the reliability of the sample.  Having already incurred cost increases for PPM, radio stations are loath to pay more for reliable samples that truly track their stations’ performance.
The root problem stems from Arbitron’s original business plan to partner with Nielsen in the early 1990s.  Between the meters from the radio (Arbitron) and television (Nielsen) companies, Arbitron would have had a large enough sample to be statistically reliable.  When Nielsen pulled out of the arrangement, Arbitron was left dangling in a world demanding real-time measurement but without the resources to honor the sampling envisioned in the original Nielsen/Arbitron coalition.
The result is that radio measurement has become a bulk exercise: Cume estimates look pretty good, but trying to parse the Quarter Hour information, given PPM sample sizes, is a slippery exercise.  Arbitron has demonstrated PPM’s ability to quickly reflect changes in the market, such as a new morning show, and to report play-by-play sports much more accurately.  We also have a more accurate portrait of how people actually use radio: higher Cume, lower Time Spent Listening.

However, with the low Quarter Hour sample sizes, constructing and executing strategies based on PPM ratings becomes a mercurial exercise.  Here are a couple of examples of frustrating observations as we dig into clients’ PPM ratings:

  • Apparently, one PPM panelist completely changed the audience composition of a station: When one heavy-listening Female 35-44 panelist dropped out, so did the Women 35-44 strength of the audience for a commercial station.  When a Male 45-54 panelist showed strong support, voilà, it became a Male 45-54 station.  (A toy sketch of this effect follows the list.)
  • A public station went from a poor showing in the share rankers to respectability in one month with no discernible programming changes.
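
To make the first observation concrete, here is a minimal sketch, with entirely hypothetical panelists and quarter-hour totals, of how one heavy listener can dominate a small panel’s demo composition:

```python
# Toy illustration: each tuple is (panelist's demo cell, weekly quarter-hours
# of listening credited to the station). All numbers are hypothetical.
panel = [
    ("F35-44", 310),  # one heavy-listening panelist
    ("F35-44", 40),
    ("M45-54", 90),
    ("M45-54", 85),
    ("M25-34", 70),
]

def composition(panel):
    """Percent of the station's total quarter-hours contributed by each demo cell."""
    total = sum(qh for _, qh in panel)
    cells = {}
    for demo, qh in panel:
        cells[demo] = cells.get(demo, 0) + qh
    return {demo: round(100 * qh / total) for demo, qh in cells.items()}

print(composition(panel))      # {'F35-44': 59, 'M45-54': 29, 'M25-34': 12}
print(composition(panel[1:]))  # she drops out: {'F35-44': 14, 'M45-54': 61, 'M25-34': 25}
```

Losing one panelist flips the station from a Women 35-44 story to a Men 45-54 story, exactly the kind of swing described above.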

And I’m talking about the PPM sample size for larger markets.  Woe to those in smaller markets with smaller PPM sample sizes.  (Both markets above were top 20 markets.)  Arbitron has increased the PPM sample by 10% in the commercial markets used as examples above.  Although a step in the right direction, that’s like putting a Band-Aid on a gaping wound.  This system makes analyzing the performance of stations very difficult.  Yet, PPM is currency.  We’ll continue to stare into this hall of mirrors.

Reliability in Arbitron Ratings

Let’s take a look at the fluctuations in the PPM ratings using Arbitron’s Ratings Reliability Estimator (https://rre.arbitron.com).  This straightforward calculator allows us to determine actual margins of error for a rating.  The margin of error for Average Quarter Hour (AQH) is enormous.

Let’s compare the top-rated Adult Contemporary (AC) formats in two markets.  At the 95% confidence interval (i.e., 95 times out of 100 the results will fall within this range), we see:

AQH Persons 12+, 6 a.m. to Midnight, Monday through Sunday:

With top-ranked stations, the variance is anywhere from 27% to 44%.  One has to be concerned with the variance even when using this broadest of measurements: Persons 12+ 6 a.m. to Midnight Monday through Sunday. 
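
For intuition about why the relative error is so large, here is a back-of-the-envelope sketch.  It uses the plain binomial standard error, sqrt(p(1-p)/n), with hypothetical figures (a 1.5 AQH rating and a 2,000-meter in-tab panel); Arbitron’s Ratings Reliability Estimator applies its own effective-sample-base adjustments, so treat the output as illustrative, not as a reproduction of published margins:

```python
import math

def aqh_relative_moe(aqh_rating_pct, in_tab, z=1.96):
    """Relative margin of error (as % of the rating) at ~95% confidence,
    using the simple binomial standard error sqrt(p*(1-p)/n)."""
    p = aqh_rating_pct / 100.0
    se = math.sqrt(p * (1 - p) / in_tab)  # standard error of the proportion
    return 100.0 * (z * se) / p           # margin relative to the rating itself

# Hypothetical: a 1.5 AQH rating measured on a 2,000-meter in-tab panel.
print(f"relative margin of error: {aqh_relative_moe(1.5, 2000):.0f}%")  # ~36%
```

Even with generous assumptions, a low single-digit rating measured on a panel of a couple thousand meters carries a relative error in the tens of percent, the same order of magnitude the Estimator reports.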

Let’s take middays with a wide-ranging Female demographic:

AQH Women 25-54, 10 a.m.-3 p.m., Monday through Friday:

Now we’re at fluctuations of between a third and two-thirds for the top AC stations in these two markets.

It gets worse if your station isn’t in the top tier.  Using the broadest of time frames, AQH Persons 12+, 6 a.m. to Midnight, Monday through Sunday, a well-performing Triple A station in a Top 20 market fluctuates from over a third to over a half:

Argh!  Of course, as you start to look at more distinct age/gender cells within dayparts without as much listening (Saturdays 3-7 p.m.; or, heaven forbid, Males 18-34, Monday-Friday 7 p.m. to Midnight), you’re approaching the world of random chance.  Variance in ethnic sampling becomes comical.  Again, one has to wonder what media planners and time buyers are thinking when they use these roller-coaster ratings to place time buys for specific audiences during designated dayparts.

The variance contributes mightily to fluctuating PPM audience composition figures.  One should look at ratings results over time, time, and time again.
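
One simple way to follow that advice is to trend the monthly books with a rolling average, so a single panelist joining or leaving doesn’t whipsaw the read.  A minimal sketch, with hypothetical share figures:

```python
# Hypothetical monthly AQH shares for one station across six books.
monthly_aqh_share = [3.1, 4.6, 3.0, 3.3, 4.8, 3.2]

def rolling_average(values, window=3):
    """Trailing rolling average; one smoothed value per complete window."""
    return [sum(values[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(values))]

print([round(v, 2) for v in rolling_average(monthly_aqh_share)])
# [3.57, 3.63, 3.7, 3.77]
```

The raw monthlies swing by nearly two shares; the three-book average barely moves.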

The Trap of Fewer Respondents

Arbitron’s trump argument for its smaller sample size is that a PPM panelist yields roughly 3 to 4 times the data retrieved from a diarykeeper, because each respondent reports a month’s worth of observations rather than a week’s (a.k.a. “Total Person Days Measured”).  They also cite that format shares from market to market are reliable.  This rebuttal doesn’t address the wild ranker swings some public stations experience, nor the roller coaster of audience compositions cited above for a commercial station.  At least public stations can rely on their members.
Before PPM, Arbitron warned us not to extrapolate from monthly reports.  Now monthlies are currency.  Given the smaller size of PPM panels, the selection of individual panelists becomes extremely important: the smaller number of panelists amplifies the effect of each one, and having mostly the same panelists month to month may suppress truer, larger changes in a station’s performance.  Panels change roughly 7% of their members monthly, and Arbitron is practically running a concierge service to help PPM panelists maintain their responsibilities.
Respondent abuse occurs in the diary method when one member of a household fills out diaries for everyone in the household.  A recent Broadcast Architecture (BA) study reports on PPM abuse as well; e.g., tying the meter to a ceiling fan so that its motion detector would register use.  Some gaming of the system may occur in any sample despite safeguards, but the smaller PPM sample amplifies the results of cheaters.  The BA study suggests that the PPM sample is skewed toward contest players and financially strapped respondents primarily motivated by money.
The physical reality of carrying a meter is a challenge and has its limitations: apparently, people don’t generally take their PPM meter into the bathroom when they shower and/or shave.  It’s been postulated that women balk at wearing a meter on their attire.  One woman in the BA study kept her meter inside her bra.
Arbitron is a well-run company.  You get the sample you (can) pay for.  As a person who delves into the ratings in order to make observations and construct strategies, I am frustrated by the not-so-great compromise between Arbitron, its station clients, and the media planner/time buyer community in accepting Quarter Hour Share results that are all over the place.  The companies with ratings analysis software dutifully crank out PPM analyses.  Looking at the results puts a floodlight on the inadequate PPM sample.  Are programmers and researchers the only ones noticing?
Just remember the huge variance in the ratings the next time you try to fine-tune your station, let alone fire the Program Director, based on the latest ratings.

A couple of geek notes:

The margin of error in radio ratings is not symmetrical; e.g., we’re used to looking at a margin of error of +/-5%, not -3% to +7%.  Because a rating is a small proportion bounded at zero, its sampling distribution is skewed, so the interval can extend farther on one side of the estimate than the other.

Arbitron’s Ratings Reliability Estimator defaults to a 90% confidence interval.  I’ve used a 95% confidence interval, which is the convention in most social science research.
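
For the curious, moving from 90% to 95% confidence widens the interval by a predictable factor: the two-sided z multiplier grows from about 1.645 to 1.96, roughly 19% wider.  A quick check in plain Python, no Arbitron data involved:

```python
from statistics import NormalDist

# Two-sided z multipliers: 90% leaves 5% in each tail, 95% leaves 2.5%.
z90 = NormalDist().inv_cdf(0.95)   # ~1.645
z95 = NormalDist().inv_cdf(0.975)  # ~1.960
print(f"z(90%) = {z90:.3f}, z(95%) = {z95:.3f}, ratio = {z95 / z90:.2f}")
```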

Larry Johnson is President/North American Radio for Paragon Media Strategies. Reach him at 831-655-5036 or via e-mail at ljohnson@paragonmediastrategies.com.

This Programming To Win column originally ran April 8, 2011.