DEB Numbers: Success Rates by Merit Review Recommendation

We recently received a comment from a panelist (paraphrasing): how likely are good proposals to get funded? We’ve previously discussed differences between the funding rates we report directly to you from panels and the NSF-wide success rate numbers reported on our website.  But the commenter was interested in an even more nuanced question: to what extent do award decisions follow the outcomes of merit review? This is a great topic for a post and, thanks to our Committee of Visitors review last year, we already have the relevant data compiled. (So this is really the perfect data-rich but quick post for panel season.)

To address this question, we need to first define what a “good proposal” is.

In our two-stage annual cycle, each project must pass through review at least twice before being awarded: once as a preliminary proposal, and once as an invited full proposal.

At each stage, review progresses in three steps:

  • Three individual panelists independently read, review, and score each proposal prior to the panel. A single DEB panelist is responsible for reviewing an assigned subset of all proposals at the panel. This is the same for preliminary proposals and full proposals. Full proposals also receive several non-panelist “ad hoc” reviews prior to the panel.
  • The proposal is brought to panel where the panelists discuss the proposal and individual reviews in relation to each other and in the context of the rest of the proposals in the panel to reach a consensus recommendation. This is the same for preliminary proposals and full proposals.
  • The Program Officers managing the program take into consideration the reviews, the recommendations of the panel(s) that assessed the proposal, and their portfolio management responsibilities to arrive at a final recommendation. This is the same for preliminary proposals and full proposals.

In this case, since we are discussing the Program’s actions after peer review, we are defining as “good” anything that received a positive consensus panel recommendation. Initially, the label of “good” will be applied by the preliminary proposal panel. Then, at the full proposal panel it will receive a second label, which may or may not also be “good”. A “good” recommendation for either preliminary or full proposals includes any proposal not placed into the lowest (explicitly negative) rating category. The lowest category usually has the word “not” in it, as in “Do Not Invite” or “Not Fundable”. All other categories are considered “good” recommendations, whether there is a single positive category (e.g., “Invite”) or several ordinal options conveying varying degrees of enthusiasm (e.g., “high priority”, “medium priority”, “low priority”).
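Under this working definition, "good" is anything outside the explicitly negative lowest category. As a rough illustration (our own sketch, not an NSF tool, using hypothetical rating labels), the rule can be expressed as:

```python
def is_good(panel_rating: str) -> bool:
    """A panel recommendation counts as "good" unless it falls in the
    explicitly negative lowest category, which usually contains "not"
    (e.g., "Do Not Invite", "Not Fundable", "Not Competitive")."""
    return "not" not in panel_rating.lower()

# Hypothetical labels drawn from different panel vocabularies:
assert is_good("Invite")
assert is_good("low priority")      # still "good" under this definition
assert not is_good("Do Not Invite")
assert not is_good("Not Fundable")
```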

To enable this analysis, we traced the individual review scores, panel review recommendations, and outcomes for proposals from the first three years of the DEB preliminary proposal system (i.e., starting with preliminary proposals from January 2012 through full proposals from August 2014).

As we’ve reported previously, preliminary proposal invitation rates are between 20% and 30%, and between 20% and 30% of invited full proposals are funded, leading to end-to-end funding rates around 7%. But, as our commenter noted, that obscures a lot of information and your individual mileage will vary. So…

How likely are “good” proposals to get funded?

In the table below, you can see that the overall invitation rate for preliminary proposals is 23%, but the rate looks very different depending on how well a proposal performed in the panel[i].

Preliminary Proposal Outcomes by Panel Recommendation

| Pre-Proposal Panel Rating | % of Proposals Receiving Rating | Not Invited | Invited | Invite Rate |
| --- | --- | --- | --- | --- |
| High (Good) | 19% | 22 | 879 | 98% |
| Low (Good) | 5% | 100 | 141 | 59% |
| Do Not Invite | 76% | 3597 | 74 | 2% |
| Total | 100% | 3719 | 1094 | 23% |
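The invite rates in the table follow directly from the counts; a quick arithmetic check (counts taken from the table above):

```python
# (Not Invited, Invited) counts from the preliminary proposal table above.
counts = {
    "High (Good)":   (22, 879),
    "Low (Good)":    (100, 141),
    "Do Not Invite": (3597, 74),
}
for rating, (not_invited, invited) in counts.items():
    rate = invited / (not_invited + invited)
    print(f"{rating}: {rate:.0%}")
# High 879/901 rounds to 98%, Low 141/241 to 59%, Do Not Invite 74/3671 to 2%.
```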

This stage is a major winnowing of projects. On the one hand, we tend to invite most of what the panel recommends. On the other hand, a preliminary proposal that isn’t well rated (and so falls outside our working definition of “good”) is highly unlikely to reach the full proposal stage. The Invite rate for proposals the panels recommended as Do Not Invite is a low 2%. This is a measure of the extent to which program officers disagree with panelists and choose to take a chance on a particular idea or PI, based on their own knowledge of submission history and portfolio balance issues.

From these invitations, the programs receive full proposals. After review, programs award approximately 25% of the full proposals, but again the outcome is strongly influenced by the panel ratings.

Full Proposal Outcomes by Panel Recommendation

| Full Proposal Panel Rating | % of Proposals Receiving Rating | Declined | Awarded | Funding Rate |
| --- | --- | --- | --- | --- |
| High (Good) | 17% | 30 | 122 | 80% |
| Medium (Good) | 23% | 115 | 98 | 46% |
| Low (Good) | 21% | 165 | 21 | 11% |
| Not Competitive | 39% | 349 | 7 | 2% |
| Total | 100% | 659 | 248 | 27% |

Program Officers are faced with a greater responsibility for decision-making at the full proposal stage. Whereas preliminary proposal panels gave the nod (High or Low positive recommendations) to only ~23% of submissions, full proposal panels put 551 of 907 proposals into “fundable” categories (Low, Medium, or High). Since this is more than twice as many as the programs could actually fund,[ii] the work of interpreting individual reviews, panel summaries, and accounting for portfolio balance plays a greater role in making the final cut. Also note that these are the cumulative results of three years of decision-making by four independently managed program clusters, so “divide by 12” to get a sense of how common any result is for a specific program per year.

Ultimately, the full proposal panel rating is the major influence on an individual proposal’s likelihood of funding and the hierarchy of “fundable” bins guides these decisions:

Success rates of DEB full proposals when categorized by preliminary proposal and full proposal panel recommendations.


While funding decisions mostly ignore the preliminary proposal ratings, readers may notice an apparent “bonus” effect in the funding rate for “Do Not Invite” preliminary proposals that wind up in fundable full proposal categories. For example, of 15 preliminary proposals that were rated “Do Not Invite” but were invited and received a “Medium” rating at the full proposal stage, 10 (67%) were funded, compared to 45% and 42% funding for Medium-rated full proposals that preliminary proposal panelists rated as High or Low priority, respectively. However, this is an artifact of small sample sizes. Overall, the numbers of Awarded and Declined full proposals are not associated with the preliminary proposal recommendation (Chi-Square = 2.90, p = 0.235).
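The independence test reported above can be reproduced with a standard chi-square test of a 3×2 contingency table (preliminary rating × Awarded/Declined). The post does not print the full table, so the cell counts below are hypothetical placeholders (chosen only to match the known margins: 248 awarded, 659 declined, and 728/117/62 full proposals per preliminary rating); a stdlib-only sketch:

```python
import math

# HYPOTHETICAL (awarded, declined) splits by preliminary proposal rating;
# only the row and column totals come from the post's tables.
observed = {
    "High":          (210, 518),
    "Low":           (25, 92),
    "Do Not Invite": (13, 49),
}

row_totals = {k: a + d for k, (a, d) in observed.items()}
col_totals = [sum(a for a, _ in observed.values()),
              sum(d for _, d in observed.values())]
grand = sum(row_totals.values())

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = 0.0
for k, cells in observed.items():
    for j, obs in enumerate(cells):
        exp = row_totals[k] * col_totals[j] / grand
        chi2 += (obs - exp) ** 2 / exp

# With (3 rows - 1) * (2 cols - 1) = 2 degrees of freedom, the chi-square
# survival function has the closed form exp(-x/2).
p = math.exp(-chi2 / 2)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

As a consistency check on the post’s reported values: exp(-2.90/2) ≈ 0.235, matching the quoted p-value for 2 degrees of freedom.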

 

Does Preliminary Proposal rating predict Full Proposal rating?

This is a difficult question to answer since there is nothing solid to compare against.

We don’t have a representative set of non-invited full proposals that we can compare to say “yes, these do fare better, the same as, or worse than the proposals that were rated highly” when it comes to the review ratings. What we do have is the set of “Low” preliminary proposals that were invited, and the small set of “Do Not Invite” preliminary proposals that were invited by the Program Officers against the panel recommendations. However, these groups are confounded by the decision process: these invites were purposely selected because the Program Officers thought they would be competitive at the full proposal stage. They are ideas we thought the panels missed or selected for portfolio balance; therefore, they are not representative of the entire set of preliminary proposals for which the panels recommended Low or Do Not Invite.

Distribution of Full Proposal Panel Ratings versus Preliminary Proposal Ratings (columns High through Not Competitive are the full proposal panel ratings)

| Pre-Proposal Panel Rating | # Recvd As Full Proposals | High | Medium | Low | Not Competitive |
| --- | --- | --- | --- | --- | --- |
| High | 728 | 19% | 24% | 20% | 37% |
| Low | 117 | 10% | 21% | 20% | 50% |
| Do Not Invite | 62 | 8% | 24% | 23% | 45% |

So, even given the active attempts to pick the best proposals out of the “Low” and “Do Not Invite” preliminary proposal categories, proposals invited on the strength of “High” ratings were twice as likely to wind up in the “High” category at the full proposal stage as those invited from the Low or Do Not Invite categories. And, those invited from the Low or Do Not Invite categories were somewhat more likely to wind up in Not Competitive. Moreover, the score data presented below provide additional evidence that this process is, in fact, selecting the best proposals.

 

What do individual review scores say about the outcomes and different panel ratings?

We expect the full proposal review stage to be a more challenging experience than the preliminary proposal stage because most of the clearly non-competitive proposals have already been screened out. Because of this, full proposals should present a tighter grouping of reviewer scores than preliminary proposals. The distribution of average proposal scores across the two stages is shown below. We converted the “P/F/G/V/E” individual review scores to a numerical scale from P=1 to E=5, with split scores as the average of the two letters (e.g., V/G = 3.5). As a reminder, individual reviewer scores are submitted prior to the panel, written without access to other reviewers’ opinions and based on a relatively small sample of the proposals. So the average rating (and the spread of individual scores for a proposal) is mostly a starting point for discussion, not the end result of the review[iii].
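The letter-to-number conversion just described is straightforward; a minimal sketch (our own illustration, not NSF code):

```python
# Map "P/F/G/V/E" review scores to the 1-5 numeric scale described above;
# split scores like "V/G" average the two letters.
SCALE = {"P": 1, "F": 2, "G": 3, "V": 4, "E": 5}

def score_to_number(score: str) -> float:
    letters = score.upper().split("/")
    return sum(SCALE[letter] for letter in letters) / len(letters)

print(score_to_number("E"))    # 5.0 (Excellent)
print(score_to_number("V/G"))  # 3.5 (split between Very Good and Good)
```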

Distribution of mean review scores at different points in the DEB core program review process.

The preliminary proposal scores are distributed across the entire spectrum, with the average review scores for most in the 3 to 4 range (a Good to Very Good rating). That we don’t see much in the way of scores below 2 might suggest pre-selection on the part of applicants or rating inflation by reviewers. Invitations (and high panel ratings) typically go to preliminary proposals with average scores above Very Good (4). Only a few invitations are sent out for proposals between Very Good and Good or lower.

The average scores for full proposals are more evenly distributed than the preliminary proposal scores, with a mean and median around Very Good. The eventual awards draw heavily from the Very Good to Excellent score range, and none were lower than an average of Very Good/Good. And, while some full proposals necessarily performed worse than they did at the preliminary proposal stage, there are still roughly twice as many full proposals with average scores above Very Good as the total number of awards made, so there is no dearth of high-performing options for award-making.

So, what scores correspond to different panel ratings?

Average Review Score of Invited Full Proposals by Panel Recommendation (columns High through Not Competitive are the full proposal panel ratings)

| Pre-Proposal Panel Rating | High | Medium | Low | Not Competitive | Overall |
| --- | --- | --- | --- | --- | --- |
| High | 4.41 | 4.08 | 3.76 | 3.53 | 3.88 |
| Low | 4.32 | 4.13 | 3.88 | 3.52 | 3.81 |
| Do Not Invite | 4.42 | 4.00 | 3.75 | 3.44 | 3.73 |
| Overall | 4.40 | 4.08 | 3.78 | 3.53 | 3.87 |

There’s virtually no difference in average full proposal scores among groups of proposals that received different preliminary proposal panel ratings (rows, above). This further supports the notion that the full proposals are being assessed without bias based on the preliminary proposal outcomes (which are available to full proposal panelists after individual reviews are written). There is approximately a whole letter score difference between the average scores of highly rated full proposals (E/V) and Not Competitive full proposals (V/G) (columns, above). The average score for each rating is distinct.

 

About the Data:

The dataset used in this analysis was originally prepared for the June 2015 DEB Committee of Visitors meeting. We traced the review outcomes of preliminary proposals and subsequent full proposals over the first 3 cycles of proposal review. This dataset included the majority of proposals that have gone through the 2-stage review in DEB, but is not a complete record because preliminary proposal records are only tied to full proposals if this connection is successfully made by the PI at the time of full proposal submission. We discussed some of the difficulties in making this connection on DEBrief in the post titled “DEB Numbers: Per-person success rate in DEB”.

There are 4840 preliminary proposal records in this dataset; 1115 received invitations to submit full proposals. Of those 1115, 928 (83%) submitted full proposals and successfully identified their preliminary proposal. Full proposal records are lacking for the remaining 187 invitees; this is a combination of 1) records missing necessary links and 2) a few dozen invitations that were never used within the window of this analysis. For full proposal calculations, we considered only those proposals that had links and had been processed to a final decision point as of June 2015 (907 records), when the data were captured.

The records followed the lead proposal of collaborative groups/projects in order to maintain a 1 to 1 relationship of all records across preliminary and full proposal stages and avoid counting duplications of review data. The dataset did not include full proposals that were reviewed alongside invited proposals but submitted under other mechanisms that bypass the preliminary proposal stage such as CAREER, OPUS, and RCN.

Data Cleaning: Panel recommendations are not required to conform to a standard format, and the choice of labels, number of options, and exact wording vary from program to program and have changed over time in DEB. To facilitate analysis, the various terms were matched onto a 4-level scale (High/Medium/Low/Not Invite (or Not Competitive)), which was the widest scale used by any panel in the dataset; any binary values were matched to the top and bottom of the scale. Where a proposal was co-reviewed in 2 or more panels, the most positive panel rating was used for this analysis.
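The cleaning steps described above amount to a label-mapping plus a “most positive wins” rule across co-reviewing panels. A hedged sketch (the specific label strings here are our own guesses, not the actual vocabulary, which varied by program and year):

```python
# Harmonize heterogeneous panel recommendation labels onto the 4-level scale
# described above. Label strings are ILLUSTRATIVE guesses only.
LEVELS = {"Not Competitive": 0, "Low": 1, "Medium": 2, "High": 3}

LABEL_MAP = {
    "high priority": "High",
    "medium priority": "Medium",
    "low priority": "Low",
    "invite": "High",                    # binary scales map to the top...
    "fundable": "High",
    "do not invite": "Not Competitive",  # ...and bottom of the scale
    "not fundable": "Not Competitive",
}

def harmonize(panel_labels):
    """Return the most positive harmonized rating across co-reviewing panels."""
    ratings = [LABEL_MAP[label.lower()] for label in panel_labels]
    return max(ratings, key=LEVELS.get)

print(harmonize(["Do Not Invite"]))           # Not Competitive
print(harmonize(["low priority", "Invite"]))  # High (most positive panel wins)
```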

[i] Cases where a highly recommended preliminary proposal was Not Invited typically arose because the project had received funding (either we were still waiting on our budget from the prior year and the PI re-submitted, or the same work was picked up by another funding source). So, the effective invite rate for “high priority” recommendations is ~100%. The middle “Low” priority rating was used in only a limited set of preproposal panels in the first years of preproposals; at this point, all DEB preproposal panels use two-level “Invite or Do Not Invite” recommendations.

[ii] 248 is less than what we actually funded from the full proposal panels: when CAREER, OPUS, RCN, and proposals that were not correctly linked to preproposal data are accounted for, we awarded a bit over 300 core program projects in FYs 2013, 2014, and 2015: ~100 new projects/year.

[iii] If the program were to be purely conservative and follow the scoring exactly in making award decisions, there would have been no awards with an average score below 4.2 (Very Good+) and even then half of the proposals that averaged Very Good (4) or better would go unfunded.

Spring 2016 Progress Update


You may have noticed it has lately been quiet on the blog.

We’re in the middle of processing the 1500 or so preliminary proposals we received at the end of January. After reviewing them for completeness and relevance to the program and sorting out the overlap between us and IOS, ~1450 proposals were accepted for review and assigned to 10 panels, which will run in March and April.

At this point in the review process, many panelists have received their individual review assignments and access to the proposals with more yet to come. And, over the coming weeks, waves of panelists will begin descending upon NSF to meet and discuss the preliminary proposals. When the dust settles, Program Officers will meet together to compare notes and develop invite/do not invite recommendations.

As in prior years, Invite notices will receive first priority and be batched by program (4 program clusters = 4 batches). For example, everyone receiving an invitation from Population and Community Ecology (PCE) should hear back over a span of a few days, and everyone receiving an invitation from Systematics and Biodiversity Sciences (SBS) should hear back over a span of a few days. But the invites for PCE and for SBS are not likely to go out on the same days. This is to maximize the amount of time available to invitees to prepare a full proposal and minimize any delays in notification relative to others competing in the same cluster at the full proposal stage.

Do Not Invite notices will come out after the invites. These will also be batched, but less strictly so the span of notification may not be as narrow.  There is likely to be a week or more between notices of good news and bad news for a particular program.

We are targeting all good news to be delivered by mid-May and all bad news by the end of May.

As always, log in and check FastLane for updates first. Updates will show there even if you have a bad email address on file and the notice doesn’t reach you. If you are a Co-PI, talk to your lead PI first: they receive all the correspondence. And, if June 1 comes and you haven’t heard at all, that is when the lead PI should drop us an email.

Post-Panel Decision Making: What exactly is this “portfolio balance” I keep hearing about?


Program Officers frequently remind panelists of two things: 1) panel discussions are confidential and 2) the panel provides advice to the program; it doesn’t make decisions. Thus, what you see on the rating board is not the final outcome. The typical rejoinder to the second item is: so how do you get from the board to a final outcome? To us, that question sounds like an excellent basis for a blog post.

Once full proposal panels are done and reviewers have made their recommendations, our work is far from over. Program Officers incorporate the panel’s advice with other considerations to manage a variety of short- and long-term factors affecting scientific innovation and careers. Sure, funding the best science is paramount, but most programs receive many more deserving proposals than they can support. We use the term “Portfolio Balance” to describe the strategic considerations that program officers incorporate into these funding decisions. Below, we highlight several axes of the portfolio (in alphabetical order) and outline the driving thoughts behind each one:

  • Award diversity: Programs fund a variety of special awards such as CAREERs, RAPIDs, EAGERs, Research Coordination Networks, OPUS, Small Grants, and Dissertation Improvement Grants. These serve a variety of roles in diversifying the types of projects supported by the Foundation in ways rarely found in a regular grant.
  • Career Stage diversity: How should a program distribute support among PIs at different career stages? Beginning investigators bring new ideas but may have weaker grantsmanship. Mid-career scientists offer experience and a track record, and may merit special consideration if changing research direction. Late career scientists need opportunities to synthesize their work to create a legacy for their community. Postdoctoral awards create special opportunities for beginning scientists to pursue novel and independent projects.
  • Demographic diversity: How can NSF help diversify the scientific workforce and address various demographic imbalances? Many studies have shown that diversity in the workforce generates new ideas and approaches. Different people see different aspects of a topic through their experiences and educational backgrounds; more homogeneous research teams may miss novel and unexpected insights that lead to innovative solutions. Broader impacts often include activities designed to broaden participation in science.
  • Geographic diversity: How can a program ensure the opportunities and benefits of research reach the diverse geographic regions of the country? Innovative research is done in diverse institutions located outside of the major research hubs. In EPSCoR states, which generally receive a smaller portion of federal research dollars, leveraging opportunities can amplify the impact of an award while co-funding can stretch our program budget.
  • Institutional diversity: Not all stellar scientists are at the few major research universities. And, neither are all the students who will become the great researchers of the future. How can we direct limited research support to ensure opportunities are not limited to a select few? Funding projects from diverse institutions, including primarily undergraduate colleges and universities, minority-serving institutions, and regional universities, allows a broader range of faculty and students to participate in and strengthen the scientific enterprise.
  • Intellectual diversity: How do specific projects reinforce, build upon or challenge the results and knowledge generated by the diversity of other projects in the same broad domain? Program officers may try to balance research in areas that are currently “hot” with other topics of importance. Co-review with other programs provides another way to broaden the program’s domain and promote novel application of tools developed in other fields.
  • Laboratory Diversity: Where is the balance between investing in new/unfunded labs versus sustaining established enterprises? There are always new labs, labs running out of funds, labs with funding gaps, and labs with existing funding from us or elsewhere. We often consider PIs’ current funding status in making our decisions; it’s not an outright disqualifier to be well funded at the moment but it is an important consideration in distributing our funds.
  • Risk diversity: Does the program fund at least some work that is intellectually risky? Because progress in science depends on the willingness to challenge the norm, program officers often consider relative degrees of risk and innovation in their funding decisions. Some individuals argue that panels are overly conservative in their recommendations, but program officers make the final decisions and reflect carefully on the nature and magnitude of risks versus the potential payoffs for their field.

Because the distribution of submitted proposals can vary over time, portfolio balance requires both a short term and a long-range vision. NSF staff consider the overall present and future health of the research communities they serve at a depth not generally visible to individual scientists. The recommendations of the reviewers are by far the most important factor; the best of the best are likely to be funded. Discriminating among the next group of outstanding proposals usually involves consideration of one or more of the above factors leading up to that phone call saying you have been recommended for funding.

 

New MacroSystems Biology and Early NEON Science solicitation


Here’s some good news to start the new year – the MacroSystems Biology (MSB) program, which was on hiatus in 2015, just released a new solicitation for proposals under the revised program title “MacroSystems Biology and Early NEON Science”. You can find the program summary here with a link to the solicitation (NSF 16-521).

Prepare Now! Don’t Panic Later. [Updated x2]


Questions and Reminders about DEB’s Preproposal Deadline

We’re starting to get calls in noticeable volume relating to the upcoming preproposal deadline, and a few common questions stand out. So, to make sure everyone has access to these answers, we’re posting them below for all to see. If your question isn’t addressed here, please leave a comment or send us an email; we’ll respond to you and post the response here if it’s sufficiently generalizable.
