Recommender systems have become an integral part of online platforms, providing personalized recommendations for purchases, content consumption, and interpersonal connections. These systems have two sides: the producer side comprises product sellers, content creators, and service providers, while the consumer side includes buyers, viewers, and customers. To optimize online recommender systems, A/B tests serve as the gold standard for comparing ranking models and evaluating their impact on both consumers and producers. Consumer-side experiments are relatively straightforward to design and are commonly employed to assess how ranking changes affect consumer behavior (buyers, viewers, etc.). Designing producer-side experiments for an online recommender/ranking system is notably more intricate, because producer items in the treatment and control groups must be ranked by different models and then merged into a unified ranking presented to each consumer. Current design solutions in the literature are ad hoc and lack rigorous guiding principles. In this paper, we examine the limitations of these existing methods and propose the principles of consistency and monotonicity for designing producer-side experiments of online recommender systems. Building upon these principles, we present a systematic solution based on counterfactual interleaving designs to accurately measure the impact of ranking changes on producers (sellers, creators, etc.).
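To make the merging step concrete, the sketch below shows one possible way to interleave two rankings for a single consumer request: items whose producers are in the treatment group keep the positions they would receive under the treatment model, control-group items keep their positions under the control model, and position collisions are resolved by a simple displacement rule. The function name, data layout, and tie-breaking rule here are illustrative assumptions, not the exact specification of the design proposed in the paper.

```python
from typing import List, Optional, Set


def counterfactual_interleave(
    treatment_ranking: List[str],   # item ids ordered by the treatment model
    control_ranking: List[str],     # the same item ids ordered by the control model
    treatment_items: Set[str],      # ids of items whose producers are in treatment
) -> List[str]:
    """Merge two rankings so each item keeps its counterfactual position.

    Treatment-group items claim the position assigned by the treatment model;
    control-group items claim the position assigned by the control model. When
    two items claim the same position, the higher-ranked claimant keeps it and
    the other item is shifted to the next free slot (an illustrative rule).
    """
    n = len(treatment_ranking)
    desired = {}  # item id -> counterfactual position it claims
    for pos, item in enumerate(treatment_ranking):
        if item in treatment_items:
            desired[item] = pos
    for pos, item in enumerate(control_ranking):
        if item not in treatment_items:
            desired[item] = pos

    merged: List[Optional[str]] = [None] * n
    displaced: List[str] = []
    # Place items at their claimed positions; collect collisions.
    for item, pos in sorted(desired.items(), key=lambda kv: kv[1]):
        if merged[pos] is None:
            merged[pos] = item
        else:
            displaced.append(item)
    # Fill the remaining slots with displaced items, preserving their order.
    free_slots = (i for i, slot in enumerate(merged) if slot is None)
    for item, slot in zip(displaced, free_slots):
        merged[slot] = item
    return [item for item in merged if item is not None]


if __name__ == "__main__":
    treatment = ["c", "a", "b", "d"]  # ranking produced by the treatment model
    control = ["a", "b", "c", "d"]    # ranking produced by the control model
    # Only item "c" belongs to a treatment-group producer, so it takes its
    # treatment position (0); the others keep their control positions.
    print(counterfactual_interleave(treatment, control, {"c"}))  # ['c', 'b', 'a', 'd']
```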