Differences between prior distribution and prior predictive distribution?


While studying Bayesian statistics, I am having trouble understanding the difference between the prior distribution and the prior predictive distribution. The prior distribution is reasonably clear to me, but I find it hard to see what the prior predictive distribution is used for and why it is different from the prior distribution.

Tags: machine-learning, bayesian, inference, data-mining, hierarchical-bayesian

Asked 7 hours ago by Changhee Kang.
2 Answers


Answer by kjetil b halvorsen (answered 5 hours ago, edited 4 hours ago by Christoph Hanck):
Let $Y$ be a random variable representing the (maybe future) data. We have a (parametric) model for $Y$ with $Y \sim f(y \mid \theta)$, $\theta \in \Theta$, where $\Theta$ is the parameter space, and a prior distribution $\pi(\theta)$ for the parameter. Given an observation of $Y$, the posterior distribution of $\theta$ is
$$
f(\theta \mid y) = \frac{f(y \mid \theta)\, \pi(\theta)}{\int_\Theta f(y \mid \theta)\, \pi(\theta)\; d\theta}.
$$

The prior predictive distribution of $Y$ is the (modeled) distribution of $Y$ marginalized over the prior, that is, integrated against $\pi(\theta)$:
$$
f(y) = \int_\Theta f(y \mid \theta)\, \pi(\theta)\; d\theta,
$$
which is the denominator in Bayes' theorem above. It is also called the preposterior distribution of $Y$. It tells you what data (that is, which $Y$) you expect to see before learning more about $\theta$. This has many uses, for instance in the design of experiments; for examples, see Experimental Design on Testing Proportions or Intersections of chemistry and statistics.
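To make the marginalization concrete: one can sample from the prior predictive by composition, drawing $\theta \sim \pi(\theta)$ and then $y \sim f(y \mid \theta)$. Below is a minimal Python sketch under an assumed conjugate normal model (not part of the original answer): $Y \mid \theta \sim \mathcal{N}(\theta, \sigma^2)$ with prior $\theta \sim \mathcal{N}(\mu_0, \tau^2)$, for which the prior predictive is $\mathcal{N}(\mu_0, \sigma^2 + \tau^2)$ in closed form.

    import numpy as np

    # Draws from the prior predictive f(y) by composition:
    # theta ~ pi(theta), then y ~ f(y | theta).
    rng = np.random.default_rng(0)
    mu0, tau, sigma = 0.0, 2.0, 1.0              # hypothetical prior/sampling parameters

    theta = rng.normal(mu0, tau, size=100_000)   # theta ~ N(mu0, tau^2)
    y = rng.normal(theta, sigma)                 # y | theta ~ N(theta, sigma^2)

    print(y.mean(), y.std())                     # close to mu0 and sqrt(sigma^2 + tau^2)
    print(mu0, np.sqrt(sigma**2 + tau**2))       # closed-form prior predictive mean and sd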



Another use is as a way to understand the prior distribution better. Say you are interested in modeling the variation in the weight of elephants, and your prior distribution leads to a prior predictive with substantial probability above 20 tons. Then you might want to rethink: the typical weight of even the largest elephants is seldom above 6 tons, so substantial prior predictive probability above 20 tons seems wrong. One interesting paper in this direction is Gelman (which does not use this terminology).
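Such a prior predictive check is easy to do by simulation. The sketch below uses entirely made-up numbers (weights in tonnes modeled as lognormal, with invented priors on its parameters) only to show the workflow of checking how much prior predictive mass falls above 20 tonnes.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000
    # Hypothetical priors on the parameters of a lognormal weight model (tonnes).
    mu = rng.normal(1.0, 1.0, size=n)              # log-scale location
    sigma = np.abs(rng.normal(0.0, 1.0, size=n))   # log-scale spread (half-normal)
    weight = rng.lognormal(mean=mu, sigma=sigma)   # prior predictive draws of a weight

    print((weight > 20).mean())   # a large value means the prior implies absurdly heavy elephants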



Finally, preposterior concepts are typically not useful with uninformative priors; they require the prior modeling to be taken seriously. One example is the following: let $Y \sim \mathcal{N}(\theta, 1)$ with the flat prior $\pi(\theta) = 1$. Then the prior predictive of $Y$ is
$$
f(y) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(y - \theta)^2}\; d\theta = 1,
$$
so it is itself flat and not very useful.
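A quick numerical check of this constant (assuming scipy is available): the integral over $\theta$ equals 1 for any fixed $y$.

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    # With a flat prior, the "prior predictive" at any y is the integral of the
    # N(theta, 1) density over theta, which is 1 regardless of y.
    for y in (-3.0, 0.0, 5.0):
        val, _ = quad(lambda th: norm.pdf(y, loc=th, scale=1.0), -np.inf, np.inf)
        print(y, round(val, 6))   # 1.0 for every y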






Answer by winperikle (answered 5 hours ago):
Predictive here means predictive for observations: the prior distribution is a distribution for the parameters, whereas the prior predictive distribution is a distribution for the observations.

If $X$ denotes an observation and we use the model (or likelihood) $p(x \mid \theta)$, then a prior distribution is a distribution for $\theta$, for example $p_\beta(\theta)$, where $\beta$ is a set of hyperparameters. Note that there is no conditioning on $\beta$; the hyperparameters are considered fixed. (This is not the case in hierarchical models, but that is not the point here.)

The prior predictive distribution is the distribution of $X$ "averaged" over $\theta$:

$$
p_\beta(x) = \int p(x \mid \theta)\, p_\beta(\theta)\; d\theta.
$$

This distribution is prior because it does not rely on any observation.

We can define the posterior predictive distribution in the same way: if we have a sample $X = (X_1, \dots, X_n)$, the posterior predictive distribution is

\begin{align*}
p_\beta(x \mid X) &= \int p(x \mid X, \theta)\, p_\beta(\theta \mid X)\; d\theta \\
&= \int p(x \mid \theta)\, p_\beta(\theta \mid X)\; d\theta,
\end{align*}

where the second equality uses the fact that a new observation is independent of $X$ given $\theta$. Thus the posterior predictive distribution is constructed the same way as the prior predictive distribution, but while in the latter we weight with $p_\beta(\theta)$, in the former we weight with $p_\beta(\theta \mid X)$, that is, with our "updated" knowledge about $\theta$.



Example: Beta-Binomial

Suppose our model is $X \mid \theta \sim \mathrm{Bin}(n_1, \theta)$, i.e. $P(X = x \mid \theta) = \binom{n_1}{x} \theta^x (1 - \theta)^{n_1 - x}$.

We assume a beta prior distribution for $\theta$, $\mathrm{Beta}(a, b)$, where $(a, b)$ is the set of hyperparameters.

Then the prior predictive distribution of $X$ is the beta-binomial distribution with parameters $(n_1, a, b)$. This discrete distribution gives the probability of $k$ successes out of $n_1$ trials, given the hyperparameters $(a, b)$ on the probability of success.
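A small numerical sketch of this prior predictive (the hyperparameters below are made up), comparing scipy's beta-binomial pmf with Monte Carlo draws obtained by first sampling $\theta$ and then $X$:

    import numpy as np
    from scipy.stats import betabinom

    rng = np.random.default_rng(2)
    a, b, n1 = 2.0, 3.0, 10                # hypothetical hyperparameters and number of trials

    theta = rng.beta(a, b, size=200_000)   # theta ~ Beta(a, b)
    x_sim = rng.binomial(n1, theta)        # X | theta ~ Bin(n1, theta)

    for k in (0, 3, 7):
        print(k, betabinom.pmf(k, n1, a, b), (x_sim == k).mean())   # pmf vs. simulation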



Now suppose we observe the $n_1$ draws $(x_1, \dots, x_{n_1})$ with $x$ successes in total.

Since the binomial and beta distributions are conjugate, we have
\begin{align*}
p(\theta \mid X = x)
&\propto \theta^x (1 - \theta)^{n_1 - x} \times \theta^{a - 1} (1 - \theta)^{b - 1} \\
&\propto \theta^{a + x - 1} (1 - \theta)^{n_1 + b - x - 1},
\end{align*}
which is the kernel of a $\mathrm{Beta}(a + x,\, n_1 + b - x)$ density.

Thus $\theta \mid x$ also follows a beta distribution, and the posterior predictive distribution of a new observation, $p(\tilde{x} \mid x, a, b)$, is again a beta-binomial, this time with parameters $(a + x,\, b + n_1 - x)$ rather than $(a, b)$.

In summary: with a $\mathrm{Beta}(a, b)$ prior and a $\mathrm{Bin}(n_1, \theta)$ likelihood, if we observe $x$ successes out of $n_1$ trials, then the posterior predictive distribution for $n_2$ further trials is a beta-binomial with parameters $(n_2, a + x, b + n_1 - x)$. Note that $n_2$ and $n_1$ play different roles, since the posterior predictive answers the question:

Given my current knowledge of $\theta$ after observing $x$ successes out of $n_1$ trials, i.e. $\theta \mid x \sim \mathrm{Beta}(a + x,\, n_1 + b - x)$, what is the probability of observing $k$ successes out of $n_2$ additional trials?
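Continuing the same sketch (same made-up hyperparameters, plus an invented observed count $x$), the posterior predictive for $n_2$ new trials can be checked in the same way:

    import numpy as np
    from scipy.stats import betabinom

    rng = np.random.default_rng(3)
    a, b, n1 = 2.0, 3.0, 10          # hypothetical hyperparameters, as before
    x, n2 = 7, 5                     # invented observed successes and new trial count

    # theta | x ~ Beta(a + x, b + n1 - x); new count | theta ~ Bin(n2, theta)
    theta_post = rng.beta(a + x, b + n1 - x, size=200_000)
    x_new = rng.binomial(n2, theta_post)

    for k in range(n2 + 1):
        print(k, betabinom.pmf(k, n2, a + x, b + n1 - x), (x_new == k).mean())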



I hope this is useful and clear.






Comment by Changhee Kang (3 hours ago): Yeap, I believe I have understood what you have explained here. Thank you very much.










