the bit width of filter

I am working on a sigma-delta ADC project and need to decide the bit width of the digital filter.
My filter has 4 stages. The first is a CIC, and its bit width is 29 bits according to the OSR. My final filter output is only 24 bits, so the other 3 FIR filters need to drop 5 bits in total. If my input is 4-bit signed and the output is 24-bit signed, with OSR = 256, how do I decide the bit reduction for each of the 3 filters? What is the impact of the bit width on performance?

Re: the bit width of filter

I am assuming this will be a fixed sample-rate converter, i.e. non-reconfigurable, unlike the Intersil/Harris or ADI decimating LPFs and similar products.

Bit widths by stage: S.D. output 4 -> 1st stage 29 -> 2nd stage ? -> 3rd stage ? -> 4th stage 24

The bit growth for a single-stage CIC is b_growth = log2(R*D), where R is the sample rate reduction and D is the comb length. From your description it sounds like the first LPF is only a single-stage CIC. If so, with R = 2^8 the comb length D must be very large (about 2^17, since log2(R*D) = 25) to account for the (29 - 4) = 25 bits of growth allocated. Such a large D is expensive. Also, performing the rate reduction with a single stage will incur significant aliasing, but I may have misinterpreted your description.
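The bit-growth arithmetic above can be checked with a short Python sketch. The function names are hypothetical, and the `stages` parameter uses the standard multi-stage CIC generalization (growth of N*log2(R*D) for N stages), which goes beyond the single-stage formula quoted above.

```python
def ceil_log2(x: int) -> int:
    """Exact ceil(log2(x)) for an integer x >= 1, avoiding float rounding."""
    return (x - 1).bit_length()


def cic_bit_growth(rate_change: int, comb_length: int, stages: int = 1) -> int:
    """Worst-case CIC bit growth: ceil(stages * log2(R * D)).

    stages=1 reproduces b_growth = log2(R*D); the multi-stage form is an
    assumption added here for illustration.
    """
    return ceil_log2((rate_change * comb_length) ** stages)


def cic_output_width(input_bits: int, rate_change: int, comb_length: int,
                     stages: int = 1) -> int:
    """Full-precision CIC output width for a given input width."""
    return input_bits + cic_bit_growth(rate_change, comb_length, stages)


if __name__ == "__main__":
    # Single stage, R = 2^8, D = 1: only 8 bits of growth on a 4-bit input.
    print(cic_output_width(4, 256, 1))
    # Single stage needs D = 2^17 to reach the 29 bits quoted in the question.
    print(cic_output_width(4, 256, 2**17))
    # Hypothetical 3-stage CIC with D = 1, for comparison.
    print(cic_output_width(4, 256, 1, stages=3))
```

Trying a few (stages, D) combinations this way is a quick check on whether a quoted register width matches the intended CIC configuration.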

Since project details are unavailable, you may be considering or should consider the following for their cumulative effects on the system design:
a) Multiplier and/or multiplierless filters for the other stages.
b) Distribution of the 2^8 sample rate reduction.
c) Minimum rejection level of aliased components at each stage.
d) Will passband amplitude equalization be required for CIC and/or ADC droop?
e) Placement of stopband nulls at aliased system clock components, e.g. ADC clock and other board-level clock sources.
f) Will signal of interest (SOI) consistently be near ADC full-scale (e.g. predetect AGC), or will it be necessary to chase weak SOIs (e.g. LSB preservation to allow for growth)?
g) Will the sampled environment be noise-limited or interference-limited, i.e. what sets the system’s dynamic range?

Without these details it is difficult to gauge where LSB pruning is best applied. Various architectures and their internal bit widths should be simulated (e.g. with the Simulink Fixed-Point Blockset) prior to final implementation.
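Before a full fixed-point simulation, a rough first-order noise budget can help rank candidate splits of the 5 bits. The Python sketch below is only that, under loud assumptions that are not from the thread: each stage rounds its output to its own word width, each rounding adds white noise of variance q^2/12 (q being that stage's LSB), and later stages pass this noise with unity gain, which is pessimistic. All names are hypothetical.

```python
import math


def widths_from_reductions(start_bits, reductions):
    """Turn per-stage bit reductions into per-stage output widths."""
    widths = []
    current = start_bits
    for drop in reductions:
        current -= drop
        widths.append(current)
    return widths


def excess_rounding_noise_db(stage_widths, output_bits=24):
    """Total rounding noise relative to one rounding at the output width.

    Assumes uniform rounding noise (variance q^2/12) at every stage output
    and unity noise gain through the following stages.
    """
    total = 0.0
    for width in stage_widths:
        lsb = 2.0 ** (output_bits - width)  # stage LSB in output-LSB units
        total += lsb * lsb / 12.0
    return 10.0 * math.log10(total / (1.0 / 12.0))


if __name__ == "__main__":
    # Candidate ways to drop 5 bits across the three FIR stages (29 -> 24).
    for split in [(5, 0, 0), (2, 2, 1), (0, 0, 5)]:
        widths = widths_from_reductions(29, split)
        print(split, widths, round(excess_rounding_noise_db(widths), 3), "dB")
```

Under these assumptions, pruning late costs less than pruning early, but the model ignores filter gain, coefficient quantization, and signal level, so it does not replace the bit-accurate simulation recommended above.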