These are definitely some healthy questions you have and this is great for the discussion. I know my word alone probably doesn't mean much, but I can tell you that I have a lot of coding experience in these matters, both in graphic design and concept execution. So I'll whip up some examples and explanations to help explain sub-palettes and scrolls in usage.

Of course anyone else can jump in as well
Is this entire paragraph talking about raster effects alone, or parallax levels with raster effects?
Both. "Raster effects" is a generic term meaning you change an attribute of the display on a given scanline. This could be: the X position of the BG(s), the Y position of the BG(s), turning sprites on/off, turning BG layer(s) on/off, changing colors, or any other video register that the specific system allows to be changed. Now, in parallax scrolling you change the X position of the BG layer on a given scanline. If you use a sine table, you get a wavy effect. If you add a constant or ratio'd amount to the existing X value, you get a different scroll speed (faster or slower). If you set the X reg to a non-changing value, you get a status window. For parallax scrolls, the height of each scrolling band is just the distance between the two scanlines where the background X register gets set. And if you scroll a background farther than the width of the tilemap, you get wrapping: the background wraps back into itself. There are other names for this: hsync, h-int, line scroll, etc. The idea is still the same - to change the X and/or Y position of the BG layer on a given scanline. With all three systems, you can do it on every scanline.
Say the background is scrolling at 1 pixel per frame. On scanline 42 (from the top) you add an extra +2 per frame to the scroll value, and on scanline 50 you add an extra +3 per frame. This gives you 3 independently scrolling planes: the upper part of the BG scrolls at 1 pixel per frame, scanlines 42 through 49 scroll at 3 pixels per frame, and scanline 50 to the bottom scanline scrolls at 4 pixels per frame. If you wanted, you could do this for every scanline. When you do that, it's usually referred to as 'line scrolling', because you're setting the scroll value on every sequential scanline. The X position of the BG is not the only reg you can change. Changing the Y reg gives different effects. You can scale or squish the screen vertically: squish it by skipping Y positions on every scanline, or enlarge it by repeating the same Y position for multiple scanlines. All three systems can do this. If you combine X and Y updating per scanline and use a sine table as an offset, you get an effect like in this video (3:07):
http://www.youtube.com/watch?v=eIFkFQjTAg0
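To make the three-band example above (scanlines 42 and 50) concrete, here is a rough Python sketch. It only models the arithmetic, not any real hardware: the names BANDS, band_speed, scroll_x and raster_writes are made up for illustration, and the 256 pixel tilemap width is an assumption.

```python
TILEMAP_WIDTH = 256  # assumed tilemap width in pixels; scroll values wrap at this width

# (first scanline of the band, pixels per frame), numbers taken from the example above
BANDS = [(0, 1), (42, 3), (50, 4)]


def band_speed(scanline):
    """Return the scroll speed (pixels per frame) of the band containing this scanline."""
    speed = BANDS[0][1]
    for start, pixels_per_frame in BANDS:
        if scanline >= start:
            speed = pixels_per_frame
    return speed


def scroll_x(scanline, frame):
    """X scroll value for a scanline on a given frame, wrapped to the tilemap width."""
    return (band_speed(scanline) * frame) % TILEMAP_WIDTH


def raster_writes(frame):
    """The only points where the X register has to be rewritten: one per band."""
    return [(start, scroll_x(start, frame)) for start, _ in BANDS]


if __name__ == "__main__":
    for scanline, value in raster_writes(10):
        print(f"scanline {scanline}: set BG X to {value}")
```

Note that only three register writes per frame are needed here, which is why this kind of parallax is cheap compared to an effect that rewrites the register on every scanline.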
So when I say that the multiple scrolls in the back layer of Green Hill Zone aren't impressive technically, I mean that other effects require updating *every* scanline (like in that video), unlike those parallax scrolls in Sonic. The SNES or PCE have no problem doing such parallax scrolls. Doing the calculations for these effects isn't resource heavy either. It's simple addition, and most of the time it's just an index into a pre-calculated table (no one is stupid enough to calculate sine in realtime - at least I would hope not). You know the vertical levels in Axelay? That's not mode 7. It's a simple Y scaling hsync effect. The PCE and Genesis can do the same (Chris Covell wrote a demo of the first level boss using that effect).
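For the "index into a pre-calculated table" part, a minimal Python sketch of a wavy line scroll could look like the following. The table size, the amplitude and the function name wavy_scroll_x are all assumptions picked for illustration; a real game would bake the table into ROM instead of building it at startup.

```python
import math

TABLE_SIZE = 256   # assumed table length (a power of two keeps the wrap cheap)
AMPLITUDE = 8      # assumed maximum horizontal offset in pixels
SCROLL_WRAP = 256  # assumed tilemap width; scroll values wrap at this width

# Built once, ahead of time. Per scanline the only work left is an index and an add.
SINE_TABLE = [
    round(AMPLITUDE * math.sin(2 * math.pi * i / TABLE_SIZE))
    for i in range(TABLE_SIZE)
]


def wavy_scroll_x(base_x, scanline, frame):
    """X scroll for one scanline: base scroll plus a table lookup that moves each frame."""
    offset = SINE_TABLE[(scanline + frame) % TABLE_SIZE]
    return (base_x + offset) % SCROLL_WRAP


if __name__ == "__main__":
    # Scroll values for the first few scanlines of frame 0
    print([wavy_scroll_x(0, line, 0) for line in range(0, 256, 32)])
```

Advancing the frame counter shifts which table entry each scanline picks up, which is what makes the wave appear to travel down the screen.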
Now to get into some more detail. The way each system does the hsync effects is slightly different. I'll start with the PCE since it uses the traditional method. On the PC-Engine, you can set the video processor to create an interrupt on every scanline (all 262 scanlines - yes, even the ones in vblank). An interrupt is when another device taps the CPU on the shoulder to perform a small task (or larger or whatever). The CPU takes a break from what it's doing, jumps to the interrupt routine, then jumps back and continues its work. This means the CPU doesn't have to sit there in a specially timed loop waiting for the correct time to update the X/Y/whatever video registers.
On the Genesis, this method isn't the best fit for the 68k CPU since its interrupts are slow (entering an interrupt costs dozens of cycles and the handler usually has to save and restore a pile of registers, whereas the PCE and other processors push barely anything onto the stack for the call). To get around this with a more efficient method, Sega chose to have a block in VRAM that contains scroll values for each scanline. So no interrupt is needed.
The method of the SNES is similar to the Genesis. Even though it has a fast interrupt call like the PCE, the slower clock speed isn't really efficient for updating lots of video registers. In comes the HDMA system. DMA is a controller that copies nibbles/bytes/words/whatever from a source location to a destination. It's automated and doesn't require CPU attention other than to initialize it. The SNES has 8 of these DMA channels, so it can transfer up to 8 different registers per scanline. This is also how you get perspective scaling for mode 7 (the 3D looking plane). The HDMA channels have a block of memory like with the Genesis, but it covers more than just scroll registers, and it exists in main ram (and can be reconfigured on the fly). The HDMA is fast and doesn't ask the CPU to do the work manually like an interrupt system would. Perfect for a slower processor too.
A side note: without getting too technical, the SNES CPU runs at 2.68MHz, 3.07MHz, or 3.58MHz. A lot of early games ran in 'slow rom' mode, which is 2.68MHz. Later games ran in 'fast rom' mode, which is 3.58MHz. Technically the CPU runs at a constant 3.58MHz, but 'wait states' in slow rom bring down the speed. A stupid mistake on Nintendo's part was to also use slow RAM: the WRAM, or work ram, has the same number of wait states as slow rom. The majority of instructions access ram and also memory mapped registers (ZP registers, or DP as they were renamed, are the lifeblood of the CPU's architecture). So even with fast rom, you're only achieving about 3.07MHz on average. Couple that with the fact that in 16bit mode (it can run in 8bit mode if selected), strictly 8bit operations are 25% slower, thanks to WDC's stupid decision to use only an 8bit data bus. 16bit operations are of course faster than they were on the previous 8bit CPUs, but when it comes to console dev, 8bit operations are done more frequently than you'd think. And no, having a 16bit CPU doesn't mean you can do two individual 8bit operations at once. It doesn't work like that.
Back on topic. So you can have up to 224/240/whatever scrolls on all three systems. It's simple addition or indexing into a LUT (look up table). On the Genesis, you have two BG layers. They can overlap one another. If you change the X position at the right scanline/point of the tilemap, you can cascade the layers vertically. This is done in the Ocean side level of TF4. But since there are only two BG layers, you can't keep these 'pseudo' layers overlapping as you scroll the map upwards or downwards, because the hardware doesn't have as many layers as the viewer thinks they are seeing. It's somewhat hard to describe if you don't have the technical background (and I'm probably doing a poor job of it).
While the SNES can have up to 4 backgrounds, only one mode (mode 0) offers that and it's not terribly useful: each layer uses 2bit tiles (3 colors, +1 if nothing appears behind it). There is another mode, though, that shows 3 BG layers: two layers with standard 4bit tiles and palettes, and 1 layer with 2bit tiles and 4 palettes. So the amount of pseudo overlapping layers possible is some magnitude higher than with 2 layers. And yet you don't really see SNES games take advantage of it. Compared to the many other effects the SNES could show off, crazy parallax really isn't a pressing point. The SNES does other effects that are more CPU intensive than some crazy parallax. It is a matter of choice. The Genesis lacks color sub/add, mosaic, scaling/rotation, pixelizing, a huge palette, etc. Developers add what they can to make up for the difference, whether it be on SNES, arcade, or whatever. Look at systems newer than the Genesis. You don't see excessive parallax even though those systems could pull it off even more so. The PCFX (since it never gets mentioned) has 6 BG layers and a much faster processor, yet you don't see massive parallax effects. Heh - all you ever see on that system is FMV and dating sims.
I think I forgot about the PCE. It only has one BG layer, but can do up to 240 independently scrolling lines. Overlapping sections are usually done with dynamic tiles, sometimes accompanied by sprites. Doing scrolls like in Air Zonk or Coryoon is easy. They are just parallax done with an hsync interrupt. The parallax scrolling of the sand in the first level (desert) of Lords of Thunder is a good example of dynamic tiles. This also allowed the large sand monster to appear to be its own layer on top of the parallax when in fact it's just part of the same BG layer.
http://www.youtube.com/watch?v=PVqkODdAI4o @ 3:58. It's hard to see the parallax scrolling of the sand, but you've probably played the game already.

Anyway, that's the guy I was referring to. This method *is* more processor intensive, but the PCE doesn't really have a choice if it needs overlapping BG layers. The PCE's processor is pretty fast (a testament is the game's lack of slowdown, even on the hardest setting with 'revenge' bullets filling the screen at times, while doing this method). The trick has its limitations. You're not going to see Sonic's Green Hill Zone on the PCE as is. It's just not feasible with a single layer. That said, you can do an approximation of it. The PC-Engine SGX, of course, has two independent BG layers (and sprite layers too).
Anyway, I hope that helps explain how parallax is basically the same method as other wavy or such screen type effects. And just because the SNES games don't necessarily show that style of parallax, doesn't mean it can't as it has examples of finer X/Y register updating effects, as does the Genesis itself in other games.
Myself, my sources, and my audience only knows what programmers say in interviews and nothing more. Raster effects, as I understand it, were used to create animated background elements like waterfalls, and are essentially an animated palette gradient like the Sega logo at the start of most Genesis titles.
That's color cycling. It's not done as a raster effect. The horizontal gradient bars of the early Amiga days and demos are raster effects. Color cycling is just offsetting the colors in a sub-palette by 'rotating' and wrapping them, once every frame or every few frames.
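As a rough illustration of that 'rotating' and wrapping, here is a small Python sketch. The function name cycle_colors and the idea of cycling only a sub-range of the palette are my own framing for the example, not taken from any particular game.

```python
def cycle_colors(palette, first, count, step=1):
    """Return a copy of the palette with entries [first, first + count) rotated by step.

    Colors that fall off the end of the range wrap around to its start.
    Nothing outside the range is touched and no tile data changes at all.
    """
    segment = palette[first:first + count]
    step %= count
    if step:
        segment = segment[-step:] + segment[:-step]
    return palette[:first] + segment + palette[first + count:]


if __name__ == "__main__":
    # Stand-in color values; entries 2..4 play the role of the waterfall shades
    palette = [0, 1, 2, 3, 4, 5]
    for frame in range(4):
        print(frame, palette)
        palette = cycle_colors(palette, first=2, count=3)
```

The point is that the only thing rewritten is a handful of palette entries, with no scanline timing involved, which is why it isn't a raster effect.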
However, developers had various means for creating the effect of multiple scrolling backgrounds within the limitations of the hardware. These include, but are not limited to, rows of sprites, animated sprites, and line scrolling.
For line scrolling, see above. As for using sprites, that's usually more common on the PCE than on the SNES or Genesis, since those rarely need assistance for multiple layers. Sprites are used in some situations for more complex masking of sprites and layers. But again, that's used much, much more on the PCE since it lacks an additional BG layer. In Xanadu I or II for instance, when you go behind a structure (anything) and part of the sprite is covered up by the BG layer, it uses dynamically placed sprites to mask and clip the player sprite so that it appears to go behind the object. The SNES and Genesis usually don't need this, as they'll have a 2nd BG layer on top of the first BG layer and scroll it at the same time - giving areas where the sprite will go behind BG layer 1 and be hidden.
The Genesis, while being technically limited to 2 backgrounds, could break them up into eight line rows or independently scrolling lines.
While this hardware supported effect might not be hardware intensive, it was a distinctive characteristic of Genesis games over PCE/TG16 or SNES titles.
So much so that while some SNES and PCE titles do display more parallax layers than their technical limitations would allow, none display as many simultaneously as Genesis titles.
See the first part.
This indicates some sort of processing advantage on the side of the Genesis. Since all Genesis games are limited to 64 colors on screen divided up into four palettes of sixteen, I *suspect* this limitation has something to do with it.
Not at all. The colors and palettes have nothing to do with the scrolling or its processing cost (or video processing cost). The colors also have no bearing on CPU resource or speed.
Are you referring to every level of Sonic 2, or the first level? Aquatic Ruins has to be more than a line update on background layers. I would assume that the presence of independently scrolling backgrounds behind and in front of the playing field be a combination of several workarounds, rather than simple line scrolling.
I was using the first level as an example of extreme parallax. See the first part for line scrolling and such.
I see that 482 number tossed around for the PCE about as often as I see 256 for the SNES, but the reality of actual commercial software is much different. I haven't seen the PCE display over 90 colors simultaneously, and I haven't seen the SNES display over 150 colors simultaneously during gameplay. Some SNES title screens and cut scenes approach 256 colors, but I have not seen gameplay scenes that do so. This indicates a resource issue to me.
And you wouldn't be the first to think that either. Most people are skeptical of the color counts of the PCE. And I guess less so with the SNES, but not that I've seen. It's usually exaggerated. Like, "That game looks like it's using all 256 colors onscreen", etc. And the SNES far exceeds the '256' color count onscreen when mosaic and the color add/sub layer are used. But I usually don't count that because it doesn't work in the same way as plotting a tile or sprite pixel, i.e. it's not the same direct pixel method.
First the PCE and the common SNES mode. Every tile or sprite onscreen is only 16 colors. That's it. You can't reference a color index higher than 15 within a tile because it only uses 4 bits per pixel. This is done because 4 bits takes half as much VRAM to store as 8 bits (not to mention the speed of fetching the pixel data during active display to build it out). But 16 colors really isn't going to cut it. So the workaround was to attach a palette number to each tile or sprite. This palette number indicates which 16 colors the tile or sprite is referring to. The video processor multiplies the palette number associated with that tile or sprite by 16 and adds the current 4bit pixel to it (or multiplexes them along with some conditional test-zero logic, but the end result is the same).
So if a tile is using palette # 3, then all the colors in that tile would range from 48+1 to 48+15 (or $30+$1 to $30+$F in hex). Color # 0 isn't shown.
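That lookup can be written out as a tiny Python sketch. The function name color_index is hypothetical, and returning None for pixel value 0 is just my way of modelling "isn't shown"; what actually appears there depends on what is behind the tile.

```python
COLORS_PER_SUBPALETTE = 16  # 4 bits per pixel


def color_index(palette_number, pixel):
    """Map a 4bit pixel plus the tile's palette number to an index into color RAM.

    Pixel value 0 is treated as 'not shown' (transparent), so it returns None.
    """
    if not 0 <= pixel < COLORS_PER_SUBPALETTE:
        raise ValueError("a 4bit pixel can only hold values 0..15")
    if pixel == 0:
        return None
    return palette_number * COLORS_PER_SUBPALETTE + pixel


if __name__ == "__main__":
    # Palette #3: visible colors run from 48+1 to 48+15 ($31 to $3F)
    print(color_index(3, 1), color_index(3, 15))
```

Multiplying by 16 is the same as placing the palette number in the upper bits of the index, so the lookup costs the video hardware nothing extra per pixel.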
A tile or sprite can't access colors from two separate sub-palettes. So if one tile needs 8 shades of green and also needs 4 shades of gray, and those shades of gray exist in another sub-palette (group of 15 colors) which doesn't have any of those shades of green, you're going to have to create a whole new sub-palette just for that tile. You've now wasted 4 color slots in the new sub-palette on colors that already exist in a different sub-palette. You get redundant colors throughout the sub-palettes. The smaller the number of sub-palettes a system has, the less efficient it becomes at storing unique colors, because of the redundant colors being stored. Sure, you can find a Genesis game that has 50 colors onscreen, near its maximum palette usage, but this still greatly limits how the graphics are designed. Ports tend to show this artifact more so. Games designed around the limitations will have greater color fidelity because they are designed around the limitation. Looking at Sonic 2, you'd think that the system would have no trouble handling 'ports' of other games (usually arcade). And when it doesn't happen, you hear stuff like "They didn't really try" or "lazy programmers/designers", etc. Hmm. I seem to have drifted onto talking about the Genesis. Back to the palette explanation.
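The green/gray case above can be counted out with a short Python sketch. The color names and the redundant_slots helper are made up for the example; the only numbers taken from the text are 8 greens, 4 grays and 15 usable colors per sub-palette.

```python
def redundant_slots(sub_palettes):
    """Slots spent on colors that are already stored in another sub-palette."""
    used_slots = sum(len(set(p)) for p in sub_palettes)
    unique_colors = len(set().union(*sub_palettes))
    return used_slots - unique_colors


# Hypothetical color names, just to make the sets distinguishable
grays = {f"gray{i}" for i in range(4)}
greens = {f"green{i}" for i in range(8)}
others = {f"other{i}" for i in range(11)}

existing = grays | others      # 15 colors, has the grays but none of the greens
new_for_tile = greens | grays  # the tile needs both, so the grays get stored again

if __name__ == "__main__":
    print(redundant_slots([existing, new_for_tile]))
```

With only 4 sub-palettes, every duplicated slot like this comes straight out of a 60-ish color budget; with 16 or 32 sub-palettes the same duplication is much easier to absorb.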
So the more sub-palettes the system can work with, the less restricted the system is in color usage. Does this affect storage? Slightly, yes. If you have a larger palette, you still have to store it somewhere, right? And that's true, but the sub-palette data itself is pretty tiny in comparison to many other structures or code in a game/program.
For one, you're not going to have a completely new palette for every color slot, for each level. But let's say you did. It takes 2 bytes to store a color. That's 32 bytes to store a single sub-palette. A 4bit 8x8 tile alone is 32 bytes. And a level will have hundreds and hundreds of tiles, but only 16 sub-palettes for SNES and 32 sub-palettes for PCE. It's not uncommon to have 512 tiles in a PCE game - that's 16k. And then you have sprites on top of that, which easily take up 32k or more. In comparison it takes only 1k for the whole set of PCE sub-palettes, and 0.5k for the SNES. Of course that's uncompressed. Even the simplest of compression schemes takes a full blown PCE palette down to 512 bytes + 32 bytes, or 0.53k.
But like I said, you're not going to have a totally new palette set per level. That's only the worst case scenario.
So you can see, it takes a fraction of a fraction of space for palette management and storage, and it costs no extra processing time by the video processor to output these additional colors.
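The uncompressed figures above are easy to check with a few lines of Python. The helper names are made up for the example; the inputs (2 bytes per color, 16 colors per sub-palette, 4bit 8x8 tiles, 512 tiles, 32 and 16 sub-palettes) are the ones quoted in the text.

```python
BYTES_PER_COLOR = 2
COLORS_PER_SUBPALETTE = 16


def subpalette_bytes(count):
    """Uncompressed storage for `count` sub-palettes."""
    return count * COLORS_PER_SUBPALETTE * BYTES_PER_COLOR


def tile_bytes(width=8, height=8, bits_per_pixel=4):
    """Uncompressed storage for one tile."""
    return width * height * bits_per_pixel // 8


if __name__ == "__main__":
    print("one sub-palette:", subpalette_bytes(1), "bytes")
    print("one 4bit 8x8 tile:", tile_bytes(), "bytes")
    print("512 tiles:", 512 * tile_bytes() // 1024, "k")
    print("PCE, 32 sub-palettes:", subpalette_bytes(32), "bytes")
    print("SNES, 16 sub-palettes:", subpalette_bytes(16), "bytes")
```

So a complete PCE palette set costs the same as 32 background tiles, which is the "fraction of a fraction" being described.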
There are PCE games that range into the 100-125 color count range and there are PCE games that range into the 20-40 range. The PCE's problem is that it can display so many colors onscreen that its 'main' palette definition is, relatively speaking, small, even though it's the same size as the Genesis's. Try to display all 482 out of the 512 colors on screen and not have it look like a rainbow fest.
So you're not going to see a 200+ color count on the PCE, but that doesn't mean the sub-palettes are going to waste. 50 colors onscreen for the Genesis is more restrictive than the same 50 colors onscreen for the PCE. Having more sub-palettes allows one to spread out the redundant colors without as much restriction on tiles or sprites.
Here's an example of an SF2 background that I did work on:
http://pcedev.net/sf2_hack/tech/sf2_guile.png
http://pcedev.net/sf2_hack/tech/sf2_guile_tilemap.png
It uses 62 colors, but 14 tile sub-palettes. Compare it to the original SF2 BG from the PCE SF2'CE game. It has more detail, more colors, and the same amount of storage. The first pic is the palette map overlaid on top of the image to show which tiles use which sub-palettes.
Secondarily to that, do you mean to say that 32 distinct color palettes would not take up more ROM than 4 would? That does not make sense to me at all. If, furthermore, 32 distinct color palettes were to be displayed on screen simultaneously, would they not require more bandwidth and RAM than 4 distinct palettes would?
That's just it, it doesn't require any more bandwidth. These console systems are not designed like PCs, where going from 8bit pixels to 16bit pixels required double the bandwidth, etc. It doesn't work like that. The Genesis already has provisions for handling an additional 4 sub-palettes just for sprites. It also has support for putting all the sub-palette ram external to the video processor and having the CPU interface with it without additional multiplexing logic.
PCE games were simplistic for design and appeal reasons, not hardware reasons. The same goes for Genesis and parallax. And the SNES under(?) usage of parallax. The SNES had a slow CPU, not a slow video processor. There are a lot of SNES games that use additional chips for speed-up reasons, and yet you didn't see the usage of parallax go up.
Why would sacrificing Shadow/Highlighting in the VDP make more sub palettes possible?
Well, features come down to chip real estate and cost. I personally think the Genesis would have benefited more from a separate 4 sub-palettes for sprites than from the shadow/highlight effect.
Note: I hope all this reads ok. It's late and I'm a bit tired.