Putting together a 3D image is very different from processing 2D imagery. Instead of being a painter, you work more like a studio photographer, arranging objects carefully in front of your computerized camera. Instead of a canvas, your screen is a window into this strange alternate reality, where you can manipulate things only through the proxy of your mouse or tablet pen.
To build a 3D image, you first have to have objects to put in it. (Strictly speaking, the image itself is usually 2D, like most images; it is the elements that make up the image that you construct in 3D space.) To get these objects, you must model them. The modelling process consists of describing the salient features of the object to the computer, with the result typically represented in a vector graphics format.
Depending on what type of software you're using, this might consist of building a kind of mechanical drawing of the object using a special "modeller" program. Or you might describe the object in text, using a simple computer language. Once the shape of the object is defined, the surface properties are set up. These include the object's colour, texture, and roughness.
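To give a feel for the "simple computer language" route, here is a minimal sketch in Python. The class names (Material, Sphere) and their fields are made up for illustration and do not belong to any particular modelling package; real scene-description languages have their own syntax, but the idea is the same: first the shape, then the surface properties.

```python
from dataclasses import dataclass

@dataclass
class Material:
    """Surface properties of an object (hypothetical fields, for illustration)."""
    colour: tuple      # (red, green, blue), each from 0.0 to 1.0
    texture: str       # name of a surface pattern, e.g. "plain" or "marble"
    roughness: float   # 0.0 = mirror-smooth, 1.0 = completely matte

@dataclass
class Sphere:
    """A shape described by a few numbers rather than by pixels."""
    centre: tuple      # (x, y, z) position in 3D space
    radius: float
    material: Material

# First define the surface, then the shape that wears it.
red_plastic = Material(colour=(0.8, 0.1, 0.1), texture="plain", roughness=0.3)
ball = Sphere(centre=(0.0, 1.0, 5.0), radius=1.0, material=red_plastic)

print(ball)
```

Note that the whole object is a handful of numbers and a name. Nothing here says what the ball looks like from any particular angle; that is worked out later, when the picture is rendered.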
Now that the object is built, it can be positioned in front of the virtual camera. You are often shown a view through the viewfinder of the camera, with other views down each major axis (X, Y and Z). These secondary views let you move the objects with precision. The viewfinder is a flat picture of a 3D space, so from that view alone the software cannot determine exactly how you mean to move a particular object: the same drag of the mouse could mean many different movements in depth.
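The ambiguity of the viewfinder can be shown with a little arithmetic. The sketch below assumes a simple pinhole camera sitting at the origin and looking down the Z axis; the function names project and top_view are illustrative only. Two points at different distances land on exactly the same spot in the viewfinder, while a view down the Y axis tells them apart.

```python
def project(point, focal_length=1.0):
    """Perspective-project a point (x, y, z) onto the camera's image plane.

    Assumes a pinhole camera at the origin looking down +Z, with z > 0.
    """
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)

def top_view(point):
    """View straight down the Y axis: keep x and z, discard height."""
    x, y, z = point
    return (x, z)

near = (1.0, 0.5, 2.0)
far = (2.0, 1.0, 4.0)   # same direction from the camera, twice as far away

print(project(near), project(far))    # identical positions in the viewfinder
print(top_view(near), top_view(far))  # clearly different from above
```

Because every point along a line of sight collapses to one position on screen, a click in the viewfinder pins down only two of the three coordinates. The axis views supply the missing one.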
Once the objects are all arranged, it's time to turn on the lights. There is no shortage of virtual lights to go with your virtual camera. However, lighting is one of the most difficult arts to master. Simply repositioning a light can change a dramatic, moody scene into a direct product shot. Any book on theatrical or photographic lighting is helpful here.
The process of actually snapping the photo is called rendering. For speed and convenience during the setup process (everything described above), the computer only shows simplified representations of the objects you're moving around. When you're actually ready to get a proper picture, the machine suddenly lets the hammer down on all that floating point math hardware you hardly ever use. It calculates a simulation of the light leaving the light sources, bouncing around your scene and off the objects and then into your camera, exposing a virtual film plate. (That's why a render slows down if you add more lights: there are many more shading computations to do for each object in the scene.)
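To make the cost of extra lights concrete, here is a deliberately tiny renderer in Python. It is a sketch under several assumptions, not how any particular package works: the scene is a single sphere, there are no bounces or shadows, surfaces are shaded with a simple matte (Lambert) rule, and rays are followed from the camera out into the scene rather than from the lights, a common shortcut that avoids simulating light that never reaches the camera. All function names are illustrative.

```python
import math

def normalise(v):
    """Scale a vector to length 1."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def hit_sphere(origin, direction, centre, radius):
    """Distance along a unit-length ray to the nearest hit, or None if it misses."""
    oc = [o - c for o, c in zip(origin, centre)]
    b = sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 0 else None

def render(width, height, centre, radius, lights):
    """Render one sphere; return (image, number of per-light shading calculations)."""
    image, evaluations = [], 0
    for row in range(height):
        line = []
        for col in range(width):
            # Ray from a pinhole camera at the origin through this pixel, looking down +Z.
            x = (col + 0.5) / width * 2 - 1
            y = 1 - (row + 0.5) / height * 2
            direction = normalise([x, y, 1.0])
            t = hit_sphere([0.0, 0.0, 0.0], direction, centre, radius)
            brightness = 0.0
            if t is not None:
                point = [t * d for d in direction]
                normal = normalise([p - c for p, c in zip(point, centre)])
                # Every light adds one more calculation for every visible point.
                for light in lights:
                    to_light = normalise([l - p for l, p in zip(light, point)])
                    brightness += max(0.0, sum(n * l for n, l in zip(normal, to_light)))
                    evaluations += 1
            line.append(min(1.0, brightness))
        image.append(line)
    return image, evaluations

sphere_centre, sphere_radius = [0.0, 0.0, 3.0], 1.0
_, one_light = render(32, 32, sphere_centre, sphere_radius, [[5.0, 5.0, 0.0]])
_, three_lights = render(32, 32, sphere_centre, sphere_radius,
                         [[5.0, 5.0, 0.0], [-5.0, 5.0, 0.0], [0.0, -5.0, 0.0]])
print(one_light, three_lights)  # the second count is three times the first
```

Even in this toy, the shading work grows in step with the number of lights. A full renderer also has to work out shadows and reflections for each light, so the real cost climbs faster still.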
Because of the exacting nature of the process, it can take a long time. Many early artists waited days or weeks for their virtual snaps to develop. Thanks to the rising tide of processor speed and memory capacity, you probably won't have to wait that long, but be prepared to go get a cup of coffee while the image cooks. Depending on how complicated your scene is, you might just want to step out for some lunch...