Not to be a disappointment, but for what you want to do you need professional-grade tools and hardware, which are quite expensive: the same kind of equipment used to dub TV series into other languages. Even that equipment degrades the background sound a little, or hides the damage just enough that it is not noticeable at normal volumes.
Try comparing the background sound of a series in its original form with the dubbed version and you'll notice the difference. The same happens the other way around, when trying to separate the voices from the background. The only exception is when the company that produced the original series provides the background mix to the company doing the dubbing; then the background sound is not altered and the new voices are simply added to the mix.
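To make the "background mix" exception concrete, here is a toy sketch in Python. It assumes that mixing is plain sample-by-sample addition and uses a handful of made-up sample values; the track names and the helper functions `mix` and `subtract` are hypothetical, and real dubbing tools are far more involved than this.

```python
# Toy illustration only: each "track" is a few made-up sample values.
background = [0.10, -0.20, 0.05, 0.30]      # hypothetical background mix
original_voice = [0.40, 0.10, -0.30, 0.00]  # hypothetical original dialogue
dubbed_voice = [0.25, -0.05, 0.20, 0.10]    # hypothetical dubbed dialogue


def mix(a, b):
    """Mix two tracks by adding them sample by sample."""
    return [x + y for x, y in zip(a, b)]


def subtract(a, b):
    """Remove track b from track a, sample by sample."""
    return [x - y for x, y in zip(a, b)]


# What the viewer of the original series hears.
original_mix = mix(background, original_voice)

# With the background mix supplied, the dub is a clean re-mix:
# the background is untouched and the new voices are just added.
dubbed_mix = mix(background, dubbed_voice)

# With the background mix in hand, the voice can also be recovered exactly.
recovered_voice = subtract(original_mix, background)

# Without it, only original_mix is known, and many different
# (background, voice) pairs add up to the same samples. For example:
other_background = [0.50, -0.10, -0.25, 0.30]
other_voice = subtract(original_mix, other_background)
same_mix = mix(other_background, other_voice)
```

In this sketch `same_mix` matches `original_mix` even though the two "backgrounds" are different, which is the heart of the problem: with only the finished mix, the tool has to guess which part is voice and which is background.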
Having an MP3 file doesn't make things easier either. An MP3 file sounds good, but it is nowhere near as faithful as the original recording: to create a small file, an MP3 encoder throws away a lot of audio data and detail. The result has a lot of missing data but sounds "just about right".
An MP3 file is not the best format for professional recording work; once it is edited and re-encoded for playback on standard devices, the result will sound noticeably more degraded.
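For a rough sense of how much smaller an MP3 is than the uncompressed audio, here is a back-of-the-envelope calculation in Python. It assumes CD-quality source audio (44.1 kHz, 16-bit, stereo) and a 128 kbit/s MP3; both figures are my assumptions for illustration, not details of your file. Not all of the size reduction is discarded detail, since part of it comes from more efficient coding, so treat the result as a rough indication only.

```python
# Assumed source: CD-quality PCM audio.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

# Assumed MP3 bitrate (a common setting, not taken from the question).
MP3_BITRATE_BPS = 128_000

# Bits per second needed to store the uncompressed audio.
cd_bitrate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS

# Fraction of the original bit budget the MP3 has to work with.
kept_fraction = MP3_BITRATE_BPS / cd_bitrate_bps

print(f"Uncompressed: {cd_bitrate_bps} bit/s")
print(f"MP3:          {MP3_BITRATE_BPS} bit/s")
print(f"MP3 uses about {kept_fraction:.1%} of the original bit rate")
```

Under these assumptions the MP3 gets by on roughly a tenth of the original bit rate, which is why there is so little left for a separation tool to work with.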
(Just in case: I have worked on sound and video editing for TV and universities before. I'm not a pro, but I know enough about how things work and what does and doesn't work.)