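Piping binary data in PowerShell with the same behavior as cmd
I ran into this problem when I tried to encode FLAC files with qaac.
qaac has no FLAC support by default, so I first convert the FLAC to WAV with ffmpeg and pipe the result to qaac.
The following command works in cmd but fails in PowerShell:
ffmpeg.exe -i .\input.flac -f wav - | qaac64.exe -
I then found the explanation and a workaround on Stack Overflow: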
https://stackoverflow.com/questions/59110563/different-behaviour-and-output-when-piping-through-cmd-and-powershell/59118502#59118502
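The cause is that PowerShell's pipeline passes String objects to external programs; it does not support passing raw file data directly.
This has already been reported as an issue on GitHub and added to the TODO list for version 7.1.
So for now PowerShell can only pipe text. Internally that text is held as UTF-16 .NET strings, and it is re-encoded using $OutputEncoding when it is sent on (ASCII by default in Windows PowerShell). Note also that PowerShell appends a trailing newline ("\r\n" on Windows) to the text by default.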
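To make the damage concrete, here is a minimal Python sketch that models a text-only pipeline. It is a simplified simulation, not PowerShell's actual implementation: the helper name `simulate_text_pipeline`, the choice of cp437 as the OEM code page, and the sample bytes are all assumptions made for illustration.

```python
import re


def simulate_text_pipeline(data: bytes, decode_cp: str = "cp437",
                           encode_cp: str = "ascii") -> bytes:
    """Rough model of bytes passing through a text-only pipeline (hypothetical helper)."""
    # Step 1: the producer's stdout is decoded into text (OEM code page assumed).
    text = data.decode(decode_cp)
    # Step 2: the text is split into lines, so the original line endings are lost.
    lines = re.split(r"\r\n|\r|\n", text)
    if lines and lines[-1] == "":
        lines.pop()  # a trailing newline does not produce an extra empty line
    # Step 3: each line is re-encoded and sent on with a CRLF appended.
    # Characters the target encoding cannot represent become "?".
    return "".join(line + "\r\n" for line in lines).encode(encode_cp, errors="replace")


# Made-up bytes that merely resemble the start of a WAV stream.
wav_like = b"RIFF\x24\x08\x00\x00WAVE\xff\x0a\x80"
print(simulate_text_pipeline(wav_like))  # no longer equal to wav_like
print(simulate_text_pipeline(b"foo"))    # plain text gains a trailing CRLF
```

Even in this simplified model the bytes above 0x7F are replaced and extra CRLF sequences appear, which is enough to break any binary format such as WAV.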
To summarize, there are two options:
1. Use cmd /c 'command'. This starts cmd, runs the command, and then exits.
For example:
cmd /c 'ffmpeg.exe -loglevel quiet -i "input.flac" -f wav - | qaac64.exe -'
2. Use a third-party module, or wait for the release of PowerShell 7.1.
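As one possible illustration of the second option, a small script can connect the two programs with an OS-level pipe so that no shell ever touches the bytes. The sketch below uses Python's standard `subprocess` module; the function name `pipe_raw` is hypothetical and this is not something the original answer proposes.

```python
import subprocess


def pipe_raw(producer_cmd, consumer_cmd) -> bytes:
    """Run `producer | consumer` over a raw OS pipe and return the consumer's stdout."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout,
                                stdout=subprocess.PIPE)
    # Close our copy of the pipe so the producer notices if the consumer exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output
```

Assuming ffmpeg.exe and qaac64.exe are on the PATH, the command from this post would become `pipe_raw(["ffmpeg.exe", "-loglevel", "quiet", "-i", "input.flac", "-f", "wav", "-"], ["qaac64.exe", "-"])`.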
Here is the original answer as well:
If you need raw byte handling and/or need to prevent PowerShell from situationally adding a trailing newline to your text data, avoid the PowerShell pipeline altogether.
Instead, shell out to cmd with /c:
cmd /c 'type .\test.txt | .\Crypt.exe --encrypt | .\Crypt.exe --decrypt'
Note that if you want to capture the output in a PowerShell variable, you need to make sure that [Console]::OutputEncoding matches your .\Crypt.exe program's (effective) output encoding (the active OEM code page), which should be true by default in this case; see the next section for details.
Generally, however, byte manipulation of text data is best avoided.
There are two separate problems, only one of which has a simple solution:
Problem 1: There is indeed an encoding problem, as you suspected:
PowerShell invisibly inserts itself as an intermediary in pipelines, even when sending data to and receiving data from external programs: It converts data from and to .NET strings (System.String), which are sequences of UTF-16 code units.
In order to send to and receive data from external programs, you need to match their character encoding; in your case, with a Windows console application that uses raw byte handling, the implied encoding is the system's active OEM code page.
- On sending data, PowerShell uses the encoding of the $OutputEncoding preference variable to encode (what is invariably treated as text) data, which defaults to ASCII(!) in Windows PowerShell, and (BOM-less) UTF-8 in PowerShell [Core].
- The receiving end is covered by default: PowerShell uses [Console]::OutputEncoding (which itself reflects the code page reported by chcp) for decoding data received, and on Windows this by default reflects the active OEM code page, both in Windows PowerShell and PowerShell [Core][1].
To fix your primary problem, you therefore need to set $OutputEncoding to the active OEM code page:
# Make sure that PowerShell uses the OEM code page when sending
# data to `.\Crypt.exe`
$OutputEncoding = [Console]::OutputEncoding
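The effect of the sending-side encoding can be seen in a short Python sketch. It assumes cp437 as the active OEM code page (check yours with chcp) and uses an arbitrary sample string; Python's `errors="replace"` stands in for the "?" substitution that an ASCII encoder performs on characters it cannot represent.

```python
text = "caf\u00e9"  # "café", an arbitrary sample containing one non-ASCII character

# ASCII cannot represent U+00E9, so it is replaced with "?".
as_ascii = text.encode("ascii", errors="replace")

# The assumed OEM code page (cp437) has a single byte for the same character.
as_oem = text.encode("cp437")

print(as_ascii)
print(as_oem)
```

A program that reads raw bytes in the OEM code page only receives the intended byte in the second case.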
Problem 2: PowerShell invariably appends a trailing newline to data that doesn't already have one when piping data to external programs:
That is, "foo" | .\Crypt.exe doesn't send (the $OutputEncoding-encoded bytes representing) "foo" to .\Crypt.exe's stdin, it sends "foo`r`n" on Windows; i.e., a (platform-appropriate) newline sequence (CRLF on Windows) is automatically and invariably appended (unless the string already happens to have a trailing newline).
This problematic behavior is discussed in this GitHub issue and also in this answer.
In your specific case, the implicitly appended "`r`n" is also subject to the byte-value-shifting, which means that the 1st Crypt.exe call transforms it to -*, causing another "`r`n" to be appended when the data is sent to the 2nd Crypt.exe call.
The net result is an extra newline that is round-tripped (the intermediate -*), plus an encrypted newline that results in φΩ.
In short: If your input data had no trailing newline, you'll have to cut off the last 4 characters from the result (representing the round-tripped and the inadvertently encrypted newline sequences):
# Ensure that .\Crypt.exe output is correctly decoded.
$OutputEncoding = [Console]::OutputEncoding
# Invoke the command and capture its output in variable $result.
# Note the use of the `Get-Content` cmdlet; in PowerShell, `type`
# is simply a built-in *alias* for it.
$result = Get-Content .\test.txt | .\Crypt.exe --decrypt | .\Crypt.exe --encrypt
# Remove the last 4 chars. and print the result.
$result.Substring(0, $result.Length - 4)
Given that calling cmd /c as shown at the top of the answer works too, that hardly seems worth it.
How PowerShell handles pipeline data with external programs:
Unlike cmd (or POSIX-like shells such as bash):
- PowerShell doesn't support raw byte data in pipelines.[2]
- When talking to external programs, it only knows text (whereas it passes .NET objects when talking to PowerShell's own commands, which is where much of its power comes from).
Specifically, this works as follows:
- When you send data to an external program via the pipeline (to its stdin stream):
  - It is converted to text (strings) using the character encoding specified in the $OutputEncoding preference variable, which defaults to ASCII(!) in Windows PowerShell, and (BOM-less) UTF-8 in PowerShell [Core].
    - Caveat: If you assign an encoding with a BOM to $OutputEncoding, PowerShell (as of v7.0) will emit the BOM as part of the first line of output sent to an external program; therefore, for instance, do not use [System.Text.Encoding]::Utf8 (which emits a BOM) in Windows PowerShell, and use [System.Text.Utf8Encoding]::new($false) (which doesn't) instead.
    - If the data is not captured or redirected by PowerShell, encoding problems may not always become apparent, namely if an external program is implemented in a way that uses the Windows Unicode console API to print to the display.
  - Something that isn't already text (a string) is stringified using PowerShell's default output formatting (the same format you see when you print to the console), with an important caveat:
    - If the (last) input object already is a string that doesn't itself have a trailing newline, one is invariably appended (and even an existing trailing newline is replaced with the platform-native one, if different).
    - This behavior can cause problems, as discussed in this GitHub issue and also in this answer.
- When you capture / redirect data from an external program (from its stdout stream), it is invariably decoded as lines of text (strings), based on the encoding specified in [Console]::OutputEncoding, which defaults to the active OEM code page on Windows (surprisingly, in both PowerShell editions, as of v7.0-preview6[1]).
- PowerShell-internally text is represented using the .NET System.String type, which is based on UTF-16 code units (often loosely, but incorrectly called "Unicode"[3]).
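The BOM caveat and the remark about UTF-16 code units in the list above can be illustrated at the byte level. This is a Python analogy, not PowerShell code: the codec "utf-8-sig" stands in for a UTF-8 encoding that emits a BOM, "utf-8" for one that does not, and the sample strings are arbitrary.

```python
import codecs

line = "foo\r\n"

# An encoding that emits a BOM puts three extra bytes in front of the first line.
with_bom = line.encode("utf-8-sig")
without_bom = line.encode("utf-8")
print(with_bom)
print(without_bom)
print(with_bom == codecs.BOM_UTF8 + without_bom)

# One character outside the Basic Multilingual Plane occupies two UTF-16 code units.
emoji = "\U0001F600"
utf16_code_units = len(emoji.encode("utf-16-le")) // 2
print(utf16_code_units)
```

An external program that expects plain bytes on stdin would see the three BOM bytes as part of its input, which is why the BOM-less variant is the safer choice.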
The above also applies:
- when piping data between external programs,
- when data is redirected to a file; that is, irrespective of the source of the data and its original character encoding, PowerShell uses its default encoding(s) when sending data to files; in Windows PowerShell, > produces UTF-16LE-encoded files (with BOM), whereas PowerShell [Core] sensibly defaults to BOM-less UTF-8 (consistently, across file-writing cmdlets).
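For a sense of what the two file-redirection defaults mean on disk, the following Python sketch builds the bytes each would produce for the same short text. The sample text is an assumption, and the byte strings are constructed by hand rather than captured from PowerShell.

```python
import codecs

text = "foo\r\n"

# UTF-16LE with BOM: 2 BOM bytes plus 2 bytes per character here.
utf16le_with_bom = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# BOM-less UTF-8: 1 byte per ASCII character, nothing in front.
utf8_no_bom = text.encode("utf-8")

print(len(utf16le_with_bom), utf16le_with_bom)
print(len(utf8_no_bom), utf8_no_bom)
```

Either way the file holds re-encoded text rather than the original bytes, so redirecting binary output such as WAV data with > is as unsafe as piping it.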
Adding support for raw data passing between external programs and to-file redirections is the subject of this GitHub issue.