Using regular expressions with pandas DataFrame/Series
Methods covered: str.match, str.contains, str.extract
pandas.Series.str.match
Series.str.match(pat, case=True, flags=0, na=nan, as_indexer=False)
Deprecated behavior: find groups in each string of the Series/Index using the passed regular expression. If as_indexer=True, instead determine whether each string matches the regular expression.
Parameters:
pat : string
Character sequence or regular expression
case : boolean, default True
If True, case sensitive
flags : int, default 0 (no flags)
re module flags, e.g. re.IGNORECASE
na : default NaN, fill value for missing values.
as_indexer : boolean, default False
If False (the default), gives the deprecated behavior, which is better achieved using str.extract. If True, return a boolean indexer.
Returns:
Series/array of boolean values if as_indexer=True
Series/Index of tuples if as_indexer=False (the default, but deprecated)
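The boolean-indexer use of str.match described above can be sketched as follows. This is a minimal illustration, not the library's reference example: the sample Series and the name `mask` are made up here, and it assumes a more recent pandas release in which str.match returns booleans directly, so `as_indexer` is not passed.

```python
import pandas as pd

# Hypothetical sample data, including a missing value.
s = pd.Series(["apple123", "banana", "123apple", None])

# str.match anchors the pattern at the start of each string (like re.match).
# na=False fills the result for missing values with False.
mask = s.str.match(r"[a-z]+\d+", na=False)
print(mask.tolist())     # [True, False, False, False]

# The boolean result can be used directly to filter the Series.
print(s[mask].tolist())  # ['apple123']
```

Only "apple123" matches, because the pattern must match starting at the first character; "123apple" contains the pattern's pieces but not in that position.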
pandas.Series.str.contains
Series.str.contains(pat, case=True, flags=0, na=nan, regex=True)
Return a boolean Series/array indicating whether the given pattern/regex is contained in each string of the Series/Index.
Parameters:
pat : string
Character sequence or regular expression
case : boolean, default True
If True, case sensitive
flags : int, default 0 (no flags)
re module flags, e.g. re.IGNORECASE
na : default NaN, fill value for missing values.
regex : bool, default True
If True, use re.search; otherwise use the Python "in" operator (literal substring test)
Returns:
contained : Series/array of boolean values
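A short sketch of how the `flags`, `case`, `na`, and `regex` parameters interact may help here. The sample strings and variable names are hypothetical and chosen only for illustration.

```python
import re
import pandas as pd

# Hypothetical sample data, including a missing value.
s = pd.Series(["Data Science", "database", "big data", None])

# regex=True (default): the pattern is applied with re.search semantics,
# so an explicit ^ is needed to anchor it to the start of the string.
starts_with_data = s.str.contains(r"^data", flags=re.IGNORECASE, na=False)
print(starts_with_data.tolist())  # [True, True, False, False]

# case=False makes the search case-insensitive without passing flags.
anywhere = s.str.contains("DATA", case=False, na=False)
print(anywhere.tolist())          # [True, True, True, False]

# regex=False treats the pattern as a literal substring,
# so "." matches only an actual dot rather than any character.
literal = pd.Series(["a.b", "axb"]).str.contains(".", regex=False)
print(literal.tolist())           # [True, False]
```

Passing na=False is a convenient choice when the result is used as a filter, since missing values would otherwise propagate into the mask.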
pandas.Series.str.extract
Series.str.extract(pat, flags=0, expand=None)
For each subject string in the Series, extract groups from the first match of regular expression pat.
New in version 0.13.0.
Parameters:
pat : string
Regular expression pattern with capturing groups
flags : int, default 0 (no flags)
re module flags, e.g. re.IGNORECASE
expand : bool, default False (new in version 0.18.0)
If True, return DataFrame.
If False, return Series/Index/DataFrame.
Returns:
DataFrame with one row for each subject string, and one column for
each group. Any capture group names in regular expression pat will
be used for column names; otherwise capture group numbers will be
used. The dtype of each result column is always object, even when
no match is found. If expand=False and pat has only one capture group,
then return a Series (if subject is a Series) or Index (if subject
is an Index).
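The return rules above can be illustrated with a small sketch. The input Series, the group names `letter` and `digit`, and the variable names are assumptions made for this example only.

```python
import pandas as pd

# Hypothetical sample data; the last two entries do not match the patterns below.
s = pd.Series(["a1", "b2", "c3", "zz"])

# Named capture groups become column names; rows without a match are NaN.
df = s.str.extract(r"(?P<letter>[ab])(?P<digit>\d)", expand=True)
print(list(df.columns))             # ['letter', 'digit']
print(df["digit"].isna().tolist())  # [False, False, True, True]

# With a single capture group and expand=False, the result is a Series.
digits = s.str.extract(r"[ab](\d)", expand=False)
print(type(digits).__name__)        # Series
print(digits.tolist()[:2])          # ['1', '2']
```

Passing `expand` explicitly keeps the return type predictable regardless of how many capture groups the pattern has.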