str_word_count

(PHP 4 >= 4.3.0, PHP 5, PHP 7, PHP 8)

str_word_count返回字符串中单词的使用情况

说明

str_word_count(string $string, int $format = 0, ?string $characters = null): array|int

统计 string 中单词的数量。如果可选的参数 format 没有被指定,那么返回值是一个代表单词数量的整型数。如果指定了 format 参数,返回值将是一个数组,数组的内容则取决于 format 参数。format 的可能值和相应的输出结果如下所列。

对于这个函数的目的来说,单词的定义是一个与区域设置相关的字符串。这个字符串可以包含字母字符,也可以包含 "'" 和 "-" 字符(但不能以这两个字符开始)。 请注意,不支持多字节编码的字符串。

参数

string

字符串。

format

指定函数的返回值。当前支持的值如下:

  • 0 - 返回单词数量
  • 1 - 返回一个包含 string 中全部单词的数组
  • 2 - 返回关联数组。数组的键是单词在 string 中出现的数值位置,数组的值是这个单词

characters

附加的字符串列表,其中的字符将被视为单词的一部分。

返回值

返回一个数组或整型数,这取决于 format 参数的选择。

更新日志

版本 说明
8.0.0 characters 可为空(Nullable)类型。

示例

示例 #1 str_word_count() 示例

<?php

$str
= "Hello fri3nd, you're
looking good today!"
;

print_r(str_word_count($str, 1));
print_r(str_word_count($str, 2));
print_r(str_word_count($str, 1, 'àáãç3'));

echo
str_word_count($str);

?>

以上示例会输出:

Array
(
    [0] => Hello
    [1] => fri
    [2] => nd
    [3] => you're
    [4] => looking
    [5] => good
    [6] => today
)

Array
(
    [0] => Hello
    [6] => fri
    [10] => nd
    [14] => you're
    [29] => looking
    [46] => good
    [51] => today
)

Array
(
    [0] => Hello
    [1] => fri3nd
    [2] => you're
    [3] => looking
    [4] => good
    [5] => today
)

7

参见

添加备注

用户贡献的备注 11 notes

up
40
cito at wikatu dot com
14 years ago
<?php

/***
 * This simple utf-8 word count function (it only counts) 
 * is a bit faster then the one with preg_match_all
 * about 10x slower then the built-in str_word_count
 * 
 * If you need the hyphen or other code points as word-characters
 * just put them into the [brackets] like [^\p{L}\p{N}\'\-]
 * If the pattern contains utf-8, utf8_encode() the pattern,
 * as it is expected to be valid utf-8 (using the u modifier).
 **/

// Jonny 5's simple word splitter
function str_word_count_utf8($str) {
  return count(preg_split('~[^\p{L}\p{N}\']+~u',$str));
}
?>
up
17
splogamurugan at gmail dot com
17 years ago
We can also specify a range of values for charlist.

<?php
$str = "Hello fri3nd, you're
       looking          good today! 
       look1234ing";
print_r(str_word_count($str, 1, '0..3'));
?>

will give the result as 

Array ( [0] => Hello [1] => fri3nd [2] => you're [3] => looking [4] => good [5] => today [6] => look123 [7] => ing )
up
1
Adeel Khan
18 years ago
<?php

/**
 * Returns the number of words in a string.
 * As far as I have tested, it is very accurate.
 * The string can have HTML in it,
 * but you should do something like this first:
 *
 *    $search = array(
 *      '@<script[^>]*?>.*?</script>@si',
 *      '@<style[^>]*?>.*?</style>@siU',
 *      '@<![\s\S]*?--[ \t\n\r]*>@'
 *    );
 *    $html = preg_replace($search, '', $html);
 *
 */

function word_count($html) {

  # strip all html tags
  $wc = strip_tags($html);

  # remove 'words' that don't consist of alphanumerical characters or punctuation
  $pattern = "#[^(\w|\d|\'|\"|\.|\!|\?|;|,|\\|\/|\-|:|\&|@)]+#";
  $wc = trim(preg_replace($pattern, " ", $wc));

  # remove one-letter 'words' that consist only of punctuation
  $wc = trim(preg_replace("#\s*[(\'|\"|\.|\!|\?|;|,|\\|\/|\-|:|\&|@)]\s*#", " ", $wc));

  # remove superfluous whitespace
  $wc = preg_replace("/\s\s+/", " ", $wc);

  # split string into an array of words
  $wc = explode(" ", $wc);

  # remove empty elements
  $wc = array_filter($wc);

  # return the number of words
  return count($wc);

}

?>
up
1
manrash at gmail dot com
17 years ago
For spanish speakers a valid character map may be:

<?php
$characterMap = 'áéíóúüñ';

$count = str_word_count($text, 0, $characterMap);
?>
up
1
uri at speedy dot net
13 years ago
Here is a count words function which supports UTF-8 and Hebrew. I tried other functions but they don't work. Notice that in Hebrew, '"' and '\'' can be used in words, so they are not separators. This function is not perfect, I would prefer a function we are using in JavaScript which considers all characters except [a-zA-Zא-ת0-9_\'\"] as separators, but I don't know how to do it in PHP.

I removed some of the separators which don't work well with Hebrew ("\x20", "\xA0", "\x0A", "\x0D", "\x09", "\x0B", "\x2E"). I also removed the underline.

This is a fix to my previous post on this page - I found out that my function returned an incorrect result for an empty string. I corrected it and I'm also attaching another function - my_strlen.

<?php 

function count_words($string) {
    // Return the number of words in a string.
    $string= str_replace("&#039;", "'", $string);
    $t= array(' ', "\t", '=', '+', '-', '*', '/', '\\', ',', '.', ';', ':', '[', ']', '{', '}', '(', ')', '<', '>', '&', '%', '$', '@', '#', '^', '!', '?', '~'); // separators
    $string= str_replace($t, " ", $string);
    $string= trim(preg_replace("/\s+/", " ", $string));
    $num= 0;
    if (my_strlen($string)>0) {
        $word_array= explode(" ", $string);
        $num= count($word_array);
    }
    return $num;
}

function my_strlen($s) {
    // Return mb_strlen with encoding UTF-8.
    return mb_strlen($s, "UTF-8");
}

?>
up
1
brettNOSPAM at olwm dot NO_SPAM dot com
23 years ago
This example may not be pretty, but It proves accurate:

<?php
//count words
$words_to_count = strip_tags($body);
$pattern = "/[^(\w|\d|\'|\"|\.|\!|\?|;|,|\\|\/|\-\-|:|\&|@)]+/";
$words_to_count = preg_replace ($pattern, " ", $words_to_count);
$words_to_count = trim($words_to_count);
$total_words = count(explode(" ",$words_to_count));
?>

Hope I didn't miss any punctuation. ;-)
up
0
php dot net at salagir dot com
8 years ago
This function doesn't handle  accents, even in a locale with accent.
<?php
echo str_word_count("Is working"); // =2

setlocale(LC_ALL, 'fr_FR.utf8');
echo str_word_count("Not wôrking"); // expects 2, got 3.
?>

Cito solution treats punctuation as words and thus isn't a good workaround.
<?php
function str_word_count_utf8($str) {
      return count(preg_split('~[^\p{L}\p{N}\']+~u',$str));
}
echo str_word_count_utf8("Is wôrking"); //=2
echo str_word_count_utf8("Not wôrking."); //=3
?>

My solution:
<?php
function str_word_count_utf8($str) {
    $a = preg_split('/\W+/u', $str, -1, PREG_SPLIT_NO_EMPTY);
    return count($a);
}
echo str_word_count_utf8("Is wôrking"); // = 2
echo str_word_count_utf8("Is wôrking! :)"); // = 2
?>
up
0
dmVuY2lAc3RyYWhvdG5pLmNvbQ== (base64)
15 years ago
to count words after converting a msword document to plain text with antiword, you can use this function:

<?php
function count_words($text) {
    $text = str_replace(str_split('|'), '', $text); // remove these chars (you can specify more)
    $text = trim(preg_replace('/\s+/', ' ', $text)); // remove extra spaces
    $text = preg_replace('/-{2,}/', '', $text); // remove 2 or more dashes in a row
    $len = strlen($text);
    
    if (0 === $len) {
        return 0;
    }
    
    $words = 1;
    
    while ($len--) {
        if (' ' === $text[$len]) {
            ++$words;
        }
    }
    
    return $words;
}
?>

it strips the pipe "|" chars, which antiword uses to format tables in its plain text output, removes more than one dashes in a row (also used in tables), then counts the words.

counting words using explode() and then count() is not a good idea for huge texts, because it uses much memory to store the text once more as an array. this is why i'm using while() { .. } to walk the string
up
0
brettz9 - see yahoo
15 years ago
Words also cannot end in a hyphen unless allowed by the charlist...
up
0
charliefrancis at gmail dot com
16 years ago
Hi this is the first time I have posted on the php manual, I hope some of you will like this little function I wrote.

It returns a string with a certain character limit, but still retaining whole words.
It breaks out of the foreach loop once it has found a string short enough to display, and the character list can be edited.

<?php
function word_limiter( $text, $limit = 30, $chars = '0123456789' ) {
    if( strlen( $text ) > $limit ) {
        $words = str_word_count( $text, 2, $chars );
        $words = array_reverse( $words, TRUE );
        foreach( $words as $length => $word ) {
            if( $length + strlen( $word ) >= $limit ) {
                array_shift( $words );
            } else {
                break;
            }
        }
        $words = array_reverse( $words );
        $text = implode( " ", $words ) . '&hellip;';
    }
    return $text;
}

$str = "Hello this is a list of words that is too long";
echo '1: ' . word_limiter( $str );
$str = "Hello this is a list of words";
echo '2: ' . word_limiter( $str );
?>

1: Hello this is a list of words&hellip;
2: Hello this is a list of words
up
0
MadCoder
20 years ago
Here's a function that will trim a $string down to a certian number of words, and add a...   on the end of it.
(explansion of muz1's 1st 100 words code)

----------------------------------------------
<?php
function trim_text($text, $count){
$text = str_replace("  ", " ", $text);
$string = explode(" ", $text);
for ( $wordCounter = 0; $wordCounter <= $count;wordCounter++ ){ 
$trimed .= $string[$wordCounter];
if ( $wordCounter < $count ){ $trimed .= " "; }
else { $trimed .= "..."; }
}
$trimed = trim($trimed);
return $trimed;
}
?>

Usage
------------------------------------------------
<?php
$string = "one two three four";
echo trim_text($string, 3);
?>

returns:
one two three...
To Top