Lebowtech Blog

Count Words in HTML Files


We created a new translation system for a project in which the HTML files themselves are sent off for translation. This lets us change the layout where needed to fit a specific language.

The translation company needs to know the word count to give us a price quote, and we don’t want them to count the tags.

Here is a one-liner in fish shell that should get the word count for all the .phtml files.

cat (find . -iname "*.phtml"  ) | w3m -dump -T text/html  | wc -w

In bash you would probably replace the parens with backticks or $( ).
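For bash users, here is a rough sketch of the same count wrapped in a function. The name count_html_words is made up for this post, and it swaps the backticks for find -exec, which should cope better with file names containing spaces and simply reports 0 when nothing matches. It assumes w3m is installed, just like the fish version.

```shell
# Hypothetical helper (the name is ours, not a standard command):
# count the words in every .phtml file under a directory, ignoring tags.
count_html_words() {
    # find -exec hands the file names straight to cat, so names with
    # spaces are safe and no command substitution is needed.
    find "${1:-.}" -iname '*.phtml' -exec cat {} + |
        w3m -dump -T text/html |
        wc -w
}
```

Calling count_html_words with no argument starts from the current directory; passing a directory as the first argument starts from there instead.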

We are also using PO files for short text throughout the site. The translator we are using doesn’t know how to work with PO files, so we used po2csv to create CSV files that they can open in a spreadsheet, although it makes me wonder what type of translation company can’t use PO files.

Here is some fish shell code to make the CSV files, bundle them into a zip, and send them off:

# for each lang directory make a csv dir and put the new csv's in there
for lang in ??
    echo $lang
    mkdir $lang.csvs
    po2csv $lang $lang.csvs
end

# zip up our csv and po files
find . -name "??_messages.*" | sort | zip csvsAndPos -@

# mail it off
mutt -a csvsAndPos.zip