TextCon 1.6
the ASCII File Converter

CrossCourt Systems
May 15, 1988


TABLE OF CONTENTS

PURPOSE
USER LICENSING
FUNCTIONS
FILE FORMATS
USE
GENERAL CONVERSION PROBLEMS
OPTIONS
   BASIC OPTIONS
      /W        sloppy Wordstar input
      /R        soft Returns in output file
      /S#       Split long lines
      /K        Keep selected elements
   LITTLE-USED, SPECIAL-PURPOSE OPTIONS
      /T#       replace Tabs with spaces
      /1, /2    line spacing of input file
      /P#       Paragraph spacing in output file
      /B        Block-style input file
      /I#       first-line Indent in output file
      /X, /Y    line-ending hYphen processing
   EXPERT OPTIONS
      /L#       cutoff Length
      /H#, /F#  remove Headers or Footers
      /Z#       end-of-paragraph marker in input file
OTHER USES FOR TextCon
REVISION HISTORY
DISTRIBUTION AND SUPPORT


PURPOSE:

Virtually all word processors can import ASCII files, but anyone who has tried it knows that the results are often less than optimal. Imported ASCII files almost always require a great deal of manual "cleaning up" to get them into the desired format. The most common problems include unwanted hard carriage returns, extra blank spaces, and extra blank lines.

TextCon is a file pre-processor for MSDOS computers that does most of this cleaning up for you, before you import the file to your word processor.
The ASCII files that it produces are in a form that is much more suitable for importation to most word processors. TextCon does not eliminate all manual editing, but it makes the job much easier.

TextCon has tremendous power and flexibility that can also be useful for other tasks involving data base files, desktop publishing, and program editing. TextCon users have found the program helpful for many kinds of file manipulations, such as WordStar-to-ASCII conversion, adding line feeds where only carriage returns are present, expanding tabs to spaces, removing all blank lines from a file, etc.

The program is written in C, using the DeSmet C compiler, and runs on most MSDOS machines.


USER LICENSING:

TextCon is user-supported software that is distributed free of charge through electronic bulletin boards and by individual users sharing it with each other. It is expected that anyone who uses it on a regular basis will pay a license fee to CrossCourt Systems in compensation for the time invested in developing and supporting it. Payment can be made by Visa or MasterCard, if you wish.

Chris Wolf, the author of TextCon, is a member of the Association of Shareware Professionals, an organization dedicated to promotion of the concept of shareware in conjunction with the highest possible standards of software quality and user service.

In the spirit of shareware, the value of TextCon is left to you, the user, to determine. However, those who send payment of $25 or more for TextCon will be sent TextDCA, which has all the features of TextCon, but will also write files in IBM DCA/RFT format. This allows formatted ASCII or WordStar files to be imported to your word processor with spacing characteristics such as margins, indents, centering, tab stops, etc. intact. TextDCA includes full printed documentation.
If you use a word processor that accepts DCA/RFT format files (this includes Word Perfect, Microsoft Word, IBM Displaywrite, MultiMate, Volkswriter 3, WordStar 2000, and many others), TextDCA is simply the best program available for importing formatted ASCII files. This can save you a great deal of time that would otherwise be spent reformatting the imported file. TextDCA also includes a menu-driven mode (on PC-compatibles only) which simplifies the selection of processing options.


FUNCTIONS:

The functions performed by TextCon fall into five main categories:

1. Removing carriage returns

The most common problem when importing ASCII files into word processors is that each line from the original file will end in a "hard" carriage return. In most cases these have to be removed manually in order to get the document formatted properly on the new word processor.

TextCon uses a sophisticated algorithm to determine which sections of text constitute "paragraphs", and then it removes all carriage returns except those at the ends of paragraphs. (For this purpose, a paragraph is defined as a block of text where it is desirable for words to wrap to following or previous lines when editing or formatting changes are made.)

TextCon can cope with almost any paragraph format including difficult ones like fully indented (nested), hanging indent, outline style, etc. It does not depend on double spacing or first-line indentation, although these are recognized. It will handle print-formatted files (i.e., those having a left margin of blanks), as well as the totally unformatted files used as input to formatters like NROFF (as long as they use "dot" commands). It is designed to recognize header lines, tables, etc. and will avoid reformatting them.

(Of course, document formats vary widely, and it really takes human intelligence to recognize paragraph breaks with 100% accuracy. TextCon will occasionally make mistakes when dealing with particularly tricky formats.)

2. Adding carriage returns

TextCon will also do the opposite process if you wish, adding carriage returns to files that have them only at the ends of paragraphs. Or it can deal with special file formats by substituting carriage returns for some other special character that is used to represent a paragraph end.

3. Removing blanks

An ASCII file may have extraneous blanks that cause problems if they are imported to a word processor. There may be blanks at the beginnings of lines for a left margin or for indented or nested text. Files from mainframes often have lines with trailing blanks. There may be extra blanks within lines for justified text or between columns of a table (where tabs are more desirable). TextCon removes extraneous blanks and, where appropriate, replaces them with tabs, thus saving manual editing time.

4. Removing extraneous lines

Some ASCII files have extraneous lines that TextCon will remove. Print-formatted files, for example, sometimes have additional lines inserted solely for underlining or boldface. TextCon will remove these so you don't have to. TextCon recognizes double-spaced files and converts them to single spacing. Lines that consist solely of "dot" commands (like WordStar's .PA) are converted to blank lines. You can also remove (or add) lines by setting the spacing between paragraphs to any specific number of blank lines you wish.

Consecutive runs of more than two blank lines are reduced to two, which may help with files that have been formatted for a printer. TextCon also tries to recognize page breaks and eliminate all blank lines between pages if possible. The situation is more complicated if the file contains headers or footers, but there is an option which can, in certain cases, remove these as well.

5. Removing or converting characters

TextCon translates all characters in WordStar files to their ASCII equivalents.
It also removes all non-printing ASCII characters (except tabs) unless you ask that they be kept. It has three optional methods for dealing with line-ending hyphens. TextCon does not alter or remove the IBM extended ASCII characters, used for math symbols, letters from foreign alphabets, etc.


FILE FORMATS:

TextCon was designed to be as automatic as possible in its operation so it can be used by someone with very little knowledge about the files being converted. Although it has many options for specialized kinds of conversions, it will work very well on a wide variety of files without the use of any of the options. The options are described in a later section.

If TextCon doesn't seem to work quite as you want it to, first read the section on GENERAL CONVERSION PROBLEMS. If this doesn't help, you may want to read the option descriptions. They include more detailed information about the kinds of changes TextCon makes to a file and how you can control these changes.

TextCon is useful for cleaning up text files before importing them to many microcomputer word processors, including Microsoft Word, Word Perfect, and IBM Displaywrite, as well as some office automation systems, such as NBI. If a word processor exhibits problems with "hard carriage returns" when you import ASCII files, then the chances are that TextCon will help.

Some PC word processors, including Volkswriter, MultiMate, and PC-Write, actually require the hard returns when importing an ASCII file, and have trouble with files that do not include them. TextCon can add carriage returns to files so that these word processors can import them successfully. This is not necessary, of course, if you use TextDCA to create a DCA/RFT file for importation to these programs.

TextCon is designed primarily to read ASCII files, but it will also accept WordStar files, including those from version 4.0.
However, it is not optimized for WordStar files to the extent that it can translate dot commands or embedded codes for underlining, boldface, etc. Both the commands and the codes are simply removed from the file, and you have to manually add the appropriate formatting to the new document. (The /KC option will keep codes in the converted file, and the /KD option will do the same thing for dot commands.)

If you are importing formatted ASCII files to a word processor that will accept DCA/RFT format files, the TextDCA version of TextCon offers an even higher level of performance than TextCon itself.


USE:

To run the program, use the command form:

   TEXTCON [options] infile outfile

The "options" designate specific types of processing that you want done, and are chosen from the list in a later section of this manual. If you specify an illegal option, such as /A or /?, the program will display the legal options. Again, TextCon will handle most conversions very well with no options, so you don't have to read about all of the options in order to use the program.

The file names can include the disk-drive identifier and a path name, if appropriate. The file specified as "infile" must be an ASCII file or a WordStar file; TextCon will not work on an internal word processor file. Some word processors, including PC-Write and Volkswriter, always keep their text in ASCII files. For other word processors, such as MultiMate, Word Perfect, or Microsoft Word, you will have to create an ASCII copy of your file before TextCon will work with it. If you try to convert an internal file, you may not get an error message from TextCon, but when you load the converted file into another word processor, it will probably contain gibberish.

The name you give for "outfile" will be used for the converted file; TextCon checks to see that it is different from "infile". However, if you have another file by this name, it will be overwritten.
The output file will still be in ASCII form, and you must treat it as such when loading it into your word processor. For some word processors, such as Displaywrite or Word Perfect, you must use a special command. For others, such as Microsoft Word or WordStar, you can load it as you would any other document file.

A typical command with no options would be as follows:

   TEXTCON A:PROPOSAL B:PROPOSAL.TXT

The options are identified by a preceding slash or hyphen as a flag character, so a command with options might look as follows:

   TEXTCON /T5 /B TEXT.DOC B:TEXT.ASC

Options can appear anywhere in the command line, so the preceding and following commands are equivalent:

   TEXTCON TEXT.DOC /T5 B:TEXT.ASC /B

Multiple options can be concatenated, using a single slash or hyphen, to appear as follows:

   TEXTCON /T5B TEXT C:\DOCS\TEXT.OUT

You must be careful when concatenating options this way, especially if you are using options with numeric "names" or those with sub-options. For example, if you wanted to use the options /T3 /2 /KC /B, and you combined them as /T32KCB, this would be interpreted as /T32 /KCB, which is very different from what you intended. If, instead, you combined them as /2BT3KC, they would be interpreted correctly. Also, when interpreting an option that allows sub-options, TextCon considers every character up to the next blank to be a sub-option, so any option of this type must either stand alone or be concatenated last. If there is any question in your mind about this, keep all of the options separate on the command line. (The menu system in TextDCA simplifies this quite a bit.)


GENERAL CONVERSION PROBLEMS:

Many of TextCon's decisions are based on its analysis of the beginning of your input file. It analyzes approximately four pages of text, but this will vary from file to file. If your file has sections that are very distinct in formatting, the parameters that TextCon determines from the beginning of your file may not be accurate for the rest of the file.
In these cases, TextCon will perform better if you subdivide the input file and process each distinctly formatted section separately.

TextCon will insert an extra space following the hyphen in any word that is hyphenated at the end of a line. For example, a word split across two lines as "ex-" and "ample" will be converted to "ex- ample". You can find these and convert them fairly easily by searching for "- " (a hyphen followed by a blank). It could have been designed to remove hyphens at the ends of lines, but then it would also have removed required hyphens, as in "ex-president". You may want to use the /X or /Y options to change this behavior.

When a converted ASCII file is loaded into the new word processor, tables may have their columns too close together or too far apart. This is because TextCon puts tab characters into tables in an ASCII file, but it cannot set the positions for the tab stops. As soon as you set the tab stops where you want them, the columns will line up correctly. By contrast, when TextDCA writes a DCA/RFT file, it also preserves the settings of the tab stops, thus saving you some additional time. If you don't want tab characters substituted for spaces, you can use the /KS or /T# options.

Sometimes TextCon will fail to remove the carriage returns within a nested or fully-indented paragraph. A common reason for this is that the person who created it started each line with a tab, rather than using an indent command. You can get around this by using the /T# option with some suitable tab value (usually 5 is a good choice). This problem will also occur if the paragraph is indented a large amount from the right margin, making the lines shorter than the cutoff length. Correct this with the /L# option, using a numeric value that is less than the shortest line. Be sure to take into account the document margin when calculating this number.

In certain cases, TextCon will drop whole lines from the file.
Whenever it drops lines, it will display a message on the screen warning you that this has happened. This dropping of lines is usually desirable, but, if it is not, you can simply rerun TextCon with different options that will prevent it from happening. There are two possible causes for this:

1. The /F and /H options, of course, are intended to cause TextCon to drop lines, namely those identified as header or footer lines. Occasionally the use of these options may cause text lines to be dropped because they are mistaken for headers or footers. This can be corrected by simply not using these options.

2. TextCon automatically drops lines that are inserted for the purpose of overprinting a previous line. These are not usually wanted in a converted document. If you want to force TextCon to keep overprinted lines, use the /KO option.


OPTIONS:

Before converting a file, TextCon analyzes the initial portion to determine certain overall characteristics of the document. During the conversion, the program applies a complex set of rules on a line-by-line and character-by-character basis to determine localized formatting information. Because of this, the optional parameters described here are not usually needed. In any case, you should certainly try a few conversions before using any of these options. Unless you notice problems or are simply curious about the options, you can ignore the following section.

The following describes each of the conversion options available in the program. Note that some of them are inter-related or similar in function. The descriptions are organized into three groups. The first group (BASIC OPTIONS) contains the options that are used most often. The options in the next group (LITTLE-USED, SPECIAL-PURPOSE OPTIONS) are rarely needed, although they can sometimes result in a better conversion. This is true of those in the third group (EXPERT OPTIONS) as well, but they generally require somewhat more expertise to use.
The options are shown in upper case, but lower case is acceptable as well.


BASIC OPTIONS

1. /W   sloppy Wordstar input

When the file to be converted is a WordStar file, TextCon recognizes this automatically and processes the file accordingly. When doing this, TextCon assumes that the writer used WordStar "correctly", taking advantage of all of its formatting abilities. Unfortunately, many writers use a word processor as if it were simply a correctable typewriter. This may include, among other bad habits, using the space bar to align text or to "nest" paragraphs. TextCon will not perform very well on this type of file, because it is neither a straight ASCII file nor a true WordStar file.

The /W option tells TextCon to treat the input file as a "semi-formatted" WordStar file, thus correcting for these sloppy typing habits. If you don't know how to recognize a poorly done WordStar file, try the conversion both with and without the /W option and compare the results. Most users have found that their WordStar files convert better with the use of this option than without it.

(For the technically minded, the /W option tells TextCon to convert all soft spaces and soft carriage returns to hard spaces and hard returns in order to determine the intended formatting of the file. TextCon then strips out any of the spaces and carriage returns that it determines are not needed. The most common undesired side-effect of this is that TextCon will occasionally make a wrong paragraphing decision. This is most likely to happen in a file with complex formatting, such as frequent margin changes.)

2. /R   soft Returns in output file

When TextCon determines that a carriage return from the original file is not the end of a paragraph, i.e. that it is a "soft" return, it simply omits it from the converted file.
The /R option causes TextCon to keep these non-paragraph-ending carriage returns as WordStar-type soft returns (ASCII 141 followed by ASCII 10) in the converted file.

This is useful for importing files to WordStar, because it eliminates the need to perform a reformat (Ctrl-B) on each paragraph to make it readable. It is also useful for importing text to other programs, such as LePrint, which expect WordStar-type files containing soft returns. If you use it when importing to a standard word processor, such as Word Perfect, the WordStar soft returns will cause peculiar results.

3. /S#   Split long lines

The /S# option is useful if you have a text file with carriage returns only at the ends of paragraphs and a word processor such as PC-Write which requires carriage returns at the end of each line. It tells TextCon to split each paragraph into lines of a particular length, given by the numeric parameter. For example, /S65 says that the output file should contain lines that are approximately 65 characters long.

When you use this option TextCon splits lines at the first space following the specified length. This means that the lines in the file will, on average, be half a word longer than the length you specify, and some of them may be as much as 10 or 15 characters longer.

This option will only work on files that have very long lines, that is, those files where TextCon would normally keep all existing carriage returns. It will not, for example, allow you to take a file with paragraphs made up of 80 character lines and reformat those into paragraphs of 60 character lines. That would require it to remove some carriage returns and add others, which it cannot currently do. (This can, however, be accomplished with two passes of the program. The first pass would remove carriage returns, writing an ASCII file with very long "lines". The second pass, with the /S# option, would then split these lines as desired.)

4. /K   Keep selected elements

As mentioned earlier, one of TextCon's major functions is to remove certain unneeded elements from your file. In some cases you may want some of these elements to be kept; the /K option allows this.

The /K option is a bit different from the other options in the way it is specified. It has several "sub-options" represented by additional key letters, which must immediately follow the /K. If, for example, you wanted only the S sub-option, the full option descriptor would be /KS, whereas if you wanted all of the sub-options, you would use /KSCRBOD. (You may also use the full option more than once on the command line, so /KS /KC /KR /KB /KO /KD would also invoke all of the sub-options.)

The "Keep" sub-options are as follows:

a. R sub-option   (Keep) Returns

The R sub-option of Keep instructs TextCon to keep all carriage returns in the converted file.

Some word processors (including WordStar, Microsoft Word, and the "generic word processor format" from Word Perfect) create ASCII files that do not have carriage returns at the ends of lines, but only at the ends of paragraphs. This greatly simplifies the job that TextCon has to do. TextCon will normally recognize these files, and display the message "All carriage returns will be preserved." If it does not recognize such a file, the usual symptom is that the converted file frequently has what should be separate paragraphs combined into one paragraph. In this case you will need to use the R sub-option of Keep.

This is needed very rarely however. The most common use for this sub-option is to take advantage of some of TextCon's other features, such as tab insertion or double-to-single-spacing conversion, without its carriage-return stripping.

Note that the R sub-option does not affect blank lines. These are still stripped from the file according to the rules explained under the B sub-option. If you want to keep all lines intact you must use both the R and B sub-options.

b. B sub-option   (Keep) Blank lines

The B sub-option of Keep instructs TextCon to keep all blank lines (except those within double-spaced paragraphs) in the converted file.

Normally, if TextCon encounters more than two consecutive blank lines (or four in a double-spaced document) it removes the "extra" ones (in either case, leaving only two in the converted document). It also tries to recognize print-image files, i.e. ones that contain the actual page breaks in the form of multiple blank lines or form-feed characters at the end of one page and beginning of the next. If it does recognize this, it will remove the page break entirely and will reconstruct a paragraph broken between the pages. When TextCon's analysis detects this type of format, it prints a message describing the file as "page-formatted".

The B sub-option of Keep overrides this blank-line stripping, so that all blank lines are kept in the file.

c. S sub-option   (Keep) Spaces

The S sub-option of Keep instructs TextCon to keep all spaces in the converted file.

In addition to the substitution of tabs for multiple spaces (described under the /T# option below), TextCon normally replaces any set of two or more spaces with a single space unless it is at the end of a sentence. At the end of a sentence, it replaces three or more spaces with two. This helps with files that have had spaces added to justify the right margin. TextCon also removes all leading and trailing spaces from each line it processes. In some special cases this processing may be undesirable.

The S sub-option of Keep overrides both the substitution of tabs for multiple spaces and the deletion of spaces, so that all spaces (except trailing spaces) are kept as found in the original file. You would not normally want to use this option for a file with a left margin of spaces, because those spaces would be incorporated into the paragraphs of text.
If you want to avoid tab substitution in a file of this type, but still want to delete extraneous spaces, see the /T# option described later.

d. C sub-option   (Keep) Control codes

The C sub-option of Keep forces TextCon to keep all control codes (ASCII characters between 1 and 31).

TextCon normally strips all control codes, with the exception of tab characters. If you want control codes kept, use the C sub-option.

e. D sub-option   (Keep) Dot commands

The D sub-option of Keep forces TextCon to keep all dot commands in the converted file.

Many word processors use "dot commands" to control the print format of a document. TextCon normally removes dot commands from each file it processes. The D sub-option of Keep will cause it to leave those commands in the file.

TextCon is fairly conservative about removing dot commands anyway, so that it won't accidentally remove lines of text. The only lines that will be removed are those that have a period in column 1 and a letter in column 2, and don't extend beyond column 12. As a result, you may find that it sometimes leaves dot commands in the converted file.

f. O sub-option   (Keep) Overprint lines

The O sub-option of Keep forces TextCon to keep all overprinted lines in the file.

TextCon normally removes all overprinted lines. Overprinted lines often occur in "print" files, as a method of performing underlining or boldface by printing over the same line twice. In a file that contains carriage-return/line-feed pairs at the ends of lines (the normal ASCII format), TextCon recognizes overprinted lines as those that end with only a carriage return. When a line like this is found, the following line is removed and a warning message is displayed.

With some unusual, non-standard input file formats, this can cause loss of text in the conversion process. If this should happen, or for some other reason you want to keep overprinted lines, use the O sub-option. (Note that this uses the letter O, not the numeral 0.)
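Two of the default stripping rules described under the D and O sub-options are precise enough to sketch in code. The Python below is only an illustration of those rules as stated in this manual, not TextCon's actual implementation (which is written in C). The function names `is_dot_command` and `strip_overprints` are hypothetical, and the sketch assumes that trailing blanks count toward the column limit and that the input uses carriage-return/line-feed line endings.

```python
def is_dot_command(line):
    """True for a line with a period in column 1, a letter in column 2,
    and nothing beyond column 12 (the rule given under the D sub-option)."""
    text = line.rstrip("\r\n")
    return 2 <= len(text) <= 12 and text[0] == "." and text[1].isalpha()


def strip_overprints(data):
    """Drop overprint lines from CR/LF-terminated bytes.

    A line that ends with only a carriage return is followed by an
    overprint line; that following line is dropped (the rule given under
    the O sub-option). Returns (cleaned_bytes, number_of_lines_dropped).
    """
    kept = []
    dropped = 0
    for record in data.split(b"\r\n"):
        # Within one CR/LF-terminated record, a bare CR separates the
        # real line from whatever was printed over it.
        first, *overprints = record.split(b"\r")
        kept.append(first)
        dropped += len(overprints)
    return b"\r\n".join(kept), dropped


cleaned, dropped = strip_overprints(b"Important\r_________\r\nNext line\r\n")
print(cleaned, dropped)
```

In this sketch the underscore line printed over "Important" is the one that is dropped, and the count of dropped lines is what a caller would use to decide whether to show a warning.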


LITTLE-USED, SPECIAL-PURPOSE OPTIONS

1. /T#   replace Tabs with spaces

TextCon was designed primarily for importing files to the more sophisticated word processors, where documents are often printed with proportional spacing. For this kind of work, tabs are used extensively to position items in a document; multiple spaces will not work correctly. For this reason, TextCon preserves tabs rather than expanding them with blanks, unless the /T# option is used.

The /T# option requires a numeric value (e.g., /T4 or /T0), specifying the number of spaces between tab stops. The first tab stop is always at column one. When a tab is found, enough spaces are substituted in the converted file to position the following character at the next tab stop. The default, of course, (if the /T# option is not specified at all) is that tabs are preserved, whereas a value of zero (/T0) means they are removed entirely.

If the /T# option is used, TextCon's normal behavior of substituting tabs for multiple spaces is turned off also. This substitution is normally done in three circumstances: at the beginning of a paragraph whose first line is indented; between items in a columnar table; and between a list-identifying number, letter, or symbol and the corresponding list entry (for example, the item "1. /T#" above).

This means that if you have a file that does not contain tabs, and you simply want to suppress TextCon's substitution of tabs for spaces, you can use /T with any numeric value to accomplish this. The number you use doesn't really matter here, since it is used only to determine the number of spaces to substitute when a tab is found in the original file.

2. /1, /2   line spacing of input file

TextCon is designed to recognize the line spacing (single or double) used in a file, but in some rare cases it will make a mistake.
This will often happen when the initial part of the document (the part that TextCon analyzes before starting the conversion) has different spacing than the rest.

When TextCon finishes its analysis of a file, it displays on the screen what it determined the spacing to be. If this is wrong, you will have to use the /1 or /2 option to specify that the input file is single- or double-spaced.

You can also detect an improper spacing decision from problems in the output file. The usual symptom is that the converted file either will contain many hard carriage returns and be double-spaced, or will have many paragraphs run together.

If TextCon's double-space option is in effect, either through its own decision or because you specified it, single occurrences of blank lines are totally ignored, as if they simply were not in the file. Two consecutive blank lines are treated as if there were only a single blank line. Occasionally you may find that this causes some paragraphs to run together in the converted file. This would be most likely to happen if single and double spacing are mixed in the same document, although normally TextCon will handle this correctly.

3. /P#   Paragraph spacing in output file

TextCon normally leaves paragraphs spaced the same way they are spaced in the original file. The usual style for single-spaced documents has one blank line between paragraphs; double-spaced documents usually have no extra blank lines. If your original document has one kind of line spacing and you want to print the new document with different spacing, you may find that the paragraph spacing is either too large or too small.

The /P# option tells TextCon to put a specific number of blank lines between paragraphs in the converted document. For example, /P0 will eliminate any extra blank lines between paragraphs, so you might use it if your original file was single-spaced and you wanted to print the new copy double-spaced.
/P1 will end each paragraph with exactly one blank line, so you might use it for the opposite case. (TextCon doesn't actually set the line spacing in the converted document for you; you have to do that yourself after you load it into the new word processor.)

The /P# parameter has no effect on paragraphs that consist of a single line of text; those are assumed to be lists or tables whose spacing should be preserved.

4. /B   Block-style input file

This option tells TextCon that your file has only block-style paragraphs, i.e., there are no paragraphs with first-line indents or outdents. TextCon doesn't need to know this in order to process a file, but there are some cases where it can do a better job if it does. This should be thought of as a little "tweak" for those who want the absolute best performance. If you use it for a file that contains non-blocked paragraphs, of course, performance will be worse.

5. /I#   first-line Indent in output file

As described under the /T# option, TextCon normally substitutes a tab character for multiple spaces at the beginning of indented paragraphs. The /I# option allows you to use a specific number of spaces instead, or to convert indented paragraphs to block-style paragraphs.

This option requires a numeric value indicating how many spaces are to be used for indentation. If, for example, you specify /I5, all indented paragraphs in the converted file will have a first-line indentation of five spaces. Using /I0 will convert indented paragraphs to block-style paragraphs (zero indentation).

The /I# parameter has no effect at all on paragraphs that are already block-style or have hanging indents.

6. /X, /Y   line-ending hYphen processing

As described under GENERAL CONVERSION PROBLEMS, line-ending hyphens are normally preserved and a space is inserted after them, so that you can find each one and make a decision as to whether it needs to be kept in the document.
If you already know that all hyphens are required hyphens or that all of them are "soft" hyphens, you can save some editing time by using the /X or /Y options.

The /X option indicates that all line-ending hyphens are required hyphens. TextCon will leave them in the text and will not insert a blank. This is useful if you know that no "soft hyphenation" has been performed on the file.

The /Y option indicates that all line-ending hyphens are "soft" hyphens, and that TextCon should remove them entirely. This is not a very useful option, because it would be a rare document that you could safely assume had no line-ending required hyphens.

EXPERT OPTIONS

1. /L# cutoff Length

TextCon automatically determines a "typical" line length for your document and from this calculates a "cutoff" length used in its paragraph-determination algorithms. If a line is shorter than the cutoff length, TextCon assumes that the carriage return at the end of that line was put there intentionally, and the program will not delete it.

Note that no line is ever truncated as a result of the "cutoff" length. This value is only used as an aid in deciding which carriage returns should be kept and which should be removed.

You can use the /L# option to override TextCon and specify your own cutoff length. As you make the cutoff length longer, more lines will be shorter than that length, and thus will retain their carriage returns.

Note that the length of a line is not measured from the very beginning of the line (column 1), nor is it measured from the first non-blank character on that particular line. The length is measured starting at the left margin of the document, which is determined by the leftmost non-blank character found anywhere in the document.
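
The measurement rule can be illustrated with a short C sketch. This is not TextCon's actual source; the function names are hypothetical, and the sketch assumes that tabs have already been expanded to spaces and that trailing blanks do not count toward the length.

```c
#include <assert.h>
#include <string.h>

/* Number of leading blanks on one line (hypothetical helper). */
int leading_blanks(const char *line)
{
    int n = 0;
    while (line[n] == ' ')
        n++;
    return n;
}

/* Left margin of the whole document: the smallest number of leading
   blanks found on any line that contains a non-blank character. */
int document_left_margin(const char *lines[], int count)
{
    int margin = -1;
    for (int i = 0; i < count; i++) {
        int n = leading_blanks(lines[i]);
        char c = lines[i][n];
        if (c == '\0' || c == '\r' || c == '\n')
            continue;               /* blank line: nothing to measure */
        if (margin < 0 || n < margin)
            margin = n;
    }
    return margin < 0 ? 0 : margin;
}

/* Length of a line measured from the document's left margin.
   Trailing blanks and line-ending characters are ignored (assumption). */
int margin_relative_length(const char *line, int left_margin)
{
    assert(line != NULL);
    int end = (int)strlen(line);
    while (end > 0 && (line[end - 1] == ' ' ||
                       line[end - 1] == '\r' || line[end - 1] == '\n'))
        end--;
    return end > left_margin ? end - left_margin : 0;
}

/* A line counts as "short" when it is shorter than the cutoff length,
   which means its carriage return would be kept. */
int is_short_line(const char *line, int left_margin, int cutoff)
{
    return margin_relative_length(line, left_margin) < cutoff;
}
```

Under these assumptions, only the comparison in is_short_line changes when a different cutoff is supplied; the measured length of each line stays the same.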
If, for example, the left margin of the document were 10 characters (meaning the leftmost character in any line occurred in position 11), a line with 15 leading spaces followed by 20 characters would have a length of 15+20-10 = 25. If the cutoff length were 26 or more, that would be considered a short line.

2. /H#, /F# remove Headers or Footers

These are two of the trickier options in TextCon, and should be used with caution. Their purpose is to remove running headers and footers from page-formatted files, so they don't wind up intermingled with the text. They have the potential to save a lot of manual editing time on some files, but they can mistakenly remove text lines instead. Of course, the original file is not modified in any case, so if it doesn't work correctly you can rerun TextCon without these options.

The numeric parameter used with these options is the number of the line on each page that contains the header or footer. If you don't want to figure this out yourself, you can omit the number or use a value of zero, and TextCon will try to determine which line(s) contain the header and/or footer.

Thus, /H3 /F64 would ask TextCon to remove the third and sixty-fourth lines of each page and attempt to join the text across page boundaries. /F by itself, on the other hand, would imply there was no running header and that TextCon should determine which line number appears to be a footer.
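
As a rough illustration of what explicit line numbers such as /H3 /F64 ask for, the following C sketch drops one header line and one footer line from every page. It is a simplification, not TextCon's algorithm: it assumes fixed 66-line pages with no form feeds, does not try to detect the lines automatically, and does not join paragraphs across page boundaries. The names strip_page_lines, LINES_PER_PAGE, and NO_LINE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define LINES_PER_PAGE 66   /* fixed page length assumed by this sketch */
#define NO_LINE        (-1) /* hypothetical value: option not requested */

/* Copies pointers to the kept lines into out[] and returns how many
   were kept. header_line and footer_line are 1-based line numbers
   within each page, or NO_LINE to leave that position alone. */
int strip_page_lines(const char *in[], int count,
                     int header_line, int footer_line,
                     const char *out[])
{
    assert(in != NULL && out != NULL);
    int kept = 0;
    for (int i = 0; i < count; i++) {
        int line_on_page = (i % LINES_PER_PAGE) + 1;
        if (line_on_page == header_line || line_on_page == footer_line)
            continue;               /* running header or footer: skip */
        out[kept++] = in[i];
    }
    return kept;
}
```

For a two-page file of 132 lines, calling strip_page_lines(in, 132, 3, 64, out) keeps 128 lines. The input array is never modified, which mirrors the point above that the original file is left intact.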

These options depend on a number of assumptions:

· that your document either has exactly 66 lines per page, or has fewer than 66 lines per page and uses form feed characters to go to a new page (if a file has "overprint" lines without line feeds for the purpose of underlining or boldface, these are stripped and do not count toward the 66 lines per page),

· that the header or footer is only one line long,

· that the header or footer always appears on the same line of every page, and,

· if you do not specify the line number(s), that a running header and/or footer occurs within the first four pages of the file.

If a file meets these criteria, TextCon will remove the desired lines very accurately, usually even combining paragraphs across page boundaries. If a file diverges even slightly from that description, TextCon may erroneously delete text lines from the file. The best advice is to examine closely any file that has been created using this option.

3. /Z# end-of-paragraph marker in input file

This is a very specialized option that would not often be used on standard document files. It allows you to specify an alternative character that marks the ends of "paragraphs" in your file. The character is specified by means of its decimal ASCII code, so for example, /Z14 would look for a Ctrl-N to mark the ends of paragraphs and /Z35 would look for the symbol #. The only ASCII values not allowed are 0 and 255.

When this option is used, TextCon will do two things differently:

a. treat all carriage returns as soft returns, removing them from the file (which means that it overrides the /KR option), and

b. treat all occurrences of the specified character as hard returns, removing them from the file and substituting a carriage-return/line-feed pair.

This option can be extremely useful for certain types of file transfers, particularly those involving databases, certain desktop publishing applications, and manipulations of bulletin board message files.

OTHER USES FOR TextCon:

TextCon users have found some ingenious ways to use the program -- tasks for which the program was not intended, but which it does quite well. The following examples may suggest some additional ways TextCon can aid in your text processing work.

1. Use of the Keep Option

TextCon's /K option figures prominently in most of these special uses. If you use /K with all of its sub-options (/KBCDORS), the output file will be identical to the input file, with a few exceptions. This would seem to be a pointless thing to do, unless, of course, those exceptions are important to you. They are as follows:

a. If the input file has lines that end with only a carriage return, TextCon will add a line feed to each of them. You may occasionally get files of this type, from certain programs or from other computers, and you may find that your word processor will not accept them without the line feeds.

b. TextCon deletes trailing blanks from each line.

c. WordStar files are always converted to ASCII.

Each of these conversions can be extremely useful for certain kinds of files, even when you don't need the carriage-return stripping that is TextCon's main purpose.

2. Adding Carriage Returns

You may sometimes get files from another computer where a line-feed character, rather than a carriage return, is used to mark the ends of lines. This causes great difficulty for some PCDOS software. TextCon can convert these files by use of the /Z# option. The decimal ASCII code for line feed is 10, so the full option would be /Z10. You may also want to use /KBCDS to keep other characteristics of the file intact. The /Z# option overrides the /KR option.

3. Removing Blank Lines

TextCon removes multiple blank lines by default, but leaves up to two blank lines separating paragraphs. If you want to remove all blank lines from a file, use the /P0 option. One TextCon user needed a count of only the non-blank lines in an ASCII file, but couldn't find a counting program that would do that. Using TextCon with /P0 and /KR produced a file with all of the blank lines removed.

4. Tab Expansion

For certain programs and certain applications it may be inconvenient to have tabs in a file. TextCon can remove them and expand them to spaces via the /T# option. If you use this along with the /KBCDRS option, the output file will be nearly identical to the input file, but with tabs expanded to spaces.

This option can also be useful when dealing with badly formatted files. Some people create fully indented paragraphs by inserting a tab at the beginning of every line of the paragraph rather than by using their word processor's indent function. This creates a mess if you have to edit those paragraphs or move them to another word processor. TextCon will interpret them as individual lines, not as paragraphs. However, if you use the /T# option, TextCon will correctly recognize them as fully indented paragraphs.

5. Converting WordStar files to formatted ASCII files

TextCon can also be used as a general WordStar-to-ASCII converter. For this purpose, you should use the /W and /KBRS options. If you want to keep some other characteristics, you might want to use the C, D, and O sub-options of Keep as well.

6. Formatting files for LePrint

LeBaugh Software's LePrint program, for printing in high-quality fonts on dot-matrix printers, was designed primarily for use with WordStar files. It will accept ASCII files, but it is somewhat more difficult to use this way.
The main way that LePrint's preferred input format differs from standard ASCII is that soft carriage returns should be indicated by the characters WordStar uses for that purpose. TextCon can create a file for LePrint if you use the options /R /KBCDS.

REVISION HISTORY:

The original version of this program was written to assist in the transfer of documents from microcomputer word processors and optical scanners to an NBI Oasys office system. Because of the wide variety of word processors used by the people involved, it wasn't practical to try to accommodate all the different internal file formats. Instead, the program was designed to reformat standard ASCII files into a form that could be imported easily into another word processor. It was very important that the program properly process as many different varieties of ASCII file formats and writing styles (indentations, paragraphing, line spacing, etc.) as possible.

The original program was quite complicated to use and the algorithm employed was quite simple-minded, so that certain formats were not handled as well as they could be. This new program is based on two years' experience examining files from different writers with different word processors, talking to secretaries about how they set up different documents, and identifying the significant patterns of lines and characters that commonly occur. The program now analyzes each input file and makes very intelligent decisions about the type of file and the paragraphing styles used.

TextCon 1.1

This was the first version to be widely distributed.

TextCon 1.2

1. Renamed the former /L option to /B.
2. Changed the /T# option to use true tab stops.
3. Added new /L# option, as well as /H, /Y, and /R.
4. Expanded the file analysis stage to determine additional document characteristics, including typical line length, standard margin, recognition of unformatted, formatted, and print-formatted files, and header and footer locations.
5. Fixed a bug in the table-recognition section.
6. Additional fine-tuning of parameters and algorithms, particularly in regard to hanging indents, centered lines, and list items.

TextCon 1.3

1. Renamed many options. /B, /C, and /S became the B, R, and S sub-options of the new /K (for Keep) option. /R was split into the /F# and /H# options, both of which now accept a line number as a parameter. /D became /2, and /H became /X. The renaming may cause confusion for users of previous versions, but it was required to accommodate new options. Such a major change should not be necessary again.
2. Dropped one option. /W was no longer needed because of improvement in recognition of WordStar files. However, see the new /W option below.
3. Added new options. /1 option specifies single spacing. /B specifies that all paragraphs are block style. /M# specifies the minimum size of the left margin of the document. /W option specifies different processing of WordStar files. /Z# specifies that the original file has a particular character that always marks paragraph ends. /S# will split files with long lines into shorter lines. The C sub-option of the /K option specifies that control codes are to be kept in the new file.
4. Automatic removal of lines that are added only for print emphasis. In a file whose lines end in CR-LF pairs, these are easily recognized because they are preceded by a line without a line feed.
5. Additional improvement of decision rules and general fine tuning for better paragraph recognition.

TextCon 1.4

1. Added the /R option and the D and O sub-options of Keep.
2. Yet more fine tuning of the algorithms used, especially with regard to tables and tab insertion.
3. More complete processing of WordStar files, including version 4.0.

TextCon 1.5

1. Optimized the program so that it now runs almost twice as fast as earlier versions.

TextCon 1.6

(NOTE: If you used an earlier version and have not sent payment for your use, please consider doing so now.)

1. Eliminated the /M option; improvements in the program made it no longer necessary.
2. Program now allows the use of the slash as well as the hyphen to denote options on the command line.
3. Made improvements in the documentation.
4. Made the DCA/RFT options of TextDCA easier to use.

TextDCA 1.6

TextDCA is a separate program which will be sent to those contributing $25 or more for TextCon. TextDCA has two features not found in TextCon, as well as full printed documentation.

1. DCA/RFT output format. The /D option specifies that instead of an ASCII file, the output should be written in DCA/RFT format. Most of the major PC word processors now support this format, which, unlike ASCII files, can contain formatting information such as margins, centering, tab settings, indents, etc. Now TextDCA can pass all this information on to your word processor, saving a tremendous amount of reformatting.

2. Menu mode. TextDCA permits optional menu-driven selection of processing options, for those who have trouble with its normal command-line syntax. The menu system works only on IBM-PC-compatibles, not on MSDOS machines such as the Wang PC, DEC Rainbow, TI Professional, Tandy 2000, etc.

DISTRIBUTION AND SUPPORT:

The TextCon program described above is Copyright 1986, 1987, 1988, CrossCourt Systems. TextCon accomplishes its purpose as described above and, if used carefully, will cause no known damage to a computer system or its files. All users should maintain backup copies of their own files and CrossCourt Systems bears no responsibility for losses arising from their failure to do so.

If you encounter problems with TextCon, or have questions about it, please write to the address below, or contact CrossCourt Systems at CompuServe ID 72446,2704.
If you are having trouble with a particular conversion, please send example files on disk, with as complete a description as possible of what you are trying to do and what is going wrong. Because of the specialized and technical nature of the program, as well as its low price, we can offer only limited telephone support.

If you find that this program saves you time in your work and you use it regularly, please support the spirit of shareware by sending payment to help offset the time and resources devoted to developing and supporting it. If you are using it in an office environment with multiple users on multiple computers, please consider this in determining the size of your contribution.

Payment can be made by check, money order, Visa or MasterCard. Purchase orders are accepted, for a minimum of $25 per copy of the program plus a $5 handling fee per order. All checks must be in U.S. dollars, drawn on a U.S. bank. Those who contribute $25 or more will be sent the TextDCA program, which is described elsewhere in this documentation.

CrossCourt Systems
1521 Greenview Ave.
East Lansing, MI 48823
(517) 332-4353

Chris Wolf, the author of TextCon, is a member of the Association of Shareware Professionals.