Gocollector Handover

Gocollector Configuration in Detail

Author: fupeng.li

date: 2020-05-13

Keywords: gocollector, td-agent

Requirements document

Requirements document URL

Notes on the original EC2 installation

Ansible tasks for installing td-agent

Ansible playbook


- name: Delegate to DEB Package(Debian/Ubuntu)
  include_tasks: use-deb.yml
  when: ansible_pkg_mgr == "apt"

- name: Delegate to RPM Package(Centos)
  include_tasks: use-rpm.yml
  when: ansible_pkg_mgr == "yum"

- name: Install fluent-plugin-*
  shell: /usr/sbin/td-agent-gem install {{ item }}
  with_items:
    - fluent-plugin-s3
  ignore_errors: yes

- name: Deploy fluent config file
  template:
    src: "{{ ENV }}_{{ AREA }}_reaper_agent.conf.j2"
    dest: /etc/td-agent/td-agent.conf
  notify:
    - reload td-agent

- name: Keep fluent running
  service:
    name: td-agent
    state: started

use-deb.yml

---

- name: Test if fluentd is installed
  shell: /usr/bin/dpkg -L td-agent
  register: installed
  ignore_errors: true

- name: Install fluent for Ubuntu[xenial]
  shell: curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-xenial-td-agent3.sh | sh
  when: installed.rc != 0 and ansible_distribution_release == "xenial"

- name: Install fluent for Ubuntu[bionic]
  shell: curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-bionic-td-agent3.sh | sh
  when: installed.rc != 0 and ansible_distribution_release == "bionic"

- name: Install fluent for Ubuntu[trusty]
  shell: curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-trusty-td-agent3.sh | sh
  when: installed.rc != 0 and ansible_distribution_release == "trusty"

- name: Install fluent for Debian[stretch]
  shell: curl -L https://toolbelt.treasuredata.com/sh/install-debian-stretch-td-agent3.sh | sh
  when: installed.rc != 0 and ansible_distribution_release == "stretch"

- name: Install fluent for Debian[jessie]
  shell: curl -L https://toolbelt.treasuredata.com/sh/install-debian-jessie-td-agent3.sh | sh
  when: installed.rc != 0 and ansible_distribution_release == "jessie"

- name: Make sure fluent run as root(User)
  replace:
    path: /lib/systemd/system/td-agent.service
    regexp: '^User=.*$'
    replace: 'User=root'

- name: Make sure fluent run as root(Group)
  replace:
    path: /lib/systemd/system/td-agent.service
    regexp: '^Group=.*$'
    replace: 'Group=root'

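td-agent.yml configuration file. It reads logs with the tail approach; the current Fargate setup sends logs directly instead.

Plugins used

monitor_agent <monitoring>
tail <file reading>
s3 <upload>
forward <forwarding>

How each plugin is used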

monitor_agent
Set up monitoring
<source>
@type monitor_agent
bind 0.0.0.0
# curl http://127.0.0.1:24220/api/plugins.json shows the fluentd status.
# Telegraf uses this endpoint to monitor the status.
port 24220
</source>
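Tail a file
<source>
@type tail
format none
# Log file that fluentd should ship.
path /path/app.log
# Records the inode and read position (in hexadecimal) of each tailed file.
# Do not delete it casually!
pos_file /path/app.log.pos
# fluentd's internal tag for this kind of log, so that other plugins can refer to it.
tag app.stdout
</source>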
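
To make the pos_file contents concrete, here is a minimal Python sketch that decodes one entry. It assumes the layout commonly seen in in_tail position files (path, read offset, inode, separated by tabs, the last two as 16-digit hexadecimal); the sample line and the names PosEntry and parse_pos_line are hypothetical.

```python
from typing import NamedTuple


class PosEntry(NamedTuple):
    path: str
    offset: int  # bytes of the file already read
    inode: int


def parse_pos_line(line: str) -> PosEntry:
    """Decode one pos_file line (assumed: path, hex offset, hex inode, tab-separated)."""
    path, offset_hex, inode_hex = line.rstrip("\n").split("\t")
    return PosEntry(path, int(offset_hex, 16), int(inode_hex, 16))


# Hypothetical sample entry, not taken from a real host.
sample = "/path/app.log\t0000000000000e8b\t0000000000a5dd1c\n"
entry = parse_pos_line(sample)
print(entry.path, entry.offset, entry.inode)
```

Deleting the file discards this offset, which is why the note above warns against removing it: fluentd would no longer know where it stopped reading.
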
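Upload to S3
# The match pattern refers to a tag set by a source and tells fluentd which logs
# this block handles (the tail example above tags its logs app.stdout).
<match morpheus.stdout>
@type s3
s3_region {{ iuacfront_region }}
aws_key_id {{ iuacfront_aws_access_key_id }}
aws_sec_key {{ iuacfront_aws_secret_access_key }}
s3_bucket bytepower
# Path prefix inside the S3 bucket.
path logs/morpheus/stdout/
format single_value
# Full key of the S3 object. %{index} prevents file name collisions; the hostname
# identifies the machine, so several machines writing at once do not overwrite
# each other's objects. The double curly braces are Ansible (Jinja2) template syntax;
# fluentd's built-in %{hostname} placeholder could be used instead.
s3_object_key_format %{path}%Y/%m/%d/%H/%M-%S-{{ inventory_hostname }}-%{index}.%{file_extension}
store_as gzip_command
</match>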
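
As an illustration of how the object key is assembled, the Python sketch below expands the format string by hand. It is not the plugin's code: render_key is a hypothetical helper, the hostnames and timestamp are made up, and it assumes Ansible has already replaced {{ inventory_hostname }} and that the uploaded objects carry the gz extension.

```python
from datetime import datetime, timezone


def render_key(fmt: str, path: str, when: datetime, index: int, ext: str) -> str:
    """Expand the %{...} placeholders first, then strftime codes such as %Y."""
    key = (fmt.replace("%{path}", path)
              .replace("%{index}", str(index))
              .replace("%{file_extension}", ext))
    return when.strftime(key)


# Format string as it would look after Ansible templating for two hypothetical hosts.
template = "%{path}%Y/%m/%d/%H/%M-%S-HOST-%{index}.%{file_extension}"
when = datetime(2020, 5, 13, 8, 30, 0, tzinfo=timezone.utc)

key_a = render_key(template.replace("HOST", "web-01"), "logs/morpheus/stdout/", when, 0, "gz")
key_b = render_key(template.replace("HOST", "web-02"), "logs/morpheus/stdout/", when, 0, "gz")
print(key_a)
print(key_b)
```

Two machines flushing in the same second get different keys because of the hostname, and repeated flushes on one machine differ by the index.
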

Ruby has GIL (Global Interpreter Lock), which allows only one thread to execute at a time. While I/O tasks can be multiplexed, CPU-intensive tasks will block other jobs. One of the CPU-intensive tasks in Fluentd is compression. The new version of S3/Treasure Data plugin allows compression outside of the Fluentd process, using gzip. This frees up the Ruby interpreter while allowing Fluentd to process other tasks.

Source: "5 Tips to Optimize Fluentd Performance"

Receive logs from other machines
<source> 
@type forward
port 24224
bind 0.0.0.0
</source>

Full configuration

<source>
  @type monitor_agent
  bind 0.0.0.0
  port 24220
</source>

<source>
  @type tail
  format none
  path {{ APP_LOG_DIR }}/*/event.log
  pos_file {{ APP_LOG_DIR }}/event.log.pos

  tag *
</source>

<match **>
  @type s3

  s3_region {{ prod_cn_region }}
  aws_key_id {{ prod_cn_aws_access_key_id }}
  aws_sec_key {{ prod_cn_aws_secret_access_key }}
  s3_bucket {{ prod_cn_bucket }}

  path {{ s3_directory }}/${tag[3]}/rawdata/

  s3_object_key_format %{path}%Y/%m/%d/%H/%M/%S_%Y%m%d%H%M%S_{{ inventory_hostname }}-%{index}.%{file_extension}

  format single_value
  store_as gzip_command

  <buffer tag,time>
    @type file
    path /mnt/log/td-agent/s3

    flush_thread_count 8
    queued_chunks_limit_size 10
    chunk_limit_size 512m

    timekey 60
    timekey_wait 10
    timekey_use_utc true
  </buffer>

</match>
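
The buffer timing in this configuration can be sketched in Python. The sketch assumes the usual timekey semantics: events are grouped into windows of timekey seconds, and a window's chunk becomes eligible for flushing timekey_wait seconds after the window closes. The function name and the sample timestamp are hypothetical, and the real upload time also depends on flush threads and retries.

```python
from datetime import datetime, timezone

TIMEKEY = 60       # seconds, from the buffer section
TIMEKEY_WAIT = 10  # seconds, from the buffer section


def earliest_flush(event_time: datetime) -> datetime:
    """Return when the chunk holding event_time becomes eligible for flushing."""
    epoch = int(event_time.timestamp())
    window_start = epoch - epoch % TIMEKEY
    return datetime.fromtimestamp(window_start + TIMEKEY + TIMEKEY_WAIT, tz=timezone.utc)


event = datetime(2020, 5, 13, 8, 30, 42, tzinfo=timezone.utc)
flush_at = earliest_flush(event)
print(flush_at.isoformat())
```

With these values a log line should show up in S3 at most a little over a minute after it was written, as long as the uploads keep up.
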

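Performance optimization

  1. Use the compression method provided by the S3 plugin: set store_as gzip_command instead of compress gzip. The latter compresses inside the fluentd process and adds to its load.

  2. Set flush_thread_count so that fluentd uploads files concurrently with multiple threads.

  3. Use multiple processes.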

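How to monitor fluentd

Fluentd's monitor_agent exposes an HTTP endpoint that returns Fluentd's current state (including buffer_queue_length, buffer_total_queued_size, retry_count, and so on). Telegraf provides a Fluentd input plugin that can read this state every 10 s.

The corresponding plugin must be configured in Telegraf; its configuration was covered in the earlier monitoring session.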
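
For a quick check without Telegraf, the sketch below reads the same monitor_agent data in Python. It assumes the JSON endpoint /api/plugins.json on port 24220 and the field names buffer_queue_length, buffer_total_queued_size, and retry_count; the helper names and the sample payload are hypothetical.

```python
import json
from urllib.request import urlopen

FIELDS = ("buffer_queue_length", "buffer_total_queued_size", "retry_count")


def summarize(payload: dict) -> list:
    """Keep only plugins that report buffer metrics (typically buffered outputs)."""
    rows = []
    for plugin in payload.get("plugins", []):
        if plugin.get("buffer_queue_length") is None:
            continue
        rows.append({"type": plugin.get("type"), **{f: plugin.get(f) for f in FIELDS}})
    return rows


def fetch(url: str = "http://127.0.0.1:24220/api/plugins.json") -> dict:
    """Fetch the live status; needs a running td-agent with monitor_agent enabled."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


# Hypothetical payload shaped like the endpoint's response.
sample = {"plugins": [
    {"plugin_id": "object:1", "type": "tail", "retry_count": None},
    {"plugin_id": "object:2", "type": "s3", "buffer_queue_length": 2,
     "buffer_total_queued_size": 1048576, "retry_count": 0},
]}
rows = summarize(sample)
print(rows)
```

On a host running td-agent, summarize(fetch()) would give the live numbers; a steadily growing queue length or retry count suggests that uploads to S3 are falling behind.
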